Explainable poisoned classifier identification
Overview
We propose an approach for analyzing and identifying backdoor-poisoned classifiers using adversarial examples. Based on this approach, we develop a classifier forensics tool that helps users visualize adversarial examples and test a classifier's predictions under different custom patches. For more details on our approach and tool, see our technical report and demo.
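As a rough illustration of the patch-testing step, the sketch below stamps a candidate patch onto a batch of images and compares the classifier's predictions before and after. This is a minimal sketch, not the tool's actual API: the model checkpoint, patch contents, and patch location are all placeholder assumptions.

```python
import torch

def apply_patch(images, patch, x=0, y=0):
    """Stamp `patch` (C x h x w) onto a batch of images (N x C x H x W)
    at the top-left corner (x, y)."""
    patched = images.clone()
    _, h, w = patch.shape
    patched[:, :, y:y + h, x:x + w] = patch
    return patched

@torch.no_grad()
def compare_predictions(model, images, patch):
    """Return predicted labels on clean vs. patched inputs."""
    model.eval()
    clean_pred = model(images).argmax(dim=1)
    patched_pred = model(apply_patch(images, patch)).argmax(dim=1)
    return clean_pred, patched_pred

# Hypothetical usage (names and files are illustrative, not from our tool):
# model = torch.load("poisoned_classifier.pt")
# patch = torch.rand(3, 8, 8)   # an 8x8 candidate trigger patch
# clean, patched = compare_predictions(model, images, patch)
```

If a large fraction of predictions flip to a single class once the patch is applied, that is suggestive of backdoor behavior.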
Intended Use
Our tool helps users analyze poisoned classifiers through a user-friendly interface. Users who want to analyze a poisoned classifier, or to determine whether a classifier is poisoned, can use our classifier forensics tool.
Limitations
Our approach mostly considers patch-based backdoor poisoning attacks.
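For concreteness, a patch-based poisoning attack typically stamps a small, fixed trigger patch onto a fraction of the training images and relabels them with the attacker's target class. The sketch below is an illustrative assumption of that setup; the function name, poison fraction, and patch placement are hypothetical.

```python
import torch

def poison_dataset(images, labels, patch, target_class, poison_frac=0.1):
    """Stamp a fixed trigger patch onto a random fraction of the training
    images and relabel them with the attacker's target class."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_frac * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    _, h, w = patch.shape
    images[idx, :, :h, :w] = patch  # place the trigger at the top-left corner
    labels[idx] = target_class
    return images, labels
```

Attacks that do not rely on a localized trigger patch fall outside the scope of our analysis.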
References
@article{sun2020poisoned,
title={Poisoned classifiers are not only backdoored, they are fundamentally broken},
author={Sun, Mingjie and Agarwal, Siddhant and Kolter, J. Zico},
journal={CoRR},
volume={abs/2010.09080},
url={https://arxiv.org/abs/2010.09080},
year={2020}
}