The following is a curated list of papers from the field of explainable AI (XAI), organized by year. Please feel free to submit a pull request to contribute to this list.


## 2021

  1. Fan, Feng-Lei, et al. “On Interpretability of Artificial Neural Networks: A Survey.” IEEE Transactions on Radiation and Plasma Medical Sciences, IEEE, 2021.
  2. Janizek, Joseph D., et al. “Explaining Explanations: Axiomatic Feature Interactions for Deep Networks.” Journal of Machine Learning Research, vol. 22, no. 104, 2021, pp. 1–54.
  3. Zunino, Andrea, et al. “Excitation Dropout: Encouraging Plasticity in Deep Neural Networks.” International Journal of Computer Vision, vol. 129, no. 4, Springer, 2021, pp. 1139–52.
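Several of the papers above build on path-attribution axioms; for example, Janizek et al.'s axiomatic feature interactions extend Integrated Gradients to second order. As a self-contained illustration of the underlying idea, here is a minimal sketch of Integrated Gradients in pure Python. The toy function `f`, the zero baseline, and the numerical-gradient scheme are illustrative assumptions, not code from any cited paper.

```python
# Minimal sketch of Integrated Gradients (IG) with a numerical gradient.
def f(x):
    # Toy differentiable model: an interaction term plus a linear term.
    return x[0] * x[1] + 2.0 * x[0]

def grad(f, x, eps=1e-5):
    # Central-difference gradient of f at x.
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def integrated_gradients(f, x, baseline, steps=200):
    """Riemann-sum approximation of IG_i = (x_i - b_i) * integral of
    df/dx_i along the straight path from the baseline to x."""
    n = len(x)
    total = [0.0] * n
    for k in range(1, steps + 1):
        alpha = k / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(f, point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x, baseline = [1.0, 2.0], [0.0, 0.0]
attr = integrated_gradients(f, x, baseline)
# Completeness axiom: attributions sum (approximately) to f(x) - f(baseline).
print(attr, sum(attr), f(x) - f(baseline))
```

The completeness check at the end is the axiom that motivates path methods: the per-feature attributions account exactly for the model's output change from the baseline.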


## 2020

  1. Alipour, Kamran, et al. “A Study on Multimodal and Interactive Explanations for Visual Question Answering.” ArXiv Preprint ArXiv:2003.00431, 2020.
  2. Arrieta, Alejandro Barredo, et al. “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI.” Information Fusion, vol. 58, Elsevier, 2020, pp. 82–115.
  3. Chakraborti, Tathagata, et al. “The Emerging Landscape of Explainable AI Planning and Decision Making.” ArXiv Preprint ArXiv:2002.11697, 2020.
  4. Chen, Zhi, et al. “Concept Whitening for Interpretable Image Recognition.” Nature Machine Intelligence, vol. 2, no. 12, Nature Publishing Group, 2020, pp. 772–82.
  5. Ehsan, Upol, and Mark O. Riedl. “Human-Centered Explainable AI: Towards a Reflective Sociotechnical Approach.” International Conference on Human-Computer Interaction, Springer, 2020, pp. 449–66.
  6. Elton, Daniel C. “Self-Explaining AI as an Alternative to Interpretable AI.” International Conference on Artificial General Intelligence, Springer, 2020, pp. 95–106.
  7. Guidotti, Riccardo, et al. “Black Box Explanation by Learning Image Exemplars in the Latent Feature Space.” ArXiv Preprint ArXiv:2002.03746, 2020.
  9. Guo, Weisi. “Explainable Artificial Intelligence for 6G: Improving Trust between Human and Machine.” IEEE Communications Magazine, vol. 58, no. 6, IEEE, 2020, pp. 39–45.
  10. Islam, Sheikh Rabiul, et al. “Towards Quantification of Explainability in Explainable Artificial Intelligence Methods.” The Thirty-Third International FLAIRS Conference, 2020.
  11. Jung, Alexander, and Pedro H. J. Nardelli. “An Information-Theoretic Approach to Personalized Explainable Machine Learning.” IEEE Signal Processing Letters, vol. 27, IEEE, 2020, pp. 825–29.
  12. Kuźba, Michał, and Przemysław Biecek. “What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations.” Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2020, pp. 447–59.
  13. Lakkaraju, Himabindu, and Osbert Bastani. “‘How Do I Fool You?’: Manipulating User Trust via Misleading Black Box Explanations.” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020, pp. 79–85.
  14. Liu, Wenqian, et al. “Towards Visually Explaining Variational Autoencoders.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8642–51.
  15. Liu, Yi-Chieh, et al. “Interpretable Self-Attention Temporal Reasoning for Driving Behavior Understanding.” ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp. 2338–42.
  16. Madumal, Prashan, et al. “Distal Explanations for Explainable Reinforcement Learning Agents.” ArXiv Preprint ArXiv:2001.10284, 2020.
  17. Mai, Theresa, et al. “Keeping It ‘Organized and Logical’: After-Action Review for AI (AAR/AI).” Proceedings of the 25th International Conference on Intelligent User Interfaces, 2020, pp. 465–76.
  18. Patir, Rupam, et al. “Interpretability of Black Box Models Through Data-View Extraction and Shadow Model Creation.” International Conference on Neural Information Processing, Springer, 2020, pp. 378–85.
  19. Phillips, P. Jonathon, and Mark Przybocki. “Four Principles of Explainable AI as Applied to Biometrics and Facial Forensic Algorithms.” ArXiv Preprint ArXiv:2002.01014, 2020.
  20. Putnam, Vanessa. Toward XAI for Intelligent Tutoring Systems: A Case Study. University of British Columbia, 2020.
  21. Saralajew, Sascha, et al. “Fast Adversarial Robustness Certification of Nearest Prototype Classifiers for Arbitrary Seminorms.” NeurIPS, 2020.
  22. Schrills, Tim, and Thomas Franke. “How to Answer Why–Evaluating the Explanations of AI Through Mental Model Analysis.” ArXiv Preprint ArXiv:2002.02526, 2020.
  23. Schwalbe, Gesina, and Martin Schels. “A Survey on Methods for the Safety Assurance of Machine Learning Based Systems.” 10th European Congress on Embedded Real Time Software and Systems (ERTS 2020), 2020.
  24. Slack, Dylan, et al. “Fooling Lime and Shap: Adversarial Attacks on Post Hoc Explanation Methods.” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020, pp. 180–86.
  25. Sokol, Kacper, and Peter Flach. “Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches.” Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, pp. 56–67.
  26. Tuckey, David, et al. “A General Framework for Scientifically Inspired Explanations in AI.” ArXiv Preprint ArXiv:2003.00749, 2020.
  27. Visani, Giorgio, et al. “Statistical Stability Indices for LIME: Obtaining Reliable Explanations for Machine Learning Models.” Journal of the Operational Research Society, Taylor & Francis, 2020, pp. 1–11.
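Several 2020 entries above analyze LIME-style explanations, e.g., the stability indices of Visani et al. and the adversarial attacks of Slack et al. For readers new to the underlying idea, here is a minimal one-dimensional sketch of a LIME-style local surrogate in pure Python: sample perturbations around the instance, weight them by proximity, and fit a weighted linear model whose slope explains the black box locally. The black box `black_box`, the sampling parameters, and the kernel width are all illustrative assumptions.

```python
# Minimal 1-D sketch of a LIME-style local linear surrogate.
import random
from math import exp

def black_box(x):
    return x * x  # the model to be explained; approximately linear locally

def local_slope(f, x0, n_samples=500, sigma=0.5, kernel_width=0.3, seed=0):
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0.0, sigma) for _ in range(n_samples)]
    ys = [f(x) for x in xs]
    # Proximity weights: samples near x0 matter most.
    ws = [exp(-((x - x0) ** 2) / (2 * kernel_width ** 2)) for x in xs]
    # Weighted least squares for y ~ a + b*x (closed form for one feature).
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    b = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys)) \
        / sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    return b

# Around x0 = 1 the surrogate slope should approximate the derivative f'(1) = 2.
slope = local_slope(black_box, 1.0)
print(slope)
```

The stability concern raised by Visani et al. is visible even in this toy: rerunning with a different `seed` or `kernel_width` changes the fitted slope, which is why deterministic or statistically validated variants exist.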


## 2019

  1. Annasamy, Raghuram Mandyam, and Katia Sycara. “Towards Better Interpretability in Deep Q-Networks.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 4561–69.
  2. Artelt, André, and Barbara Hammer. “On the Computation of Counterfactual Explanations–A Survey.” ArXiv Preprint ArXiv:1911.07749, 2019.
  3. Atrey, Akanksha, et al. “Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning.” ArXiv Preprint ArXiv:1912.05743, 2019.
  4. Chapman-Rounds, Matt, et al. “EMAP: Explanation by Minimal Adversarial Perturbation.” ArXiv Preprint ArXiv:1912.00872, 2019.
  5. Cheng, Taoli. “Interpretability Study on Deep Learning for Jet Physics at the Large Hadron Collider.” ArXiv Preprint ArXiv:1911.01872, 2019.
  6. Cui, Wanxin, et al. “Face Recognition via Convolutional Neural Networks and Siamese Neural Networks.” 2019 International Conference on Intelligent Computing, Automation and Systems (ICICAS), IEEE, 2019, pp. 746–50.
  7. Du, Mengnan, et al. “Techniques for Interpretable Machine Learning.” Communications of the ACM, vol. 63, no. 1, ACM New York, NY, USA, 2019, pp. 68–77.
  8. Edmonds, Mark, et al. “A Tale of Two Explanations: Enhancing Human Trust by Explaining Robot Behavior.” Science Robotics, vol. 4, no. 37, Science Robotics, 2019.
  9. Fadnis, Kshitij, et al. “Heuristics for Interpretable Knowledge Graph Contextualization.” ArXiv Preprint ArXiv:1911.02085, 2019.
  10. Goyal, Yash, et al. “Counterfactual Visual Explanations.” International Conference on Machine Learning, PMLR, 2019, pp. 2376–84.
  11. Hossain, M. D. Zakir, et al. “A Comprehensive Survey of Deep Learning for Image Captioning.” ACM Computing Surveys (CSUR), vol. 51, no. 6, ACM New York, NY, USA, 2019, pp. 1–36.
  12. Juozapaitis, Zoe, et al. “Explainable Reinforcement Learning via Reward Decomposition.” IJCAI/ECAI Workshop on Explainable Artificial Intelligence, 2019.
  13. Keane, Mark T., and Eoin M. Kenny. “How Case-Based Reasoning Explains Neural Networks: A Theoretical Analysis of XAI Using Post-Hoc Explanation-by-Example from a Survey of ANN-CBR Twin-Systems.” International Conference on Case-Based Reasoning, Springer, 2019, pp. 155–71.
  14. Le, Thai, et al. “Why X Rather than Y? Explaining Neural Model’s Predictions by Generating Intervention Counterfactual Samples.” CoRR, 2019.
  15. Li, Xiao, et al. “A Formal Methods Approach to Interpretable Reinforcement Learning for Robotic Planning.” Science Robotics, vol. 4, no. 37, Science Robotics, 2019.
  16. Licato, John, et al. “Scenarios and Recommendations for Ethical Interpretive AI.” ArXiv Preprint ArXiv:1911.01917, 2019.
  17. Lucic, Ana, et al. “FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles.” ArXiv Preprint ArXiv:1911.12199, 2019.
  18. Mahajan, Divyat, et al. “Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers.” ArXiv Preprint ArXiv:1912.03277, 2019.
  19. Miller, Tim. “Explanation in Artificial Intelligence: Insights from the Social Sciences.” Artificial Intelligence, vol. 267, Elsevier, 2019, pp. 1–38.
  21. Mittelstadt, Brent, et al. “Explaining Explanations in AI.” Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 279–88.
  22. Morichetta, Andrea, et al. “EXPLAIN-IT: Towards Explainable AI for Unsupervised Network Traffic Analysis.” Proceedings of the 3rd ACM CoNEXT Workshop on Big DAta, Machine Learning and Artificial Intelligence for Data Communication Networks, 2019, pp. 22–28.
  23. Mundhenk, T. Nathan, et al. “Efficient Saliency Maps for Explainable AI.” ArXiv Preprint ArXiv:1911.11293, 2019.
  24. Odena, Augustus, et al. “Tensorfuzz: Debugging Neural Networks with Coverage-Guided Fuzzing.” International Conference on Machine Learning, PMLR, 2019, pp. 4901–11.
  25. Park, Young-Jin, and Han-Lim Choi. “InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-Modal Dynamics.” AIAA Scitech 2019 Forum, 2019, p. 0681.
  26. Ramon, Yanou, et al. “Counterfactual Explanation Algorithms for Behavioral and Textual Data.” ArXiv Preprint ArXiv:1912.01819, 2019.
  27. Schölkopf, Bernhard. “Causality for Machine Learning.” ArXiv Preprint ArXiv:1911.10500, 2019.
  28. Seo, Dasom, et al. “Regional Multi-Scale Approach for Visually Pleasing Explanations of Deep Neural Networks.” IEEE Access, vol. 8, IEEE, 2019, pp. 8572–82.
  29. Zafar, Muhammad Rehman, and Naimul Mefraz Khan. “DLIME: A Deterministic Local Interpretable Model-Agnostic Explanations Approach for Computer-Aided Diagnosis Systems.” ArXiv Preprint ArXiv:1906.10263, 2019.
  30. Zhang, Hao, et al. “Towards a Unified Evaluation of Explanation Methods without Ground Truth.” ArXiv Preprint ArXiv:1911.09017, 2019.
  31. Zhang, Xiao, et al. An Anomaly Contribution Explainer for Cyber-Security Applications. 2019.
  32. Zhang, Yundong, et al. “Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining.” 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2019, pp. 349–57.
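Many 2019 entries above study counterfactual explanations (Artelt and Hammer; Goyal et al.; Lucic et al.; Mahajan et al.). The core idea, finding the smallest input change that flips a model's decision, has a closed form for linear classifiers. The sketch below is a toy illustration of that special case; the weights, bias, and instance are assumptions, not taken from any cited method.

```python
# Minimal sketch of a counterfactual explanation for a linear classifier.
def predict(w, b, x):
    # Binary decision: class 1 if w.x + b > 0, else class 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def counterfactual(w, b, x, margin=1e-6):
    """Closed-form minimal-L2 counterfactual: project x onto the decision
    hyperplane w.x + b = 0, then step slightly past it."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    step = (score / norm_sq) * (1 + margin)
    return [xi - step * wi for xi, wi in zip(x, w)]

w, b = [1.0, -2.0], 0.5
x = [2.0, 0.5]                 # scores 1.5, so the model predicts class 1
x_cf = counterfactual(w, b, x)
print(predict(w, b, x), predict(w, b, x_cf))  # the decision flips
```

For nonlinear black boxes no such closed form exists, which is why the papers above resort to gradient-based or search-based optimization, often with added constraints (plausibility, sparsity, causal consistency).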


## 2018

  1. Abdul, Ashraf, et al. “Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda.” Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–18.
  2. Alonso, Jose M., et al. “A Bibliometric Analysis of the Explainable Artificial Intelligence Research Field.” International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Springer, 2018, pp. 3–15.
  3. Alvarez-Melis, David, and Tommi S. Jaakkola. “On the Robustness of Interpretability Methods.” ArXiv Preprint ArXiv:1806.08049, 2018.
  4. ---. “Towards Robust Interpretability with Self-Explaining Neural Networks.” ArXiv Preprint ArXiv:1806.07538, 2018.
  5. Besold, Tarek R., and Sara L. Uckelman. “The What, the Why, and the How of Artificial Explanations in Automated Decision-Making.” ArXiv Preprint ArXiv:1808.07074, 2018.
  6. Blandfort, Philipp, et al. “An Overview of Computational Approaches for Interpretation Analysis.” ArXiv Preprint ArXiv:1811.04028, 2018.
  7. Chang, Chun-Hao, et al. “Explaining Image Classifiers by Adaptive Dropout and Generative in-Filling.” ArXiv Preprint ArXiv:1807.08024, vol. 2, 2018.
  8. Charles, Adam S. “Interpreting Deep Learning: The Machine Learning Rorschach Test?” ArXiv Preprint ArXiv:1806.00148, 2018.
  9. Dhurandhar, Amit, et al. “Explanations Based on the Missing: Towards Contrastive Explanations with Pertinent Negatives.” ArXiv Preprint ArXiv:1802.07623, 2018.
  10. Došilović, Filip Karlo, et al. “Explainable Artificial Intelligence: A Survey.” 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), IEEE, 2018, pp. 0210–15.
  11. Elsayed, Gamaleldin F., et al. “Adversarial Reprogramming of Neural Networks.” ArXiv Preprint ArXiv:1806.11146, 2018.
  12. Ge, Xiaoyu, et al. “Towards Explainable Inference about Object Motion Using Qualitative Reasoning.” Sixteenth International Conference on Principles of Knowledge Representation and Reasoning, 2018.
  13. Gilpin, Leilani H., et al. “Explaining Explanations: An Overview of Interpretability of Machine Learning.” 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2018, pp. 80–89.
  14. ---. “Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning.” 2018.
  15. Grosz, Barbara J., and Peter Stone. “A Century-Long Commitment to Assessing Artificial Intelligence and Its Impact on Society.” Communications of the ACM, vol. 61, no. 12, ACM New York, NY, USA, 2018, pp. 68–73.
  16. Guidotti, Riccardo, et al. “A Survey of Methods for Explaining Black Box Models.” ACM Computing Surveys (CSUR), vol. 51, no. 5, ACM New York, NY, USA, 2018, pp. 1–42.
  17. Harbecke, David, et al. “Learning Explanations from Language Data.” ArXiv Preprint ArXiv:1808.04127, 2018.
  18. Hind, Michael, et al. “Increasing Trust in AI Services through Supplier’s Declarations of Conformity.” ArXiv Preprint ArXiv:1808.07261, vol. 18, 2018, pp. 2813–69.
  19. Hoffman, Robert R., et al. “Explaining Explanation for ‘Explainable Ai.’” Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 62, no. 1, SAGE Publications Sage CA: Los Angeles, CA, 2018, pp. 197–201.
  20. ---. “Metrics for Explainable AI: Challenges and Prospects.” ArXiv Preprint ArXiv:1812.04608, 2018.
  21. Hohman, Fred, et al. “Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers.” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 8, IEEE, 2018, pp. 2674–93.
  22. Honegger, Milo. “Shedding Light on Black Box Machine Learning Algorithms: Development of an Axiomatic Framework to Assess the Quality of Methods That Explain Individual Predictions.” ArXiv Preprint ArXiv:1808.05054, 2018.
  23. Hu, Linwei, et al. “Locally Interpretable Models and Effects Based on Supervised Partitioning (LIME-SUP).” ArXiv Preprint ArXiv:1806.00663, 2018.
  24. Hu, Ronghang, et al. “Explainable Neural Computation via Stack Neural Module Networks.” Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 53–69.
  25. Iyer, Rahul, et al. “Transparency and Explanation in Deep Reinforcement Learning Neural Networks.” Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018, pp. 144–50.
  26. Karim, Abdul, et al. “Machine Learning Interpretability: A Science Rather than a Tool.” ArXiv Preprint ArXiv:1807.06722, 2018.
  27. Kleinerman, Akiva, et al. “Providing Explanations for Recommendations in Reciprocal Environments.” Proceedings of the 12th ACM Conference on Recommender Systems, 2018, pp. 22–30.
  28. Lage, Isaac, et al. “Human-in-the-Loop Interpretability Prior.” Advances in Neural Information Processing Systems, vol. 31, NIH Public Access, 2018.
  29. Lipton, Zachary C. “The Mythos of Model Interpretability: In Machine Learning, the Concept of Interpretability Is Both Important and Slippery.” Queue, vol. 16, no. 3, ACM New York, NY, USA, 2018, pp. 31–57.
  30. Noothigattu, Ritesh, et al. “Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration.” ArXiv Preprint ArXiv:1809.08343, 2018.
  31. Ouarti, Nizar, and David Carmona. “Out of the Black Box: Properties of Deep Neural Networks and Their Applications.” ArXiv Preprint ArXiv:1808.04433, 2018.
  32. Petsiuk, Vitali, et al. “Rise: Randomized Input Sampling for Explanation of Black-Box Models.” ArXiv Preprint ArXiv:1806.07421, 2018.
  33. Ras, Gabriëlle, et al. “Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges.” Explainable and Interpretable Models in Computer Vision and Machine Learning, Springer, 2018, pp. 19–36.
  34. Takahashi, Ryo, et al. “Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder.” ArXiv Preprint ArXiv:1805.09547, 2018.
  35. Tomsett, Richard, et al. “Interpretable to Whom? A Role-Based Model for Analyzing Interpretable Machine Learning Systems.” ArXiv Preprint ArXiv:1806.07552, 2018.
  36. Vaughan, Joel, et al. “Explainable Neural Networks Based on Additive Index Models.” ArXiv Preprint ArXiv:1806.01933, 2018.
  37. Veličković, Petar, et al. “Deep Graph Infomax.” ArXiv Preprint ArXiv:1809.10341, 2018.
  38. Wagstaff, Kiri L., and Jake Lee. “Interpretable Discovery in Large Image Data Sets.” ArXiv Preprint ArXiv:1806.08340, 2018.
  39. Xiang, Weiming, et al. “Verification for Machine Learning, Autonomy, and Neural Networks Survey.” ArXiv Preprint ArXiv:1810.01989, 2018.
  40. Zhang, Quanshi, et al. “Interpreting CNN Knowledge via an Explanatory Graph.” Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  41. ---. “Examining CNN Representations with Respect to Dataset Bias.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
  42. ---. “Interpretable Convolutional Neural Networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8827–36.
  44. Zhang, Quanshi, and Song-Chun Zhu. “Visual Interpretability for Deep Learning: A Survey.” ArXiv Preprint ArXiv:1802.00614, 2018.
  45. Zhou, Bolei, et al. “Interpreting Deep Visual Representations via Network Dissection.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 9, IEEE, 2018, pp. 2131–45.
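Among the 2018 entries, Petsiuk et al.'s RISE estimates saliency by probing a black box with random binary masks and crediting each input element with the average output over the masks that keep it. Here is a minimal sketch of that sampling idea on a toy vector input; the model `model` and all parameters are illustrative assumptions, not the authors' implementation (which operates on images with smoothed masks).

```python
# Minimal sketch of RISE-style saliency via random binary masks.
import random

def model(x):
    # Toy black box: only elements 0 and 2 influence the output.
    return 3.0 * x[0] + 1.0 * x[2]

def rise_saliency(f, x, n_masks=2000, keep_prob=0.5, seed=0):
    rng = random.Random(seed)
    n = len(x)
    totals, counts = [0.0] * n, [0] * n
    for _ in range(n_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(n)]
        out = f([xi * mi for xi, mi in zip(x, mask)])
        # Credit each kept element with this mask's output.
        for i, mi in enumerate(mask):
            if mi:
                totals[i] += out
                counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

sal = rise_saliency(model, [1.0, 1.0, 1.0, 1.0])
# Elements the model actually uses should receive the highest scores.
print(sal)
```

Because the method only needs forward passes, it is model-agnostic in the same sense as LIME, trading gradient access for Monte Carlo sampling cost.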


## 2017

  1. Biran, O., and C. Cotton. “Explanation and Justification in Machine Learning: A Survey.” IJCAI, 2017.
  2. Chakraborti, Tathagata, et al. “AI Challenges in Human-Robot Cognitive Teaming.” ArXiv Preprint ArXiv:1707.04775, 2017.
  3. Goodman, Bryce, and Seth Flaxman. “European Union Regulations on Algorithmic Decision-Making and a ‘Right to Explanation.’” AI Magazine, vol. 38, no. 3, 2017, pp. 50–57.
  4. Lundberg, Scott M., and Su-In Lee. “A Unified Approach to Interpreting Model Predictions.” Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 4768–77.
  5. Nematzadeh, Aida, et al. “Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words.” CogSci, 2017.
  6. Riguzzi, Fabrizio, et al. “A Survey of Lifted Inference Approaches for Probabilistic Logic Programming under the Distribution Semantics.” International Journal of Approximate Reasoning, vol. 80, Elsevier, 2017, pp. 313–33.
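Lundberg and Lee's unified framework above grounds feature attribution in Shapley values from cooperative game theory. The sketch below computes exact Shapley attributions by brute force for a three-feature toy model; `f`, the instance, and the baseline are illustrative assumptions, and real SHAP implementations approximate this exponential-cost sum rather than enumerating all subsets.

```python
# Minimal brute-force Shapley-value attribution for a toy model.
from itertools import combinations
from math import factorial

def f(features):
    # Toy model: linear terms plus an interaction between features 0 and 1.
    return 2.0 * features[0] + 1.0 * features[1] + features[0] * features[1]

def shapley_values(f, x, background):
    """Exact Shapley values: weighted marginal contribution of each feature
    over all subsets, with absent features set to the background value."""
    n = len(x)
    def value(subset):
        # Features in `subset` take values from x; the rest from background.
        z = [x[i] if i in subset else background[i] for i in range(n)]
        return f(z)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for s in combinations(others, size):
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi

x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(f, x, background)
# Efficiency property: attributions sum to f(x) - f(background).
print(phi, sum(phi), f(x) - f(background))
```

Note how the interaction term is split equally between features 0 and 1, a consequence of Shapley symmetry, while the unused feature 2 receives zero credit.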


## 2013

  1. Ulianov, Serghei. “Intelligent Self-Organized Robust Control Design Based on Quantum/Soft Computing Technologies and Kansei Engineering.” Computer Science Journal of Moldova, vol. 62, no. 2, 2013, pp. 242–79.


## 2010

  1. Amershi, Saleema, et al. “Examining Multiple Potential Models in End-User Interactive Concept Learning.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010, pp. 1357–60.