The Counterfactual Visual Question Answering (VQA) Dataset


The Counterfactual VQA Dataset aims to enable the study of effective ways to produce counterfactual explanations: contrasting a VQA model’s answers to the original VQA examples with its answers to counterfactual examples. The dataset provides 484 GAN-edited VQA v2.0 images for studying the effectiveness of counterfactual examples. For each image, a human annotator looks at the original image and a natural language question about it from the Visual Question Answering (VQA) dataset, then edits the image such that consistently answering the question on both the original and edited images is challenging.

Intended Use

For XAI researchers, the Counterfactual VQA Dataset provides a diverse set of counterfactual examples for studying how to generate counterfactual examples for counterfactual explanations: which types of edits (e.g. color change, cropping, object removal) are more effective, how explanation effectiveness can be improved (e.g. by removing salient objects, or removing small but highly important regions), and how human annotators perform counterfactual edits.


The Counterfactual VQA Dataset provides 484 GAN-edited VQA v2.0 images. There are four types of image edits: 1) inpainting a box region, 2) inpainting the background except for a boxed foreground region, 3) converting the image to black-and-white, and 4) zooming into a part of the original image.
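As a rough sketch, the non-GAN edit types (black-and-white conversion and zooming) can be expressed with Pillow as below. The function names are illustrative and not part of any dataset tooling, and the inpainting edits are only stubbed, since they require a GAN inpainter:

```python
from PIL import Image

def to_black_and_white(img: Image.Image) -> Image.Image:
    # Edit type 3: convert the image to grayscale (kept in RGB mode
    # so downstream VQA code sees the usual three channels).
    return img.convert("L").convert("RGB")

def zoom(img: Image.Image, box: tuple) -> Image.Image:
    # Edit type 4: crop to `box` (left, upper, right, lower)
    # and resize back to the original image size.
    return img.crop(box).resize(img.size)

def inpaint_box(img: Image.Image, box: tuple) -> Image.Image:
    # Edit types 1 and 2 require a GAN inpainter (e.g. DeepFillv2);
    # this stub only marks where a real inpainter would be invoked.
    raise NotImplementedError("run a GAN inpainter on the box region")

original = Image.new("RGB", (640, 480), (200, 50, 50))
edited = zoom(original, (100, 100, 300, 300))
assert edited.size == original.size
```

The zoom edit resizes the crop back to the original dimensions so that original and edited images can be fed to the same VQA model without changing its input pipeline.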

For inpainting we used a modified DeepFillv2 inpainter; the code is available at this GitHub repo.


Human answers to counterfactual questions are not collected, because how such counterfactual examples should be produced in the first place is still under active research. As a result, the Counterfactual VQA Dataset cannot be used for consistency training of VQA models.
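Although consistency training is out of scope, the intended analysis — checking whether a VQA model answers consistently on original and edited images — can be sketched as follows. The `model` interface here is hypothetical, standing in for any callable that maps an image and question to an answer string:

```python
def consistency_rate(model, triples):
    """Fraction of triples where the model's answer is unchanged
    by the counterfactual edit.

    `model(image, question)` is a hypothetical callable returning an
    answer string; `triples` yields (question, original, edited) items.
    """
    same = total = 0
    for question, original_img, edited_img in triples:
        total += 1
        if model(original_img, question) == model(edited_img, question):
            same += 1
    return same / total if total else 0.0

# Toy stand-in model that answers from a precomputed tag on the image.
toy_model = lambda image, question: image["answer"]
triples = [
    ("What color is the bus?", {"answer": "red"}, {"answer": "gray"}),
    ("Is it sunny?", {"answer": "yes"}, {"answer": "yes"}),
]
print(consistency_rate(toy_model, triples))  # 0.5
```

A lower consistency rate suggests the edits succeeded at producing challenging counterfactuals for that model.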


If you use the Counterfactual VQA Dataset as part of published research, please cite the following paper:

@article{Alipour_2021,
	doi = {10.22541/au.162464875.59047443/v1},
	url = {},
	year = 2021,
	month = {jun},
	publisher = {Authorea},
	author = {Kamran Alipour and Arijit Ray and Xiao Lin and Michael Cogswell and Jurgen Schulze and Yi Yao and Giedrius Burachas},
	title = {Improving Users{\textquotesingle} Mental Model with Attention-directed Counterfactual Edits}
}