SISPI Benchmark

About SISPI

SISPI (Social Inclusive Synthetic Professionals Images) is a synthetic benchmark for systematically measuring social bias in text-image retrieval models. It comprises 49,664 generated images spanning 194 professional roles and is balanced across gender and ethnicity groups.
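To make "measuring social bias in retrieval" concrete, here is a minimal sketch of one possible check on a demographically balanced pool: take the demographic labels of the top-k images retrieved for a neutral query and see how far each group's share is from parity. This is an illustrative measure only, not necessarily the metric used by the SISPI evaluation code; the function names and the example ranking are hypothetical.

```python
from collections import Counter

def group_shares(retrieved_groups, all_groups):
    """Fraction of the retrieved items that belongs to each group."""
    k = len(retrieved_groups)
    if k == 0:
        raise ValueError("retrieved_groups must not be empty")
    counts = Counter(retrieved_groups)
    return {group: counts.get(group, 0) / k for group in all_groups}

def max_deviation_from_parity(retrieved_groups, all_groups):
    """Largest absolute gap between a group's share and the uniform share.

    Returns 0.0 when every group is equally represented in the results.
    """
    shares = group_shares(retrieved_groups, all_groups)
    parity = 1 / len(all_groups)
    return max(abs(share - parity) for share in shares.values())

# Hypothetical top-10 result for a neutral query such as "a photo of a doctor".
GENDERS = ["Male", "Female"]
top10 = ["Male"] * 8 + ["Female"] * 2

shares = group_shares(top10, GENDERS)
deviation = max_deviation_from_parity(top10, GENDERS)
print(shares, round(deviation, 3))
```

Because the pool itself is balanced, a deviation well above zero on a neutral query points at the retrieval model rather than at the data.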

Dataset Details

49,664 Images

Covering 194 professions with 2 gender and 4 ethnicity variants.
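The figures above are consistent with an even split. The short check below assumes every profession has the same number of images and every gender-ethnicity combination within a profession is equally sized; the page states the totals but not the per-group count, so the derived numbers are an inference rather than a documented fact.

```python
# Figures taken from the dataset description.
TOTAL_IMAGES = 49_664
NUM_PROFESSIONS = 194
NUM_GENDERS = 2
NUM_ETHNICITIES = 4

# Each profession is rendered for every gender x ethnicity combination.
groups_per_profession = NUM_GENDERS * NUM_ETHNICITIES

# divmod returns (quotient, remainder); a zero remainder means an even split is possible.
images_per_profession, profession_remainder = divmod(TOTAL_IMAGES, NUM_PROFESSIONS)
images_per_group, group_remainder = divmod(images_per_profession, groups_per_profession)

print(groups_per_profession, images_per_profession, images_per_group)
```

Under that assumption the totals divide with no remainder: 8 demographic groups per profession, 256 images per profession, and 32 images per group.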

Stable Diffusion XL

Generated with controlled seeds to minimize artifacts in non-protected attributes.

Open Access

Dataset and evaluation framework freely available for research purposes.

Link to dataset: https://huggingface.co/datasets/lluisgomez/SISPI
Link to eval code: https://github.com/sispi-benchmark/sispi-eval

Ethical Considerations

In this study, we have adopted broad ethnic categories – "Asian," "White," "Black," and "Latinx" – and gender categories of "Male" and "Female." While these are common in demographic research for their simplicity, they inherently oversimplify complex identities.

Ethnic Categorization: Each of these categories encompasses diverse cultures and histories, and a term like "Asian" flattens the rich diversity within the group it names. How the categories are perceived and defined also varies across regions.

Gender Categorization: The binary gender categories used here do not capture all gender identities. We acknowledge and respect non-binary and transgender identities.

Cultural Sensitivity and Inclusivity: We approach these classifications with sensitivity and acknowledge their limitations. Individuals' self-identification may be more nuanced, and we are open to feedback for improving our practices.

We made efforts to adhere to ethical practices in dataset generation by using the same initial seeds across demographic groups, so that non-protected attributes are distributed roughly equally between groups. This approach aims to minimize demographic artifacts and dataset biases. However, we acknowledge that the text-to-image generation model may still introduce unintended biases that we were unable to detect through manual inspection.
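The seed-sharing idea described above can be sketched as a job list in which every seed is paired with every demographic combination, so that only the protected attributes in the prompt change between matched images. The prompt template, the example profession, and the seed range are assumptions made for illustration; the actual prompts and seeds used to build SISPI are not given here, and the sketch stops short of calling any image generation model.

```python
from itertools import product

GENDERS = ["Male", "Female"]
ETHNICITIES = ["Asian", "White", "Black", "Latinx"]

# Hypothetical prompt template; the real SISPI prompts may differ.
TEMPLATE = "A photo of a {ethnicity} {gender} {profession}"

def build_generation_jobs(profession, seeds, template=TEMPLATE):
    """Pair every seed with every ethnicity x gender combination."""
    jobs = []
    for seed in seeds:
        for ethnicity, gender in product(ETHNICITIES, GENDERS):
            jobs.append({
                "seed": seed,  # shared by all 8 groups for this seed
                "ethnicity": ethnicity,
                "gender": gender,
                "prompt": template.format(
                    ethnicity=ethnicity,
                    gender=gender.lower(),
                    profession=profession,
                ),
            })
    return jobs

# Illustrative call: "nurse" and 32 seeds are assumed values.
jobs = build_generation_jobs("nurse", seeds=range(32))
print(len(jobs), jobs[0]["prompt"])
```

Each job would then be passed to the generator with its seed fixed, which keeps the initial noise identical across the eight variants of a given seed.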

Citation

@inproceedings{gomez2025sispi,
  title={Measuring Text-Image Retrieval Fairness with Synthetic Data},
  author={Lluis Gomez i Bigorda and co-authors},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2025},
  month={July 13--18},
  address={Padua, Italy}
}