INTERCHART is a diagnostic benchmark designed to evaluate how well Vision-Language Models (VLMs) reason across multiple related charts—a task central to real-world analytical domains such as scientific reporting, financial analysis, and public health dashboards.
The benchmark organizes reasoning difficulty into three subsets:
- DECAF – Decomposed Elementary Charts with Answerable Facts (factual and comparative reasoning)
- SPECTRA – Synthetic Plots for Event-based Correlated Trend Reasoning and Analysis (trend and correlation reasoning)
- STORM – Sequential Temporal Reasoning Over Real-world Multi-domain Charts (semantic abstraction and temporal synthesis)
Each subset probes a distinct reasoning capability under increasing visual and semantic complexity.
- Paper: INTERCHART: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information
- Dataset & Scripts: Hugging Face – coral-lab-asu/InterChart
- Project Website: https://coral-lab-asu.github.io/interchart/
INTERCHART employs multiple LLM-based semantic judges (Gemini, Phi, Qwen) to assess answer correctness through majority voting. This enables flexible evaluation of paraphrases, numeric ranges, and unit variations that simple string matching would miss.
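The majority-voting step described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual evaluation code: the per-judge verdicts would in practice come from API calls to the Gemini, Phi, and Qwen judges, which are elided here.

```python
from collections import Counter

def majority_vote(verdicts):
    """Aggregate per-judge correctness verdicts.

    An answer counts as correct only when a strict majority of the
    semantic judges label it 'correct'.
    """
    counts = Counter(verdicts)
    return "correct" if counts["correct"] > len(verdicts) / 2 else "incorrect"

# Hypothetical verdicts from three semantic judges on one model answer:
print(majority_vote(["correct", "correct", "incorrect"]))  # -> correct
print(majority_vote(["correct", "incorrect", "incorrect"]))  # -> incorrect
```

With three judges, a single lenient judge cannot flip a verdict on its own, which makes the semantic evaluation more robust than relying on any one model.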
If you use INTERCHART or its code, please cite:
@inproceedings{iyengar2025interchart,
  title={INTERCHART: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information},
  author={Iyengar, Anirudh Iyengar Kaniyar Narayana and Mukhopadhyay, Srija and Qidwai, Adnan and Singh, Shubhankar and Roth, Dan and Gupta, Vivek},
  booktitle={Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics},
  pages={2046--2067},
  year={2025}
}
