Representing Charts as Text for Language Models: An In-Depth Study of Question Answering for Bar Charts.
Published in IEEE VIS, 2024
Recommended citation: Victor S. Bursztyn, Jane Hoffswell, Eunyee Koh, and Shunan Guo, "Representing Charts as Text for Language Models: An In-Depth Study of Question Answering for Bar Charts," 2024 IEEE Visualization and Visual Analytics (VIS), St. Pete Beach, FL, USA, 2024, pp. 266-270, doi: 10.1109/VIS55277.2024.00061. https://ieeexplore.ieee.org/abstract/document/10771151
Machine learning (ML) models for chart-grounded Q&A (CQA) often treat charts as images, but performing CQA on pixel values has proven challenging. We therefore investigate a resource overlooked by current ML-based approaches: the declarative documents describing how charts should visually encode data (i.e., chart specifications). In this work, we use chart specifications to enhance language models (LMs) for chart-reading tasks, such that the resulting system can robustly understand language for CQA. Through a case study with 359 bar charts, we test novel fine-tuning schemes on both GPT-3 and T5 using a new dataset curated for two CQA tasks: question answering and visual explanation generation. Our text-only approaches strongly outperform vision-based GPT-4 on explanation generation (99% vs. 63% accuracy) and show promising results for question answering (57–67% accuracy). Through in-depth experiments, we also show that our text-only approaches are mostly robust to natural language variation.
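To make the "charts as text" idea concrete, the sketch below shows how a declarative bar-chart specification (here in a Vega-Lite-like JSON style) can be serialized into a plain-text prompt for a language model. This is a hypothetical illustration of the general approach, not the paper's actual dataset format or prompting pipeline; the spec fields and the `spec_to_prompt` helper are assumptions for demonstration.

```python
import json

# Hypothetical minimal bar-chart specification in a Vega-Lite-like
# declarative style (illustrative only; not the paper's actual format).
chart_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "product", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative"},
    },
    "data": {"values": [
        {"product": "A", "sales": 28},
        {"product": "B", "sales": 55},
        {"product": "C", "sales": 43},
    ]},
}

def spec_to_prompt(spec: dict, question: str) -> str:
    """Serialize a chart specification as text and append a CQA question,
    so a text-only LM can answer from the specification alone."""
    return (
        "Chart specification (JSON):\n"
        + json.dumps(spec, indent=2)
        + f"\nQuestion: {question}\nAnswer:"
    )

prompt = spec_to_prompt(chart_spec, "Which product has the highest sales?")
print(prompt)
```

Because the specification carries both the data values and the visual encoding, a prompt like this gives a text-only model everything needed to answer lookup questions or explain what the chart encodes, without any pixel-level reasoning.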
