Generating natural language model insights for data charts using light language models distilled from large language models.
U.S. Patent No. 18/338,033. Washington, DC: U.S. Patent and Trademark Office, 2024
Recommended citation: Victor S. Bursztyn, Wei Zhang, Prithvi Bhutani, Eunyee Koh, and Abhisek Trivedi. 2024. Generating natural language model insights for data charts using light language models distilled from large language models. U.S. Patent No. 18/338,033. Washington, DC: U.S. Patent and Trademark Office. https://patents.google.com/patent/US20240320421A1/en
The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating naturally phrased insights about data charts using light language models distilled from large language models. In some embodiments, to synthesize training data for the light language model, the disclosed systems use insight templates to prompt a large language model to generate naturally phrased insights. In some embodiments, the disclosed systems anonymize and augment the synthesized training data to improve the accuracy and robustness of model predictions. For example, the disclosed systems anonymize training data by injecting noise into data charts before prompting the large language model to generate naturally phrased insights from the insight templates. In some embodiments, the disclosed systems further augment the (anonymized) training data by splitting, or partitioning, data charts into folds that serve as individual data charts.
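The training-data synthesis steps summarized above (template filling, noise-based anonymization, fold splitting, prompt construction) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The `DataChart` type, the single "maximum value" template, multiplicative Gaussian noise as the anonymization scheme, contiguous folds, and the prompt wording are all hypothetical choices. The actual large language model call is omitted because the abstract does not name a model or API.

```python
import random
from dataclasses import dataclass


@dataclass
class DataChart:
    """Hypothetical minimal chart: one series of labeled numeric points."""
    title: str
    labels: list[str]
    values: list[float]


def anonymize(chart: DataChart, noise: float, seed: int = 0) -> DataChart:
    """Assumed scheme: scale each value by (1 + N(0, noise)) to hide real data."""
    rng = random.Random(seed)
    noisy = [v * (1.0 + rng.gauss(0.0, noise)) for v in chart.values]
    return DataChart(chart.title, list(chart.labels), noisy)


def split_into_folds(chart: DataChart, k: int) -> list[DataChart]:
    """Partition the points into k contiguous, non-empty folds.

    Each fold is returned as a standalone chart, so one source chart
    yields k additional training examples.
    """
    n = len(chart.values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    bounds = [i * n // k for i in range(k + 1)]  # integer cut points
    return [
        DataChart(f"{chart.title} (part {i + 1}/{k})",
                  chart.labels[lo:hi], chart.values[lo:hi])
        for i, (lo, hi) in enumerate(zip(bounds, bounds[1:]))
    ]


# Hypothetical insight template; its slots are filled from the chart's data.
MAX_TEMPLATE = "In {title}, {label} has the highest value ({value:.1f})."


def fill_max_template(chart: DataChart) -> str:
    """Fill the 'maximum' template with the chart's largest point."""
    i = max(range(len(chart.values)), key=chart.values.__getitem__)
    return MAX_TEMPLATE.format(title=chart.title, label=chart.labels[i],
                               value=chart.values[i])


def build_prompt(chart: DataChart) -> str:
    """Build a prompt asking an LLM to rephrase the templated insight.

    Sending the prompt to a model is out of scope for this sketch.
    """
    points = ", ".join(f"{lab}={v:.1f}"
                       for lab, v in zip(chart.labels, chart.values))
    return (f"Chart data: {points}\n"
            f"Templated insight: {fill_max_template(chart)}\n"
            "Rewrite the insight as one natural-sounding sentence.")


sales = DataChart("Monthly sales", ["Jan", "Feb", "Mar", "Apr"],
                  [10.0, 42.0, 17.0, 30.0])
print(fill_max_template(sales))
# prints "In Monthly sales, Feb has the highest value (42.0)."
print([f.labels for f in split_into_folds(sales, 2)])
# prints [['Jan', 'Feb'], ['Mar', 'Apr']]
print(build_prompt(anonymize(sales, noise=0.1, seed=7)))
```

In this sketch, anonymizing before building the prompt means the model only sees perturbed values, and each fold can be passed through `fill_max_template` and `build_prompt` on its own, which mirrors the idea of folds acting as individual data charts.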
