Generative AI (LLMs) for Data Visualization | A Review of Tools and Research
Issue #21 | An in-depth analysis and guide on emerging tools/practices that apply large language models (LLMs) like ChatGPT/GPT4 to automate data visualization
Creating high-quality visualizations is crucial for conveying insights from data, a critical skill for any business. The significance of this skill is underscored by the widespread use of tools such as Tableau, PowerBI, Excel, Looker, and Sigma, which help business analysts create visualizations.
However, crafting effective visualizations from business data requires considerable skill and effort. Users must excel at asking pertinent questions, choosing the right chart type (e.g., bar chart, pie chart, radar chart), performing data transformations (such as merging columns), and implementing the final visualization (for instance, using a tool's interface or writing code). Consequently, many users struggle to create visualizations, and experts often invest substantial time in this task.
With the advent of Generative AI, there is potential to overcome these challenges. Specifically, large language models such as GPT-4, Gemini, Claude, Cohere, and LLAMA, as well as large multimodal models (GPT4V, LLAVA), can offer assistance in this area. There are many business benefits, from enabling customers to explore their data easily and rapidly, to improving the productivity of data analysts, to simplifying dashboard or automated report generation.
How? LLMs can address multiple subtasks that lead up to visualization generation: they can generate data summaries, propose questions or goals relevant to the data, and generate visualization specifications as text or code.
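As a rough illustration of the first two subtasks, the sketch below builds a compact data summary and wraps it in a goal-generation prompt. The function names (`summarize`, `build_goal_prompt`), the summary fields, and the prompt wording are all assumptions made for this example, not the API of any tool discussed here. The resulting prompt would be sent to whichever LLM you use.

```python
import json


def summarize(rows, max_samples=3):
    """Build a compact per-column summary of a list-of-dicts dataset."""
    # Gather the values of each column in order of appearance.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    summary = []
    for name, values in columns.items():
        # Treat a column as numeric only if every value is an int or float.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        distinct = list(dict.fromkeys(values))  # de-duplicate, keep order
        field = {
            "column": name,
            "dtype": "number" if numeric else "category",
            "num_unique": len(distinct),
            "samples": distinct[:max_samples],
        }
        if numeric:
            field["min"], field["max"] = min(values), max(values)
        summary.append(field)
    return summary


def build_goal_prompt(summary, n_goals=3):
    """Wrap the summary in a (hypothetical) goal-generation prompt."""
    return (
        f"Here is a summary of a dataset:\n{json.dumps(summary, indent=2)}\n"
        f"Propose {n_goals} questions a chart could answer. "
        "For each, name a chart type and the columns to use."
    )


# Toy data, invented for this example.
rows = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 95},
    {"region": "North", "sales": 143},
]
summary = summarize(rows)
prompt = build_goal_prompt(summary)
```

Sending a summary rather than the raw rows keeps the prompt small and avoids exposing the full dataset to the model, which is one plausible reason to treat summarization as its own subtask.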
Given these capabilities, there is growing interest from product teams that offer some form of data processing stack (e.g., notebook experiences, low-code data processing pipelines, machine learning or modeling tools) to innovate by integrating automated visualization capabilities.
Importantly, as the capabilities of LLMs and LMMs grow, there is an opportunity to completely reinvent (disrupt) the visualization authoring experience. This will likely materialize in brand new product categories, built from the ground up to support natural language to visualization workflows.
Note: Insights in this article are based both on observation of industry trends and on my research on machine learning for automated data visualization [1,3].
This post seeks to provide insights on how to accomplish these goals and covers the following topics:
An overview of approaches and tradeoffs related to visualization generation using LLMs. This includes the target user group and reliability/productivity tradeoffs.
A taxonomy of visualization-related tasks that LLM-based tools for data visualization can effectively address, including data summarization, goal generation, visualization generation, explanation, and verification.
A compilation of current tools and research on generative AI + data visualization. Each tool is evaluated along dimensions such as AI-enabled data visualization features, availability, and pricing (OSS, Free, Paid). Tools covered include LIDA, Noteable, and Akkio, with over 20 other tools mentioned.
Open issues with automated visualization with LLMs, including trust and oversight (correctness verification), reliability, latency, and cost.
A reference section on research papers that explore LLMs for data visualization.
Opportunities: A list of potential directions that could truly disrupt visualization generation and storytelling in general.
Automating Visualizations - What and Why
Automated data visualization tools enable end users to generate data visualizations, such as bar charts and line charts, from their data with minimal user input. This general area is not novel and has been researched extensively in the past. Academic projects like Draco, Voyager, Lyra, and Data2Vis have explored methods for automatically recommending visualizations to users, and some tools like Excel have features like “Analyze Data” that recommend visualizations. However, many of these approaches are heuristic-based and have limited coverage for many data visualization subtasks (more on automated visualization tasks below).
Large Language Models (LLMs) are neural network models trained on vast amounts of data that then leverage the learned representations of this data to solve tasks. For example, ChatGPT has state-of-the-art performance on tasks such as text summarization. Large Multimodal Models (LMMs) take the same approach but learn representations across multiple modalities, e.g., GPT4V, which can take both image and text input.
LLMs and LMMs provide a step-function increase in the gamut of tasks that can be addressed, e.g., generating relevant goals or hypotheses given the data. With these models, two types of support become possible:
1. Fully autonomous support, where the tool independently interprets the data, proposes data visualization objectives, and creates visualizations without user intervention.
2. Semi-autonomous support, where users can articulate their intent, and the tool generates visualizations accordingly. Intent may be conveyed through natural language or a hybrid interface that integrates direct manipulation (such as selection, drag-and-drop, clicking, etc.) with natural language input.
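The difference between the two modes can be sketched as a single planning function that either asks the model for a goal or uses the one the user supplied. Everything here is hypothetical: `plan_visualization`, the prompt wording, and `fake_llm` (a stand-in that returns canned strings instead of calling a real model) are assumptions for illustration only.

```python
from typing import Callable, Optional


def plan_visualization(
    summary: str,
    llm: Callable[[str], str],
    intent: Optional[str] = None,
) -> dict:
    """Pick a goal (model-proposed or user-supplied), then request a chart spec."""
    if intent is None:
        # Fully autonomous: the tool proposes the visualization goal itself.
        mode = "autonomous"
        goal = llm(
            f"Given this data summary, propose one visualization goal:\n{summary}"
        )
    else:
        # Semi-autonomous: the user's stated intent is the goal.
        mode = "semi-autonomous"
        goal = intent
    spec = llm(
        f"Data summary:\n{summary}\nGoal: {goal}\nReturn a chart specification."
    )
    return {"mode": mode, "goal": goal, "spec": spec}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned text for the demo."""
    if "propose one visualization goal" in prompt:
        return "Compare total sales by region"
    return "bar chart: x=region, y=sum(sales)"


data_summary = "columns: region (category), sales (number)"
auto = plan_visualization(data_summary, fake_llm)
guided = plan_visualization(data_summary, fake_llm, intent="Show sales over time")
```

In a hybrid interface, the `intent` string could be assembled from direct-manipulation events (for example, the columns a user selected) combined with their natural language input.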
What benefits can be obtained?
By automating visualizations, businesses benefit from increased efficiency (time and cost savings) and consistency as the process enables rapid generation of insights and maintains high-quality standards. It also makes data analysis more accessible to non-experts, fostering a data-driven culture throughout the organization. Cost-effectiveness is improved by reducing the reliance on specialized personnel. Automation facilitates real-time data representation, ensuring that decision-makers have the most current information at their disposal. Additionally, automated tools can uncover novel patterns and correlations, enriching data exploration and allowing teams to concentrate on analysis rather than the creation of visualizations.