LLM and the Automation of Data Analysis

LLMs are changing data analysis by automating routine tasks and improving efficiency. They make data insights more widely accessible, but still face challenges such as understanding business context. Their integration marks a major shift in business intelligence, with the potential to improve workflows and spur innovation.

Hello to the cherished HEARTCOUNT Global Community,

Greetings from Sidney Yang at HEARTCOUNT. Springing into the second quarter of 2024, I am excited to delve into the transformative impact of Large Language Models (LLMs) on data analytics. Together, let’s envision how these groundbreaking technologies are reshaping our workflows. May this year bring even brighter insights and successes to all your endeavors.

Self-Serve Analytics: Past, Present, and Future

Self-Serve Analytics refers to the suite of technologies that empower practitioners to use data with minimal effort. While the era from the early 21st century through 2022 explored the potential and limitations of Self-Serve Analytics through enhanced data literacy and more accessible tools, 2023 was a prolific year for envisioning the integration of LLMs in this domain.

HEARTCOUNT has contributed to this evolving landscape by introducing functions like TTS (Text-to-SQL) and interactive analysis (Dialogue), which translate natural language into SQL queries.

Looking ahead to 2024, we continue to explore the depths of data analysis automation with LLMs. We are probing both its far-reaching potential and its current limitations, while developing strategies and technologies to overcome those limitations.

At the core of these seismic shifts are industry giants and their flagship products: OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and Microsoft’s Fabric/Copilot. These companies, having secured substantial technological and financial capital, are unfolding ambitious visions for future advancements. At the same time, the emergence of open-source initiatives like Mistral, along with niche tech companies introducing cost-effective private LLMs, promises to diversify and enrich the ecosystem.

Self-Serve Analytics: The Limitations of LLMs

In the summer of 2023, Microsoft showcased its new data analysis platform, Fabric, in an impressive demo that promised a bright data-driven future, encouraging users to unearth deep insights through AI with only the simplest interactions.

Yet it cannot be overlooked that LLM-generated SQL queries sometimes missed the mark in interpreting users' questions. This highlights a crucial point: effective data query automation requires not only raw processing power but also a nuanced understanding of business terminology (declarative knowledge) and analytical methodologies (procedural knowledge).

For instance, accurately answering a question about the characteristics of users who converted in Vietnam last year requires knowing what ‘conversion’ means in that specific business context, whether sign-ups or subscription upgrades, and then applying the appropriate analytical method.
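To make the ambiguity concrete, here is a minimal Python sketch in which two readings of "conversion" produce two different queries. Everything in it is a hypothetical illustration rather than HEARTCOUNT's implementation: the `events` table, its columns, the `event_type` values, and the `%s` placeholder style are all assumptions, and "last year" is simply treated as a calendar year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One business-specific reading of a metric (hypothetical structure)."""
    name: str
    description: str
    sql_filter: str  # predicate over an assumed `events` table


# Two plausible readings of "conversion"; choosing one is a business decision.
CONVERSION_DEFINITIONS = {
    "signup": MetricDefinition(
        "conversion", "A visitor creates an account.", "event_type = 'signup'"
    ),
    "upgrade": MetricDefinition(
        "conversion",
        "A free user upgrades to a paid plan.",
        "event_type = 'subscription_upgrade'",
    ),
}


def build_query(definition: MetricDefinition, country: str, year: int):
    """Return a parameterised SQL string and its parameters.

    The year is assumed to be a calendar year (1 January to 31 December).
    """
    sql = (
        "SELECT user_id FROM events "
        f"WHERE {definition.sql_filter} "
        "AND country = %s AND event_date >= %s AND event_date < %s"
    )
    params = (country, f"{year}-01-01", f"{year + 1}-01-01")
    return sql, params


signup_sql, signup_params = build_query(CONVERSION_DEFINITIONS["signup"], "VN", 2023)
upgrade_sql, upgrade_params = build_query(CONVERSION_DEFINITIONS["upgrade"], "VN", 2023)
```

The same natural-language question yields two different sets of users depending on which definition is chosen, which is why the definition has to come from the business rather than from the model.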

Correctly interpreting a term such as "last year" (calendar year or fiscal year?) typically depends on an organization’s internal knowledge base. One promising way to improve LLM accuracy in such cases is Retrieval-Augmented Generation (RAG), which retrieves reliable internal documents, such as KPI definitions, at query time and supplies them to the model as context. This grounds its responses in a cost-effective manner, without retraining the model.
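The retrieval step of RAG can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the glossary entries are invented, retrieval is a naive keyword match (production systems typically use embedding-based search), and the resulting prompt would still need to be sent to an LLM, which is not shown here.

```python
import re

# Hypothetical internal glossary. The definitions are illustrative assumptions.
GLOSSARY = {
    "conversion": "A conversion is an upgrade from a free plan to a paid subscription.",
    "last year": "Unless stated otherwise, 'last year' means the previous fiscal year (April to March).",
    "active user": "An active user logged in at least once in the past 30 days.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase words with a crude plural normalisation (trailing 's' stripped)."""
    return {word.rstrip("s") for word in re.findall(r"[a-z]+", text.lower())}


def retrieve(question: str, glossary: dict[str, str], top_k: int = 2) -> list[str]:
    """Return definitions whose term appears, word for word, in the question."""
    question_tokens = tokenize(question)
    matches = []
    for term, definition in glossary.items():
        term_tokens = tokenize(term)
        if term_tokens <= question_tokens:  # every word of the term is present
            matches.append((len(term_tokens), term, definition))
    # Prefer longer (more specific) terms, then alphabetical order for stability.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [definition for _, _, definition in matches[:top_k]]


def build_prompt(question: str, glossary: dict[str, str]) -> str:
    """Prepend the retrieved definitions to the user's question."""
    lines = ["Use these internal definitions when writing SQL:"]
    lines += [f"- {definition}" for definition in retrieve(question, glossary)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


question = "What were the characteristics of users after conversions in Vietnam last year?"
prompt = build_prompt(question, GLOSSARY)
```

Because the definitions live in a document store rather than in the model's weights, updating a KPI definition only requires editing the glossary.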

Easing Your Data Workload

The challenge in transforming data into actionable insights isn't merely due to a scarcity of accurate SQL statements or the complexity of interpreting statistical data. Advancements in data technology, while significant, do not inherently increase the volume of information within the data itself.

Nevertheless, reducing barriers to data access and enhancing the productivity of knowledge workers in generating insights are crucial. This technological evolution in data analytics, powered by LLMs and AI, stands to offer substantial value, especially to those who have previously faced significant obstacles in harnessing the power of data due to time constraints or technical complexities.

My own role encompasses a myriad of tasks, and data-driven decision-making alone involves numerous sub-tasks. While it's improbable that AI will master all aspects of data analysis anytime soon, the productivity gains offered by intelligent tools that handle automatable analysis tasks are undeniable.

In 2024, HEARTCOUNT is committed to continuing as your dedicated and efficient assistant in data analysis, not replacing but augmenting your efforts to navigate the complex data landscape.

Warm regards,

Sidney Yang

HEARTCOUNT is a visualization and analysis tool for all practitioners.