How to use ChatGPT for Data Analysis (I)

Launched in November 2022 by OpenAI, ChatGPT quickly made its mark in the “human” world, turning terms like “AI”[1], “Machine Learning”[2], and “Language Model”[3] into almost common parlance. In a nutshell, ChatGPT is a complex artificial intelligence model (trained on a virtually limitless corpus of texts and fine-tuned with a combined strategy of reinforcement and supervised learning), developed for generating textual content through conversational interactions akin to a “chatbot”. That is, requests are made in natural language, and the responses received are evaluated.

But how can ChatGPT be used in the field of Data Analysis to speed up processes and suggest approaches? In this column, we will introduce some possible uses of ChatGPT in data analysis: from the generation of synthetic datasets to enriched exploratory analysis (with graphs and statistical models), and finally, to the necessary accompanying documentation.

ChatGPT Prompting Strategies

Task 1: Code Generation

Prompt Data Import/Export

Task 2: Code Generation

Example: Request to Implement a Feature

Prompt Composition Strategies:

  • Specify the programming language (e.g., Python).
  • Indicate the file format (CSV/JSON…) and the data structure, if you have a dataset.
  • Clearly detail the feature to be implemented.
  • Provide examples of the properties and/or the desired outcome’s format.
Prompt Request to Implement a Feature

Task 3: Synthetic Data Creation

Example: Creating a sample dataset with specific characteristics

Prompt Composition Strategies:

  • Specify the application domain.
  • Indicate the desired format and structure of the data (for tabular structure, specify: number of columns, headers, data types, value ranges… and define what they represent).
  • Ask ChatGPT to provide a series of questions to better characterize the context.
  • Provide data examples or describe the data layout [optional but recommended if the structure is complex].


Suppose you are working on developing a demonstrative project, a use case to propose to potential clients, but you do not have an appropriate dataset
. If you are interested in generating a modest number of values, mostly of a categorical (non-numerical) nature that also present a fair variability, you can certainly rely on ChatGPT for the creation of a synthetic dataset. In the example below, the application domain will be exposed, then ChatGPT will be asked to formulate a series of questions deemed useful to better understand our request, task, and finally, the dataset synthesis.

The context and task are defined using the Ask Before Answer technique. The interaction begins by outlining the application context and continues with the indication of the task to be completed, followed by an explicit request to provide appropriate questions to increase the accuracy of the response, and thus guide the user in specifying all the relevant details. The following are some of the questions posed by ChatGPT, which, as you can guess, are at the heart of this prompting strategy, capable of guiding even less experienced users in the complete formulation of the request before it is taken over by the model.

Prompt Creating a sample dataset
ChatGPT Answer
Generated text saved as Excel
Data Visualization

Conclusions

We’ve discovered how ChatGPT can revolutionize Data Analysis, from creating synthetic datasets to exploratory analysis and documentation. We’ve highlighted the importance of prompting strategies to achieve accurate results and how ChatGPT facilitates and enriches the work of analysts.

Don’t miss the next episode, where we will explore further useful applications of ChatGPT in Data Analysis.

Read all our articles on Data Science

Do you want to discover the latest news about Fivetran and new data science technologies?

Visualitics Team
This article was written and edited by one of our consultants.

Sources:
[1] www.trends.google.com
[2] www.trends.google.com 
[3] www.trends.google.com 

Share now on your social channels or via email: