How to Use ChatGPT for Data Analysis (Part II)

In our previous article “How to Use ChatGPT for Data Analysis (Part I)” from February 27, we explored some of the capabilities of ChatGPT in data analysis, such as code generation and the creation of synthetic datasets.

In today’s article we go further and look at additional useful applications, from exploratory data analysis (EDA) with graphs and statistical models to the documentation needed to support the work.

Task 4: Applications for data analysis in the strict sense

  • Data Cleaning and Preparation: ask for methods to handle missing data, remove duplicates, or convert data types, or describe the desired outcome and ask for it to be translated into Python code.
  • Exploratory Data Analysis (EDA): generation of descriptive statistics, graphs, and correlations.
  • Statistical Modeling and Machine Learning: request code to build, evaluate, and optimize models.
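As a minimal sketch of the first item in the list above, this is the kind of Python code one might expect back from a data-cleaning request. The toy table, its column names, and the choice of filling gaps with the median are assumptions made for illustration, not taken from the article's dataset.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
raw = pd.DataFrame({
    "product": ["A", "B", "B", "C", "D"],
    "price": ["1.50", "2.00", "2.00", None, "3.25"],  # numbers stored as text
    "units": [10, 5, 5, 7, None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate rows, convert types, and fill missing numeric values."""
    out = df.drop_duplicates().copy()
    # Convert the text column to numbers; missing or unparseable entries become NaN.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Fill gaps with the column median (an assumption; other strategies are possible).
    out["price"] = out["price"].fillna(out["price"].median())
    out["units"] = out["units"].fillna(out["units"].median())
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether the median is an appropriate fill value depends on the data, which is one reason the prompt should state the desired outcome explicitly.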

Prompt Composition Strategies

Practical Use Cases with ChatGPT

EDA Workflow
Writing Documentation
Initial Interaction
Dataset Overview
    
# Dataset overview
# Assumes candy_data is a pandas DataFrame already loaded earlier in the session.
dataset_overview = {
    'Number of Rows': candy_data.shape[0],
    'Number of Columns': candy_data.shape[1],
    'Column Names': candy_data.columns.tolist(),
    'Missing Values': candy_data.isnull().sum().sum(),  # total missing values across the dataset
}
    
   
Graphically representing the results
Statistical-descriptive analysis
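The output of this step is not reproduced here. As a rough sketch, descriptive statistics and correlations of the kind mentioned under EDA can be obtained with two pandas calls. The table below is a hypothetical stand-in with invented column names and values, not the article's dataset.

```python
import pandas as pd

# Hypothetical stand-in for the article's dataset; names and values are invented.
sample = pd.DataFrame({
    "sugar": [0.2, 0.5, 0.7, 0.9, 0.4],
    "price": [0.3, 0.5, 0.6, 0.9, 0.4],
    "rating": [40.0, 55.0, 60.0, 72.0, 50.0],
})

summary = sample.describe()    # count, mean, std, min, quartiles, max per column
correlations = sample.corr()   # pairwise Pearson correlations between columns

print(summary.round(2))
print(correlations.round(2))
```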
Access to the code
Matrix with the profile of ingredients
Complete compositional profile
Multivariate regression
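The regression code from the original interaction is not shown here. As an illustration only, a linear regression with several predictors and one response can be fitted by ordinary least squares with NumPy. The data below are synthetic and every variable name is hypothetical; the sketch is not meant to reproduce the article's model.

```python
import numpy as np

# Synthetic data (an assumption): two predictors, one response with small noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_coef = np.array([3.0, -1.5])
y = 10.0 + X @ true_coef + rng.normal(scale=0.1, size=n)

# Add a column of ones so the first fitted coefficient is the intercept.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Coefficient of determination (R^2) as a simple goodness-of-fit measure.
pred = design @ coef
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

print("intercept and slopes:", coef.round(3))
print("R^2:", round(r2, 4))
```

In practice a library such as statsmodels or scikit-learn would usually be requested instead, since it also reports standard errors and diagnostics.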

Final Considerations

In conclusion, ChatGPT can be a valuable aid for general analysis of a relatively simple dataset (a denormalized data model with a single table). Insights are obtained by describing in natural language both the task to be performed (the prompt) and the context (the application domain), without needing to know the technical steps required to reach the result. For example, one could upload a business's transaction log (Excel or CSV) and ask for the total sales, the average selling price per product category, or the month-over-month (MoM) percentage change. What follows is an example of such an interaction. It is important to stress, however, that ChatGPT is a useful but fallible tool, to be treated as an assistant rather than a specialist. For this reason, tasks must be formulated unambiguously, with the context relevant to the analysis defined in as much detail as possible. Below are the questions posed by ChatGPT, which are the heart of this prompting strategy: they guide even less experienced users toward a correct and complete definition of the request before the model takes it on.

Transaction log data analysis
Output generated by ChatGPT
Output generated by BI tools
Export Code
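The code exported in the original interaction is not reproduced here. As a minimal sketch, the three metrics mentioned in the final considerations (total sales, average selling price per product category, and month-over-month change) could be computed with pandas as follows. The transaction log, its column names, and its values are all hypothetical.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are invented.
log = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18", "2024-03-10"]
    ),
    "category": ["Snacks", "Drinks", "Snacks", "Drinks", "Snacks"],
    "unit_price": [2.0, 1.5, 2.5, 1.5, 3.0],
    "quantity": [10, 20, 8, 30, 10],
})
log["sales"] = log["unit_price"] * log["quantity"]

# Total sales over the whole log.
total_sales = log["sales"].sum()

# Simple (unweighted) average selling price per product category.
avg_price_by_category = log.groupby("category")["unit_price"].mean()

# Monthly sales and percentage change compared to the previous month (MoM).
monthly_sales = log.groupby(log["date"].dt.to_period("M"))["sales"].sum()
mom_change_pct = monthly_sales.pct_change() * 100

print(total_sales)
print(avg_price_by_category)
print(mom_change_pct.round(1))
```

Note that "average selling price" is ambiguous: the sketch uses a simple mean of unit prices, while a quantity-weighted mean would give a different figure. This is exactly the kind of detail worth spelling out in the prompt.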


Visualitics Team
This article was written and edited by one of our consultants.

 
