How to Perform Data Analysis in Python Using the OpenAI API

Jennifer Aniston
Release: 2025-02-10 12:21:10

Core points

  • With Python and the OpenAI API, you can systematically analyze datasets for valuable insights without over-engineering your code or wasting time, which makes the combination a general-purpose solution for data analysis.
  • The OpenAI API and Python can analyze text files (such as Nvidia's latest earnings call) by extracting specified information from the transcript and printing it out.
  • They can also analyze CSV files (such as a dataset of Medium articles) to find each post's overall tone, its main lesson or point, and a "clickbait score" from 0 to 3 (0 means no clickbait, 3 means extreme clickbait).
  • To analyze multiple files automatically, place them in one folder, import the glob module from the Python standard library, and use a for loop to read each file's contents and save each analysis to a separate file.

This tutorial shows you how to use Python and the OpenAI API to mine and analyze data.

Manually analyzing datasets to extract useful data, or even writing a simple program to do the same, can quickly become complicated and time-consuming. Fortunately, with the OpenAI API and Python, you can systematically analyze datasets for interesting information without over-engineering your code or wasting time. This can serve as a general-purpose solution for data analysis, so you don't need different methods, libraries, and APIs for each type of data and data point in a dataset.

Let's go step by step through how to analyze your data using the OpenAI API and Python, starting with the setup.

Setup

To use the OpenAI API to mine and analyze data through Python, install the openai and pandas libraries:

pip3 install openai pandas

After this is done, create a new folder and create an empty Python file in the new folder.

Analyze text files

For this tutorial, I thought it would be interesting to have Python analyze Nvidia's latest earnings call.

Download the transcript of Nvidia's latest earnings call, which I got from The Motley Fool, and move it to your project folder.

Then open your empty Python file and add this code.

This code reads the Nvidia earnings call transcript you downloaded and passes it to the extract_info function as the transcript variable.

The extract_info function sends the prompt and the transcript as a single user message, with temperature=0.3 and model="gpt-3.5-turbo-16k". It uses the "gpt-3.5-turbo-16k" model because that model can handle large texts such as this transcript. The response comes from the openai.ChatCompletion.create endpoint, which is the interface of openai package versions before 1.0 (later versions expose the same call as client.chat.completions.create):

completions = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "user", "content": prompt+"\n\n"+text}
    ],
    temperature=0.3,
)
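
As a minimal sketch for readers on openai 1.0 or later, the same request can be wrapped in a small function that takes the client as an argument. The build_input helper and the client parameter are illustrative assumptions rather than part of the original script, and the prompt is an English rendering of the input used in this tutorial.

```python
PROMPT = (
    "Extract the following information from the text:\n"
    "    Nvidia's revenue\n"
    "    What Nvidia did this quarter\n"
    "    Remarks about AI"
)


def build_input(prompt, text):
    """Join the instructions and the document into one user message."""
    return prompt + "\n\n" + text


def extract_info(text, client, prompt=PROMPT):
    """Send prompt + text as a single user message and return the reply text.

    `client` is assumed to be an openai.OpenAI() instance (openai >= 1.0).
    """
    completions = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=[{"role": "user", "content": build_input(prompt, text)}],
        temperature=0.3,
    )
    return completions.choices[0].message.content
```

With the package installed and an API key configured, it could be called as extract_info(transcript, OpenAI()) after from openai import OpenAI.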

The complete input will look like this:

Extract the following information from the text:
    Nvidia's revenue
    What Nvidia did this quarter
    Remarks about AI

Nvidia earnings call transcript goes here

Now if we pass the input to the openai.ChatCompletion.create endpoint, the full output will look like this:

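{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Actual response",
        "role": "assistant"
      }
    }
  ],
  "created": 1693336390,
  "id": "request-id",
  "model": "gpt-3.5-turbo-16k-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 579,
    "prompt_tokens": 3615,
    "total_tokens": 4194
  }
}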

As you can see, it returns the text response as well as the token usage of the request, which is useful if you are tracking expenses and optimizing costs. But since we are only interested in the response text, we get it by specifying the completions.choices[0].message.content response path.
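
As a small illustration of that response path, the sketch below reads the reply text and the token counts from a response shaped like the sample output in this tutorial. It uses a plain dict as a stand-in for the real response object, an assumption made so the example runs without an API call; response_text and summarize_usage are hypothetical helper names.

```python
# Stand-in for an API response, mirroring the structure of the sample output.
response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Actual response", "role": "assistant"},
        }
    ],
    "model": "gpt-3.5-turbo-16k-0613",
    "usage": {"completion_tokens": 579, "prompt_tokens": 3615, "total_tokens": 4194},
}


def response_text(resp):
    """Return the assistant's reply from the first choice."""
    return resp["choices"][0]["message"]["content"]


def summarize_usage(resp):
    """Return (prompt, completion, total) token counts for cost tracking."""
    usage = resp["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"]


print(response_text(response))
print(summarize_usage(response))
```

With the real client, the same fields are reached by attribute access, as in completions.choices[0].message.content.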

If you run the code, you should get output similar to what is quoted below:

From the text, we can extract the following information:

  1. Nvidia's revenue: In the second quarter of fiscal 2024, Nvidia reported record second-quarter revenue of US$13.51 billion, up 88% sequentially and 101% year-on-year.
  2. What Nvidia did this quarter: Nvidia saw significant growth in all areas. Its data center revenue hit a record, up 141% sequentially and 171% year-on-year. Its gaming division also grew, with revenue up 11% sequentially and 22% year-on-year. In addition, its professional visualization revenue increased by 28% sequentially. It also announced partnerships and collaborations with companies such as Snowflake, ServiceNow, Accenture, Hugging Face, VMware and SoftBank.
  3. Remarks about AI: Nvidia highlights strong demand for its AI platform and accelerated computing solutions. It mentions that major cloud service providers and consumer internet companies are deploying its HGX systems. It also discusses the application of generative AI in various industries such as marketing, media and entertainment. Nvidia emphasizes the potential of generative AI to create new market opportunities and improve productivity in different sectors.

As you can see, the code extracts the information specified in the prompt (Nvidia's revenue, what Nvidia did this quarter, and comments about artificial intelligence) and prints it out.

Analyze CSV files

Analyzing earnings calls and other text files is cool, but to systematically analyze large amounts of data, you need to work with CSV files.

As a working example, download this Medium articles CSV dataset and paste it into your project folder.

If you look at the CSV file, you will see that it has columns like Author, Likes, Reading Time, Link, Title, and Text. To analyze the Medium articles using OpenAI, you only need the "Title" and "Text" columns.

Create a new Python file in your project folder and paste this code.

This code is slightly different from the code we used to analyze the text file. It reads the CSV row by row, extracts the specified pieces of information, and adds them to new columns.

For this tutorial, I chose a CSV dataset of Medium articles, which I got from HSANKESARA on Kaggle. The CSV analysis code uses the file's "Title" and "Text" columns to find the overall tone and the main lesson or point of each post. Since I always run into clickbait articles on Medium, I also thought it would be interesting to have it rate how clickbaity each article is by giving it a "clickbait score" from 0 to 3 (0 means no clickbait, 3 means extreme clickbait).

Before I explain the code, note that analyzing the entire CSV file would take too long and use too many API credits, so for this tutorial I made the code analyze only the first five articles using df = df[:5].

You may be confused by the following parts of the code, so let me explain:

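for di in range(len(df)):
    title = titles[di]
    abstract = articles[di]
    additional_params = extract_info('Title: '+str(title) + '\n\n' + 'Text: ' + str(abstract))
    try:
        result = additional_params.split("\n\n")
    except Exception:
        result = []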

This code iterates over all the articles (rows) in the CSV file, gets the title and body of each article on each iteration, and passes them to the extract_info function we saw earlier. It then splits the response of the extract_info function into a list, so that the different pieces of information are separated:

try:
    result = additional_params.split("\n\n")
except Exception:
    result = []

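Next, it adds each piece of information to its own list and, if an error occurs (because there is no value), adds "No result" to that list instead:

try:
    apa1.append(result[0])
except Exception as e:
    apa1.append('No result')
try:
    apa2.append(result[1])
except Exception as e:
    apa2.append('No result')
try:
    apa3.append(result[2])
except Exception as e:
    apa3.append('No result')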
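
The splitting and fallback steps can also be folded into one helper. This is a minimal sketch, assuming the model separates its three answers with blank lines as the tutorial's code expects; split_fields is a hypothetical name, not part of the original script, and unlike the original it also trims surrounding whitespace.

```python
def split_fields(response_text, expected=3, missing="No result"):
    """Split a reply on blank lines and pad it to a fixed number of fields."""
    if not isinstance(response_text, str):
        parts = []  # e.g. the API call returned nothing usable
    else:
        parts = [p.strip() for p in response_text.split("\n\n") if p.strip()]
    # Truncate extras and fill gaps so every row yields `expected` values.
    return parts[:expected] + [missing] * (expected - len(parts))


tone, lesson, score = split_fields("Tone: upbeat\n\nMain lesson: ship early")
print(tone, lesson, score, sep=" | ")
```

Each of the three returned values can then be appended to its list without a separate try/except per field.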

Finally, after the for loop is finished, the lists containing the extracted information are inserted into new columns of the DataFrame:

df = df.assign(Tone=apa1)
df = df.assign(Main_lesson_or_point=apa2)
df = df.assign(Clickbait_score=apa3)

As you can see, it adds the lists as new columns named "Tone", "Main_lesson_or_point", and "Clickbait_score".

It then writes them to the CSV file with index=False:

df.to_csv("data.csv", index=False)

index=False must be specified so that pandas does not write the DataFrame's index as an extra column each time the file is saved.
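
To make the effect of index=False concrete, here is a small sketch that uses an in-memory buffer instead of a file on disk. The two-row DataFrame and its values are made up for illustration; only the column names follow the tutorial, and round_trip is a hypothetical helper.

```python
import io

import pandas as pd

# Tiny stand-in for the Medium dataset; the rows are invented.
df = pd.DataFrame({"Title": ["Post A", "Post B"], "Text": ["Body A", "Body B"]})
df = df.assign(Tone=["Positive", "Neutral"])


def round_trip(frame, **to_csv_kwargs):
    """Write a DataFrame to CSV text and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(df)                  # default: index=True
without_index = round_trip(df, index=False)  # what the tutorial uses

print(list(with_index.columns))
print(list(without_index.columns))
```

The first list contains an extra unnamed column holding the old row numbers, while the second contains only Title, Text, and Tone.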

Now if you run the Python file, wait for it to finish, and open the CSV file in a CSV viewer, you will see the new columns.

If you run the code multiple times, you will notice that the generated answers are slightly different. This is because the code uses temperature=0.3 to add some creativity to its answers, which is very useful for subjective topics like clickbait.
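
As a rough intuition for why a nonzero temperature leads to varying answers, the sketch below applies the common softmax-with-temperature formula to three made-up scores. This is a generic illustration and an assumption about the mechanism, not a description of how the OpenAI models are implemented internally.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature almost all of the probability goes to the top-scoring candidate, so repeated runs rarely differ; at higher temperatures the alternatives get a larger share.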

Processing multiple files

If you want to automatically analyze multiple files, first place them in one folder and make sure that the folder contains only the files you are interested in, so that your Python code does not read irrelevant files. Then import the glob module in your Python file using import glob. It is part of the Python standard library, so there is nothing to install with pip.

In your Python file, use this code to get a list of all files in the data folder:

data_files = glob.glob("data_folder/*")

Then put the code that executes the analysis in the for loop:

for i in range(len(data_files)):
    # the analysis code for data_files[i] goes here

For text files, read the contents of each file inside the for loop like this:

with open(data_files[i], "r") as f:
    txt_data = f.read()

For CSV files, it's also like this:

df = pd.read_csv(data_files[i])

Also, make sure to save the output of each file analysis to a separate file using something similar to the following:

# output_folder is a placeholder name; the folder must already exist
df.to_csv(f"output_folder/data{i}.csv", index=False)
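
Putting the steps of this section together for text files, a minimal sketch might look like the following. The folder names, the analyze_folder helper, and the analyze callback are all hypothetical; in practice analyze would be whatever function sends a file's text to the API and returns the reply.

```python
import glob
import os


def analyze_folder(input_folder, output_folder, analyze):
    """Run `analyze` on every file in input_folder, writing one output file each."""
    os.makedirs(output_folder, exist_ok=True)
    written = []
    # sorted() keeps the processing order stable between runs.
    for path in sorted(glob.glob(os.path.join(input_folder, "*"))):
        if not os.path.isfile(path):
            continue  # skip sub-folders
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        result = analyze(text)
        name = os.path.splitext(os.path.basename(path))[0]
        out_path = os.path.join(output_folder, name + "_analysis.txt")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(result)
        written.append(out_path)
    return written
```

Because glob.glob already returns paths that include the folder name, each path can be passed to open() directly.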

Conclusion

Remember to experiment with the temperature parameter and adjust it to your use case. If you want the AI to generate more creative answers, increase the temperature; if you want more factual answers, lower it.

Combining the OpenAI API with Python for data analysis has many applications beyond analyzing articles and earnings call transcripts, such as news analysis, book analysis, and customer review analysis. That said, when testing your Python code on large datasets, make sure to test it on only a small part of the full dataset to save API credits and time.

Frequently Asked Questions (FAQs) about OpenAI APIs for Python Data Analysis

What is the OpenAI API and how does it work?

The OpenAI API is a powerful tool that lets developers access and build on the capabilities of OpenAI's models. You send a request to an API endpoint, which processes the request and returns the output. The API can be used for a variety of tasks, including text generation, translation, summarization, and more. It is designed to be easy to use, with a simple interface and clear documentation.

How do I use OpenAI API for data analysis?

The OpenAI API supports data analysis through its machine learning capabilities. For example, you can use it to analyze text data, extract insights, and make predictions. You send your data to the API in a request, and it returns the analysis results. This can be done from Python, since OpenAI provides a Python client library.

What are the benefits of using OpenAI API for data analysis?

There are many benefits of using OpenAI API for data analysis. First, it allows you to take advantage of the power of machine learning without having to build and train your own models, saving you time and resources. Second, it can handle large amounts of data and provide insights that may be difficult to obtain manually. Finally, it is flexible and can be used in a variety of data analysis tasks.

How do I integrate OpenAI API with Python?

Integrating OpenAI API with Python is very simple. You need to install the OpenAI Python client, which can be done using pip. Once the installation is complete, you can import the OpenAI library in your Python script and use it to send requests to the API. You also need to set up your API key, which you can get from the OpenAI website.
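
As a hedged sketch of the key setup step, one common approach is to read the key from an environment variable instead of hard-coding it in the script. OPENAI_API_KEY is the variable name the official client looks for by default; load_api_key is a hypothetical helper.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned value can then be passed to the client, for example OpenAI(api_key=load_api_key()) on openai 1.0 or later.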

What tasks can be accomplished using the OpenAI API?

The OpenAI API can be used for various tasks. For example, it can be used for text generation, producing human-like text based on prompts. It can also be used for translation, summarization, and sentiment analysis. In the context of data analysis, it can be used to analyze text data, extract insights, and make predictions.

What are the limitations of using OpenAI API?

While the OpenAI API is powerful, it does have some limitations. For example, there is a limit on the number of requests you can send to the API per minute. Also, the API is not free and the cost may increase if you are working on a lot of data. Finally, while the API is usually accurate, it is not perfect and the results should be used as part of a broader analytical strategy.
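
One common way to live with per-minute request limits is to retry with exponential backoff. The sketch below is generic: call_with_backoff and its parameters are hypothetical, and it catches any Exception for brevity, whereas real code would catch only the client's rate-limit error.

```python
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), waiting base_delay * 2**n seconds after the n-th failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists only so the waiting can be replaced in tests; in normal use the default time.sleep is fine.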

How do I troubleshoot using OpenAI API?

If you have problems using the OpenAI API, you can take a few steps. First, check the error message, as it usually provides clues about the cause of the problem. You can also refer to the API documentation, which provides detailed information on how to use the API and troubleshoot common problems. If you are still having problems, you can ask the OpenAI community for help.

What is the security level of OpenAI API?

OpenAI API is designed with security in mind. All data sent to the API is encrypted during transmission, and OpenAI has strict policies to protect your data. But, like any online service, it is important to use the API responsibly and follow data security best practices.

Can I use the OpenAI API for commercial use?

Yes, you can use the OpenAI API for commercial purposes. However, you should know that using the API will incur costs and you should review the API's terms of service to make sure your intended use meets the requirements.

What is the future of OpenAI API?

The future of the OpenAI API is bright. OpenAI is constantly improving its models and extending the capabilities of the API. As machine learning and artificial intelligence continue to evolve, we can expect the API to become more powerful and versatile.
