Dataset produk Amazon

PHPz
Lepaskan: 2024-08-27 06:02:02
asal
397 orang telah melayarinya

Hi, I found a dataset of Amazon products in Kaggle and decided to find a relationship between price and star rating.

Full code in :
https://github.com/victordalet/Kaggle_analysis/tree/feat/amazon_products


I - Preparing data

To do this, I use SQLAlchemy to convert the csv file into a small database, and plotly to display the information.

pip install SQLAlchemy
pip install plotly
Salin selepas log masuk

In the following script, I extract the data and obtain :

  • ratio between price and number of stars
  • final rating and number of stars
  • price and number of stars
import pandas as pd
from sqlalchemy import create_engine, text
import plotly.express as px


class Main:
    def __init__(self):
        self.result = None
        self.connection = None

        self.engine = create_engine("sqlite:///my_database.db", echo=False)
        self.df = pd.read_csv("amazon_product.csv")
        self.df.to_sql("products", self.engine, index=False, if_exists="append")

        self.get_data()
        self.transform_data()
        self.display_graph()
        self.get_data_number_start_and_price()
        self.transform_data()
        self.display_graph()
        self.get_data_number_start_and_start()
        self.display_graph()

    def get_data(self):
        self.connection = self.engine.connect()
        query = text(
            "SELECT product_price, product_star_rating FROM products where product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()

    def get_data_number_start_and_price(self):
        query = text(
            "SELECT product_price, product_num_ratings FROM products where product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()

    def get_data_number_start_and_start(self):
        query = text(
            "SELECT product_star_rating, product_num_ratings FROM products where product_price != '$0.00'"
        )
        self.result = self.connection.execute(query).fetchall()
        for i in range(len(self.result)):
            self.result[i] = [self.result[i][0], self.result[i][1]]

    def transform_data(self):
        for i in range(len(self.result)):
            self.result[i] = [float(self.result[i][0].split("$")[1]), self.result[i][1]]

    def display_graph(self):
        fig = px.scatter(
            self.result, x=0, y=1, title="Amazon Product Price vs Star Rating"
        )
        fig.show()


Main()
Salin selepas log masuk

II - Result

Price and notation

Amazon product dataset

Price and number of notation

Amazon product dataset

Notation and number of opinion

Amazon product dataset

III - Conclusion

We can see, there's not necessarily a relationship between price and rating, but the higher the price, the lower the rating, and the more reviews, the higher the rating.
Which seems logical, since if a product is bought a lot, it means it's popular.

Atas ialah kandungan terperinci Dataset produk Amazon. Untuk maklumat lanjut, sila ikut artikel berkaitan lain di laman web China PHP!

Kenyataan Laman Web ini
Kandungan artikel ini disumbangkan secara sukarela oleh netizen, dan hak cipta adalah milik pengarang asal. Laman web ini tidak memikul tanggungjawab undang-undang yang sepadan. Jika anda menemui sebarang kandungan yang disyaki plagiarisme atau pelanggaran, sila hubungi admin@php.cn
Tutorial Popular
Lagi>
Muat turun terkini
Lagi>
kesan web
Kod sumber laman web
Bahan laman web
Templat hujung hadapan