Hello! I found a dataset on Kaggle about the time users spend on a website, and I want to look at the relationship between the number of pages visited and the total time spent on the site.
You can find the dataset and the code on my GitHub: https://github.com/victordalet/Kaggle_analysis/tree/feat/website_traffic
To do this, I use pandas and SQLAlchemy in Python to load my CSV into a SQLite database, and Plotly to display my results.
    pip install pandas
    pip install plotly
    pip install sqlalchemy
I create a Main class whose constructor reads my CSV and writes it to a database; the get_data method then queries the two columns I need.
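As a quick way to try the load-and-query step without touching the real file, here is a minimal sketch that uses an in-memory SQLite database. The sample rows are invented for illustration and are not values from the dataset; the column names follow the query used in this post.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Invented sample rows standing in for the CSV; not values from the dataset.
sample = pd.DataFrame(
    {"Page_Views": [5, 4, 12], "Time_on_Page": [3.9, 11.0, 2.5]}
)

# In-memory SQLite database, so nothing is written to disk.
engine = create_engine("sqlite:///:memory:")

# if_exists="replace" drops and recreates the table instead of adding rows to it.
sample.to_sql("website_data", engine, index=False, if_exists="replace")

# The "with" block closes the connection once the rows have been fetched.
with engine.connect() as connection:
    query = text("SELECT Page_Views, Time_on_Page FROM website_data")
    rows = connection.execute(query).fetchall()

print(len(rows))
print(tuple(rows[0]))
```

With a file-based database such as my_database.db, choosing "replace" rather than "append" matters more, because "append" would add the same rows again on every run.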
The result is a list of tuples, so I create the transform_data method to turn it into a list of lists.
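The conversion itself can be sketched with a list comprehension. The `result` list below is a hand-written stand-in for what `fetchall()` returns, with invented values, and `as_lists` is just an illustrative name.

```python
# Stand-in for the fetched rows: a list of tuples (invented values).
result = [(5, 3.9), (4, 11.0), (12, 2.5)]

# Build a new list of lists rather than rewriting each element by index.
as_lists = [list(row) for row in result]

print(as_lists)
```

This produces the same structure as looping over the indexes, but leaves the original rows unchanged.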
Finally, I can display a simple scatter plot of the total time against the number of pages viewed.
    import pandas as pd
    from sqlalchemy import create_engine, text
    import plotly.express as px


    class Main:
        def __init__(self):
            self.result = None
            self.connection = None
            self.engine = create_engine("sqlite:///my_database.db", echo=False)
            self.df = pd.read_csv("website_wata.csv")
            # "replace" avoids duplicating the rows every time the script is rerun
            self.df.to_sql("website_data", self.engine, index=False, if_exists="replace")
            self.get_data()
            self.transform_data()
            self.display_graph()

        def get_data(self):
            # Fetch the two columns to plot as a list of tuple-like rows
            self.connection = self.engine.connect()
            query = text("SELECT Page_Views, Time_on_Page FROM website_data")
            self.result = self.connection.execute(query).fetchall()
            self.connection.close()

        def transform_data(self):
            # Convert each row into a plain list
            for i in range(len(self.result)):
                self.result[i] = list(self.result[i])

        def display_graph(self):
            # Column 0 is the page count, column 1 is the time
            fig = px.scatter(self.result, x=0, y=1, title="")
            fig.show()


    Main()
The x-axis indicates the number of pages visited by the user, while the y-axis shows the time spent on the website in minutes.
We can see that the users who stay the longest visit between 4 and 6 pages, and that every user who visits between 11 and 15 pages stays at least a few minutes.
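To put numbers behind a reading like this, one option is to group the times by page count and compare the minimum, mean, and maximum of each group. The sketch below uses only the standard library; the `rows` values are invented for illustration and are not taken from the dataset, and `times_by_pages` and `summary` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Invented (page_views, time_on_page) pairs; not values from the dataset.
rows = [[4, 10.0], [4, 2.0], [5, 6.0], [12, 3.0], [12, 5.0]]

# Collect every time value under its page count.
times_by_pages = defaultdict(list)
for page_views, time_on_page in rows:
    times_by_pages[page_views].append(time_on_page)

# One summary per page count, sorted by number of pages.
summary = {
    pages: {"mean": mean(times), "min": min(times), "max": max(times)}
    for pages, times in sorted(times_by_pages.items())
}

for pages, stats in summary.items():
    print(pages, stats)
```

The "max" column shows which page counts have the longest stays, while the "min" column shows whether every user at a given page count stays at least a few minutes.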