How to build a simple recommendation system in Python
Recommendation systems are designed to help people discover and select items that may be of interest to them. Python provides a wealth of libraries and tools that can help us build a simple but effective recommendation system. This article will introduce how to use Python to build a user-based collaborative filtering recommendation system and provide specific code examples.
Collaborative filtering is a common algorithm for recommendation systems. It infers similarities between users based on users' behavioral history data, and then uses these similarities to predict and recommend items. We will use the MovieLens dataset, which contains a set of user ratings of movies. First, we need to install the required libraries:
pip install pandas scikit-learn
Next, we will import the required libraries and load the MovieLens dataset:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('ratings.csv')
The three columns userId, movieId and rating represent the user ID, movie ID and rating respectively. Next, we split the data set into a training set and a test set:
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)
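Before the full listing, it may help to see the similarity measure in isolation. The functions below score two users as 1 / (1 + RMSD), where RMSD is the root-mean-square difference of their ratings on the movies both have rated. The following is a minimal sketch on made-up ratings; the helper name rmsd_similarity and the toy data are assumptions for illustration and are not part of the MovieLens dataset.

```python
import pandas as pd

# Toy ratings (hypothetical data, same column names as ratings.csv)
toy = pd.DataFrame({
    'userId':  [1, 1, 1, 2, 2, 3],
    'movieId': [10, 20, 30, 10, 20, 30],
    'rating':  [4.0, 3.0, 5.0, 4.0, 1.0, 2.0],
})

def rmsd_similarity(ratings, user_a, user_b):
    """Return 1 / (1 + RMSD) over co-rated movies, or None if there are none."""
    a = ratings[ratings['userId'] == user_a].set_index('movieId')['rating']
    b = ratings[ratings['userId'] == user_b].set_index('movieId')['rating']
    common = a.index.intersection(b.index)
    if len(common) == 0:
        return None
    # Mean of squared rating differences over the co-rated movies
    mean_squared_diff = ((a.loc[common] - b.loc[common]) ** 2).mean()
    return 1 / (1 + mean_squared_diff ** 0.5)

print(rmsd_similarity(toy, 1, 3))  # users 1 and 3 share only movie 30
print(rmsd_similarity(toy, 2, 3))  # users 2 and 3 share no movies
```

Identical ratings give a score of 1, and the score falls toward 0 as the ratings diverge. Pairs without co-rated movies have no defined score, which this sketch signals with None.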
# Initialize an empty similarity table for every user
def calculate_similarity(train_data):
    similarity = dict()
    for user in train_data['userId'].unique():
        similarity[user] = dict()
    return similarity

# Compute the similarity score between every pair of users
def calculate_similarity_score(train_data, similarity):
    # Map each user to a {movieId: rating} dict once, instead of filtering the DataFrame repeatedly
    ratings = {user: dict(zip(group['movieId'], group['rating']))
               for user, group in train_data.groupby('userId')}
    for user1 in similarity.keys():
        for user2 in similarity.keys():
            if user1 != user2:
                common_movies = set(ratings[user1]) & set(ratings[user2])
                # Skip pairs with no co-rated movies to avoid dividing by zero
                if not common_movies:
                    continue
                sum_of_squares = sum((ratings[user1][m] - ratings[user2][m]) ** 2
                                     for m in common_movies)
                similarity[user1][user2] = 1 / (1 + (sum_of_squares / len(common_movies)) ** 0.5)
    return similarity

# Compute the similarity score between movies
def calculate_movie_similarity_score(train_data, similarity):
    movie_similarity = dict()
    for user in similarity.keys():
        user_movies = train_data[train_data['userId'] == user]['movieId'].unique()
        for movie in user_movies:
            if movie not in movie_similarity:
                movie_similarity[movie] = dict()
            for other_movie in user_movies:
                if movie != other_movie:
                    # Assumption: score a movie pair by the number of users who rated both
                    movie_similarity[movie][other_movie] = movie_similarity[movie].get(other_movie, 0) + 1
    return movie_similarity

# Build the recommendation system
def build_recommendation_system(train_data, similarity, movie_similarity):
    # movie_similarity is accepted but not used by this user-based predictor
    ratings = {user: dict(zip(group['movieId'], group['rating']))
               for user, group in train_data.groupby('userId')}
    all_movies = train_data['movieId'].unique()
    recommendations = dict()
    for user in ratings:
        recommendations[user] = dict()
        for movie in all_movies:
            if movie not in ratings[user]:
                weighted_sum = 0
                total_similarity = 0
                for other_user, score in similarity[user].items():
                    if movie in ratings[other_user]:
                        weighted_sum += score * ratings[other_user][movie]
                        total_similarity += score
                # Normalize by the total similarity so the prediction stays on the rating scale
                if total_similarity > 0:
                    recommendations[user][movie] = weighted_sum / total_similarity
    return recommendations

# Compute the evaluation metric (RMSE)
def calculate_metrics(recommendations, test_data):
    num_ratings = 0
    sum_of_squared_error = 0
    for row in test_data.itertuples(index=False):
        predicted_rating = recommendations.get(row.userId, {}).get(row.movieId)
        if predicted_rating is not None:
            sum_of_squared_error += (predicted_rating - row.rating) ** 2
            num_ratings += 1
    # Average over the number of compared ratings, not the number of users
    rmse = (sum_of_squared_error / num_ratings) ** 0.5 if num_ratings else float('nan')
    return rmse

# Initialize the user similarity table
similarity = calculate_similarity(train_data)
# Compute the similarity scores between users
similarity = calculate_similarity_score(train_data, similarity)
# Compute the similarity scores between movies
movie_similarity = calculate_movie_similarity_score(train_data, similarity)
# Build the recommendation system
recommendations = build_recommendation_system(train_data, similarity, movie_similarity)
# Compute the evaluation metric
rmse = calculate_metrics(recommendations, test_data)
print(recommendations)
print('RMSE:', rmse)
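The printed dictionary maps each user to predicted ratings for every movie that user has not rated, so it can be very long. A common next step is to keep only the highest-scoring movies per user. The sketch below assumes recommendations has the nested form {userId: {movieId: predicted_rating}} built above; the helper name top_n and the sample values are hypothetical.

```python
def top_n(recommendations, user, n=5):
    """Return the n (movieId, predicted_rating) pairs with the highest predictions."""
    user_predictions = recommendations.get(user, {})
    ranked = sorted(user_predictions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical predictions in the same nested-dict shape
sample = {1: {10: 3.2, 20: 4.7, 30: 4.1}, 2: {10: 2.5}}
print(top_n(sample, 1, n=2))  # [(20, 4.7), (30, 4.1)]
```

An unknown user simply yields an empty list, which keeps the helper safe to call for users that appear only in the test set.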