Cara membina sistem pengesyoran mudah dalam Python
Sistem pengesyoran adalah untuk membantu orang ramai menemui dan memilih item yang mungkin menarik minat mereka direka. Python menyediakan banyak perpustakaan dan alatan yang boleh membantu kami membina sistem pengesyoran yang mudah tetapi berkesan. Artikel ini akan memperkenalkan cara menggunakan Python untuk membina sistem pengesyoran penapisan kolaboratif berasaskan pengguna dan memberikan contoh kod khusus.
Penapisan kolaboratif ialah algoritma biasa untuk sistem pengesyoran. Ia menyimpulkan persamaan antara pengguna berdasarkan data gelagat sejarah mereka, dan kemudian menggunakan persamaan ini untuk meramal dan mengesyorkan item. Kami akan menggunakan set data MovieLens, yang mengandungi set penilaian pengguna bagi filem. Mula-mula, kita perlu memasang perpustakaan yang diperlukan:
pip install pandas scikit-learn
Seterusnya, kami akan mengimport perpustakaan yang diperlukan dan memuatkan dataset MovieLens:
import pandas as pd from sklearn.model_selection import train_test_split # 加载数据集 data = pd.read_csv('ratings.csv')
Dataset data ini mengandungi #🎜 🎜#Tiga lajur, mewakili ID pengguna, ID filem dan rating masing-masing. Seterusnya, kami membahagikan set data kepada set latihan dan set ujian: userId
、movieId
和rating
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)
# 计算用户之间的相似度 def calculate_similarity(train_data): similarity = dict() for user in train_data['userId'].unique(): similarity[user] = dict() user_ratings = train_data[train_data['userId'] == user] for movie in user_ratings['movieId'].unique(): similarity[user][movie] = 1.0 return similarity # 计算用户之间的相似度得分 def calculate_similarity_score(train_data, similarity): for user1 in similarity.keys(): for user2 in similarity.keys(): if user1 != user2: user1_ratings = train_data[train_data['userId'] == user1] user2_ratings = train_data[train_data['userId'] == user2] num_ratings = 0 sum_of_squares = 0 for movie in user1_ratings['movieId'].unique(): if movie in user2_ratings['movieId'].unique(): num_ratings += 1 rating1 = user1_ratings[user1_ratings['movieId'] == movie]['rating'].values[0] rating2 = user2_ratings[user2_ratings['movieId'] == movie]['rating'].values[0] sum_of_squares += (rating1 - rating2) ** 2 similarity[user1][user2] = 1 / (1 + (sum_of_squares / num_ratings) ** 0.5) return similarity # 计算电影之间的相似度得分 def calculate_movie_similarity_score(train_data, similarity): movie_similarity = dict() for user in similarity.keys(): for movie in train_data[train_data['userId'] == user]['movieId'].unique(): if movie not in movie_similarity.keys(): movie_similarity[movie] = dict() for other_movie in train_data[train_data['userId'] == user]['movieId'].unique(): if movie != other_movie: movie_similarity[movie][other_movie] = similarity[user][other_user] return movie_similarity # 构建推荐系统 def build_recommendation_system(train_data, similarity, movie_similarity): recommendations = dict() for user in train_data['userId'].unique(): user_ratings = train_data[train_data['userId'] == user] recommendations[user] = dict() for movie in train_data['movieId'].unique(): if movie not in user_ratings['movieId'].unique(): rating = 0 num_movies = 0 for other_user in similarity[user].keys(): if movie in train_data[train_data['userId'] == other_user]['movieId'].unique(): rating += similarity[user][other_user] * train_data[(train_data['userId'] == other_user) & (train_data['movieId'] == movie)]['rating'].values[0] num_movies += 1 if num_movies > 0: recommendations[user][movie] = rating / num_movies return recommendations # 计算评价指标 def calculate_metrics(recommendations, test_data): num_users = 0 sum_of_squared_error = 0 for user in recommendations.keys(): if user in test_data['userId'].unique(): num_users += 1 for movie in recommendations[user].keys(): if movie in test_data[test_data['userId'] == user]['movieId'].unique(): predicted_rating = recommendations[user][movie] actual_rating = test_data[(test_data['userId'] == user) & (test_data['movieId'] == movie)]['rating'].values[0] sum_of_squared_error += (predicted_rating - actual_rating) ** 2 rmse = (sum_of_squared_error / num_users) ** 0.5 return rmse # 计算用户之间的相似度 similarity = calculate_similarity(train_data) # 计算用户之间的相似度得分 similarity = calculate_similarity_score(train_data, similarity) # 计算电影之间的相似度得分 movie_similarity = calculate_movie_similarity_score(train_data, similarity) # 构建推荐系统 recommendations = build_recommendation_system(train_data, similarity, movie_similarity) # 计算评价指标 rmse = calculate_metrics(recommendations, test_data)
print(recommendations) print('RMSE:', rmse)
Atas ialah kandungan terperinci Bagaimana untuk membina sistem cadangan mudah dalam Python. Untuk maklumat lanjut, sila ikut artikel berkaitan lain di laman web China PHP!