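This script fetches board game data from the BoardGameGeek (BGG) API and stores it in a CSV file. The API responds in XML format. Since there is no endpoint for fetching data for multiple board games at once, the script requests one board game at a time by its board game ID. It increments the ID after each request until it reaches the end of the given ID range.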
Check out the repository on my GitHub profile.
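The following information is fetched and stored for each board game:
name, game ID, rating, weight, year published, minimum number of players, maximum number of players, minimum play time, maximum play time, minimum age, owned by, categories, mechanics, designers, artists, and publishers.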
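Here is a minimal sketch of how a subset of those fields maps onto a single `thing` item. It parses an abbreviated, hand-written XML sample shaped like a BGG XML API2 response requested with `stats=1`. The values are made up for illustration. It uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup so it runs without extra dependencies. The helper name `parse_item` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, made-up sample shaped like a BGG XML API2 "thing" response
SAMPLE_XML = """<items>
  <item type="boardgame" id="1">
    <name type="primary" value="Example Game"/>
    <yearpublished value="2020"/>
    <minplayers value="2"/>
    <maxplayers value="4"/>
    <link type="boardgamecategory" id="10" value="Strategy"/>
    <link type="boardgamedesigner" id="20" value="Jane Doe"/>
    <statistics>
      <ratings>
        <average value="7.5"/>
        <averageweight value="2.3"/>
        <owned value="1234"/>
      </ratings>
    </statistics>
  </item>
</items>"""


def parse_item(xml_text):
    """Hypothetical helper: map one <item> to a flat dict of selected fields."""
    item = ET.fromstring(xml_text).find("item")
    # Missing items and non-board-game items are skipped, as in the script
    if item is None or item.get("type") != "boardgame":
        return None

    def value(tag):
        # ".//" searches all descendants, similar to BeautifulSoup's find()
        element = item.find(f".//{tag}")
        return element.get("value") if element is not None else None

    links = item.findall("link")
    return {
        "name": value("name"),
        "game_id": item.get("id"),
        "rating": value("average"),
        "weight": value("averageweight"),
        "year_published": value("yearpublished"),
        "min_players": value("minplayers"),
        "max_players": value("maxplayers"),
        "owned_by": value("owned"),
        "categories": ", ".join(
            link.get("value") for link in links
            if link.get("type") == "boardgamecategory"
        ),
    }


print(parse_item(SAMPLE_XML))
```

All values come back as strings, exactly as they appear in the XML attributes. Any numeric conversion is left to later processing.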
Let's start by importing the libraries this script needs:
```python
# Import libraries
from bs4 import BeautifulSoup
from csv import DictWriter
import pandas as pd
import requests
import time
```
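Next, we define the request headers and the pause (in seconds) between requests. The BGG API documentation says nothing about request rate limits. However, some unofficial information on its forums suggests that requests are limited to 2 per second. If the script starts hitting the rate limit, the pause between requests may need to be adjusted.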
```python
# Define request URL headers
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:85.0) Gecko/20100101 Firefox/85.0",
    # Quality values are attached with ";q=", not as a separate list item
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"
}

# Define sleep timer value (in seconds) between requests
SLEEP_BETWEEN_REQUEST = 0.5
```
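Since the rate limit is undocumented, tuning `SLEEP_BETWEEN_REQUEST` is one option. Another is to retry with an increasing delay when a request is throttled. The sketch below is a hypothetical helper, not part of the original script. It takes any zero-argument `fetch` callable, for example `lambda: requests.get(url, headers=headers, timeout=30)`. It assumes the server signals throttling with HTTP 429 (Too Many Requests) and treats 5xx statuses as transient. The retry count and delays are assumptions to tune. The `sleep` parameter exists only so the function can be exercised without actually waiting.

```python
import time
from types import SimpleNamespace


def fetch_with_backoff(fetch, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: call fetch() and retry on throttling or server errors.

    fetch must return an object with a status_code attribute,
    such as a requests.Response.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        response = fetch()
        # Assumption: 429 means "rate limited", 5xx means a transient server error
        retryable = response.status_code == 429 or response.status_code >= 500
        if not retryable or attempt == max_retries:
            return response
        sleep(delay)
        delay *= 2  # exponential backoff: 1s, 2s, 4s, ...


# Simulate a server that throttles twice and then succeeds
statuses = iter([429, 429, 200])
waits = []
response = fetch_with_backoff(
    lambda: SimpleNamespace(status_code=next(statuses)),
    sleep=waits.append,
)
print(response.status_code, waits)  # 200 [1.0, 2.0]
```

After the final attempt, the last response is returned as-is. The caller can then still decide to stop and save the data collected so far.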
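Next, we define the range of board game IDs to fetch from BGG and process. At the time this script was written, existing board game IDs went up to around 402000, and this number will most likely grow in the future.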
```python
# Define game ID range
game_id = 264882       # initial game ID
last_game_id = 264983  # max game ID (currently around 402000)
```
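The main loop runs while `game_id <= last_game_id`, so both ends of the range are included. This small sketch builds the same request URLs with `urllib.parse.urlencode` instead of string concatenation. The function name `build_urls` is hypothetical. The base URL and the `stats=1` parameter are taken from the script's main loop.

```python
from urllib.parse import urlencode

BASE_URL = "https://boardgamegeek.com/xmlapi2/thing"


def build_urls(first_id, last_id):
    """Hypothetical helper: one request URL per ID, with last_id included."""
    return [
        f"{BASE_URL}?{urlencode({'id': game_id, 'stats': 1})}"
        for game_id in range(first_id, last_id + 1)
    ]


urls = build_urls(264882, 264983)
print(len(urls))  # 102
print(urls[0])    # https://boardgamegeek.com/xmlapi2/thing?id=264882&stats=1
```

The sleep timer alone sets a lower bound on run time. At 0.5 s per request, this sample range of 102 IDs takes at least 51 s. The full range of roughly 402000 IDs takes at least 201000 s, or about 56 hours, before counting the time the requests themselves take.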
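The following function is called when the script has finished processing the ID range. It is also called if an error occurs while making a request, so that all data appended to the games list up to the point of the exception is saved.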
```python
# CSV file saving function
def save_to_csv(games):
    csv_header = [
        'name', 'game_id', 'rating', 'weight', 'year_published',
        'min_players', 'max_players', 'min_play_time', 'max_play_time',
        'min_age', 'owned_by', 'categories', 'mechanics', 'designers',
        'artists', 'publishers'
    ]
    # newline='' prevents the csv module from writing blank lines on Windows
    with open('BGGdata.csv', 'a', encoding='UTF8', newline='') as f:
        dictwriter_object = DictWriter(f, fieldnames=csv_header)
        # Write the header only if the file is empty (i.e. newly created)
        if f.tell() == 0:
            dictwriter_object.writeheader()
        dictwriter_object.writerows(games)
```
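Because the file is opened in append mode, each call to `save_to_csv` adds rows to `BGGdata.csv`. That includes a call after an error and another call on a later run. The `f.tell() == 0` check makes sure the header is written only when the file is empty. The sketch below reproduces that pattern on a temporary file with a reduced, hypothetical set of columns, so the behavior can be checked in isolation.

```python
import csv
import os
import tempfile

FIELDS = ["name", "game_id"]  # reduced column set for illustration


def append_rows(path, rows):
    # Same pattern as save_to_csv: append, header only for an empty file
    with open(path, "a", encoding="UTF8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    append_rows(path, [{"name": "Game A", "game_id": 1}])
    append_rows(path, [{"name": "Game B", "game_id": 2}])
    with open(path, encoding="UTF8", newline="") as f:
        lines = f.read().splitlines()

print(lines)  # ['name,game_id', 'Game A,1', 'Game B,2']
```

One consequence worth keeping in mind: running the script twice over the same ID range appends the same games twice, because nothing checks for existing rows.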
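Below is the main logic of the script. It runs over the ID range and does the following for each ID:

1. Makes a request to the BGG API.
2. Extracts the data with BeautifulSoup.
3. Checks whether the data belongs to a board game. The API also returns data for other item categories; see the BGG API documentation for more details.
4. Processes the data and appends it to the games list.

Finally, it saves the list to the CSV file.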
```python
# Create an empty 'games' list where each game will be appended
games = []

while game_id <= last_game_id:
    url = "https://boardgamegeek.com/xmlapi2/thing?id=" + str(game_id) + "&stats=1"

    try:
        response = requests.get(url, headers=headers, timeout=30)
        # Treat HTTP error statuses (e.g. when rate limited) as errors too
        response.raise_for_status()
    except Exception as err:
        # In case of an exception, stop the loop; the save_to_csv(games) call
        # after the loop stores the items fetched up to this point.
        print(">>> ERROR:")
        print(err)
        break

    soup = BeautifulSoup(response.text, features="html.parser")
    item = soup.find("item")

    if item is None:
        # If there is no data for the request, skip to the next ID
        print(f">>> Empty item. Skipped item with id ({game_id}).")
    elif item['type'] == 'boardgame':
        # Only board games are processed; other item types are skipped

        # Set values for each field in the item
        name = item.find("name")['value']
        year_published = item.find("yearpublished")['value']
        min_players = item.find("minplayers")['value']
        max_players = item.find("maxplayers")['value']
        min_play_time = item.find("minplaytime")['value']
        max_play_time = item.find("maxplaytime")['value']
        min_age = item.find("minage")['value']
        rating = item.find("average")['value']
        weight = item.find("averageweight")['value']
        owned = item.find("owned")['value']

        categories = []
        mechanics = []
        designers = []
        artists = []
        publishers = []

        # Sort the <link> elements into lists by their type
        for link in item.find_all("link"):
            if link['type'] == "boardgamecategory":
                categories.append(link['value'])
            elif link['type'] == "boardgamemechanic":
                mechanics.append(link['value'])
            elif link['type'] == "boardgamedesigner":
                designers.append(link['value'])
            elif link['type'] == "boardgameartist":
                artists.append(link['value'])
            elif link['type'] == "boardgamepublisher":
                publishers.append(link['value'])

        game = {
            "name": name,
            "game_id": game_id,
            "rating": rating,
            "weight": weight,
            "year_published": year_published,
            "min_players": min_players,
            "max_players": max_players,
            "min_play_time": min_play_time,
            "max_play_time": max_play_time,
            "min_age": min_age,
            "owned_by": owned,
            "categories": ', '.join(categories),
            "mechanics": ', '.join(mechanics),
            "designers": ', '.join(designers),
            "artists": ', '.join(artists),
            "publishers": ', '.join(publishers),
        }

        # Append the game (item) to the 'games' list
        games.append(game)

    # Increment game id and pause before the next request
    # (also for skipped items, so every request respects the rate limit)
    game_id += 1
    time.sleep(SLEEP_BETWEEN_REQUEST)

save_to_csv(games)
```
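The `if`/`elif` chain that sorts the `<link>` elements can also be written as a lookup table from link type to output column. This is an alternative sketch, not the author's code. It uses a hypothetical `LINK_COLUMNS` mapping and works on `(type, value)` pairs. The same function therefore applies whether the pairs come from BeautifulSoup, e.g. `[(l['type'], l['value']) for l in item.find_all('link')]`, or from another parser.

```python
LINK_COLUMNS = {
    "boardgamecategory": "categories",
    "boardgamemechanic": "mechanics",
    "boardgamedesigner": "designers",
    "boardgameartist": "artists",
    "boardgamepublisher": "publishers",
}


def group_links(links):
    """Hypothetical helper: group (type, value) pairs into ', '-joined columns."""
    grouped = {column: [] for column in LINK_COLUMNS.values()}
    for link_type, value in links:
        column = LINK_COLUMNS.get(link_type)
        if column is not None:  # ignore other link types, e.g. boardgamefamily
            grouped[column].append(value)
    return {column: ", ".join(values) for column, values in grouped.items()}


links = [
    ("boardgamecategory", "Strategy"),
    ("boardgamecategory", "Economic"),
    ("boardgamefamily", "Some Family"),
    ("boardgamedesigner", "Jane Doe"),
]
print(group_links(links))
```

The returned dict has the same keys as the last five entries of the `game` dict, so it could be merged in with `game.update(...)`. Supporting another link type then only takes one more entry in the table.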
Below, you can preview the first few records of the CSV file as a pandas DataFrame.
```python
# Preview the CSV as a pandas DataFrame
df = pd.read_csv('./BGGdata.csv')
print(df.head(5))
```
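The multi-valued fields (categories, mechanics, designers, artists, publishers) are stored as `', '`-joined strings. Reading them back as lists therefore takes one extra step. This sketch does that with the standard `csv` module on a made-up, reduced sample in the same layout as `BGGdata.csv`. The helper `read_games` and the sample rows are hypothetical.

```python
import csv
import io

MULTI_VALUE_COLUMNS = ["categories", "mechanics", "designers", "artists", "publishers"]

# Made-up rows in the same layout as BGGdata.csv (columns reduced for brevity)
SAMPLE_CSV = '''name,game_id,categories,mechanics,designers,artists,publishers
Example Game,1,"Strategy, Economic",Dice Rolling,Jane Doe,,Example Press
'''


def read_games(f):
    """Hypothetical helper: read rows, turning ', '-joined columns back into lists."""
    rows = []
    for row in csv.DictReader(f):
        for column in MULTI_VALUE_COLUMNS:
            # An empty field means "no values", not a list with one empty string
            row[column] = row[column].split(", ") if row[column] else []
        rows.append(row)
    return rows


rows = read_games(io.StringIO(SAMPLE_CSV))
print(rows[0]["categories"])  # ['Strategy', 'Economic']
print(rows[0]["artists"])     # []
```

A name that itself contains `", "` would be split incorrectly here. A separator that cannot appear in names, chosen when writing the file, would avoid that ambiguity.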