我有一個資料框:seller
、item
、price
、shipping
、免費送貨最低
、count available
和count required
。我的目標是根據稍後計算的 total
找到 seller
和 item
的最便宜的組合(計算程式碼如下所示)。範例資料如下:
import pandas as pd item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'] seller1 = ['seller 1', 'seller 2', 'seller 3', 'seller 4', 'seller 1'] price1 = [1.85, 1.94, 2.00, 2.00, 2.02] shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99] freeship1 = [5, 5, 5, 50, 5] countavailable1 = [1, 2, 2, 5, 2] countneeded1 = [2, 1, 2, 2, 1] df1 = pd.dataframe({'seller':seller1, 'item':item1, 'price':price1, 'shipping':shipping1, 'free shipping minimum':freeship1, 'count available':countavailable1, 'count needed':countneeded1}) # create columns that states if seller has all counts needed. # this will be used to sort by to prioritize the smallest number of orders possible for index, row in df1.iterrows(): if row['count available'] >= row['count needed']: df1.at[index, 'fulfills count needed'] = 'yes' else: df1.at[index, 'fulfills count needed'] = 'no' # dont want to calc price based on [count available], so need to check if seller has count i need and calc cost based on [count needed]. # if doesn't have [count needed], then calc cost on [count available]. for index, row in df1.iterrows(): if row['count available'] >= row['count needed']: df1.at[index, 'price x count'] = row['count needed'] * row['price'] else: df1.at[index, 'price x count'] = row['count available'] * row['price']
但是,任何一個seller
都可以出售多個item
。我想盡量減少支付的運費,所以我想透過 seller
將 item
s 分組在一起。因此,我根據我在另一個線程中看到的方式使用 .first()
方法對它們進行分組,以便將每一列保留在新的分組資料框中。
# don't calc [total] until sellers have been grouped # use first() method to return all columns and perform no other aggregations grouped1 = df1.sort_values('price').groupby(['seller', 'item']).first()
此時我想透過seller
計算total
。所以我有以下程式碼,但它為每個item
計算total
,而不是seller
,這意味著shipping
根據每個群組中的商品數量被多次添加,或當price x count
結束時不應用免費送貨最低免運費
。
# calc [Total] for index, row in grouped1.iterrows(): if (row['Free Shipping Minimum'] == 50) & (row['Price x Count'] > 50): grouped1.at[index, 'Total'] = row['Price x Count'] + 0 elif (row['Free Shipping Minimum'] == 5) & (row['Price x Count'] > 5): grouped1.at[index, 'Total'] = row['Price x Count'] + 0 else: grouped1.at[index, 'Total'] = row['Price x Count'] + row['Shipping']
實際上看起來我可能需要在計算total
時對每個seller
求和price x count
,但這本質上是同一個問題,因為我不知道如何計算外部索引的每行列。我可以使用什麼方法來做到這一點?
另外,如果有人對如何實現我的後半部目標有任何建議,請儘管提出。我只想退回我需要的每件商品。例如,我需要 2 個「項目 1」和 2 個「項目 2」。如果“賣家1”有2 個“商品1”和1 個“商品2”,而“賣家2”有1 個“商品1”和1 個“商品2”,那麼我想要“賣家1”的所有商品(假設它最便宜),但只有「賣家2」的1 個「商品1」。這似乎會影響 total
列的計算,但我不確定如何實現它。
我最終決定先將seller
分組,並對price x count
進行求和以找到subtotal
s,將其轉換為資料幀,然後將df1
與新的subtotal
資料幀合併以建立groupedphpcnend cphpcn 資料框。然後我使用 <code>np.where
建議創建了 totals
列(這比我的 for 循環優雅得多,並且可以輕鬆處理 nan 值)。最後按seller
、total
、item
分組傳回我想要的結果。最終程式碼如下:
import pandas as pd import numpy as np item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'] seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1'] price1 = [1.85, 1.94, 2.69, 2.00, 2.02] shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99] freeship1 = [5, 5, 5, 50, 5] countavailable1 = [1, 2, 2, 5, 2] countneeded1 = [2, 1, 2, 2, 1] df1 = pd.DataFrame({'Seller':seller1, 'Item':item1, 'Price':price1, 'Shipping':shipping1, 'Free Shipping Minimum':freeship1, 'Count Available':countavailable1, 'Count Needed':countneeded1}) # create columns that states if seller has all counts needed. # this will be used to sort by to prioritize the smallest number of orders possible for index, row in df1.iterrows(): if row['Count Available'] >= row['Count Needed']: df1.at[index, 'Fulfills Count Needed'] = 'Yes' else: df1.at[index, 'Fulfills Count Needed'] = 'No' # dont want to calc price based on [count available], so need to check if seller has count I need and calc cost based on [count needed]. # if doesn't have [count needed], then calc cost on [count available]. for index, row in df1.iterrows(): if row['Count Available'] >= row['Count Needed']: df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price'] else: df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price'] # subtotals by seller, then assign calcs to column called [Subtotal] and merge into dataframe subtotals = df1.groupby(['Seller'])['Price x Count'].sum().reset_index() subtotals.rename({'Price x Count':'Subtotal'}, axis=1, inplace=True) grouped = df1.merge(subtotals[['Subtotal', 'Seller']], on='Seller') # calc [Total] grouped['Total'] = np.where(grouped['Subtotal'] > grouped['Free Shipping Minimum'], grouped['Subtotal'], grouped['Subtotal'] + grouped['Shipping']) grouped.groupby(['Seller', 'Total', 'Item']).first()
以上是計算多索引 pandas 資料幀外部索引每行的總和的詳細內容。更多資訊請關注PHP中文網其他相關文章!