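I have a DataFrame with the columns seller, item, price, shipping, free shipping minimum, count available, and count needed. My goal is to find the cheapest combination of seller and item based on a total that is calculated later (the calculation code is shown below). Sample data: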
    import pandas as pd

    item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
    seller1 = ['seller 1', 'seller 2', 'seller 3', 'seller 4', 'seller 1']
    price1 = [1.85, 1.94, 2.00, 2.00, 2.02]
    shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
    freeship1 = [5, 5, 5, 50, 5]
    countavailable1 = [1, 2, 2, 5, 2]
    countneeded1 = [2, 1, 2, 2, 1]

    df1 = pd.DataFrame({'seller': seller1,
                        'item': item1,
                        'price': price1,
                        'shipping': shipping1,
                        'free shipping minimum': freeship1,
                        'count available': countavailable1,
                        'count needed': countneeded1})

    # create a column that states if the seller has the full count needed.
    # this will be used to sort by, to prioritize the smallest number of orders possible
    for index, row in df1.iterrows():
        if row['count available'] >= row['count needed']:
            df1.at[index, 'fulfills count needed'] = 'yes'
        else:
            df1.at[index, 'fulfills count needed'] = 'no'

    # don't want to calc price based on [count available], so need to check if the seller
    # has the count I need and calc cost based on [count needed].
    # if the seller doesn't have [count needed], then calc cost on [count available].
    for index, row in df1.iterrows():
        if row['count available'] >= row['count needed']:
            df1.at[index, 'price x count'] = row['count needed'] * row['price']
        else:
            df1.at[index, 'price x count'] = row['count available'] * row['price']
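As an aside, the two loops above can probably be written without iterrows. The following is only a sketch of one way to do it, assuming the same column names as the sample data; the variable names df and units are introduced here for illustration. It relies on the fact that the cost is always price times the smaller of count available and count needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'seller': ['seller 1', 'seller 2', 'seller 3', 'seller 4', 'seller 1'],
    'item': ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'],
    'price': [1.85, 1.94, 2.00, 2.00, 2.02],
    'count available': [1, 2, 2, 5, 2],
    'count needed': [2, 1, 2, 2, 1],
})

# 'yes' where the seller can cover the full count needed, 'no' otherwise
df['fulfills count needed'] = np.where(
    df['count available'] >= df['count needed'], 'yes', 'no')

# Units that can actually be bought from each row:
# the smaller of what is available and what is needed
units = np.minimum(df['count available'], df['count needed'])
df['price x count'] = units * df['price']

print(df)
```

This should give the same two columns as the loops, and it avoids writing to the frame one cell at a time.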
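However, any one seller can sell multiple items. I want to minimize the shipping I pay, so I want to group the items together by seller. To do that, I grouped them with the .first() method, following an approach I saw in another thread, so that every column is kept in the new grouped DataFrame.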
    # don't calc [total] until sellers have been grouped
    # use first() method to return all columns and perform no other aggregations
    grouped1 = df1.sort_values('price').groupby(['seller', 'item']).first()
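At this point I want to calculate total by seller. I have the following code, but it calculates total for each item rather than for each seller. That means shipping is added multiple times, once for every item in a seller's group, or free shipping is not applied when the seller's combined price x count exceeds the free shipping minimum.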
    # calc [total]
    for index, row in grouped1.iterrows():
        if row['free shipping minimum'] == 50 and row['price x count'] > 50:
            grouped1.at[index, 'total'] = row['price x count'] + 0
        elif row['free shipping minimum'] == 5 and row['price x count'] > 5:
            grouped1.at[index, 'total'] = row['price x count'] + 0
        else:
            grouped1.at[index, 'total'] = row['price x count'] + row['shipping']
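Actually, it looks like I probably need to sum price x count for each seller when calculating total, but that is essentially the same problem, because I don't know how to calculate a column for each row of the outer index. What method can I use to do this?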
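One possible way to get a per-seller sum onto every row of a multi-indexed frame is groupby(level=...) combined with transform('sum'). The sketch below is not the code from this post; the frame named grouped is a hypothetical stand-in for grouped1, built by hand with one row per (seller, item) pair, and the subtotal column name is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for grouped1: one row per (seller, item) pair
grouped = pd.DataFrame({
    'seller': ['seller 1', 'seller 1', 'seller 2', 'seller 3', 'seller 4'],
    'item': ['item 1', 'item 2', 'item 2', 'item 1', 'item 1'],
    'shipping': [0.99, 0.99, 0.99, 0.99, 2.99],
    'free shipping minimum': [5, 5, 5, 5, 50],
    'price x count': [1.85, 2.02, 1.94, 4.00, 4.00],
}).set_index(['seller', 'item'])

# Sum within each value of the outer index level and broadcast the
# result back onto every row that belongs to that seller
grouped['subtotal'] = (
    grouped.groupby(level='seller')['price x count'].transform('sum'))

# Shipping is charged once per seller, and only when the seller's
# subtotal does not exceed that seller's free shipping minimum
grouped['total'] = np.where(
    grouped['subtotal'] > grouped['free shipping minimum'],
    grouped['subtotal'],
    grouped['subtotal'] + grouped['shipping'])

print(grouped)
```

Because transform returns a result aligned to the original index, both rows of seller 1 carry the same subtotal and total, so shipping is counted once for the seller instead of once per item.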
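Also, if anyone has suggestions on how to achieve the second half of my goal, please share them. I only want to return as many of each item as I need. For example, say I need 2 of "item 1" and 2 of "item 2". If "seller 1" has 2 of "item 1" and 1 of "item 2", and "seller 2" has 1 of "item 1" and 1 of "item 2", then I want everything from "seller 1" (assuming it is the cheapest) but only 1 of "item 2" from "seller 2". This seems like it would affect the calculation of the total column, but I'm not sure how to implement it.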
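One simple way to approach the "only as many as I need" part is a greedy pass per item: take units from the cheapest offer first and stop once the needed count is covered. This is only a sketch under strong assumptions. The prices are made up, the function name allocate and the column count to buy are hypothetical, and shipping is ignored entirely, so the result is not guaranteed to be the cheapest overall order.

```python
import numpy as np
import pandas as pd

offers = pd.DataFrame({
    'seller': ['seller 1', 'seller 1', 'seller 2', 'seller 2'],
    'item': ['item 1', 'item 2', 'item 1', 'item 2'],
    'price': [1.00, 1.50, 1.20, 1.60],  # hypothetical prices
    'count available': [2, 1, 1, 1],
})
needed = {'item 1': 2, 'item 2': 2}


def allocate(offers, needed):
    """Greedy sketch: for each item, buy from the cheapest offers first."""
    out = offers.sort_values(['item', 'price']).copy()
    # Units already covered by cheaper offers of the same item
    covered = (out.groupby('item')['count available'].cumsum()
               - out['count available'])
    # Units still missing when this offer is reached (never below zero)
    remaining = (out['item'].map(needed) - covered).clip(lower=0)
    # Take what is still missing, capped by what this seller has in stock
    out['count to buy'] = np.minimum(remaining, out['count available'])
    return out[out['count to buy'] > 0]


plan = allocate(offers, needed)
print(plan)
```

With this data the sketch takes 2 of "item 1" and 1 of "item 2" from "seller 1", and 1 of "item 2" from "seller 2". Taking shipping and free shipping minimums into account turns this into a combinatorial optimization problem, which a greedy pass like this does not solve.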
I ultimately decided to group by seller first and sum price x count to find the subtotals, convert the result to a DataFrame, and then merge df1 with the new subtotals DataFrame to create the grouped DataFrame. I then used the suggested np.where approach to create the total column (it is far more elegant than my for loops and handles NaN values easily). Finally, grouping by seller, total, and item returns the result I want. The final code is below:
    import pandas as pd
    import numpy as np

    item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
    seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1']
    price1 = [1.85, 1.94, 2.69, 2.00, 2.02]
    shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
    freeship1 = [5, 5, 5, 50, 5]
    countavailable1 = [1, 2, 2, 5, 2]
    countneeded1 = [2, 1, 2, 2, 1]

    df1 = pd.DataFrame({'Seller': seller1,
                        'Item': item1,
                        'Price': price1,
                        'Shipping': shipping1,
                        'Free Shipping Minimum': freeship1,
                        'Count Available': countavailable1,
                        'Count Needed': countneeded1})

    # create a column that states if the seller has the full count needed.
    # this will be used to sort by, to prioritize the smallest number of orders possible
    for index, row in df1.iterrows():
        if row['Count Available'] >= row['Count Needed']:
            df1.at[index, 'Fulfills Count Needed'] = 'Yes'
        else:
            df1.at[index, 'Fulfills Count Needed'] = 'No'

    # don't want to calc price based on [Count Available], so need to check if the seller
    # has the count I need and calc cost based on [Count Needed].
    # if the seller doesn't have [Count Needed], then calc cost on [Count Available].
    for index, row in df1.iterrows():
        if row['Count Available'] >= row['Count Needed']:
            df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price']
        else:
            df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price']

    # subtotals by seller, then assign calcs to a column called [Subtotal] and merge into dataframe
    subtotals = df1.groupby(['Seller'])['Price x Count'].sum().reset_index()
    subtotals.rename({'Price x Count': 'Subtotal'}, axis=1, inplace=True)
    grouped = df1.merge(subtotals[['Subtotal', 'Seller']], on='Seller')

    # calc [Total]
    grouped['Total'] = np.where(grouped['Subtotal'] > grouped['Free Shipping Minimum'],
                                grouped['Subtotal'],
                                grouped['Subtotal'] + grouped['Shipping'])

    grouped.groupby(['Seller', 'Total', 'Item']).first()
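If only one row per seller is needed, for example to rank sellers by what an order would cost, the same subtotal and total can be computed directly at the seller level. The sketch below assumes that Shipping and Free Shipping Minimum are constant within a seller, which holds for the sample data but is not stated in general. The names per_seller and Minimum are introduced here for illustration, and the Price x Count values are the ones the final code produces for the sample data.

```python
import numpy as np
import pandas as pd

# Price x Count as produced by the final code for the sample data
df = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1'],
    'Shipping': [0.99, 0.99, 0.99, 2.99, 0.99],
    'Free Shipping Minimum': [5, 5, 5, 50, 5],
    'Price x Count': [1.85, 1.94, 5.38, 4.00, 2.02],
})

# One row per seller; 'first' assumes shipping terms do not vary within a seller
per_seller = df.groupby('Seller').agg(
    Subtotal=('Price x Count', 'sum'),
    Shipping=('Shipping', 'first'),
    Minimum=('Free Shipping Minimum', 'first'),
)

# Add shipping only when the subtotal does not exceed the minimum
per_seller['Total'] = np.where(
    per_seller['Subtotal'] > per_seller['Minimum'],
    per_seller['Subtotal'],
    per_seller['Subtotal'] + per_seller['Shipping'])

# Cheapest seller order first
per_seller = per_seller.sort_values('Total')
print(per_seller)
```

Here Seller 3 is the only one whose subtotal (5.38) clears its free shipping minimum, so it is the only seller with no shipping added.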