On the pandas tag, I often see users asking questions about melting dataframes in pandas. I'll try to do a canonical Q&A (self-answer) on this topic.
I want to clarify:
What is melting?
How to use melt?
When to use melt?
I saw some popular questions about melting, such as:
Convert columns to rows using pandas: This one might actually be good, but more explanation would be nice.
pandas melt function: A good question with a good answer, but a bit too vague without much explanation.
Melting pandas dataframe: Also a great answer! But that's just for a specific case, it's simple, just pd.melt(df)
pandas dataframe using columns as rows (melted) : Very neat! But the problem is, it only addresses the specific question asked by the OP, which also requires using pivot_table
.
So I will try to do a canonical Q&A on this topic.
I will find all the answers in this dataset of random grades for random people at random ages (easier to explain answer :d):
import pandas as pd df = pd.dataframe({'name': ['bob', 'john', 'foo', 'bar', 'alex', 'tom'], 'math': ['a+', 'b', 'a', 'f', 'd', 'c'], 'english': ['c', 'b', 'b', 'a+', 'f', 'a'], 'age': [13, 16, 16, 15, 15, 13]})
>>> df name math english age 0 bob a+ c 13 1 john b b 16 2 foo a b 16 3 bar f a+ 15 4 alex d f 15 5 tom c a 13
How to melt the data frame so that the original data frame becomes the following?
name age subject grade 0 bob 13 english c 1 john 16 english b 2 foo 16 english b 3 bar 15 english a+ 4 alex 17 english f 5 tom 12 english a 6 bob 13 math a+ 7 john 16 math b 8 foo 16 math a 9 bar 15 math f 10 alex 17 math d 11 tom 12 math c
I want to transpose it so that one column is for each subject and the other columns are the repeated names of students along with their ages and scores.
This is similar to question 1, but this time I want question 1 to output the subject
column with only math
, and I want to filter out the english
column:
name age subject grades 0 bob 13 math a+ 1 john 16 math b 2 foo 16 math a 3 bar 15 math f 4 alex 15 math d 5 tom 13 math c
I want the output to look like above.
If I were to group the melts and sort them by students' scores, how would I do it to get the desired output like this:
value name subjects 0 a foo, tom math, english 1 a+ bob, bar math, english 2 b john, john, foo math, english, english 3 c tom, bob math, english 4 d alex math 5 f bar, alex math, english
I need to sort it with names separated by commas and subjects
separated by commas in the same order respectively.
How do I unfreeze a molten dataframe? Let's say I've melted this dataframe:
df = df.melt(id_vars=['name', 'age'], var_name='subject', value_name='grades')
become:
name age subject grades 0 bob 13 math a+ 1 john 16 math b 2 foo 16 math a 3 bar 15 math f 4 alex 15 math d 5 tom 13 math c 6 bob 13 english c 7 john 16 english b 8 foo 16 english b 9 bar 15 english a+ 10 alex 15 english f 11 tom 13 english a
So how do I convert it back to the original data frame like below?
name math english age 0 bob a+ c 13 1 john b b 16 2 foo a b 16 3 bar f a+ 15 4 alex d f 15 5 tom c a 13
What would I do if I wanted to group by student name and separate subjects and grades with commas?
name subject grades 0 alex math, english d, f 1 bar math, english f, a+ 2 bob math, english a+, c 3 foo math, english a, b 4 john math, english b, b 5 tom math, english c, a
I want a dataframe like the one above.
If I were to completely melt my dataframe, with all columns as values, what would I do?
Column Value 0 Name Bob 1 Name John 2 Name Foo 3 Name Bar 4 Name Alex 5 Name Tom 6 Math A+ 7 Math B 8 Math A 9 Math F 10 Math D 11 Math C 12 English C 13 English B 14 English B 15 English A+ 16 English F 17 English A 18 Age 13 19 Age 16 20 Age 16 21 Age 15 22 Age 15 23 Age 13
I want a dataframe like the one above. All columns as values.
##Notes for pandas version < 0.20.0: I will use df.melt(... ) as my example, but you need to use
pd.melt(df, .. .) instead.
melt, so to know how to < 一起使用/a>melt
, please refer to the documentation illustrate.
math and
english columns and makes the dataframe copy (longer).
subject column, which is the subject of the
grades column value:
melt function.
pd.dataframe.melt
Use the following code:
print(df.melt(id_vars=['name', 'age'], var_name='subject', value_name='grades'))
id_vars parameter to
['name', 'age'] and then automatically sets the
value_vars to the other columns (
['math', 'english']), which is the transposed conversion to this format.
print( df.set_index(["name", "age"]) .stack() .reset_index(name="grade") .rename(columns={"level_2": "subject"}) .sort_values("subject") .reset_index(drop=true) )
name and
age columns as indexes, stacks the remaining columns
math and
english, and resets the index and specify
grade as the column name, then rename the other columns to
level_2phpcnendcphp cn to subject<code> and then press the
subject column and finally reset the index again.
name age subject grade 0 bob 13 english c 1 john 16 english b 2 foo 16 english b 3 bar 15 english a+ 4 alex 17 english f 5 tom 12 english a 6 bob 13 math a+ 7 john 16 math b 8 foo 16 math a 9 bar 15 math f 10 alex 17 math d 11 tom 12 math c
这和我的第一个问题类似,但是这个我只在 math
列中进行过滤,这时候 value_vars
参数就可以派上用场了,如下所示:
print( df.melt( id_vars=["name", "age"], value_vars="math", var_name="subject", value_name="grades", ) )
或者我们也可以使用 stack
与列规格:
print( df.set_index(["name", "age"])[["math"]] .stack() .reset_index(name="grade") .rename(columns={"level_2": "subject"}) .sort_values("subject") .reset_index(drop=true) )
这两种解决方案都给出:
name age subject grade 0 bob 13 math a+ 1 john 16 math b 2 foo 16 math a 3 bar 15 math f 4 alex 15 math d 5 tom 13 math c
问题3可以通过melt
解决和 groupby
,使用 agg
函数和 ' , '.join
,如下所示:
print( df.melt(id_vars=["name", "age"]) .groupby("value", as_index=false) .agg(", ".join) )
它会融合数据框,然后按等级进行分组,聚合它们并用逗号将它们连接起来。
stack
也可以用来解决这个问题,与 stack
和 groupby
如下所示:
print( df.set_index(["name", "age"]) .stack() .reset_index() .rename(columns={"level_2": "subjects", 0: "grade"}) .groupby("grade", as_index=false) .agg(", ".join) )
这个 stack
函数只是转置数据帧以相当于 melt
的方式,然后重置索引,重命名列、组和聚合。
两种解决方案输出:
grade name subjects 0 a foo, tom math, english 1 a+ bob, bar math, english 2 b john, john, foo math, english, english 3 c bob, tom english, math 4 d alex math 5 f bar, alex math, english
这可以通过 pivot_table
来解决。我们必须指定参数 values
、index
、columns
以及 aggfunc
。
我们可以用下面的代码来解决这个问题:
print( df.pivot_table("grades", ["name", "age"], "subject", aggfunc="first") .reset_index() .rename_axis(columns=none) )
输出:
name age english math 0 alex 15 f d 1 bar 15 a+ f 2 bob 13 c a+ 3 foo 16 b a 4 john 16 b b 5 tom 13 a c
融化的数据帧被转换回与原始数据帧完全相同的格式。
我们首先旋转融化的数据框,然后重置索引并删除列轴名称。
print( df.melt(id_vars=["name", "age"], var_name="subject", value_name="grades") .groupby("name", as_index=false) .agg(", ".join) )
融化并按 name
分组。
或者您可以stack
: p>
print( df.set_index(["name", "age"]) .stack() .reset_index() .groupby("name", as_index=false) .agg(", ".join) .rename({"level_2": "subjects", 0: "grades"}, axis=1) )
两个代码输出:
name subjects grades 0 alex math, english d, f 1 bar math, english f, a+ 2 bob math, english a+, c 3 foo math, english a, b 4 john math, english b, b 5 tom math, english c, a
问题6可以通过melt
解决并且不需要指定列,只需指定预期的列名称:
print(df.melt(var_name='column', value_name='value'))
这会融化整个数据框。
或者您可以stack
: p>
print( df.stack() .reset_index(level=1) .sort_values("level_1") .reset_index(drop=true) .set_axis(["column", "value"], axis=1) )
两个代码输出:
Column Value 0 Age 16 1 Age 15 2 Age 15 3 Age 16 4 Age 13 5 Age 13 6 English A+ 7 English B 8 English B 9 English A 10 English F 11 English C 12 Math C 13 Math A+ 14 Math D 15 Math B 16 Math F 17 Math A 18 Name Alex 19 Name Bar 20 Name Tom 21 Name Foo 22 Name John 23 Name Bob
The above is the detailed content of How to melt pandas dataframe?. For more information, please follow other related articles on the PHP Chinese website!