In [1360]: pd.pivot_table(df,index=["Manager","Rep",'Product'],values=["Price","Quantity"],aggfunc=[np.sum],fill_value=0)
Out[1360]:
sum
Price Quantity
Manager Rep Product
Debra Henley Craig Booker CPU 65000 2
Maintenance 5000 2
Software 10000 1
Daniel Hilton CPU 105000 4
Software 10000 1
John Smith CPU 35000 1
Maintenance 5000 2
Fred Anderson Cedric Moss CPU 95000 3
Maintenance 5000 1
Software 10000 1
Wendy Yule CPU 165000 7
Maintenance 7000 3
Monitor 5000 2
In [1362]: df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()
Out[1362]:
Price Quantity
Manager Rep Product
Debra Henley Craig Booker CPU 65000 2
Maintenance 5000 2
Software 10000 1
Daniel Hilton CPU 105000 4
Software 10000 1
John Smith CPU 35000 1
Maintenance 5000 2
Fred Anderson Cedric Moss CPU 95000 3
Maintenance 5000 1
Software 10000 1
Wendy Yule CPU 165000 7
Maintenance 7000 3
Monitor 5000 2
In [1414]: table = pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
...: columns=["Product"],aggfunc=[np.sum],fill_value=0)
In [1415]: table
Out[1415]:
sum
Price Quantity
Product CPU Maintenance Monitor Software CPU Maintenance Monitor Software
Manager Rep
Debra Henley Craig Booker 65000 5000 0 10000 2 2 0 1
Daniel Hilton 105000 0 0 10000 4 0 0 1
John Smith 35000 5000 0 0 1 2 0 0
Fred Anderson Cedric Moss 95000 5000 0 10000 3 1 0 1
Wendy Yule 165000 7000 5000 0 7 3 2 0
In [1416]: table.index
Out[1416]:
MultiIndex([( 'Debra Henley', 'Craig Booker'),
( 'Debra Henley', 'Daniel Hilton'),
( 'Debra Henley', 'John Smith'),
('Fred Anderson', 'Cedric Moss'),
('Fred Anderson', 'Wendy Yule')],
names=['Manager', 'Rep'])
In [1417]: table.columns
Out[1417]:
MultiIndex([('sum', 'Price', 'CPU'),
('sum', 'Price', 'Maintenance'),
('sum', 'Price', 'Monitor'),
('sum', 'Price', 'Software'),
('sum', 'Quantity', 'CPU'),
('sum', 'Quantity', 'Maintenance'),
('sum', 'Quantity', 'Monitor'),
('sum', 'Quantity', 'Software')],
names=[None, None, 'Product'])
In [1418]: xx = df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()
In [1419]: xx
Out[1419]:
Price Quantity
Manager Rep Product
Debra Henley Craig Booker CPU 65000 2
Maintenance 5000 2
Software 10000 1
Daniel Hilton CPU 105000 4
Software 10000 1
John Smith CPU 35000 1
Maintenance 5000 2
Fred Anderson Cedric Moss CPU 95000 3
Maintenance 5000 1
Software 10000 1
Wendy Yule CPU 165000 7
Maintenance 7000 3
Monitor 5000 2
In [1420]: xx.unstack()
Out[1420]:
Price Quantity
Product CPU Maintenance Monitor Software CPU Maintenance Monitor Software
Manager Rep
Debra Henley Craig Booker 65000.0 5000.0 NaN 10000.0 2.0 2.0 NaN 1.0
Daniel Hilton 105000.0 NaN NaN 10000.0 4.0 NaN NaN 1.0
John Smith 35000.0 5000.0 NaN NaN 1.0 2.0 NaN NaN
Fred Anderson Cedric Moss 95000.0 5000.0 NaN 10000.0 3.0 1.0 NaN 1.0
Wendy Yule 165000.0 7000.0 5000.0 NaN 7.0 3.0 2.0 NaN
In [1421]: xx.unstack().index
Out[1421]:
MultiIndex([( 'Debra Henley', 'Craig Booker'),
( 'Debra Henley', 'Daniel Hilton'),
( 'Debra Henley', 'John Smith'),
('Fred Anderson', 'Cedric Moss'),
('Fred Anderson', 'Wendy Yule')],
names=['Manager', 'Rep'])
In [1422]: xx.unstack().columns
Out[1422]:
MultiIndex([( 'Price', 'CPU'),
( 'Price', 'Maintenance'),
( 'Price', 'Monitor'),
( 'Price', 'Software'),
('Quantity', 'CPU'),
('Quantity', 'Maintenance'),
('Quantity', 'Monitor'),
('Quantity', 'Software')],
names=[None, 'Product'])
In [1441]: table = pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
...: columns=["Product"],aggfunc={'Price':np.sum, 'Quantity':np.sum},fill_value=0)
In [1442]: table
Out[1442]:
Price Quantity
Product CPU Maintenance Monitor Software CPU Maintenance Monitor Software
Manager Rep
Debra Henley Craig Booker 65000 5000 0 10000 2 2 0 1
Daniel Hilton 105000 0 0 10000 4 0 0 1
John Smith 35000 5000 0 0 1 2 0 0
Fred Anderson Cedric Moss 95000 5000 0 10000 3 1 0 1
Wendy Yule 165000 7000 5000 0 7 3 2 0
In [1443]: table.index
Out[1443]:
MultiIndex([( 'Debra Henley', 'Craig Booker'),
( 'Debra Henley', 'Daniel Hilton'),
( 'Debra Henley', 'John Smith'),
('Fred Anderson', 'Cedric Moss'),
('Fred Anderson', 'Wendy Yule')],
names=['Manager', 'Rep'])
In [1444]: table.columns
Out[1444]:
MultiIndex([( 'Price', 'CPU'),
( 'Price', 'Maintenance'),
( 'Price', 'Monitor'),
( 'Price', 'Software'),
('Quantity', 'CPU'),
('Quantity', 'Maintenance'),
('Quantity', 'Monitor'),
('Quantity', 'Software')],
names=[None, 'Product'])
In [1445]: xx = df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()
In [1448]: xx.unstack()
Out[1448]:
Price Quantity
Product CPU Maintenance Monitor Software CPU Maintenance Monitor Software
Manager Rep
Debra Henley Craig Booker 65000.0 5000.0 NaN 10000.0 2.0 2.0 NaN 1.0
Daniel Hilton 105000.0 NaN NaN 10000.0 4.0 NaN NaN 1.0
John Smith 35000.0 5000.0 NaN NaN 1.0 2.0 NaN NaN
Fred Anderson Cedric Moss 95000.0 5000.0 NaN 10000.0 3.0 1.0 NaN 1.0
Wendy Yule 165000.0 7000.0 5000.0 NaN 7.0 3.0 2.0 NaN
In [1449]: xx.unstack().index
Out[1449]:
MultiIndex([( 'Debra Henley', 'Craig Booker'),
( 'Debra Henley', 'Daniel Hilton'),
( 'Debra Henley', 'John Smith'),
('Fred Anderson', 'Cedric Moss'),
('Fred Anderson', 'Wendy Yule')],
names=['Manager', 'Rep'])
In [1450]: xx.unstack().columns
Out[1450]:
MultiIndex([( 'Price', 'CPU'),
( 'Price', 'Maintenance'),
( 'Price', 'Monitor'),
( 'Price', 'Software'),
('Quantity', 'CPU'),
('Quantity', 'Maintenance'),
('Quantity', 'Monitor'),
('Quantity', 'Software')],
names=[None, 'Product'])
解释:
aggfunc: function, list of functions, dict, default numpy.mean
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves) If dict is passed, the key is column to aggregate and value is function or list of functions.
pd.DataFrame(data=[df['指派给'],df['创建于'],df['状态'],df['处理状况']]).T
data = df[['指派给', '创建于', '状态', '处理状况']]
##
df_ori.groupby([key1, key2])[key1].count().unstack().fillna(0)
df_ori.groupby([key1, key2]).size().sort_values(ascending=False).reset_index(name='counts')
melt
与 pivot_table
In [95]: cheese
Out[95]:
first last height weight
0 John Doe 5.5 130
1 Mary Bo 6.0 150
In [96]: redf = cheese.melt(id_vars=['first', 'last'])
In [97]: redf
Out[97]:
first last variable value
0 John Doe height 5.5
1 Mary Bo height 6.0
2 John Doe weight 130.0
3 Mary Bo weight 150.0
In [98]: redf.pivot_table(values='value', index=['first', 'last'], columns='variable').reset_index()
Out[98]:
variable first last height weight
0 John Doe 5.5 130.0
1 Mary Bo 6.0 150.0
In [100]: odf = redf.pivot_table(values='value', index=['first', 'last'], columns='variable').reset_index()
In [101]: odf.columns.name = None
In [102]: odf
Out[102]:
first last height weight
0 John Doe 5.5 130.0
1 Mary Bo 6.0 150.0
Ref
- Pandas —— 透视表pivot_table()和交叉表crosstab()
- 利用Python进行数据分组/数据透视表
- Explore Happiness Data Using Python Pivot Tables
- 3 Examples Using Pivot Table in Pandas
- Pandas Pivot Table Explained
- Pandas | 一文看懂透视表pivot_table
- 使用Python数据透视表挖掘幸福数据
- pandas之groupby分组与pivot_table透视表
- pandas中pivot()方法和groupby()方法的说明和对比
- Pandas: group by and Pivot table difference
- Pivot Tables
- Data Analysis and Visualization with pandas and Jupyter Notebook in Python 3
- 如何用pandas pivot_table透视表数据画图表
- Reshaping and pivot tables
- Aggregate, Group By and Pivot with… Pivot Table!