0%

pandas-groupby-pivottable-crosstab

Reshaping and pivot tables

In [1360]: pd.pivot_table(df,index=["Manager","Rep",'Product'],values=["Price","Quantity"],aggfunc=[np.sum],fill_value=0)
Out[1360]: 
                                            sum         
                                          Price Quantity
Manager       Rep           Product                     
Debra Henley  Craig Booker  CPU           65000        2
                            Maintenance    5000        2
                            Software      10000        1
              Daniel Hilton CPU          105000        4
                            Software      10000        1
              John Smith    CPU           35000        1
                            Maintenance    5000        2
Fred Anderson Cedric Moss   CPU           95000        3
                            Maintenance    5000        1
                            Software      10000        1
              Wendy Yule    CPU          165000        7
                            Maintenance    7000        3
                            Monitor        5000        2


In [1362]: df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()
Out[1362]:
                                          Price  Quantity
Manager       Rep           Product
Debra Henley  Craig Booker  CPU           65000         2
                            Maintenance    5000         2
                            Software      10000         1
              Daniel Hilton CPU          105000         4
                            Software      10000         1
              John Smith    CPU           35000         1
                            Maintenance    5000         2
Fred Anderson Cedric Moss   CPU           95000         3
                            Maintenance    5000         1
                            Software      10000         1
              Wendy Yule    CPU          165000         7
                            Maintenance    7000         3
                            Monitor        5000         2
In [1414]: table = pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
      ...:                columns=["Product"],aggfunc=[np.sum],fill_value=0)

In [1415]: table
Out[1415]:
                                sum
                              Price                              Quantity
Product                         CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep
Debra Henley  Craig Booker    65000        5000       0    10000        2           2       0        1
              Daniel Hilton  105000           0       0    10000        4           0       0        1
              John Smith      35000        5000       0        0        1           2       0        0
Fred Anderson Cedric Moss     95000        5000       0    10000        3           1       0        1
              Wendy Yule     165000        7000    5000        0        7           3       2        0

In [1416]: table.index
Out[1416]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1417]: table.columns
Out[1417]:
MultiIndex([('sum',    'Price',         'CPU'),
            ('sum',    'Price', 'Maintenance'),
            ('sum',    'Price',     'Monitor'),
            ('sum',    'Price',    'Software'),
            ('sum', 'Quantity',         'CPU'),
            ('sum', 'Quantity', 'Maintenance'),
            ('sum', 'Quantity',     'Monitor'),
            ('sum', 'Quantity',    'Software')],
           names=[None, None, 'Product'])

In [1418]: xx = df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()

In [1419]: xx
Out[1419]:
                                          Price  Quantity
Manager       Rep           Product
Debra Henley  Craig Booker  CPU           65000         2
                            Maintenance    5000         2
                            Software      10000         1
              Daniel Hilton CPU          105000         4
                            Software      10000         1
              John Smith    CPU           35000         1
                            Maintenance    5000         2
Fred Anderson Cedric Moss   CPU           95000         3
                            Maintenance    5000         1
                            Software      10000         1
              Wendy Yule    CPU          165000         7
                            Maintenance    7000         3
                            Monitor        5000         2

In [1420]: xx.unstack()
Out[1420]:
                                Price                              Quantity
Product                           CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep
Debra Henley  Craig Booker    65000.0      5000.0     NaN  10000.0      2.0         2.0     NaN      1.0
              Daniel Hilton  105000.0         NaN     NaN  10000.0      4.0         NaN     NaN      1.0
              John Smith      35000.0      5000.0     NaN      NaN      1.0         2.0     NaN      NaN
Fred Anderson Cedric Moss     95000.0      5000.0     NaN  10000.0      3.0         1.0     NaN      1.0
              Wendy Yule     165000.0      7000.0  5000.0      NaN      7.0         3.0     2.0      NaN

In [1421]: xx.unstack().index
Out[1421]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1422]: xx.unstack().columns
Out[1422]:
MultiIndex([(   'Price',         'CPU'),
            (   'Price', 'Maintenance'),
            (   'Price',     'Monitor'),
            (   'Price',    'Software'),
            ('Quantity',         'CPU'),
            ('Quantity', 'Maintenance'),
            ('Quantity',     'Monitor'),
            ('Quantity',    'Software')],
           names=[None, 'Product'])

In [1441]: table = pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
      ...:                columns=["Product"],aggfunc={'Price':np.sum, 'Quantity':np.sum},fill_value=0)

In [1442]: table
Out[1442]: 
                              Price                              Quantity                             
Product                         CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep                                                                                     
Debra Henley  Craig Booker    65000        5000       0    10000        2           2       0        1
              Daniel Hilton  105000           0       0    10000        4           0       0        1
              John Smith      35000        5000       0        0        1           2       0        0
Fred Anderson Cedric Moss     95000        5000       0    10000        3           1       0        1
              Wendy Yule     165000        7000    5000        0        7           3       2        0

In [1443]: table.index
Out[1443]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1444]: table.columns
Out[1444]:
MultiIndex([(   'Price',         'CPU'),
            (   'Price', 'Maintenance'),
            (   'Price',     'Monitor'),
            (   'Price',    'Software'),
            ('Quantity',         'CPU'),
            ('Quantity', 'Maintenance'),
            ('Quantity',     'Monitor'),
            ('Quantity',    'Software')],
           names=[None, 'Product'])

In [1445]: xx = df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()

In [1448]: xx.unstack()
Out[1448]:
                                Price                              Quantity
Product                           CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep
Debra Henley  Craig Booker    65000.0      5000.0     NaN  10000.0      2.0         2.0     NaN      1.0
              Daniel Hilton  105000.0         NaN     NaN  10000.0      4.0         NaN     NaN      1.0
              John Smith      35000.0      5000.0     NaN      NaN      1.0         2.0     NaN      NaN
Fred Anderson Cedric Moss     95000.0      5000.0     NaN  10000.0      3.0         1.0     NaN      1.0
              Wendy Yule     165000.0      7000.0  5000.0      NaN      7.0         3.0     2.0      NaN

In [1449]: xx.unstack().index
Out[1449]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1450]: xx.unstack().columns
Out[1450]:
MultiIndex([(   'Price',         'CPU'),
            (   'Price', 'Maintenance'),
            (   'Price',     'Monitor'),
            (   'Price',    'Software'),
            ('Quantity',         'CPU'),
            ('Quantity', 'Maintenance'),
            ('Quantity',     'Monitor'),
            ('Quantity',    'Software')],
           names=[None, 'Product'])

解释:

aggfunc: function, list of functions, dict, default numpy.mean
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves) If dict is passed, the key is column to aggregate and value is function or list of functions.

pd.DataFrame(data=[df['指派给'],df['创建于'],df['状态'],df['处理状况']]).T
data = df[['指派给', '创建于', '状态', '处理状况']]

##

df_ori.groupby([key1, key2])[key1].count().unstack().fillna(0)
df_ori.groupby([key1, key2]).size().sort_values(ascending=False).reset_index(name='counts')

meltpivot_table

In [95]: cheese
Out[95]:
  first last  height  weight
0  John  Doe     5.5     130
1  Mary   Bo     6.0     150

In [96]: redf = cheese.melt(id_vars=['first', 'last'])

In [97]: redf
Out[97]:
  first last variable  value
0  John  Doe   height    5.5
1  Mary   Bo   height    6.0
2  John  Doe   weight  130.0
3  Mary   Bo   weight  150.0

In [98]: redf.pivot_table(values='value', index=['first', 'last'], columns='variable').reset_index()
Out[98]:
variable first last  height  weight
0         John  Doe     5.5   130.0
1         Mary   Bo     6.0   150.0

In [100]: odf = redf.pivot_table(values='value', index=['first', 'last'], columns='variable').reset_index()

In [101]: odf.columns.name = None

In [102]: odf
Out[102]:
  first last  height  weight
0  John  Doe     5.5   130.0
1  Mary   Bo     6.0   150.0

Ref

  1. Pandas —— 透视表pivot_table()和交叉表crosstab()
  2. 利用Python进行数据分组/数据透视表
  3. Explore Happiness Data Using Python Pivot Tables
  4. 3 Examples Using Pivot Table in Pandas
  5. Pandas Pivot Table Explained
  6. Pandas | 一文看懂透视表pivot_table
  7. 使用Python数据透视表挖掘幸福数据
  8. pandas之groupby分组与pivot_table透视表
  9. pandas中pivot()方法和groupby()方法的说明和对比
  10. Pandas: group by and Pivot table difference
  11. Pivot Tables
  12. Data Analysis and Visualization with pandas and Jupyter Notebook in Python 3
  13. 如何用pandas pivot_table透视表数据画图表
  14. Reshaping and pivot tables
  15. Aggregate, Group By and Pivot with… Pivot Table!