pandas-groupby-pivottable-crosstab |

Reshaping and pivot tables

In [1360]: pd.pivot_table(df,index=["Manager","Rep",'Product'],values=["Price","Quantity"],aggfunc=[np.sum],fill_value=0)
Out[1360]: 
                                            sum         
                                          Price Quantity
Manager       Rep           Product                     
Debra Henley  Craig Booker  CPU           65000        2
                            Maintenance    5000        2
                            Software      10000        1
              Daniel Hilton CPU          105000        4
                            Software      10000        1
              John Smith    CPU           35000        1
                            Maintenance    5000        2
Fred Anderson Cedric Moss   CPU           95000        3
                            Maintenance    5000        1
                            Software      10000        1
              Wendy Yule    CPU          165000        7
                            Maintenance    7000        3
                            Monitor        5000        2


In [1362]: df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()
Out[1362]:
                                          Price  Quantity
Manager       Rep           Product
Debra Henley  Craig Booker  CPU           65000         2
                            Maintenance    5000         2
                            Software      10000         1
              Daniel Hilton CPU          105000         4
                            Software      10000         1
              John Smith    CPU           35000         1
                            Maintenance    5000         2
Fred Anderson Cedric Moss   CPU           95000         3
                            Maintenance    5000         1
                            Software      10000         1
              Wendy Yule    CPU          165000         7
                            Maintenance    7000         3
                            Monitor        5000         2

In [1414]: table = pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
      ...:                columns=["Product"],aggfunc=[np.sum],fill_value=0)

In [1415]: table
Out[1415]:
                                sum
                              Price                              Quantity
Product                         CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep
Debra Henley  Craig Booker    65000        5000       0    10000        2           2       0        1
              Daniel Hilton  105000           0       0    10000        4           0       0        1
              John Smith      35000        5000       0        0        1           2       0        0
Fred Anderson Cedric Moss     95000        5000       0    10000        3           1       0        1
              Wendy Yule     165000        7000    5000        0        7           3       2        0

In [1416]: table.index
Out[1416]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1417]: table.columns
Out[1417]:
MultiIndex([('sum',    'Price',         'CPU'),
            ('sum',    'Price', 'Maintenance'),
            ('sum',    'Price',     'Monitor'),
            ('sum',    'Price',    'Software'),
            ('sum', 'Quantity',         'CPU'),
            ('sum', 'Quantity', 'Maintenance'),
            ('sum', 'Quantity',     'Monitor'),
            ('sum', 'Quantity',    'Software')],
           names=[None, None, 'Product'])

In [1418]: xx = df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()

In [1419]: xx
Out[1419]:
                                          Price  Quantity
Manager       Rep           Product
Debra Henley  Craig Booker  CPU           65000         2
                            Maintenance    5000         2
                            Software      10000         1
              Daniel Hilton CPU          105000         4
                            Software      10000         1
              John Smith    CPU           35000         1
                            Maintenance    5000         2
Fred Anderson Cedric Moss   CPU           95000         3
                            Maintenance    5000         1
                            Software      10000         1
              Wendy Yule    CPU          165000         7
                            Maintenance    7000         3
                            Monitor        5000         2

In [1420]: xx.unstack()
Out[1420]:
                                Price                              Quantity
Product                           CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep
Debra Henley  Craig Booker    65000.0      5000.0     NaN  10000.0      2.0         2.0     NaN      1.0
              Daniel Hilton  105000.0         NaN     NaN  10000.0      4.0         NaN     NaN      1.0
              John Smith      35000.0      5000.0     NaN      NaN      1.0         2.0     NaN      NaN
Fred Anderson Cedric Moss     95000.0      5000.0     NaN  10000.0      3.0         1.0     NaN      1.0
              Wendy Yule     165000.0      7000.0  5000.0      NaN      7.0         3.0     2.0      NaN

In [1421]: xx.unstack().index
Out[1421]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1422]: xx.unstack().columns
Out[1422]:
MultiIndex([(   'Price',         'CPU'),
            (   'Price', 'Maintenance'),
            (   'Price',     'Monitor'),
            (   'Price',    'Software'),
            ('Quantity',         'CPU'),
            ('Quantity', 'Maintenance'),
            ('Quantity',     'Monitor'),
            ('Quantity',    'Software')],
           names=[None, 'Product'])

In [1441]: table = pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
      ...:                columns=["Product"],aggfunc={'Price':np.sum, 'Quantity':np.sum},fill_value=0)

In [1442]: table
Out[1442]: 
                              Price                              Quantity                             
Product                         CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep                                                                                     
Debra Henley  Craig Booker    65000        5000       0    10000        2           2       0        1
              Daniel Hilton  105000           0       0    10000        4           0       0        1
              John Smith      35000        5000       0        0        1           2       0        0
Fred Anderson Cedric Moss     95000        5000       0    10000        3           1       0        1
              Wendy Yule     165000        7000    5000        0        7           3       2        0

In [1443]: table.index
Out[1443]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1444]: table.columns
Out[1444]:
MultiIndex([(   'Price',         'CPU'),
            (   'Price', 'Maintenance'),
            (   'Price',     'Monitor'),
            (   'Price',    'Software'),
            ('Quantity',         'CPU'),
            ('Quantity', 'Maintenance'),
            ('Quantity',     'Monitor'),
            ('Quantity',    'Software')],
           names=[None, 'Product'])

In [1445]: xx = df[['Manager', 'Rep', 'Price', 'Quantity', 'Product']].groupby(["Manager","Rep",'Product']).sum()

In [1448]: xx.unstack()
Out[1448]:
                                Price                              Quantity
Product                           CPU Maintenance Monitor Software      CPU Maintenance Monitor Software
Manager       Rep
Debra Henley  Craig Booker    65000.0      5000.0     NaN  10000.0      2.0         2.0     NaN      1.0
              Daniel Hilton  105000.0         NaN     NaN  10000.0      4.0         NaN     NaN      1.0
              John Smith      35000.0      5000.0     NaN      NaN      1.0         2.0     NaN      NaN
Fred Anderson Cedric Moss     95000.0      5000.0     NaN  10000.0      3.0         1.0     NaN      1.0
              Wendy Yule     165000.0      7000.0  5000.0      NaN      7.0         3.0     2.0      NaN

In [1449]: xx.unstack().index
Out[1449]:
MultiIndex([( 'Debra Henley',  'Craig Booker'),
            ( 'Debra Henley', 'Daniel Hilton'),
            ( 'Debra Henley',    'John Smith'),
            ('Fred Anderson',   'Cedric Moss'),
            ('Fred Anderson',    'Wendy Yule')],
           names=['Manager', 'Rep'])

In [1450]: xx.unstack().columns
Out[1450]:
MultiIndex([(   'Price',         'CPU'),
            (   'Price', 'Maintenance'),
            (   'Price',     'Monitor'),
            (   'Price',    'Software'),
            ('Quantity',         'CPU'),
            ('Quantity', 'Maintenance'),
            ('Quantity',     'Monitor'),
            ('Quantity',    'Software')],
           names=[None, 'Product'])

解释：

aggfunc: function, list of functions, dict, default numpy.mean
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves) If dict is passed, the key is column to aggregate and value is function or list of functions.

pd.DataFrame(data=[df['指派给'],df['创建于'],df['状态'],df['处理状况']]).T
data = df[['指派给', '创建于', '状态', '处理状况']]

df_ori.groupby([key1, key2])[key1].count().unstack().fillna(0)
df_ori.groupby([key1, key2]).size().sort_values(ascending=False).reset_index(name='counts')

`melt` 与 `pivot_table`

In [95]: cheese
Out[95]:
  first last  height  weight
0  John  Doe     5.5     130
1  Mary   Bo     6.0     150

In [96]: redf = cheese.melt(id_vars=['first', 'last'])

In [97]: redf
Out[97]:
  first last variable  value
0  John  Doe   height    5.5
1  Mary   Bo   height    6.0
2  John  Doe   weight  130.0
3  Mary   Bo   weight  150.0

In [98]: redf.pivot_table(values='value', index=['first', 'last'], columns='variable').reset_index()
Out[98]:
variable first last  height  weight
0         John  Doe     5.5   130.0
1         Mary   Bo     6.0   150.0

In [100]: odf = redf.pivot_table(values='value', index=['first', 'last'], columns='variable').reset_index()

In [101]: odf.columns.name = None

In [102]: odf
Out[102]:
  first last  height  weight
0  John  Doe     5.5   130.0
1  Mary   Bo     6.0   150.0

melt 与 pivot_table

Ref

`melt` 与 `pivot_table`