Python Knowledge Center: summarize aggregate and grouping

basis functies op dataset

# How many rows the dataset
data['item'].count()

# What was the longest phone call / data entry?
data['duration'].max()

# How many seconds of phone calls are recorded in total?
data['duration'][data['item'] == 'call'].sum()

# How many entries are there for each month?
data['month'].value_counts()
Out[41]: 
2014-11    230
2015-01    205
2014-12    157
2015-02    137
2015-03    101
dtype: int64

# Number of non-null unique network entries
data['network'].nunique()

group by

op 1 variabele:

data.groupby('month')['duration'].sum()

op meerdere variabelen

 data.groupby(['month', 'item'])['date'].count()

output format group by

As a rule of thumb, if you calculate more than one column of results, your result will be a Dataframe. For a single column of results, the agg function, by default, will produce a Series.

You can change this by selecting your operation column differently:

# produces Pandas Series

data.groupby('month')['duration'].sum()

of

data.groupby('JAAR')['BEDRAG'].agg('sum'))

 # Produces Pandas DataFrame

 data.groupby('month')[['duration']].sum() # Produces Pandas DataFrame

of

dfjgd2.groupby('JAAR')['BEDRAG'].agg(['sum'])

Opmaak aanpassen van grouping output

dfjgd2.groupby('JAAR')['BEDRAG'].sum().apply(lambda x: '{:.2f}'.format(x))

of

maak een dataframe

df1=dfjgd2.groupby('JAAR')['BEDRAG'].agg(['sum'])

pas toe op de kolom (series)

df1['sum'].apply(lambda x: '{:.2f}'.format(x))

Multi-Index 1

The groupby output will have an index or multi-index on rows corresponding to your chosen grouping variables. To avoid setting this index, pass “as_index=False” to the groupby operation.

data.groupby('month', as_index=False).agg({"duration": "sum"})

https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/

Multi-index manier 2

Een andere manier is via reset_index

data.groupby('month', as_index=False).agg({"duration": "sum"})

data.reset_index(drop=False)

Meerdere groepering op verschillende velden in 1x

aggregations = { 'bsn':'count','bedrag': 'sum'}
aggregations = { 'bsn':'nunique','bedrag': 'sum'}
df_zeng =dflever.groupby('code_voorziening').agg(aggregations)

df_zeng.rename(columns={"bsn": "aantal_klant_facturatie", "bedrag": "bedrag_gefactureerd"})

Unieke aantal tellen

gebruik nunique
aggregations = { 'bsn':'nunique','bedrag': 'sum'}

Python Knowledge Center

maandag 3 september 2018

summarize aggregate and grouping