basis functies op dataset
# How many rows the dataset data['item'].count() # What was the longest phone call / data entry? data['duration'].max() # How many seconds of phone calls are recorded in total? data['duration'][data['item'] == 'call'].sum() # How many entries are there for each month? data['month'].value_counts() Out[41]: 2014-11 230 2015-01 205 2014-12 157 2015-02 137 2015-03 101 dtype: int64 # Number of non-null unique network entries data['network'].nunique()
group by
op 1 variabele:
data.groupby('month')['duration'].sum()
op meerdere variabelen
data.groupby(['month', 'item'])['date'].count()
output format group by
As a rule of thumb, if you calculate more than one column of results,
your result will be a Dataframe. For a single column of results, the agg
function, by default, will produce a Series.
# produces Pandas Series
data.groupby('month')['duration'].sum()
of
data.groupby('JAAR')['BEDRAG'].agg('sum'))
# Produces Pandas DataFrame
data.groupby('month')[['duration']].sum() # Produces Pandas DataFrame
of
dfjgd2.groupby('JAAR')['BEDRAG'].agg(['sum'])
Opmaak aanpassen van grouping output
dfjgd2.groupby('JAAR')['BEDRAG'].sum().apply(lambda x: '{:.2f}'.format(x))
of
maak een dataframe
df1=dfjgd2.groupby('JAAR')['BEDRAG'].agg(['sum'])
pas toe op de kolom (series)
df1['sum'].apply(lambda x: '{:.2f}'.format(x))
Multi-Index 1
The groupby output will have an index or multi-index
on rows corresponding to your chosen grouping variables. To avoid
setting this index, pass “as_index=False” to the groupby operation.
data.groupby('month', as_index=False).agg({"duration": "sum"})
https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/
aggregations = { 'bsn':'nunique','bedrag': 'sum'}
df_zeng =dflever.groupby('code_voorziening').agg(aggregations)
df_zeng.rename(columns={"bsn": "aantal_klant_facturatie", "bedrag": "bedrag_gefactureerd"})
gebruik nunique
aggregations = { 'bsn':'nunique','bedrag': 'sum'}
Multi-index manier 2
Een andere manier is via reset_indexdata.groupby('month', as_index=False).agg({"duration": "sum"})
data.reset_index(drop=False)
Meerdere groepering op verschillende velden in 1x
aggregations = { 'bsn':'count','bedrag': 'sum'}aggregations = { 'bsn':'nunique','bedrag': 'sum'}
df_zeng =dflever.groupby('code_voorziening').agg(aggregations)
df_zeng.rename(columns={"bsn": "aantal_klant_facturatie", "bedrag": "bedrag_gefactureerd"})
Unieke aantal tellen
gebruik nunique
aggregations = { 'bsn':'nunique','bedrag': 'sum'}
Geen opmerkingen:
Een reactie posten