Python Knowledge Center: Python tips and tricks 1

value_counts

Returns the number of occurrences of each value in a given column:

df_iris.species.value_counts()

versicolor    50
setosa        50
virginica     50
Name: species, dtype: int64
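As a minimal sketch of two commonly used options, the snippet below runs value_counts on a small made-up Series (a hypothetical stand-in, not the iris data) and shows relative frequencies and the handling of missing values:

```python
import pandas as pd

# Hypothetical stand-in for a species column; not the real iris data.
species = pd.Series(["setosa", "setosa", "virginica", "setosa", None, "virginica"])

# Absolute counts, most frequent value first; missing values are ignored by default.
counts = species.value_counts()

# Relative frequencies of the non-missing values; they sum to 1.
shares = species.value_counts(normalize=True)

# Include missing values as their own entry.
with_missing = species.value_counts(dropna=False)

print(counts)
print(shares)
print(with_missing)
```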

get_dummies

To get the genres as separate columns we can use the get_dummies method in the str namespace. This splits each string on a separator and creates a dummy (0/1) variable for every possible value:

movies.genres.str.get_dummies(sep='|').head()
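To make the result concrete, here is a small sketch on a made-up miniature of the movies table (the titles and genres are hypothetical), which also joins the indicator columns back onto the original frame:

```python
import pandas as pd

# Hypothetical miniature of the movies table; titles and genres are made up.
movies_demo = pd.DataFrame({
    "title": ["Movie A (1995)", "Movie B (1999)", "Movie C (2001)"],
    "genres": ["Comedy|Romance", "Crime|Drama", "Comedy"],
})

# One 0/1 indicator column per distinct genre, in alphabetical order.
genre_dummies = movies_demo.genres.str.get_dummies(sep="|")

# Attach the indicator columns to the original frame (aligned on the index).
movies_wide = movies_demo.join(genre_dummies)
print(movies_wide)
```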

sort_values

You can sort dataframes (or Series) by values using the sort_values method:

movies_sorted = movies.sort_values('genres')
movies_sorted.head(10)
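sort_values also accepts several columns, a sort direction per column, and a position for missing values. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data for illustration only.
ratings = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Drama", "Comedy", "Drama", "Comedy"],
    "rating": [3.5, 4.0, 4.5, None],
})

# Sort by genre ascending, then by rating descending within each genre.
by_genre_rating = ratings.sort_values(["genres", "rating"], ascending=[True, False])

# Missing values go last by default; na_position="first" moves them to the top.
nan_first = ratings.sort_values("rating", na_position="first")

print(by_genre_rating)
print(nan_first)
```

Both calls return a new DataFrame and leave ratings itself unchanged.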

String namespace

You can apply many of the string methods that are available in core Python to a Series containing strings. These methods live in the "string namespace", which you access by calling .str on the Series:

movies.title.str.upper().head(3)
crime_movies = movies[movies.genres.str.contains("crime", case=False)] # the input is actually a regular expression
years = movies.title.str.extract(r".+ \(([0-9]{4})\)", expand=False)
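The following sketch applies the same three calls to a few hypothetical titles, assuming the "Name (YYYY)" form that the regular expression above expects, and shows what happens to a row that does not match:

```python
import pandas as pd

# Hypothetical titles; the last one deliberately lacks a year.
titles = pd.Series(["Heat (1995)", "Alien (1979)", "Untitled"])

upper = titles.str.upper()

# contains() treats the pattern as a regex by default; regex=False matches it literally.
has_paren = titles.str.contains("(", regex=False)

# extract() returns the first capture group; rows that do not match become NaN.
years = titles.str.extract(r".+ \(([0-9]{4})\)", expand=False)

# The extracted years are still strings; convert them to numbers, keeping NaN.
years_num = pd.to_numeric(years)
print(years_num)
```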

groupby

By "group by" we are referring to a process involving one or more of the following steps:

Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure

In the apply step, we might wish to do one of the following:

Aggregation: computing a summary statistic (or statistics) about each group. Some examples:

Compute group sums or means
Compute group sizes / counts

Transformation: perform some group-specific computations and return a like-indexed object. Some examples:

Standardizing data (zscore) within group
Filling NAs within groups with a value derived from each group

Filtration: discard some groups, according to a group-wise computation that evaluates True or False. Some examples:

Discarding data that belongs to groups with only a few members
Filtering out data based on the group sum or mean
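The three kinds of apply step can be sketched side by side. The data and column names here are hypothetical and chosen only to keep the arithmetic easy to follow:

```python
import pandas as pd

# Hypothetical data for illustration only.
scores = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b", "c"],
    "points": [1.0, 2.0, 3.0, 10.0, 20.0, 5.0],
})
by_team = scores.groupby("team")["points"]

# Aggregation: one summary row per group.
summary = by_team.agg(["sum", "mean", "count"])

# Transformation: the result has the same index as the input,
# so it can be combined row by row (here: deviation from the group mean).
centered = scores["points"] - by_team.transform("mean")

# Filtration: keep only the rows of groups with at least two members.
big_teams = scores.groupby("team").filter(lambda g: len(g) >= 2)

print(summary)
print(centered)
print(big_teams)
```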

Example:

	gender	children	height	weight
0	man	yes	173.964090	68.624437
1	woman	yes	171.565582	77.870054
2	man	no	174.348325	72.875702
3	woman	yes	178.476953	86.612571
4	man	no	177.031283	74.815527
5	woman	no	167.551663	74.784314
6	woman	no	165.706805	84.409427
7	woman	yes	183.063584	70.793055

grouped = df.groupby('gender')

grouped.groups

grouped.get_group('woman')

grouped = df.groupby(['gender', 'children'])
grouped.groups

grouped.height.mean()

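grouped.aggregate('mean')  # the string name needs no NumPy import and avoids the FutureWarning that recent pandas versions emit for np.mean
# grouped.aggregate(lambda x: x.iloc[0])  # alternative: take the first row of each group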
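To run the calls above end to end, the sketch below rebuilds df from the eight rows of the example table. The size() call and the named aggregation (with the output names mean_height and max_weight) are additions for illustration and not part of the original snippets:

```python
import pandas as pd

# The eight rows from the example table above.
df = pd.DataFrame({
    "gender": ["man", "woman", "man", "woman", "man", "woman", "woman", "woman"],
    "children": ["yes", "yes", "no", "yes", "no", "no", "no", "yes"],
    "height": [173.964090, 171.565582, 174.348325, 178.476953,
               177.031283, 167.551663, 165.706805, 183.063584],
    "weight": [68.624437, 77.870054, 72.875702, 86.612571,
               74.815527, 74.784314, 84.409427, 70.793055],
})

grouped = df.groupby(["gender", "children"])

# Number of rows per (gender, children) combination.
sizes = grouped.size()

# Mean height per group, indexed by a (gender, children) MultiIndex.
mean_height = grouped.height.mean()

# Named aggregation: choose the output column names explicitly.
stats = grouped.agg(mean_height=("height", "mean"), max_weight=("weight", "max"))

print(sizes)
print(mean_height)
print(stats)
```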


Wednesday, 27 March 2019
