value_counts
Returns, for each distinct value in a column (Series), the number of times it occurs:
df_iris.species.value_counts()
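The iris data itself is not loaded here. This minimal sketch runs on a small made-up Series (the species values are hypothetical) to show what value_counts returns and how the normalize parameter changes it:

```python
import pandas as pd

# Hypothetical stand-in for df_iris.species
species = pd.Series(["setosa", "virginica", "setosa", "versicolor", "setosa", "virginica"])

# One entry per distinct value, sorted by count (most frequent first)
counts = species.value_counts()
print(counts)

# normalize=True returns relative frequencies instead of absolute counts
shares = species.value_counts(normalize=True)
print(shares)
```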
String namespace
Many of the string methods available in core Python can also be applied to a Series of strings. You access them through the "string namespace", by calling .str on a Series object that contains strings:
- movies.title.str.upper().head(3)
- crime_movies = movies[movies.genres.str.contains("crime", case=False)] # the pattern is treated as a regular expression by default
- years = movies.title.str.extract(r".+ \(([0-9]{4})\)", expand=False)
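The three calls above can be tried without the original movies dataset. This is a sketch on a tiny hypothetical movies table (titles and genres are made up, assuming titles end in "(year)" and genres are "|"-separated):

```python
import pandas as pd

# Hypothetical stand-in for the movies table
movies = pd.DataFrame({
    "title": ["Heat (1995)", "Toy Story (1995)", "Fargo (1996)"],
    "genres": ["Action|Crime|Thriller", "Animation|Children|Comedy", "Comedy|Crime|Drama"],
})

# Element-wise str.upper on every title
upper_titles = movies.title.str.upper()

# contains() interprets the pattern as a regex by default; case=False ignores case
crime_movies = movies[movies.genres.str.contains("crime", case=False)]

# extract() returns the first capture group; expand=False gives a Series, not a DataFrame
years = movies.title.str.extract(r".+ \(([0-9]{4})\)", expand=False)

print(upper_titles.tolist())
print(crime_movies.title.tolist())
print(years.tolist())
```

Note that the extracted years are still strings; convert them with astype(int) if you need numbers.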
groupby
By "group by" we are referring to a process involving one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure
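The three steps can be spelled out by hand and compared with a single groupby call. A minimal sketch on made-up data (the column names key and value are hypothetical):

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"key": ["a", "b", "a", "b", "c"], "value": [1, 2, 3, 4, 5]})

# Split: one sub-DataFrame per distinct key
pieces = {k: df[df["key"] == k] for k in df["key"].unique()}
# Apply: compute a sum for each piece independently
sums = {k: piece["value"].sum() for k, piece in pieces.items()}
# Combine: collect the per-group results into one Series
manual = pd.Series(sums).sort_index()

# groupby performs the same three steps in one call
via_groupby = df.groupby("key")["value"].sum()

print(manual.to_dict())
print(via_groupby.to_dict())
```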
In the apply step, we might wish to do one of the following:
- Aggregation: compute a summary statistic (or statistics) for each group. Some examples:
  - Compute group sizes / counts
- Transformation: perform some group-specific computations and return a like-indexed object. Some examples:
  - Filling NAs within groups with a value derived from each group
- Filtration: discard some groups, according to a group-wise computation that evaluates to True or False. Some examples:
  - Filtering out data based on the group sum or mean
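A small sketch of the three kinds of apply step on hypothetical data: size() and mean() aggregate, transform() fills a NaN with its own group's mean (keeping the original index), and filter() drops whole groups. The column names and the threshold of 5 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in group "a"
df = pd.DataFrame({"key": ["a", "a", "a", "b", "b"],
                   "value": [1.0, np.nan, 3.0, 10.0, 20.0]})
grouped = df.groupby("key")["value"]

# Aggregation: one result per group
sizes = grouped.size()   # group sizes, NaN rows included
means = grouped.mean()   # group means, NaN skipped

# Transformation: result has the same index as df
filled = grouped.transform(lambda s: s.fillna(s.mean()))

# Filtration: keep only the rows of groups whose mean exceeds 5
big = df.groupby("key").filter(lambda g: g["value"].mean() > 5)

print(sizes.to_dict(), means.to_dict())
print(filled.tolist())
print(big)
```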
Example
|   | gender | children | height | weight |
|---|---|---|---|---|
| 0 | man | yes | 173.964090 | 68.624437 |
| 1 | woman | yes | 171.565582 | 77.870054 |
| 2 | man | no | 174.348325 | 72.875702 |
| 3 | woman | yes | 178.476953 | 86.612571 |
| 4 | man | no | 177.031283 | 74.815527 |
| 5 | woman | no | 167.551663 | 74.784314 |
| 6 | woman | no | 165.706805 | 84.409427 |
| 7 | woman | yes | 183.063584 | 70.793055 |
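The groupby calls below assume a DataFrame named df holding this table. One way to build it is to type the values in directly (copied from the table; the default integer index matches the first column):

```python
import pandas as pd

# The example table, typed in by hand
df = pd.DataFrame({
    "gender": ["man", "woman", "man", "woman", "man", "woman", "woman", "woman"],
    "children": ["yes", "yes", "no", "yes", "no", "no", "no", "yes"],
    "height": [173.964090, 171.565582, 174.348325, 178.476953,
               177.031283, 167.551663, 165.706805, 183.063584],
    "weight": [68.624437, 77.870054, 72.875702, 86.612571,
               74.815527, 74.784314, 84.409427, 70.793055],
})
print(df)
```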
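grouped = df.groupby('gender')                # group on one column
grouped.groups                                # dict: group label -> index labels of its rows
grouped.get_group('woman')                    # the rows of one group as a DataFrame
grouped = df.groupby(['gender', 'children'])  # group on two columns
grouped.groups                                # keys are now (gender, children) tuples
grouped.height.mean()                         # mean height per group
grouped.aggregate('mean')                     # mean of every remaining column per group
# grouped.aggregate(lambda x: x.iloc[0])      # first value of every column per group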
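A self-contained sketch of what the two-key grouping produces for the example table (the DataFrame is retyped from the table so the block runs on its own):

```python
import pandas as pd

# Same data as the example table
df = pd.DataFrame({
    "gender": ["man", "woman", "man", "woman", "man", "woman", "woman", "woman"],
    "children": ["yes", "yes", "no", "yes", "no", "no", "no", "yes"],
    "height": [173.964090, 171.565582, 174.348325, 178.476953,
               177.031283, 167.551663, 165.706805, 183.063584],
    "weight": [68.624437, 77.870054, 72.875702, 86.612571,
               74.815527, 74.784314, 84.409427, 70.793055],
})

grouped = df.groupby(["gender", "children"])

# Series indexed by (gender, children) tuples
mean_height = grouped.height.mean()
print(mean_height.round(2))

# First value of each column per group, as in the commented-out line
first_rows = grouped.aggregate(lambda x: x.iloc[0])
print(first_rows)
print(grouped.first())
```

first() skips missing values while x.iloc[0] takes the first row as-is, so the two only coincide when there are no NaNs, as in this table.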