Wednesday 27 March 2019

Python handy tricks 1

value_counts

Returns the number of occurrences of each value in a given column (Series):

df_iris.species.value_counts()
versicolor    50
setosa        50
virginica     50
Name: species, dtype: int64
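The post does not show where df_iris comes from (presumably the classic iris dataset). The sketch below uses a small hand-made Series instead, so it runs on its own. It also shows normalize=True, which returns relative frequencies instead of counts. On pandas 2.x the printed result is labelled `Name: count` rather than `Name: species`, because the counts Series is now named "count".

```python
import pandas as pd

# Hypothetical stand-in for the iris "species" column (not the real iris data)
species = pd.Series(
    ["setosa", "versicolor", "setosa", "virginica", "setosa", "versicolor"],
    name="species",
)

counts = species.value_counts()                # absolute counts, most frequent first
shares = species.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts)
print(shares)
```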


get_dummies

To get the genres as separate columns, we can use the get_dummies method in the str namespace. It splits each string on a separator and creates a dummy (0/1) variable for every possible value:
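The original code for this step is missing. Below is a minimal sketch, assuming a movies DataFrame whose genres column is pipe-separated as in the MovieLens files; the titles and genres are hypothetical sample data. Series.str.get_dummies(sep="|") produces one column per distinct genre, which can then be glued back onto the original frame with pd.concat.

```python
import pandas as pd

# Hypothetical mini movie table; genres are "|"-separated (MovieLens style)
movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Heat (1995)", "Jumanji (1995)"],
    "genres": [
        "Adventure|Animation|Comedy",
        "Action|Crime|Thriller",
        "Adventure|Children|Fantasy",
    ],
})

# Split each string on "|" and create one 0/1 column per distinct genre
genre_dummies = movies.genres.str.get_dummies(sep="|")

# Attach the dummy columns next to the original columns
movies_wide = pd.concat([movies, genre_dummies], axis=1)
print(movies_wide)
```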
 

sort_values


You can sort DataFrames (or Series) by their values using the sort_values method:
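The example for this step is missing, so here is a minimal sketch on hypothetical data. It sorts by one column, by several columns with a separate direction per column, and a plain Series. sort_values returns a new object; the original DataFrame keeps its order unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "title": ["B", "A", "C", "D"],
    "year": [1995, 1999, 1995, 1990],
    "rating": [3.5, 4.0, 4.5, 2.0],
})

# One column, highest rating first
by_rating = df.sort_values("rating", ascending=False)

# Several columns: year ascending, then rating descending within the same year
by_year_rating = df.sort_values(["year", "rating"], ascending=[True, False])

# A Series sorts the same way, without a "by" argument
ratings_sorted = df["rating"].sort_values()
```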

 


String namespace

You can apply many of the string methods that are available in core Python. You access them through the "string namespace" by calling .str on a Series containing strings:

  • movies.title.str.upper().head(3)

  • crime_movies = movies[movies.genres.str.contains("crime", case=False)] # the input is actually a regular expression

  • years = movies.title.str.extract(r".+ \(([0-9]{4})\)", expand=False)
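The three snippets above assume a movies DataFrame that the post never defines. The sketch below runs them against a small hypothetical table so the results can be inspected. Because str.contains treats its pattern as a regular expression by default, "|" means OR; pass regex=False for a literal match. str.extract returns NaN for titles without a "(year)" suffix.

```python
import pandas as pd

# Hypothetical sample data; not the dataset used in the original post
movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Heat (1995)", "Se7en (1995)", "Untitled"],
    "genres": [
        "Adventure|Animation|Comedy",
        "Action|Crime|Thriller",
        "Crime|Mystery|Thriller",
        "Drama",
    ],
})

# Upper-case the titles and keep the first three
upper_titles = movies.title.str.upper().head(3)

# Case-insensitive substring match on the genres
crime_movies = movies[movies.genres.str.contains("crime", case=False)]

# The pattern is a regex, so "|" acts as OR here
crime_or_drama = movies[movies.genres.str.contains("crime|drama", case=False)]

# Capture the four-digit year between parentheses; no match gives NaN
years = movies.title.str.extract(r".+ \(([0-9]{4})\)", expand=False)
```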

groupby


By "group by" we are referring to a process involving one or more of the following steps:

  • Splitting the data into groups based on some criteria
  • Applying a function to each group independently
  • Combining the results into a data structure

In the apply step, we might wish to do one of the following:

  • Aggregation: compute a summary statistic (or statistics) for each group. Some examples:
        Compute group sums or means
        Compute group sizes / counts
  • Transformation: perform some group-specific computations and return a like-indexed object. Some examples:
        Standardize data (z-score) within a group
        Fill NAs within groups with a value derived from each group
  • Filtration: discard some groups, according to a group-wise computation that evaluates to True or False. Some examples:
        Discard data that belongs to groups with only a few members
        Filter out data based on the group sum or mean
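The three kinds of apply step listed above map onto agg-style methods, transform and filter. Here is a minimal sketch on hypothetical data (a "team" key and a "score" column with one missing value); the lambdas are illustrative choices, not the only way to write these steps.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b", "c"],
    "score": [1.0, 2.0, None, 4.0, 6.0, 10.0],
})
grouped = df.groupby("team")

# Aggregation: one row per group
sums = grouped["score"].sum()   # NaN is skipped
sizes = grouped.size()          # rows per group, NaN rows included

# Transformation: the result has the same index as df
zscores = grouped["score"].transform(lambda s: (s - s.mean()) / s.std())
filled = grouped["score"].transform(lambda s: s.fillna(s.mean()))

# Filtration: keep only the rows of groups that pass the test
big_groups = grouped.filter(lambda g: len(g) >= 2)
high_mean = grouped.filter(lambda g: g["score"].mean() > 3)
```

Note that a group with a single member ("c") gets a NaN z-score, because the sample standard deviation of one value is undefined.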


Example:
  gender children      height     weight
0    man      yes  173.964090  68.624437
1  woman      yes  171.565582  77.870054
2    man       no  174.348325  72.875702
3  woman      yes  178.476953  86.612571
4    man       no  177.031283  74.815527
5  woman       no  167.551663  74.784314
6  woman       no  165.706805  84.409427
7  woman      yes  183.063584  70.793055

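grouped = df.groupby('gender')

grouped.groups  # dict-like: group name -> row labels

grouped.get_group('woman')  # the rows belonging to one group

grouped = df.groupby(['gender', 'children'])
grouped.groups

grouped.height.mean()  # mean height per (gender, children) combination

grouped.aggregate('mean')  # same result as np.mean; the string form avoids a FutureWarning in pandas >= 2.1
# grouped.aggregate(lambda x: x.iloc[0])  # alternative: take the first value in each group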
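To try the snippets above without the original df, the example table can be typed in by hand. This sketch rebuilds it and runs the same groupby calls; the variable names besides df and grouped are my own.

```python
import pandas as pd

# The example table from above, entered manually
df = pd.DataFrame({
    "gender": ["man", "woman", "man", "woman", "man", "woman", "woman", "woman"],
    "children": ["yes", "yes", "no", "yes", "no", "no", "no", "yes"],
    "height": [173.964090, 171.565582, 174.348325, 178.476953,
               177.031283, 167.551663, 165.706805, 183.063584],
    "weight": [68.624437, 77.870054, 72.875702, 86.612571,
               74.815527, 74.784314, 84.409427, 70.793055],
})

grouped = df.groupby("gender")
groups = grouped.groups              # group name -> row labels
women = grouped.get_group("woman")   # sub-DataFrame with only the 'woman' rows

grouped2 = df.groupby(["gender", "children"])
mean_height = grouped2["height"].mean()           # Series with a (gender, children) MultiIndex
means = grouped2.aggregate("mean")                # mean of height and weight per group
firsts = grouped2.aggregate(lambda x: x.iloc[0])  # first row of each group

print(mean_height)
print(means)
```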
