Python Knowledge Center: Statistics 2

Measures of central tendency
There are three measures for describing the center of a distribution. The mode is the value (or class) with the most observations, the median is the value that separates the lower 50% of the observations from the upper 50%, and the mean takes into account not only how many observations there are but also the magnitude of each score…
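As a minimal sketch, the three measures can be computed with Python's built-in statistics module. The scores are made up for illustration. Adding one extreme score shows that the mean reacts to the magnitude of every score, while the median barely moves.

```python
import statistics

# Hypothetical scores, invented for this example.
scores = [4, 6, 6, 7, 7, 7, 8, 9, 10]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle value of the sorted scores
mean = statistics.mean(scores)      # sum of the scores divided by their count

print("mode:", mode, "median:", median, "mean:", round(mean, 2))

# Add a single extreme score and recompute.
scores_with_outlier = scores + [100]
median_with_outlier = statistics.median(scores_with_outlier)
mean_with_outlier = statistics.mean(scores_with_outlier)

print("median with outlier:", median_with_outlier)
print("mean with outlier:", mean_with_outlier)
```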

You can exclude the extremes of a distribution by using the interquartile range: it keeps only the second and third quartiles around the median, that is, the middle 50% of the observations.
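A small sketch of this idea, assuming Python 3.8 or later for statistics.quantiles. The data are invented and contain one low and one high extreme. The exact quartile values depend on the interpolation method, so treat the cut points as one reasonable convention rather than the only one.

```python
import statistics

# Hypothetical measurements with an extreme value at each end.
data = [1, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 95]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # width of the interquartile range

# Keep only the observations between the first and third quartile.
middle_half = [x for x in data if q1 <= x <= q3]

print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
print("middle 50%:", middle_half)
```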

You can use a frequency distribution to show how often each score occurs. You can also use it to estimate the probability of a score.
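For example, collections.Counter gives the frequency of each score, and dividing by the total number of observations turns those counts into estimated probabilities. The scores below are hypothetical.

```python
from collections import Counter

# Hypothetical observed scores.
observed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]

# Frequency distribution: how often does each score occur?
frequencies = Counter(observed)
total = len(observed)

# Relative frequency as an estimate of the probability of each score.
probabilities = {
    score: count / total for score, count in sorted(frequencies.items())
}

for score, p in probabilities.items():
    print(score, frequencies[score], p)
```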

For a normal distribution there are tables in which you can look up the probability that a value occurs. To use them, you first have to convert the score to a z-score: subtract the mean and divide by the standard deviation.

Important values
z lies between -1.96 and 1.96 for 95% of the scores; 2.5% is cut off in each tail.
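Instead of a printed table, the standard library can do the lookup. The sketch below assumes Python 3.8 or later (for statistics.NormalDist) and a hypothetical score of 130 from a population with mean 100 and standard deviation 15; those numbers are chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical population parameters and score.
mu, sigma = 100, 15
x = 130

# z-score: distance from the mean in standard deviations.
z = (x - mu) / sigma  # 2.0

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Probability of a score at or below x (what a z-table gives you).
p_below = standard_normal.cdf(z)

# Share of scores with z between -1.96 and 1.96: about 0.95.
p_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the z that leaves 2.5% in the upper tail, about 1.96.
z_crit = standard_normal.inv_cdf(0.975)

print(z, round(p_below, 4), round(p_within, 4), round(z_crit, 2))
```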

We've talked a little about the difference between working with a full population of data and working with a sample. In most real-world scenarios, you won't have access to the full population. For example, you're unlikely to have rainfall measurements for every day ever; and even if you did, that's a lot of data to try to manage.
Generally you work with samples that are representative of the population, and you use sample statistics such as the mean and standard deviation to approximate the parameters of the full population. In practice it's best to get as large a sample as you can. The larger the sample, the better it tends to approximate the distribution and parameters of the full population.
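To make this concrete, here is a small simulation. It assumes a synthetic "population" of 20,000 normally distributed values (mean 50, standard deviation 10), which stands in for data you would normally not have in full. The sample sizes are arbitrary.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Synthetic population, used here only because we need a known truth.
population = [random.gauss(50, 10) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)  # population standard deviation

# Sample statistics for increasingly large random samples.
estimates = {}
for n in (10, 100, 5_000):
    sample = random.sample(population, n)
    estimates[n] = (statistics.fmean(sample), statistics.stdev(sample))

print("population:", round(pop_mean, 2), round(pop_sd, 2))
for n, (m, s) in estimates.items():
    print(n, round(m, 2), round(s, 2))
```

Larger samples usually land closer to the population values, although any single small sample can get lucky.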

Another thing you can do is to take multiple random samples. Each sample has a sample mean, and you can record these to form what's called a sampling distribution. With enough samples of a reasonable size, two things happen.
One is that, thanks to something called the central limit theorem, the sampling distribution takes on an approximately normal shape regardless of the shape of the population distribution; and the second is that the mean of the sampling distribution, in other words the mean of all the sample means, will be very close to the population mean.
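Both effects can be checked with a quick simulation. This sketch assumes a deliberately skewed synthetic population (exponential with mean 5); the sample size of 50 and the 2,000 repetitions are arbitrary choices. As a rough check on normality, it counts how many sample means fall within 1.96 standard deviations of their own mean, which should be close to 95%.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Skewed synthetic population: exponential with mean 5.
population = [random.expovariate(1 / 5) for _ in range(20_000)]
pop_mean = statistics.fmean(population)

sample_size = 50
n_samples = 2_000

# Sampling distribution: the mean of each random sample.
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Fraction of sample means within 1.96 standard deviations of their mean.
within = sum(
    abs(m - mean_of_means) <= 1.96 * sd_of_means for m in sample_means
) / n_samples

print("population mean:", round(pop_mean, 3))
print("mean of sample means:", round(mean_of_means, 3))
print("fraction within 1.96 sd:", round(within, 3))
```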

Friday, 8 February 2019
