What is Statistics
A branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.
Probability is a mathematical language to discuss uncertainties and it plays a key role in statistics.
In layman's terms, statistics is a toolbox of methods for getting answers from data.
Terms
Binomial random variable
Gamma distribution is the maximum entropy probability distribution for a random variable X for which E[X] = kθ = α/β is fixed and greater than 0, and E[ln(X)] = ψ(k) + ln(θ) = ψ(α) − ln(β) is fixed (ψ is the digamma function).
Hypothesis testing
68.27% of the area under the normal curve is within one standard deviation of the mean (𝞵 ± 𝞼).
95.45% of the area under the normal curve is within two standard deviations of the mean (𝞵 ± 2𝞼).
99.73% of the area under the normal curve is within three standard deviations of the mean (𝞵 ± 3𝞼).
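The three percentages above can be checked numerically. The sketch below relies on the identity that, for any normal distribution, the area within k standard deviations of the mean equals erf(k/√2). The function name `area_within` is illustrative and not part of the original notes.

```python
import math


def area_within(k: float) -> float:
    """Area under the normal curve within k standard deviations of the mean."""
    # For any normal distribution, P(|X - mu| <= k * sigma) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4%}")
```

Rounded to two decimals this gives about 68.27%, 95.45% and 99.73%. Figures read from rounded z-tables can differ from these in the last digit.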
Probability distribution
- Chi-squared distribution: the distribution of a sum of the squares of k independent standard normal random variables.
- It is a special case of the gamma distribution.
- It is one of the most commonly used probability distributions in inferential statistics.
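The "sum of squares of k standard normals" definition can be illustrated with a small simulation. This is only a sketch using the standard library; the helper name `chi_squared_sample`, the seed, and the choice k = 3 are assumptions made for the example.

```python
import random


def chi_squared_sample(k: int, rng: random.Random) -> float:
    """Draw one value as the sum of squares of k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable
k = 3
draws = [chi_squared_sample(k, rng) for _ in range(20_000)]
sample_mean = sum(draws) / len(draws)

# A chi-squared distribution with k degrees of freedom has mean k,
# so the sample mean should land close to 3.
print(f"sample mean: {sample_mean:.3f}")
```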
- Binomial random variable: the number of successes in a fixed number of independent trials of an experiment that has only two possible outcomes.
- A χ² test is a hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true.
- In simple terms, it often means Pearson's chi-squared test.
- Used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.
- The test statistic is often constructed from a sum of squared errors or the sample variance.
- The chi-squared distribution of the statistic is derived assuming independent, normally distributed data, an assumption that is approximately valid in many cases because of the central limit theorem.
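As a rough illustration of Pearson's chi-squared test, the sketch below computes the statistic as the sum of (observed − expected)² / expected over the categories. The counts are made up for the example, and the function name `pearson_chi_squared` is hypothetical. The p-value line only works for 2 degrees of freedom, where the chi-squared upper tail has a simple closed form.

```python
import math


def pearson_chi_squared(observed, expected):
    """Pearson's statistic: sum of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (both lists sum to 100).
observed = [50, 30, 20]
expected = [40, 40, 20]

statistic = pearson_chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1

# With exactly 2 degrees of freedom the chi-squared upper-tail probability
# has the closed form exp(-x / 2); other degrees of freedom need a
# statistics library or a table.
p_value = math.exp(-statistic / 2)

print(f"statistic = {statistic:.2f}, df = {degrees_of_freedom}, p = {p_value:.4f}")
```

Here the p-value is above 0.05, so at the 5% level these hypothetical observed counts would not be judged significantly different from the expected ones.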
- Confidence intervals estimate parameters of a population using a sample.
- Use the sample mean x̄ to find a range of values that we can be confident contains the mean of the sampled population.
- Lower bound = estimate - margin of error
- Upper bound = estimate + margin of error
- T-intervals: use when the population standard deviation is unknown and either the original population is normal or the sample size >= 30. This formula uses the sample standard deviation instead of the population standard deviation.
- Z-intervals: use when the population standard deviation is known and either the sample size >= 30 or the original population is normal.
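The lower and upper bounds can be sketched for the Z-interval case, where the margin of error is z* · σ / √n. The sample figures below are hypothetical and the function name `z_interval` is an assumption for illustration; the critical value comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, population_sd, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    # Critical value z* such that the central `confidence` area lies in [-z*, z*].
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * population_sd / math.sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Hypothetical sample: mean 50 from n = 100 observations, known sigma = 10.
lower, upper = z_interval(sample_mean=50.0, population_sd=10.0, n=100)
print(f"95% z-interval: ({lower:.2f}, {upper:.2f})")
```

A T-interval has the same shape, but it swaps σ for the sample standard deviation and z* for a critical value of the t distribution with n − 1 degrees of freedom, which the standard library used here does not provide.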
- Gamma distribution: a two-parameter family of continuous probability distributions.
- Exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution.
- Three different parametrizations in common use:
  1. With a shape parameter k and a scale parameter θ.
  2. With a shape parameter α = k and an inverse scale parameter β = 1/θ.
  3. With a shape parameter k and a mean parameter μ = kθ = α/β.
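The three parametrizations describe the same distribution, so converting between them is simple arithmetic. The sketch below starts from shape and scale and derives the other two forms. All function names are hypothetical; the chi-squared helper uses the standard fact that a chi-squared distribution with ν degrees of freedom is a gamma distribution with shape ν/2 and scale 2.

```python
def shape_scale_to_shape_rate(k, theta):
    """(k, theta) -> (alpha, beta), where alpha = k and beta = 1 / theta."""
    return k, 1.0 / theta


def shape_scale_to_shape_mean(k, theta):
    """(k, theta) -> (k, mu), where mu = k * theta."""
    return k, k * theta


def chi_squared_as_gamma(degrees_of_freedom):
    """Shape and scale of the gamma distribution that equals chi-squared(df)."""
    return degrees_of_freedom / 2.0, 2.0


k, theta = 3.0, 2.0
alpha, beta = shape_scale_to_shape_rate(k, theta)
_, mu = shape_scale_to_shape_mean(k, theta)

# The mean is the same in every parametrization: k * theta = alpha / beta.
print(alpha, beta, mu, alpha / beta)
```

The exponential distribution is the special case with shape 1.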
- Hypothesis testing has two types of errors: Type I (rejecting a true null hypothesis) and Type II (failing to reject a false null hypothesis).
- Test and draw conclusions about the value of a parameter
- Power analysis
- Tests of proportion
- P-value approach
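To make the test of proportion and the P-value approach concrete, here is a minimal sketch of a two-sided one-proportion z-test. It assumes the sample is large enough for the normal approximation to hold. The data, the function name `one_proportion_z_test`, and the 0.05 significance level are illustrative assumptions.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: population proportion equals p0."""
    p_hat = successes / n
    # Standard error of the sample proportion, computed under H0.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
alpha = 0.05  # significance level = accepted probability of a Type I error
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

In the P-value approach, the null hypothesis is rejected when the p-value falls below the chosen significance level.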
- The normal distribution is symmetrical about its mean.
- Bell-shaped with a single peak at the center of the distribution.
- The arithmetic mean is at the peak and at the center, with half the area above the mean and half below it.
- It is asymptotic: the curve gets closer and closer to the X-axis but never touches it.
- Mean, median and mode are equal.
- The curve theoretically extends to infinity in both directions.
- The standard normal distribution has a mean of 0 and a standard deviation of 1.
- The Z-score or Z-value is the distance between a selected value x and the population mean 𝞵, divided by the population standard deviation 𝞼.
z = (x-𝞵) / 𝞼
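The Z-score formula translates directly into code. The numbers below are a made-up example and the function name `z_score` is illustrative; `statistics.NormalDist` from the standard library then gives the area under the standard normal curve to the left of that z.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of population standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical example: a value of 85 from a population with mean 70, sd 10.
z = z_score(x=85, mu=70, sigma=10)

# Area under the standard normal curve to the left of z.
area_below = NormalDist().cdf(z)
print(f"z = {z}, area below = {area_below:.4f}")
```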
- A probability distribution lists all possible outcomes of an experiment and the corresponding probabilities.
- The sum of the probabilities of the various outcomes is 1.
- The probability of a particular outcome is between 0 and 1.
- The standard deviation of a sample-based estimate such as the sample mean (its standard error) is inversely proportional to the square root of the sample size.
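A small discrete example may help tie these properties together. The sketch below builds the distribution of a binomial random variable (number of heads in 4 fair coin tosses, a hypothetical experiment) and checks that each probability lies between 0 and 1 and that they sum to 1. The function name `binomial_distribution` is an assumption for illustration.

```python
import math


def binomial_distribution(n, p):
    """Map each possible number of successes (0..n) to its probability."""
    return {x: math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


# Hypothetical experiment: number of heads in 4 tosses of a fair coin.
distribution = binomial_distribution(n=4, p=0.5)

# Each probability lies between 0 and 1, and together they sum to 1.
assert all(0 <= prob <= 1 for prob in distribution.values())
assert math.isclose(sum(distribution.values()), 1.0)

mean = sum(x * prob for x, prob in distribution.items())
variance = sum((x - mean) ** 2 * prob for x, prob in distribution.items())
print(distribution)
print(f"mean = {mean}, standard deviation = {math.sqrt(variance)}")
```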
Other basic terms
References