
Inferential Statistics - An Introduction



Inferential statistics helps us draw conclusions (inferences) about a population from a sample of its data.


Population: The population is the complete set of data about which we want to draw inferences.
Sample: A sample is a subset of data drawn from the population.

Parameter: A measure computed on the population data is called a parameter.
Statistic: A measure computed on the sample data is called a statistic.



Measure               Population Parameter    Sample Statistic
Mean                  μ                       x̄
Variance              σ²                      s²
Standard Deviation    σ                       s



Standard Error:

Per the Central Limit Theorem (CLT), the mean of the sample means approximates the population mean.
In the sampling distribution, the standard deviation of the sample means around that mean is called the standard error.
S.E = σ / √n
where σ is the population standard deviation and n is the sample size.
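To see the formula at work, here is a small simulation sketch. It assumes a normally distributed population with made-up values (mean 50, standard deviation 10) and draws many samples of size 25. The function name and all numbers are illustrative only. The spread of the sample means should come out close to σ / √n.

```python
import math
import random
import statistics

def simulate_standard_error(population_mean=50.0, population_sd=10.0,
                            sample_size=25, num_samples=5000, seed=0):
    """Compare the spread of simulated sample means with sigma / sqrt(n).

    All default values are illustrative assumptions, not data from the post.
    """
    rng = random.Random(seed)
    sample_means = []
    for _ in range(num_samples):
        # Draw one sample of size n from the assumed normal population.
        sample = [rng.gauss(population_mean, population_sd)
                  for _ in range(sample_size)]
        sample_means.append(statistics.fmean(sample))

    mean_of_means = statistics.fmean(sample_means)
    observed_se = statistics.stdev(sample_means)            # spread of the sample means
    theoretical_se = population_sd / math.sqrt(sample_size)  # S.E = sigma / sqrt(n)
    return mean_of_means, observed_se, theoretical_se

mean_of_means, observed_se, theoretical_se = simulate_standard_error()
print(f"mean of sample means: {mean_of_means:.3f}")
print(f"observed S.E:         {observed_se:.3f}")
print(f"sigma / sqrt(n):      {theoretical_se:.3f}")
```

With these assumed values the theoretical standard error is 10 / √25 = 2, and the simulated value should land very near it.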
  


Confidence Interval & Confidence Level:

To estimate a population parameter (such as the mean), we draw a random sample from the population and compute the corresponding statistic.
If we report the result as a single value, we risk being wrong: the true value may differ from our estimate, even if only by 1 or 2, so a single number hides the uncertainty in our result.
What if we instead provide an interval within which the true value is likely to lie?
This interval is called a confidence interval.

“Confidence intervals are extremely valuable for any usability professional. A confidence interval is a range that estimates the true population value for a statistic.”—Tom Tullis and Bill Albert

A Confidence Interval is a range of values we are fairly sure our true value lies in.


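In the figure that accompanied this passage (not reproduced here), one interval, drawn in red, does not contain the population mean µ.
Not every sample gives an interval that captures the true value. At a 95% confidence level, about 95% of the intervals built from repeated samples contain the population mean.
I.e. about 1 in 20 samples may produce an interval that misses the population mean.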
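The "1 in 20" idea can be checked with a rough simulation. The sketch below assumes a normal population with a known standard deviation (mean 50, SD 10, both made up) and builds each interval as sample mean ± 1.96 × σ / √n. It then counts how often the interval contains the true mean. The function name and parameters are hypothetical.

```python
import math
import random
import statistics

def coverage_rate(population_mean=50.0, population_sd=10.0,
                  sample_size=30, num_samples=2000, z=1.96, seed=1):
    """Fraction of simulated intervals that contain the population mean.

    Assumes the population SD is known; all defaults are illustrative.
    """
    rng = random.Random(seed)
    margin = z * population_sd / math.sqrt(sample_size)  # half-width of each interval
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(population_mean, population_sd)
                  for _ in range(sample_size)]
        sample_mean = statistics.fmean(sample)
        # Does the interval [mean - margin, mean + margin] capture the true mean?
        if sample_mean - margin <= population_mean <= sample_mean + margin:
            hits += 1
    return hits / num_samples

rate = coverage_rate()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95, so roughly 5% of the intervals (about 1 in 20) miss the population mean.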

Margin of Error:

The margin of error expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter.

Formula to Calculate Margin of Error:
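M.E = z * S.E
where z is the critical value for the chosen confidence level (about 1.96 for 95%) and S.E is the standard error.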
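As a minimal sketch, the margin of error can be computed by taking it as z times the standard error σ / √n, with z looked up from the standard normal distribution for a two-sided interval. The function name and the example numbers (σ = 10, n = 100) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence_level, population_sd, sample_size):
    """Margin of error = z * (sigma / sqrt(n)) for a two-sided interval.

    Assumes a known population SD; the inputs below are made-up examples.
    """
    alpha = 1.0 - confidence_level
    # Critical value: the point with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = population_sd / math.sqrt(sample_size)
    return z * standard_error

me = margin_of_error(0.95, population_sd=10.0, sample_size=100)
print(f"margin of error at 95%: {me:.3f}")
```

Here the standard error is 10 / √100 = 1, so the margin of error is simply the 95% critical value, about 1.96.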

Recall what we know about the normal distribution:

In a normal distribution, about 68% of the data lies within 1 standard deviation of the mean (between -1 and +1 SD).
About 95% lies within 2 standard deviations, and about 99.7% lies within 3 standard deviations.
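These percentages can be reproduced from the standard normal curve. The short sketch below uses Python's built-in NormalDist. The helper name share_within is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k) * 100:.1f}%")
```

The exact areas are roughly 68.3%, 95.4% and 99.7%, which is why the rule is usually quoted as 68 / 95 / 99.7.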



This percentage, for example 95%, is the confidence level we attach to our result.
At a 68% confidence level, we expect the true mean to lie within -1σ to +1σ of our estimate (σ here being the spread of the sampling distribution, i.e. the standard error). But there is still a sizeable chance, about 32%, that the true mean falls outside this range.

At a 95% confidence level, there is less chance that the true mean falls outside the range, because 95% of sample means fall within -2σ to +2σ. I.e. our result is more certain than at a 68% confidence level.

The higher the confidence level, the more certain we are that the range contains the true value, although the range itself becomes wider.

We can even say 99%, but we cannot say 100%, as there is always some probability that our result is wrong.
So if 95% is the confidence level, what do we call the other 5%?
It is the region where our result does not hold, i.e. the true value falls outside the interval.
An observation that falls in this region is called statistically significant.
We call this percentage the significance level.
As the normal distribution is a continuous probability distribution, the total area under the curve is 1.
The area covered by the confidence level is called the confidence value, “C”.
If the confidence level is 95%, then the confidence value is 0.95 (95/100).
So C = 0.95.
The significance level (denoted α) is therefore 1 - 0.95 = 0.05.
α = 0.05
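The same arithmetic works for any confidence level. The sketch below converts a percentage into C and α. It also splits α evenly between the two tails of the curve, which is an added assumption (a two-sided interval) and not something stated above. The function name is hypothetical.

```python
def split_area(confidence_level_percent):
    """Return (C, alpha, area in each tail) for a given confidence level.

    The even split of alpha across two tails assumes a two-sided interval.
    """
    confidence_value = confidence_level_percent / 100.0  # C, e.g. 95 -> 0.95
    alpha = 1.0 - confidence_value                       # total area under the curve is 1
    tail_area = alpha / 2.0                              # assumed: half of alpha per tail
    return confidence_value, alpha, tail_area

for level in (68, 95, 99):
    c, alpha, tail = split_area(level)
    print(f"{level}% -> C = {c:.2f}, alpha = {alpha:.2f}, each tail = {tail:.3f}")
```

For 95% this gives C = 0.95 and α = 0.05, with 0.025 of the area in each tail under the two-sided assumption.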



