Introduction to Basic Statistics

A Beginner’s Journey into Statistics through Mean, Median, Standard Deviation, Quartiles, and Percentiles.

Statistics is a powerful tool for making sense of data, and mastering its basics can unlock a deeper understanding of the world around us. Let’s embark on a journey through some foundational statistical concepts: Mean, Median, Standard Deviation, Quartiles, and Percentiles. Whether you’re refreshing your high school knowledge or learning anew, this story will help you grasp what these terms mean and how they’re calculated.

Most of this is basic mathematics from high school, but revisiting the terminology is a useful way to recall what each measure means and how it is calculated.

The Mean: The Story of the Average
The mean is the average value of a dataset. Simple as it is, why do we need it? The mean serves as a single representative value for the whole dataset.

For example, consider the weights of the students in a class. To summarize this dataset, we usually take its average and report that the average weight of the class is X kg (or pounds).

So the mean is generally used as a single value that gives a good picture of the dataset as a whole.

Statisticians often refer to the mean and the median as measures of central tendency.

How do we calculate the mean?

Mean = (Sum of observations) / (Number of observations). This is also called the average of the dataset.


The mean is often denoted by mu (μ), and the formula for calculating it is:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

with $i$ ranging from 1 to $n$, where $x_i$ is the $i$th observation and $n$ is the number of observations in the dataset.
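As a small illustration of the mean formula, here is a minimal Python sketch. The list `weights` and the function name `mean` are made up for this example; the weights are not real measurements.

```python
# Hypothetical weights (in kg) of five students; the values are invented for illustration.
weights = [50, 55, 60, 65, 70]


def mean(values):
    """Return the arithmetic mean: sum of observations / number of observations."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


print(mean(weights))  # 300 / 5 = 60.0
```

Python's standard library also offers `statistics.mean`, which should give the same result for this data.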

The Median: Finding the Middle Ground
The median is the middle value of a dataset: arrange the observations in ascending order and pick the one at the midpoint.

If the number of observations is odd, finding the median is easy: it is the middle value.

Sample median = $x_{((n+1)/2)}$, where $n$ is odd.

If the number of observations is even, the median is the sum of the two middle values divided by 2.

Sample median = $\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$, where $n$ is even.

Here $x_{(k)}$ denotes the $k$th value in the sorted data.
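The odd and even cases can be sketched in Python as follows. The function name `median` and the sample lists are illustrative choices, not part of the original text. Note that Python lists are indexed from 0, so the 1-based positions in the formulas shift by one.

```python
def median(values):
    """Return the middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value, position (n + 1) / 2 in 1-based terms.
        return ordered[mid]
    # Even count: average of the two middle values, positions n / 2 and n / 2 + 1 in 1-based terms.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # sorted: 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))  # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```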

The sample median is often preferred over the sample mean when the data is asymmetric (skewed) or contains outliers, because extreme values pull the mean away from the center while the median stays largely unaffected.

A couple of important differences between the mean and the median:

| # | Mean | Median |
|---|------|--------|
| 1 | The average of the dataset, also known as the center of the distribution | The middle value of the observations sorted in ascending order |
| 2 | Minimizes the sum of squared deviations: the sum of the squares of (each observation minus the mean) is never larger than the same sum taken around the median | Minimizes the sum of absolute deviations: the sum of the absolute values of (each observation minus the median) is never larger than the same sum taken around the mean |

Note: Point #2 always holds, but the gap is most visible for data with outliers, where the mean and the median differ significantly. If the mean and the median are almost the same, the sum of squared deviations (and likewise the sum of absolute deviations) comes out nearly identical whichever of the two is used as the center.
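To see the second difference in numbers, the following sketch compares both kinds of deviation sums around the mean and around the median. The dataset and the helper names `sum_squared_deviations` and `sum_absolute_deviations` are invented for this illustration.

```python
from statistics import mean, median

# Made-up data with one large outlier (100).
data = [1, 2, 3, 4, 100]


def sum_squared_deviations(values, center):
    """Sum of (observation - center)^2 over all observations."""
    return sum((x - center) ** 2 for x in values)


def sum_absolute_deviations(values, center):
    """Sum of |observation - center| over all observations."""
    return sum(abs(x - center) for x in values)


m = mean(data)      # 22
med = median(data)  # 3

# Squared deviations are smaller around the mean: 7610 vs 9415.
print(sum_squared_deviations(data, m), sum_squared_deviations(data, med))
# Absolute deviations are smaller around the median: 156 vs 101.
print(sum_absolute_deviations(data, m), sum_absolute_deviations(data, med))
```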

Standard Deviation
Standard deviation measures how far the observations typically deviate from the mean. It is the square root of the average of the squared differences between each observation and the mean, given by the formula:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$

with $i$ ranging from 1 to $n$, where $x_i$ is the $i$th observation and $\mu$ is the mean of the dataset.
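A minimal Python sketch of the standard deviation follows. It assumes the population form, which divides the sum of squared deviations by n before taking the square root; a sample standard deviation would divide by n − 1 instead. The function name `population_std` and the data are illustrative.

```python
import math


def population_std(values):
    """Population standard deviation: sqrt of the mean squared deviation from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation of an empty dataset is undefined")
    mu = sum(values) / n
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


# Made-up data: mean is 5, squared deviations sum to 32, 32 / 8 = 4, sqrt(4) = 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))  # 2.0
```

The standard library's `statistics.pstdev` computes the same population form, and `statistics.stdev` computes the sample form.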

Interquartile Range (IQR)
The interquartile range is a measure of statistical spread, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles. It is calculated as

IQR = Q₃ − Q₁.

The IQR is also called the midspread, the middle 50%, or the H-spread.

How to compute the IQR:

1. Sort the data in ascending order.
2. Find the median (Q2).
3. Find the medians of the lower half (Q1, the 25th percentile) and the upper half (Q3, the 75th percentile).
4. The IQR is the difference between Q3 and Q1.
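The steps above can be sketched in Python as shown below. One detail the steps leave open is what to do with the median itself when the number of observations is odd; this sketch assumes it is excluded from both halves, which is one common convention. Statistical libraries may use other percentile conventions, so their quartiles can differ slightly. The function names `quartiles` and `iqr` are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)          # Step 1: sort in ascending order
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q2 = median(ordered)              # Step 2: median of the whole dataset
    lower = ordered[:half]            # Step 3: lower half ...
    # ... and upper half (assumption: the middle value is skipped when n is odd)
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), q2, median(upper)


def iqr(values):
    """Step 4: IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


print(quartiles([1, 3, 5, 7, 9, 11, 13, 15]))  # (4.0, 8.0, 12.0)
print(iqr([1, 3, 5, 7, 9, 11, 13, 15]))        # 8.0
print(iqr([1, 2, 3, 4, 5, 6, 7]))              # Q1 = 2, Q3 = 6 -> 4
```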

[Diagram: the quartiles of a sorted dataset, with Q1 = median of the lower half, Q2 = median, and Q3 = median of the upper half.]

Outliers
An outlier is an observation that lies far from the other observations. It may be due to variability in the measurement, or it may indicate experimental error.

A common rule of thumb treats a value as an outlier if it is greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR.
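The 1.5 × IQR rule can be sketched in Python as follows. The quartiles here are computed as medians of the lower and upper halves, with the middle value excluded when the count is odd; that convention is an assumption, and other quartile methods may move the cutoffs slightly. The names `iqr_fences` and `find_outliers` and the sample readings are made up for illustration.

```python
from statistics import median


def iqr_fences(values):
    """Return the (lower, upper) cutoffs Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(ordered[:half])
    # Assumption: skip the middle value when n is odd.
    q3 = median(ordered[half:] if n % 2 == 0 else ordered[half + 1:])
    spread = q3 - q1
    return q1 - 1.5 * spread, q3 + 1.5 * spread


def find_outliers(values):
    """Return the values that fall outside the 1.5 * IQR cutoffs."""
    low, high = iqr_fences(values)
    return [x for x in values if x < low or x > high]


# Invented readings: Q1 = 12.5, Q3 = 17.0, IQR = 4.5, cutoffs = (5.75, 23.75).
readings = [10, 12, 13, 14, 15, 16, 18, 60]
print(iqr_fences(readings))     # (5.75, 23.75)
print(find_outliers(readings))  # [60]
```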
