Location Parameters – Code With Prince

Knowing where the center of your sample data lies is crucial in statistics. We use measures of location, specifically the mean, median and mode, to establish that center. Measures of location summarize data with a single value. For example, given the heights of a million people, if you are told that the mean height is six feet, you already have a rough idea of where the distribution is centered.

The most commonly used location measures are the mean, the median and the mode. Which one to use depends on the scale level of the underlying variable, the distribution of the data, and whether the data contains outliers. For example, the mode suits nominal scales, the median requires at least an ordinal scale (you first sort the values in ascending order before splitting them), and the mean requires a metric scale.

Mean

The mean is simply the average of the data at hand. When we say mean in statistics, we are usually referring to the arithmetic mean.

Mathematically, the arithmetic mean (or simply the mean) is the sum of all data points divided by the number of data points.
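The definition above can be sketched in a few lines of Python. The height values are made up purely for illustration, and the variable names are my own:

```python
from statistics import mean

# Hypothetical sample of heights in centimetres (illustrative values only).
heights_cm = [160, 172, 168, 181, 175]

# Sum of all data points divided by the number of data points.
manual_mean = sum(heights_cm) / len(heights_cm)

# The standard library gives the same result.
library_mean = mean(heights_cm)

print(manual_mean, library_mean)
```

Both lines compute the same quantity; the manual version is only there to mirror the definition.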

The arithmetic mean can only be computed for metric-scale variables, for example height, age, blood pressure, temperature and net worth, just to mention a few.

The quadratic mean (also called the root mean square or RMS) is a type of average that emphasizes larger values more than the arithmetic mean does.

Formula

For a set of numbers x₁, x₂, …, xₙ, the quadratic mean is:

QM = √((x₁² + x₂² + … + xₙ²)/n)

How it works

The process is right in the name:

Square each value
Find the mean of those squares
Take the square root of that mean

Example

For the numbers 3, 4, and 5:

Square them: 9, 16, 25
Find the mean: (9 + 16 + 25)/3 = 50/3 ≈ 16.67
Take the square root: √16.67 ≈ 4.08

Compare this to the arithmetic mean of (3 + 4 + 5)/3 = 4. The quadratic mean is larger because it gives more weight to larger values.
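The three steps can be written out directly. This is a minimal sketch; the function name quadratic_mean is my own and not part of any library:

```python
import math

def quadratic_mean(values):
    """Root mean square: square, average the squares, take the square root."""
    squares = [x ** 2 for x in values]             # step 1: square each value
    mean_of_squares = sum(squares) / len(squares)  # step 2: mean of the squares
    return math.sqrt(mean_of_squares)              # step 3: square root of that mean

values = [3, 4, 5]
rms = quadratic_mean(values)
arithmetic = sum(values) / len(values)

print(round(rms, 2), arithmetic)
```

Running it on 3, 4 and 5 reproduces the hand calculation and shows the quadratic mean sitting slightly above the arithmetic mean.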

Why it matters

The quadratic mean is useful when:

Magnitudes matter more than signs — squaring eliminates negatives, so it focuses on size
You’re dealing with quantities that should be squared — like voltage, current, or velocity in physics
Larger deviations are more significant — in statistics and signal processing

For instance, electrical engineers use RMS voltage because power is proportional to voltage squared. A sinusoidal AC voltage that swings between about +170V and -170V has an RMS of about 120V, which is the DC voltage that would produce the same heating effect.
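To check the voltage figure numerically, here is a rough sketch that assumes the AC voltage is a pure sine wave with a 170V peak and samples one full cycle. The sample count is arbitrary:

```python
import math

peak_voltage = 170.0  # assumed peak of the sine wave, in volts
num_samples = 1000    # arbitrary number of evenly spaced samples over one cycle

# Sample one full cycle of the sine wave.
samples = [
    peak_voltage * math.sin(2 * math.pi * k / num_samples)
    for k in range(num_samples)
]

# Quadratic mean (RMS) of the sampled voltages.
rms_voltage = math.sqrt(sum(v ** 2 for v in samples) / num_samples)

# For a sine wave, the RMS equals the peak divided by the square root of 2.
print(round(rms_voltage, 1), round(peak_voltage / math.sqrt(2), 1))
```

Both printed numbers come out at roughly 120, which matches the figure quoted for mains voltage.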

Geometric Mean

The geometric mean is a type of average that’s particularly useful for rates, ratios, and values that multiply together.

Formula

For a set of numbers x₁, x₂, …, xₙ, the geometric mean is:

GM = ⁿ√(x₁ × x₂ × … × xₙ)

Or equivalently:

GM = (x₁ × x₂ × … × xₙ)^(1/n)

How it works

Multiply all the values together
Take the nth root (where n is how many values you have)

Example

For the numbers 2, 8, and 4:

Multiply them: 2 × 8 × 4 = 64
Take the cube root (3rd root): ∛64 = 4

Compare this to the arithmetic mean of (2 + 8 + 4)/3 ≈ 4.67. The geometric mean is smaller because it is less influenced by large values; for positive numbers it never exceeds the arithmetic mean.
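The same example in Python, as a small sketch. It assumes Python 3.8 or newer, where math.prod and statistics.geometric_mean are available:

```python
import math
from statistics import geometric_mean  # Python 3.8+

values = [2, 8, 4]

# Multiply all values together, then take the nth root.
manual_gm = math.prod(values) ** (1 / len(values))

# The standard library offers the same calculation.
library_gm = geometric_mean(values)

arithmetic = sum(values) / len(values)

print(round(manual_gm, 2), round(library_gm, 2), round(arithmetic, 2))
```

Because of floating-point rounding, the manual result may differ from 4 in the last decimal places, which is why the output is rounded.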

Why it matters

The geometric mean is the right choice when:

Dealing with growth rates or percentages — like investment returns over multiple years
Values are multiplicative rather than additive — things that compound or scale
Comparing things with different ranges — it handles ratios better than arithmetic mean

Practical example

If your investment grows 50% one year (+50%) and shrinks 20% the next (-20%):

Wrong approach: Arithmetic mean = (50% + (-20%))/2 = 15% per year
Right approach: The actual multipliers are 1.5 and 0.8
Geometric mean = √(1.5 × 0.8) = √1.2 ≈ 1.095
This means ~9.5% average growth per year, which correctly represents your actual returns

The geometric mean always gives you the true average rate of change for multiplicative processes.
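A quick way to see why the arithmetic mean is the wrong tool here is to compound both rates over the two years. The starting balance of 100 is a hypothetical value chosen for illustration:

```python
import math

yearly_returns = [0.50, -0.20]                 # +50%, then -20%
multipliers = [1 + r for r in yearly_returns]  # 1.5 and 0.8

arithmetic_rate = sum(yearly_returns) / len(yearly_returns)
geometric_rate = math.prod(multipliers) ** (1 / len(multipliers)) - 1

start = 100.0  # hypothetical starting balance
actual_end = start * math.prod(multipliers)

# Compound each "average" rate over the same two years.
end_from_geometric = start * (1 + geometric_rate) ** 2
end_from_arithmetic = start * (1 + arithmetic_rate) ** 2

print(round(actual_end, 2), round(end_from_geometric, 2), round(end_from_arithmetic, 2))
```

Compounding the geometric rate lands on the real end balance, while compounding the arithmetic rate overstates it.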

Median

When you order data in increasing order and split it in half right down the middle, the value in the middle is the median. Half of the values are lower than the median and the other half are higher. So you can think of the median as the “middle value”.

You can only calculate the median for data on an ordinal or metric scale, because you must first sort the values in ascending or descending order before splitting them into two halves. Without a ranking there is nothing to sort by, so the median cannot be computed for nominal data. Examples of data you can calculate a median on are ages, salaries, grades and number of children.

Even and Odd Dataset Sizes

When your dataset size is an odd number, the median is simply the single middle value.

When the dataset size is an even number, you get two middle values, and the mean of these two values is the median of the data.
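The odd and even cases can be sketched as follows. The helper manual_median and the sample values are my own, and the result is compared against statistics.median from the standard library:

```python
from statistics import median

def manual_median(values):
    """Sort the data, then pick the middle value or average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd size: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even size: mean of the two middle values

odd_sample = [7, 1, 5, 3, 9]   # sorted: 1, 3, 5, 7, 9
even_sample = [7, 1, 5, 3]     # sorted: 1, 3, 5, 7

print(manual_median(odd_sample), median(odd_sample))
print(manual_median(even_sample), median(even_sample))
```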

Outliers In Dataset

When you have outliers in your dataset, should you compute the mean or median?

The simple answer is the median: outliers have little to no influence (leverage) on it, whereas the mean is pulled toward any outlier in the data.

Symmetric Data

When a dataset is symmetric with a single peak, as with normally distributed data, the mean, the median and the mode will be the same value.
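A tiny symmetric, single-peaked sample illustrates this. The values are made up so that they mirror around 3:

```python
from statistics import mean, median, mode

# Hypothetical sample that is symmetric around 3 with a single peak at 3.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(mean(data), median(data), mode(data))
```

All three measures agree here; with skewed data they would drift apart.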

Mode

The mode is the number or value in the dataset with the largest frequency (most occurrences).
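Counting occurrences is all it takes to find the most frequent value. Here is a short sketch with made-up color data; multimode needs Python 3.8 or newer:

```python
from collections import Counter
from statistics import multimode  # Python 3.8+

# Hypothetical categorical data.
colors = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(colors)
value, frequency = counts.most_common(1)[0]  # most frequent value and its count

# multimode returns every value tied for the highest frequency.
tied_modes = multimode([1, 1, 2, 2, 3])

print(value, frequency, tied_modes)
```

The second call shows that a dataset can have more than one mode when several values tie.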

Comparison Of Mean, Mode and Median

Mean, mode, and median are three fundamental measures of central tendency used in statistics to describe the center or typical value of a dataset. Each has distinct characteristics that make it more or less suitable depending on the nature of the data and the presence of outliers.

Definitions

Mean (Arithmetic Average): The sum of all values divided by the number of values. Formula: x̄ = (x₁ + x₂ + … + xₙ)/n

Median: The middle value when data is arranged in order. For datasets with an even number of values, it’s the average of the two middle values.

Mode: The value that appears most frequently in the dataset. A dataset can have one mode (unimodal), multiple modes (bimodal or multimodal), or no mode at all.

Impact of Outliers

Mean: Highly sensitive to outliers. A single extreme value can drastically shift the mean away from the typical center of the data. For example, if salaries in a company are $30K, $32K, $35K, $38K, and $500K, the mean ($127K) is misleading because the CEO’s salary pulls it upward.

Median: Resistant to outliers. The median only considers the position of values, not their magnitude. In the salary example above, the median remains $35K, accurately reflecting the typical employee’s salary.

Mode: Unaffected by outliers. The mode depends only on frequency of occurrence, so extreme values have no special influence unless they happen to be the most common value.
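Using the salary figures from the example above (in thousands of dollars), a short sketch shows how much a single extreme value moves each measure:

```python
from statistics import mean, median

salaries_k = [30, 32, 35, 38, 500]  # salaries in thousands of dollars
without_outlier = salaries_k[:-1]   # same data with the 500K salary dropped

print(mean(salaries_k), median(salaries_k))
print(mean(without_outlier), median(without_outlier))
```

Dropping the largest salary changes the mean by a wide margin, while the median barely moves.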

Variable Scales and Applicability

Nominal Scale (categorical data with no order): Examples include gender, color, brand names.

Mean: Not applicable (cannot average categories)
Median: Not applicable (no meaningful ordering)
Mode: Best choice (identifies most common category)

Ordinal Scale (ranked categories): Examples include satisfaction ratings (poor, fair, good, excellent), education level, socioeconomic status.

Mean: Problematic (assumes equal intervals between ranks)
Median: Appropriate (uses position, not magnitude)
Mode: Appropriate (identifies most common rank)
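For ordinal data, one possible approach is to map each category to its rank and take the low median, which always returns an actual category instead of averaging two ranks. The responses below are invented for illustration:

```python
from statistics import median_low, mode

# Ordered categories, from lowest to highest.
levels = ["poor", "fair", "good", "excellent"]
rank = {label: i for i, label in enumerate(levels)}

# Hypothetical survey responses.
responses = ["good", "fair", "excellent", "good", "poor", "good", "fair"]

# median_low picks a value that is present in the data, so it maps back to a label.
ranks = [rank[r] for r in responses]
median_label = levels[median_low(ranks)]
mode_label = mode(responses)

print(median_label, mode_label)
```

Only the order of the categories is used here, never the distance between them, which is why the median is safe on an ordinal scale.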

Interval/Ratio Scale (metric data with meaningful numerical differences): Examples include temperature, height, weight, income, test scores.

Mean: Fully appropriate (utilizes all numerical information)
Median: Appropriate (though doesn’t use all information)
Mode: Less informative (ignores most data characteristics)

When to Use Each Measure

Use the Mean when:

Data is measured on an interval or ratio scale
The distribution is relatively symmetric with no extreme outliers
You need to use all the data in your calculation
You plan to perform further statistical analyses (the mean has valuable mathematical properties)
Examples: average test scores in a class, mean daily temperature, average reaction time in an experiment

Use the Median when:

Data contains outliers or is skewed
You want a measure resistant to extreme values
Data is on an ordinal scale
You’re dealing with open-ended distributions (like income where the highest category is “$100,000+”)
Examples: median household income, median home prices, median recovery time for patients

Use the Mode when:

Data is nominal (categorical)
You want to identify the most typical or popular category
Data is discrete with repeated values
Multiple modes might reveal interesting patterns (bimodal distributions suggest two distinct groups)
Examples: most common shoe size sold, most frequent customer complaint, most popular product color

Practical Example

Consider a dataset of house prices in a neighborhood: $150K, $155K, $160K, $165K, $170K, $175K, $2.5M

Mean: $496K (misleading due to mansion)
Median: $165K (best representation of typical home)
Mode: None (each value appears once)

In this case, the median provides the most accurate picture of what a typical house costs in the neighborhood. The mean is inflated by the single luxury property, making it a poor choice for describing central tendency.
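The three measures for the house price list can be computed as follows. Prices are in thousands of dollars, and the has_mode check is my own convention for "no value repeats":

```python
from collections import Counter
from statistics import mean, median

prices_k = [150, 155, 160, 165, 170, 175, 2500]  # prices in thousands of dollars

mean_price = mean(prices_k)      # about 496.4
median_price = median(prices_k)  # 165

# Treat the data as having no mode when every value appears exactly once.
counts = Counter(prices_k)
has_mode = max(counts.values()) > 1

print(round(mean_price, 1), median_price, has_mode)
```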

Conclusion

Congratulations on making it to the end!

Happy coding! And see you next time, the world keeps spinning.
