Outlier Detection
Outlier Detection identifies and visualizes
outliers in continuous columns of a dataset using
the Interquartile Range (IQR) method.
Formula
The Interquartile Range (IQR) is defined as:
\[
\text{IQR} = Q3 - Q1
\]
Where:
- Q1: First Quartile (25th percentile).
- Q3: Third Quartile (75th percentile).
The bounds for outlier detection are:
\[
\text{Lower Bound} = Q1 - 1.5 \times \text{IQR}, \quad
\text{Upper Bound} = Q3 + 1.5 \times \text{IQR}
\]
Outliers are values that fall outside the range defined
by the Lower Bound and Upper
Bound.
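The bounds above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the helper names `iqr_bounds` and `find_outliers` and the `sample` list are hypothetical, and the quartiles are assumed to use linear interpolation between sorted data points (the standard library's `method="inclusive"`).

```python
from statistics import quantiles  # available in Python 3.8+


def iqr_bounds(values, k=1.5):
    """Return (lower_bound, upper_bound) for the IQR rule.

    Assumption: quartiles are interpolated linearly between sorted
    data points (method="inclusive"). Needs at least two values.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying strictly outside the IQR bounds."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample column, used only to exercise the helpers.
sample = [4, 5, 5, 6, 7, 8, 30]
print(iqr_bounds(sample))     # (1.25, 11.25)
print(find_outliers(sample))  # [30]
```

The multiplier `k` is exposed as a parameter only for convenience; the formulas above fix it at 1.5.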
What It Means
- Low Outlier Presence: Most values fall within
the bounds, so the column contains few extreme values.
- High Outlier Presence: Many values deviate
strongly from the bulk of the data, which might require
further cleaning or analysis.
Example Dataset
Suppose we have a dataset with the following column
values:
Index | Value
1 | 10
2 | 15
3 | 200
4 | 18
5 | 12
Calculation
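Sorted, the values are 10, 12, 15, 18, 200. Using linear
interpolation between data points, Q1 = 12 and Q3 = 18, so
IQR = 18 - 12 = 6.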
\[
\text{Lower Bound} = Q1 - 1.5 \times \text{IQR} = 12 -
1.5 \times 6 = 3
\]
\[
\text{Upper Bound} = Q3 + 1.5 \times \text{IQR} = 18 +
1.5 \times 6 = 27
\]
Outliers:
Values outside the range [3, 27] are outliers. In this
case, 200 is an outlier.
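The worked example can be checked with a short script. This is a sketch under one assumption: the quartiles come from linear interpolation between sorted data points, which the standard library calls `method="inclusive"`. The helper name `bounds` is hypothetical. The second call shows how much the choice of quartile method matters on a five-value sample.

```python
from statistics import quantiles  # available in Python 3.8+

values = [10, 15, 200, 18, 12]


def bounds(data, method):
    """Return (Q1, Q3, lower bound, upper bound) for a quartile method."""
    q1, _, q3 = quantiles(data, n=4, method=method)
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Linear interpolation between data points, matching the calculation above.
q1, q3, lower, upper = bounds(values, "inclusive")
outliers = [v for v in values if v < lower or v > upper]
print(q1, q3, lower, upper, outliers)  # 12.0 18.0 3.0 27.0 [200]

# A different quartile convention, shown only for comparison.
q1_ex, q3_ex, lower_ex, upper_ex = bounds(values, "exclusive")
outliers_ex = [v for v in values if v < lower_ex or v > upper_ex]
print(q1_ex, q3_ex, lower_ex, upper_ex, outliers_ex)
```

With the `"exclusive"` convention, Q3 is interpolated between 18 and 200, the upper bound rises above 200, and no value is flagged. On small samples the result therefore depends heavily on which quartile definition is used.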
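Boxplot:
The function would generate a boxplot with an upper bound
of 27 and highlight 200 as an outlier. Many plotting
libraries draw the whisker to the largest data point within
the bound (here 18) rather than to the bound itself.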
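The numbers a boxplot would draw for this example can be sketched without a plotting library. The helper `boxplot_stats` is hypothetical and follows the common convention (used by matplotlib, for instance) in which each whisker ends at the most extreme data point inside the bounds rather than at the bound itself. The tool's own plotting function may differ.

```python
from statistics import quantiles  # available in Python 3.8+


def boxplot_stats(values, k=1.5):
    """Compute the box, whisker ends, and outlier points for a boxplot.

    Assumptions: quartiles use linear interpolation ("inclusive"),
    and whiskers stop at the most extreme data points within the bounds.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower <= v <= upper]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "fliers": sorted(v for v in values if v < lower or v > upper),
    }


stats = boxplot_stats([10, 15, 200, 18, 12])
print(stats)
```

Under these assumptions the upper whisker ends at 18, which coincides with Q3, and 200 is the only point drawn separately as an outlier.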