Outlier detection techniques (Python) | How to avoid outliers without deleting them

Published: 29 December 2020
Channel: Coder's Digest

Discover advanced outlier detection techniques in Python and learn how to handle outliers without deleting them. This comprehensive tutorial covers methods such as Isolation Forest, Local Outlier Factor, and more.

🔴 *Timestamps:*
00:00 Introduction
01:30 What are Outliers?
03:00 Using Box Plots for Detection
05:00 Isolation Forest Technique
07:30 Local Outlier Factor (LOF)
10:00 Avoiding Deletion of Outliers
12:00 Real-World Examples

🔗 *Resources:*
[Python Documentation](https://docs.python.org/3/)
[Pandas Documentation](https://pandas.pydata.org/pandas-docs...)
[Matplotlib Documentation](https://matplotlib.org/stable/content...)

👍 *Like and Subscribe:* If you found this video helpful, please give it a thumbs up and subscribe for more tutorials!


We will discuss outlier detection techniques in data mining and ways to treat outliers effectively using the interquartile range.
We will also discuss how to handle outliers without deleting them.
What is an outlier?
In simple terms, an outlier is an unusual observation that stands far apart from the rest of the observations and can significantly shift statistics such as the sample mean. We will plot Q-Q plots and histograms to visualize outliers.

Because of outliers, our analysis and understanding of the data can differ completely from reality, giving an incorrect or misleading representation.

For example, suppose the salaries of 5 individuals are as follows:
10000, 12000, 9500, 8800, 1000000
We can see that the salary of the 5th individual is far higher than the rest. If we include it, the mean salary works out to 208,060, which is far above what four of the five individuals actually earn.
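To make the effect concrete, here is a minimal sketch that compares the mean and the median of the five salaries above. It uses only Python's standard `statistics` module; the variable names are illustrative and not taken from the video.

```python
from statistics import mean, median

# Salaries from the example above; the last value is the outlier.
salaries = [10000, 12000, 9500, 8800, 1000000]

mean_salary = mean(salaries)      # pulled far upward by the single extreme value
median_salary = median(salaries)  # middle value, barely affected by the outlier

print("mean:", mean_salary)
print("median:", median_salary)
```

The gap between the two numbers is a quick hint that something extreme is sitting in the data.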

There are multiple statistical approaches for detecting outliers, such as z-scores and proximity-based models, but for this demonstration we will use a more convenient and widely followed approach and identify them using histograms and box plots.

In this demo we will follow the IQR approach to filter and deal with outliers. An observation is treated as an outlier if it falls below the lower limit, Q1 - 1.5 * IQR, or above the upper limit, Q3 + 1.5 * IQR.
These terms are defined as follows:


Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1
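The limits defined above can be sketched in a few lines of Python. This is a rough illustration under some assumptions, not necessarily the code shown in the video: the helper names `iqr_limits` and `cap_outliers` are hypothetical, quartiles are computed with the standard library's `statistics.quantiles` using `method="inclusive"` (other tools may use slightly different quartile conventions), and capping values at the limits is just one common way to keep outliers in the data instead of deleting them.

```python
from statistics import quantiles

def iqr_limits(values, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles, interpolated linearly between
    # the sorted values and including the minimum and maximum.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(values, k=1.5):
    """Clip values to the IQR limits instead of dropping them."""
    lower, upper = iqr_limits(values, k)
    return [min(max(v, lower), upper) for v in values]

salaries = [10000, 12000, 9500, 8800, 1000000]
lower, upper = iqr_limits(salaries)

# Anything outside the limits is flagged as an outlier.
outliers = [v for v in salaries if v < lower or v > upper]

# Same number of rows as before; only the extreme value is pulled in.
capped = cap_outliers(salaries)

print("limits:", lower, upper)
print("outliers:", outliers)
print("capped:", capped)
```

Capping keeps every row, which matters when the other columns of that record are still useful. Replacing the extreme value with the median would be another option under the same idea.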