Enemies of Data Scientists! Part 1
Hello peeps! Welcome to my new blog!
Ever wondered if data scientists have enemies?
Well, yes! They do (not enemies in the sense of people, though). In fact, when a dataset is handed to a data scientist or analyst, the first thing to do is look for these enemies.
So which enemies am I talking about, exactly? Keep reading.
Any given dataset may have two major problems (which I am calling enemies):
1. Missing Values
2. Outliers
One cannot move on to analyzing the data without treating these enemies. But what exactly are they? How do we treat them? I answer these questions in this blog.
Missing Values
How to find missing values in Python?
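One common way to do this is with pandas. Here is a minimal sketch, assuming the data is already loaded into a pandas DataFrame; the toy data and the column names `age` and `city` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Pune", "Delhi", None, "Mumbai"],
})

# Boolean table: True wherever a value is missing (NaN or None)
missing_mask = df.isnull()

# Number of missing values in each column
missing_per_column = df.isnull().sum()

# Index labels of the rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)].index.tolist()

print(missing_per_column)
print(rows_with_missing)
```

`isnull().sum()` gives a quick per-column count, while the row labels tell you where the missing values sit.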
How to treat these missing values?
Missing values have to be treated before we can move on to the next step. There are many ways of treating them.
1. By removing the missing values from the main data
When the number of missing values is very small, we can simply remove the rows that contain them from the data.
Dropping a row by its index deletes only that row; the rest of the data stays untouched.
'index_number' is the location (index label) of the row with the missing data.
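The original snippet for this step is not shown here, so the following is only a sketch of how such a removal might look in pandas. The toy data and column names are assumptions; `index_number` is used the way the text describes it, as the index label of the row to delete.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "salary": [30000, 42000, 38000, 51000],
})

# Index label of the row that holds the missing value (row 1 in this toy data)
index_number = 1

# Drop only that row; drop() returns a new DataFrame and leaves df unchanged
df_dropped = df.drop(index=index_number)

# Alternative: drop every row that has at least one missing value
df_no_missing = df.dropna()

print(df_dropped)
```

In this toy example both approaches give the same result, because only one row has a missing value. `dropna()` is handy when you do not want to look up the index labels yourself.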
2. By replacing the missing values
a. If the data is continuous
Replacing with mean
- When the values of that variable vary very little.
- When the variable is normally distributed.
Replacing with median
- When the values of the variable vary a lot.
- When the data is skewed (either right-skewed or left-skewed).
b. If the data is categorical
Code for filling missing data in Python
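Here is a minimal sketch of these replacements using pandas `fillna`. The toy data and column names are made up. Filling the categorical column with its most frequent value (the mode) is a common choice and is an assumption here, since the post does not name a method for categorical data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "height_cm": [160.0, 162.0, np.nan, 161.0],  # little variation
    "income": [20.0, 25.0, np.nan, 300.0],       # skewed by one large value
    "city": ["Pune", None, "Pune", "Delhi"],     # categorical
})

# Continuous, little variation: replace with the mean
df["height_cm"] = df["height_cm"].fillna(df["height_cm"].mean())

# Continuous, skewed: replace with the median
df["income"] = df["income"].fillna(df["income"].median())

# Categorical: replace with the most frequent value (assumed choice: the mode)
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```

Notice how the mean of `income` (115) would be pulled up by the single large value, while the median (25) stays close to the typical values. That is why the median is the safer pick for skewed data.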

3. If more than 50% of a column's values are missing, simply drop that column from the analysis
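One way to apply this rule in pandas is sketched below. The toy data and column names are made up, and the 0.5 cutoff simply mirrors the 50% mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "notes": [np.nan, np.nan, np.nan, "ok"],
})

# Fraction of missing values in each column (the mean of True/False values)
missing_fraction = df.isnull().mean()

# Columns where more than 50% of the values are missing
cols_to_drop = missing_fraction[missing_fraction > 0.5].index

# Drop those columns; the original df is left unchanged
df_reduced = df.drop(columns=cols_to_drop)

print(missing_fraction)
print(df_reduced.columns.tolist())
```

Here `notes` is 75% missing and gets dropped, while `age` is only 25% missing and stays.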
Hope this blog was helpful!
Remember: "Every learner was once a beginner."