Normalization and Standardization Use Case
Case study:
We have a used cars dataset obtained from a website.
- This dataset contains information about used cars.
- This data can be used for many purposes, such as price prediction, which illustrates the use of linear regression in Machine Learning.
- The columns in the given dataset are as follows: name, year, selling_price, km_driven, fuel, distance, seller_type, transmission, Owner
For a used motorcycle dataset, see https://www.kaggle.com/nehalbirla/motorcycle-dataset
Here, using the above features, we want to predict the selling price of cars. The features km_driven and distance are on very different scales, so if we feed them to a model as they are, the prediction may go wrong because the slopes (coefficients) get interpreted incorrectly.
To overcome this, we scale these features down to values between 0 and 1.
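Min-max normalization rescales each feature column separately, so that the column's smallest value becomes 0 and its largest becomes 1:

```latex
x' = \frac{x - \min(x)}{\max(x) - \min(x)}
```

As a purely hypothetical illustration (these numbers are not from the dataset): if km_driven ranged from 1,000 to 101,000, a car with 51,000 km would be scaled to (51000 - 1000) / (101000 - 1000) = 0.5.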
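import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# df is assumed to hold the used cars data (e.g. loaded with pd.read_csv)
features = df[['distance', 'km_driven']]
Minscaler = MinMaxScaler()
scaler = Minscaler.fit(features)   # learn each column's min and max
scaler.data_min_                   # per-column minimum seen during fit
scaler.data_max_                   # per-column maximum seen during fit
X_scaled = pd.DataFrame(scaler.transform(features), columns=features.columns)
X_scaled.describe()                # min should be 0 and max 1 for both columns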
Here, scikit-learn's sklearn.preprocessing module provides the MinMaxScaler class. We create an instance of MinMaxScaler called Minscaler and fit it to the distance and km_driven features. Fitting learns each column's minimum and maximum (stored in data_min_ and data_max_), and transform then converts every value to the range 0 to 1 by applying the normalization formula.
Finally, we validate the result with the describe function, checking that each scaled column has a minimum of 0 and a maximum of 1.
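The steps above can be tried end to end without the real dataset. The sketch below uses a small, made-up DataFrame (the values are hypothetical placeholders, not taken from the used cars data) with the same two column names. It fits MinMaxScaler, transforms the columns, and confirms that the result matches the min-max formula computed directly with pandas.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy rows standing in for the used cars data (values are made up)
df = pd.DataFrame({
    "distance": [10.0, 20.0, 30.0, 50.0],
    "km_driven": [1000.0, 51000.0, 101000.0, 76000.0],
})

features = df[["distance", "km_driven"]]

# Fit learns the per-column min and max; transform applies the scaling
scaler = MinMaxScaler().fit(features)
X_scaled = pd.DataFrame(scaler.transform(features), columns=features.columns)

# The same result computed directly from the min-max formula
manual = (features - features.min()) / (features.max() - features.min())

print(scaler.data_min_)
print(scaler.data_max_)
print(X_scaled.describe().loc[["min", "max"]])
print(np.allclose(X_scaled.to_numpy(), manual.to_numpy()))  # True
```

Because transform reuses the minimum and maximum learned during fit, values in new data that fall outside the fitted range will map below 0 or above 1.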