Normalization and Standardization Use Case

September 14, 2021

Case study:

We have a used cars dataset from the website.

This dataset contains information about used cars.
This data can be used for many purposes, such as price prediction, which is a good way to demonstrate linear regression in machine learning.
The columns in the given dataset are as follows: name, year, selling_price, km_driven, fuel, distance, seller_type, transmission, Owner
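The price-prediction task might be set up roughly as follows. This is a minimal sketch, assuming we keep only the two numeric columns km_driven and distance as inputs. The rows are made-up numbers used only for illustration, not values from the dataset. The MinMaxScaler used later in this post is placed inside a scikit-learn Pipeline so that its min and max are learned only from the data passed to fit().

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy rows; the real dataset has more columns and many more rows.
df = pd.DataFrame({
    "km_driven": [70000, 50000, 100000, 46000, 141000, 125000],
    "distance": [12.0, 35.5, 8.2, 20.1, 15.7, 30.3],
    "selling_price": [60000, 135000, 600000, 250000, 450000, 140000],
})

X = df[["km_driven", "distance"]]
y = df["selling_price"]

# Scaling followed by linear regression; fit() first fits the scaler,
# then trains the regression on the scaled features.
model = make_pipeline(MinMaxScaler(), LinearRegression())
model.fit(X, y)

# The pipeline rescales new inputs with the stored min/max before predicting.
print(model.predict(X.head(2)))
```

A practical benefit of the pipeline is that any later prediction call reuses the same scaling automatically. You do not need to remember to transform new data by hand.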

For a used motorcycle dataset, see https://www.kaggle.com/nehalbirla/motorcycle-dataset

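Here we want to use the above features to predict the selling price of a car. The features km_driven and distance are on very different scales. If we feed them into a model as-is, the fitted slopes (coefficients) cannot be compared across features. Scale-sensitive models, such as those trained with gradient descent or with regularization, may also give poor predictions.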
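The point about slopes can be made concrete with a small sketch. It uses synthetic data, not the car dataset. We fit an ordinary least-squares line of price on km_driven, first measured in km and then in thousands of km. The relationship is identical, but the slope changes by a factor of 1000. This is why raw slopes of features on different scales should not be compared directly. The helper fit_slope is our own, written only for this illustration. For plain least squares, the intercept and the predictions do not change. Scale matters more for gradient-based or regularized models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: price falls by about 2 per km driven, plus noise.
km = rng.uniform(10_000, 150_000, size=200)
price = 500_000 - 2.0 * km + rng.normal(0, 5_000, size=200)

def fit_slope(x, y):
    """Ordinary least squares with an intercept; returns (intercept, slope)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

b_km, slope_km = fit_slope(km, price)                      # km_driven in km
b_thousand_km, slope_thousand_km = fit_slope(km / 1000, price)  # in thousands of km

print(slope_km)           # close to -2
print(slope_thousand_km)  # close to -2000: same relationship, 1000x the slope
```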

To overcome this, we rescale these features to values between 0 and 1 (min-max normalization).
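The normalization formula is x_scaled = (x - min) / (max - min), applied to each feature separately. It can be written out by hand in a few lines. The function name min_max_scale is hypothetical, and the km_driven values below are made up. Mapping a constant column to 0 is a choice made in this sketch to avoid dividing by zero.

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D array to [0, 1] with x' = (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant column: avoid division by zero and map everything to 0.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

km_driven = [70000, 50000, 100000, 46000, 141000]  # hypothetical values
scaled_km = min_max_scale(km_driven)
print(scaled_km)  # the smallest value maps to 0.0 and the largest to 1.0
```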

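import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# df is the used-cars DataFrame loaded earlier (loading step not shown)
features = ['distance', 'km_driven']

Minscaler = MinMaxScaler()
scaler = Minscaler.fit(df[features])   # learns each column's min and max
scaler.data_min_                       # per-column minimum seen during fit
scaler.data_max_                       # per-column maximum seen during fit
X_scaled = pd.DataFrame(scaler.transform(df[features]), columns=features)
X_scaled.describe()                    # min should be 0 and max 1 for each column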

Here, scikit-learn's sklearn.preprocessing module provides the MinMaxScaler class. We create an
instance of MinMaxScaler called Minscaler and fit it to the distance and km_driven features. Fitting
learns the minimum and maximum of each column. Transforming then maps every value into the range 0
to 1 using the min-max normalization formula x_scaled = (x - min) / (max - min).
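The fit/transform split can be seen on a few hypothetical rows. The sketch below shows that fit() only stores data_min_ and data_max_, and that transform() matches the normalization formula computed by hand. It also shows that data outside the training range is not clipped to [0, 1] by default. The column values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training rows for the two features being scaled.
train = pd.DataFrame({
    "distance": [12.0, 35.5, 8.2, 20.1],
    "km_driven": [70000, 50000, 100000, 46000],
})

scaler = MinMaxScaler()            # default feature_range=(0, 1)
scaler.fit(train)                  # learns the per-column min and max only
print(scaler.data_min_)            # per-column minimums
print(scaler.data_max_)            # per-column maximums

scaled = scaler.transform(train)   # applies (x - min) / (max - min) per column

# The same result computed by hand with the normalization formula.
manual = (train - train.min()) / (train.max() - train.min())
assert np.allclose(scaled, manual.to_numpy())

# A new row outside the training range is NOT clipped to [0, 1].
new_row = pd.DataFrame({"distance": [40.0], "km_driven": [120000]})
print(scaler.transform(new_row))   # both values are greater than 1
```

If values must stay inside the range, MinMaxScaler also accepts clip=True. Otherwise, out-of-range inputs are a useful signal that new data differs from the training data.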

Finally, we verify the result with the describe function: each scaled column should have a minimum of 0 and a maximum of 1.
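The describe() check might look like the sketch below, on hypothetical values. Because the title also mentions standardization, the sketch additionally runs scikit-learn's StandardScaler for comparison. StandardScaler rescales each column to mean 0 and unit standard deviation instead of the range 0 to 1. Note that describe() reports the sample standard deviation (ddof=1), while StandardScaler divides by the population standard deviation (ddof=0). For that reason, the std row does not show exactly 1.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical values for the two features.
X = pd.DataFrame({
    "distance": [12.0, 35.5, 8.2, 20.1, 15.7],
    "km_driven": [70000, 50000, 100000, 46000, 141000],
})

# Min-max normalization: wrap the NumPy output back into a DataFrame
# so describe() can report per-column statistics.
X_scaled = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)
summary = X_scaled.describe()
print(summary.loc[["min", "max"]])  # about 0 and 1 for every column

# Standardization for comparison: mean 0, population std 1 per column.
X_std = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
print(X_std.describe().loc[["mean", "std"]])
# With 5 rows, describe() shows a std of about 1.118 (sqrt(5/4)) here,
# because it uses ddof=1 while StandardScaler uses ddof=0.
```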

Data Science Thoughts
