Normalization and Standardization Use Case

Case study:

We have a used cars dataset from the website.

  • This dataset contains information about used cars.
  • This data can be used for many purposes, such as price prediction, which makes it a good example of linear regression in machine learning.
  • The columns in the given dataset are as follows: name, year, selling_price, km_driven, fuel, distance, seller_type, transmission, Owner.

For used motorcycle datasets please go to https://www.kaggle.com/nehalbirla/motorcycle-dataset


Using the above features, we want to predict the selling price of cars. The features km_driven and distance are on very different scales. If we load them into a model as they are, the learned slopes (coefficients) are hard to compare and easy to misinterpret, and the predictions of scale-sensitive models may suffer.

To overcome this, we will rescale these features to normalized values between 0 and 1.
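
Min-max normalization maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The snippet below is a minimal hand-written sketch of that formula. The helper name min_max_scale and the sample km_driven values are made up for illustration and are not taken from the dataset.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1] with min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0 (a choice made here)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample values, not taken from the dataset
km_driven = [10000, 35000, 60000, 110000]
scaled = min_max_scale(km_driven)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```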


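import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# df is assumed to be the used cars dataset, already loaded as a pandas DataFrame
cols = ['distance', 'km_driven']
Minscaler = MinMaxScaler()
scaler = Minscaler.fit(df[cols])    # learn the minimum and maximum of each column
print(scaler.data_min_)             # per-column minimum seen during fit
print(scaler.data_max_)             # per-column maximum seen during fit
# transform returns a NumPy array, so wrap it in a DataFrame to use describe()
X_scaled = pd.DataFrame(scaler.transform(df[cols]), columns=cols)
X_scaled.describe()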

scikit-learn provides a class called MinMaxScaler in its sklearn.preprocessing module.
We create an instance of MinMaxScaler named Minscaler and fit it to the distance and
km_driven features. Fitting learns each column's minimum and maximum; transforming the
data then maps every value into the range 0 to 1 using the normalization formula
(x - min) / (max - min).

Then verify the result with the describe function: each scaled column should have a minimum of 0 and a maximum of 1.
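
To try the whole flow without the real dataset, here is a small self-contained sketch. The four rows are made-up stand-ins for the used cars data, and only the column names km_driven and distance come from the post. It uses fit_transform, which combines the fit and transform steps in one call.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up rows standing in for the used cars dataset
df = pd.DataFrame({
    "km_driven": [10000, 35000, 60000, 110000],
    "distance": [12.0, 48.5, 75.0, 150.0],
})

cols = ["distance", "km_driven"]
scaler = MinMaxScaler()

# fit_transform learns each column's min and max, then rescales in one step
X_scaled = pd.DataFrame(
    scaler.fit_transform(df[cols]), columns=cols, index=df.index
)

print(scaler.data_min_)   # original per-column minimums
print(scaler.data_max_)   # original per-column maximums
print(X_scaled.describe().loc[["min", "max"]])  # should be close to 0 and 1
```

Because of floating-point rounding, the scaled maximum can differ from 1 in the last decimal place, so compare with a small tolerance rather than with exact equality.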
