Normalization and Standardization (Train Test Split)

September 04, 2021

1. When should you use standardization (StandardScaler) and min-max normalization (MinMaxScaler)?

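Scaling matters whenever we use a machine learning algorithm that relies on Euclidean distance or is trained with gradient descent. In a distance calculation, the feature with the largest range dominates the result. Gradient descent searches a bowl-shaped (parabola-like) loss curve for its minimum point, and it reaches that point more reliably when the features are on comparable scales. Standardization rescales a feature to zero mean and unit standard deviation, while min-max normalization rescales it to the range 0 to 1. Most of the algorithms we use benefit from one of the two.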
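As a rough illustration of the two rescalings, here is a minimal sketch in plain Python. The function names `standardize` and `min_max_scale` and the sample heights are made up for this example; they are not from any library.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical (sigma > 0)
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature column (heights in cm), used only for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

z_scores = standardize(heights_cm)   # centered on 0, spread of 1
scaled = min_max_scale(heights_cm)   # squeezed into the range 0 to 1

print(z_scores)
print(scaled)
```

Both outputs carry the same ordering as the raw heights; only the units change, which is what lets distance-based and gradient-based methods treat features evenly.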

2. Should we perform normalization and standardization before or after the train-test split of the dataset?

First, we divide the complete dataset into train and test datasets. The train data is used to train the model. The test data is then given to the model to check its accuracy before the model is exposed to truly unseen data.
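The split itself can be sketched in a few lines of plain Python. The helper name `split_train_test`, the 20% test fraction, and the fixed seed are assumptions chosen for this illustration, not something the post prescribes.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out a fraction of them as the test set."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: ten rows, represented here by their indices.
rows = list(range(10))
train_rows, test_rows = split_train_test(rows)

print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two sets, so the test rows stay unseen during training.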

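If we normalize or standardize the entire dataset before the train-test split, the scaling values (mean, standard deviation, minimum, maximum) are calculated from the test rows as well as the train rows. The slopes the model learns are then expressed on a scale that already contains information from the test data, so interpreting those slopes, and the test accuracy that comes with them, is misleading.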

So first split the data, then fit the normalization or standardization on the train data and interpret the calculated slopes, apply the same transform to the test data, and calculate the accuracy.

Here we call the fit function only once, on the train data. The properties learned in that fit (for example the mean and standard deviation, or the minimum and maximum) are reused to transform the test dataset, so that we can test how well our model predicts.
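The fit-once, transform-twice pattern can be sketched as follows. `SimpleStandardScaler` is a hypothetical stand-in written for this example; scikit-learn's `StandardScaler` follows the same `fit` / `transform` pattern, and the numbers are invented.

```python
from statistics import mean, pstdev

class SimpleStandardScaler:
    """Illustrative scaler: learns mean and std in fit(), reuses them in transform()."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values)  # assumes the train values are not all identical
        return self

    def transform(self, values):
        # Uses only the statistics stored by fit(); nothing is recomputed here.
        return [(v - self.mean_) / self.std_ for v in values]

# Hypothetical feature values after the split.
train = [10.0, 20.0, 30.0, 40.0, 50.0]
test = [25.0, 60.0]

scaler = SimpleStandardScaler().fit(train)   # fit on the train data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)         # same mean and std as the train data

print(scaler.mean_, scaler.std_)
print(test_scaled)
```

Note that a test value can land outside the range seen in training (here 60.0 is larger than any train value). That is expected, because the test data had no influence on the fitted statistics.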

So when deploying the model, we must make sure that unseen input data is scaled with the same values that were fitted on the train data before it is passed to the model.
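One way to carry the fitted values into deployment is to store them next to the model and reload them at prediction time. The sketch below is only an assumption about how that could look: it keeps the mean and standard deviation as a JSON string, and the function name `scale_for_inference` is hypothetical.

```python
import json
from statistics import mean, pstdev

# At training time: compute the scaling values from the train data only.
train = [10.0, 20.0, 30.0, 40.0, 50.0]   # hypothetical train feature
params = {"mean": mean(train), "std": pstdev(train)}

# Serialize the values; in practice this string would be written to a file
# that ships together with the trained model.
saved = json.dumps(params)

def scale_for_inference(raw_value, saved_params):
    """Scale one unseen input with the stored train-time mean and std."""
    p = json.loads(saved_params)
    return (raw_value - p["mean"]) / p["std"]

# At prediction time: scale the incoming value before passing it to the model.
new_input_scaled = scale_for_inference(30.0, saved)
print(new_input_scaled)
```

The incoming value is never used to recompute the mean or standard deviation; it is only transformed with the stored ones.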
