
Normalization and Standardization Use Case

Case study: We have a used cars dataset from the website. This dataset contains information about used cars and can be used for many purposes, such as price prediction to exemplify the use of linear regression in machine learning. The columns in the given dataset are as follows: name, year, selling_price, km_driven, fuel, distance, seller_type, transmission, owner. For used motorcycle datasets please go to https://www.kaggle.com/nehalbirla/motorcycle-dataset. Using the above features we should predict the selling price of the cars. The features km_driven and distance are on different scales, so if we load them into a model as they are, the prediction may go wrong due to a wrong interpretation of the slopes. To overcome this we will scale these features down to values between 0 and 1. from sklearn.preprocessing import MinMaxScaler Minscaler = MinMaxScaler() scaler = Minscaler.fit( 'distance', 'km_driven' ) scaler.data_min_ scaler...
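
A minimal sketch of that scaling step, assuming the dataset has been loaded into a pandas DataFrame named cars (the file name and variable name are only for illustration; the column names follow the post). Note that in scikit-learn, fit expects the column values themselves (a 2-D array or DataFrame), not the column name strings:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical load of the used cars data; the file name is an assumption.
cars = pd.read_csv("used_cars.csv")

# MinMaxScaler rescales each selected column to the range 0 to 1.
scaler = MinMaxScaler()

# fit() learns the minimum and maximum of each column.
scaler.fit(cars[["km_driven", "distance"]])

print(scaler.data_min_)   # smallest value seen in each column
print(scaler.data_max_)   # largest value seen in each column

# transform() applies (x - min) / (max - min) to every value.
cars[["km_driven", "distance"]] = scaler.transform(cars[["km_driven", "distance"]])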

Normalization and Standardization

Suppose you have a use case; the most important thing for that use case is data. Initially you will collect the data, and the collected data will have many features. Those features may contain independent features and a dependent feature, and with the help of the independent features we try to predict the dependent feature in supervised machine learning. When you consider these features, each one has two important properties: 1. Unit  2. Magnitude. Let's take features like a person's age, height, weight, etc. If I consider the feature age, the unit is basically the number of years and the magnitude is basically the value. For example, if I say 25 years, then 25 is the magnitude and years is the unit. Each feature is measured with its own unit and magnitude, so if you have many features they will be computed with different units, and this unit and magnitude vary between different features. So it is very necessary that for the machine learning algorithm, the data we provide, we should try t...
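
A tiny sketch of how the two common scalers treat such differently scaled features (the age, height and weight values below are made up purely for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix: columns are age in years, height in cm, weight in kg,
# i.e. three different units and magnitudes.
X = np.array([
    [25, 170, 68],
    [32, 158, 54],
    [47, 181, 90],
])

# Normalization (min-max scaling): each column is squeezed into the range 0 to 1.
print(MinMaxScaler().fit_transform(X))

# Standardization: each column is rescaled to zero mean and unit variance.
print(StandardScaler().fit_transform(X))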

Normalization and Standardization (Train Test Split)

1. When should you use standard normalization and MinMaxScaler?

In most scenarios, whenever we use a machine learning algorithm that involves Euclidean distance or gradient descent (basically a parabola-shaped cost curve where you look for the best minimal point), we need to scale down the values in order to reach that point; for most of these algorithms we use normalization.

2. Should we perform Normalization and Standardization before the Train Test Split or after the Train Test Split of the dataset?

Firstly we divide our complete dataset into train and test datasets. Train data is used to train our model; test data will be given to our model to test the model's accuracy before passing unseen data. If we perform normalization and standardization on our entire dataset before the train test split, we will face issues with interpreting the available slopes, because the slopes are calculated on different units or scalings, so it is wrong to interpret the slopes of o...
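
A minimal sketch of that order of operations, assuming the used cars data from the first post (the file name and column choice are only illustrative): the dataset is split first, the scaler is fitted on the training portion only, and the same fitted scaler is then applied to the test portion.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical DataFrame of the used cars data; names are assumptions for illustration.
cars = pd.read_csv("used_cars.csv")
X = cars[["km_driven", "distance"]]
y = cars["selling_price"]

# 1. Split first, so the test set stays unseen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Fit the scaler on the training data only ...
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. ... then apply the same learned min/max to the test data.
X_test_scaled = scaler.transform(X_test)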