Data Science Thoughts

There are various pipelines for the machine learning use cases: Data collection, Feature engineering, Feature selection, Model creation, and Model deployment. So always remember before model creation what we do is whenever we have a dataset suppose I have 1000 records we usually perform a train test split saying that Training set has 70% of data and Test set have 30% or 80% train set and 20% test set depends on the count of the dataset. so our model will use 70% of the data to only train the model itself and the remaining 30% we will use to check the accuracy so when we do train test split 70% of data will randomly select and 30% also randomly select so when this kind of random selection happens so the type of data present in the test may not present in train set due to this our model accuracy go down. so whenever we use train test split we usually use random state, It will randomly select the data point so when we take random state=0 then it will shuffle our data and provide accu...

Search This Blog

Data Science Thoughts

Posts

Machine Learning Cross Validation