Thread
We can't use all of our data for model training: without held-out data to test on, we have no way to detect overfitting.
We could of course split off a test set randomly, but there is a better option:
Cross Validation.
1/5
The steps Cross Validation takes:
1. Divide the data into groups (folds).
2. Iterate through the groups. In each iteration:
- Use all but one group as training data.
- Use the remaining group as testing data.
Let's see an example!
2/5
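The steps above can be sketched in a few lines of plain Python. This is a minimal illustration on a toy dataset of 9 samples split into 3 equal groups (the names `data`, `groups`, and `splits` are just for this example):

```python
# Toy stand-in for a real dataset: 9 samples, split into k = 3 groups.
data = list(range(9))
k = 3
group_size = len(data) // k
groups = [data[i * group_size:(i + 1) * group_size] for i in range(k)]

# Step 2: iterate through the groups. Each iteration holds one group
# out for testing and uses the rest for training.
splits = []
for i in range(k):
    test = groups[i]
    train = [x for j, g in enumerate(groups) if j != i for x in g]
    splits.append((train, test))

for train, test in splits:
    print("train:", train, "test:", test)
```

Every sample ends up in the test set exactly once, which is the whole point of iterating through the groups.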
With 3 groups, the first iteration can be:
- Groups 1 & 2 as training data
- Group 3 as testing data
Of course, every iteration results in a different model.
In this case we end up with 3 models,
each tested on a different group.
Why is this useful?
3/5
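Here is that 3-fold example made concrete. To keep it self-contained, a trivial "predict the training mean" model stands in for a real learner (the dataset and model are hypothetical, purely for illustration):

```python
# 6 target values, divided into Group 1, Group 2, Group 3.
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
groups = [ys[0:2], ys[2:4], ys[4:6]]

# One iteration per group: train on the other two groups,
# producing 3 different models (here, 3 different means).
models = []
for i in range(3):
    train = [y for j, g in enumerate(groups) if j != i for y in g]
    mean = sum(train) / len(train)  # "training" the trivial model
    models.append(mean)

print(models)  # one model per fold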
Different results mean that we can compare them.
Using different testing datasets, the prediction errors will differ for each model.
With Cross Validation you can select the best performing model.
4/5
Using different testing datasets, the prediction errors will differ for each model.
With Cross Validation you can select the best performing model.
4/5
That's it for today.
I hope you've found this thread helpful.
Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.
Thanks ๐
5/5
I hope you've found this thread helpful.
Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.
Thanks ๐
5/5