3— Performance, Validation and Model Interpretation
Today we’ll see how to read a much larger dataset - one which may not even fit in the RAM on your machine! And we’ll also learn how to create a random forest for that dataset. We also discuss the software engineering concept of “profiling”, to learn how to speed up our code if it’s not fast enough - especially useful for these big datasets.
Next, we do a deeper dive in to validation sets, and discuss what makes a good validation set, and we use that discussion to pick a validation set for this new data.
In the second half of this lesson, we look at “model interpretation” - the critically important skill of using your model to better understand your data. Today’s focus for interpretation is the “feature importance plot”, which is perhaps the most useful model interpretation technique.