From the course: Machine Learning and AI Foundations: Prediction, Causation, and Statistical Inference

Skepticism about results: Is that really the best predictor?

- [Instructor] So you've been careful and you believe your data is trustworthy and not biased. What else can go wrong? Well, few things are as fundamental to an analysis as which variable comes out on top in your model. Imagine that you're a researcher helping a university look into whether SAT scores are predictive of graduating on time. You run your decision tree model with 50% of the data in the training partition, and you discover that the verbal score on the test is the number one variable, while the math score is rather far down the list. So you make some notes. A bit later, you refer to your model and you realize that the results are slightly different. Apparently, your training partition has been randomized again. Oh my. Well, that should be harmless enough. You look more carefully and you realize that the ranking of the variables has changed. Now it's the math score that's in top place, and the only difference is…
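
One way to check whether a top-variable ranking depends on the training partition is to redraw the 50% partition with several seeds and tally which variable comes out first each time. The following is a minimal sketch in Python, not the course's own code. It assumes made-up synthetic data in which verbal and math scores are correlated and equally related to the outcome, and it stands in a single-split stump scored by Gini gain for a full decision tree. The names make_students, best_gini_gain, top_variable, and ranking_stability are hypothetical.

```python
import random
from collections import Counter


def make_students(n, rng):
    """Synthetic (made-up) students: two correlated scores, one noisy outcome."""
    rows = []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # shared factor makes the two scores correlated
        verbal = 500 + 80 * ability + rng.gauss(0, 60)
        math = 500 + 80 * ability + rng.gauss(0, 60)
        # Assumption: the outcome depends equally on both scores, plus noise.
        signal = (verbal + math - 1000) / 200 + rng.gauss(0, 1)
        rows.append({"verbal": verbal, "math": math, "on_time": signal > 0})
    return rows


def gini(pos, n):
    """Gini impurity of a binary node with `pos` positives out of `n` rows."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def best_gini_gain(rows, feature):
    """Largest impurity reduction achievable by one threshold on `feature`."""
    ordered = sorted(rows, key=lambda r: r[feature])
    n = len(ordered)
    total_pos = sum(r["on_time"] for r in ordered)
    parent = gini(total_pos, n)
    best = 0.0
    left_pos = 0
    for i in range(1, n):
        left_pos += ordered[i - 1]["on_time"]
        if ordered[i - 1][feature] == ordered[i][feature]:
            continue  # cannot place a threshold between equal values
        right_pos = total_pos - left_pos
        child = (i * gini(left_pos, i) + (n - i) * gini(right_pos, n - i)) / n
        best = max(best, parent - child)
    return best


def top_variable(rows, seed, features=("verbal", "math")):
    """Draw a 50% training partition with `seed` and return the top feature."""
    rng = random.Random(seed)
    train = rng.sample(rows, len(rows) // 2)
    return max(features, key=lambda f: best_gini_gain(train, f))


def ranking_stability(rows, seeds):
    """Count how often each feature ranks first across partition seeds."""
    return Counter(top_variable(rows, s) for s in seeds)


students = make_students(400, random.Random(0))
counts = ranking_stability(students, range(20))
print(counts)
```

If one variable wins under nearly every seed, the ranking is probably not an artifact of the partition. If the winner flips from seed to seed, a ranking taken from a single run deserves some skepticism.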
