From the course: Machine Learning Foundations: Statistics
Selection with replacement - Python Tutorial
- [Instructor] You are working on a research project before the upcoming elections. Naturally, you have to collect data about the residents, so you compose a survey with questions such as: What is your household income? What is the highest educational level you have attained? Do you support a certain law or candidate? We are interested in the answers to these questions for the whole population, but when designing an experiment we don't have access to the whole population, so we gather data on a sample that is representative of it. Sampling is the process of selecting a subset from a large collection of data in order to estimate specific characteristics of the entire collection. Statisticians collect samples in two ways: selection with replacement and selection without replacement.
Let's explore selection with replacement, or SWR for short. Suppose we have 10 objects and we assign them numbers from zero to nine. We randomly select one object, for example, the object with number one, make a copy of it for a new dataset, and then return object one to the original dataset. Next, we take the object with number three and repeat the process. After that, we take the object with number six, and so on. This way we can select the same object multiple times, so our new dataset can contain multiple copies of the same object. Because we don't alter the original dataset, we can keep picking elements, and the new dataset can be smaller than the original, the same size, or even bigger.
As we can see, selections are not affected by previous choices. There is a one-tenth, or 10%, probability that we'll pick object one, and after we return it to the original dataset, we again have a 10% probability of picking object one. So the probability of selecting any particular object remains the same in future draws.
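The 10-object example can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the helper name `sample_with_replacement`, the seeds, and the sample sizes are assumptions chosen for the demo. It relies on the standard library's `random.choices`, which draws with replacement.

```python
import random
from collections import Counter


def sample_with_replacement(population, k, seed=None):
    """Draw k items from population, returning each pick before the next draw.

    Hypothetical helper for illustration; random.choices samples with replacement.
    """
    rng = random.Random(seed)
    return rng.choices(population, k=k)


# Ten objects numbered zero to nine, as in the example.
objects = list(range(10))

# The new dataset can be smaller or bigger than the original.
small = sample_with_replacement(objects, k=5, seed=0)
large = sample_with_replacement(objects, k=15, seed=0)

# With 15 draws from 10 objects, at least one object must appear more than once.
counts = Counter(large)
repeated = [obj for obj, n in counts.items() if n > 1]

# Empirical check that the chance of drawing object 1 stays near 1/10
# across many draws (assumed sample size of 100,000).
draws = sample_with_replacement(objects, k=100_000, seed=1)
freq_of_one = draws.count(1) / len(draws)

print("small sample:", small)
print("large sample:", large)
print("objects picked more than once:", repeated)
print("observed frequency of object 1:", round(freq_of_one, 3))
```

The original list `objects` is never modified, which is why the chance of drawing any given object stays at one in ten from draw to draw.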
The most important application of sampling with replacement in ML is bootstrapping, which we'll explore soon.