From the course: Machine Learning Foundations: Statistics
Defining statistics - Python Tutorial
Defining statistics
- [Presenter] Statistics are all around us: from computing average grades in a subject to filling in a census form or population record with personal information such as date of birth, residence, occupation, and marital status, which is used in population or demographic statistics, et cetera. Or every year, your health insurance asks for your age, existing medical conditions, current health status, et cetera.
Statistics is a branch of mathematics that deals with the study and manipulation of data, including ways to gather, review, analyze, and draw conclusions from data. Probability and statistics go hand in hand like peanut butter and jelly. We use data and probability to estimate how likely an event is to happen. Statistics are at the center of many data-driven decisions and innovations. Machine learning relies heavily on statistics, searching for possible hypotheses that explain relationships between different variables in data. Having a solid foundation in statistics will help you understand underlying relationships and build ML models that are optimized.
It is important to recognize two major branches of statistics: descriptive and inferential statistics. The most commonly used part of statistics is called descriptive statistics. We use it to summarize data. So instead of looking at thousands and thousands of records of data, we look at the statistical measures that describe the data, such as the mean, the median, and the mode. For example, instead of looking at thousands of salaries for a position of software developer, we look at the median annual salary.
The second branch of statistics is inferential statistics. It is used to uncover attributes of a larger population, often based on a sample. It is used in cases where we cannot collect data on the entire population, so we collect a subset of the data points called the sample.
For example, if we would like to observe people working in software development, this population is too large, so we will take a sample and draw a conclusion about the entire population of people working in software development. Keep in mind that a sample is not always representative of the population, and a conclusion is only as reliable as the sample it comes from.
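The two branches described above can be sketched in a few lines of Python using only the standard library `statistics` and `random` modules. This is a minimal illustration, not material from the course: the grades and salaries are made-up numbers, and the lognormal shape of the hypothetical salary population is an assumption chosen only because salary data tends to be skewed.

```python
import random
import statistics

# --- Descriptive statistics: summarize data we already have ---
# Hypothetical grades, invented for illustration.
grades = [70, 85, 85, 90, 100]

grades_mean = statistics.mean(grades)      # arithmetic average -> 86
grades_median = statistics.median(grades)  # middle value when sorted -> 85
grades_mode = statistics.mode(grades)      # most frequent value -> 85

print("mean:", grades_mean, "median:", grades_median, "mode:", grades_mode)

# --- Inferential statistics: learn about a population from a sample ---
random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 annual salaries (assumed lognormal).
population = [random.lognormvariate(11.5, 0.4) for _ in range(100_000)]

# In practice we usually cannot observe everyone, so we draw a
# simple random sample and summarize that instead.
sample = random.sample(population, k=1_000)
sample_median = statistics.median(sample)

# Here we happen to have the whole (simulated) population, so we can
# check how close the sample-based estimate is to the true value.
population_median = statistics.median(population)
relative_error = abs(sample_median - population_median) / population_median

print(f"sample median:     {sample_median:,.0f}")
print(f"population median: {population_median:,.0f}")
print(f"relative error:    {relative_error:.2%}")
```

Because the sample here is drawn uniformly at random, its median should land close to the population median. A sample collected in a biased way (for example, only from one company) would not carry the same guarantee.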