From the course: Learning Data Analytics Part 2: Extending and Applying Core Knowledge
Using column profile to learn the data
- [Instructor] We are all looking for ways to save time, and learning your data is one area where you can save time if you leverage the tools that are available to you in Power Query. Whether you get there through Excel or Power BI, you will find a few options on the View tab that are worth exploring, and they'll help you learn your data. Let's go ahead and connect our wage data to Power Query. I'll go to Data, I'll choose From Table/Range, and I'll go ahead and choose OK. One of the very first things I want to point out to you is on the bottom left-hand side, where we can already see that we have 22 columns and 343 rows. And then I want you to know that column profiling is based on the top 1,000 rows. If I want to profile the entire dataset, I can simply choose that option by clicking column profiling and choosing the entire dataset. Since this entire dataset is under 1,000 rows, I can leave it at the default.
Now you've probably noticed that each one of the columns, when you bring it into Power Query, has a green bar. This is showing you the column quality, and it requires you to hover over each one. Okay, so let's go ahead and close our query settings, 'cause we're not going to make any transformations, and go to the View tab. If I want to look at the column quality in one sweep, meaning I don't want to hover over each one, I can simply check Column quality. Immediately, I see that the question "What is your current age?" has an empty value, meaning someone did not fill out their age when they completed the survey. Okay, I'll go ahead and uncheck Column quality.
I also want to take a look at Column profile. I love this feature. I'll choose Column profile. And one of the very first things I'm going to notice down below is that I have the count, which is 343 and matches my record count, and then I have some basic stats. I also see the value distribution.
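For anyone who wants to see what those column quality and basic stats numbers represent outside of Power Query, here is a minimal Python sketch. It is not how Power Query computes them; the function name `column_quality`, the sample `ages` list, and the rule that blank strings and `None` count as empty are all assumptions made for illustration, and Power Query's own rules may differ.

```python
from collections import Counter


def column_quality(values):
    """Summarize one column: row count, valid vs. empty, distinct and unique values.

    Assumption: a value is "empty" if it is None or a blank string.
    """
    total = len(values)
    non_empty = [v for v in values if v is not None and str(v).strip() != ""]
    empty = total - len(non_empty)
    counts = Counter(non_empty)
    return {
        "count": total,
        "valid": len(non_empty),
        "empty": empty,
        "valid_pct": round(100 * len(non_empty) / total, 1) if total else 0.0,
        "empty_pct": round(100 * empty / total, 1) if total else 0.0,
        # distinct: how many different values appear at all
        "distinct": len(counts),
        # unique: how many values appear exactly once
        "unique": sum(1 for c in counts.values() if c == 1),
    }


# Hypothetical sample answers to "What is your current age?" (not the course data)
ages = ["25-34", "35-44", "", "25-34", "18-24", None, "25-34", "45-54"]
quality = column_quality(ages)
print(quality)
```

Running this against a single column gives the same kind of at-a-glance summary: how many rows there are, how many are empty, and how varied the answers are.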
Now the value distribution is not as meaningful on a column like respondent ID, because that's a unique identifier for each record. But when I go to "What is your current age?" it gets a little bit more meaningful. Here I can see immediately that 25 to 34 year olds filled this survey out more than any other age range. If I want to see the degree status, I can click "What is your highest level of education?" and immediately I can see that people who have a bachelor's degree completed this survey more than any other type of degree status. As a matter of fact, I can really take a look and see that people with some college or some form of degree are the majority of the people who completed this survey. Okay, I'll go ahead and uncheck Column profile.
Let me show you a little trick. All right, so I'm going to check Column profile again. I'm going to choose my three little dots and then I'll choose the Copy option. I'll go to the Home tab and choose Close & Load. Now what this does is copy the value distribution for that particular question, and then I can paste those values onto this sheet. Now, this doesn't update live with my data. However, it does give me the current counts based on my survey responses. So if I'm doing quick and dirty analytics, this is a really easy way to get counts without having to perform any other steps to produce the same values.
If you're new to data, it might be hard to appreciate the value of tools like column profile. But if you've been at it for a while, you can immediately see the time savings when using these options, especially when you get asked questions that call for those quick and dirty analytics.
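The value distribution described above is, at its core, a count of how often each answer appears. As a rough sketch of the same idea in Python, assuming a hypothetical `education` list standing in for the survey column and an illustrative helper named `value_distribution`, the counts could be produced like this. Like the pasted values, the result is a static snapshot and does not update when the data changes.

```python
from collections import Counter


def value_distribution(values):
    """Count each non-empty answer, most frequent first.

    Assumption: None and blank strings are skipped rather than counted.
    """
    cleaned = [v for v in values if v is not None and str(v).strip() != ""]
    return Counter(cleaned).most_common()


# Hypothetical sample answers to "What is your highest level of education?"
education = [
    "Bachelor's degree",
    "Master's degree",
    "Bachelor's degree",
    "Some college",
    "",
    "Bachelor's degree",
    "Some college",
    "High school",
]

distribution = value_distribution(education)

# Print tab-separated rows, a layout that pastes cleanly into a worksheet
for value, count in distribution:
    print(f"{value}\t{count}")
```

For a one-off question such as "which degree status responded most?", reading the first row of the output answers it without building a pivot table or writing a grouping step.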