From the course: Big Data in the Age of AI

Challenges with data preparation

- [Instructor] Salmon can be farmed or they can be caught wild. But either way, it takes a fair amount of work before they're turned into this. Everybody knows that food prep is an important, although time-consuming and frequently tedious, part of cooking. There's a similar principle in any big data project. The rule of thumb is that about 80% of the time on a big data project is spent preparing the data. And that's been my own experience. Now, there are several reasons why this may be the case. One of them is how the data is entered. If you're using wild-caught data, meaning data that you found out there in the world and that maybe was entered as free text, you have to look at things like place names. Here are four different ways of indicating California. You can write it out, or you can use various abbreviations. And the inclusion of a period, at least by default, marks it as a different answer from the one without a period. Or when people are putting in dates, here are four…
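
The place-name problem described above can be sketched in a few lines of Python. This is a minimal illustration, not the instructor's method: the sample spellings, the `CANONICAL` lookup table, and the `normalize_place` helper are all hypothetical, and the spellings are not necessarily the four shown in the video.

```python
from collections import Counter

# Hypothetical free-text entries (assumed for illustration only).
raw_entries = ["California", "CA", "Calif.", "Calif", "ca", " California "]

# Hypothetical lookup from a cleaned-up key to one canonical label.
CANONICAL = {
    "california": "California",
    "ca": "California",
    "calif": "California",
    "cal": "California",
}


def normalize_place(value: str) -> str:
    """Map a free-text place name to a canonical label when one is known."""
    # Trim whitespace, ignore case, and drop a trailing period so that
    # "Calif." and "Calif" produce the same lookup key.
    key = value.strip().lower().rstrip(".")
    # Unknown values are returned trimmed but otherwise unchanged.
    return CANONICAL.get(key, value.strip())


# By default, every distinct spelling is counted as its own answer.
raw_counts = Counter(raw_entries)

# After normalization, the variants collapse into a single category.
clean_counts = Counter(normalize_place(v) for v in raw_entries)

print(len(raw_counts), "distinct raw values")
print(dict(clean_counts))
```

A hand-written table like this only covers the variants someone has already noticed, which is part of why preparation takes so much of the project time: each new source of free text tends to bring spellings the table has not seen yet.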
