From the course: R for Data Science: Analysis and Visualization

R in context

- When it comes to working with data, you're confronted with a potentially overwhelming range of choices, all competing for your attention. Now, the first and most obvious choice for working with data is a spreadsheet application like Microsoft Excel or Google Sheets. Although there are other choices, I like to think of spreadsheets as the "universal data container" because they're everywhere. Everybody uses them. Spreadsheets are great because they let you organize your data however you want. They can sort and filter the data, they can count and summarize, and they can quickly make basic graphs. Truthfully, spreadsheets are probably sufficient for the majority of real-world data tasks that don't involve building statistical models. But when it's time to move beyond summaries and basic graphs and start making those statistical models, you'll need something more specialized, like a statistical application. Some of the most common statistical applications are SPSS, SAS, and my personal favorite, the open-source application Jamovi, all of which offer user-friendly point-and-click interfaces for data exploration and modeling. But you may have data that doesn't fit neatly into the rows and columns that standard statistical applications expect. Or you may have questions that go beyond what drop-down menus can handle. In that case, you'll need to take the final step to a data-oriented programming language, which gives you the ultimate control and power in analyzing data. Some of the most common and interesting choices for data-oriented programming are Python, a powerful general-purpose programming language that has been well adapted for working with data, and R, a language developed specifically for working with data and, of course, the subject of this course. Now, I want to show you some data on the relative popularity and use of different languages.
Some of this data is very fresh and some of it is several years old, but all of it helps put R into the context of the methods available for working with data. Let's start by looking at the TIOBE Index for February 2025, a very common index of programming language popularity worldwide. TIOBE, by the way, stands for The Importance Of Being Earnest, Oscar Wilde's play. What you see here is that the top three are general-purpose programming languages, Python, C++, and Java, and none of them is specific to data science. That's because people use them for all sorts of things, like building applications and corporate software; they're very common choices for that kind of work. You have to skip down a bit to get to the data-specific languages. R is at number 15, where it accounts for about 1% of all programming, and MATLAB is right below it at number 16, also at about 1%. So across everyone doing any kind of programming, R ranks fairly low, but it is one of the top languages specific to working with data. Now, in terms of data science, one of the best sources has historically been KDnuggets, a site focused on data mining. Unfortunately, they haven't updated their data since 2019, so this is a little out of date, but the general pattern has been consistent. What you see here is that Python is at the top, with 66% of people reporting that they use Python in their day-to-day data mining and data science work, followed by RapidMiner. The third one is R, which, again, is the first programming language on the list that is specific to data. One thing I want to mention is that R sits just one step above Excel, the spreadsheet, at 35%, so Excel is also a very common tool in data science projects. Some data that's not as old, but hasn't been updated since 2022, concerns data science jobs posted on Indeed.
What you see here is that for jobs that specifically mention data science or a closely related field and list the expertise they're looking for, SQL (the Structured Query Language for working with databases) and Python are at the top. R is on the list, but a bit further down, which might make you think it's not in particularly high demand there. On the other hand, if you look at a different data set from the same year covering statistician jobs on Indeed, you see that R comes in a very close second to SAS, and both are languages built specifically for statistics. So there is a very high demand for R among people doing specifically statistical work. The third piece of data, also from 2022, concerns scholarly articles. What you see here is that SPSS, a menu-driven application, is the most commonly mentioned tool in scholarly articles, with over 100,000 articles mentioning it, but R is second. That tells you there is, again, a practical demand for R and the flexibility it gives you in working with data. Now, I want to finish by saying a little about one of the most frequent, if not necessarily the most important, debates in data science: should you learn Python or R? Well, I can give you some generalizations. Python is especially strong in machine learning and in database and app development. So if you're building a machine learning project, making a web application, or serving a mobile app, Python might be something you want to look at. On the other hand, it can help to think about it this way. If you're trying to promote your business online, you wouldn't ask, "Should we advertise on Facebook or on Instagram?", which are, respectively, the first and second most popular platforms for social media marketing. The answer is both, and several others at the same time.
Don't restrict your options; hit all of 'em. In other words, be a professional polyglot. If you're going to be a professional data scientist working in the field, you'll be expected to work in many different languages, including R and Python. So there's a huge advantage to learning R. It's a great way to work with data, it has a great community, and it'll get you started on your data science path.
