Data analysis

The last part of the course is about analyzing entire data sets. This is what we've been building up to.

Data analysis is about using data to answer useful questions about companies, products, students, or whatever. For example:

  • Which companies have the highest returns? Are returns stable over time? Are there trends?
  • Which products are selling the best in which regions? Are there differences between the regions?
  • Does taking an extra math basics course improve performance in other math courses? Or does it make no difference?

Answers to these questions change what organizations do. For example, if an extra math basics course doesn't improve performance in other math courses, there's no point in requiring the course.

Some people do data analysis full-time. Their job title is often data analyst, not surprisingly. However, data analysis skills are useful no matter what area of business you're in.

For example:

  • If you're in data security, you can analyze data from security incident reports. What are the most common reports about? Do they tend to come from the same places?
  • If you're in tech support, you can analyze data about support requests. Do some parts of the company consume more tech support resources than others? If so, maybe training needs to be improved there.
  • If you're in software project management, you can analyze data about software change requests. Where are the requests coming from? How many requests are there for different parts of the software portfolio? Does the data suggest how you might improve your development processes?

Having basic data analysis skills will make you more valuable.

You may be taking a statistics course, covering things like t tests and regression. Before you can use those techniques, you need data. It has to be in a form that's ready for your stats software (which might be Excel) to process.

This course is about data handling, rather than statistical inference. For example, you might have a CSV file with 50,000 records. If some of the data is invalid, like having string data where there should be numbers, your statistical software might break.

How common is invalid data in the real world? Very common, especially if the data comes directly from people. You'll learn how to write a program that will input data, filter it, and only analyze valid records.
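
To make the idea concrete, here is a minimal sketch in Python of reading records and keeping only the valid ones. It is an illustration, not the course's own method: the column names (student, score), the sample rows, and the function name read_valid_scores are all made up for this example.

```python
import csv
import io

# Hypothetical sample data. Two rows have invalid scores:
# one has text where a number should be, and one is empty.
SAMPLE_CSV = """student,score
Ana,82
Ben,eighty
Cy,91
Di,
"""

def read_valid_scores(csv_text):
    """Return (valid_rows, invalid_count), keeping rows whose score is numeric."""
    valid = []
    invalid_count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            score = float(row["score"])
        except (TypeError, ValueError):
            # The score is missing or is not a number, so skip the record.
            invalid_count += 1
            continue
        valid.append({"student": row["student"], "score": score})
    return valid, invalid_count

valid_rows, skipped = read_valid_scores(SAMPLE_CSV)
print(len(valid_rows), "valid records,", skipped, "skipped")
```

Counting the skipped records, rather than silently dropping them, lets you report how much of the data was unusable.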

We will be computing some statistics, but only basic things, like means, counts, and ranges.
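
As a rough illustration of those basic statistics, here is a short Python sketch. The scores are made-up values, assumed to have already been checked for validity.

```python
from statistics import mean

# Hypothetical scores, assumed to be valid numbers already.
scores = [82.0, 91.0, 67.0, 74.0]

count = len(scores)                       # how many records there are
average = mean(scores)                    # the mean
value_range = max(scores) - min(scores)   # the range: highest minus lowest

print("Count:", count)
print("Mean:", average)
print("Range:", value_range)
```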


OK, so if you can do both things, that is, filter data using what you learn in this course, and analyze it using what you learn in stats...

You can get...


Let's get started, by talking about data tables. They're central to this module.