When you are doing data science, you are doing research. You want to use data to answer a question, identify a new pattern, improve a current product, or come up with a new product. The common factor underlying each of these tasks is that you want to use the data to answer a question that you haven’t answered before. The most effective process we have come up for getting those answers is the scientific research process. That is why the key word in data science is not data, it is science. No matter where you are doing data science - in academia, in a non-profit, or in a company - you are doing research. The data is the substrate you use to get the answers you care about. The first step most people take when using data is to collect the data and store it. This is a data engineering problem and is a necessary first step before you can do data science. But the state and quality of the data you have can make a huge amount of difference in how fast and accurately you can get answers. If the...