What problems can data science solve?

Philippa Peasland
Oct 9, 2017

I work in a data science team but I am not a data scientist. I’m not even an analyst by background. I don’t need to be able to write a machine learning algorithm to do my job, but I do need to understand the opportunities data science presents. So do a lot of other people. But how can we have a meaningful conversation with a data scientist when it feels like we are talking in different languages? In today’s society, where big data is the buzzword and advanced analytics is seen as the key to business success, I’ve been asking myself…

Where is the shared vocabulary between policy makers and data scientists, or more broadly, between analysts and everyone else?

I think the answer to that question is the problem space. If you look at the Drew Conway Venn diagram, a data scientist is a person who combines skills in programming, statistics and subject expertise. Well, those of us who aren’t analytical by nature won’t be able to talk about linear regression or functions, but we can talk about the subject area we work in. That is where the shared vocabulary is. And if we’re looking for opportunities, that means talking about problems.

By framing the conversation in terms of problems we start from common ground. Rather than launching into data science methodologies like natural language processing or machine learning (or even worse, their acronyms!), we should start by talking about the problem we’re trying to solve. Are we trying to reduce the time it takes to do something? Or predict when something is going to happen so we can prevent it? Are we trying to target resources more efficiently? Or to quickly understand what a big pile of data is telling us?

Here’s my starter for ten (or 7…) on problems data science can solve. Data science can…

  1. Help you identify themes in large volumes of text. For example, if you’ve run a public consultation and need a way to sort through lots of responses to find the common themes. This is a time when you might use natural language processing.
  2. Predict what will happen. For example, if you want to know whether someone is likely to forget a doctor’s appointment, so you can send them an extra reminder. This would usually be done using some kind of machine learning algorithm.
  3. Automatically categorise stuff. For example, if you have a very popular survey and you want to strip out comments that don’t require an action, such as ‘OK’ or ‘fine’. This is when you might choose a classification model.
  4. Spot something unusual. For example, if you have a service with lots of transactions and want to know which ones might be fraudulent. This is a problem anomaly detection might help with.
  5. Show you how things are connected to each other. For example, if you have a group of stakeholders and want to know which of them are the most influential. You could do some network analysis to figure this out.
  6. Understand what a very large quantity of data is telling you. For example, if you have a huge Excel sheet of data and you want to see if there are any patterns or categories emerging. This is where even basic data visualisation could work wonders.
  7. Spot geographic patterns in services or data. For example, if you wanted to see where in the country average house prices are highest or lowest. For this you would most likely create a simple geospatial data visualisation.
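
To make one of these a little more concrete, here is a rough sketch of number 4, spotting something unusual. It is only an illustration under some loud assumptions: the transaction amounts are made up, the function name `flag_unusual` is hypothetical, and the method (a "modified z-score" built on the median, with the conventional cut-off of 3.5) is just one simple way to do anomaly detection. A real fraud model would look at far more than the amount.

```python
from statistics import median


def flag_unusual(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration; the 3.5 cut-off is a common
    convention, not a rule.
    """
    mid = median(amounts)
    # Median absolute deviation (MAD): the typical distance from the median.
    mad = median(abs(a - mid) for a in amounts)
    if mad == 0:
        # MAD is zero when at least half the values equal the median;
        # this sketch then flags nothing rather than dividing by zero.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [a for a in amounts if 0.6745 * abs(a - mid) / mad > threshold]


# Made-up transaction amounts, for illustration only.
transactions = [12.50, 9.99, 14.20, 11.75, 13.10, 10.40, 950.00, 12.05]
print(flag_unusual(transactions))
```

With these made-up numbers the only amount flagged is the 950.00 payment, which sits far away from the rest. The median is used instead of the average so that one very large transaction does not drag the "typical" value towards itself and hide.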

Now I know the above examples are probably all kinds of wrong in a technical sense, but I hope they help beginners recognise where data science could be useful. So if the language isn’t exactly correct, is it not worth the ambiguity to start that conversation?

What do you think? Add your suggestions for other problems data science can solve below.


Philippa Peasland

Head of Product at Vypr, a Manchester HQd product insight SaaS scale-up. Product nerd. Principles over processes. Sensitivity over semantics.