You can’t tell a great story without defining your characters. So why do so many organisations think they can derive value from their data science initiatives without investing in data management?
Anaconda's report, The State of Data Science 2020, confirms the widely known fact that data scientists spend almost half of their time not solving business problems but cleansing and loading data.
This surely raises the question: why not invest in data management at least half of what you invest in data science?
There are three basic areas where that investment pays off:
#1 Improving data quality
#2 Defining a master data strategy
#3 Implementing data governance
Let’s dive into each of these:
#1 Improving data quality
Ideally, I’d quote a statistic telling you just how many billions of dollars are lost to poor-quality data.
But I don’t think that’s necessary, as the importance of having good data is well understood across the data community.
The report above mentions how much time is lost to preparing, cleansing and organising data.
But what if data quality were built into the fabric of your big data platform? Ongoing maintenance and monitoring of data quality would hugely benefit every initiative that uses the data downstream.
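To make the idea of ongoing quality monitoring a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a recommended toolset: the function names (`quality_report`, `failed_checks`), the 95% completeness threshold and the sample customer records are all assumptions made up for this example.

```python
from collections import Counter


def quality_report(records, required_fields, key_field):
    """Compute simple quality metrics for a batch of records (a list of dicts)."""
    total = len(records)
    # Completeness: share of records where each required field is present and non-empty.
    completeness = {
        field: (
            sum(1 for r in records if r.get(field) not in (None, "")) / total
            if total
            else 0.0
        )
        for field in required_fields
    }
    # Uniqueness: count the surplus records that share a key value with another record.
    key_counts = Counter(r.get(key_field) for r in records)
    duplicate_keys = sum(count - 1 for count in key_counts.values() if count > 1)
    return {"rows": total, "completeness": completeness, "duplicate_keys": duplicate_keys}


def failed_checks(report, min_completeness=0.95):
    """Return human-readable descriptions of the checks that failed."""
    failures = [
        f"{field} completeness {ratio:.0%} is below {min_completeness:.0%}"
        for field, ratio in report["completeness"].items()
        if ratio < min_completeness
    ]
    if report["duplicate_keys"]:
        failures.append(f"{report['duplicate_keys']} duplicate key(s) found")
    return failures


# Hypothetical batch of customer records, invented for illustration.
batch = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 2, "email": "b@example.com"},
]
report = quality_report(batch, required_fields=["customer_id", "email"], key_field="customer_id")
for failure in failed_checks(report):
    print(failure)
```

Run on every load, a check like this flags problems once at the platform level instead of leaving each data science team to rediscover them.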
So, what can you do about it?
#2 Defining a master data strategy
So, each of the data science initiatives is organising, loading, and using its own data in its own silo.
To avoid this horror, define a master data strategy.
In simple terms, depending on the business outcome, your scientists should be able to choose from the relevant mastered entities. For example, customer-related models should use the customer master entity.
At a minimum, your strategy should be able to answer the following questions:
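As a rough illustration of choosing mastered entities by business outcome, the sketch below uses a small in-code catalogue. Everything in it is hypothetical: the `MasterEntity` class, the table names, the owning teams and the outcome names are assumptions for the example, and a real catalogue would normally live in a dedicated master data or metadata tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterEntity:
    name: str          # e.g. "customer"
    source_table: str  # hypothetical location of the mastered records
    owner: str         # team accountable for the entity


# Hypothetical catalogue of mastered entities.
CATALOGUE = {
    "customer": MasterEntity("customer", "master.customer", "Customer Data Team"),
    "product": MasterEntity("product", "master.product", "Product Data Team"),
}

# Hypothetical mapping from a business outcome to the entities its models should draw on.
OUTCOME_ENTITIES = {
    "churn_prediction": ["customer"],
    "product_recommendation": ["customer", "product"],
}


def entities_for(outcome):
    """Return the mastered entities registered for a business outcome."""
    if outcome not in OUTCOME_ENTITIES:
        raise KeyError(f"No mastered entities registered for outcome: {outcome!r}")
    return [CATALOGUE[name] for name in OUTCOME_ENTITIES[outcome]]


for entity in entities_for("churn_prediction"):
    print(entity.name, entity.source_table, entity.owner)
```

The point of the lookup is that a customer-related model starts from the one agreed customer entity rather than from a privately assembled copy.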
#3 Implementing data governance
Your master data strategy will not succeed unless you govern the data. This does not, of course, mean the “maintenance/backup” of data; that is for IT teams to deal with.
Governance is often confused with added bureaucracy and red tape. However, additional scrutiny is required to ensure the usage of the data is in line with regulations and corporate ethics.
It also helps the scientists rely on the right people: those who know the data and can interpret it to support the business outcome.
We are, of course, talking about setting up policies, procedures and a framework that define the following:
Improving the quality of your data, governing it, and mastering it are the three basic areas where you can invest and find a high return on investment (ROI) for your data science initiatives.
The goal is to ensure your data scientists spend more time productionising models with actionable business outcomes and less time doing data management.
Do you agree with what I’ve said above? What are your thoughts? Feel free to reach out to me by email if you have some feedback or if you just want to say hello!
If you’re still reading this, I hope you’ve found some value in this blog post.
If you’d like to be kept informed of more content like this, subscribe to my newsletter.
Also, check out my other blog post: Why is everyone obsessed with data science?