Preventing Runaway Data Trains

You’ve no doubt heard about data breaches and data security, but in my long and illustrious career, the biggest problem has been data management, which encompasses data security, data quality, data “efficacy” and basically everything else that has to do with data.

Before inexpensive, easy-to-use big data engines came along in the years following 2010, with Hadoop, HBase and Cassandra (the forerunners of today’s Redshift, BigQuery, etc.), data was “small data”. Even if you made a mistake managing that data, or forgot to manage it from the start and had to do so retroactively, it was no big deal: we were talking Terabytes. In fact, SQL RDBMSs gave you their system tables, a very basic form of a Data Dictionary.
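To show what “system tables as a basic Data Dictionary” can look like in practice, here is a minimal sketch in Python. It assumes SQLite as a stand-in RDBMS (its catalog is `sqlite_master` plus `PRAGMA table_info`), and the `accounts` and `payments` tables are hypothetical, made up purely for illustration. Other databases expose similar metadata under different names, so treat this as one possible approach rather than a recipe.

```python
import sqlite3


def build_data_dictionary(conn):
    """Return one entry per column, read from SQLite's own catalog."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        if table.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, column, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append(
                {
                    "table": table,
                    "column": column,
                    "type": col_type,
                    "not_null": bool(notnull),  # declared NOT NULL constraint
                    "primary_key": bool(pk),
                }
            )
    return entries


if __name__ == "__main__":
    # Hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT NOT NULL, created_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL, amount REAL)"
    )
    for entry in build_data_dictionary(conn):
        print(entry)
```

A listing like this only tells you what exists. A real Data Dictionary would also record owners, descriptions and usage, which the catalog alone cannot provide.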

Today we have Petabytes of data, and if it’s not managed from the start, companies can get into very serious trouble. In fact, Big Data cloud providers AWS and Google have no problem if you park your company’s multiple Petabytes of data in their cloud. Guess why? Try migrating that data elsewhere: you won’t! Worse yet, some of these Big Data engines have no system tables, which means you have no real idea who is doing what with the data.

I’ve spent the last 2 years taming this beast and becoming an expert in managing such data. It turns out that if you do a few simple things right from the start, you can prevent many problems that could become irreparable train wrecks later.

Build a Data Dictionary, as I did at Credit Karma:

https://engineering.creditkarma.com/credit-karma-data-explorer/

That is the equivalent of exercising to prevent a heart attack. To really prevent a heart attack, add in an Enterprise Data Management program, commonly known as Data Governance.

The only issue is whether or not a business is at the level of “maturity” to do so. Most businesses are so busy being busy, trying to make money and get on the map, that data management is not even on the radar. Lucky me: I’ve made good money helping businesses sort this out.

But I dream of being at a company where they take this so seriously that they catch it right from the get-go. It’s a strategic way of thinking about data, and like I said, with some very simple things which do not cost much time or money, you can set the right course for your data train.

Damn, I really would be a GREAT Chief Data Officer (CDO). I just wish more companies had such a position. In time they will. You can take that to the bank.
