Years ago I ran a Data Warehouse for a multi-brand grocery chain on the east coast. Within weeks I learned the truth about most retail data warehouses: plenty of data, but very little that was reliable. As IT teams scrambled to upload data from source operational systems and finance created metric upon metric, it became clear that we were missing 5 critical measures around data quality:
Validity – does the metric actually mean what you think it means? This is especially important when summing a derived metric across hierarchies. (Think of adding an in-stock metric for Pepsi across 5 stores to learn you are 475% in-stock. Or, perhaps worse, averaging it and learning you are 80% in-stock, when in fact you are 100% in-stock in 4 stores and completely out of stock in one.) It ain't pretty.
Completeness – Holes in the data, because a store did not poll or a change in the product hierarchy erased history for a series of time periods, lead to an awful lot of fire drills that expend precious resources and don’t deliver any value to customers.
Timeliness – Not only a measure of the recency of the data, but also of the ability to see across data in the same time period. Frequently executives take it as a matter of course that they cannot see sales, inventory and HR data for last week on the same report. Sales are current through last night, inventory is through last Saturday night and HR is as of the close of the last pay period. Many weeks that means the most recent cross-functional report is actually looking at three different weeks of data.
Accuracy – straight-up errors in the data. The bubble gum that sold for $25.00, the person who never clocked out and appears to have worked for 36 straight hours, the coupon for dog food for $40.00, and the backwards yuan-to-US-dollar exchange rate that appears to have siphoned off $25 billion instead of $25,000 in open-to-buy.
Consistency – Ranges of data and omissions that are standard and expected. Especially important when looking across business units with differing fiscal months. Normalizing data is a scrubbing process that can introduce either more consistency or unplanned errors.
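To make the validity and accuracy points concrete, here is a minimal Python sketch. It is not how any particular warehouse does it. The store names, the function names (summarize_in_stock, flag_implausible) and the price range are all assumptions made up for illustration. The in-stock numbers match the example above: four stores fully in stock, one completely out.

```python
from statistics import mean

# Hypothetical in-stock percentages for one product across five stores:
# four stores fully in stock, one completely out of stock.
in_stock_pct = {
    "store_1": 100.0,
    "store_2": 100.0,
    "store_3": 100.0,
    "store_4": 100.0,
    "store_5": 0.0,
}


def summarize_in_stock(pct_by_store):
    """Return the naive roll-ups next to store-level facts that stay valid."""
    values = list(pct_by_store.values())
    return {
        "naive_sum": sum(values),    # meaningless: percentages do not add up
        "naive_mean": mean(values),  # hides the one empty store
        "stores_out_of_stock": sorted(
            store for store, pct in pct_by_store.items() if pct == 0.0
        ),
        "stores_fully_in_stock": sum(1 for pct in values if pct == 100.0),
    }


def flag_implausible(records, field, low, high):
    """Accuracy check: return records whose field falls outside [low, high]."""
    return [r for r in records if not (low <= r[field] <= high)]


summary = summarize_in_stock(in_stock_pct)
print(summary)

# Illustrative accuracy check. The 0.01 to 10.00 price range is an assumption,
# not a rule from any real pricing system.
sales = [
    {"item": "bubble gum", "price": 25.00},
    {"item": "bubble gum", "price": 1.25},
]
suspect = flag_implausible(sales, "price", 0.01, 10.00)
print(suspect)
```

The sum and the average are both easy to compute and both misleading. The count of stores out of stock is the number that actually tells a merchant where to act. The range check is deliberately crude: in practice the bounds would likely vary by category, and picking them is a business decision rather than an IT one.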
Data Quality may be one of the least sexy things anyone can champion in a retail organization. But when Data Quality is done correctly, the benefits are reduced costs spread across the rest of the organization. Data Quality costs creep into every IT project as increases in ETL, Data Modeling and Source System upgrade expenses. On the business side, Data Quality issues are embedded as a “cost of doing business”: a lack of visibility into near-real-time issues that could be addressed if the data were recent and reliable. Think of the sales that could be captured if retailers could see real promotional in-stocks, transportation schedules, competitor price moves and more using accurate, clean data. Think of the hours wasted drilling into issues that turn out to be data anomalies, every single day, in every single retailer in America. Think data quality is just an expense? Retailers without severe Data Quality issues have lower SG&A and are more agile competitors in the market.
And THAT is not an expense: it’s an asset.