Who’s Afraid of the Big Bad Data?
Bad data is everywhere. It’s like a virus in our organisations, prospering unless we deal with it effectively. It can crop up all over the enterprise, causing little irritations that add up to enormous hidden costs. As we move into an era of Big Data, it is even more important to ensure that the data we collect is clean, so we don’t enter an era of Big Bad Data.
But here’s the thing: bad data is both fixable and avoidable. It’s not like many other business challenges, where our tools are incomplete or the causes are external factors beyond our control. I’ll go further: there’s no excuse for bad data in organisations any more, and IT Directors should take responsibility for ensuring data quality. If you don’t have processes in place to manage data quality, you’re not ready for small data, never mind big data!
In this blog post, I identify five causes of bad data and discuss how they can be rectified and avoided without a budget of $millions.
 Obsolete Data
Have you ever had to scroll through pages of records to find the one you are looking for, knowing that many of the records flying by are no longer used or needed? Perhaps you are working through a transaction or running a report and need to select a customer, material or GL account. You know the record you’re looking for is on the second or third page down, and that many of the entries on the list haven’t been used for years.
Never mind data quality issues: you have a data quantity problem, where much of the data is obsolete and should be removed from the system.
I hardly need to say that data archiving solves all this. Twenty years ago many SAP projects went live with a note that archiving would be tackled down the line, but these days it should all be in place, with regular archiving runs scheduled to remove the obsolete data.
 Out-dated Data
I make a careful distinction between obsolete data and out-dated data. ‘Obsolete data’ I define as data that you no longer want in your core systems and should archive. ‘Out-dated data’ I define as data that has changed in real life but not on the system. You work with these records, but you know they are out of date.
Has your enterprise ever continued to pay an employee after they have left? Or sent an account statement to the wrong customer address? Or sent information to e-mail addresses that bounce? Do you run ESS but know that employees consume data without updating it? I met one customer who didn’t actually know how many employees they had, because they had a 3-month backlog of joiner and leaver forms to process.
These are all symptoms of out-dated data: data that describes how life used to be, rather than what it’s like today.
There can be a one-off or scheduled clean-up activity for this data, but really the solution lies in capturing changes as they occur. This can involve:
i) Automatically triggering a form to capture data changes based on events. For example, triggering a form to capture employee changes after each performance assessment meeting, or sending customers a link to check their own details when they place an order after a period of inactivity.
ii) Automatically triggering a form to capture data changes based on a schedule. For example, sending a link for employees to check/confirm their own data each year.
iii) Enabling the easy manual creation of a form to trigger master data updates when a user notices that something is incorrect.
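The event-driven and scheduled triggers above can be sketched in a few lines. This is a minimal illustration only: the record structure, the `send_verification_form` helper and the one-year confirmation window are all assumptions for the example, not part of any real SAP interface.

```python
# Sketch of event- and schedule-triggered data-change capture.
# Record fields, function names and the confirmation window are
# illustrative assumptions, not a real SAP API.
from datetime import date, timedelta

VERIFY_AFTER = timedelta(days=365)  # ask owners to re-confirm data yearly


def send_verification_form(record_id: str, recipient: str) -> None:
    # In a real system this would generate an e-form link and e-mail it.
    print(f"Form sent to {recipient} to confirm record {record_id}")


def on_performance_review(employee_id: str, email: str) -> None:
    """Event trigger (i): capture employee changes after each review."""
    send_verification_form(employee_id, email)


def scheduled_sweep(records: list[dict], today: date) -> list[str]:
    """Scheduled trigger (ii): ask owners to confirm stale records.

    Returns the ids of records flagged for confirmation.
    """
    stale = [r["id"] for r in records
             if today - r["last_confirmed"] > VERIFY_AFTER]
    for r in records:
        if r["id"] in stale:
            send_verification_form(r["id"], r["owner"])
    return stale
```

The manual trigger (iii) is just `send_verification_form` called from a user-facing button; the point is that all three paths converge on the same form.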
 Erroneous & Missing Data
Do you ever hear people say that they don’t trust the system? Or see users keep their own files in Excel to double-check what the system suggests? Do you see purchase orders where the ‘long text’ fields are used to describe the material or service in great detail, because the material master description isn’t really what is wanted?
Have you seen a world in which the people responsible for creating master data are not the people who work with it, and there’s a gap between how master data is expected to be used and how it is used in practice? Perhaps the data creation task is outsourced, and you are now discovering a hidden cost of outsourcing?
There are many examples where the data is just plain wrong. Users struggle with the system and settle for the closest match. Sometimes the record is missing completely, and rather than request a new one, users re-use an old record instead.
This type of bad data differs from out-dated data, because at least the out-dated data was right once. In this type of scenario we are dealing with data that was never correct because it wasn’t captured correctly in the first place.
The solution relies on better data capture. That means using integrated data capture e-forms and throwing away Word or Excel solutions. That means running validation rules at the point of data capture: checking not only that mandatory fields are filled, but that they are filled with something meaningful. It requires a mixture of automatic and manual data checks before the record is saved to the SAP system.
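A point-of-capture validation rule of the kind described above might look like the following sketch. The field names, placeholder list and “meaningful content” rule are assumptions chosen for illustration, not a real SAP validation configuration.

```python
# Minimal sketch of point-of-capture validation: mandatory fields must not
# only be present but contain something meaningful. Field names and rules
# are illustrative assumptions.
import re

# Values that are technically "filled" but carry no information.
PLACEHOLDERS = {"tbd", "n/a", "na", "xxx", ".", "-", "unknown"}


def validate_material_request(fields: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the request passes."""
    errors = []
    for name in ("description", "base_unit", "material_group"):
        value = str(fields.get(name, "")).strip()
        if not value:
            errors.append(f"{name} is mandatory")
        elif value.lower() in PLACEHOLDERS:
            errors.append(f"{name} looks like a placeholder: '{value}'")
    # A 'meaningful content' rule: descriptions need real words,
    # not just punctuation or a bare part number.
    desc = str(fields.get("description", "")).strip()
    if desc and not re.search(r"[A-Za-z]{3,}", desc):
        errors.append("description should contain descriptive text")
    return errors
```

Rejecting the request at this point, with a clear error list, is far cheaper than cleaning up a wrong record after it has accumulated transaction history.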
 Inconsistent Data
Do you get strange results when running reports, only to realise later that customer or material master records have been inconsistently defined? Do you use many different grouping fields on master records and then rely on them for reporting? Do you see employees struggle because they don’t know which fields to use, or why they always have to fill in particular fields?
Or do you see times when employees cannot find the right material to use because the descriptions don’t make sense to them? (Of course this may lead to them requesting the creation of a new material record unnecessarily.)
Organisations’ use of master data changes over time. Fields may once have been crucial for a historical data upload, for EDI/ALE, for transfer to BW, or for reporting through LIS or COPA, yet those same grouping fields may no longer have any relevance, or none that anyone understands. You end up with inconsistent use of master data records, which causes great user confusion. That isn’t master data management; it’s master data mismanagement.
You need to be clear and consistent in the use of fields such as grouping terms, search terms, description fields and product hierarchies, and indeed every other field.
With an e-form for record creation, rules can be added to generate description fields automatically. Many fields can be hidden and filled automatically and consistently. Validation checks can compare the categorisation of similar records in order to flag exceptions and offer suggestions, making the user’s task easier and delivering a bullet-proof master data request process.
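Rule-based description generation is simple to sketch: build the description from the form’s structured fields in a fixed order, so equivalent materials always read the same way in search results. The template and field names here are assumptions for illustration.

```python
# Sketch of consistent, rule-based description generation on an e-form.
# The TYPE SIZE GRADE template and field names are illustrative assumptions.
def build_material_description(material_type: str, size: str, grade: str) -> str:
    """Generate a description in a fixed TYPE SIZE GRADE pattern.

    Because the description is derived, not free-typed, two requesters
    describing the same material will produce identical text.
    """
    parts = [material_type.strip().upper(), size.strip().upper(), grade.strip().upper()]
    return " ".join(p for p in parts if p)  # skip empty components
```

Hiding the raw description field and filling it from this rule removes one of the main sources of inconsistency described above.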
 Duplicated Data
I’ve separated out duplication as its own cause of bad data because it’s a particular bugbear. When employees see a list of vendors and do not know which one to use, the process is broken before it has started. When determining which record to use is too difficult, employees may give up and simply ask for a new record to be created. When the appropriate checks and balances are not in place, the creation of new records can spiral out of control. And as soon as a customer, material, vendor or GL account has some transaction history, it becomes troublesome to remove.
During master data request processing, a duplicate record check is essential. This might involve quite complex matching algorithms that identify a list of suspected duplicates and stop the process before it begins. Where a process does continue, request forms can be dynamically routed for a manual check of the candidate duplicates before the form is accepted and the update made to SAP.
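The “complex algorithms” in practice often start with fuzzy string matching. Here is a minimal sketch using Python’s standard-library `difflib`; the normalisation rules and the 0.85 similarity threshold are assumptions for illustration, and a production check would typically also compare addresses, tax numbers and bank details.

```python
# Sketch of a duplicate check at the point of request, using simple fuzzy
# string matching to flag suspected duplicates for manual review.
# Normalisation rules and the threshold are illustrative assumptions.
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and strip punctuation so trivial variants compare equal."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def suspected_duplicates(new_name: str, existing: list[str],
                         threshold: float = 0.85) -> list[str]:
    """Return existing vendor names suspiciously similar to the request."""
    target = normalise(new_name)
    return [e for e in existing
            if SequenceMatcher(None, target, normalise(e)).ratio() >= threshold]
```

A non-empty result would stop the request, or route the form to a data steward to confirm whether the match really is a duplicate.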
Fixable and Avoidable
I make no apology that all this seems obvious. But I repeat: all these sources of bad data are both fixable and avoidable with modest investment. Bad data is now a choice for organisations. They can keep it at their own risk, but bad data isn’t free.
Of course there are many specialist tools and services available, but simple, integrated e-forms, together with robust business logic for validation, duplication checking, consistency checking, routing and SAP updating, can deliver the solution to all the above. You can even use an e-form to request/approve a record to be archived.
So next time you see an issue resulting from bad data, remember that someone has chosen to keep that issue; it is all perfectly solvable.