Before you start any data integration or data analysis project (indeed, before you start any project involving data) you need to understand what data you really have, not what data you think you have.

Data Profiling is about building that understanding, and validating everyone's assumptions about what data you have and what uses it can be put to.

Many data projects start off with data which was collected for one reason and is now being put to some new, unanticipated use. Data Profiling is about finding gaps in your data which you may need to augment. It's about finding what uses the data will actually support. And most importantly of all, it's about flagging these issues up early in your project, before they become critical.

Any issue fixed in the analysis stage of a project is going to be hundreds of times cheaper to fix than one found during the testing or, worse, rollout phase of your project.

As such, Data Profiling is an essential first step for any data project. Not just because it tells you what data you have, but because it is a quick way to find problems early, while they can still be addressed in a cost-effective way.

Traditionally Data Profiling, if it has been done at all, has been a manual, error-prone process. But there are now Data Profiling tools and processes which do the heavy lifting, leaving you to apply your experience of your business and your requirements, and to focus your time and energy where it really matters.

But we still haven't truly said what Data Profiling is. At its simplest, it is a collection of statistics and checks, simple to understand and to generate, which you can run against your data to find issues, outliers, missing data, or anomalies: all items that you need to address, or at least be aware of, as your project progresses. And while it would be great to have a business expert sitting alongside you during this process, you can quickly find issues and create a meaningful list of questions about any dataset with minimal prior knowledge. Of course, to get the best out of your data you are going to need some knowledge of how the data was collected, the business needs, and any implicit assumptions.
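To make "simple statistics and checks" concrete, here is a minimal sketch of a profile for a single column, assuming the column arrives as a list of raw strings. The function names, the set of missing-value markers, and the 1.5 x IQR outlier rule are all illustrative assumptions rather than anything the text prescribes; a real profiling tool would offer many more checks.

```python
import math
import statistics

# Assumed markers for "no value"; adjust to the conventions of your own data.
MISSING_MARKERS = {"", "NULL", "N/A"}


def is_missing(value):
    """True if the raw value is None or one of the assumed missing markers."""
    return value is None or str(value).strip().upper() in MISSING_MARKERS


def to_number(value):
    """Return the value as a finite float, or None if it does not parse."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def profile_column(values):
    """Compute a few basic profiling statistics for one column."""
    present = [v for v in values if not is_missing(v)]
    parsed = [to_number(v) for v in present]
    numbers = sorted(n for n in parsed if n is not None)

    # Flag outliers with the common 1.5 x IQR rule (an assumed choice).
    outliers = []
    if len(numbers) >= 2:
        q1, _, q3 = statistics.quantiles(numbers, n=4)
        spread = 1.5 * (q3 - q1)
        outliers = [n for n in numbers if n < q1 - spread or n > q3 + spread]

    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "non_numeric": len(present) - len(numbers),
        "min": numbers[0] if numbers else None,
        "max": numbers[-1] if numbers else None,
        "outliers": outliers,
    }


# Hypothetical "age" column with typical problems mixed in.
ages = ["34", "29", "", "41", "38", "NULL", "31", "290", "abc", "36"]
for name, value in profile_column(ages).items():
    print(f"{name}: {value}")
```

Each non-zero count in the result is a candidate question for whoever knows the data: why are values missing, what does a non-numeric age mean, and is the flagged outlier a typo or a genuine record?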