With a collection of quick and simple checks, Data Profiling provides you with a much better understanding of your data. You can quickly find issues before embarking on any data project, issues which will cost you much more to put right later in the project life-cycle.

In this article we're going to focus on perhaps one of the more advanced aspects of Data Profiling: cross-database checks and validation.

Unfortunately, many tools do not support cross-database analysis, and you will often need to load all the relevant sources into the same database or repository to perform such checks. But even given this extra step, cross-database validation is a very worthwhile exercise, and will pay back handsomely on any data initiative:

* Data integration projects will by their very nature require the analysis and comparison of multiple data sources.
* On any data migration project you will want to validate both the source and loaded datasets.
* Even with a "single" database project you will find that there are usually various authoritative data sources strewn across the business (often in the shape of Excel spreadsheets and personal datasets) which need to be cross-checked with the target database.

To cope with all this you will want to perform a number of cross-database checks. In effect you'll be data profiling several sources and comparing their resulting profiles. Specifically, you should consider:

* Comparison of codes used in the various systems. If not identical, is there an appropriate mapping between the codes?
* If there are many codes, perhaps Social Security Numbers, then compare their patterns and formats.
* If entities are expected in more than one system, then you can check keys in both systems to check for duplicate or missing entries.

And of course, if you're expecting the data in the systems to be unique, you should still check for, and investigate, any duplicates.

Cross-database validation is not trivial, but it's not that hard either. The checks are easy to understand and communicate, and any issues found are generally significant. It is therefore something which you should always undertake as part of any Data Profiling exercise.
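
To make the three checks listed above more concrete, here is a minimal sketch in Python. It assumes the relevant columns from each system have already been extracted into plain lists. The function names (`compare_codes`, `unmapped_codes`, `pattern_of`, `pattern_profile`, `compare_keys`) are hypothetical and chosen for illustration only; they are not taken from any profiling tool.

```python
import re
from collections import Counter


def compare_codes(source_codes, target_codes):
    """Return the codes that appear in only one of the two systems."""
    source, target = set(source_codes), set(target_codes)
    return {
        "only_in_source": sorted(source - target),
        "only_in_target": sorted(target - source),
    }


def unmapped_codes(source_codes, mapping, target_codes):
    """Return source codes that have no mapping, or map to a code the target lacks."""
    target = set(target_codes)
    return sorted(
        code for code in set(source_codes)
        if mapping.get(code) not in target
    )


def pattern_of(value):
    """Reduce a value to its shape: every digit becomes 9, every letter becomes A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def pattern_profile(values):
    """Count how often each shape occurs, so two systems' formats can be compared."""
    return Counter(pattern_of(v) for v in values)


def compare_keys(source_keys, target_keys):
    """Report keys missing from either system and keys duplicated within a system."""
    source_counts, target_counts = Counter(source_keys), Counter(target_keys)
    return {
        "missing_in_target": sorted(set(source_counts) - set(target_counts)),
        "missing_in_source": sorted(set(target_counts) - set(source_counts)),
        "duplicated_in_source": sorted(k for k, n in source_counts.items() if n > 1),
        "duplicated_in_target": sorted(k for k, n in target_counts.items() if n > 1),
    }
```

With made-up data, `compare_codes(["M", "F", "U"], ["M", "F", "X"])` would flag `U` and `X` as codes needing a mapping, `pattern_of("123-45-6789")` gives `"999-99-9999"`, and `compare_keys([1, 2, 2, 3], [2, 3, 4])` reports key `1` as missing in the target, key `4` as missing in the source, and key `2` as duplicated in the source. Comparing the `pattern_profile` output of each system side by side is one simple way to spot format differences when there are too many distinct values to compare individually.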