Data Profiling - Cross-Database Validation

With a collection of quick and simple checks, Data Profiling provides you
with a much better understanding of your data. You can quickly find issues
before embarking on any data project; issues which will cost you much more
to put right later in the project life-cycle.

In this article we're going to focus on perhaps one of the more advanced
aspects of Data Profiling: cross-database checks and validation.

Unfortunately, many tools do not support cross-database analysis and you
will often need to load all the relevant sources into the same database or
repository to perform such checks. But even given this extra step,
cross-database validation is a very worthwhile exercise, and will pay back
handsomely on any data initiative:

* Data integration projects will by their very nature require the analysis
and comparison of multiple data sources.
* On any data migration project you will want to validate both the source
and loaded datasets.
* Even with a "single" database project you will find that there are usually
various authoritative data sources strewn across the business (often in the
shape of Excel spreadsheets and personal datasets) which need to be
cross-checked with the target database.

To cope with all this you will want to perform a number of cross-database
checks. In effect you'll be data profiling several sources and comparing
their resulting profiles. Specifically, you should consider:

* Comparison of codes used in the various systems. If not identical, is
there an appropriate mapping between the codes?
* If there are many codes, perhaps Social Security Numbers, then compare
their patterns and formats.
* If entities are expected in more than one system, then you can check keys
in both systems to check for duplicate or missing entries.

And of course, if you're expecting the data in the systems to be unique, you
should still check for, and investigate, any duplicates.

Cross-database validation is not trivial, but it's not that hard either. The
checks are easy to understand and communicate, and any issues found are
generally significant. It is therefore something which you should always
undertake as part of any Data Profiling exercise.
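
To make the three checks listed in this article more concrete (code comparison, pattern comparison, and key comparison), here is a minimal Python sketch. It assumes the relevant columns from each system have already been loaded into plain Python lists. The function names, the result keys, and the "9"/"A" pattern notation are hypothetical choices made for illustration; they do not come from any particular profiling tool.

```python
import re
from collections import Counter


def compare_codes(source_codes, target_codes, mapping=None):
    """Compare code sets from two systems, applying an optional mapping
    (source code -> target code) before comparing."""
    mapping = mapping or {}
    target = set(target_codes)
    source = set(source_codes)
    mapped = {mapping.get(code, code) for code in source}
    return {
        # Source codes whose (mapped) value does not exist in the target.
        "unmapped_in_source": sorted(
            code for code in source if mapping.get(code, code) not in target
        ),
        # Target codes that no source code maps onto.
        "unused_in_target": sorted(target - mapped),
    }


def pattern_of(value):
    """Reduce a value to its shape: every digit becomes 9, every letter A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def pattern_profile(values):
    """Count how often each pattern occurs in a column."""
    return Counter(pattern_of(value) for value in values)


def compare_keys(source_keys, target_keys):
    """Find keys missing from either system and keys duplicated within one."""
    source_counts = Counter(source_keys)
    target_counts = Counter(target_keys)
    return {
        "missing_in_target": sorted(set(source_counts) - set(target_counts)),
        "missing_in_source": sorted(set(target_counts) - set(source_counts)),
        "duplicates_in_source": sorted(
            key for key, count in source_counts.items() if count > 1
        ),
        "duplicates_in_target": sorted(
            key for key, count in target_counts.items() if count > 1
        ),
    }


if __name__ == "__main__":
    # Illustrative data only.
    print(compare_codes(["M", "F", "U"], ["Male", "Female"],
                        {"M": "Male", "F": "Female"}))
    print(pattern_profile(["123-45-6789", "123456789"]))
    print(compare_keys([1, 2, 2, 3], [2, 3, 4]))
```

Running `pattern_profile` over the same column in each system and comparing the two resulting counters is one simple way to spot format differences, such as identifiers stored with dashes in one system and without in the other. In practice these checks would more likely be run as queries once the sources share a database or repository, as described earlier in the article.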