| -link"> | | | | postcode + suburb + state that you may have |
| In this article I will attempt to set out simply what | | | | obtained from Australia Post. Such a process |
| it means to clean a database, why you need to | | | | would likely generate a list of records where the |
| do it, and how you might go about it. Firstly the | | | | suburb was not found, requiring you to manually |
| "What" | | | | investigate and correct the data. |
| Cleaning a database is done to: | | | | Correcting the formatting of your data, is usually |
| - Remove duplicate records | | | | done using some pretty simple SQL perhaps |
| - Ensure your data is consistently formatted | | | | combined with logic programming. You need to |
| - Correct data that is obviously wrong e.g. wrong | | | | decide the format you wish to apply to your |
| postcode for a known suburb | | | | data, for example, whether you would like the |
| - Find other records that are likely to be the | | | | suburb in title case or all capitals. While this is much |
| same (more on this later)So "Why" would you | | | | less important than getting the data actually right, |
| want to do that? | | | | it can help to make your communications look |
| To explain why, I am going to use the example | | | | more professional. |
| of a customer database, but the principles apply | | | | Finding duplicates is a fairly easy task for |
| to other types of data also. | | | | someone who knows a little about the SQL |
| Have you ever received a marketing message / | | | | database language. It is more difficult to find |
| catalogue in the mail twice or more times? I | | | | similar records that really are the same person, |
| receive multiple copies of such communications | | | | but are not listed in exactly the same way in your |
| regularly, and I don't always get around to telling | | | | database. For instance the following two records |
| the sender of their mistake. This can: | | | | may actually be the same |
| - be interpretted as sloppiness on the part of the | | | | rbPostcodeState3442JohnCitizenPO Box |
| organisation | | | | 33Frankston3199VIC682JonathonCitien14 Beach |
| - undo your efforts to target / personalise - any | | | | RoadFRANKSTON3199VIC |
| attempt on the organisation's part to "personalise" | | | | Finding records such as the above calls for what |
| and "target" the message is wasted, because the | | | | is usually called "Fuzzy" Matching. Software is |
| recipient knows immediately that it was a | | | | available to find such records, and much more |
| mindless distribution of information using a | | | | experienced SQL programmers could write |
| database. | | | | software to find such possible duplicates. |
| - waste $$$! Everytime you send a | | | | Because you can't confidently use logic to |
| communication twice to the one person or | | | | determine whether or not two records are the |
| household, you have most likely just wasted | | | | same in the case given above, usually fuzzy |
| some of your hard-earned funds. | | | | matching would leave the data as is, but produce |
| In addition, cleaning your data, will help you to | | | | an exception report, highlighting likely duplicate |
| analyse your data more accurately. For instance, | | | | records. |
| you will know the real number of contacts and | | | | Even when you can determine confidently that |
| perhaps how they are geographically distributed, | | | | two records are the same, you may wish to |
| rather than the distorted figures that can be | | | | manually process the data cleanup to ensure that |
| derived from analysing a corrupted database. | | | | only the correct data is kept, and that all |
| It's not a crime! In fact it is very easy for your | | | | associated pieces of information are transferred |
| data to get in a state that requires cleaning. For | | | | across to the valid record e.g. customer payment |
| example, when a client changes their address, | | | | history. It is possible however, to set up your |
| your staff might update the suburb but forget to | | | | de-duplication process to remove all the duplicates |
| put in the new postcode. Or, an existing client | | | | and clean up all the records automatically. |
| returns to your organisation several years later, | | | | Cleaning your database can take some time, and |
| without informing new staff that they are an | | | | some manual effort on the part of your staff. If |
| existing client, and if you don't have the | | | | you are just starting out with a new database, it |
| appropriate keys on your database preventing | | | | is very worthwhile to: |
| duplicates, the client could be set up again as | | | | 1. Agree and document the data structure, and |
| another customer with the same or similar details. | | | | what information will be stored in what field (which |
| Having documented processes that your staff | | | | isn't always obvious despite the names you might |
| can use as a checklist, and appropriate unique | | | | give fields) |
| keys on your database fields, will go some way | | | | 2. Agree the format of the data entered into |
| to ensuring that your data is kept clean, but | | | | each field |
| incorrect data will never be prevented."How" then, | | | | 3. Agree a process to handle the case where a |
| do you efficiently clean your database? | | | | record needs to be entered that won't fit into the |
| Fixing incorrect information such as the postcode | | | | current structure |
| matching the suburb is usually done by comparing | | | | If you need help cleaning your database, Contact |
| each record to the correct values in another table. | | | | Point can help you. We provide a quick and |
| For example, to correct all the postcodes in your | | | | efficient service to deal with all the database |
| data, assuming that the suburb entered is correct, | | | | issues discussed above, and can tailor our service |
| you would write SQL code that would compare | | | | to meet your particular needs. Submit a request |
| the postcode of your record against a table of | | | | now for an obligation free quote. |