| Introduction | | | | After extraction, the data is transformed, or |
| ETL stands for Extract, Transform and Load, the | | | | modified, depending on the specific business logic |
| processes that enables the move of data from | | | | involved so that it can be sent to the target data |
| multiple sources, reformat and cleanse it, do | | | | store. |
| whatever data conversions are necessary, and | | | | There are a variety of ways to perform the |
| load it into another file, database, a data mart or | | | | transformation, and the work involved varies. The |
| a data warehouse for analysis, or onto another | | | | data may require reformatting only, but most |
| system, for instance a CMS environment like | | | | ETL operations also involve cleansing the data to |
| Drupal or Mambo/Joomla. | | | | remove duplicates and enforce consistency. |
| We all know that there are valuable data lying | | | | Transformation |
| around throughout our systems that would be | | | | In addition, the ETL process could involve |
| very useful if it could be reused in another | | | | transforming from a fixed-record format to a |
| program, or we'd like to upgrade our system to a | | | | variable one, or vice versa, standardizing name |
| different software package. | | | | and address fields, verifying telephone numbers or |
| The problem often is that the data lies in formats | | | | expanding records with additional fields containing |
| that cannot be readily used by other applications. | | | | demographic information or data from other |
| Solutions | | | | systems. |
| To solve the problem, you can use extract, | | | | The transformation occurs when the data from |
| transform and load (ETL) software, which includes | | | | each source is mapped, cleansed and reconciled |
| reading data from its source, cleaning it up and | | | | so it all can be tied together. |
| formatting it uniformly, and then writing it to a | | | | After reconciliation, the data is transported and |
| target format to be exploited. | | | | loaded into the data warehouse for analysis. |
| The data used in ETL processes can come from | | | | Online data transformation |
| any source: a flat file, a mainframe application, an | | | | There are many tools available that help in the |
| ERP application, a CRM tool, an Excel spreadsheet, | | | | ETL process. Most of them mandate an |
| an extraction program, anything really. | | | | investment in software that needs to be installed |
| Extracting the data | | | | on your computer. There are also online functions |
| Extraction can be done via a variety of methods. | | | | available here can be very useful if you cannot, or |
| Often, the environment or program in which the | | | | don't want to install any software on your |
| data is currently held will have an export function | | | | computer. You will still need to extract the |
| that can be used to get the data into a format | | | | information from your existing environment into a |
| that can be easily transformed and processed. | | | | file, but the transformation process can in many |
| There are also specialized tools available to take | | | | instances be done online. |
| data from a database environment. | | | | |