| Decision support systems are usually based on | | | | process, often represents a major part of the |
| the development of Data Warehouse | | | | man effort to develop a DW (international |
| infrastructures. A data warehouse (hereafter DW) | | | | statistics estimate that it exceeds 70% of total |
| architecture has two major areas: the staging | | | | effort). The DW staging area, is often |
| area and the presentation area. In this article we | | | | implemented in a separate physical server (staging |
| present the staging area. The sources from which | | | | server), thus adding complexity and cost. |
| data shall be systematically extracted, in order to | | | | However, this approach has certain advantages |
| be loaded in the DW, are determined. The | | | | like: |
| database schema documentation of these | | | | - isolation of raw data which are extracted from |
| sources, is reviewed in order to design the data | | | | sources, from processed data which are |
| extraction logic. Documentation quality of the data | | | | accessible by business analysts |
| structures of these sources, influences the | | | | - additional security and process quality, given that |
| degree of difficulty in designing the data | | | | DW users have no access in this area |
| extraction logic. Data extracted are loaded in the | | | | - load sharing, given that 'data preparation' tasks |
| staging area, either as simple files or as updates in | | | | and DW querying tasks are handled by separate |
| database tables. The staging area may have | | | | systems. |
| various stages. Extraction of data from sources, | | | | - development of a central metadata repository |
| transformation of data into new structures and | | | | which maintains documentation for all involved |
| data loading in the DW, a process known as ETL, | | | | systems: operational systems (data sources), ETL |
| takes places in the staging area. | | | | process, data warehouse, BI tools and predefined |
| The extraction process requires the determination | | | | reports Various types of raw data processing, |
| of source relational tables - fields, from which data | | | | take place at the staging area: |
| shall be extracted (as mentioned above, | | | | - Data standardization: data transformation to a |
| documentation of these structures is crucial for | | | | standard format, if needed |
| design). The design of the extraction process | | | | - Sorting of records |
| determines: | | | | - Matching and merging records of the same |
| - the frequency of data extraction | | | | entity, which are derived from different sources |
| - the extraction method (e.g. changes only) and | | | | (e.g. order records of the same Customer from |
| technology (database partial replication) | | | | different order handling systems), after |
| - the database instance or the file in which data | | | | standardization |
| are initially loaded, in the staging area Moreover, | | | | - Processing of calculated facts (facts derived |
| the volume of data to be extracted is estimated, | | | | from detailed data e.g. total monetary value of an |
| in order to plan for computational & storage | | | | order) |
| capacity. Estimation sheets known as 'volumetric | | | | - Management of surrogate keys, which replace |
| sheets' are developed with the following | | | | operational systems keys |
| information per source field: | | | | - Enrichment of records with default values, if |
| - extraction frequency | | | | required |
| - estimated volume | | | | - Production of aggregate data, if needed |
| - standardisation and transformation rules applied | | | | - Data conversion according to the technological |
| (if any) | | | | platform used by the DW (DBMS, operating |
| - DW database field to which data will be loaded | | | | system) The ETL process is automated by |
| In many cases, data quality assessment and data | | | | software and executed periodically to update the |
| cleansing steps also take place in the staging area. | | | | DW. |
| Design and implementation of the automated ETL | | | | |