| Â | | | | Step 7: Define the duration of database and |
| DATA WAREHOUSE AND ITS APPLICATIONS IN | | | | periodicity of updation. |
| AGRICULTURE | | | | Step 8: Track slowly the change in dimension. |
| Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Dr. Satish R. | | | | Step 9: Decide the query priorities and query |
| Kolhe                | | | | modes. |
| Assistant | | | | All the above steps are required before the data |
| Â Â Â Â Â Â Â Â | | | | warehousing is implemented. The final step or |
| Â Â Â Â Â Â Â Â Â | | | | step 10 is implemented a simple data warehouse |
| Gf's GCOE | | | | or data mart. The approach should be ‘from |
| Â Â Â Â Â NMU | | | | simpler to complex'. First only a few data marts |
| Jalgaon                         | | | | are identified, designed and implemented. A data |
| Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â | | | | warehouse then will emerge gradually. |
| Â | | | | Let us discuss the above mentioned steps in |
| A Data warehouse is a repository of integrated | | | | detail. Interaction with the user is essential for |
| information, available for queries and analysis. | | | | obtaining answers to many questions. The user to |
| Data and information are extracted from | | | | be interviewed includes top management, middle |
| heterogeneous sources as they are generated. | | | | management, executives, as also operational |
| This makes it much easier and more efficient to | | | | users, in addition to sales force and marketing |
| run queries over data that originally came from | | | | teams. A clear picture emerges from the entire |
| different sources. In other words Data warehouse | | | | project on the data warehousing as to what are |
| is a database that is used to hold data for | | | | their problems and how they can be possibly |
| reporting and analysis. | | | | solved with the help of data warehousing. |
| Â Economic foundation and productivity growth | | | | 4. Â Various Technology Considerations |
| depends on agricultural sectors. Agriculture is the | | | | The following or technological issues [3] are |
| driving force behind the way of live and source of | | | | required to be considered for designing and |
| earnings for the majority of peoples. More than | | | | implementing a data warehouse: |
| 60 percents of population are living in rural areas | | | | 1. The hardware platforms for Data Warehouse |
| and the majority are farmers. The rural | | | | 2. DBMS for supporting data warehouse |
| communities as a main producer for country food | | | | 3. Communication and network infrastructure for |
| productivity and food security earn only 11 | | | | a Data Warehouse |
| percents of Gross Domestic Product (GDP). The | | | | 4. The system management /operating system |
| arrival of information age guides this country to | | | | platforms |
| new development strategies. | | | | 5. The software tools for building, operating and |
| National Electronics and Computer Technology | | | | using Data Warehouse |
| Center (NECTEC) in collaboration with the Ministry | | | | Â Â |
| of Agriculture, has launched "Agriculture | | | | Hardware Platform |
| Information Network" as a response to the | | | | Organization normally tend to utilize the already |
| unmet information requirements of the agricultural | | | | existing hardware platform for data warehouse |
| sector. Farmers should gain benefit from the | | | | development however the disk storage |
| contents provided which include risk assessment, | | | | requirements for a data warehouse will be |
| agriculture warning system and agricultural | | | | significantly large, especially in comparison with |
| knowledge base, which aim to improve | | | | single application. |
| technology, productivity, income and stability of | | | | If data warehouse or data mart is small in data |
| India agriculture sector through the age of | | | | size, normal Pentium server will be probably |
| Information Technology. The data warehouse | | | | sufficient with not very high reliability standards. |
| consists of common databases and geo-spatial | | | | However for a regular large data warehouse |
| databases from various departments and | | | | application the server has to be specialized for the |
| organizations in the country and abroad. Farmers | | | | tasks associated with a data warehouse. A |
| can get access to the contents through Internet | | | | mainframe, for example is well suited for this |
| by themselves or from groups of professional | | | | purpose, as a data warehouse server. What are |
| people called "Information Brokers". | | | | the features required for a successful data |
| Â | | | | warehouse server? Firstly it should be able to |
| Keywords: Data Warehouse, Agriculture, IT | | | | support large data volume and complex query |
| Â | | | | processing. In addition, it has to be highly scalable. |
| Â | | | | As the user population keeps on growing, the |
| 1.   Introduction | | | | network traffic and the access traffic increase |
| A Data warehouse [1] is a repository of | | | | significantly. Therefore, the requirement of data |
| integrated information, available for | | | | warehouse server is the scalable high |
| queries and analysis. Data and information | | | | performance for data loading and ad hoc query |
| are extracted from heterogeneous | | | | processing as well as the ability to support large |
| sources as they are generated. This | | | | database in a reliable and efficient manner. If the |
| makes it much easier and more | | | | querying is going to be on a large public data |
| efficient to run queries over data that | | | | network then multiprocessor configuration will be |
| originally came from different sources. In | | | | required for parallel query processing. In case of a |
| other words Data warehouse is a database that | | | | complex server of configuration with multiple |
| is used to hold data for reporting and analysis. | | | | processors and large I/O bandwidth a proper |
| Â Â | | | | balance needs to be made between I/O and |
| Goals of Data Warehousing | | | | processing power. |
| - To facilitate reporting as well as analysis | | | | Â |
| - Maintain an organizations historical information | | | | DBMS Selection |
| - Be an adaptive and resilient source of | | | | Next to hardware solutions a factor most critical |
| information | | | | is the database selection. This determines the |
| - Be the foundation for decision making | | | | speed performance of the data warehousing |
| Â Â | | | | environment. The requirement Of a DBMS for |
| Data Warehouse Architecture | | | | data warehousing and requirement are scalability |
| Data warehouse Architecture comprises of | | | | and high volume storage and processing and |
| - Operational source systems | | | | throughput in traffic. |
| - A data staging area | | | | Â Â Â Â Â Â Â Â Â Â The majority of established |
| - One or more conformed data marts | | | | RDBMS vendors have implemented various |
| - A data warehouse database | | | | degree of parallelism in their products. Even |
| Â | | | | though all the vendors have implemented various |
| Operational Source Systems | | | | degrees of parallelism in their products. Even |
| Operational source systems [1] are | | | | though all the well known vendors-IBM, ORACLE |
| developed to capture and process | | | | SYBASE-support parallel database processing, |
| original business transactions. These | | | | some of them have improved their architectures |
| systems are designed for data entry, | | | | so as to better suit the specialized requirement of |
| not for reporting, but it is from here the | | | | the data warehouse. The RDBMS products |
| data in data warehouse gets populated. | | | | provide additional modules for OLAP cubes. The |
| Â | | | | correct choice of OLAP server DB server and |
| Data Staging Area | | | | web server can be made by the designer or user |
| Data staging area is where the raw | | | | of Data warehouse depending on the requirement. |
| operational data is extracted, cleaned, | | | |  |
| transformed and combined so that it can | | | | Communication and Networking Infrastructure |
| be reported on and queried by users. | | | | Data warehouse can be internet enabled or |
| This area lies between the operational source | | | | intranet enabled as the choice may be. If web |
| systems and the user database and is typically | | | | enabled the networking is taken care by the |
| not accessible to users. | | | | internet. If only Intranet based then the |
| Â | | | | appropriate LAN operational environment should |
| Data staging is a major process that includes the | | | | be accessible to all the identified users. Thus |
| following sub procedures: | | | | network expansion may be required as per the |
| - Extraction | | | | needs. In web enabled data warehouses, issues of |
| The extract step is the first step of | | | | security privacy and accessibility need to be |
| getting data into the data warehouse | | | | considered carefully .Accordingly web enablement |
| environment. Extracting means reading and | | | | facilities should be ensured in the software tools |
| understanding the source data, and copying | | | | used for data warehouse development. |
| the pas that are needed to the data staging for | | | |   |
| further work. | | | | Stages in Implementation |
| - Transformation | | | | A data warehouse cannot be purchased and |
| Once the data is extracted into the | | | | installed. Its implementation requires the integration |
| data staging area, there are many | | | | of implementation of many products. Following are |
| transformation steps, including | | | | the steps of the Data Warehouse implementation: |
| Â | | | | Step 1: Collect and analyze business requirement. |
| 1. Cleaning the data by correcting misspellings, | | | | Step2: Create a data model and physical design |
| resolving domain conflicts, dealing with | | | | and data warehouse after deciding the |
| Â Â Â Â Â Â Â Â missing data elements, and parsing | | | | Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â |
| into standard formats. | | | | Â Â Â Â Â Â Â Â Â Â Â appropriate hardware platform. |
| 2. Purging selected fields from the legacy data | | | | Step 3: Define the data sources |
| that are not useful for data warehouse. | | | | Step 4: Choose the DBMS and software platform |
| 3. Combining data sources by matching | | | | for data warehouse. |
| exactly on key values or by | | | | Step 5: Extract the data from operational data |
| performing fuzzy    matches on non-key | | | | sources, translate it, clean-up and load into the |
| attributes. | | | | Â Â Â Â Â |
| 4. Creating surrogate keys for each | | | |            data warehouse model or data |
| dimension record in order to avoid | | | | mart. |
| dependency on legacy defined keys, | | | | Step 6: Choose database access and reporting |
|  where the surrogate key generation | | | | tools. |
| process enforces referential integrity between | | | | Step 7: Choose database connectivity software. |
| the dimension tables and fact tables. | | | | Step 8: Choose the data analysis (OLAP) and |
| 5. Building the aggregates for boosting the | | | | presentation (client GUI) software. |
| performance of common queries. | | | | Step 9: Keep refreshing the data warehouse |
| - Loading and indexing | | | | periodically. |
| At the end of transformation process, | | | |   |
| the data is in the form of load | | | | Access Tools |
| record images. Loading in the data | | | | With the exception of SAS(of SAS institute), all |
| warehouse environment usually takes the | | | | the Data Warehouses /OLAP vendors are not |
| form of replicating the dimensional tables | | | | currently providing comprehensive single-window |
| and fact tables and presenting these | | | | software tools capable of handling all aspects of |
| tables to bulk loading facilitates each | | | | data warehousing project implementation .SAS |
| recipient data mart. Bulk loading is a very | | | | alone meets the requirement largely independently |
| important capability that is to be | | | | as it has its own database internally with a |
| contrasted with record-at-a time loading, | | | | capability of import data from any vendor DBMS |
| which is far slower. The target data mart | | | | software. Therefore one can implement a data |
| must then index the newly arrived data for query | | | | warehousing and data mining solution |
| performance. | | | | independently with SAS. |
| Â | | | | The best way to choose a group of tools is to |
| Data Mart | | | | understand the capability and compatibility of |
| Data mart is a logical subset of an | | | | different type of access to the data and |
| enterprise-wide data warehouse. For | | | | reporting by selecting best tool in market for that |
| example, a data warehouse for a retail chain is | | | | kind of access. The types of access and reporting |
| constructed incrementally from individual, | | | | are as follows: |
| conformed data marts dealing with separate | | | | - 1. Time series analysis |
| subject areas such as product sales. Dimensional | | | | - 2. Data visualization, graphing, charting and |
| data marts are organized by subject | | | | pivoting |
| area such as sales, finance, and | | | | - 3. Complex textual search (text mining) |
| marketing and coordinated by data | | | | - 4. General stastical analysis. |
| category such as customer, product, | | | | - 5. Artificial intelligence techniques for hypothesis |
| and location. These flexible information | | | | testing, trends discovery, identification and |
| stores allows data structures to | | | | validation of data clusters and segments(also |
| respond to business changes-product line | | | | useful for data mining) |
| additions, new staff responsibilities, | | | | - 6. Mapping of specifial information into geographic |
| mergers, consolidations, and acquisitions. | | | | information system |
| Â Â | | | | - 7. Ad hoc user-specific queries |
| Data Warehouse Database | | | | - 8. Predefined repeatable queries |
| A data warehouse database contains the | | | | - 9. Drilling down interactically |
| data that is organized and stored | | | | - 10. Reporting the analysis by drilling down |
| specifically for direct user queries and | | | | - 11. Complex queries with multi-table forces, |
| reports. It differs from an OLTP | | | | multilevel sub-queries, sophisticated search criteria. |
| database in the sense that it isdesigned | | | | In some applications, the user requirement may |
| primarily for reads not writes. An OLAP | | | | exceed the capability of tools. A number of query |
| application is a system designed for | | | | tools are available in the market today which |
| few but complex (read only) request. | | | | enables an ordinary user to build customized |
| An OLTP application is a system | | | | reports by easily composing and executing ad hoc |
| designed for many but simple | | | | queries without any necessity to have the |
| concurrent (and updating) requests. | | | | knowledge of the underlying design details or data |
| Â | | | | base technology, SQL, or even the data model |
| Metadata | | | | Â Â |
| Metadata defines the content and location of the | | | | 5. Â Its Applications in Agriculture |
| data in the data warehouse, relationships between | | | | Â Â |
| the operational databases and the data | | | | Project: Agriculture Information System Network |
| warehouse and the business views of the data in | | | | (AGRISNET) |
| the data in the warehouse as accessible to the | | | | Department of Agriculture and Cooperation (DAC) |
| end-user tools. Metadata is searched by user to | | | | [2] have taken steps to establish "Agricultural |
| find the subject areas and the definitions of the | | | | Information System Network (AGRISNET)" in |
| data. | | | | collaboration with NIC. The Proposal recommends |
| For decision support, the pointers required to data | | | | (i) the state-of-the-art IT infrastructure |
| warehouse are provided by the metadata. | | | | requirements to establish AGRISNET as the |
| Therefore, it acts as logical link between the | | | | INTRANET over NICNET, (ii) development of |
| decision support system application and the data | | | | databases and information systems for decision |
| warehouse. Thus, any data warehouse design | | | | support for evaluation, monitoring and policy |
| should assure that there is a mechanism that | | | | formulations, and (iii) human resources |
| populates and maintains the metadata repository | | | | development, (iv) multi-media based training and |
| and that all access paths to data warehouse have | | | | demonstration of transfer of technology to |
| metadata as an entry point. In other words there | | | | strengthen Farm Research and Education using |
| should be no direct access permitted to the | | | | broadcast VSATs, (v) special interest groups in |
| data-warehouse data if it does the user metadata | | | | respect of subjects, problems, programmes, |
| definitions to gain the access. Meta data definition | | | | schemes, etc, and above all, to make Indian |
| can be done by the user in any given data | | | | Agriculture on-line for INTERNET and INTRANET |
| warehousing environment. The software | | | | access through AGRISNET Nodes. AGRISNET |
| environment as decided by the software tools | | | | Nodes are envisaged to be established at |
| used will provide a facility for metadata definition | | | |  • DAC Hqrs (Krishi Bhawan), |
| in a metadata repository. | | | |  • DAC Attached Offices and its regional |
|  | | | | offices, |
| OLAP Vs OLTP | | | |  • DAC Subordinate Offices and its regional |
|  | | | | units, |
| OLTP (Online Transactional Processing) | | | |  • DAC Public Sector Undertakings |
| - OLTP servers handle mission-critical production | | | | (NSC&SFCI) and sub-units, |
| data accessed through simple queries | | | |  • DAC Autonomous Organizations, |
| - Usually handles queries of an automated nature | | | |  • Apex Cooperative Organizations |
| - OLTP applications consist of a large number of | | | |  • State Agriculture Departments |
| relatively simple transactions. | | | |  • NCT/UT Agriculture Departments |
| - Most often contains data organised on the basis | | | |  • District Agriculture Offices and |
| of logical relations between normalised tables | | | |  • Block Agriculture offices |
| • OLAP (Online Analytical Processing) | | | | In this direction, IFFCO has taken up a project in |
| - OLAP servers handle management-critical data | | | | association with Indian Space Research |
| accessed through an iterative analytical | | | | Organisation (ISRO) to utilise satellite based |
| investigation | | | | remote sensing data and Geographical Information |
| - Usually handles queries of an ad-hoc nature | | | | Systems (GIS). Attention may be drawn to the |
| - supports more complex and demanding | | | | fact that the developed countries have been |
| transactions | | | | utilising precision farming with the help of IT tools |
| - contains logically organised data in multiple | | | | for a long time. While this will take a long time for |
| dimensions | | | | our country due to small holdings, it is to be noted |
| Â | | | | that GIS has an invaluable role to play even in the |
| 2.   Warehouse Schema Design | | | | existing conditions. Remote sensing and GIS |
| Dimensional modeling is a term used to refer a | | | | information can provide warnings on evolving crop |
| set of data modeling techniques that havegained | | | | stresses, crop vigour, etc. |
| popularity and acceptance for data | | | | The IFFCO-ISRO GIS project extends support |
| warehouse implementation. Dimensional | | | | for efficient and timely availability of IFFCO's |
| modeling is one of the key techniques in data | | | | fertiliser to farmers though better logistics & |
| warehousing. Two types of tables are used in | | | | efficient operations. It endeavours to provide |
| dimensional modeling: Fact tables and dimensional | | | | farmers' advisory services to provide decision |
| tables | | | | support to farmers on land related issues, |
| Â | | | | weather, etc. In addition to the GIS based |
| Â | | | | services, effort is being made to create |
| Â | | | | databases that contain information of interest to |
| Fact Tables  | | | | the farmers. These include recommendation on |
| These are used to record actual facts and | | | | package of practices for major cereals, pulses, |
| measures in the business. Facts are numeric | | | | horticulture, floriculture and animal husbandry, |
| data items that are of interest to the business. | | | | etc.Information on all the inputs such as seeds, |
| Example, telecommunication- length of call in | | | | fertiliser, sources, current availability, prices, |
| minutes, average number of calls. | | | | availability of credit, alternatives available and |
| Â | | | | terms and conditions, etc. are sought to be |
| Dimensional Tables | | | | provided. An important service envisaged is to |
| Dimensional tables establish the context of the | | | | provide access to the nearest expert in case of |
| facts. Dimensional tables store fields that | | | | stress or any other problem witnessed in the |
| describe the facts. Example, telecommunication- | | | | crops. Facilities are sought to be provided to |
| call origin, call destination. A schema is a fact | | | | encourage and share farm experiences by forging |
| table plus its related dimensional table. | | | | various crop forums. Many of the agricultural |
| Â Â | | | | extension services are also proposed to be made |
| 3. Crucial Decision in Designing a Data Warehouse | | | | online using aspects of multimedia. |
| The job of designing and implementing a data | | | | In order to encourage farmers to obtain best |
| warehouse [3] is very challenging and difficult one, | | | | possible price, information on various agricultural |
| even though at the same time, there is a lot of | | | | output markets (mantis) is also being provided. |
| focus and importance attached to it. The designer | | | | The objective of this activity is to provide status |
| of the data warehouse may be asked by the top | | | | of price at different mandies to facilitate farmer |
| management:"take all enterprise data and build a | | | | to move his produce to the mandi where he can |
| data warehouse such that the management can | | | | expect better price. Other areas of interest to |
| get answer to all their questions". This is daunting | | | | farmers such as distance education, location |
| task with responsibility being visible and exciting. | | | | specific news, etc. are also planned. Access to |
| But how to get started? Where to start? Which | | | | other related sites of interest such those relating |
| data should be put first? Where is that data | | | | to courts, health, etc. are also sought to be |
| available? Which query should be answered? How | | | | provided. |
| would bring down the scope of project to | | | | Â Â |
| something smaller and manageable, yet be | | | | 6. Â Conclusions |
| scalable to gradually upgrade to upgrade to | | | | Analytical exploration of vast amount of |
| comprehensive data warehouse environment | | | | agricultural data can best be support by |
| finally? | | | | appropriate application of Data Warehousing and |
| The recent trend is to build data marts for before | | | | OLAP technologies. A Data Warehouse provides |
| a real large data warehouse is built. People want | | | | efficient and reliable structure of storage for vast |
| something smaller, so as to get manageable | | | | amount data while OLAP techniques provide |
| results before proceeding to the real data | | | | mechanisms for analysis of this data. |
| warehouse. | | | | Â |
| RALPH KIMBALL identified a nine step method as | | | | 7. Â References |
| follows: | | | | [1] Data warehouse and its applications in |
| Step 1: Choose the subject matter. | | | | Agriculture, Anil Rai, Indian Agricultural Statistics |
| Step 2: Decide the what the fact table represents. | | | | Research Institute Library Avenue, New Delhi. |
| Step 3: Identify and confirm the dimension. | | | | [2] Information Technology in Agriculture, S.C. |
| Step 4: Choose the facts. | | | | Mittal. |
| Step 5: Store precalculation in the fact table. | | | | [3] Data Warehousing concepts, Techniques, |
| Step 6: Define the dimension and tables. | | | | Products and Applications, C.S.R.Prabhu. |