In any organization, data analysis is important for improving business performance and for sorting out the problems associated with the business. To analyse a situation, the first thing you need is historical data relevant to the problem. Once you have the data, it needs to be in a proper format so that it can be analysed easily. You may face a number of problems while bringing the data into a proper format. Missing data is the most common problem that comes up during the data analysis process, especially when you collect feedback from customers and, for some reason, they do not answer all the questions. During the Data Preparation phase (one of the data mining phases) you have to sort these things out to bring the data into a proper format. Two methods are widely used to overcome such situations and support well-informed decision making at the end of the analysis. One is “Avoid the missing data” and the other is “Data Imputation”. Many data cleansing methods have been developed and incorporated into analysis software (such as SPSS) to handle these problems.

Avoid Missing data:

This is the easiest way to handle missing data: delete all records that are not completely filled in. For example, you send a feedback form to 1000 customers, and only 800 of the forms come back completely filled in. The easiest way is to set aside the 200 incomplete forms and consider only the remaining 800 for analysis. Sometimes a particular attribute or column gets fewer responses than the other attributes. For example, out of 1000 responders, 500 do not answer question 6. In this case the attribute corresponding to question 6 can be discarded from the analysis.

The main advantage of this method is that it is not time consuming and is very easy to follow. But there are many drawbacks associated with it. Avoiding records or feedback results in losing some information. If the sample is large, leaving out some records or attributes may not affect the results, but you still need to keep in mind that you are losing something.

Data imputation:

Data imputation is another method of handling missing values. With this method we try to fill in the missing values in the records and attributes. It is quite useful because it lets us keep all the information that the responders did provide. There are a number of ways to fill in missing values; a few are given below.

Case Substitution
Mean Substitution
Hot Deck Imputation
Cold Deck Imputation
Nearest Neighbour Imputation

Case Substitution:

With this method we replace the missing value with a historical value from similar cases. We cannot use a value from the current sample for case substitution; it must come from previous observations.

Mean Substitution:

This is quite a simple method: we replace the missing value with the mean value of that particular attribute. It cannot be applied to categorical data. It is only useful for column (attribute) imputation, not for row imputation.

Hot Deck Imputation:

In this method, the missing value is filled with a value that comes from similar cases or records in the current sample. That is, if two records are quite similar and one of them is missing the value for some attribute, we can fill in that value from the other, similar record.

Cold Deck Imputation:

In this method we replace the missing value with a single fixed value. The value must come from an external source, which means it cannot come from the current sample. Fixed means that if a particular attribute is missing 5 values, we have to fill in the same value in all 5 cases.