How to Handle Missing Values

In any organization, data analysis is important for improving business performance and for solving the different problems the business faces. To analyse a situation, the first thing you need is historical data relevant to the problem. Once you have the data, it needs to be in a proper format so that it can be analysed easily, and you may face a number of problems while bringing it into that format. Missing data is the most common of these problems in the data analysis process, especially when you collect feedback from customers and, for some reason, they are not willing to answer all of the questions. During the Data Preparation phase of data mining, you have to sort out these issues to bring the data into a proper format. Two methods are widely used to deal with this situation and to support sound decision making at the end of the analysis: one is “Avoid the missing data” and the other is “Data Imputation”. Many data cleansing methods have been developed and incorporated into analysis software such as SPSS to handle these problems.

Avoid Missing data:

This is the easiest way to handle missing data: delete all records that are not completely filled in. For example, suppose you send a feedback form to 1000 customers and only 800 of the forms come back completely filled in. The easiest approach is to discard the other 200 forms and use only the remaining 800 for the analysis.

Sometimes a particular attribute (column) gets fewer responses than the other attributes. For example, out of 1000 respondents, 500 may choose not to answer question 6. In this case the attribute corresponding to question 6 can be discarded from the analysis.

The main advantage of this method is that it is quick and very easy to follow. However, it has several drawbacks. Discarding records or attributes results in losing some information. If the sample is large, dropping some records or attributes may not affect the results, but you still need to keep in mind that you are losing something.

Data imputation:

Data imputation is another method of handling missing values. With this method we try to fill in the missing values in the records and attributes. It is useful because it lets us keep all of the information that respondents did provide. There are a number of methods for filling in missing values; a few are given below.
Case Substitution
Mean Substitution
Hot Deck Imputation
Nearest Neighbour Imputation

Case Substitution:
This method replaces a missing value with a historical value from similar cases. The value cannot come from the current sample; it must come from previous observations.

Mean Substitution:
This is quite a simple method: replace the missing value with the mean value of that attribute. However, it cannot be applied to categorical data, because a mean is not defined for categories. It also works only column-wise: it imputes values for an attribute (column), not for a whole record (row).

Hot Deck Imputation:
In this method, a missing value is filled with a value taken from a similar case or record in the current sample. That is, if two records are quite similar and one of them is missing the value for some attribute, we can fill it with the value from the other, similar record.

Cold Deck Imputation:
In this method we replace missing values with a single fixed value. The value must come from an external source, which means it cannot come from the current sample. "Fixed" means that if a particular attribute is missing 5 values, all 5 are filled with the same value.
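Below is a minimal Python sketch of the two "Avoid Missing data" options. The first drops incomplete records (rows), and the second drops attributes (columns) that too many respondents skipped. It assumes each feedback form is a dict mapping attribute to value, with None marking an unanswered question. The function names and the `max_missing_ratio` threshold are hypothetical; they are not part of SPSS or any library.

```python
from typing import Any

# Assumption: one feedback form = dict of attribute -> value, None = missing.
Record = dict[str, Any]


def drop_incomplete_records(records: list[Record]) -> list[Record]:
    """Row deletion: keep only records in which every attribute has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def drop_sparse_attributes(records: list[Record], max_missing_ratio: float) -> list[Record]:
    """Column deletion: remove attributes whose share of missing values exceeds the ratio."""
    if not records:
        return []
    attributes = records[0].keys()
    keep = [
        a for a in attributes
        if sum(r.get(a) is None for r in records) / len(records) <= max_missing_ratio
    ]
    # Rebuild each record with only the retained attributes.
    return [{a: r.get(a) for a in keep} for r in records]


forms = [
    {"q1": 4, "q6": None},
    {"q1": 5, "q6": 3},
    {"q1": None, "q6": None},
]
print(drop_incomplete_records(forms))      # [{'q1': 5, 'q6': 3}]
print(drop_sparse_attributes(forms, 0.5))  # [{'q1': 4}, {'q1': 5}, {'q1': None}]
```

Deciding when an attribute is "too sparse" is a judgment call. The threshold used here is illustrative only.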
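Mean substitution could be sketched in Python as follows. As above, it assumes records are dicts with None for missing values; that representation and the name `mean_substitute` are assumptions. The function handles one numeric attribute at a time, which reflects the column-only nature of the method. It computes the mean with `statistics.mean` from the standard library.

```python
from statistics import mean
from typing import Any

# Assumption: one record = dict of attribute -> value, None = missing.
Record = dict[str, Any]


def mean_substitute(records: list[Record], attribute: str) -> list[Record]:
    """Fill missing values of one numeric attribute with the mean of its observed values."""
    observed = [r[attribute] for r in records if r.get(attribute) is not None]
    if not observed:
        raise ValueError(f"no observed values for {attribute!r}")
    fill = mean(observed)
    # Return new dicts so the caller's records are left unchanged.
    return [
        {**r, attribute: fill if r.get(attribute) is None else r[attribute]}
        for r in records
    ]


ages = [{"age": 30}, {"age": None}, {"age": 40}, {"age": 50}]
print([r["age"] for r in mean_substitute(ages, "age")])
```

Because the fill value is a mean, this only makes sense for numeric attributes, matching the restriction on categorical data noted above.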
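Hot deck imputation and case substitution, as described above, differ mainly in where the donor value comes from. Hot deck takes it from the current sample, while case substitution takes it from previous observations. The sketch below is a hypothetical helper, not a standard API, and it takes the donor pool as a parameter. The text does not define "similar", so `similarity` here simply counts matching non-missing attributes. Both that measure and breaking ties by taking the first best donor are assumptions.

```python
from typing import Any

# Assumption: one record = dict of attribute -> value, None = missing.
Record = dict[str, Any]


def similarity(a: Record, b: Record, skip: str) -> int:
    """Hypothetical measure: number of attributes (other than `skip`) where b equals a's non-missing value."""
    return sum(
        1 for k, v in a.items()
        if k != skip and v is not None and b.get(k) == v
    )


def deck_impute(records: list[Record], attribute: str, donors: list[Record]) -> list[Record]:
    """Fill each missing `attribute` with the value from the most similar donor record."""
    # Only donors that actually have a value for the attribute can supply one.
    usable = [d for d in donors if d.get(attribute) is not None]
    result = []
    for r in records:
        if r.get(attribute) is None and usable:
            # max() keeps the first donor among ties.
            best = max(usable, key=lambda d: similarity(r, d, attribute))
            r = {**r, attribute: best[attribute]}
        result.append(r)
    return result


sample = [
    {"region": "north", "plan": "basic", "score": 7},
    {"region": "south", "plan": "premium", "score": 9},
    {"region": "south", "plan": "premium", "score": None},
]
# Hot deck: donors come from the current sample itself.
print(deck_impute(sample, "score", sample)[2])  # {'region': 'south', 'plan': 'premium', 'score': 9}
```

For case substitution, pass a list of historical records as `donors` instead of the current sample.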
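Cold deck imputation is the simplest of these methods to express: every missing value of the attribute receives the same fixed value. In this sketch the fixed value is just a parameter, `fixed_value`, which is a hypothetical name. Obtaining that value from a suitable external source, such as an earlier survey, is left to the caller. The dict-with-None record format is again an assumption.

```python
from typing import Any

# Assumption: one record = dict of attribute -> value, None = missing.
Record = dict[str, Any]


def cold_deck_impute(records: list[Record], attribute: str, fixed_value: Any) -> list[Record]:
    """Fill every missing value of `attribute` with one fixed, externally sourced value."""
    return [
        {**r, attribute: fixed_value} if r.get(attribute) is None else r
        for r in records
    ]


print(cold_deck_impute([{"q6": None}, {"q6": 2}], "q6", 3))  # [{'q6': 3}, {'q6': 2}]
```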