PC Backup Software and Data Deduplication (Part II)

We continue the discussion of how data deduplication has revolutionized the way PC backup software works. This article is the second in a series that attempts to explain how various factors define the success of a PC backup. We have already explained Target Vs Source-based Deduplication.

Inline Vs Post-process Deduplication

In target-based deduplication, the deduplication engine can either process data for duplicates in real time (i.e. as and when it is sent to the target) or after it has been stored in the target storage. The former is called inline deduplication.

The obvious advantages are:
- Increase in overall efficiency, as data is only passed and processed once
- The processed data is instantaneously available for post-storage processes like recovery and replication, reducing the RPO and RTO window.

The disadvantages are:
- Decrease in write throughput
- Extent of deduplication is less: only the fixed-length block deduplication approach can be used
- Inline deduplication only processes incoming raw blocks and does not have any knowledge of the files or file structure. This forces it to use the fixed-length block approach (discussed in detail later).

Post-process deduplication acts asynchronously on the stored data, and has exactly the opposite effect on the advantages and disadvantages of inline deduplication listed above.

File vs Sub-file Level Deduplication

The duplicate removal algorithm can be applied at the full-file or sub-file level. Full-file duplicates can be easily eliminated by calculating a single checksum of the complete file data and comparing it against the existing checksums of already backed-up files. It's simple and fast, but the extent of deduplication is very limited, as it does not address the problem of duplicate content found inside different files or data-sets (e.g. emails).

The sub-file level deduplication technique breaks the file into smaller fixed or variable size blocks, and then uses a standard hash-based algorithm to find similar blocks.

Among the better PC backup software products that use both post-process and sub-file level data deduplication is Druva inSync.
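
The difference between full-file and sub-file deduplication can be illustrated with a minimal Python sketch. It is not taken from any particular product: the function names, the use of SHA-256 as the checksum, and the tiny 4-byte fixed block size are all assumptions chosen only to keep the example readable.

```python
import hashlib
from typing import Dict


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption here; the article only says "checksum" / "hash".
    return hashlib.sha256(data).hexdigest()


def dedup_files(files: Dict[str, bytes]) -> Dict[str, bytes]:
    """Full-file deduplication: keep one copy per distinct file checksum."""
    store: Dict[str, bytes] = {}
    for data in files.values():
        store.setdefault(checksum(data), data)
    return store


def dedup_blocks(files: Dict[str, bytes], block_size: int) -> Dict[str, bytes]:
    """Sub-file deduplication with fixed-length blocks (hypothetical helper)."""
    store: Dict[str, bytes] = {}
    for data in files.values():
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            store.setdefault(checksum(block), block)
    return store


def stored_bytes(store: Dict[str, bytes]) -> int:
    # Total size of the unique content actually kept.
    return sum(len(chunk) for chunk in store.values())


# Three small "files": c.txt duplicates a.txt, b.txt shares its first 8 bytes.
files = {
    "a.txt": b"AAAABBBBCCCC",
    "b.txt": b"AAAABBBBDDDD",
    "c.txt": b"AAAABBBBCCCC",
}

file_store = dedup_files(files)
block_store = dedup_blocks(files, block_size=4)

print("raw bytes:        ", sum(len(d) for d in files.values()))
print("file-level bytes: ", stored_bytes(file_store))
print("block-level bytes:", stored_bytes(block_store))
```

In this toy data set the full-file pass only removes the exact copy (c.txt), while the block-level pass also removes the content that a.txt and b.txt share. A real backup product would additionally keep a per-file list of block hashes so that each file can be reassembled on restore.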