We continue the discussion on how data deduplication has revolutionized the way PC backup software works. This article is the second in a series that attempts to explain how various factors define the success of a PC backup. We have already explained Target vs Source-based Deduplication.

Inline vs Post-process Deduplication

In target-based deduplication, the deduplication engine can process data for duplicates either in real time (i.e. as and when it is sent to the target) or after it has been stored in the target storage. The former is called inline deduplication.

The obvious advantages are:
- Increase in overall efficiency, as data is only passed and processed once
- The processed data is instantly available for post-storage processes like recovery and replication, reducing the RPO and RTO window

The disadvantages are:
- Decrease in write throughput
- Lower extent of deduplication, because only the fixed-length block approach can be used. Inline deduplication processes only incoming raw blocks and has no knowledge of the files or file structure, which forces it to use the fixed-length block approach (discussed in detail later).

Post-process deduplication acts asynchronously on data that has already been stored. Its advantages and disadvantages are the exact opposite of those listed above for inline deduplication.

File vs Sub-file Level Deduplication

The duplicate removal algorithm can be applied at the full-file or sub-file level. Full-file duplicates can be easily eliminated by calculating a single checksum of the complete file data and comparing it against the existing checksums of files that have already been backed up. This is simple and fast, but the extent of deduplication is very limited, as it does not address the problem of duplicate content found inside different files or data sets (e.g. emails).

The sub-file level deduplication technique breaks the file into smaller fixed-size or variable-size blocks, and then uses a standard hash-based algorithm to find identical blocks.

Among the better PC backup products that use both post-process and sub-file level data deduplication is Druva inSync.