
Data deduplication

jasper-zanjani edited this page Jul 30, 2020 · 3 revisions

Data deduplication ("dedup") is a role service that conserves storage space by storing only one copy of redundant chunks of files. Data deduplication is appropriate for specific workloads, such as backup volumes and file servers. It is not appropriate for database storage, operating system data, or boot volumes.

Data deduplication originally required NTFS; ReFS volumes have been supported since Windows Server version 1709.

By default, data deduplication runs as a low-priority background process when the system is idle; however, its behavior can be configured for its intended usage. Deduplication works by scanning files and breaking them into chunks of various sizes. Each unique chunk is stored once in a chunk store, and the optimized file's data is replaced by a reparse point that references its chunks. Newly written files are stored in the standard, unoptimized form until deduplication processes them; the accumulation of such new and changed files is known as churn. Other jobs associated with deduplication include garbage collection, integrity scrubbing, and (when disabling deduplication) unoptimization.
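The chunking and chunk-store idea above can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the Windows implementation: the rolling-hash chunker, its limits (`MIN_CHUNK`, `MAX_CHUNK`, `MASK`), the SHA-256 keys, and the `ChunkStore` class are all hypothetical. A list of chunk digests stands in for the reparse point, `rehydrate` plays the role of unoptimization, and `garbage_collect` drops chunks that no remaining file references.

```python
import hashlib
import random
from typing import Dict, Iterable, List

# Hypothetical chunking parameters for illustration only; they are not the
# values Windows Server uses.
MIN_CHUNK = 64     # never cut a chunk shorter than this
MAX_CHUNK = 1024   # always cut a chunk once it reaches this length
MASK = 0xFF        # cut where the low 8 bits of the rolling hash are zero

# Deterministic pseudo-random value for each possible byte ("gear" table).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def split_chunks(data: bytes) -> List[bytes]:
    """Split data into variable-size, content-defined chunks."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # The low bits of h depend only on the most recent bytes, so cut
        # points are set by local content rather than by absolute offsets.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Toy chunk store: each unique chunk is kept once, keyed by its SHA-256."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def optimize(self, data: bytes) -> List[str]:
        """Store a file's chunks; return its references (reparse-point stand-in)."""
        refs = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # duplicates are not stored again
            refs.append(digest)
        return refs

    def rehydrate(self, refs: List[str]) -> bytes:
        """Rebuild the original bytes (analogous to unoptimization)."""
        return b"".join(self.chunks[d] for d in refs)

    def garbage_collect(self, live_files: Iterable[List[str]]) -> None:
        """Delete chunks that no remaining file references."""
        live = {d for refs in live_files for d in refs}
        for digest in list(self.chunks):
            if digest not in live:
                del self.chunks[digest]

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# Demo: a file and a copy with a few bytes inserted share most of their chunks.
store = ChunkStore()
rng = random.Random(0)
original = rng.getrandbits(8 * 20_000).to_bytes(20_000, "little")
edited = original[:5_000] + b"inserted" + original[5_000:]
refs_a, refs_b = store.optimize(original), store.optimize(edited)
print(len(original) + len(edited), "logical bytes,", store.stored_bytes(), "stored")
```

Content-defined cut points are used here rather than fixed offsets so that an insertion only disturbs the chunks near it; chunks after the edit realign with the original and are stored once.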

Typical deployment scenarios for data deduplication include:

  • General-purpose file servers: Users often store multiple copies of the same or similar documents and files. Typically 30-50% of this space can be reclaimed using deduplication.
  • Virtual Desktop Infrastructure (VDI) deployments: The virtual hard disks used for remote desktops are essentially identical. Data deduplication can also ameliorate the drop in storage performance when many users log in simultaneously at the start of the day, known as a VDI boot storm.
  • Backup snapshots: These are an ideal deployment scenario because the data is so duplicative.

Deduplication is especially useful for disk drive backups, since snapshots typically differ little from each other.
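To see why near-identical snapshots deduplicate so well, the following Python sketch simulates seven backups of a hypothetical 1 MiB disk image, rewriting four 4 KiB blocks between backups, and computes the fraction of space saved. It is an illustrative simulation, not a measurement of Windows Server, and for brevity it uses fixed-size chunks (`CHUNK_SIZE` is an assumption) instead of the variable-size chunks described above.

```python
import hashlib
import random
from typing import List, Tuple

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for simplicity


def savings(snapshots: List[bytes]) -> Tuple[int, int, float]:
    """Return (logical bytes, physical bytes, fraction saved) after deduplication."""
    unique = {}  # chunk digest -> chunk length; each unique chunk counted once
    logical = 0
    for snap in snapshots:
        logical += len(snap)
        for off in range(0, len(snap), CHUNK_SIZE):
            chunk = snap[off:off + CHUNK_SIZE]
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    physical = sum(unique.values())
    return logical, physical, 1 - physical / logical


def simulate_backups(blocks: int = 256, backups: int = 7,
                     changed_per_backup: int = 4, seed: int = 42) -> List[bytes]:
    """Back up a random disk image repeatedly, rewriting a few blocks in between."""
    rng = random.Random(seed)
    size = blocks * CHUNK_SIZE
    disk = bytearray(rng.getrandbits(8 * size).to_bytes(size, "little"))
    snapshots = [bytes(disk)]
    for _ in range(backups - 1):
        for _ in range(changed_per_backup):
            start = rng.randrange(blocks) * CHUNK_SIZE
            new_block = rng.getrandbits(8 * CHUNK_SIZE).to_bytes(CHUNK_SIZE, "little")
            disk[start:start + CHUNK_SIZE] = new_block
        snapshots.append(bytes(disk))
    return snapshots


logical, physical, saved = savings(simulate_backups())
print(f"{logical} logical bytes, {physical} stored, {saved:.1%} saved")
```

Because each backup after the first adds at most four new blocks, the stored size stays close to a single copy of the disk, and the script reports a saving above 80%.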

PowerShell

PowerShell support for data deduplication is implemented in the Deduplication module.

Cmdlet                     Description
Expand-DedupFile           Expands an optimized file into its original location.
Measure-DedupFileMetadata  Measures the disk space that could be reclaimed on a volume if a group of folders were deleted and garbage collection were run.
Get-DedupJob               Returns status and information for currently running or queued deduplication jobs.
Start-DedupJob             Starts a data deduplication job.
Stop-DedupJob              Cancels one or more specified data deduplication jobs.
Get-DedupMetadata          Returns metadata for volumes that have data deduplication metadata.
Get-DedupSchedule          Returns the deduplication job schedule defined on the computer.
New-DedupSchedule          Creates a data deduplication schedule.
Remove-DedupSchedule       Deletes a deduplication schedule.
Set-DedupSchedule          Changes configuration settings for data deduplication schedules.
Get-DedupStatus            Returns deduplication status for volumes that have data deduplication metadata.
Update-DedupStatus         Scans volumes for fresh data deduplication savings.
Disable-DedupVolume        Disables data deduplication activity on one or more volumes.
Enable-DedupVolume         Enables data deduplication on one or more volumes.
Get-DedupVolume            Returns deduplication volumes that have data deduplication metadata.
Set-DedupVolume            Changes data deduplication settings on one or more volumes.