Data deduplication

Data deduplication ("dedup") is a role service that conserves storage space by storing only one copy of redundant chunks of files. Data deduplication is appropriate for specific workloads, such as backup volumes and file servers. It is not appropriate for database storage, operating system data, or boot volumes.

Data deduplication originally required NTFS; ReFS has been supported since Windows Server version 1709.

By default, data deduplication runs as a low-priority background process when the system is idle; however, its behavior can be configured based on its intended usage. Deduplication works by scanning files and breaking them into unique chunks of various sizes, which are collected in a chunk store. The original locations of the chunks are replaced by reparse points. A newly written file is stored in the standard, unoptimized form; the accumulation of such files is known as churn. Other jobs associated with deduplication include garbage collection, integrity scrubbing, and (when disabling deduplication) unoptimization.
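
The jobs named above can also be started by hand. The following is a minimal sketch, assuming a volume at the placeholder drive letter E: that already has data deduplication enabled; the file-age value is illustrative only.

```powershell
# Assumption: E: is a placeholder for a volume with deduplication enabled.
$volume = "E:"

# Only optimize files that are at least 5 days old (illustrative value).
Set-DedupVolume -Volume $volume -MinimumFileAgeDays 5

# Process churn: chunk new or changed files and move the chunks to the chunk store.
Start-DedupJob -Volume $volume -Type Optimization

# Reclaim chunks that are no longer referenced by any file.
Start-DedupJob -Volume $volume -Type GarbageCollection

# Check the chunk store for corruption.
Start-DedupJob -Volume $volume -Type Scrubbing

# List running or queued jobs for the volume.
Get-DedupJob -Volume $volume
```

An unoptimization job is started the same way with -Type Unoptimization; it is left out of the sketch because it undoes the space savings on the volume.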

There are several deployment scenarios considered for data deduplication:

General purpose file servers: Users often store multiple copies of the same, or similar, documents and files. Up to 30-50% of this space can be reclaimed using deduplication.
Virtualized Desktop Infrastructure (VDI) deployments: Virtual hard disks that are used for remote desktops are essentially identical. Data deduplication can also ameliorate the drop in storage performance when many users simultaneously log in at the start of the day, called a VDI boot storm.
Backup snapshots: These are an ideal deployment scenario because the data is so duplicative.

Deduplication is especially useful for disk drive backups, since snapshots typically differ little from each other.
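
Each scenario corresponds to a usage type that can be chosen when deduplication is enabled on a volume. A minimal sketch, assuming three separate data volumes; the drive letters are placeholders, not taken from the text above.

```powershell
# Assumption: E:, F: and G: are placeholder drive letters for data volumes.
Enable-DedupVolume -Volume "E:" -UsageType Default   # general purpose file server
Enable-DedupVolume -Volume "F:" -UsageType HyperV    # VDI deployment
Enable-DedupVolume -Volume "G:" -UsageType Backup    # backup target

# Confirm which usage type each volume received.
Get-DedupVolume | Format-Table Volume, UsageType, Enabled
```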

PowerShell

PowerShell support for data deduplication is implemented in the Deduplication module.

Cmdlet	Description
Expand-DedupFile	Expands an optimized file into its original location.
Measure-DedupFileMetadata	Measures the disk space that could be reclaimed on a volume by deleting a group of files or folders.
Get-DedupJob	Returns status and information for currently running or queued deduplication jobs.
Start-DedupJob	Starts a data deduplication job.
Stop-DedupJob	Cancels one or more specified data deduplication jobs.
Get-DedupMetadata	Returns metadata for volumes that have data deduplication metadata.
Get-DedupSchedule	Returns the deduplication job schedule defined on the computer.
New-DedupSchedule	Creates a data deduplication schedule.
Remove-DedupSchedule	Deletes a deduplication schedule.
Set-DedupSchedule	Changes configuration settings for data deduplication schedules.
Get-DedupStatus	Returns deduplication status for volumes that have data deduplication metadata.
Update-DedupStatus	Scans volumes to compute new data deduplication savings information.
Disable-DedupVolume	Disables data deduplication activity on one or more volumes.
Enable-DedupVolume	Enables data deduplication on one or more volumes.
Get-DedupVolume	Returns deduplication volumes that have data deduplication metadata.
Set-DedupVolume	Changes data deduplication settings on one or more volumes.
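
Several of these cmdlets are typically used together. The following is a sketch of one possible workflow, assuming Windows Server and a deduplication-enabled volume at the placeholder drive letter E:; the schedule name, days, and times are hypothetical.

```powershell
# Install the role service and load the module.
Install-WindowsFeature -Name FS-Data-Deduplication
Import-Module Deduplication

# Hypothetical schedule: optimize three nights a week for up to 6 hours.
New-DedupSchedule -Name "NightlyOptimization" -Type Optimization `
    -Days Monday,Wednesday,Friday -Start "23:00" -DurationHours 6

# Review the schedules defined on the computer.
Get-DedupSchedule

# Recompute the savings figures, then display them (E: is a placeholder).
Update-DedupStatus -Volume "E:"
Get-DedupStatus -Volume "E:" |
    Format-List Volume, SavedSpace, FreeSpace, OptimizedFilesCount
```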

