
Commit 845abeb

yhrcmaiolino
authored and committed
xfs: add tunable threshold parameter for triggering zone GC
Presently we start garbage collection late, when we start running out of free zones to backfill max_open_zones. This is a reasonable default as it minimizes write amplification: the longer we wait, the more blocks are invalidated and the less reclaim costs in terms of blocks to relocate.

Starting this late, however, introduces a risk of GC being outcompeted by user writes. If GC can't keep up, user writes will be forced to wait for free zones, with high tail latencies as a result. This is not a problem under normal circumstances, but if fragmentation is bad and user write pressure is high (multiple full-throttle writers), we will "bottom out" of free zones.

To mitigate this, introduce a zonegc_low_space tunable that lets the user specify what percentage of the unused space GC should keep available for writing. A high value will reclaim more of the space occupied by unused blocks, creating a larger buffer against write bursts. This comes at a cost, as write amplification is increased. As an illustration, on a sample workload, setting zonegc_low_space to 60% avoids high (500 ms) maximum latencies while increasing write amplification by 15%.

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
1 parent a1a56f5 commit 845abeb


5 files changed (+75, -2 lines)


Documentation/admin-guide/xfs.rst

Lines changed: 21 additions & 0 deletions
@@ -542,3 +542,24 @@ The interesting knobs for XFS workqueues are as follows:
 nice Relative priority of scheduling the threads. These are the
 same nice levels that can be applied to userspace processes.
 ============ ===========
+
+Zoned Filesystems
+=================
+
+For zoned file systems, the following attributes are exposed in:
+
+  /sys/fs/xfs/<dev>/zoned/
+
+  max_open_zones (Min: 1 Default: Varies Max: UINTMAX)
+	This read-only attribute exposes the maximum number of open zones
+	available for data placement. The value is determined at mount time and
+	is limited by the capabilities of the backing zoned device, file system
+	size and the max_open_zones mount option.
+
+  zonegc_low_space (Min: 0 Default: 0 Max: 100)
+	Define a percentage for how much of the unused space that GC should keep
+	available for writing. A high value will reclaim more of the space
+	occupied by unused blocks, creating a larger buffer against write
+	bursts at the cost of increased write amplification. Regardless
+	of this value, garbage collection will always aim to free a minimum
+	amount of blocks to keep max_open_zones open for data placement purposes.
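
The attribute documented above is an ordinary sysfs file, so it can be set from user space with plain stdio. The sketch below is illustrative only: write_zonegc_low_space() and read_zonegc_low_space() are hypothetical helper names, not part of XFS or any library, and the range check simply mirrors the kernel's store handler, which rejects values above 100 with -EINVAL.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of XFS or any library): write a
 * percentage to a zonegc_low_space sysfs attribute at `path`.
 * Returns 0 on success or a negative errno value.
 */
static int write_zonegc_low_space(const char *path, unsigned int pct)
{
	FILE *f;

	/* Same bound the kernel's store handler enforces (-EINVAL above 100). */
	if (pct > 100)
		return -EINVAL;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%u\n", pct) < 0) {
		fclose(f);
		return -EIO;
	}

	/*
	 * stdio buffers the write; for a sysfs file, an error returned by the
	 * store handler surfaces when fclose() flushes the buffer.
	 */
	if (fclose(f) != 0)
		return -errno;
	return 0;
}

/* Hypothetical helper: read the current percentage back from `path`. */
static int read_zonegc_low_space(const char *path, unsigned int *pct)
{
	FILE *f = fopen(path, "r");
	int matched;

	if (!f)
		return -errno;
	matched = fscanf(f, "%u", pct);
	fclose(f);
	return matched == 1 ? 0 : -EIO;
}
```

On a real system the path would be /sys/fs/xfs/<dev>/zoned/zonegc_low_space, with <dev> replaced by the file system's device name, and writing requires root; `echo 60 > /sys/fs/xfs/<dev>/zoned/zonegc_low_space` from a shell is equivalent. Because xfs_mount_zones() resets the value to 0 at mount time, the setting has to be reapplied after each mount.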

fs/xfs/xfs_mount.h

Lines changed: 1 addition & 0 deletions
@@ -229,6 +229,7 @@ typedef struct xfs_mount {
 	bool			m_finobt_nores; /* no per-AG finobt resv. */
 	bool			m_update_sb;	/* sb needs update in mount */
 	unsigned int		m_max_open_zones;
+	unsigned int		m_zonegc_low_space;
 
 	/*
 	 * Bitsets of per-fs metadata that have been checked and/or are sick.

fs/xfs/xfs_sysfs.c

Lines changed: 32 additions & 0 deletions
@@ -718,8 +718,40 @@ max_open_zones_show(
 }
 XFS_SYSFS_ATTR_RO(max_open_zones);
 
+static ssize_t
+zonegc_low_space_store(
+	struct kobject		*kobj,
+	const char		*buf,
+	size_t			count)
+{
+	int			ret;
+	unsigned int		val;
+
+	ret = kstrtouint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val > 100)
+		return -EINVAL;
+
+	zoned_to_mp(kobj)->m_zonegc_low_space = val;
+
+	return count;
+}
+
+static ssize_t
+zonegc_low_space_show(
+	struct kobject		*kobj,
+	char			*buf)
+{
+	return sysfs_emit(buf, "%u\n",
+			zoned_to_mp(kobj)->m_zonegc_low_space);
+}
+XFS_SYSFS_ATTR_RW(zonegc_low_space);
+
 static struct attribute *xfs_zoned_attrs[] = {
 	ATTR_LIST(max_open_zones),
+	ATTR_LIST(zonegc_low_space),
 	NULL,
 };
 ATTRIBUTE_GROUPS(xfs_zoned);
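
zonegc_low_space_store() parses its input with kstrtouint() using base 0, so the accepted syntax is broader than plain decimal. The user-space model below is our own approximation of that parsing plus the handler's range check, based on our reading of the kernel's kstrtox helpers; the function name model_zonegc_low_space_store() is hypothetical and it is not kernel code. One consequence worth noting: with base 0 a leading zero selects octal, so writing "074" sets the tunable to 60, not 74.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/*
 * User-space approximation (not kernel code) of zonegc_low_space_store()'s
 * input validation: kstrtouint(buf, 0, &val) followed by the val > 100 check.
 * Returns the accepted value (0..100) or a negative errno.
 */
static int model_zonegc_low_space_store(const char *s)
{
	unsigned long long val = 0;
	unsigned int base = 10;
	int ndigits = 0;
	int overflow = 0;

	/* One leading '+' is skipped; '-' and leading whitespace are rejected. */
	if (*s == '+')
		s++;

	/* Base 0: "0x" followed by a hex digit selects hex, a leading '0' octal. */
	if (s[0] == '0') {
		if (tolower((unsigned char)s[1]) == 'x' &&
		    isxdigit((unsigned char)s[2])) {
			base = 16;
			s += 2;
		} else {
			base = 8;
		}
	}

	/* Consume every character that is a valid digit in the chosen base. */
	for (; *s; s++, ndigits++) {
		unsigned int d;

		if (isdigit((unsigned char)*s))
			d = *s - '0';
		else if (isxdigit((unsigned char)*s))
			d = tolower((unsigned char)*s) - 'a' + 10;
		else
			break;
		if (d >= base)
			break;
		if (val > (ULLONG_MAX - d) / base)
			overflow = 1;	/* does not fit in 64 bits */
		else
			val = val * base + d;
	}

	if (overflow)
		return -ERANGE;
	if (ndigits == 0)
		return -EINVAL;		/* no digits at all */
	if (*s == '\n')			/* a single trailing newline is allowed */
		s++;
	if (*s)
		return -EINVAL;		/* any other trailing characters */
	if (val > UINT_MAX)
		return -ERANGE;		/* does not fit in an unsigned int */

	/* The store handler's own range check. */
	if (val > 100)
		return -EINVAL;
	return (int)val;
}
```

For example, "60\n", "0x3c" and "074" are all accepted as 60, while " 60", "-1", "08" and "101" are rejected with -EINVAL.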

fs/xfs/xfs_zone_alloc.c

Lines changed: 7 additions & 0 deletions
@@ -1201,6 +1201,13 @@ xfs_mount_zones(
 	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
 			iz.available + iz.reclaimable);
 
+	/*
+	 * The user may configure GC to free up a percentage of unused blocks.
+	 * By default this is 0. GC will always trigger at the minimum level
+	 * for keeping max_open_zones available for data placement.
+	 */
+	mp->m_zonegc_low_space = 0;
+
 	error = xfs_zone_gc_mount(mp);
 	if (error)
 		goto out_free_zone_info;

fs/xfs/xfs_zone_gc.c

Lines changed: 14 additions & 2 deletions
@@ -162,18 +162,30 @@ struct xfs_zone_gc_data {
 
 /*
  * We aim to keep enough zones free in stock to fully use the open zone limit
- * for data placement purposes.
+ * for data placement purposes. Additionally, the m_zonegc_low_space tunable
+ * can be set to make sure a fraction of the unused blocks are available for
+ * writing.
  */
 bool
 xfs_zoned_need_gc(
 	struct xfs_mount	*mp)
 {
+	s64			available, free;
+
 	if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE))
 		return false;
-	if (xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE) <
+
+	available = xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE);
+
+	if (available <
 	    mp->m_groups[XG_TYPE_RTG].blocks *
 	    (mp->m_max_open_zones - XFS_OPEN_GC_ZONES))
 		return true;
+
+	free = xfs_estimate_freecounter(mp, XC_FREE_RTEXTENTS);
+	if (available < mult_frac(free, mp->m_zonegc_low_space, 100))
+		return true;
+
 	return false;
 }
 
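
The two triggers in the new xfs_zoned_need_gc() can be modeled in user space to see when each one fires. In this sketch the counters are passed in as plain numbers (in the kernel they come from xfs_estimate_freecounter() and struct xfs_mount), and all names are ours. The open_gc_zones parameter stands in for XFS_OPEN_GC_ZONES, whose value the diff does not show, and free_blocks stands for the XC_FREE_RTEXTENTS estimate, which xfs_mount_zones() initializes to the available plus reclaimable blocks. model_mult_frac() mirrors, as we understand it, the kernel's mult_frac() helper, which computes x * n / d without forming the full product x * n.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of mult_frac(): x * n / d, computed as (x / d) * n + (x % d) * n / d
 * so the intermediate x * n is never formed and cannot overflow.
 */
static int64_t model_mult_frac(int64_t x, int64_t n, int64_t d)
{
	int64_t q = x / d;
	int64_t r = x % d;

	return q * n + r * n / d;
}

/*
 * Model of the decision in xfs_zoned_need_gc(). All parameter names are
 * hypothetical stand-ins for the kernel's counters and mount fields.
 */
static bool model_need_gc(bool have_reclaimable, int64_t available,
			  int64_t free_blocks, int64_t zone_blocks,
			  unsigned int max_open_zones,
			  unsigned int open_gc_zones,
			  unsigned int low_space_pct)
{
	/* Nothing to collect if no zone is marked reclaimable. */
	if (!have_reclaimable)
		return false;

	/* Trigger 1 (always active): keep enough free zones to fill the open zone limit. */
	if (available < zone_blocks * (int64_t)(max_open_zones - open_gc_zones))
		return true;

	/* Trigger 2 (tunable): keep low_space_pct percent of the unused blocks available. */
	if (available < model_mult_frac(free_blocks, low_space_pct, 100))
		return true;

	return false;
}
```

For example, with 50,000 unused blocks and zonegc_low_space set to 60, the second trigger asks for GC whenever fewer than 30,000 blocks are available. At the default of 0 the second threshold is 0, so for a non-negative available count only the first trigger can fire, which matches the behavior before this commit.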
