Glusterfs does not heal sparse files correctly and fills up whole new brick on disperse volume after failed brick reset #3546

@strzinek

Description of problem:
The Gluster heal process fills up all free space on the replaced brick of a disperse volume if the volume contains sparse files.

The exact command to reproduce the issue:

  1. Create a new disperse volume (tested with 4+2), e.g. with gluster volume create vol1 disperse-data 4 redundancy 2 transport tcp node1:/gluster/nvme1/brick node1:/gluster/nvme2/brick node2:/gluster/nvme1/brick node2:/gluster/nvme2/brick node3:/gluster/nvme1/brick node3:/gluster/nvme2/brick
  2. Place some sparse files on the volume, e.g. with cp -avp --sparse=always source destination-vol1/ (see the sketch after this list for creating and verifying a sparse file)
  3. Reset a brick, e.g. with gluster volume reset-brick vol1 node3:/gluster/nvme1/brick start followed by gluster volume reset-brick vol1 node3:/gluster/nvme1/brick node3:/gluster/nvme1/brick commit force
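
A minimal sketch of step 2, assuming the volume is FUSE-mounted at /mnt/vol1 (the mount point and file names are illustrative, not taken from the report):

    # create a 10 GiB sparse file directly on the mounted volume
    truncate -s 10G /mnt/vol1/sparse-test.img
    # or copy an existing sparse file while preserving holes
    cp -avp --sparse=always /data/source.img /mnt/vol1/
    # verify sparseness: allocated size (du) should be much smaller than apparent size
    du -h /mnt/vol1/sparse-test.img
    du -h --apparent-size /mnt/vol1/sparse-test.img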

Actual results:
The volume starts healing, but sparse files on the new brick are written out fully allocated, so their on-disk size matches their apparent size, eventually filling up the whole brick. In addition, the volume starts reporting the wrong size with df when mounted. The healing process never finishes, leaving some files unhealed, and the new brick reports "No space left on device" (see the brick log fragment below).
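
For illustration, the mismatch can be observed by comparing allocated and apparent sizes on a surviving brick and on the reset brick (brick paths are from the volume info below; the file name is hypothetical):

    # on node1 (surviving brick): the healed fragment stays sparse
    du -h --apparent-size /gluster/nvme1/brick/sparse-test.img
    du -h /gluster/nvme1/brick/sparse-test.img
    # on node3 (reset brick): the re-created fragment is fully allocated,
    # so du with and without --apparent-size report roughly the same size
    du -h --apparent-size /gluster/nvme1/brick/sparse-test.img
    du -h /gluster/nvme1/brick/sparse-test.img
    # free space on the reset brick keeps shrinking until writes fail with ENOSPC
    df -h /gluster/nvme1/brick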

Expected results:
A dispersed volume containing sparse files should be healed correctly after a brick reset.

Mandatory info:
- The output of the gluster volume info command:

Volume Name: vol1
Type: Disperse
Volume ID: ***
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/nvme1/brick
Brick2: node1:/gluster/nvme2/brick
Brick3: node2:/gluster/nvme1/brick
Brick4: node2:/gluster/nvme2/brick
Brick5: node3:/gluster/nvme1/brick
Brick6: node3:/gluster/nvme2/brick
Options Reconfigured:
cluster.server-quorum-type: none
storage.health-check-interval: 600
storage.health-check-timeout: 30
auth.allow: ***
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.cache-invalidation: on
network.ping-timeout: 5
server.allow-insecure: on
network.remote-dio: disable
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 8
cluster.eager-lock: enable
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
cluster.lookup-optimize: off
performance.readdir-ahead: off
cluster.readdir-optimize: off
cluster.enable-shared-storage: enable

- The output of the gluster volume status command:

Status of volume: vol1
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/gluster/nvme1/brick             49168     0          Y       1239337
Brick node1:/gluster/nvme2/brick             49169     0          Y       1239344
Brick node2:/gluster/nvme1/brick             49168     0          Y       1363957
Brick node2:/gluster/nvme2/brick             49169     0          Y       1363964
Brick node3:/gluster/nvme1/brick             49157     0          Y       848916
Brick node3:/gluster/nvme2/brick             49158     0          Y       848923
Self-heal Daemon on localhost                N/A       N/A        Y       848936
Self-heal Daemon on node1                    N/A       N/A        Y       1239357
Self-heal Daemon on node2                    N/A       N/A        Y       1363977

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Launching heal operation to perform index self heal on volume vol1 has been successful
Use heal info commands to check status.
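
A sketch of how heal progress can be tracked while the brick fills up (standard Gluster CLI commands, not copied from the original report):

    # list entries still pending heal on each brick
    gluster volume heal vol1 info
    # condensed per-brick counts of entries needing heal (available on recent releases)
    gluster volume heal vol1 info summary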

- Provide logs present at the following locations of client and server nodes:
/var/log/glusterfs/bricks/gluster-nvme1-brick.log

[2022-05-19 17:23:17.588485 +0000] W [dict.c:1532:dict_get_with_ref] (-->/usr/lib64/glusterfs/9.1/xlator/features/index.so(+0x3bdc) [0x7f0915443bdc] -->/lib64/libglusterfs.so.0(dict_get_str+0x3c) [0x7f0924c5318c] -->/lib64/libglusterfs.so.0(dict_get_with_ref+0x85) [0x7f0924c519b5] ) 0-dict: dict OR key (link-count) is NULL [Invalid argument]
[2022-05-19 17:23:17.601320 +0000] E [MSGID: 113072] [posix-inode-fd-ops.c:2068:posix_writev] 0-vol1-posix: write failed: offset 0, [No space left on device]
[2022-05-19 17:23:17.601396 +0000] E [MSGID: 115067] [server-rpc-fops_v2.c:1324:server4_writev_cbk] 0-vol1-server: WRITE info [{frame=12201148}, {WRITEV_fd_no=3}, {uuid_utoa=***
-dc3d-4041-8e11-835327df299c}, {client=CTX_ID:***-GRAPH_ID:4-PID:1027276-HOST:my-host-name.cz-PC_NAME:vol1-client-4-RECON_NO:-0}, {error-xlator=vol1-posix}, {errno=28}, {error=No space left on device}]

- Is there any crash? Provide the backtrace and coredump:
No

Additional info:
I am also concerned about the very high PIDs, even shortly after a node restart, but that may be unrelated.

- The operating system / glusterfs version:
CentOS 8 / GlusterFS 9.1; also reproduced on GlusterFS 9.4
