Skip to content

Commit 5644c6b

Browse files
Yan ZhaiAlexei Starovoitov
authored andcommitted
bpf: skip non exist keys in generic_map_lookup_batch
The generic_map_lookup_batch currently returns EINTR if it fails with ENOENT and retries several times on bpf_map_copy_value. The next batch would start from the same location, presuming it's a transient issue. This is incorrect if a map can actually have "holes", i.e. "get_next_key" can return a key that does not point to a valid value. At least the array of maps type may contain such holes legitly. Right now these holes show up, generic batch lookup cannot proceed any more. It will always fail with EINTR errors. Rather, do not retry in generic_map_lookup_batch. If it finds a non existing element, skip to the next key. This simple solution comes with a price that transient errors may not be recovered, and the iteration might cycle back to the first key under parallel deletion. For example, Hou Tao <houtao@huaweicloud.com> pointed out a following scenario: For LPM trie map: (1) ->map_get_next_key(map, prev_key, key) returns a valid key (2) bpf_map_copy_value() return -ENOMENT It means the key must be deleted concurrently. (3) goto next_key It swaps the prev_key and key (4) ->map_get_next_key(map, prev_key, key) again prev_key points to a non-existing key, for LPM trie it will treat just like prev_key=NULL case, the returned key will be duplicated. With the retry logic, the iteration can continue to the key next to the deleted one. But if we directly skip to the next key, the iteration loop would restart from the first key for the lpm_trie type. However, not all races may be recovered. For example, if current key is deleted after instead of before bpf_map_copy_value, or if the prev_key also gets deleted, then the loop will still restart from the first key for lpm_tire anyway. For generic lookup it might be better to stay simple, i.e. just skip to the next key. To guarantee that the output keys are not duplicated, it is better to implement map type specific batch operations, which can properly lock the trie and synchronize with concurrent mutators. Fixes: cb4d03a ("bpf: Add generic support for lookup batch op") Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
1 parent 8784714 commit 5644c6b

File tree

1 file changed

+5
-13
lines changed

1 file changed

+5
-13
lines changed

kernel/bpf/syscall.c

Lines changed: 5 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1977,8 +1977,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
19771977
return err;
19781978
}
19791979

1980-
#define MAP_LOOKUP_RETRIES 3
1981-
19821980
int generic_map_lookup_batch(struct bpf_map *map,
19831981
const union bpf_attr *attr,
19841982
union bpf_attr __user *uattr)
@@ -1988,8 +1986,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
19881986
void __user *values = u64_to_user_ptr(attr->batch.values);
19891987
void __user *keys = u64_to_user_ptr(attr->batch.keys);
19901988
void *buf, *buf_prevkey, *prev_key, *key, *value;
1991-
int err, retry = MAP_LOOKUP_RETRIES;
19921989
u32 value_size, cp, max_count;
1990+
int err;
19931991

19941992
if (attr->batch.elem_flags & ~BPF_F_LOCK)
19951993
return -EINVAL;
@@ -2035,14 +2033,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
20352033
err = bpf_map_copy_value(map, key, value,
20362034
attr->batch.elem_flags);
20372035

2038-
if (err == -ENOENT) {
2039-
if (retry) {
2040-
retry--;
2041-
continue;
2042-
}
2043-
err = -EINTR;
2044-
break;
2045-
}
2036+
if (err == -ENOENT)
2037+
goto next_key;
20462038

20472039
if (err)
20482040
goto free_buf;
@@ -2057,12 +2049,12 @@ int generic_map_lookup_batch(struct bpf_map *map,
20572049
goto free_buf;
20582050
}
20592051

2052+
cp++;
2053+
next_key:
20602054
if (!prev_key)
20612055
prev_key = buf_prevkey;
20622056

20632057
swap(prev_key, key);
2064-
retry = MAP_LOOKUP_RETRIES;
2065-
cp++;
20662058
cond_resched();
20672059
}
20682060

0 commit comments

Comments
 (0)