Description
We want to automatically detect when new functions appear in the spark module (or when existing ones are removed/renamed) so that we can update documentation, tests, and changelogs accordingly.
Currently, each mod.rs typically exposes a pub fn functions() -> Vec<...> returning a vec![ ... ] with registered functions. The idea is to inspect these mod.rs files and list the inner contents of that vec![].
Example in zsh
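cd datafusion/spark
for f in **/mod.rs(.N); do
  # Does pub fn functions() contain a vec![ ... ]?
  # Note: \S also matches the closing ], so an empty vec![] still prints a header.
  if rg -nUP 'pub fn functions\(\)[\s\S]*?vec!\[\s*\S' "$f" >/dev/null; then
    echo "---- $f ----"
    # Print only the inner contents of the vec![ ... ]
    rg -nUP 'pub fn functions\(\)[\s\S]*?vec!\[([\s\S]*?\S[\s\S]*?)\]' -or '$1' "$f"
    echo
  fi
done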
Output
---- src/function/aggregate/mod.rs ----
---- src/function/array/mod.rs ----
32:array()
---- src/function/bitmap/mod.rs ----
36:bitmap_count()
---- src/function/bitwise/mod.rs ----
39:bit_get(), bit_count()
---- src/function/collection/mod.rs ----
---- src/function/conditional/mod.rs ----
---- src/function/conversion/mod.rs ----
---- src/function/csv/mod.rs ----
---- src/function/datetime/mod.rs ----
59:date_add(), date_sub(), last_day(), next_day()
---- src/function/generator/mod.rs ----
---- src/function/hash/mod.rs ----
39:crc32(), sha1(), sha2()
---- src/function/json/mod.rs ----
---- src/function/lambda/mod.rs ----
---- src/function/map/mod.rs ----
---- src/function/math/mod.rs ----
54: expm1(),
55: factorial(),
56: hex(),
57: modulus(),
58: pmod(),
59: rint(),
60:
---- src/function/misc/mod.rs ----
---- src/function/predicate/mod.rs ----
---- src/function/string/mod.rs ----
64:ascii(), char(), ilike(), like(), luhn_check()
---- src/function/struct/mod.rs ----
---- src/function/table/mod.rs ----
---- src/function/url/mod.rs ----
32:parse_url()
---- src/function/window/mod.rs ----
---- src/function/xml/mod.rs ----
cat Cargo.toml | grep "Define DataFusion version" -A 1
# Define DataFusion version
version = "49.0.2"
Based on this output, we can decide which functions to migrate and which new ones to add.
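The same extraction could also be sketched in Python, which may be easier to run in CI than a zsh one-liner. This is only a sketch under assumptions: the helper names `extract_registered` and `scan` are hypothetical, and it assumes the first vec![ ... ] after pub fn functions() contains plain calls such as name() with no nested square brackets.

```python
from __future__ import annotations

import re
from pathlib import Path

# First vec![ ... ] that follows `pub fn functions()`; DOTALL lets it span lines.
# Assumption: the vec body contains no nested square brackets.
_FUNCTIONS_VEC = re.compile(r"pub fn functions\(\).*?vec!\[(.*?)\]", re.DOTALL)
# A registered entry is assumed to look like `name()`.
_CALL = re.compile(r"(\w+)\s*\(\)")


def extract_registered(source: str) -> list[str]:
    """Return the names listed inside the vec![...] of `pub fn functions()`."""
    match = _FUNCTIONS_VEC.search(source)
    if match is None:
        return []  # no `pub fn functions()` with a vec![...] in this file
    return _CALL.findall(match.group(1))


def scan(root: str) -> dict[str, list[str]]:
    """Map each mod.rs under `root` to its registered function names."""
    result: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("mod.rs")):
        result[str(path)] = extract_registered(path.read_text(encoding="utf-8"))
    return result
```

Calling `scan("datafusion/spark")` from the repository root would return one entry per mod.rs, with an empty list for modules whose vec![] is empty.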
P.S. Ideally, we should also have a complete overview of all the functions that are already implemented. I noticed there is something like expr::function, which might help us detect them systematically. With that, we could eventually build a bot that runs whenever DataFusion is updated and opens a new issue for every new implementation or refactor we need to handle.
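The comparison step of such a bot could be sketched as a set difference between two snapshots of the per-module function lists, one taken before and one after a DataFusion update. The function name `diff_functions` and the snapshot shape (module name mapped to a set of function names) are assumptions made for illustration, not an existing API.

```python
from __future__ import annotations


def diff_functions(
    old: dict[str, set[str]], new: dict[str, set[str]]
) -> dict[str, dict[str, list[str]]]:
    """Report added and removed function names per module.

    Modules with no changes are omitted from the report. A rename
    appears as one removed name plus one added name.
    """
    report: dict[str, dict[str, list[str]]] = {}
    for module in sorted(old.keys() | new.keys()):
        before = old.get(module, set())
        after = new.get(module, set())
        added = sorted(after - before)
        removed = sorted(before - after)
        if added or removed:
            report[module] = {"added": added, "removed": removed}
    return report
```

Each entry in the returned report could then become one issue, for example "hash: added ..." or "url: removed ...". How issues are actually opened is left out of this sketch.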