TS_2DIFF is a key encoding algorithm in TsFile, designed specifically for compressing timestamp data. Based on second-order delta encoding, it stores small differences between neighboring timestamps instead of the full values, which substantially reduces storage space and makes it especially effective for high-frequency and sequential timestamp series. It is a core component of TsFile’s time-series compression and decompression pipeline.
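The delta idea can be illustrated with a minimal sketch. It assumes a simplified, hypothetical block layout (first value, minimum delta, and one fixed bit width for the residuals) and is not the TsFile on-disk format; the names `DeltaBlock`, `encode_block`, and `decode_block` are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block layout, for illustration only (not the TsFile format).
struct DeltaBlock {
    std::size_t count = 0;                // number of encoded timestamps
    int64_t first = 0;                    // first timestamp, stored verbatim
    int64_t min_delta = 0;                // smallest consecutive difference
    int bit_width = 0;                    // bits needed for the largest residual
    std::vector<uint64_t> residuals;      // (delta - min_delta), always >= 0
};

DeltaBlock encode_block(const std::vector<int64_t>& ts) {
    DeltaBlock b;
    b.count = ts.size();
    if (ts.empty()) return b;
    b.first = ts[0];
    if (ts.size() == 1) return b;

    // First pass: consecutive differences and their minimum.
    std::vector<int64_t> deltas;
    for (std::size_t i = 1; i < ts.size(); ++i) deltas.push_back(ts[i] - ts[i - 1]);
    b.min_delta = *std::min_element(deltas.begin(), deltas.end());

    // Second pass: subtract the minimum so every residual is non-negative.
    uint64_t max_residual = 0;
    for (int64_t d : deltas) {
        uint64_t r = static_cast<uint64_t>(d - b.min_delta);
        b.residuals.push_back(r);
        max_residual = std::max(max_residual, r);
    }
    // A real encoder would bit-pack each residual into bit_width bits.
    while (max_residual != 0) { ++b.bit_width; max_residual >>= 1; }
    return b;
}

std::vector<int64_t> decode_block(const DeltaBlock& b) {
    std::vector<int64_t> out;
    if (b.count == 0) return out;
    assert(b.residuals.size() + 1 == b.count);
    out.push_back(b.first);
    // Prefix sum: each value is the previous one plus the restored delta.
    for (uint64_t r : b.residuals)
        out.push_back(out.back() + b.min_delta + static_cast<int64_t>(r));
    return out;
}
```

For nearly regular timestamps the residuals stay small, so only a few bits per value are needed. The prefix sum in `decode_block` is the sequential dependency that a parallel or SIMD decoder has to work around.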
As TsFile is increasingly adopted in machine learning workloads (for example, as a backend for data loading during model training), it must support the access patterns typical of ML scenarios, such as efficient random access and high-throughput batch loading. To meet these demands, two promising directions are parallelizing the TS_2DIFF encoding algorithm and enabling predicate filtering without full decoding. These enhancements form a crucial part of the TsFile for AI initiative, which aims to build an efficient and intelligent data infrastructure tailored to AI workloads.
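One way to filter without decoding everything is to skip whole blocks using per-block statistics. The sketch below assumes each block carries hypothetical minimum and maximum values next to its deltas; the source does not say how the prototype's filter works, so `Block`, `make_block`, and `count_in_range` are illustrative names only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block metadata plus deltas (not the TsFile format).
struct Block {
    int64_t first;                 // first value of the block
    int64_t min_v;                 // smallest value in the block
    int64_t max_v;                 // largest value in the block
    std::vector<int64_t> deltas;   // consecutive differences after `first`
};

// Build a block from raw values; `v` must be non-empty.
Block make_block(const std::vector<int64_t>& v) {
    Block b{v.front(), v.front(), v.front(), {}};
    for (std::size_t i = 1; i < v.size(); ++i) {
        b.deltas.push_back(v[i] - v[i - 1]);
        b.min_v = std::min(b.min_v, v[i]);
        b.max_v = std::max(b.max_v, v[i]);
    }
    return b;
}

// Count values in the closed range [lo, hi], decoding as little as possible.
std::size_t count_in_range(const std::vector<Block>& blocks, int64_t lo, int64_t hi) {
    std::size_t hits = 0;
    for (const Block& b : blocks) {
        // No overlap with the range: skip the block without decoding it.
        if (b.max_v < lo || b.min_v > hi) continue;
        // Fully inside the range: every value matches, no decoding needed.
        if (lo <= b.min_v && b.max_v <= hi) {
            hits += b.deltas.size() + 1;
            continue;
        }
        // Partial overlap: decode this block only and test each value.
        int64_t v = b.first;
        if (lo <= v && v <= hi) ++hits;
        for (int64_t d : b.deltas) {
            v += d;
            if (lo <= v && v <= hi) ++hits;
        }
    }
    return hits;
}
```

An equality predicate is the special case `lo == hi`. Because the check uses minimum and maximum rather than first and last values, it does not require the series to be sorted, so it also covers data with negative differences.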
/Users/colin/dev/SIMD_TS2DIFF/cmake-build-debug/src/ts_2diff
==== STABLE (diff in [1,100]) N=5000000 ====
[Encode] compressed_bytes=4961245 raw_bytes=20000000 ratio(raw/comp)=4.03125
[Scalar] best=6.34846 ms 1.26969 ns/val 787.593 Mvals/s 745.285 MB/s (input)
[SIMD ] best=4.79567 ms 0.959133 ns/val 1042.61 Mvals/s 986.601 MB/s (input)
[Check] equal=true
==== UNSTABLE (diff in [-50,100]) N=5000000 ====
[Encode] compressed_bytes=5581400 raw_bytes=20000000 ratio(raw/comp)=3.58333
[Scalar] best=6.84112 ms 1.36823 ns/val 730.874 Mvals/s 778.065 MB/s (input)
[SIMD ] best=4.75662 ms 0.951325 ns/val 1051.17 Mvals/s 1119.04 MB/s (input)
[Check] equal=true
==== STABLE random filter N=5000000 ====
[Encode] compressed_bytes=4961245 raw_bytes=20000000 ratio(raw/comp)=4.03125
[Sample] front=263503179 mid=340546177 back=499418560
[Query 0] type=0 value=263503179 rvalue=0
[Scalar] 11.1274 ms, out=1
[SIMD ] 0.138708 ms, out=1 equal=true
[Query 1] type=0 value=340546177 rvalue=0
[Scalar] 11.3026 ms, out=1
[SIMD ] 0.140375 ms, out=1 equal=true
[Query 2] type=0 value=499418560 rvalue=0
[Scalar] 11.3098 ms, out=1
[SIMD ] 0.139666 ms, out=1 equal=true
[Query 3] type=1 value=340546177 rvalue=0
[Scalar] 12.2432 ms, out=3256048
[SIMD ] 8.94217 ms, out=3256048 equal=true
[Query 4] type=1 value=499418560 rvalue=0
[Scalar] 11.2122 ms, out=107522
[SIMD ] 0.428291 ms, out=107522 equal=true
[Query 5] type=5 value=340546177 rvalue=340547177
[Scalar] 11.1447 ms, out=21
[SIMD ] 4.392 ms, out=21 equal=true
==== UNSTABLE random filter N=5000000 ====
[Encode] compressed_bytes=5581400 raw_bytes=20000000 ratio(raw/comp)=3.58333
[Sample] front=130454756 mid=168634590 back=247288488
[Query 0] type=0 value=130454756 rvalue=0
[Scalar] 11.0137 ms, out=1
[SIMD ] 0.162333 ms, out=1 equal=true
[Query 1] type=0 value=168634590 rvalue=0
[Scalar] 11.0585 ms, out=1
[SIMD ] 0.191792 ms, out=1 equal=true
[Query 2] type=0 value=247288488 rvalue=0
[Scalar] 11.0591 ms, out=1
[SIMD ] 0.1975 ms, out=1 equal=true
[Query 3] type=1 value=168634590 rvalue=0
[Scalar] 11.5686 ms, out=3256049
[SIMD ] 9.09975 ms, out=3256049 equal=true
[Query 4] type=1 value=247288488 rvalue=0
[Scalar] 10.985 ms, out=107523
[SIMD ] 0.498416 ms, out=107523 equal=true
[Query 5] type=5 value=168634590 rvalue=168635590
[Scalar] 11.3651 ms, out=43
[SIMD ] 4.44271 ms, out=43 equal=true
[STABLE SUMMARY]
Encoded: 4961245 bytes, Raw: 20000000 bytes, Ratio(raw/comp): 4.03125
Scalar: 6.34846 ms, 1.26969 ns/val, 787.593 Mvals/s, 745.285 MB/s (input)
SIMD : 4.79567 ms, 0.959133 ns/val, 1042.61 Mvals/s, 986.601 MB/s (input)
Equal : true
[UNSTABLE SUMMARY]
Encoded: 5581400 bytes, Raw: 20000000 bytes, Ratio(raw/comp): 3.58333
Scalar: 6.84112 ms, 1.36823 ns/val, 730.874 Mvals/s, 778.065 MB/s (input)
SIMD : 4.75662 ms, 0.951325 ns/val, 1051.17 Mvals/s, 1119.04 MB/s (input)
Equal : true
Process finished with exit code 1