This repository contains the code, datasets, and supplementary material for the paper titled "DeXOR: Enabling XOR in Decimal Space for Streaming Lossless Compression of Floating-point Data".📚
More technical details, supporting theory and proofs, and additional experimental details can be found in the Appendix. Downloading the file to a local PDF viewer is recommended for better readability. 📖
This project is built with Maven and includes a variety of compression algorithms housed in the `src/main/java/algorithms` directory. These algorithms inherit from a common parent class, `Algorithm`, and are managed through `AlgorithmManager` and `AlgorithmEnums`. Each algorithm features corresponding `Encoder` and `Decoder` classes for the supported data types (INT32, INT64, Float, Double), which handle compression and decompression, respectively. 🛠️
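To make the parent-class layout above concrete, here is a minimal, self-contained sketch. It is not taken from the repository: `CodecSketch`, the method names on `Algorithm`, and the toy `XorDelta` scheme are all hypothetical, and the real classes in `src/main/java/algorithms` may be shaped quite differently. The toy scheme simply XORs each value's IEEE 754 bits with the previous value's bits; it is not DeXOR and performs no bit packing.

```java
import java.util.Arrays;

// Hypothetical sketch: names mirror the README's description, not the repository's real signatures.
public class CodecSketch {

    /** Hypothetical common parent: one name plus a matched encode/decode pair. */
    public abstract static class Algorithm {
        public abstract String name();
        public abstract long[] encode(double[] values);
        public abstract double[] decode(long[] encoded);
    }

    /** Toy algorithm (assumption, not DeXOR): XOR each value's bits with the previous value's bits. */
    public static class XorDelta extends Algorithm {
        @Override
        public String name() {
            return "XorDelta";
        }

        @Override
        public long[] encode(double[] values) {
            long[] out = new long[values.length];
            long prev = 0L;
            for (int i = 0; i < values.length; i++) {
                long bits = Double.doubleToRawLongBits(values[i]);
                out[i] = bits ^ prev; // repeated values XOR to zero
                prev = bits;
            }
            return out;
        }

        @Override
        public double[] decode(long[] encoded) {
            double[] out = new double[encoded.length];
            long prev = 0L;
            for (int i = 0; i < encoded.length; i++) {
                long bits = encoded[i] ^ prev; // undo the XOR with the previous value
                out[i] = Double.longBitsToDouble(bits);
                prev = bits;
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Algorithm algorithm = new XorDelta();
        double[] input = {21.5, 21.5, 21.7};
        double[] restored = algorithm.decode(algorithm.encode(input));
        System.out.println(algorithm.name() + " lossless: " + Arrays.equals(input, restored));
    }
}
```

The point of the sketch is only the shape: every algorithm pairs an encoder with a decoder, and a round trip must return the original values bit for bit.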
We recommend using IntelliJ IDEA to build this project. After cloning this repository, the project can be built using Maven:
mvn clean install
Once the build process is complete, the resulting JAR file, which includes all dependencies, will be located in the `target` directory:
target/OL-TSC-1.0-jar-with-dependencies.jar
Ensure that your runtime environment has Java 8 installed before starting the compression process. 💻
To run this package, use the `java -jar` command, as shown below:
java -jar target/OL-TSC-1.0-jar-with-dependencies.jar -in [INPUT_PATH] -out [OUTPUT_PATH] -log [LOG_PATH] -m [METHOD] -config [CONFIG_PATH]
The following options are available for customizing the compression process:
- `-in [INPUT_PATH]`: The source of the files to be compressed, currently supporting files in CSV format with the structure `<timestamp, value>`. Default is `./datasets/Overall`.
- `-out [OUTPUT_PATH]`: The directory where the compressed binary files will be stored. Default is `./storage`.
- `-log [LOG_PATH]`: The directory where the results of the experiment benchmarks will be saved. Default is `./results`.
- `-config [CONFIG_PATH]`: If a configuration file is specified, it can be used to define the global settings for compression and decompression of a certain class of algorithms. For instance, the available settings for DeXOR include `rho`, `skip_available`, and `buffer_bits`. Default is `null`.
  - `rho`: A parameter within the DeXOR Exception Handler module.
  - `skip_available`: Specifies the number of consecutive exceptions after which the main process is abandoned in favor of entering the exception control directly.
  - `buffer_bits`: Declares the number of bits used for expanding the buffer.

  Note: The settings `buffer_bits` and `skip_available` cannot be used simultaneously.
- `-m [METHOD]`: The name of the compression algorithm to be used. Currently supported algorithms include `Gorilla`, `Chimp`, `Chimp128`, `Elf`, `ElfPlus`, `Camel`, `DeXOR`, `ALP`, `Elf*`, and `SElf*`. The algorithm names are case-insensitive. Default is `DeXOR`.
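For readers preparing their own input, the sketch below shows one way to read the value column from a CSV laid out as `<timestamp, value>`. It is an illustration only: the class `CsvValues` and its method are hypothetical and not part of this repository, and the handling of header rows and malformed lines is an assumption, since the exact parsing rules of the tool are not described here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the repository.
public class CsvValues {

    /** Reads "<timestamp, value>" rows and returns the value column as doubles. */
    public static double[] readValues(BufferedReader reader) throws IOException {
        List<Double> values = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split(",");
            if (cols.length < 2) {
                continue; // skip blank or malformed rows
            }
            try {
                values.add(Double.parseDouble(cols[1].trim()));
            } catch (NumberFormatException e) {
                // Assumption: a non-numeric second column is a header row, so it is skipped.
            }
        }
        double[] out = new double[values.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = values.get(i);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String sample = "timestamp,value\n1700000000,21.5\n1700000001,21.7\n";
        double[] v = readValues(new BufferedReader(new StringReader(sample)));
        System.out.println(v.length + " values, first = " + v[0]); // prints "2 values, first = 21.5"
    }
}
```

To read a real file, wrap it with `Files.newBufferedReader(Paths.get(path))` from `java.nio.file` instead of the `StringReader` used in the demo.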
To test multiple algorithms in a single run, list them after `-m`, as in the following example:
java -jar OL-TSC-1.0-jar-with-dependencies.jar -in ./datasets/Overall -m DeXOR Gorilla Chimp Chimp128 Elf ElfPlus
- Currently, only the `double` data type is supported.
- The application scenario of the lossless Camel algorithm requires that the number of decimal places be between 1 and 4. ⚠️
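Before running Camel on a new dataset, it can help to check the decimal-place constraint up front. The sketch below is a hypothetical pre-check, not code from this repository. It counts decimal places from the shortest decimal text of each `double` (the form produced by `Double.toString`) and assumes the constraint applies to the largest count in the dataset; whether Camel applies it per value or per dataset is not stated here, so treat that as an assumption.

```java
import java.math.BigDecimal;

// Hypothetical pre-check, not part of the repository.
public class DecimalPlacesCheck {

    /** Digits after the decimal point in the shortest decimal text of a finite double. */
    public static int decimalPlaces(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("value must be finite");
        }
        // BigDecimal.valueOf goes through Double.toString, so 21.5 becomes "21.5", not its binary expansion.
        BigDecimal d = BigDecimal.valueOf(value).stripTrailingZeros();
        return Math.max(d.scale(), 0); // whole numbers can have a negative scale
    }

    /** Largest decimal-place count across the dataset. */
    public static int maxDecimalPlaces(double[] values) {
        int max = 0;
        for (double v : values) {
            max = Math.max(max, decimalPlaces(v));
        }
        return max;
    }

    /** Assumption: the dataset qualifies when its largest count is between 1 and 4. */
    public static boolean withinCamelRange(double[] values) {
        int max = maxDecimalPlaces(values);
        return max >= 1 && max <= 4;
    }

    public static void main(String[] args) {
        System.out.println(withinCamelRange(new double[] {21.5, 21.75}));   // true
        System.out.println(withinCamelRange(new double[] {3.14159}));       // false: 5 decimal places
    }
}
```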