Add tests for compression #49

Merged
merged 2 commits into from
May 23, 2025
8 changes: 8 additions & 0 deletions compression/README.md
@@ -0,0 +1,8 @@
# Compression

For the RNTuple Validation Suite, we assume that compression is orthogonal to the supported types and serialized data.
Therefore, all tests in this category write a single `Int64` field with type `std::int64_t` and column type `SplitInt64`.
The entries have ascending values starting at 0, and the reference `.json` files contain only the sum of all elements.

* [`algorithms`](algorithms): `zlib`, `lzma`, `lz4`, `zstd`
* [`block`](block): big and short compression blocks
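
Because the entries are the ascending values 0, 1, ..., N - 1, the sum stored in each reference `.json` file can be cross-checked with a closed form. The following is a minimal sketch, not part of the suite: `expected_sum` is a hypothetical helper name, and the entry counts in the comments are the ones used by the write macros in this pull request.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the validation suite): sum of the
// ascending values 0, 1, ..., entries - 1, using the closed form
// N * (N - 1) / 2.
std::int64_t expected_sum(std::int64_t entries) {
  assert(entries >= 0);
  return entries * (entries - 1) / 2;
}

// Entry counts taken from the write macros in this pull request:
//   algorithms/*  : 32 entries                  -> expected_sum(32)
//   block/short   : 2 entries                   -> expected_sum(2)
//   block/big     : 40 MiB / sizeof(int64_t)    -> expected_sum(5242880)
```

A reader implementation could compare the value it writes for `Int64` against this number before comparing whole files.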
6 changes: 6 additions & 0 deletions compression/algorithms/README.md
@@ -0,0 +1,6 @@
# Compression Algorithms

* [`zlib`](zlib): level 1 (`101`)
* [`lzma`](lzma): level 7 (`207`)
* [`lz4`](lz4): level 4 (`404`)
* [`zstd`](zstd): level 5 (`505`)
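
The numeric settings above follow ROOT's convention of encoding the algorithm and the level in one integer, `100 * algorithm + level`. The sketch below illustrates that arithmetic; the struct and function names are hypothetical and not part of ROOT or this suite, and the algorithm numbers are the ones implied by the settings listed above (1 for `zlib`, 2 for `lzma`, 4 for `lz4`, 5 for `zstd`).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical illustration types, not ROOT API.
struct CompressionSetting {
  std::uint32_t algorithm;
  std::uint32_t level;
};

// Combine algorithm and level into a single setting: 100 * algorithm + level.
std::uint32_t encode_setting(std::uint32_t algorithm, std::uint32_t level) {
  assert(level < 100);
  return 100 * algorithm + level;
}

// Split a setting back into its algorithm and level parts.
CompressionSetting decode_setting(std::uint32_t setting) {
  return {setting / 100, setting % 100};
}

// Map the algorithm numbers used in this directory to their names.
std::string algorithm_name(std::uint32_t algorithm) {
  switch (algorithm) {
  case 1: return "zlib";
  case 2: return "lzma";
  case 4: return "lz4";
  case 5: return "zstd";
  default: return "unknown";
  }
}
```

For example, `decode_setting(207)` yields algorithm 2 and level 7, which matches the `lzma` entry in the list.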
6 changes: 6 additions & 0 deletions compression/algorithms/lz4/read.C
@@ -0,0 +1,6 @@
#include "../../read_compression.hxx"

void read(std::string_view input = "compression.algorithms.lz4.root",
          std::string_view output = "compression.algorithms.lz4.json") {
  read_compression(input, output);
}
5 changes: 5 additions & 0 deletions compression/algorithms/lz4/write.C
@@ -0,0 +1,5 @@
#include "../write_algorithm.hxx"

void write(std::string_view filename = "compression.algorithms.lz4.root") {
  write_algorithm(filename, 404);
}
6 changes: 6 additions & 0 deletions compression/algorithms/lzma/read.C
@@ -0,0 +1,6 @@
#include "../../read_compression.hxx"

void read(std::string_view input = "compression.algorithms.lzma.root",
          std::string_view output = "compression.algorithms.lzma.json") {
  read_compression(input, output);
}
5 changes: 5 additions & 0 deletions compression/algorithms/lzma/write.C
@@ -0,0 +1,5 @@
#include "../write_algorithm.hxx"

void write(std::string_view filename = "compression.algorithms.lzma.root") {
  write_algorithm(filename, 207);
}
32 changes: 32 additions & 0 deletions compression/algorithms/write_algorithm.hxx
@@ -0,0 +1,32 @@
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleUtil.hxx>
#include <ROOT/RNTupleWriteOptions.hxx>
#include <ROOT/RNTupleWriter.hxx>

using ROOT::Experimental::EColumnType;
using ROOT::Experimental::RNTupleModel;
using ROOT::Experimental::RNTupleWriteOptions;
using ROOT::Experimental::RNTupleWriter;

#include <cstdint>
#include <memory>
#include <string_view>

void write_algorithm(std::string_view filename, std::uint32_t compression) {
  auto model = RNTupleModel::Create();

  auto Int64 = model->MakeField<std::int64_t>("Int64");
  model->GetMutableField("Int64").SetColumnRepresentatives(
      {{EColumnType::kSplitInt64}});

  RNTupleWriteOptions options;
  options.SetCompression(compression);
  auto writer =
      RNTupleWriter::Recreate(std::move(model), "ntpl", filename, options);

  // Write 32 entries to make sure the compression block is not too small.
  for (int i = 0; i < 32; i++) {
    *Int64 = i;
    writer->Fill();
  }
}
6 changes: 6 additions & 0 deletions compression/algorithms/zlib/read.C
@@ -0,0 +1,6 @@
#include "../../read_compression.hxx"

void read(std::string_view input = "compression.algorithms.zlib.root",
          std::string_view output = "compression.algorithms.zlib.json") {
  read_compression(input, output);
}
5 changes: 5 additions & 0 deletions compression/algorithms/zlib/write.C
@@ -0,0 +1,5 @@
#include "../write_algorithm.hxx"

void write(std::string_view filename = "compression.algorithms.zlib.root") {
  write_algorithm(filename, 101);
}
6 changes: 6 additions & 0 deletions compression/algorithms/zstd/read.C
@@ -0,0 +1,6 @@
#include "../../read_compression.hxx"

void read(std::string_view input = "compression.algorithms.zstd.root",
          std::string_view output = "compression.algorithms.zstd.json") {
  read_compression(input, output);
}
5 changes: 5 additions & 0 deletions compression/algorithms/zstd/write.C
@@ -0,0 +1,5 @@
#include "../write_algorithm.hxx"

void write(std::string_view filename = "compression.algorithms.zstd.root") {
  write_algorithm(filename, 505);
}
4 changes: 4 additions & 0 deletions compression/block/README.md
@@ -0,0 +1,4 @@
# Compression Blocks

* [`big`](big): big compression blocks, larger than 16 MiB
* [`short`](short): a "short" compression block that is actually stored uncompressed
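
To make the `big` case concrete, the number of compression blocks needed for a page follows from a ceiling division by the maximum block size. The sketch below assumes the 16 MiB limit stated above; `kMaxBlockSize` and `count_blocks` are hypothetical names chosen for illustration and are not ROOT API.

```cpp
#include <cassert>
#include <cstddef>

// Assumption taken from this README: a compression block holds at most
// 16 MiB of uncompressed data.
constexpr std::size_t kMaxBlockSize = 16 * 1024 * 1024;

// Hypothetical helper: number of compression blocks needed for `bytes`
// of uncompressed data (ceiling division).
std::size_t count_blocks(std::size_t bytes,
                         std::size_t maxBlock = kMaxBlockSize) {
  assert(maxBlock > 0);
  return (bytes + maxBlock - 1) / maxBlock;
}
```

With the 40 MiB written by the `big` test, `count_blocks(40 * 1024 * 1024)` gives three blocks. The two entries of the `short` test occupy 16 bytes and fit in a single block, which the writer stores uncompressed.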
6 changes: 6 additions & 0 deletions compression/block/big/read.C
@@ -0,0 +1,6 @@
#include "../../read_compression.hxx"

void read(std::string_view input = "compression.block.big.root",
          std::string_view output = "compression.block.big.json") {
  read_compression(input, output);
}
38 changes: 38 additions & 0 deletions compression/block/big/write.C
@@ -0,0 +1,38 @@
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleUtil.hxx>
#include <ROOT/RNTupleWriteOptions.hxx>
#include <ROOT/RNTupleWriter.hxx>

using ROOT::Experimental::EColumnType;
using ROOT::Experimental::RNTupleModel;
using ROOT::Experimental::RNTupleWriteOptions;
using ROOT::Experimental::RNTupleWriter;

#include <cstdint>
#include <memory>
#include <string_view>

void write(std::string_view filename = "compression.block.big.root") {
  auto model = RNTupleModel::Create();

  auto Int64 = model->MakeField<std::int64_t>("Int64");
  model->GetMutableField("Int64").SetColumnRepresentatives(
      {{EColumnType::kSplitInt64}});

  RNTupleWriteOptions options;
  // Crank up the zstd compression level to reduce the output file size by
  // approximately a factor of 6 (from 76K with 505 to 12K).
  options.SetCompression(509);
  // Increase the maximum unzipped page size to make it bigger than the maximum
  // size of a compression block, which is 16 MiB.
  options.SetMaxUnzippedPageSize(128 * 1024 * 1024);
  auto writer =
      RNTupleWriter::Recreate(std::move(model), "ntpl", filename, options);

  // Write 40 MiB of entries that will be split into three compression blocks.
  static constexpr int Entries = 40 * 1024 * 1024 / sizeof(std::int64_t);
  for (int i = 0; i < Entries; i++) {
    *Int64 = i;
    writer->Fill();
  }
}
6 changes: 6 additions & 0 deletions compression/block/short/read.C
@@ -0,0 +1,6 @@
#include "../../read_compression.hxx"

void read(std::string_view input = "compression.block.short.root",
          std::string_view output = "compression.block.short.json") {
  read_compression(input, output);
}
33 changes: 33 additions & 0 deletions compression/block/short/write.C
@@ -0,0 +1,33 @@
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleUtil.hxx>
#include <ROOT/RNTupleWriteOptions.hxx>
#include <ROOT/RNTupleWriter.hxx>

using ROOT::Experimental::EColumnType;
using ROOT::Experimental::RNTupleModel;
using ROOT::Experimental::RNTupleWriteOptions;
using ROOT::Experimental::RNTupleWriter;

#include <cstdint>
#include <memory>
#include <string_view>

void write(std::string_view filename = "compression.block.short.root") {
  auto model = RNTupleModel::Create();

  auto Int64 = model->MakeField<std::int64_t>("Int64");
  model->GetMutableField("Int64").SetColumnRepresentatives(
      {{EColumnType::kSplitInt64}});

  RNTupleWriteOptions options;
  options.SetCompression(505);
  auto writer =
      RNTupleWriter::Recreate(std::move(model), "ntpl", filename, options);

  // Write only 2 entries to make sure the compression block is small and
  // actually stored uncompressed.
  for (int i = 0; i < 2; i++) {
    *Int64 = i;
    writer->Fill();
  }
}
27 changes: 27 additions & 0 deletions compression/read_compression.hxx
@@ -0,0 +1,27 @@
#include <ROOT/REntry.hxx>
#include <ROOT/RNTupleReader.hxx>

using ROOT::Experimental::REntry;
using ROOT::Experimental::RNTupleReader;

#include <cstdint>
#include <fstream>
#include <ostream>
#include <string>
#include <string_view>

void read_compression(std::string_view input, std::string_view output) {
  auto reader = RNTupleReader::Open("ntpl", input);
  auto Int64 =
      reader->GetModel().GetDefaultEntry().GetPtr<std::int64_t>("Int64");
  std::int64_t sum = 0;
  for (auto index : *reader) {
    reader->LoadEntry(index);
    sum += *Int64;
  }

  std::ofstream os(std::string{output});
  os << "{\n";
  os << " \"Int64\": " << sum << "\n";
  os << "}\n";
}