Conversation

@nitsanw nitsanw commented Sep 29, 2025

patch by Nitsan Wakart; reviewed by TBD for CASSANDRA-TBD

  • EPPv1 - all new code
  • Cursor compaction integration
  • JMH benchmarks for compaction and cursor impls
  • EPPv1 - New tests
  • Existing tests tweaks for new code
  • [revert?] change the default partitioner to expand testing of new code
  • [revert?] test data used by some benchmarks
  • [revert?] jmh tweak GC settings for stability
  • [revert?] javadoc typos, marking unused params
  • [revert?] clarifying comment
  • [revert?] toString improvement
  • [revert?] remove spurious keywords
  • [revert?] marking metadata collection
  • [revert?] cursor verifier
  • Exclude SAI and counter column
  • Exclude BTI and legacy versions
  • Temporarily skip very long running test

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should use the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

Nitsan Wakart and others added 2 commits August 28, 2025 14:26
@nitsanw nitsanw changed the title from "Add cursor based optimized compaction path (WIP)" to "CASSANDRA-20918 Add cursor-based low allocation optimized compaction implementation" on Sep 29, 2025
@blambov blambov self-requested a review September 30, 2025 07:40
{
protected static final Logger logger = LoggerFactory.getLogger(CompactionTask.class);
public static final int MEGABYTE = 1024 * 1024 * 1024;
public static final boolean CURSOR_COMPACTION_ENABLED = SystemProperties.getBoolean("cassandra.enable_cursor_compaction", () -> true);
Contributor

Could you move this to Config.java? One advantage of having it there is that the AST tests (like SingleNodeTableWalkTest) that generate random config would be able to exercise it. That's our best path to coverage with lots of different schemas and configurations.

If you can locally do a longer run of that test with cursor-compaction enabled, that would be useful too. That would be done via overriding StatefulASTBase#clusterConfig with the new config set.
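
For illustration, a minimal sketch of what moving the flag into Config.java could look like; the field name, default, and the DatabaseDescriptor accessor below are assumptions, not code from this patch:

    // in Config.java: a plain field, so randomized-config tests can flip it
    public volatile boolean cursor_compaction_enabled = true;

    // in DatabaseDescriptor.java: the usual accessor pattern over the loaded config
    public static boolean getCursorCompactionEnabled()
    {
        return conf.cursor_compaction_enabled;
    }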

Contributor

Also related to testing, we need to be running all tests both with this feature enabled and with it disabled.

Let's make sure that among test, test-oa and test-latest we have at least one that is running with cursor compaction and one without.

collector.updateColumnSetPerRow(StatsAccumulation.unpackColumnCount(result));
return StatsAccumulation.unpackCellCount(result);
collector.updateColumnSetPerRow(StatsAccumulation.unpackColumnCount(result)); // matched
return StatsAccumulation.unpackCellCount(result); // matched
Contributor

Intend to keep?

return (int)skipBytes((long)n);
}

public long skipBytes(long n) throws IOException
Contributor

Long overflow on current + n below? Would have existed before as well, I suppose. And probably exceeds max buffer size anyway.
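
For what it's worth, a method-level sketch of an overflow-safe variant; the current and length fields are assumptions standing in for the reader's position and buffer end:

    public long skipBytes(long n) throws IOException
    {
        if (n <= 0)
            return 0;
        long remaining = length - current;      // bytes left before the end of the buffer
        long skipped = Math.min(n, remaining);  // clamping first means current + skipped cannot overflow
        current += skipped;
        return skipped;
    }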

@blambov blambov left a comment

It must be stated that this approach, which bundles all the steps of the processing in one single file, will be quite difficult to maintain and keep in sync with the combination of iterators and transformations that we use in other parts of the code such as the query path. However, once we have reached a point of stability for a piece of functionality where we do not expect it to change significantly for a long time, it does make sense to unpack the code and present it in a way that makes its execution as direct as possible, and this patch is a good such representation of the compaction process.

Personally, I am very unhappy about switching to mutable, pooled and reused objects, which are significantly more unwieldy and error prone, especially in contexts where concurrent access can occur. It seems this is becoming a necessity if we need to achieve acceptable performance with the current state of our heap usage, but we still need to very carefully separate the mutable versions of concepts from the immutable ones used throughout the code base. Suddenly making a DeletionTime mutable is not an acceptable change.

First batch of targeted comments below, mainly going over CompactionCursor.java.

{
private static final int MEGABYTE = 1024 * 1024;
protected static final Logger logger = LoggerFactory.getLogger(CompactionTask.class);
public static final boolean CURSOR_COMPACTION_ENABLED = SystemProperties.getBoolean("cassandra.enable_cursor_compaction", () -> true);
Contributor

This property should be in CassandraRelevantProperties.
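
A rough sketch of what that move might look like; the entry name is an assumption and the exact enum constructor in CassandraRelevantProperties may differ:

    // in CassandraRelevantProperties: one entry per system property, with its default value
    CURSOR_COMPACTION_ENABLED("cassandra.enable_cursor_compaction", "true"),

    // at the use site, replacing the SystemProperties lookup
    public static final boolean CURSOR_COMPACTION_ENABLED =
        CassandraRelevantProperties.CURSOR_COMPACTION_ENABLED.getBoolean();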

else if (e instanceof CompactionInterruptedException)
throw (CompactionInterruptedException) e;
else
throw new IllegalStateException(e);
Contributor

What is this conversion addressing?

Contributor Author

Defensively checking for incorrect exception types

* data that is not read often, so compaction "pro-actively" fix such index entries. This is mainly
* an optimization).</li>
* </ul>
*/
Contributor

Please update the JavaDoc and explain the approach.

this.activeCompactions.beginCompaction(this); // note that CompactionTask also calls this, but CT only creates CompactionIterator with a NOOP ActiveCompactions

TableMetadata metadata = metadata();
if (!(metadata.partitioner instanceof Murmur3Partitioner))
Contributor

Let's separate these checks, and call them from AbstractCompactionPipeline.create instead of aborting here.

throw new IllegalArgumentException("SSTableReader is not a murmur3 partitioner:" + metadata.partitioner.getClass().getCanonicalName() +" cursor compactions are only supported for Murmur3Partitioner.");

if (metadata.indexes.size() != 0)
throw new IllegalArgumentException("SAI is not supported for cursor compactions: " + metadata.indexes +".");
Contributor

This applies to any secondary index, not just SAI.

}

// Merge any common normal rows
int elementMergeLimit = partitionMergeLimit;
Contributor

Replace "element" with "unfiltered" throughout to conform with the existing nomenclature.

}

// merge deletion/liveness
// TODO: ignoring shadowable deletions
Contributor

Do ignore them. They haven't been in use for a long time.

* likely at this stage (probably).
* {@link Row.Merger#merge(DeletionTime)}
*/
private boolean mergeRows(int rowMergeLimit, DeletionTime partitionDeletion, boolean isStatic, boolean isFirstElement) throws IOException
Contributor

partitionDeletion here is misleading, rename to active.

elementCount++;
lastClustering = sstableCursors[0].rHeader;
}
if (activeOpenRangeDeletion == DeletionTime.LIVE) {
Contributor

It would feel safer if we used null for no deletion. Since we already explicitly check for it every time, it would not be more complex than this.

Otherwise we run a (however small and carefully managed) chance of inadvertently modifying LIVE, which will have devastating consequences.
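
As an illustration of that suggestion, a sketch using null for "no open range deletion"; only the activeOpenRangeDeletion name comes from the snippet above, the helpers are hypothetical:

    private DeletionTime activeOpenRangeDeletion = null; // null means no open range deletion

    private boolean hasOpenRangeDeletion()
    {
        return activeOpenRangeDeletion != null;
    }

    private void closeOpenRangeDeletion()
    {
        activeOpenRangeDeletion = null; // never hands out or mutates the shared DeletionTime.LIVE instance
    }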

continueReadingAfterMerge(elementMergeLimit, ELEMENT_END);
}

boolean partitionWritten = isPartitionStarted();
Contributor

It seems to me that it would be simpler to move the decision of when to actually write the partition header to the writer. Are there any reasons to prefer to do it here?

@blambov blambov left a comment

Next batch of comments.

{
out.writeByte(clustering.kind().ordinal());
Clustering.serializer.serialize((Clustering<?>)clustering, out, version, types);
Clustering.serializer.serialize((Clustering<?>)clustering, out, unused, types);
Contributor

It's not quite clear if this really is unused. Do you rely on them being unused? If so, shouldn't we remove the versioning support for clusterings altogether (preferably in a separate commit)?


AbstractType<?> type = types[clusteringIndex];

boolean v1Present = (clusteringBlock1 & 0x11) == 0;
Contributor

Shouldn't this be 0b11?
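
For context on why the literals differ: 0x11 is hexadecimal, i.e. decimal 17 / binary 1_0001, so it tests bit 0 and bit 4 rather than the two low bits of the presence field (00 = valued, 01 = empty, 10 = null). A corrected form of the line above would be:

    // 0x11 == 17 == 0b1_0001  -> tests bit 0 and bit 4
    // 0b11 ==  3 == 0b0_0011  -> tests the two-bit presence field
    boolean v1Present = (clusteringBlock1 & 0b11) == 0;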

{
if (clusteringIndex % 32 == 0)
{
clusteringBlock1 = VIntCoding.getUnsignedVInt(c1, ofst1, limit1);
Contributor

Is the manual offset tracking more efficient than advancing c1 and c2 with VIntCoding.readUnsignedVInt?

ofst2 += vlen2;
}
// present > not present
else if (v1Present && !v2Present)
Contributor

Nit: The else case can be done as

                    // null (0b10) is smaller than empty (0b01) which is smaller than valued (0b00);
                    // compare swapped arguments to reverse the order
                    int cmp = Long.compare(clusteringBlock2 & 0b11, clusteringBlock1 & 0b11);
                    if (cmp != 0)
                        return cmp;

}
if (!UnfilteredSerializer.hasAllColumns(flags))
{
// TODO: re-implement GC free
Contributor

DataStax's branch has an implementation of it.

{
long finishResult = partitionWriter.finish();
// not inclusive of last byte
long partitionLength = partitionWriter.finish();
Contributor

This is not at all guaranteed to be the partition length; changing the name here is very misleading.


@Override
protected boolean shouldSwitchWriterInCurrentLocation(DecoratedKey key)
protected boolean shouldSwitchWriterInCurrentLocation(DecoratedKey unused)
Contributor

Do we need to rename the parameter?

}

public void updateClusteringValues(ClusteringDescriptor newClustering) {
if (newClustering == null || newClustering.clusteringKind().isBoundary())
Contributor

I think you need to copy the comment from updateClusteringValuesByBoundOrBoundary to explain skipping boundaries.

Map<MetadataType, MetadataComponent> components = new EnumMap<>(MetadataType.class);
components.put(MetadataType.VALIDATION, new ValidationMetadata(partitioner, bloomFilterFPChance));
Slice coveredClustering;
if (minClusteringDescriptor.clusteringKind() != ClusteringPrefix.Kind.EXCL_START_BOUND) // min is end only if the descriptors are unused
Contributor

The minimum can certainly be EXCL_START_BOUND when it is used, if a partition starts with a range tombstone. The maximum, on the other hand, can't.

If you want to do this by a single operation (and also remove the minClusteringDescriptor.clusteringColumnsBound() == 0 check in updateClusteringValues), you can change the uninitialized min kind to SSTABLE_UPPER_BOUND, because that won't ever be given to updateClusteringValues.

}
else
{
boolean v1Null = (clusteringBlock1 & 0x10) == 0;
Contributor

Likewise, this should be 0b10.

@blambov blambov left a comment

Next batch of comments.

return retval;
}

public static long getUnsignedVInt(byte[] input, int offset, int length)
Contributor

These don't appear to be used.

import org.apache.cassandra.dht.Murmur3Partitioner;
import org.jctools.util.UnsafeAccess;

public class ReusableLongToken extends Murmur3Partitioner.LongToken
Contributor

Nit: This shouldn't need to be public.

import org.apache.cassandra.utils.ByteArrayUtil;
import org.apache.cassandra.utils.ByteBufferUtil;

public class ReusableDecoratedKey extends DecoratedKey
Contributor

This class is pretty hacky. It shouldn't be hard to move the support for reusable tokens to the partitioner (throwing exceptions for all except Murmur and local).

private static final long UNSHARED_HEAP_SIZE = ObjectSizes.measure(EMPTY);

protected final long timestamp;
protected long timestamp;
Contributor

This class can easily be converted to an interface so that we don't have to make the base mutable.
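
A rough sketch of the idea, with hypothetical names; the point is that the immutable base keeps its final field while only the cursor-side implementation is mutable:

    public interface LivenessView
    {
        long timestamp();
    }

    final class ImmutableLiveness implements LivenessView
    {
        private final long timestamp;       // base type stays immutable
        ImmutableLiveness(long timestamp) { this.timestamp = timestamp; }
        public long timestamp() { return timestamp; }
    }

    final class ReusableLiveness implements LivenessView
    {
        private long timestamp;             // mutability confined to the reusable variant
        public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
        public long timestamp() { return timestamp; }
    }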

* Base class for the sstable writers used by CQLSSTableWriter.
*/
abstract class AbstractSSTableSimpleWriter implements Closeable
public abstract class AbstractSSTableSimpleWriter implements Closeable
Contributor

This doesn't need to be made public.

/** common to rows/tombstones. Call continue(); for next element, or maybe partition end */
int ELEMENT_END = 1 << 8;
/** at {@link UnfilteredSerializer#isEndOfPartition(int)} */
int PARTITION_END = 1 << 9;
Contributor

I'm curious what the benefit of these END states is. Why can't we advance to the next entry directly?

public interface State
{
/** start of file, after partition end but before EOF */
int PARTITION_START = 1;
Contributor

The fact that we don't know the key at partition/unfiltered/cell start leads to some odd-looking code in CompactionCursor. Is there a benefit to delaying the reading of the header until it is explicitly requested?

return state;
}

private int checkNextFlags() throws IOException
Contributor

The caller of this method appears to be pretty well aware what kind of flags/state it expects this to be called in. Would it make sense to split it into checkNext(Partition|Unfiltered|Cell)Flags?

appendBIGIndex(partitionKey, partitionKeyLength, partitionStart, headerLength, partitionDeletionTime, partitionEnd);
}

private void appendBIGIndex(byte[] key, int keyLength, long partitionStart, int headerLength, DeletionTime partitionDeletionTime, long partitionEnd) throws IOException
Contributor

Is it not easy to modify and reuse the index building code from BigFormatPartitionWriter? The duplication here seems quite unnecessary.

private final int indexBlockThreshold;


private SSTableCursorWriter(
Contributor

This class should be split into a common SortedTableCursorWriter, with format-specific subclasses that instantiate the index builders it uses, and placed into the correct per-format packages.
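
A sketch of the proposed shape; the class and method names below follow the comment, but their exact signatures are assumptions:

    import java.io.IOException;

    public abstract class SortedTableCursorWriter
    {
        // format-agnostic cursor writing lives here

        // each format plugs in its own index building
        protected abstract void appendIndexEntry(byte[] key, int keyLength,
                                                 long partitionStart, long partitionEnd) throws IOException;
    }

    // format-specific subclass, e.g. for the BIG format, reusing its existing index building code
    class BigTableCursorWriter extends SortedTableCursorWriter
    {
        @Override
        protected void appendIndexEntry(byte[] key, int keyLength,
                                        long partitionStart, long partitionEnd) throws IOException
        {
            // delegate to the BIG format index writer
        }
    }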
