Add methods to BinaryFormat to encode/decode directly to/from ByteArray/ByteBuffer or OutputStream/InputStream to avoid copy. #3093

@Delsart

Current Situation:

The BinaryFormat interface, implemented by formats like ProtoBuf and Cbor, only provides:

fun <T> encodeToByteArray(serializer: SerializationStrategy<T>, value: T): ByteArray
fun <T> decodeFromByteArray(deserializer: DeserializationStrategy<T>, bytes: ByteArray): T

The encodeToByteArray method always allocates a new ByteArray to hold the serialized result.

Issue:

In performance-critical scenarios (e.g., network packet construction, processing large objects), this leads to unnecessary memory allocation and data copying. There is currently no way to serialize directly into a stream or into a pre-allocated buffer, such as a ByteBuffer or a region of an existing ByteArray that belongs to a larger buffer.
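To make the cost concrete, here is a minimal sketch of the workaround that is needed today. It uses only the Kotlin/JVM standard library; the `encode` lambda is a stand-in (my assumption, not library API) for a call such as `format.encodeToByteArray(serializer, value)`, and `encodeIntoBuffer` is a hypothetical helper name.

```kotlin
import java.nio.ByteBuffer

// `encode` stands in for something like format.encodeToByteArray(serializer, value).
// Hypothetical helper: shows the extra allocation and copy per message.
fun <T> encodeIntoBuffer(value: T, target: ByteBuffer, encode: (T) -> ByteArray): Int {
    val bytes = encode(value) // allocation: a fresh ByteArray for every message
    target.put(bytes)         // copy: temporary array -> destination buffer
    return bytes.size         // the temporary array is garbage from here on
}
```

Every call pays for one throwaway array and one copy, even though the destination buffer was already allocated and had room for the payload.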

The Proposed Solution / Feature

We request new overloads or extension functions on BinaryFormat (or perhaps a new interface extension for advanced binary IO) that allow the user to specify the output target:

Proposal 1: Writing to an OutputStream

fun <T> encodeToStream(
    serializer: SerializationStrategy<T>, 
    value: T, 
    stream: OutputStream // Or a platform-specific equivalent in common code
)
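One reason a stream-based overload is attractive: on the JVM, an OutputStream can be a thin view over a pre-allocated buffer, so a single overload could cover several targets. The sketch below is only an illustration; `ByteBufferOutputStream` is a hypothetical adapter written here, not a class from kotlinx.serialization or the JDK.

```kotlin
import java.io.OutputStream
import java.nio.ByteBuffer

// Hypothetical adapter: every write lands directly in the wrapped buffer.
// Writing past the buffer's limit throws BufferOverflowException.
class ByteBufferOutputStream(private val buffer: ByteBuffer) : OutputStream() {
    override fun write(b: Int) {
        buffer.put(b.toByte()) // OutputStream contract: only the low 8 bits are written
    }

    override fun write(b: ByteArray, off: Int, len: Int) {
        buffer.put(b, off, len) // bulk path, avoids the per-byte call above
    }
}
```

With an adapter like this, an encodeToStream-style function could fill a pooled ByteBuffer without the format needing to know about ByteBuffer at all.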

Proposal 2: Writing to a ByteArray starting at an offset

fun <T> encodeToByteArray(
    serializer: SerializationStrategy<T>, 
    value: T, 
    output: ByteArray, 
    offset: Int = 0
): Int // Returns the number of bytes written
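The offset-plus-return-value contract lets callers pack several values back to back in one array. Here is a rough sketch of that contract using a stand-in encoder: `encodeVarintInto` is a hypothetical function that writes a single Int as a protobuf-style base-128 varint, chosen only because it is small enough to show in full. It is not how any real format would implement the proposal.

```kotlin
// Hypothetical stand-in for the proposed overload: writes `value` as a base-128
// varint into `output` starting at `offset` and returns the number of bytes written.
// An array that is too small fails with IndexOutOfBoundsException.
fun encodeVarintInto(value: Int, output: ByteArray, offset: Int = 0): Int {
    var v = value
    var pos = offset
    while ((v and 0x7F.inv()) != 0) {                   // more than 7 bits remain
        output[pos++] = ((v and 0x7F) or 0x80).toByte() // low 7 bits + continuation bit
        v = v ushr 7
    }
    output[pos++] = v.toByte()                          // final byte, no continuation bit
    return pos - offset
}

// Packs all values into one caller-owned array; returns the total bytes used.
fun packAll(values: List<Int>, output: ByteArray): Int {
    var pos = 0
    for (v in values) pos += encodeVarintInto(v, output, pos)
    return pos
}
```

The returned count is what makes chaining possible: each call starts where the previous one ended, and no intermediate arrays are created.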

Proposal 3: Writing to a ByteBuffer (JVM/Native focus)

fun <T> encodeToByteBuffer(
    serializer: SerializationStrategy<T>, 
    value: T, 
    output: ByteBuffer,
): Int // Returns the number of bytes written
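For heap-backed buffers, a ByteBuffer overload could plausibly be layered on top of an offset-based one like Proposal 2. The sketch below shows that idea with standard java.nio calls only; `writeThrough` is a hypothetical helper, and the `encodeInto` lambda stands in for an offset-based encoder. Direct (off-heap) buffers are deliberately not handled here.

```kotlin
import java.nio.ByteBuffer

// Hypothetical helper: lets an offset-based encoder write straight into the
// backing array of a heap ByteBuffer, then advances the buffer's position.
fun writeThrough(output: ByteBuffer, encodeInto: (ByteArray, Int) -> Int): Int {
    require(output.hasArray()) { "sketch handles writable heap buffers only" }
    val start = output.arrayOffset() + output.position() // array index of the write cursor
    val written = encodeInto(output.array(), start)
    check(written <= output.remaining()) { "encoder wrote past the buffer limit" }
    output.position(output.position() + written)
    return written
}
```

A real implementation would have to check the remaining capacity before writing rather than after, and would need a separate path for direct buffers.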

Justification/Motivation

  1. Zero-Copy Serialization: Essential for high-throughput applications to avoid copying data from the internal serialization buffer to a final destination buffer.

  2. Reduced GC Pressure: By reusing pre-allocated buffers (e.g., a ByteBuffer for a network channel or an OutputStream that wraps a pooled buffer), we significantly reduce the allocation rate and Garbage Collector overhead.

  3. Consistency: On the JVM, the Json format already provides encodeToStream/decodeFromStream, and binary formats should have an equivalent to support efficient IO operations.
