
[BUG]: HEAP_CORRUPTION/ACCESS_VIOLATION During ColumnReader_HasNext call #549

@IRailean97

Issue Description

After upgrading ParquetSharp from 10.0.1 to 19.0.1, my application began crashing intermittently with STATUS_HEAP_CORRUPTION inside ParquetSharpNative.dll.

WinDbgX shows the corruption being detected inside native code, specifically during the ColumnReader_HasNext call.

The crash is a heap corruption (0xC0000374) detected inside RtlpHeapHandleError.
The failure occurs during a call to ParquetSharpNative!ColumnReader_HasNext invoked from managed code via P/Invoke.
The heap was most likely corrupted earlier, by code in the native Parquet library reading or writing freed or overwritten memory; HasNext is only the point at which the corruption is detected.
The native stack trace shows memory allocation/free logic inside ntdll triggered during HasNext.
Before HasNext, the stack shows a long sequence of KmsConnectionConfig_SetKmsInstanceUrl calls. This suggests native code is building or handling a URL string, possibly reallocating heap memory.


Environment Information

  • ParquetSharp Version: 19.0.1 (upgraded from 10.0.1)
  • .NET Framework/SDK Version: .NET 8
  • Operating System: Windows 10

Steps To Reproduce

Code using ColumnReader:

private IEnumerator<ColumnElement> GetColumnElements(ColumnReader columnReader, IParquetColumnBuffer<T> buffer)
{
    while (columnReader.HasNext)
    {
        buffer.Clear();
        buffer.Read(columnReader);
        int columnValuesOffset = 0;

        for (int columnLevelsOffset = 0; columnLevelsOffset < buffer.LevelsCount; columnLevelsOffset++)
        {
            ColumnElement columnElement;
            try
            {
                short definitionLevel = buffer.DefinitionLevels == null ? (short)0 : buffer.DefinitionLevels[columnLevelsOffset];
                short repetitionLevel = buffer.RepetitionLevels == null ? (short)0 : buffer.RepetitionLevels[columnLevelsOffset];

                if (definitionLevel < buffer.MaxDefinitionLevel)
                {
                    columnElement = new ColumnElement(null, definitionLevel, repetitionLevel);
                }
                else
                {
                    string str = ParquetDataConverter.ConvertParquetValueToString(buffer.Values[columnValuesOffset], this.schemaElement);
                    columnElement = new ColumnElement(str, definitionLevel, repetitionLevel);
                    columnValuesOffset++;
                }
            }
            catch (Exception ex)
            {
                throw new ParquetReaderException($"Failed to read data from Parquet Column - {columnReader.ColumnDescriptor?.Name.MarkAsPrivate()} and of parquet physical type = {columnReader.Type.ToString().MarkAsPrivate()} and logical type = {columnReader.ColumnDescriptor?.LogicalType?.Type.ToString().MarkAsPrivate()}. Inner Exception from ParquetSharp: {ex}");
            }

            yield return columnElement;
        }
    }
}
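One thing that may be worth ruling out, as an assumption rather than something the dump confirms: because GetColumnElements is an iterator method, none of its body runs until the caller invokes MoveNext. If the file or row group reader that owns the ColumnReader were disposed before enumeration finished, HasNext would be called on a released native object. The sketch below shows that deferred execution with a purely managed stand-in; FakeReader and LazyDemo are hypothetical names for illustration and are not ParquetSharp types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a native-backed reader; NOT a ParquetSharp type.
sealed class FakeReader : IDisposable
{
    private int remaining = 3;
    private bool disposed;

    public bool HasNext
    {
        get
        {
            // A managed stand-in can throw here; a real native handle
            // might instead read memory that has already been freed.
            if (disposed) throw new ObjectDisposedException(nameof(FakeReader));
            return remaining > 0;
        }
    }

    public int Read() => remaining--;

    public void Dispose() => disposed = true;
}

static class LazyDemo
{
    // Same shape as GetColumnElements: an iterator that polls HasNext.
    public static IEnumerator<int> Elements(FakeReader reader)
    {
        while (reader.HasNext)
        {
            yield return reader.Read();
        }
    }

    // Returns true if the first MoveNext after disposal hits the disposed reader.
    public static bool FailsAfterDispose()
    {
        IEnumerator<int> elements;
        using (var reader = new FakeReader())
        {
            // Creating the iterator does not execute any of its body yet.
            elements = Elements(reader);
        }

        try
        {
            elements.MoveNext(); // first call to HasNext happens here
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

If the real call sites always finish enumerating before the owning readers are disposed, and no ColumnReader is shared across threads, this pattern does not apply here.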

Inside ConvertParquetValueToString, I parse the different physical types.

The stack dumps also show that the crash often happens while parsing Int96 values. This may be related, but I have not found a direct connection:

case PhysicalType.Int96:
    if (valueToConvert is Int96 intDate)
    {
        return Int96ToTimestampNanos(intDate)?.ToString();
    }

    return valueToConvert.ToString();

internal static DateTime? Int96ToTimestampNanos(Int96 value)
{
    try
    {
        long time;
        int julianDay;
        unsafe
        {
            time = ReadInt64LittleEndian((byte*)&value);
            julianDay = ReadInt32LittleEndian((byte*)&value + 8);
        }

        // INT96 stores the Date part as the number of Julian Days (days since the start of Julian calendar) which can also include BCE dates
        if (julianDay < JulianDayMinValue)
        {
            return null;
        }

        int days = julianDay - UnixEpochJulianDay;
        return UnixEpoch + new TimeSpan(days, 0, 0, 0) + new TimeSpanNanos(time).TimeSpan;
    }
    catch (Exception e)
    {
        throw new ParquetReaderException($"Parquet data convert Error - Failed to convert Int96 to DateTime {value.ToString()}. ", e);
    }
}
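To help rule the unsafe pointer reads in or out, the same decoding can be sketched without unsafe code. This is a minimal sketch under stated assumptions, not the code from my application: it takes the 12 raw INT96 bytes as a span (8 bytes of nanoseconds within the day followed by a 4-byte Julian Day, both little-endian, matching the offsets read above), and Int96Sketch, MinJulianDay, MaxJulianDay and the range check on the nanoseconds are names and choices introduced here for illustration.

```csharp
using System;
using System.Buffers.Binary;

static class Int96Sketch
{
    // Assumed bounds: the Julian Day Numbers of 0001-01-01 and 9999-12-31,
    // i.e. the first and last days a DateTime can represent.
    private const int MinJulianDay = 1721426;
    private const int MaxJulianDay = 5373484;
    private const long NanosPerDay = 86_400_000_000_000L;

    // Decodes a 12-byte INT96 timestamp; returns null when it is out of DateTime's range.
    public static DateTime? ToDateTime(ReadOnlySpan<byte> raw)
    {
        if (raw.Length != 12)
        {
            throw new ArgumentException("INT96 is exactly 12 bytes.", nameof(raw));
        }

        // Bounds-checked reads replace the pointer arithmetic.
        long nanosOfDay = BinaryPrimitives.ReadInt64LittleEndian(raw);
        int julianDay = BinaryPrimitives.ReadInt32LittleEndian(raw.Slice(8));

        if (julianDay < MinJulianDay || julianDay > MaxJulianDay)
        {
            return null;
        }

        if (nanosOfDay < 0 || nanosOfDay >= NanosPerDay)
        {
            return null;
        }

        // DateTime tick 0 is 0001-01-01, so count days from MinJulianDay.
        // One tick is 100 ns; sub-tick precision is truncated.
        long ticks = (long)(julianDay - MinJulianDay) * TimeSpan.TicksPerDay + nanosOfDay / 100;
        return new DateTime(ticks, DateTimeKind.Utc);
    }
}
```

Julian Day 2440588 corresponds to 1970-01-01, so a value with that day and zero nanoseconds decodes to the Unix epoch. If the crash still reproduces with a bounds-checked decoder like this in place, the managed Int96 conversion is unlikely to be what corrupts the heap.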

Expected Behavior

The issue is sporadic. With ParquetSharp 10.0.1 no such crash occurred; I started to observe this behavior only after upgrading to 19.0.1.

Additional Context (Optional)

No response
