Issue Description
After upgrading ParquetSharp from 10.0.1 to 19.0.1, my application began crashing intermittently with STATUS_HEAP_CORRUPTION inside ParquetSharpNative.dll.
WinDbgX shows the corruption being detected inside native code, specifically during a call to ParquetSharpNative!ColumnReader_HasNext invoked from managed code via P/Invoke.
The crash is a heap corruption (0xC0000374) reported from RtlpHeapHandleError.
The native stack trace shows memory allocation/free logic inside ntdll triggered during HasNext.
The heap corruption is likely caused by native memory inside the native Parquet library being read or written after it was freed or overwritten, at some point before the HasNext call where it is detected.
Before HasNext, the stack shows a long sequence of KmsConnectionConfig_SetKmsInstanceUrl calls. This suggests native code is building or handling a URL string, possibly reallocating heap memory.

Environment Information
- ParquetSharp Version: 19.0.1 (upgraded from 10.0.1)
- .NET Framework/SDK Version: .NET 8
- Operating System: Windows 10
Steps To Reproduce
Code using ColumnReader:
private IEnumerator<ColumnElement> GetColumnElements(ColumnReader columnReader, IParquetColumnBuffer<T> buffer)
{
    while (columnReader.HasNext)
    {
        buffer.Clear();
        buffer.Read(columnReader);
        int columnValuesOffset = 0;
        for (int columnLevelsOffset = 0; columnLevelsOffset < buffer.LevelsCount; columnLevelsOffset++)
        {
            ColumnElement columnElement;
            try
            {
                short definitionLevel = buffer.DefinitionLevels == null ? (short)0 : buffer.DefinitionLevels[columnLevelsOffset];
                short repetitionLevel = buffer.RepetitionLevels == null ? (short)0 : buffer.RepetitionLevels[columnLevelsOffset];
                if (definitionLevel < buffer.MaxDefinitionLevel)
                {
                    columnElement = new ColumnElement(null, definitionLevel, repetitionLevel);
                }
                else
                {
                    string str = ParquetDataConverter.ConvertParquetValueToString(buffer.Values[columnValuesOffset], this.schemaElement);
                    columnElement = new ColumnElement(str, definitionLevel, repetitionLevel);
                    columnValuesOffset++;
                }
            }
            catch (Exception ex)
            {
                throw new ParquetReaderException($"Failed to read data from Parquet Column - {columnReader.ColumnDescriptor?.Name.MarkAsPrivate()} and of parquet physical type = {columnReader.Type.ToString().MarkAsPrivate()} and logical type = {columnReader.ColumnDescriptor?.LogicalType?.Type.ToString().MarkAsPrivate()}. Inner Exception from ParquetSharp: {ex}");
            }
            yield return columnElement;
        }
    }
}
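For readers unfamiliar with the levels/values bookkeeping in the loop above, here is a minimal standalone sketch of the same idea with no ParquetSharp dependency. The class name LevelsSketch and the method Expand are hypothetical and exist only for illustration; the sketch assumes, as the loop above does, that the values array holds only the non-null entries, so the values offset advances only when the definition level reaches the maximum.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ParquetSharp or of my application.
static class LevelsSketch
{
    // Pairs each definition level with the next stored value,
    // or with null when the level is below the column's maximum.
    public static List<string?> Expand(short[] definitionLevels, string[] values, short maxDefinitionLevel)
    {
        var result = new List<string?>(definitionLevels.Length);
        int valuesOffset = 0; // advances only for non-null entries
        foreach (short level in definitionLevels)
        {
            if (level < maxDefinitionLevel)
            {
                result.Add(null); // value is absent at this position
            }
            else
            {
                result.Add(values[valuesOffset]);
                valuesOffset++;
            }
        }
        return result;
    }
}
```

This only mirrors the managed-side indexing; it does not touch native memory and is not expected to reproduce the crash by itself.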
Inside ConvertParquetValueToString:
I handle each physical type separately.
I've also noticed in the stack dumps that the crash often happens while an Int96 value is being parsed. This may be related, but I have not found a direct link:
case PhysicalType.Int96:
    if (valueToConvert is Int96 intDate)
    {
        return Int96ToTimestampNanos(intDate)?.ToString();
    }
    return valueToConvert.ToString();

internal static DateTime? Int96ToTimestampNanos(Int96 value)
{
    try
    {
        long time;
        int julianDay;
        unsafe
        {
            time = ReadInt64LittleEndian((byte*)&value);
            julianDay = ReadInt32LittleEndian((byte*)&value + 8);
        }
        // INT96 stores the Date part as the number of Julian Days (days since the start of Julian calendar) which can also include BCE dates
        if (julianDay < JulianDayMinValue)
        {
            return null;
        }
        int days = julianDay - UnixEpochJulianDay;
        return UnixEpoch + new TimeSpan(days, 0, 0, 0) + new TimeSpanNanos(time).TimeSpan;
    }
    catch (Exception e)
    {
        throw new ParquetReaderException($"Parquet data convert Error - Failed to convert Int96 to DateTime {value.ToString()}. ", e);
    }
}
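To make the Int96 arithmetic easier to check in isolation, here is a hedged, self-contained sketch of the same conversion that works on the 12 raw little-endian bytes and avoids the unsafe pointer reads. It is not the code from my application: Int96Sketch and ToDateTime are hypothetical names, the DateTime range check stands in for my JulianDayMinValue test, and precision below 100 ns is truncated because DateTime ticks are 100 ns. The only constant assumed is 2440588, the Julian Day Number of 1970-01-01.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical standalone helper, for illustration only.
static class Int96Sketch
{
    // Julian Day Number of 1970-01-01 (the Unix epoch).
    const int UnixEpochJulianDay = 2440588;
    static readonly DateTime UnixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // raw: 12 bytes, little-endian: 8 bytes of nanoseconds within the day,
    // followed by 4 bytes holding the Julian day.
    public static DateTime? ToDateTime(ReadOnlySpan<byte> raw)
    {
        if (raw.Length != 12)
        {
            throw new ArgumentException("INT96 needs exactly 12 bytes", nameof(raw));
        }

        long nanosOfDay = BinaryPrimitives.ReadInt64LittleEndian(raw);
        int julianDay = BinaryPrimitives.ReadInt32LittleEndian(raw.Slice(8));

        // Reject days outside what DateTime can hold before multiplying,
        // so the tick arithmetic below cannot overflow a long.
        long days = (long)julianDay - UnixEpochJulianDay;
        long minDays = -(UnixEpoch.Ticks / TimeSpan.TicksPerDay);
        long maxDays = (DateTime.MaxValue.Ticks - UnixEpoch.Ticks) / TimeSpan.TicksPerDay;
        if (days < minDays || days > maxDays)
        {
            return null;
        }

        // One tick is 100 ns; anything finer is truncated.
        long ticks = UnixEpoch.Ticks + days * TimeSpan.TicksPerDay + nanosOfDay / 100;
        if (ticks < DateTime.MinValue.Ticks || ticks > DateTime.MaxValue.Ticks)
        {
            return null;
        }
        return new DateTime(ticks, DateTimeKind.Utc);
    }
}
```

Since this version only reads from a managed span, it could be swapped in to rule the unsafe block in or out as a factor, assuming the raw bytes of the Int96 value are available on the managed side.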
Expected Behavior
This is a sporadic issue. With ParquetSharp 10.0.1 no such crash occurred; after upgrading to 19.0.1 I started to observe this behavior. I expect reading columns to complete without crashing, as it did before the upgrade.
Additional Context (Optional)
No response