
@otegami
Contributor

otegami commented Sep 3, 2025

Problem

File::Stat raises a RuntimeError for files of 2GB or more on Windows. The underlying stat call fails when retrieving statistics for such files, making it impossible to work with files over 2GB.

Cause

By default, the Windows stat() function and struct stat store the file size in a 32-bit integer, so they cannot represent files of 2GB (2^31 bytes) or more. Unix systems, by contrast, typically use a 64-bit file size by default.

ref: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?view=msvc-170#time-type-and-file-length-type-variations-of-_stat
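The 2GB threshold follows directly from the width of the length field. Assuming, as the variation table in the linked reference indicates for the default variants, that the file length is a signed 32-bit type, the largest size it can hold is 2^31 - 1 bytes. A minimal C check (the helper name `fits_in_signed_32bit` is hypothetical, not part of mruby-file-stat):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `size` (in bytes) fits in a signed 32-bit length field.
 * INT32_MAX is 2^31 - 1 = 2147483647, so a file of exactly 2^31 bytes
 * (2GB) is the first size that does not fit. */
static bool fits_in_signed_32bit(int64_t size)
{
  return size >= 0 && size <= INT32_MAX;
}
```

For example, `fits_in_signed_32bit(INT64_C(2147483647))` is true, while `fits_in_signed_32bit(INT64_C(1) << 31)` is false. That is exactly the 2GB boundary described in the problem statement.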

Solution

Use Windows 64-bit stat functions and structures:

  • Replace stat() with _stat64() for file statistics
  • Replace fstat() with _fstat64() for file descriptor statistics
  • Replace struct stat with struct _stat64 for data structures
  • Define macros (STAT, LSTAT, FSTAT, STAT_STRUCT) to use 64-bit variants on Windows while keeping original functions on Unix

This enables proper handling of files >= 2GB on Windows platforms while maintaining compatibility with other operating systems.
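A minimal sketch of the macro layer described in the last bullet, assuming the MSVC CRT on Windows (`_stat64` and `_fstat64` from `<sys/stat.h>`) and POSIX elsewhere. The helper `file_size` is hypothetical and only illustrates usage. Windows has no `lstat()`, so `LSTAT` falls back to `_stat64` there, matching the diff excerpt reviewed later in this thread.

```c
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
/* MSVC CRT: 64-bit length variants. No lstat() on Windows. */
# define STAT(p, s)   _stat64((p), (s))
# define LSTAT(p, s)  _stat64((p), (s))
# define FSTAT(fd, s) _fstat64((fd), (s))
# define STAT_STRUCT  struct _stat64
#else
/* POSIX: keep the original functions and structure. */
# define STAT(p, s)   stat((p), (s))
# define LSTAT(p, s)  lstat((p), (s))
# define FSTAT(fd, s) fstat((fd), (s))
# define STAT_STRUCT  struct stat
#endif

/* Hypothetical helper: returns the size of `path` as a 64-bit value,
 * or -1 if the stat call fails. */
static long long file_size(const char *path)
{
  STAT_STRUCT st;
  if (STAT(path, &st) != 0) {
    return -1;
  }
  return (long long)st.st_size;
}
```

Call sites only ever use `STAT`, `LSTAT`, `FSTAT` and `STAT_STRUCT`, so the platform difference lives in a single `#ifdef` block rather than being repeated at every call site.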

otegami force-pushed the windows-support-over-4gb-for-file-stat branch from 574a33c to e406eb1 on September 3, 2025 07:08
otegami changed the title from "Support for files over 4GB on Windows" to "Support for files over 2GB on Windows" on Sep 3, 2025
Owner

ksss left a comment


@otegami
Thank you for the suggestion. I agree with the policy of supporting large file sizes on Windows.

I'm a bit concerned about the STAT_STRUCT macro in terms of overall consistency. Instead of defining STAT_STRUCT, how about defining a struct name such as mrb_stat, so that we can write struct mrb_stat and let mrb_stat map to the platform-specific structure? (Although I am a little worried that the name is too similar to mrb_state...)

```ruby
target_size = 2**31 # 2GB
begin
  File.open(large_file, 'wb') do |f|
    (2**19).times { f << "\0" * 4096 }
```
Contributor


Can we use target_size here?

Contributor Author


fix: e15ed5b Sure, I changed it to use target_size with seek.

Improve code consistency by replacing the STAT_STRUCT macro with
a more conventional mrb_stat struct name:

- Windows: mrb_stat -> _stat64 (for 64-bit file support)
- Unix/Linux: mrb_stat -> stat (standard struct)
@otegami
Contributor Author

otegami commented Sep 3, 2025

@ksss

fix: 34be575 Thank you for reviewing! I understand your concerns and have followed your suggestion to define a platform-neutral mrb_stat struct name that absorbs the differences between platforms.

Regarding the name, how about using mrb_file_stat? It's a bit longer, but it would be more descriptive and avoid any confusion with mrb_state. What do you think?

@ksss
Owner

ksss commented Sep 4, 2025

LGTM.

I think it's fine to keep the name mrb_stat, since aligning with the implementation in ruby/ruby will make it easier to refer to.

src/file-stat.c (outdated)
```c
# define STAT(p,s) _stat64(p,s)
# define FSTAT(fd,s) _fstat64(fd,s)
# define LSTAT(p,s) _stat64(p,s)
# define mrb_stat _stat64
```
Contributor


In general, typedef is better than #define for type aliasing.
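To illustrate the reviewer's point, here is a minimal sketch of the typedef variant, assuming the same platform split as the excerpt above (`struct _stat64` on Windows, `struct stat` elsewhere). It is not necessarily what commit 7a7009e contains, and `mrb_stat_size` is a hypothetical helper. One practical difference from `#define mrb_stat _stat64` is that callers write `mrb_stat st;` rather than `struct mrb_stat st;`. The name also follows normal C scoping instead of being textually substituted wherever the token appears.

```c
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
typedef struct _stat64 mrb_stat;   /* 64-bit st_size on Windows */
# define STAT(p, s) _stat64((p), (s))
#else
typedef struct stat mrb_stat;      /* native struct stat elsewhere */
# define STAT(p, s) stat((p), (s))
#endif

/* Hypothetical helper: stores the size of `path` in *size.
 * Returns 0 on success, -1 if the stat call fails. */
static int mrb_stat_size(const char *path, long long *size)
{
  mrb_stat st;                     /* plain type name, no `struct` keyword */
  if (STAT(path, &st) != 0) {
    return -1;
  }
  *size = (long long)st.st_size;
  return 0;
}
```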

Contributor Author


fix: 7a7009e Thank you so much. Got it.

otegami force-pushed the windows-support-over-4gb-for-file-stat branch from cd96202 to 7a7009e on September 4, 2025 06:46
ksss merged commit 7d0e63a into ksss:master on Sep 5, 2025
24 checks passed
@ksss
Owner

ksss commented Sep 5, 2025

@otegami @kou Thank you for the improvement 🚀

otegami deleted the windows-support-over-4gb-for-file-stat branch on September 5, 2025 03:02
abetomo pushed a commit to groonga/groonga that referenced this pull request Sep 8, 2025
This PR updates the `mruby-file-stat` submodule from f3e858f01 to 7d0e63a95
to add support for files over 2GB on Windows.
- ref: ksss/mruby-file-stat#36

The previous stat implementation didn't support files larger than 2GB on
Windows platforms. This update is necessary to allow the grndb command
to properly check file stats for database files over 2GB on Windows.
3 participants