Skip to content

Add ability to emulate thread-safe locale operations #21971

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: blead
Choose a base branch
from

Conversation

khwilliamson
Copy link
Contributor

Locale information was originally global for an entire process. Later, it was realized that different threads could want to be running in different locales. Windows added this ability, and POSIX 2008 followed suit (though using a completely different API). When available, perl automatically uses these capabilities.

But many platforms have neither, or their implementation, such as on Darwin, is buggy. This commit adds the capability for Perl programs to operate as if the platform were thread-safe.

This implementation is based on the observation that the underlying locale matters only to relatively few libc calls, and only during their execution. It can be anything at all at any other time. perl keeps what the proper locale should be for each category in a a per-thread array. Each locale-dependent operation must be wrapped in mutex lock/unlock operations. The lock additionally compares what libc knows the locale to be, and what it should be for this thread at this time, and changes the actual locale to the proper value if necessary. That's all that is needed.

This commit adds macros to perl.h, for example "MBTOWC_LOCK_", that expand to do the mutex lock, and change the global locale to the expected value. On perls built without this emulation capability, they are no-ops. All code in the perl core (unless I've missed something), are changed to use these macros (there weren't actually many places that needed this). Thus, any pure perl program will automatically become locale-thread-safe under this Configuration.

In order for XS code to also become locale-thread-safe, it must use these macros to wrap calls to locale-dependent functions. Relatively few modules call such functions. For example, the only one I found that ships with the perl core is Time::Piece, and it has more fundamental issues with running under threads than this. I am preparing pull requests for it.

Thus, this is not completely transparent to code like native-thread-safe locale handling is. Therefore ${^SAFE_LOCALES} returns 2 (instead of 1) for this type of thread-safety.

Another deficiency compared to the native thread safety is when a thread calls a non-perl library that accesses the locale. The typical example is Gtk (though this particular application can be configured to not be problematic). With the native safe threads, everything works as long as only one such thread is used per Perl program. That thread would then be the only one operating in the global locale, hence there are no conflicts. With this emulation, all threads are operating in the global locale, and mutexes would have to be used to prevent conflicts. To minimize those, the code added in this commit restores the global locale when through to the state it was in when started.

A major concern is the performance impact. This is after all trading speed for accuracy. lib/locale_threads.t is noticeably slower when this is being used. But that is doing multiple threads constantly using locale-dependent operations. I don't notice any change with the rest of the test suite. In pure perl, this only comes into play while in the scope of 'use locale' or when using some of the few POSIX:: functions that are locale-dependent. And to some extent when formatting, but the regular overhead there should dwarf what this adds.

This commit leaves this feature off by default.

@khwilliamson khwilliamson force-pushed the emulate branch 2 times, most recently from 61fb53d to ea961d3 Compare February 13, 2024 16:35
Locale information was originally global for an entire process.  Later,
it was realized that different threads could want to be running in
different locales.  Windows added this ability, and POSIX 2008 followed
suit (though using a completely different API).  When available, perl
automatically uses these capabilities.

But many platforms have neither, or their implementation, such as on
Darwin, is buggy.  This commit adds the capability for Perl programs to
operate as if the platform were thread-safe.

This implementation is based on the observation that the underlying
locale matters only to relatively few libc calls, and only during their
execution.  It can be anything at all at any other time.  perl keeps
what the proper locale should be for each category in a a per-thread
array.  Each locale-dependent operation must be wrapped in mutex
lock/unlock operations.  The lock additionally compares what libc knows
the locale to be, and what it should be for this thread at this time,
and changes the actual locale to the proper value if necessary.  That's
all that is needed.

This commit adds macros to perl.h, for example "MBTOWC_LOCK_", that
expand to do the mutex lock, and change the global locale to the
expected value.  On perls built without this emulation capability, they
are no-ops.  All code in the perl core (unless I've missed something),
are changed to use these macros (there weren't actually many places that
needed this).  Thus, any pure perl program will automatically become
locale-thread-safe under this Configuration.

In order for XS code to also become locale-thread-safe, it must use
these macros to wrap calls to locale-dependent functions.  Relatively
few modules call such functions.  For example, the only one I found that
ships with the perl core is Time::Piece, and it has more fundamental
issues with running under threads than this.  I am preparing pull
requests for it.

Thus, this is not completely transparent to code like native-thread-safe
locale handling is.  Therefore ${^SAFE_LOCALES} returns 2 (instead of 1)
for this type of thread-safety.

Another deficiency compared to the native thread safety is when a thread
calls a non-perl library that accesses the locale.  The typical example is
Gtk (though this particular application can be configured to not be
problematic).  With the native safe threads, everything works as long as
only one such thread is used per Perl program.  That thread would then
be the only one operating in the global locale, hence there are no
conflicts.  With this emulation, all threads are operating in the global
locale, and mutexes would have to be used to prevent conflicts.  To
minimize those, the code added in this commit restores the global locale
when through to the state it was in when started.

A major concern is the performance impact.  This is after all trading
speed for accuracy.  lib/locale_threads.t is noticeably slower when this
is being used.  But that is doing multiple threads constantly using
locale-dependent operations.  I don't notice any change with the rest of
the test suite.  In pure perl, this only comes into play while in the
scope of 'use locale' or when using some of the few POSIX:: functions
that are locale-dependent.  And to some extent when formatting, but the
regular overhead there should dwarf what this adds.

This commit leaves this feature off by default.  The next commit changes
that for the next few 5.39 development releases, so we can see if there
is actually an issue.
@jkeenan
Copy link
Contributor

jkeenan commented Aug 27, 2024

Locale information was originally global for an entire process. Later, it was realized that different threads could want to be running in different locales. Windows added this ability, and POSIX 2008 followed suit (though using a completely different API). When available, perl automatically uses these capabilities.

But many platforms have neither, or their implementation, such as on Darwin, is buggy. This commit adds the capability for Perl programs to operate as if the platform were thread-safe.

@khwilliamson, this pull request has not received any discussion since it was filed more than six months ago. That is uncommon. I suspect that people aren't really convinced that Perl has a big problem here -- "big" in the sense that it's bothering a lot of people -- or "big" in the sense that it requires intervention in 14 different files, almost all of them deep in the Perl guts.

As an indirect consequence of the lack of discussion of the p.r., it has acquired merge conflicts in 3 of those core files. If you wish to keep this p.r. in consideration, could you resolve those conflicts and re-push? Thanks.

@khwilliamson khwilliamson marked this pull request as draft September 5, 2024 01:07
@khwilliamson khwilliamson added the defer-next-dev This PR should not be merged yet, but await the next development cycle label Apr 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defer-next-dev This PR should not be merged yet, but await the next development cycle hasConflicts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants