-
Notifications
You must be signed in to change notification settings - Fork 585
WIP Add resolving of $PROGRAM_NAME from /dev/fd/number #23358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: blead
Are you sure you want to change the base?
WIP Add resolving of $PROGRAM_NAME from /dev/fd/number #23358
Conversation
085c026
to
99313c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes need to be exercised by our test suite. Such a test will have to take into account the quirks of different operating systems.
99313c1
to
6241f02
Compare
There are situations when $PROGRAM_NAME is something /dev/fd/<number> 1) example like sudo /usr/local/bin/<tool> with checksum verification (GH#23351) 2) example with pipe: perl <( echo '#!perl -DA'; echo 'print "$0\n"');
6a4065e
to
da4a0e7
Compare
There were issues in the build on Windows, I updated PR. |
hm, issue on Windows |
This code should probably be feature guarded. That means Configure should check if |
What does this C code do when it executes on on Plan9, OS/2, Windows, and VMS? |
The actual plan is to create some configuration option to detect |
He tried to call P.S. Update: I would be strongly against any patch that knows hows to reach NT Kernel's root dir called |
@@ -4227,6 +4227,19 @@ S_open_script(pTHX_ const char *scriptname, bool dosearch, bool *suidscript) | |||
Safefree(PL_origfilename); | |||
PL_origfilename = (char *)scriptname; | |||
} | |||
else { | |||
char proc_fd_path[64]; | |||
snprintf(proc_fd_path, sizeof(proc_fd_path), "/proc/self/fd/%d", fdscript); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"/proc/self/fd/"
is 14 +1 chars
"/proc/self/fd/%d"
is 16+ 1 chars
"%d"
max legal expansion is 21 chars according to google but plz verify it, or find perl.h
's or handy.h
's official macro for max base10 expansion
"%d"
is signed, what does "/proc/self/fd/-13"
means on Linux?
The c stack buffer should really be 16+21+1 == 38
or with sizeof(size_t)
round up logic, (16+21+1)/8 == 4.75; 5*8 == 40
b/c why not?
The CC will round up C auto var char[38]
to the next 4 or 8 alignment mark no matter what you do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point fdscript
is a non-negative integer that fits in an int
, see the code above that sets it up.
@@ -4227,6 +4227,19 @@ S_open_script(pTHX_ const char *scriptname, bool dosearch, bool *suidscript) | |||
Safefree(PL_origfilename); | |||
PL_origfilename = (char *)scriptname; | |||
} | |||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this else {
block starting here really should be split off into a separate static, not-inline function, see my other comment below why this is needed
snprintf(proc_fd_path, sizeof(proc_fd_path), "/proc/self/fd/%d", fdscript); | ||
char target_path[MAXPATHLEN]; | ||
SSize_t len = readlink(proc_fd_path, target_path, sizeof(target_path) - 1); | ||
if (len != -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Linux and FooNix OSes, MAXPATHLEN is constant 1024
, 4096
, or 2**16 == 65536
.
Those 3 buffer lengths are an alligator [S_open_script
] taking a bite into a sub/hero/hoagie [C stk].
Since this huge C stk buffer is not used on all control flow paths through S_open_script
, but the CC/machine code, needs to stretch the C stack by 1024/4096/65536 bytes each time upon entry, and then use Intel's U32
offsets operands everywhere in the machine code of this function's body, and not Intel's more efficient U8
/I8
offset operand encoding choice. It is smarter to put this new else{}
branch into its own dedicated static no-line function, so the new else{}
branch, and new else{}
branch's 1024 or upto 65536 bytes long C stk buffer, don't at runtime, affect the existing control flow paths inside ``S_open_script` that don't know, don't care, and don't use, that 1024-65536 stack buffer.
I have no tech/coding problem with this huge size + very short life span C stack buffer. I don't want to see it become another bloated wasteful Newx()
/Safefree()
/SAVEFREEPV()
statement pair in perl code. I'd would prefer that not all 100% of permutations of all PP calls to PP do
/require
/use
, force that 1024-65K buffer to temporarily exist while sitting unused on the C stack.
On WinPerl because of a P5P bug, CPP macro MAXPATHLEN
is 1024 instead of MS's MAX_PATH == 256
or P5P's Win32 cargo cult of writing char bug [ MAX_PATH+1 ];
. WinNT Kernel's real path limit is forever frozen at I16_MAX
"characters of an undefined encoding", aka, a 32K long array of WCHARs. which becomes 65536 bytes or U16_MAX for the byte length of a 32K long array of WCHARs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm not fond of large buffers on the stack we don't have a general solution at this point beyond using an SV or Newx().
At this point the stack is close to the least consumed it will be, so I think this buffer is as reasonable as it could be.
Some sort of cheap C level path abstraction would be useful - a relatively small built-in buffer, but can keep a dynamically allocated buffer if that's too small. But for now I think this is the practical choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char buf128chr[128];
#define pl_alloca(_l) ((_l) < sizeof(buf128chr) ? NUM2PTR(char *,buf128chr) : (_l) < U16_MAX ? alloca(_l) : SvPVX(sv_2mortal(newSV(_l))))
is my typical recipe. I don't expect anyone to do this but me, b/c its overkill to tweak char stk buffers against -O2 disassembly, I will sometimes adjust 128 downwards or upwards to a formula like
char bufchr[256
- sub rsp, 50h
- 3*sizeof(void *)
];
char bufchr[256
- sub rsp, 20h
- 3*sizeof(void *)
]
For generic guesstimates, I'd say len 96 (sse 166=96, 167=112) covers all sane src code identifiers and consts and paths with some breathing room. 128 is the max I'd put generically, its rare to see a function with more than 8 c stk vars active (liveness) at any time on x64 on top of whatever CC keeps in Win64's 8 non-vol regs.
C99's variable length arrays feature and alloca()
are synonyms in ICC/Clang/GCC machine code for Win32/Win64 but that might be OS specific and shouldn't be coded again. On paper C99's VLA spec allows de-allocation on scope leave, so else { char buf[len < 256? len : 1]; }
, ptr to buf is gone/invalid/now aliased to a diff and wrong var, after the closing curly, but alloca()
ptrs lasts until function return per spec. IDK if any CC on any arch actually rewinds the stack pointer at the closing curly, or recycles pieces of c stack for multiple VLA buffers in same function but diff scopes. But I wouldn't ever write code assuming a ptr to a VLA buf can outlast its scope curlys, b/c any CC can add the optimization in the future randomly.
VLA doesn't exist in C++ so alloca() is the tool to use in C++ mode.
If someone wants to be standards pedantic about alloca/vla'es, alloca/vlas have to exist internally in the C compiler and in the ABI/machine code level or its not C anymore. Recursion is part of the spec. 1960s HW where recursion or function reentry was electronically impossible, since the pre-allocated C stack and the functions machine code are adjacent on the same magnetic tape is irrelevant
alloca() can always be synthesized per ISO C spec. I can always write a dispatcher function, that takes a length, a function ptr, and va_list
, then round to nearest 16/32/64, then uses a switch(){}
to pick 1 of 24 or 48 functions.
Each of 48 functions takes a function ptr and a va_list
as args, then declares a fixed size char buf[32_64_96_128_160];
, then calls the func ptr with ptr to buf, and the va_list
ptr. Pedantic compliance problem solved.
At this point the stack is close to the least consumed it will be, so I think this buffer is as reasonable as it could be.
I did some breakpoints, your right, S_open_script(pTHX_ const char *scriptname, bool dosearch, bool *suidscript)
is a 1x ever early on startup call and returns to caller basically instantly, its not part of BEGIN{}
/do
/require
/use
infrastructure, which can get 3-5 runloops/yylexers deep on the C stack. I thought it was in my 1st comment. but its not after grepping, setting BPs and watching it. So a 1KB fixed len buffer is totally fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not appear to be ready for merging, as it's still failing tests on Windows. Those need to be resolved, after which I would request review by @tonycoz
char proc_fd_path[64]; | ||
snprintf(proc_fd_path, sizeof(proc_fd_path), "/proc/self/fd/%d", fdscript); | ||
char target_path[MAXPATHLEN]; | ||
SSize_t len = readlink(proc_fd_path, target_path, sizeof(target_path) - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use PerlLIO_readlink()
instead of raw readlink()
.
The else block should be #if
conditional on HAS_READLINK
in addition to the /proc/self/...
option you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use
PerlLIO_readlink()
instead of rawreadlink()
.The else block should be
#if
conditional onHAS_READLINK
in addition to the/proc/self/...
option you suggested.
At work, I have to use these constants and throw in the original string constant with STRLENs()
, for calculating char stack buffers.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/itoa-itow?view=msvc-170#maximum-conversion-count-macros
Ive never seen any max expansion constants like the MS ones in the Perl headers for Perl_my_sprintf()
and friends, regarding maximum char expansion in hex and base 10 of U8/I8/I16/I32/I64/void ptr/UV/IV.
I've looked at these briefly
Line 2461 in 8711297
#define IV_DIG (BIT_DIGITS(IVSIZE * 8)) |
but I dont trust them. I won't myself use them.
macro UV_DIG has ZERO users in the repo. IV_DIG exactly ONE grep hit
https://github.com/Perl/perl5/blob/blead/sv.c#L12295
for proc_fd_path id rather see STRLENs("/proc/self/fd/%d") + 21 +1 +12_whatever_padding_34 than just a random number like 64 which always make me paranoid when I see it b/c I can't verify it visually against the actual sprintf() args a few lines below, and IDK if someone before me quickly edited the fmt str and never updated the stk buf byte length to match. If the expression includes STRLENs("/proc/self/fd/%d")+whatever, paranoia gone.
There are situations when
$PROGRAM_NAME
is something /dev/fd/numbersudo /usr/local/bin/<tool>
with checksum verification - Issue with $PROGRAM_NAME (sudo + checksum verification) #23351→ Resolves to the tool path
perl <( echo '#!perl -DA'; echo 'print "$0\n"');
→ Stay with /dev/fd/number