Skip to content

Possible race condition when unhooking #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
m417z opened this issue Apr 4, 2025 · 2 comments
Open

Possible race condition when unhooking #12

m417z opened this issue Apr 4, 2025 · 2 comments

Comments

@m417z
Copy link

m417z commented Apr 4, 2025

Consider two threads running in parallel as follows:

void (*pOutputDebugStringWOriginal)(PCWSTR lpOutputString) = OutputDebugStringW;

void OutputDebugStringWHook(PCWSTR lpOutputString) {
    pOutputDebugStringWOriginal(L"B\n");
}

void thread1() {
    while (1) {
        OutputDebugStringW(L"A\n");
    }
}

void thread2() {
    Sleep(1000);
    SlimDetoursInlineHook(
        TRUE,
        (PVOID*)&pOutputDebugStringWOriginal,
        OutputDebugStringWHook
    );
    Sleep(1000);
    SlimDetoursInlineHook(
        FALSE,
        (PVOID*)&pOutputDebugStringWOriginal,
        OutputDebugStringWHook
    );
}

If all goes right, the output should be:

A
...
A
B
...
B
A
...

And indeed that's what happens most of the time, but there's a problem here.

Consider what happens when these two parts run in parallel:

Thread 1

    pOutputDebugStringWOriginal(L"B\n");

Thread 2

    SlimDetoursInlineHook(
        FALSE,
        (PVOID*)&pOutputDebugStringWOriginal,
        OutputDebugStringWHook
    );

What can happen is:

  • Thread 1 reads the value of pOutputDebugStringWOriginal, but doesn't yet use it for the actual call.
  • Thread 2 freezes all threads, sets pOutputDebugStringWOriginal back to OutputDebugStringW and freezes the trampoline.
  • Thread 1 is resumed and calls the trampoline, which has already been freed.

I noticed this problem only on ARM64. I think that the reason is that while in x86/x64, the call can be a single instruction:

call [pOutputDebugStringWOriginal]

On ARM64, the call is done with multiple instructions which load the address first, something like:

adrp    x16, Page(pOutputDebugStringWOriginal)
add     x16, x16, PageOffset(pOutputDebugStringWOriginal)
br      x16

On the library consumer side, one solution is to use a lock (perhaps a read/write lock) when accessing the original function pointer and when setting or removing hooks. However, this is not optimal or convenient, and currently isn't documented as a requirement.

On the SlimDetours side, I'm not sure what's the best solution.

  • If we want to keep the library usage unchanged, we have to make sure that the pointer to call the hook isn't freed. The only way I can think of that's truly guaranteed to be correct is to just never free the trampoline, which is of course suboptimal.
  • If we want to consider changing the library, perhaps we can introduce a helper function to call the original function, such as:
    SlimDetoursCallOriginal(pOutputDebugStringWOriginal, L"A\n");
    Then, we can handle the case of a thread being suspended within this function. This approach is also not optimal, both due to the library change and the non-trivial implementation.

Let me know if you come up with a solution I didn't think of.

@RatinCN
Copy link
Member

RatinCN commented Apr 16, 2025

one solution is to use a lock (perhaps a read/write lock) when accessing the original function pointer and when setting or removing hooks

It's okay for me to add a function to access the original function pointer if it's optional. The caller could choose use it for more safety, or don't use it for more convenient and keep existing code. We could use lock, ref count, ... to make it safer in different scenarios if needed, what do you think?

@m417z
Copy link
Author

m417z commented Apr 26, 2025

Using a lock can be done without additional changes in SlimDetours, for example:

static FN_EqualRect* g_pfnEqualRect = NULL;
static SRWLOCK g_srwLock = SRWLOCK_INIT;

static
BOOL
WINAPI
Hooked_EqualRect(
    _In_ CONST RECT *lprc1,
    _In_ CONST RECT *lprc2)
{
    AcquireSRWLockShared(&g_srwLock);
    // ...
    BOOL ret = g_pfnEqualRect(lprc1, lprc2);
    // ...
    ReleaseSRWLockShared(&g_srwLock);
    return ret;
}

void TwiceSimpleHook()
{
    AcquireSRWLockExclusive(&g_srwLock);
    hr = SlimDetoursInlineHook(TRUE, (PVOID)&g_pfnEqualRect, Hooked_EqualRect);
    ReleaseSRWLockExclusive(&g_srwLock);
    // ...
    AcquireSRWLockExclusive(&g_srwLock);
    hr - SlimDetoursInlineHook(FALSE, (PVOID)&g_pfnEqualRect, Hooked_EqualRect);
    ReleaseSRWLockExclusive(&g_srwLock);
}

The solution I selected for my specific needs with specific constraints is a combination of #19 (pending review/merge) and m417z/thread-call-stack-scanner.

In any case, since there's no easy way to address this in the library without API changes, I think it'd be a good idea to document this limitation (and #15) in the TechWiki.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants