[ENG-5862] SPAM - Fix Wiki Spamming #11171

antkryt · 2025-06-05T15:24:08Z

Purpose

Verify that spammy domains are detected.

Changes

Merge the check_resource_for_domains_postcommit and check_resource_with_spam_services tasks to avoid a race condition.
Compare the note to the enum's value, not to the enum member.
Log detected domains to Sentry.
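
To illustrate why the second change can matter, here is a minimal sketch. PlainNote and IntNote are hypothetical stand-ins, not the real NotableDomain.Note definition, and the stored value 0 is an assumption. If the enum is a plain Enum, a raw integer never equals the member, only its .value; with an IntEnum both comparisons succeed.

```python
from enum import Enum, IntEnum


class PlainNote(Enum):
    # Hypothetical stand-in; not the real NotableDomain.Note definition.
    EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT = 0


class IntNote(IntEnum):
    # Same member, but IntEnum members behave like ints in comparisons.
    EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT = 0


stored_note = 0  # assumed raw value as an integer column would return it

# Plain Enum: the raw value is not equal to the member itself.
plain_member_match = stored_note == PlainNote.EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT
# Comparing against .value works for either kind of enum.
plain_value_match = stored_note == PlainNote.EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT.value
# IntEnum: the raw value equals the member directly.
int_member_match = stored_note == IntNote.EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT
```

Comparing against .value is the form that holds regardless of how the enum is declared.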

QA Notes

You can test it with the domain xakw1.com on staging3. Currently, a project containing this domain is not banned, regardless of whether it is public.

Documentation

Side Effects

Ticket

https://openscience.atlassian.net/browse/ENG-5862

antkryt · 2025-06-05T15:37:10Z

The issue described in the ticket is not entirely accurate. I found the domain "xawk1.com" on staging, and content containing this domain is not banned even if the project is public (here is an example). On the other hand, if you repeat the steps from the ticket with other domains, everything works as expected.

However, the spam check is always triggered, because the DomainReference is always created; you can verify this in Django admin (the only place it is created is the function _check_resource_for_domains). Therefore, only two possibilities remain:

the condition if notable_domain.note == NotableDomain.Note.EXCLUDE_FROM_ACCOUNT_CREATION_AND_CONTENT is never satisfied
or
resource.confirm_spam(save=True, domains=list(spammy_domains)) silently fails (the changes are not saved to the database, or a race condition lets another process overwrite them).

Most likely, the changes made by check_resource_for_domains_postcommit are overwritten by check_resource_with_spam_services. Both start at almost the same time and load the same (non-spam) version of the resource. Whichever task finishes last has its changes saved to the database (typically check_resource_with_spam_services finishes last, because it makes a request to an external service).
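
The suspected lost update can be reproduced in a small deterministic simulation. This is only a sketch: the DB dict, the load/save helpers, and the field names are hypothetical, and it assumes each task writes back the whole row it loaded.

```python
import copy

# Hypothetical in-memory "database" row; field names are illustrative only.
DB = {"node1": {"spam_status": "UNKNOWN", "spam_data": {}}}


def load(node_id):
    # Each task gets its own snapshot, like a separate ORM instance.
    return copy.deepcopy(DB[node_id])


def save(node_id, row):
    # A full-row save writes every field, including ones this task never touched.
    DB[node_id] = copy.deepcopy(row)


# Both tasks start at nearly the same time and load the same non-spam version.
domains_snapshot = load("node1")
services_snapshot = load("node1")

# The domain check finishes first and marks the node as spam.
domains_snapshot["spam_status"] = "SPAM"
domains_snapshot["spam_data"] = {"domains": ["spam.example"]}
save("node1", domains_snapshot)
status_after_domain_check = DB["node1"]["spam_status"]

# The external-service check finishes later and saves its stale copy.
save("node1", services_snapshot)
status_after_service_check = DB["node1"]["spam_status"]
```

After the second save the row is back to UNKNOWN, which matches the symptom: the domain is detected, yet the project does not end up banned.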

Johnetordoff

@antkryt I'm a little confused about exactly what this accomplishes. Could you write some test cases to illustrate how you are changing the current behavior? I see that the tests are changed, but I don't see new test cases. A good test case here should show that the spam content or spam domain of a wiki is checked when a project is made public.

antkryt · 2025-06-10T12:30:53Z

@Johnetordoff I'm not sure it's possible to write a test case that illustrates what I'm changing here, so I'll try to explain it better.

Both check_resource_for_domains_postcommit and check_resource_with_spam_services do the same thing: start after the response is sent to the client (run_postcommit decorator) -> load the node with spam_status=UNKNOWN -> process -> save changes to the db.

As you can see, the tasks know nothing about each other or about the changes the other one makes (they are two separate parallel processes). So if check_resource_with_spam_services finishes after check_resource_for_domains_postcommit, the changes made by check_resource_for_domains_postcommit are overwritten (and vice versa). Locally everything works because SPAM_SERVICES_ENABLED is false, so only check_resource_for_domains_postcommit runs.

Also, I've changed the signature from def _check_resource_for_domains(guid, content) to def _check_resource_for_domains(resource, content) and removed the confirm_spam() call from it, which is why I've updated some tests to make them work as before.
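
A rough sketch of the merged flow, under stated assumptions: FakeResource, SPAMMY_DOMAINS, the substring matching, and the function bodies are all hypothetical and are not the code in this PR. The idea it shows is that the domain helper only reports what it found, and a single task works on one loaded object, so there is no second writer.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResource:
    # Hypothetical stand-in for a node; not the OSF model.
    spam_status: str = "UNKNOWN"
    spam_domains: list = field(default_factory=list)
    save_count: int = 0

    def confirm_spam(self, domains):
        self.spam_status = "SPAM"
        self.spam_domains = list(domains)
        self.save_count += 1  # counts writes to the pretend database


SPAMMY_DOMAINS = {"spam.example"}  # illustrative blocklist


def check_resource_for_domains(resource, content):
    # Only reports findings (naive substring match); the caller decides what to do.
    return {d for d in SPAMMY_DOMAINS if d in content}


def check_with_spam_services(resource, content):
    # Placeholder for the external request; always "not spam" in this sketch.
    return False


def check_resource_for_spam(resource, content):
    # One task, one loaded object, so no parallel task can clobber the result.
    spammy = check_resource_for_domains(resource, content)
    if spammy:
        resource.confirm_spam(domains=sorted(spammy))
        return True
    # Assumption: the service check only runs when no spammy domain was found.
    return check_with_spam_services(resource, content)


flagged = FakeResource()
flagged_result = check_resource_for_spam(flagged, "buy now at spam.example")

clean = FakeResource()
clean_result = check_resource_for_spam(clean, "regular research notes")
```

Whether the real task skips the external service after a domain hit is a design choice this sketch assumes rather than documents.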

As for tests that check spam during a privacy change, we have a bunch of those (see osf_tests/test_node.py::TestNodeSpam). I'll add a few tests to check that the new check_resource_for_spam_postcommit function works properly.

Johnetordoff

Still not quite the behavior we are looking for, but correct techniques.

Johnetordoff · 2025-06-10T14:21:27Z

osf_tests/test_node.py

+
+    @mock.patch.object(settings, 'SPAM_SERVICES_ENABLED', True)
+    @mock.patch('osf.external.spam.tasks._check_resource_for_domains')
+    def test_check_resource_for_spam_postcommit_with_spammy_domains(self, mock_check_domains, project, user):


The ticket instructions to reproduce the bug are:

Create project

Update wiki with spammy domain

Make project public

These are the steps the tests should follow. You are testing behavior that is too specific: we know that checking for spam domains works when a project is saved while public, but we need to check a project's wiki content too, not just its description etc.
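
The shape of such a test, following the three ticket steps, could look like the sketch below. FakeProject and its methods are hypothetical stand-ins so the example runs on its own; a real test would presumably use the OSF factories and fixtures seen in osf_tests/test_node.py.

```python
class FakeProject:
    # Hypothetical stand-in; not the OSF node model.
    def __init__(self, description=""):
        self.description = description
        self.wiki_pages = {}
        self.is_public = False
        self.is_spam = False

    def update_wiki(self, name, content):
        self.wiki_pages[name] = content

    def _spam_check_content(self):
        # Wiki content is checked alongside the description.
        return " ".join([self.description, *self.wiki_pages.values()])

    def set_privacy(self, public, spammy_domains):
        self.is_public = public
        if public:
            content = self._spam_check_content()
            self.is_spam = any(domain in content for domain in spammy_domains)


def test_wiki_spam_detected_when_project_made_public():
    # 1. Create project
    project = FakeProject(description="A harmless description")
    # 2. Update wiki with spammy domain
    project.update_wiki("home", "visit http://spam.example today")
    # 3. Make project public
    project.set_privacy(True, spammy_domains={"spam.example"})
    assert project.is_spam


test_wiki_spam_detected_when_project_made_public()
```

The point of the structure is that the assertion depends on the wiki text alone: the description is clean, so the test only passes if wiki content is part of what gets checked.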

Johnetordoff

Good new tests, thanks!

verify if spammy domains are detected

b5c2939

antkryt added 2 commits June 6, 2025 15:47

fix racing condition between check_resource_with_spam_services and ch…

17a04e5

…eck_resource_for_domains_postcommit

fix tests

b02aab5

Johnetordoff requested changes Jun 9, 2025

add more tests

0c51319

Johnetordoff requested changes Jun 10, 2025

check wiki content included

71519d1

Johnetordoff approved these changes Jun 10, 2025

brianjgeiger approved these changes Jun 11, 2025
