Skip to content

[ENG-8175] Fixed potential race condition #11179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: feature/pbs-25-10
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions website/archiver/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ def on_failure(self, exc, task_id, args, kwargs, einfo):
dst.save()

sentry.log_message(
'An error occured while archiving node',
f'An error occured while archiving node: {src._id} and registration: {dst._id}',
extra_data={
'source node guid': src._id,
'registration node guid': dst._id,
Expand Down Expand Up @@ -325,9 +325,9 @@ def archive(job_pk):
)


@celery_app.task(base=ArchiverTask, ignore_result=False)
@celery_app.task(bind=True, base=ArchiverTask, ignore_result=False, max_retries=1, default_retry_delay=60 * 5, acks_late=True)
@logged('archive_success')
def archive_success(dst_pk, job_pk):
def archive_success(self, dst_pk, job_pk):
"""Archiver's final callback. For the time being the use case for this task
is to rewrite references to files selected in a registration schema (the Prereg
Challenge being the first to expose this feature). The created references point
Expand All @@ -352,7 +352,17 @@ def archive_success(dst_pk, job_pk):

# Update file references in the Registration's responses to point to the archived
# file on the Registration instead of the "live" version on the backing project
utils.migrate_file_metadata(dst)
try:
utils.migrate_file_metadata(dst)
except ArchivedFileNotFound as err:
sentry.log_message(
f'Some files were not found while archiving the node {dst_pk}',
extra_data={
'missing_files': err.missing_files,
},
)
self.retry(exc=err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the retry? Are you expecting a situation where the files will be found on the retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it looks like this celery task is run faster than attaching files to a node, this is why they cannot be found. Why I think so because Mark mentioned that despite a user got email with "some files were altered" message, these files were actually included in the node registration, so maybe there was some retry and this celery task was re-run or Matt ran it manually. So if we don't find some files, it may be related to race condition and we make one more attempt in 5 minutes. If it doesn't work -> raise error
In order to avoid this race condition I think we can make this task synchronous, but in this case user'll need to wait a bit more time on registration page to not abort a request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I described this issue to Matt and it's a known bug:

Issue appears to be a known bug wherein those file_urls don't get updated after registration is created, leaving those links pointing to the same file on the node. Unsure about the exact cause, but usually just calling migrate_file_metadata works. Would you like me to run that against it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And migrate_file_metadata is a problem place that checks if files are found


job = ArchiveJob.load(job_pk)
if not job.sent:
job.sent = True
Expand Down
Loading