Skip to content

Conversation

zas
Copy link
Collaborator

@zas zas commented Nov 5, 2024

Summary

  • This is a…
    • Bug fix
    • Feature addition
    • Refactoring
    • Minor / simple change (like a typo)
    • Other
  • Describe this change in 1-2 sentences:

Problem

  • JIRA ticket (optional): PICARD-XXX

Solution

Action

Additional actions required:

  • Update Picard documentation (please include a reference to this PR)
  • Other (please specify below)

@zas zas requested a review from phw November 5, 2024 11:19

# read times again and compare with original
st = os.stat(self.file.filename)
max_tries = 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment here explaining why this is necessary. This is so weird that we need to document it somehow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to understand why it happens first.
Fixing the test is perhaps not enough...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@Sophist-UK
Copy link
Contributor

It would help if the background details of why this PR is necessary and evidence (prior discussions, how to recreate the problem) were given in this PR (since there is apparently no Jira ticket).

@zas
Copy link
Collaborator Author

zas commented Nov 5, 2024

It would help if the background details of why this PR is necessary and evidence (prior discussions, how to recreate the problem) were given in this PR (since there is apparently no Jira ticket).

Still under investigation, but I had the problem running tests on my system. For now, not sure why it happens. I'll add details when I'll have some... My env is using ZFS and Python 3.12.3.

Related discussion on https://matrix.to/#/#musicbrainz-picard-dev:chatbrainz.org

@Sophist-UK
Copy link
Contributor

Some theorising on the root cause as an unstructured braindump:

  1. I am assuming that the root cause is in Linux/ZFS rather than Python on the basis that os.stat probably just makes a system call.
  2. I am unclear whether this is happening when Linux / ZFS is accessed by an executable on the same system, or by SAMBA or NFS from another system.
  3. Some filesystems can set access-time updating off (to save write I/Os to the metadata every time you read a file). I have no idea whether having this on/off is a factor in the root cause or not. Have we checked whether atime=off is the cause of atime_ns being zero?
  4. I am not sure whether it is possible for python to determine whether access-time updating is off (it seems unlikely) but if you can determine this then it might be sensible to avoid attempting to update (i.e. reset) atime.
  5. There is more than one implementation of ZFS e.g. the original Sun ZFS and OpenZFS - we need specifics on both zfs version and O/S version.

@Sophist-UK
Copy link
Contributor

Sophist-UK commented Nov 5, 2024

OpenZFS? Version?

Linux distro and version?

Kernel version?

@Sophist-UK
Copy link
Contributor

Sophist-UK commented Nov 5, 2024

A few more ideas:

  1. Check st_atime and see if it is zero at the same time.
  2. Apparently there is also an os.path.getatime() method - again you could check and see if this gives a different result.
  3. Try varying the ZFS dataset atime property on / off and see if it makes a difference.
  4. Apparently in Linux you can open a file and explicitly say not to update the atime - but AFAIK you cannot do this in Python, but I may be wrong (in which case doing this might be better than recording and resetting the atime value.
  5. Asynchronous and synchronous writes are handled very differently in ZFS - and for local / NFS access this is effectively defined by a ZFS dataset property. So I wonder whether changing between asynchronous and synchronous writes makes a difference.

@zas
Copy link
Collaborator Author

zas commented Nov 5, 2024

OpenZFS? Version?

zfs-2.2.2-0ubuntu9.1
zfs-kmod-2.2.2-0ubuntu9

Linux distro and version?

Linux Mint 22

Kernel version?

6.8.0-48-generic #48-Ubuntu

@zas
Copy link
Collaborator Author

zas commented Nov 5, 2024

-rw-rw-r-- 1 zas zas 5 nov.   5 17:03 test
➜  ~ echo test > test
➜  ~ ls -lu test     
-rw-rw-r-- 1 zas zas 5 nov.   5 17:03 test
➜  ~ touch test
➜  ~ ls -lu test
-rw-rw-r-- 1 zas zas 5 janv.  1  1970 test
➜  ~ stat -c%x test
1970-01-01 01:00:00.000000000 +0100
➜  ~ cat test
test
➜  ~ ls -lu test   
-rw-rw-r-- 1 zas zas 5 nov.   5 17:04 test

Clearly an OpenZFS bug (likely related to truenas/zfs@f5f39b3 or similar)

@zas zas closed this Nov 5, 2024
@phw
Copy link
Member

phw commented Nov 5, 2024

Looks like this specific bug was solved with zfs 2.2.3, see https://github.com/openzfs/zfs/releases/tag/zfs-2.2.3 . So if this specific fix was not backported to Ubuntu/Mint it likely is present there.

@Sophist-UK
Copy link
Contributor

Not that it makes much difference to Picard or even Python, but the issue on @zas' Mint instance is probably due to an incompatibility between openZFS 2.2.2 and a much more modern kernel 6.8 - the commit which introduced the bug was in late December 2023 to address a change in the 6.7 kernel that created issues with ctime/atime, and openZFS 2.2.2 was released before that at the beginning of Dec 2023.

Obviously Github action runners will have different versions again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants