-
Notifications
You must be signed in to change notification settings - Fork 128
Add support for signing during sync #4075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7c862a6
to
9767c43
Compare
6ee4eeb
to
a16ced7
Compare
a16ced7
to
7837aa8
Compare
Hey, I've gone ahead and moved forward with new sync policies to solve (1), and it turns out that Pulp RPM synchronization already handles (2) by looking at the NEVRA, not the checksums. This should be ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not answering the question (1) and (2) before (you started working) and thanks for the contribution! We were in a rush for the last weeks.
I've commented on a few points already, feel free to ask for any clarification.
Also, I'd like to discuss your questions a bit more.
About (1):
I've heard (and agreed) a general preference of the team to define the "sign policy" in the repository level instead of a per-call basis. Would that work for you or is there a case where you would need per-call control over it?
If that works, there are some options. First, how do you think the "auto-sign" behavior (like we have in upload) would be a surprise? It's not something that can be configured by accident. Anyway, if we agree its a surprise, an alternative could be adding a package_signing_enable
or similar to the repository, wdyt?
About (2):
I'm not convinced it won't download and sign it again... You could write a test for this, asserting that no new packages are added to the repository on a second sync.
It's worth mentioning that we have a recently opened issue for this (on the upload case) here. Ideally we should share these kind of package signing handlers between upload and sync, but I don't really think we have a solution in place for this. I would be surprised if we do!
edit: IHO (2) is not a hard requirement for the PR.
That's no problem at all! Thanks so much for the detailed review!
This would be absolutely fine. We have no requirement for per-call control over this. I was trying to make minimal structural changes, and sync_policy seemed to be the simplest place to change it. Having said that, I do agree that it makes more sense to be a repository level setting.
My main concern is that this would be a behavior change. Previously, if someone synced a repo, even if it had the signing service setup, nothing got signed. Now it would be signed. Having said that, there is a good argument that if someone has setup a signing service on a repo, it should be signed, so maybe I'm overthinking this. If you think it's not a big deal, I think I'm OK with dropping my second commit.
It sure didn't seem to download and sign again when running the test suite. Packages that had been synced unsigned in a previous test were not replaced with the signed versions in the signing test, and I spent way too long fighting with it before I realized it's because the old unsigned packages were still in the database. That's why I've added As you suggest, it would probably be worth adding an explicit test for this.
Yeah, let me think on this a bit more. I'll also address the individual issues you brought up. |
So, after using this a bit in our Pulp instance, I've figured out what was happening. It turns out that it won't download and sign again if and only if the repodata hasn't changed at all. If even one package has been updated in the repo, all the of the packages will be re-signed. I think the correct solution is to store the fingerprint of any RPM signatures in the database, so we can then ensure that we don't re-sign if the content hash hasn't changed and the correct RPM signature is in the database. What do you think? |
c591d53
to
5185221
Compare
That makes sense, we do have an optimization to skip basically the whole sync process if the metadata checksum is unchanged from the previous sync.
I assume what you mean is something along the lines of "keep a mapping of the pre-signed package checksums to the new, signed packages"? |
I'm pitching in on @jdieter's behalf. We work together.
Right. Store the checksum of the synced package (pre-signing) so that if we see that package again during sync it gets skipped. That's one option. The other option is to look only at the NEVRA, and if it is already present, don't sign, don't import. It's kind of a feature that old packages don't get sly-updated without extra work; broken packages would have to be deleted from the repository before a new sync to update them. This is a workaround I can get behind, because we wouldn't want things to update unexpectedly anyway. |
|
99eb6ae
to
ef23b7d
Compare
This commit adds a new stage, RpmArtifactSigningStage, to synchronize.py that signs the rpms as they're being synchronized from an upstream repo. The new stage only does the signing if both a signing service and fingerprint have been set for the repo. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
This commit adds two new sync policies, `additive_sign` and `mirror_content_only_sign`, which are identical to the non `_sign` versions, but will sign packages during the sync process. It also adds a test to ensure that signing *doesn't* happen when not using a `*_sign` policy. And, because sync is smart enough to skip packages that have identical nevras, we need to ensure that all packages are garbage collected before running the tests, so we're adding the delete_orphans_pre fixture to both tests. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
As mentioned in the PR review, it's far simpler to just create a new artifact that handles all the hashing instead of manually creating the hashes we think we should need. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
We need to ensure that the size and pkgId are updated correctly, so pass the content instead of the artifact to signing function and update the content size and pkgId. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
As requested in the review, the logic on whether or not to do package signing during sync has been moved out of RpmArtifactSigningStage, greatly simplifying the stage. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
As discussed in the review, adding new sync policies is the wrong place for ensuring that signing happens only when necessary. For the moment, we are going to assume that if a user has set a signing service and fingerprint for a repository, then they want it re-signed on sync. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
The "mirror_complete" sync policy or the "on_demand" remote policy are both not supported when doing signing during syncing, so raise errors if either policy is set during the sync process. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
Currently, we start synchronizing packages even if the package is unchanged from a previous sync. This means that we end up re-signing packages each time we sync when signing during sync. This commit fixes that problem by re-using the already synced version of the package for packages that already exist in the latest version of the repository with the same NEVRA. Note that this means that a package that has been changed without changing the NEVRA will *not* be resynced anymore. This may be seen as a security feature where packages can't be silently changed. To sync a package with the same NEVRA, the original package must first be removed from the repository and then the repository must be synced again. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
Make it easier to detect when the repos aren't being populated as expected by checking whether the repo_version has changed between each step of the test. This ensures that the test fails fast when things start going off the rails rather than waiting until the end. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
ef23b7d
to
9a704d2
Compare
We need more details in the DeclarativeContent, so let's create it properly and only use it there's at least one artifact. Also, add a "already_in_repo" flag into `extra_data` to ensure that we don't try to re-sign packages that were already in the repository (and should already be signed). Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
9a704d2
to
d5528e6
Compare
Create a test that syncs a repository, then syncs a completely different repository, and then re-syncs the first. Artifacts will be created during the first sync that will be orphaned (since signing creates a new artifact), and then, on the third sync, packages will be deduped to these artifacts. This test verifies that the artifact downloading in _sign_rpm_content works with these deduped artifacts. Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
7af4e7b
to
2ff3f4e
Compare
Ok, this has been implemented (correctly, this time, I think). Tests are all passing, and I've added a couple more, though I also had to disable two that were expecting the package to change when the NEVRA is the same. I'm fine if we want to go down the route of adding a field for the original pkgId (and we'll probably also need to add a field for the signing key, so when looking through artifacts with the same originalPkgId, we can determine which signing key was used on each of them), but I wanted to make sure that the logic in here is sane and that what I've done isn't totally off-the-wall. |
I think this would be great to have. We're also interested in not re-signing packages again. However, we've encountered the situation where two packages had the same NEVRA but different contents unfortunately. So checking by package hash digest would work for us whereas NEVRA wouldn't. |
This commit adds a new stage, RpmArtifactSigningStage, to synchronize.py that signs the rpms as they're being synchronized from an upstream repo. The new stage only does the signing if both a signing service and fingerprint have been set for the repo.
This is fully functional, but there are some questions I have on how best to move forward before moving this out of draft.
I'm not sure if solving 2 is a hard requirement for this PR, but we really should decide on something for 1.