Skip to content

Conversation

jdieter
Copy link
Contributor

@jdieter jdieter commented Aug 14, 2025

This commit adds a new stage, RpmArtifactSigningStage, to synchronize.py that signs the rpms as they're being synchronized from an upstream repo. The new stage only does the signing if both a signing service and fingerprint have been set for the repo.

This is fully functional, but there are some questions I have on how best to move forward before moving this out of draft.

  1. At the moment, signing will happen if the repo has a signing service and fingerprint, but this would be a behavior change with the potential of surprising users. Should this instead be activated as a new sync policy or two ("additive_sign", "mirror_content_only_sign")? Or should it be activated some other way?
  2. How do we ensure that we're not resyncing packages each time a sync is run? The checksums will obviously be different after redoing the signing, so should we add a field to the database that stores the original checksum?

I'm not sure if solving 2 is a hard requirement for this PR, but we really should decide on something for 1.

@github-actions github-actions bot added no-changelog no-issue multi-commit Added when a PR consists of more than one commit labels Aug 14, 2025
@jdieter jdieter force-pushed the allow-signing-on-sync branch from 7c862a6 to 9767c43 Compare August 18, 2025 13:22
@github-actions github-actions bot added multi-commit Added when a PR consists of more than one commit and removed multi-commit Added when a PR consists of more than one commit labels Aug 18, 2025
@jdieter jdieter force-pushed the allow-signing-on-sync branch from 6ee4eeb to a16ced7 Compare August 20, 2025 10:42
@jdieter jdieter changed the title [Draft] Initial support for signing during sync Add support for signing during sync Aug 20, 2025
@jdieter jdieter force-pushed the allow-signing-on-sync branch from a16ced7 to 7837aa8 Compare August 20, 2025 11:37
@jdieter
Copy link
Contributor Author

jdieter commented Aug 20, 2025

Hey, I've gone ahead and moved forward with new sync policies to solve (1), and it turns out that Pulp RPM synchronization already handles (2) by looking at the NEVRA, not the checksums.

This should be ready for review now.

@jdieter jdieter marked this pull request as ready for review August 20, 2025 13:07
Copy link
Member

@pedro-psb pedro-psb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not answering the question (1) and (2) before (you started working) and thanks for the contribution! We were in a rush for the last weeks.

I've commented on a few points already, feel free to ask for any clarification.
Also, I'd like to discuss your questions a bit more.

About (1):
I've heard (and agreed) a general preference of the team to define the "sign policy" in the repository level instead of a per-call basis. Would that work for you or is there a case where you would need per-call control over it?

If that works, there are some options. First, how do you think the "auto-sign" behavior (like we have in upload) would be a surprise? It's not something that can be configured by accident. Anyway, if we agree its a surprise, an alternative could be adding a package_signing_enable or similar to the repository, wdyt?

About (2):
I'm not convinced it won't download and sign it again... You could write a test for this, asserting that no new packages are added to the repository on a second sync.

It's worth mentioning that we have a recently opened issue for this (on the upload case) here. Ideally we should share these kind of package signing handlers between upload and sync, but I don't really think we have a solution in place for this. I would be surprised if we do!

edit: IHO (2) is not a hard requirement for the PR.

@jdieter
Copy link
Contributor Author

jdieter commented Aug 22, 2025

Sorry for not answering the question (1) and (2) before (you started working) and thanks for the contribution! We were in a rush for the last weeks.

That's no problem at all! Thanks so much for the detailed review!

I've commented on a few points already, feel free to ask for any clarification. Also, I'd like to discuss your questions a bit more.

About (1): I've heard (and agreed) a general preference of the team to define the "sign policy" in the repository level instead of a per-call basis. Would that work for you or is there a case where you would need per-call control over it?

This would be absolutely fine. We have no requirement for per-call control over this. I was trying to make minimal structural changes, and sync_policy seemed to be the simplest place to change it. Having said that, I do agree that it makes more sense to be a repository level setting.

If that works, there are some options. First, how do you think the "auto-sign" behavior (like we have in upload) would be a surprise? It's not something that can be configured by accident. Anyway, if we agree its a surprise, an alternative could be adding a package_signing_enable or similar to the repository, wdyt?

My main concern is that this would be a behavior change. Previously, if someone synced a repo, even if it had the signing service setup, nothing got signed. Now it would be signed. Having said that, there is a good argument that if someone has setup a signing service on a repo, it should be signed, so maybe I'm overthinking this. If you think it's not a big deal, I think I'm OK with dropping my second commit.

About (2): I'm not convinced it won't download and sign it again... You could write a test for this, asserting that no new packages are added to the repository on a second sync.

It sure didn't seem to download and sign again when running the test suite. Packages that had been synced unsigned in a previous test were not replaced with the signed versions in the signing test, and I spent way too long fighting with it before I realized it's because the old unsigned packages were still in the database. That's why I've added delete_orphans_pre to the tests I created.

As you suggest, it would probably be worth adding an explicit test for this.

It's worth mentioning that we have a recently opened issue for this (on the upload case) here. Ideally we should share these kind of package signing handlers between upload and sync, but I don't really think we have a solution in place for this. I would be surprised if we do!

edit: IHO (2) is not a hard requirement for the PR.

Yeah, let me think on this a bit more.

I'll also address the individual issues you brought up.

@jdieter
Copy link
Contributor Author

jdieter commented Sep 8, 2025

About (2): I'm not convinced it won't download and sign it again... You could write a test for this, asserting that no new packages are added to the repository on a second sync.

It sure didn't seem to download and sign again when running the test suite. Packages that had been synced unsigned in a previous test were not replaced with the signed versions in the signing test, and I spent way too long fighting with it before I realized it's because the old unsigned packages were still in the database. That's why I've added delete_orphans_pre to the tests I created.

So, after using this a bit in our Pulp instance, I've figured out what was happening. It turns out that it won't download and sign again if and only if the repodata hasn't changed at all. If even one package has been updated in the repo, all the of the packages will be re-signed.

I think the correct solution is to store the fingerprint of any RPM signatures in the database, so we can then ensure that we don't re-sign if the content hash hasn't changed and the correct RPM signature is in the database. What do you think?

@jdieter jdieter force-pushed the allow-signing-on-sync branch from c591d53 to 5185221 Compare September 8, 2025 11:48
@dralley
Copy link
Contributor

dralley commented Sep 10, 2025

So, after using this a bit in our Pulp instance, I've figured out what was happening. It turns out that it won't download and sign again if and only if the repodata hasn't changed at all. If even one package has been updated in the repo, all the of the packages will be re-signed.

That makes sense, we do have an optimization to skip basically the whole sync process if the metadata checksum is unchanged from the previous sync.

I think the correct solution is to store the fingerprint of any RPM signatures in the database, so we can then ensure that we don't re-sign if the content hash hasn't changed and the correct RPM signature is in the database. What do you think?

I assume what you mean is something along the lines of "keep a mapping of the pre-signed package checksums to the new, signed packages"?

@josephtate
Copy link

I'm pitching in on @jdieter's behalf. We work together.

I think the correct solution is to store the fingerprint of any RPM signatures in the database, so we can then ensure that we don't re-sign if the content hash hasn't changed and the correct RPM signature is in the database. What do you think?

I assume what you mean is something along the lines of "keep a mapping of the pre-signed package checksums to the new, signed packages"?

Right. Store the checksum of the synced package (pre-signing) so that if we see that package again during sync it gets skipped. That's one option.

The other option is to look only at the NEVRA, and if it is already present, don't sign, don't import. It's kind of a feature that old packages don't get sly-updated without extra work; broken packages would have to be deleted from the repository before a new sync to update them. This is a workaround I can get behind, because we wouldn't want things to update unexpectedly anyway.

@jdieter
Copy link
Contributor Author

jdieter commented Sep 12, 2025

The other option is to look only at the NEVRA, and if it is already present, don't sign, don't import. It's kind of a feature that old packages don't get sly-updated without extra work; broken packages would have to be deleted from the repository before a new sync to update them. This is a workaround I can get behind, because we wouldn't want things to update unexpectedly anyway.

As part of this PR, I've implemented this option. If you want me to do the full "keep a mapping of the pre-signed package checksums to the new, signed packages", I can do it, but it will be more invasive. UPDATED: I'm using skip package incorrectly. This is not ready for review yet.

@jdieter jdieter force-pushed the allow-signing-on-sync branch 4 times, most recently from 99eb6ae to ef23b7d Compare September 15, 2025 13:38
@dralley dralley marked this pull request as draft September 17, 2025 22:33
This commit adds a new stage, RpmArtifactSigningStage, to synchronize.py
that signs the rpms as they're being synchronized from an upstream repo.
The new stage only does the signing if both a signing service and
fingerprint have been set for the repo.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
This commit adds two new sync policies, `additive_sign` and
`mirror_content_only_sign`, which are identical to the non `_sign`
versions, but will sign packages during the sync process.

It also adds a test to ensure that signing *doesn't* happen when not using
a `*_sign` policy.

And, because sync is smart enough to skip packages that have identical
nevras, we need to ensure that all packages are garbage collected before
running the tests, so we're adding the delete_orphans_pre fixture to both
tests.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
As mentioned in the PR review, it's far simpler to just create a new
artifact that handles all the hashing instead of manually creating the
hashes we think we should need.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
We need to ensure that the size and pkgId are updated correctly, so pass
the content instead of the artifact to signing function and update the
content size and pkgId.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
As requested in the review, the logic on whether or not to do package
signing during sync has been moved out of RpmArtifactSigningStage, greatly
simplifying the stage.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
As discussed in the review, adding new sync policies is the wrong place for
ensuring that signing happens only when necessary.  For the moment, we are
going to assume that if a user has set a signing service and fingerprint
for a repository, then they want it re-signed on sync.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
The "mirror_complete" sync policy or the "on_demand" remote policy are both
not supported when doing signing during syncing, so raise errors if either
policy is set during the sync process.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
Currently, we start synchronizing packages even if the package is unchanged
from a previous sync.  This means that we end up re-signing packages each
time we sync when signing during sync.

This commit fixes that problem by re-using the already synced version of
the package for packages that already exist in the latest version of the
repository with the same NEVRA.  Note that this means that a package that
has been changed without changing the NEVRA will *not* be resynced anymore.
This may be seen as a security feature where packages can't be silently
changed.  To sync a package with the same NEVRA, the original package must
first be removed from the repository and then the repository must be synced
again.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
Make it easier to detect when the repos aren't being populated as expected
by checking whether the repo_version has changed between each step of the
test.

This ensures that the test fails fast when things start going off the rails
rather than waiting until the end.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
@jdieter jdieter force-pushed the allow-signing-on-sync branch from ef23b7d to 9a704d2 Compare September 26, 2025 14:41
We need more details in the DeclarativeContent, so let's create it properly
and only use it there's at least one artifact.

Also, add a "already_in_repo" flag into `extra_data` to ensure that we
don't try to re-sign packages that were already in the repository (and
should already be signed).

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
@jdieter jdieter force-pushed the allow-signing-on-sync branch from 9a704d2 to d5528e6 Compare September 26, 2025 14:45
Create a test that syncs a repository, then syncs a completely different
repository, and then re-syncs the first.  Artifacts will be created during
the first sync that will be orphaned (since signing creates a new
artifact), and then, on the third sync, packages will be deduped to these
artifacts.  This test verifies that the artifact downloading in
_sign_rpm_content works with these deduped artifacts.

Signed-off-by: Jonathan Dieter <jdieter@ciq.com>
@jdieter jdieter force-pushed the allow-signing-on-sync branch from 7af4e7b to 2ff3f4e Compare September 26, 2025 15:28
@jdieter
Copy link
Contributor Author

jdieter commented Sep 26, 2025

The other option is to look only at the NEVRA, and if it is already present, don't sign, don't import. It's kind of a feature that old packages don't get sly-updated without extra work; broken packages would have to be deleted from the repository before a new sync to update them. This is a workaround I can get behind, because we wouldn't want things to update unexpectedly anyway.

Ok, this has been implemented (correctly, this time, I think). Tests are all passing, and I've added a couple more, though I also had to disable two that were expecting the package to change when the NEVRA is the same.

I'm fine if we want to go down the route of adding a field for the original pkgId (and we'll probably also need to add a field for the signing key, so when looking through artifacts with the same originalPkgId, we can determine which signing key was used on each of them), but I wanted to make sure that the logic in here is sane and that what I've done isn't totally off-the-wall.

@daviddavis
Copy link
Contributor

I'm fine if we want to go down the route of adding a field for the original pkgId (and we'll probably also need to add a field for the signing key, so when looking through artifacts with the same originalPkgId, we can determine which signing key was used on each of them), but I wanted to make sure that the logic in here is sane and that what I've done isn't totally off-the-wall.

I think this would be great to have. We're also interested in not re-signing packages again. However, we've encountered the situation where two packages had the same NEVRA but different contents unfortunately. So checking by package hash digest would work for us whereas NEVRA wouldn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

multi-commit Added when a PR consists of more than one commit no-changelog no-issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants