Skip to content

Conversation

@folbricht-stripe
Copy link
Contributor

@folbricht-stripe folbricht-stripe commented Jun 1, 2024

What was the end-user or developer problem that led to this PR?

The use of IMDSv1 is discouraged and AWS recommends using IMDSv2 to obtain instance credentials as outlined here and here. On instances without IMDSv1 support, rubygems can not be used to install gems from S3 using the instance profile.

Implements #5763

What is your fix for the problem, implemented in this PR?

Swiches the S3 URI signer to using IMDSv2.

Make sure the following tasks are checked

@welcome
Copy link

welcome bot commented Jun 1, 2024

Thanks for opening a pull request and helping make RubyGems and Bundler better! Someone from the RubyGems team will take a look at your pull request shortly and leave any feedback. Please make sure that your pull request has tests for any changes or added functionality.

We use GitHub Actions to test and make sure your change works functionally and uses acceptable conventions, you can review the current progress of GitHub Actions in the PR status window below.

If you have any questions or concerns that you wish to ask, feel free to leave a comment in this PR or join our #rubygems or #bundler channel on Slack.

For more information about contributing to the RubyGems project feel free to review our CONTRIBUTING guide

@folbricht-stripe folbricht-stripe marked this pull request as ready for review June 4, 2024 10:54
Copy link
Contributor

@deivid-rodriguez deivid-rodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to add some tests, but looking at git history, s3 fetcher is mainly maintained by Stripe people, and we don't really get many issues with it. So I'm happy to just merge this if you don't have time for tests.

@deivid-rodriguez
Copy link
Contributor

@geanboegman since you originally requested this, does this look good to you? Would you like to try it?

@geanboegman
Copy link

Looking at the code this is looks good yes, what I considered:

  • It uses a PUT (not GET) to get token as it should
  • timeout for token a bit short at 60s, but since it's discarded and not re-used this is good
  • One consideration that you need to consider for the use-case. The code does not have any fall-back to V1 when the V2 token retrieval fails. This is a instance configuration and use case detail. Although V2 is always enabled for all instances, the http-put-response-hop-limit configuration is typically set to 1 by default. This needs to be increased for containerized code (else the extra tcp hop will cause the token to never be returned). Some open source projects implement a V1 fallback on a timeout retrieving the token in order to have backwards compatibility, these fallbacks do come at a cost (logic and unexpected and hard to debug V1 usage for customers) - If you do want to implement fall-backs, the AWS SDK's can be used as an example (https://github.com/aws/aws-sdk-go/blob/main/aws/ec2metadata/service.go), maybe there are Ruby examples - I do not have deep Ruby knowledge.
  • Failure to get a token does lead to an error - this can happen, so need to be presented to the user.

I did not test the change, feedback only based on code review. Thanks for this!

@folbricht-stripe
Copy link
Contributor Author

@deivid-rodriguez I haven't been able to run the tests in my environment so would rather not have to spend the time setting all of that up. Any thoughts on how to handle http-put-response-hop-limit? Setting a fixed value like 2 doesn't seem great, but falling back to IMDS v1 is probably worse since it downgrades security. I don't know if there's an easy way to make this configurable.

@deivid-rodriguez
Copy link
Contributor

I haven't been able to run the tests in my environment

Could you share the problems you run into? Even if you don't end up contributing tests this time, I'd like to make this easier for other future contributions.

Any thoughts on how to handle http-put-response-hop-limit? Setting a fixed value like 2 doesn't seem great, but falling back to IMDS v1 is probably worse since it downgrades security. I don't know if there's an easy way to make this configurable.

Is it possible to increase it when getting a timeout, and then try with v2 again?

@geanboegman
Copy link

geanboegman commented Jul 3, 2024

Could you share the problems you run into? Even if you don't end up contributing tests this time, I'd like to make this easier for other future contributions.

Hi @deivid-rodriguez , that makes a lot of sense and it's great that you maintain a high bar. Unfortunately I did not run into this issues myself as we do not use Ruby Gems but I did notice that there code was still using V1 while doing a code audit as part of an environment assessment, so decided to log a ticket as it is little effort and would benefit the the open community overall.

Any thoughts on how to handle http-put-response-hop-limit

and

Is it possible to increase it when getting a timeout, and then try with v2 again?

My take here is that even when setting the hop limit to two, you will still benefit from an improved posture when using V2 and disabling V1, so when in doubt set the hop-limit to 2. Since these settings are instance level it would not be 1) trivial to change the hop limit dynamically as it would require permissions and AWS service calls (SDK or CLI), and 2) you should not try to change the hop limit from the Ruby code, it's a instances or account setting, it need to be managed at the account ownership level, so you are are the mercy of the instance configuration in this case (this is the reason some SDKs implement fallback logic when there is a timeout on the PUT token request - the response when running on a container and the hop limit is set to 1 - i.e. the router drops the token request after one hop). I think you have two options, either document that users need to up the hop-limit, or implement fallback logic to use V1 - but be sure to log some message to inform the user that they are using V1 not V2.

@deivid-rodriguez
Copy link
Contributor

Hi @deivid-rodriguez , that makes a lot of sense and it's great that you maintain a high bar. Unfortunately I did not run into this issues myself as we do not use Ruby Gems but I did notice that there code was still using V1 while doing a code audit as part of an environment assessment, so decided to log a ticket as it is little effort and would benefit the the open community overall.

Oh, no worries, your feedback is great! I actually meant this for @folbricht-stripe since he said "I haven't been able to run tests on my environment".

My take here is that even when setting the hop limit to two, you will still benefit from an improved posture when using V2 and disabling V1, so when in doubt set the hop-limit to 2. Since these settings are instance level it would not be 1) trivial to change the hop limit dynamically as it would require permissions and AWS service calls (SDK or CLI), and 2) you should not try to change the hop limit from the Ruby code, it's a instances or account setting, it need to be managed at the account ownership level, so you are are the mercy of the instance configuration in this case (this is the reason some SDKs implement fallback logic when there is a timeout on the PUT token request - the response when running on a container and the hop limit is set to 1 - i.e. the router drops the token request after one hop). I think you have two options, either document that users need to up the hop-limit, or implement fallback logic to use V1 - but be sure to log some message to inform the user that they are using V1 not V2.

Yeah, makes sense to not mess with trying to change this dynamically. I think ideally we implement a fallback with a warning for backwards compatibility.

@simi
Copy link
Contributor

simi commented Jul 31, 2024

Code looks OK. IMDSv1 seems soft-deprecated for years already. Any idea what IMDSv2 endpoint returns when not enabled? Would it be possible to just provide some friendly error explaining RubyGems requires IMDSv2 enabled?

Btw. @folbricht-stripe any plans to share your experience on using RubyGems with S3 in Stripe? Blog post would be amazing.

@folbricht-stripe
Copy link
Contributor Author

I'm not really involved in how RubyGems is used here. I just looked at it because it was the last thing using IMDSv1 on a few hosts, requiring me to enable v1 again. I'm also not normally working with Ruby, which probably explains why I fail to run any tests. I get errors even just trying to install the dependencies with bin/rake setup:

SyntaxError: <path>/rubygems/bundler/lib/bundler/compact_index_client/cache.rb:46: syntax error, unexpected '=', expecting ';' or '\n'
      def names_path = directory.join("names")
                     ^
<path>/rubygems/bundler/lib/bundler/compact_index_client/cache.rb:47: syntax error, unexpected '=', expecting ';' or '\n'
      def names_etag_path = directory.join("names.etag")
                          ^
<path>/rubygems/bundler/lib/bundler/compact_index_client/cache.rb:48: syntax error, unexpected '=', expecting ';' or '\n'
      def versions_path = directory.join("versions")

Probably some issues with the version I have here.

According to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html it doesn't seem possible to disable v2 only. IMDS is either off for all versions, or v2 is turned on. There's no way to turn off v2 while leaving v1 active. But given that it's possible v2 fails due to a misconfigured hop-limit, perhaps adding a fallback to v1 would be best. Though I'm not really a fan of automated fallback to less secure implementations.

Do you want me to add v1 fallback? The changes should be quite straightforward, but testing is not (last time I had to hand-patch a deployed version).

@geanboegman
Copy link

Though I'm not really a fan of automated fallback to less secure implementations.

I have to agree, although in some cases having a fallback as an interim measure can avoid impacting customers unintentionally (when they forget to up the hop in containers). If you decide to add a fallback, I would recommend that: 1) you clearly log when this happens, and 2) you allow customers to disable the fallback. The latter is needed as it can be really hard to bottom out on where calls are coming from, and it gives the customer control over his posture.

@deivid-rodriguez
Copy link
Contributor

Probably some issues with the version I have here.

Indeed. You're using Ruby older than Ruby 3.0 but RubyGems requires at least 3.0. The error you got is super unhelpful thought, I'm introducing a better error at #7942.

@simi Thoughts on adding a fallback + clear warning? I think that's probably the safest approach?

@deivid-rodriguez
Copy link
Contributor

#7942 closed this accidentally, reopening.

@pjsk-stripe
Copy link
Contributor

Howdy folks, I'm picking up this work (coordinating with @folbricht-stripe). Going to push my changes to this branch to keep the conversation.

@pjsk-stripe pjsk-stripe force-pushed the imdsv2 branch 2 times, most recently from 920bdd4 to 1c4090b Compare May 29, 2025 00:10
@pjsk-stripe
Copy link
Contributor

pjsk-stripe commented May 29, 2025

Sorry for dumping so many changes at once. I had to do some surgery

  • Make things testable. The original s3 signing tests completely stubbed out the logic. I put in effort to track what exact aws network requests are being made so tests can be written against them.
  • Implement fallback logic. Things first try V2 but fallback to V1 on failure.
  • Make a decision on hops (per discussion earlier with @folbricht-stripe and @geanboegman)
  • Add some indicator that fallback has happened. Is using Bundler.ui.warn appropriate in cases like this? I wasn't sure at what state using bundler ui elements was ok.

@deivid-rodriguez
Copy link
Contributor

Thank you! Regarding bullet point number 4, I think alert_warning is the better helper for RubyGems.

@pjsk-stripe
Copy link
Contributor

Ok! I have tested everything locally on our infra and it successfully pulls gems from S3 using imdsv2 🎉

I believe this PR is in a state for review if folks could take a look 🙏

cc @folbricht-stripe

@deivid-rodriguez
Copy link
Contributor

Thanks for your work, I'll try find time to review it. Can you address CI issues in the mean time?

@pjsk-stripe
Copy link
Contributor

Done!

@pjsk-stripe
Copy link
Contributor

Howdy folks, following up. Is there anything I can do to help review here?


case uri.to_s
when "http://169.254.169.254/latest/api/token"
if $imdsv2_token_failure
Copy link
Contributor

@simi simi Jun 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about to set at least class shared variable instead of global one to control this? FakeS3URISigner.return_token = "mysecrettoken" => returns token, FakeS3URISigner.return_token = false returns 401 instead. And reset before each test + set whenever failure is needed in with_imds_v2_failure method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me! I was largely following the convention in this file which was using global variables.

Will make the change! (Sorry for the slow response, was on holiday)

Copy link
Contributor

@pjsk-stripe pjsk-stripe Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Removed all global variables ❤️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, updated, had to do some more test surgery where places were assuming a V1 output erroneously. I hand inspected the tests and things look right now.

@pjsk-stripe
Copy link
Contributor

Hi friends, sorry for the thrash. I believe this is ready for review again :)

@pjsk-stripe
Copy link
Contributor

Howdy, @simi @deivid-rodriguez, any chance this could get a look over?

@deivid-rodriguez
Copy link
Contributor

I'm falling a bit behind on reviewing PRs, sorry. Not sure if @simi is planning to follow up soon, otherwise I'll get to it as soon as I can.

Do we have a way/guide to setup s3 sources so I can try this in real life? I've never used this feature, and being able to try it would help me when I review it.

@simi
Copy link
Contributor

simi commented Jul 1, 2025

I'm falling a bit behind on reviewing PRs, sorry. Not sure if @simi is planning to follow up soon, otherwise I'll get to it as soon as I can.

I do.

Copy link
Contributor

@simi simi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@deivid-rodriguez
Copy link
Contributor

This PR needs a rebase now to fix conflicts. I'd also squash the dummy "fix linter" commits into the more meaningful ones while at it.

@pjsk-stripe
Copy link
Contributor

Phew, sorry for the radio silence there, had a bunch of other things that took all my time. Just rebased, fixed conflicts, colocated commits, and I believe all tests / lints are passing.

Do you think you could take another look? Thanks!

@deivid-rodriguez @simi

Copy link
Contributor

@deivid-rodriguez deivid-rodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll merge in a couple of days (just in case @simi wants to double check the latest changes).

@simi
Copy link
Contributor

simi commented Aug 5, 2025

Thank you! I'll merge in a couple of days (just in case @simi wants to double check the latest changes).

@simi wants to double check and all good 🫡

@simi simi enabled auto-merge August 5, 2025 22:58
@simi simi merged commit 76abc01 into ruby:master Aug 6, 2025
73 checks passed
deivid-rodriguez pushed a commit that referenced this pull request Sep 4, 2025
Use IMDSv2 for S3 instance credentials

(cherry picked from commit 76abc01)
deivid-rodriguez pushed a commit that referenced this pull request Sep 9, 2025
Use IMDSv2 for S3 instance credentials

(cherry picked from commit 76abc01)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants