Skip to content

Conversation

ztgrace
Copy link
Collaborator

@ztgrace ztgrace commented Sep 9, 2025

  • added file names into the NPInput for more context
  • modified the docker download to handle ECR auth and url formats
  • fixed a deadlock issue with noseyparker link.

Summary by CodeRabbit

  • New Features

    • Improved Docker image parsing, including robust support for AWS ECR and Public ECR.
    • Added Basic authentication handling for ECR registries.
    • Enhanced scan results with richer provenance metadata (repo path and blob path).
  • Bug Fixes

    • Correct tag parsing when registries include ports.
    • More reliable token use and Authorization headers across registries.
    • Ensures scanner completion signal is always emitted, preventing potential hangs.
  • Tests

    • Added comprehensive unit tests covering diverse Docker image formats and edge cases.

@ztgrace ztgrace requested a review from josephwhenry September 9, 2025 14:48
Copy link

coderabbitai bot commented Sep 9, 2025

Walkthrough

Implements registry-aware image name parsing and ECR-specific Basic auth handling in Docker download logic; adds unit tests for image parsing; ensures NoseyParker scanner always signals completion via deferred send; extends NP provenance types and populates RepoPath and BlobPath during layer/file processing.

Changes

Cohort / File(s) Summary of Changes
Docker image parsing and ECR auth
pkg/links/docker/download.go
Reworked parseImageName to correctly handle tags vs registry ports; added normalization for ECR/public ECR and other registry formats. Introduced ECR-aware Basic token creation/usage in refreshToken, getTokenFromAuthEndpoint, getManifest, and fetchBlob. Added debug logging and minor variable reuse. No public API changes.
Tests for image parsing
pkg/links/docker/download_test.go
Added table-driven tests validating parseImageName across DockerHub, custom registries (incl. ports), ECR private/public, and edge cases; added focused edge-case tests.
NoseyParker scanner signaling
pkg/links/noseyparker/noseyparker.go
In startNpScanner, replaced explicit done send with deferred send inside goroutine to guarantee completion signal on all exit paths.
NP provenance enrichment and types
pkg/types/docker.go
While processing layers/files, now populate NPInput.Provenance.RepoPath and, for layers, FirstCommit.BlobPath.
Type additions
pkg/types/...
Added fields: NPProvenance.RepoPath string; NPProvenance.FirstCommit *NPCommitMetadata; NPCommitMetadata.BlobPath string.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Caller
  participant D as DockerDownloadLink
  participant A as Auth Endpoint
  participant R as Registry

  C->>D: Download(imageRef)
  D->>D: parseImageName(imageRef)
  D->>D: Detect registry (ECR vs non-ECR)

  alt ECR or public.ecr.aws
    D->>D: Build Basic token (base64 username:password)
    note right of D: Token stored as dd.token
  else Non-ECR
    D->>A: GET /auth (optional Basic using creds)
    A-->>D: Bearer token
    D->>D: Store dd.token
  end

  D->>R: GET /manifests/:ref
  alt ECR
    note over D,R: Authorization: Basic dd.token
  else Non-ECR
    note over D,R: Authorization: Bearer dd.token
  end
  R-->>D: Manifest

  loop For each blob
    D->>R: GET /blobs/:digest
    R-->>D: Blob content
  end

  D-->>C: Content/links
Loading
sequenceDiagram
  participant Caller
  participant NP as NoseyParker
  participant G as Goroutine
  participant Done as n.done (chan bool)

  Caller->>NP: startNpScanner()
  NP->>G: launch scan goroutine
  activate G
  G->>G: defer send Done <- true
  G->>G: run scan command (may error)
  G-->>Done: (deferred) true
  deactivate G
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • josephwhenry

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly highlights the two primary changes—adding ECR compatibility and fixing a deadlock—directly reflecting the main code updates in the Docker download flow and the noseyparker link. It is concise, specific, and clearly related to the changeset without unnecessary detail.
Description Check ✅ Passed The description clearly enumerates the core modifications, including enhancing NPInput context, updating Docker download for ECR authentication and URL formats, and resolving the noseyparker deadlock, all of which align with the pull request changes.

Poem

A whisker twitch, a parse so neat,
I hop through tags and ports I meet.
ECR says “Basic, please,” I grin—
Bearer for the rest, we win.
NoseyParker’s done is always true,
Provenance paths? I mapped them too.
Thump-thump: shipped, with carrots’ hue. 🥕

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ztg/ecr-fix

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

github-actions bot commented Sep 9, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/links/docker/download.go (1)

376-378: Potential issue with token refresh retry logic

After refreshing the token in line 371, the code sets a Bearer token header regardless of registry type. This could fail for ECR registries which require Basic auth.

Apply this fix to handle ECR registries correctly during retry:

         // Update the request with the new token
         if dd.token != "" {
-            req.Header.Set("Authorization", "Bearer "+dd.token)
+            registryBase := strings.Split(dd.dockerImage.Image, "/")[0]
+            if (strings.Contains(registryBase, "ecr") && strings.Contains(registryBase, "amazonaws.com")) || strings.Contains(registryBase, "public.ecr.aws") {
+                req.Header.Set("Authorization", "Basic "+dd.token)
+            } else {
+                req.Header.Set("Authorization", "Bearer "+dd.token)
+            }
         }
🧹 Nitpick comments (6)
pkg/types/docker.go (1)

194-197: Provenance enrichment looks good, but consider initializing the commit metadata fully

While adding RepoPath and FirstCommit metadata is valuable for tracking provenance, the NPCommitMetadata struct is only partially initialized with BlobPath. Consider whether other fields in the commit metadata should be initialized or if a comment should clarify why only BlobPath is relevant here.

pkg/links/docker/download.go (1)

151-162: ECR authentication handling looks correct

The special case for ECR registries correctly uses Basic auth with base64-encoded credentials. The debug logging will be helpful for troubleshooting authentication issues.

Consider extracting the ECR detection logic into a helper function to reduce code duplication:

+func isECRRegistry(registryBase string) bool {
+    return (strings.Contains(registryBase, "ecr") && strings.Contains(registryBase, "amazonaws.com")) || 
+           strings.Contains(registryBase, "public.ecr.aws")
+}

 func (dd *DockerDownloadLink) refreshToken(dockerImage *types.DockerImage) error {
     imageName, tag := dd.parseImageName(dockerImage.Image)
 
     // Special handling for ECR - don't reset token, just recreate it
     registryBase := strings.Split(dockerImage.Image, "/")[0]
-    if (strings.Contains(registryBase, "ecr") && strings.Contains(registryBase, "amazonaws.com")) || strings.Contains(registryBase, "public.ecr.aws") {
+    if isECRRegistry(registryBase) {
         if dockerImage.AuthConfig.Username != "" && dockerImage.AuthConfig.Password != "" {

This helper function could be reused in lines 274, 327, and 544.

pkg/links/docker/download_test.go (4)

7-9: Enable test parallelism at the suite level.

Add t.Parallel() to speed up and avoid future accidental shared-state issues.

 func TestDockerDownloadLink_parseImageName(t *testing.T) {
-	dd := &DockerDownloadLink{}
+	t.Parallel()
+	dd := &DockerDownloadLink{}

125-137: Make subtests parallel-safe (capture loop var) and run them in parallel.

Prevents the classic loop-var capture pitfall and speeds up execution.

-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
+	for _, tt := range tests {
+		tt := tt
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			image, tag := dd.parseImageName(tt.input)

16-123: Broaden coverage: add explicit docker.io, index.docker.io, port+2-part+tag, and common registries.

These pass under typical parsing rules and help guard regressions.

 		// Edge cases
 		{
 			name:          "complex tag with dash",
 			input:         "user/repo:v1.0-alpine",
 			expectedImage: "user/repo",
 			expectedTag:   "v1.0-alpine",
 		},
 		{
 			name:          "complex registry domain",
 			input:         "my-registry.internal.company.com/team/project:sha-abc123",
 			expectedImage: "team/project",
 			expectedTag:   "sha-abc123",
 		},
+		// Additional common forms
+		{
+			name:          "explicit docker.io official no tag",
+			input:         "docker.io/nginx",
+			expectedImage: "library/nginx",
+			expectedTag:   "latest",
+		},
+		{
+			name:          "explicit index.docker.io official with tag",
+			input:         "index.docker.io/library/busybox:1.36",
+			expectedImage: "library/busybox",
+			expectedTag:   "1.36",
+		},
+		{
+			name:          "registry with port 2-part with tag",
+			input:         "localhost:5000/myimage:dev",
+			expectedImage: "myimage",
+			expectedTag:   "dev",
+		},
+		{
+			name:          "ghcr no tag",
+			input:         "ghcr.io/owner/repo",
+			expectedImage: "owner/repo",
+			expectedTag:   "latest",
+		},
+		{
+			name:          "quay with tag",
+			input:         "quay.io/coreos/etcd:latest",
+			expectedImage: "coreos/etcd",
+			expectedTag:   "latest",
+		},
+		{
+			name:          "gcr multi-segment path",
+			input:         "gcr.io/k8s-artifacts-prod/kube-apiserver:v1.31.0",
+			expectedImage: "k8s-artifacts-prod/kube-apiserver",
+			expectedTag:   "v1.31.0",
+		},

140-185: Parallelize and dedupe the focused edge-case tests.

Parallelize subtests; consider folding these into the main table to avoid duplication.

 func TestDockerDownloadLink_parseImageName_EdgeCases(t *testing.T) {
-	dd := &DockerDownloadLink{}
+	t.Parallel()
+	dd := &DockerDownloadLink{}
 
 	t.Run("grafana/grafana should remain unchanged", func(t *testing.T) {
+		t.Parallel()
 		image, tag := dd.parseImageName("grafana/grafana")
@@
 	t.Run("registry.com/image should extract image name", func(t *testing.T) {
+		t.Parallel()
 		image, tag := dd.parseImageName("registry.com/myapp")
@@
 	t.Run("ECR registry should extract repo name", func(t *testing.T) {
+		t.Parallel()
 		image, tag := dd.parseImageName("123456789012.dkr.ecr.us-east-2.amazonaws.com/nebula-test-secrets-repo-nyx0")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fd5f634 and cdea2e0.

📒 Files selected for processing (4)
  • pkg/links/docker/download.go (5 hunks)
  • pkg/links/docker/download_test.go (1 hunks)
  • pkg/links/noseyparker/noseyparker.go (1 hunks)
  • pkg/types/docker.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/links/docker/download_test.go (1)
pkg/links/docker/download.go (1)
  • DockerDownloadLink (24-29)
pkg/types/docker.go (1)
pkg/types/noseyparker.go (1)
  • NPCommitMetadata (121-133)
🔇 Additional comments (7)
pkg/links/noseyparker/noseyparker.go (1)

32-35: Good fix for the deadlock issue!

The addition of the defer block ensures that n.done always receives a signal even if an error occurs during scanning. This prevents the goroutine from hanging indefinitely when n.executeFn returns an error.

pkg/types/docker.go (1)

274-274: LGTM!

The addition of RepoPath for file processing maintains consistency with the layer processing approach.

pkg/links/docker/download.go (4)

111-117: Good improvement to tag extraction logic

Using LastIndex and checking for slashes in the suffix properly handles registry URLs with ports (e.g., registry.example.com:5000/image:tag).


124-143: Registry normalization logic is comprehensive

The implementation handles various registry formats well, including ECR, public.ecr.aws, and standard registries. The logic correctly preserves Docker Hub's 2-part format while extracting image names from registry URLs.


272-283: ECR token generation in auth endpoint is properly implemented

The function correctly identifies ECR registries and returns Basic auth tokens directly without making HTTP requests to auth endpoints, which aligns with ECR's authentication model.


325-331: Authorization header handling for ECR is consistent

Both getManifest and fetchBlob correctly apply Basic auth for ECR registries and Bearer auth for others.

Also applies to: 542-548

pkg/links/docker/download_test.go (1)

1-186: Solid coverage and clear intent.

The table-driven structure is good and catches many real-world formats. Nice work.

Comment on lines +84 to +108
// ECR formats
{
name: "ecr private registry",
input: "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo",
expectedImage: "my-repo",
expectedTag: "latest",
},
{
name: "ecr private registry with tag",
input: "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:v1.0",
expectedImage: "my-repo",
expectedTag: "v1.0",
},
{
name: "ecr public registry",
input: "public.ecr.aws/my-namespace/my-repo",
expectedImage: "my-repo",
expectedTag: "latest",
},
{
name: "ecr public registry with tag",
input: "public.ecr.aws/my-namespace/my-repo:latest",
expectedImage: "my-repo",
expectedTag: "latest",
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Confirm ECR path normalization assumptions.

Tests strip the namespace segment for public ECR (public.ecr.aws/namespace/repo → repo). Double-check this matches how the download code constructs the v2 manifest path; public ECR often expects namespace/repo in the path.


🌐 Web query:

For AWS ECR:
1) For public ECR, are repository references of the form public.ecr.aws/<namespace>/<repo>, and does the Docker Registry API path require v2/<namespace>/<repo>/manifests/<tag>?
2) For private ECR, can repository names include slashes (e.g., team/repo), and how should these be represented in the v2 path?
Please provide authoritative AWS docs references.

💡 Result:

Short answers

  1. Yes. Public ECR repository image URIs use public.ecr.aws/<registry_alias_or_namespace>/. The Docker Registry HTTP API path for manifests is https://public.ecr.aws/v2/<registry_alias_or_namespace>//manifests/. [1][2]

  2. Yes. Private ECR repository names may include forward slashes (namespaces), e.g. team/repo. When calling the Docker Registry HTTP API against a private registry use the full repository name in the v2 path, e.g. https://<aws_account_id>.dkr.ecr..amazonaws.com/v2/team/repo/manifests/. (Repository name rules and examples shown in the CreateRepository docs; private registry HTTP API example shown in the auth docs.) [3][4]

References

  • Amazon ECR Public — public repo URI and pull examples. [1]
  • Amazon ECR Public — HTTP API (curl) example showing v2///manifests/. [2]
  • Amazon ECR (private) — creating repositories; repository-name pattern allows forward slashes (namespaces). [3]
  • Amazon ECR (private) — HTTP API authentication example showing v2//tags/list (same v2/ path form applies for manifests). [4]

Preserve full ECR namespace in manifest paths and tests
Update pkg/links/docker/download_test.go:

  • Public ECR cases must expect <namespace>/<repo>, not just repo, since the manifest endpoint is /v2/<namespace>/<repo>/manifests/<tag>.
  • Add a private ECR test with a multi-segment repository name (e.g. team/repo) to ensure slashes are preserved in the v2 path.
🤖 Prompt for AI Agents
In pkg/links/docker/download_test.go around lines 84 to 108, the ECR test cases
incorrectly expect only the repository name for public ECR and lack a private
ECR case with a multi-segment repo; update the public ECR expectedImage values
to include the full namespace/repo (e.g. "my-namespace/my-repo") so the manifest
path matches /v2/<namespace>/<repo>/manifests/<tag>, and add a new private ECR
test case using a multi-segment repository name (e.g.
"123456789012.dkr.ecr.us-east-1.amazonaws.com/team/my-repo" with expectedImage
"team/my-repo" and appropriate expectedTag) to ensure slashes are preserved in
the v2 path.

@josephwhenry josephwhenry merged commit 915a140 into main Sep 9, 2025
2 of 3 checks passed
@josephwhenry josephwhenry deleted the ztg/ecr-fix branch September 9, 2025 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants