Skip to content

Conversation

@carlosgrillet
Copy link

@carlosgrillet carlosgrillet commented Oct 22, 2025

Summary

Categories with subcategories (h3) were incorrectly including all nested links. For example, "Command Line" would include links from both "Advanced Console UIs" and "Standard CLI" subcategories.

Fixes the documented bug at main.go:245-247.

Summary by CodeRabbit

  • Bug Fixes

    • Improved category extraction to include sub-lists under headers and to derive categories from table-of-contents structures.
    • Empty categories are now detected and skipped so processing continues instead of stopping.
  • Tests

    • Added comprehensive tests covering subcategory exclusion, empty-category handling, TOC-driven extraction, and link-count/filtering validation.

@github-actions
Copy link

Automated Quality Checks (from CONTRIBUTING minimum standards)

  • Repo link: missing
  • pkg.go.dev: missing
  • goreportcard: missing
  • coverage: missing

These checks are a best-effort automation and do not replace human review.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 22, 2025

Walkthrough

Compute the TOC from body #contents by selecting the next ul under its first following element (fallback to NextFiltered("ul")); start processing categories from that TOC; treat empty categories as sentinel errCategoryNoLinks and continue on it; broaden link lookup to stop at h2 or h3.

Changes

Cohort / File(s) Summary
Category extraction logic
main.go
Introduced sentinel errCategoryNoLinks; compute TOC from the next ul under the first following element of body #contents (fallback to NextFiltered("ul")); extractCategories begins from that TOC and continues on errCategoryNoLinks; extractCategory uses NextFilteredUntil("ul", "h2, h3") to locate lists and returns errCategoryNoLinks for empty categories.
Category extraction tests
category_extraction_test.go
Added tests validating category extraction: excluding subcategory-only links, extracting direct links, signaling errCategoryNoLinks for empty/subcategory-only categories, handling multiple categories via details/summary TOC, and skipping empty categories. Tests use markdown.ToHTML, goquery, extractCategory, extractCategories, and errCategoryNoLinks.

Sequence Diagram(s)

sequenceDiagram
  participant Main as main
  participant DOM as HTML DOM
  participant ExtractCats as extractCategories
  participant ExtractOne as extractCategory

  Note over Main,DOM #DDEBF7: Start category extraction
  Main->>DOM: Find `body #contents`
  DOM-->>Main: contents node
  Main->>ExtractCats: compute TOC: next `ul` under first following element (fallback NextFiltered("ul"))
  ExtractCats->>ExtractOne: for each TOC item -> extractCategory(item)
  alt links found
    ExtractOne->>DOM: NextFilteredUntil("ul","h2, h3") -> gather links
    DOM-->>ExtractOne: links list
    ExtractOne-->>ExtractCats: return category with links
  else no links (empty)
    ExtractOne-->>ExtractCats: return errCategoryNoLinks
    Note right of ExtractCats #FDE9D9: continue to next TOC item
  else other error
    ExtractOne-->>ExtractCats: return wrapped error
    ExtractCats-->>Main: abort with error
  end
  ExtractCats-->>Main: aggregated categories map
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

"I hopped through lists and headers bright,
Found TOCs hidden out of sight,
Skipped the empty with a twitch of nose,
Let parsing carry on its toes,
A rabbit's cheer for tidy bytes 🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix: exclude h3 subcategory links from h2 category extraction" directly aligns with the stated objectives. The PR addresses a documented bug where h2 categories incorrectly included links from their h3 subcategories (e.g., "Command Line" was including links from "Advanced Console UIs" and "Standard CLI"). The changes in main.go modify the link extraction logic by using NextFilteredUntil("ul", "h2, h3") to stop at h3 headers, thereby excluding nested subcategory links, and the test suite validates this behavior. The title is concise, specific, and clearly communicates the primary fix without unnecessary detail.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
main.go (1)

223-225: Replace string error comparison with sentinel error or error type.

Comparing error messages using string equality is fragile and not idiomatic Go. If the error message in extractCategory changes, this check will silently break.

Apply this diff to use a sentinel error:

+var errCategoryNoLinks = errors.New("category does not contain links")
+
 func extractCategories(doc *goquery.Document) (map[string]Category, error) {
 	categories := make(map[string]Category)
 	var rootErr error

Then update the error handling:

 					category, err := extractCategory(doc, selector)
 					if err != nil {
-						if err.Error() == "build a category: category does not contain links" {
+						if errors.Is(err, errCategoryNoLinks) {
 							return true
 						}
 						rootErr = fmt.Errorf("extract category: %w", err)
 						return false
 					}

And modify extractCategory to return the sentinel error:

 		// FIXME: In this case we would have an empty category in main index.html with link to 404 page.
 		if len(links) == 0 {
-			err = errors.New("category does not contain links")
+			err = errCategoryNoLinks
 			return false
 		}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90f89e3 and 00e6c7d.

📒 Files selected for processing (1)
  • main.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use standard formatting (gofmt) and idiomatic Go style
Favor small, testable functions in Go
Keep exported Go APIs documented with Go-style comments
Maintain ≥80% coverage for non-data packages and ≥90% for data packages when adding new testable Go code

Files:

  • main.go
{**/*.go,.github/scripts/**/*.js}

📄 CodeRabbit inference engine (AGENTS.md)

Use American English spelling in code comments

Files:

  • main.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (2)
main.go (2)

251-251: Core fix looks correct.

The change from NextFilteredUntil("ul", "h2") to NextFilteredUntil("ul", "h2, h3") directly addresses the stated bug. By stopping at both h2 and h3 headers, the selector now correctly prevents h2 categories from including links that belong to h3 subcategories.


201-206: The TOC extraction change is intentional and part of the coordinated fix.

The diff reveals all three changes are intentional parts of a single commit. The TOC extraction logic (lines 201-206) adds defensive handling: it first tries Next().Find("ul").First() (for cases where ul is nested within the next sibling), then falls back to the original NextFiltered("ul") (for cases where ul is directly the next sibling). This suggests the author discovered the HTML structure can vary.

The main bug fix is clearly correct: changing NextFilteredUntil("ul", "h2") to NextFilteredUntil("ul", "h2, h3") (line 251) stops traversal at both h2 and h3 headers, preventing h3 subcategory links from being included in h2 categories—exactly as described in the removed FIXME comment.

One minor note: the error handling (lines 223-225) uses fragile string comparison (err.Error() == "..."), which is generally considered an anti-pattern, though it is functional in this context.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main.go (1)

248-291: Add unit tests for the extractCategory and extractCategories functions.

The code changes modify critical category extraction logic, but verification shows no explicit unit tests exist for extractCategory or extractCategories. The PR's "needs-coverage" label and the coding guideline requirement of ≥80% coverage for non-data packages both indicate this gap must be addressed.

Add tests covering:

  • Categories with h3 subcategories correctly exclude subcategory links
  • H2 categories without h3 subcategories continue working
  • Empty categories return errCategoryNoLinks without error
  • The specific "Command Line" category bug case mentioned in the PR description
🧹 Nitpick comments (1)
main.go (1)

270-273: LGTM! Correct use of sentinel error for empty categories.

Returning errCategoryNoLinks when no links are found is appropriate. Combined with the error handling at lines 226-228, empty categories are now skipped entirely, which prevents the 404 page issue mentioned in the FIXME comment.

Consider updating the FIXME comment to reflect that empty categories are now skipped:

-// FIXME: In this case we would have an empty category in main index.html with link to 404 page.
+// Empty categories are skipped via errCategoryNoLinks to avoid creating 404 pages in index.html.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00e6c7d and bebe3ae.

📒 Files selected for processing (1)
  • main.go (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use standard formatting (gofmt) and idiomatic Go style
Favor small, testable functions in Go
Keep exported Go APIs documented with Go-style comments
Maintain ≥80% coverage for non-data packages and ≥90% for data packages when adding new testable Go code

Files:

  • main.go
{**/*.go,.github/scripts/**/*.js}

📄 CodeRabbit inference engine (AGENTS.md)

Use American English spelling in code comments

Files:

  • main.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (4)
main.go (4)

37-38: LGTM! Idiomatic sentinel error.

The sentinel error declaration follows Go best practices and enables clear error handling for categories without links.


226-228: LGTM! Correct error handling for empty categories.

Using errors.Is() to detect the sentinel error and continuing iteration is the right approach. This allows categories without links to be skipped gracefully rather than aborting the entire extraction process.


254-254: LGTM! Core fix correctly excludes h3 subcategory links.

The change from NextFilteredUntil("ul", "h2") to NextFilteredUntil("ul", "h2, h3") correctly stops link extraction at both h2 and h3 headers, preventing subcategory links from being incorrectly included in parent categories. This directly addresses the documented bug mentioned in the PR description.


204-207: Verify whether the fallback selector is actually needed, or document its purpose with a clarifying comment.

Based on the current README.md structure and goldmark HTML conversion, the first selector (.Next().Find("ul").First()) should handle the actual structure where <h2 id="contents"> is followed by a <details> wrapper containing the list. The second selector (.NextFiltered("ul")) appears to be defensive code for a different structure where <ul> directly follows the heading.

Without test coverage showing the fallback path being used, it's unclear whether:

  1. This is intentional defensive programming for potential README format variations, or
  2. The fallback selector is dead code that can be removed

Consider either: (a) adding a comment explaining both selector paths and when each is used, as suggested in the original review, or (b) validating through tests whether the fallback is actually reachable and removing it if unnecessary.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
category_extraction_test.go (1)

115-124: Consider guarding against index out of bounds in the validation loop.

If the length check fails but the test continues (using t.Errorf rather than t.Fatalf), the loop at lines 120-123 could panic when accessing expectedLinks[i] if len(category.Links) > len(expectedLinks).

Consider one of these approaches:

Option 1: Use t.Fatalf to stop early

 	if len(category.Links) != 3 {
-		t.Errorf("Expected 3 links but got %d", len(category.Links))
+		t.Fatalf("Expected 3 links but got %d", len(category.Links))
 	}

Option 2: Guard the loop

 	expectedLinks := []string{"Tool A", "Tool B", "Tool C"}
-	for i, link := range category.Links {
+	for i := 0; i < min(len(category.Links), len(expectedLinks)); i++ {
+		link := category.Links[i]
 		if link.Title != expectedLinks[i] {
 			t.Errorf("Expected link %d to be '%s', got '%s'", i, expectedLinks[i], link.Title)
 		}
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bebe3ae and 93eb19b.

📒 Files selected for processing (1)
  • category_extraction_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use standard formatting (gofmt) and idiomatic Go style
Favor small, testable functions in Go
Keep exported Go APIs documented with Go-style comments
Maintain ≥80% coverage for non-data packages and ≥90% for data packages when adding new testable Go code

Files:

  • category_extraction_test.go
{**/*.go,.github/scripts/**/*.js}

📄 CodeRabbit inference engine (AGENTS.md)

Use American English spelling in code comments

Files:

  • category_extraction_test.go
🧬 Code graph analysis (1)
category_extraction_test.go (1)
pkg/markdown/convert.go (1)
  • ToHTML (16-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (1)
category_extraction_test.go (1)

12-263: Excellent test coverage for the subcategory exclusion fix!

The test suite comprehensively validates the bug fix and covers multiple scenarios:

  • Core fix validation (excluding h3 subcategory links from h2 categories)
  • Simple categories without subcategories
  • Empty category error handling with sentinel error errCategoryNoLinks
  • Categories containing only subcategories
  • TOC-driven category extraction
  • Skipping empty categories in batch extraction

The test structure is clear, follows Go conventions, and provides good debugging output.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
category_extraction_test.go (5)

51-54: Derive heading anchors from the generated HTML instead of hardcoding them.

Hardcoded IDs (e.g., "#test-category") couple tests to the current ID generator and can break with minor changes. Resolve by computing the anchor from the actual h2 element text.

Apply diffs like below (sample hunks shown; apply similarly to all anchor usages):

-	category, err := extractCategory(doc, "#test-category")
+	category, err := extractCategory(doc, anchorForH2(t, doc, "Test Category"))
-	category, err := extractCategory(doc, "#simple-category")
+	category, err := extractCategory(doc, anchorForH2(t, doc, "Simple Category"))
-	_, err = extractCategory(doc, "#empty-category")
+	_, err = extractCategory(doc, anchorForH2(t, doc, "Empty Category"))
-	_, err = extractCategory(doc, "#parent-only-category")
+	_, err = extractCategory(doc, anchorForH2(t, doc, "Parent Only Category"))
-	catA, ok := categories["#category-a"]
+	aAnchor := anchorForH2(t, doc, "Category A")
+	catA, ok := categories[aAnchor]
-	catB, ok := categories["#category-b"]
+	bAnchor := anchorForH2(t, doc, "Category B")
+	catB, ok := categories[bAnchor]
-	if _, ok := categories["#good-category"]; !ok {
+	if _, ok := categories[anchorForH2(t, doc, "Good Category")]; !ok {
-	if _, ok := categories["#another-good-category"]; !ok {
+	if _, ok := categories[anchorForH2(t, doc, "Another Good Category")]; !ok {
-	if _, ok := categories["#empty-category"]; ok {
+	if _, ok := categories[anchorForH2(t, doc, "Empty Category")]; ok {

Add this helper in the test file (outside the shown ranges):

func anchorForH2(t *testing.T, doc *goquery.Document, heading string) string {
	t.Helper()
	var out string
	doc.Find("h2").EachWithBreak(func(_ int, s *goquery.Selection) bool {
		if strings.TrimSpace(s.Text()) == heading {
			if id, ok := s.Attr("id"); ok {
				out = "#" + id
			}
			return false
		}
		return true
	})
	if out == "" {
		t.Fatalf("anchor for heading %q not found", heading)
	}
	return out
}

Remember to import "strings".

Also applies to: 106-109, 148-155, 187-195, 243-253, 255-262, 316-326


41-49: DRY the test setup with a shared docFromMarkdown helper.

The HTML conversion and goquery document creation are duplicated. Centralize for clarity and maintainability.

Apply this pattern (sample hunk for one test; replicate in others):

-	html, err := markdown.ToHTML([]byte(testMarkdown))
-	if err != nil {
-		t.Fatalf("Failed to convert markdown to HTML: %v", err)
-	}
-
-	doc, err := goquery.NewDocumentFromReader(bytes.NewReader(html))
-	if err != nil {
-		t.Fatalf("Failed to create goquery document: %v", err)
-	}
+	doc := docFromMarkdown(t, testMarkdown)

Add the helper once in this file:

func docFromMarkdown(t *testing.T, md string) *goquery.Document {
	t.Helper()
	html, err := markdown.ToHTML([]byte(md))
	if err != nil {
		t.Fatalf("Failed to convert markdown to HTML: %v", err)
	}
	doc, err := goquery.NewDocumentFromReader(bytes.NewReader(html))
	if err != nil {
		t.Fatalf("Failed to create goquery document: %v", err)
	}
	return doc
}

Also applies to: 96-104, 138-146, 177-185, 224-232, 296-304


12-13: Enable test parallelization.

These tests are independent; running them in parallel speeds up CI and surfaces hidden data races.

Insert at the top of each test:

 func TestCategoryExtractionDoesNotIncludeSubcategories(t *testing.T) {
+	t.Parallel()

(Repeat for the other test functions.)

Also applies to: 85-86, 127-128, 158-159, 197-199, 265-267


60-64: Remove or gate verbose logging in tests.

The per-link t.Logf noise isn’t needed for assertions and clutters CI output. Remove or guard with testing.Verbose().

-	t.Logf("Found %d links in category", len(category.Links))
-	for i, link := range category.Links {
-		t.Logf("  Link %d: %s - %s", i+1, link.Title, link.URL)
-	}
+	if testing.Verbose() {
+		t.Logf("Found %d links in category", len(category.Links))
+		for i, link := range category.Links {
+			t.Logf("  Link %d: %s - %s", i+1, link.Title, link.URL)
+		}
+	}

115-124: Strengthen assertions (URLs and titles).

  • Also assert URLs in TestCategoryWithoutSubcategories.
  • Assert Category B’s title to match expectations.
-	expectedLinks := []string{"Tool A", "Tool B", "Tool C"}
+	expectedLinks := []string{"Tool A", "Tool B", "Tool C"}
+	expectedURLs := []string{
+		"https://example.com/a",
+		"https://example.com/b",
+		"https://example.com/c",
+	}
 	for i, link := range category.Links {
 		if link.Title != expectedLinks[i] {
 			t.Errorf("Expected link %d to be '%s', got '%s'", i, expectedLinks[i], link.Title)
 		}
+		if link.URL != expectedURLs[i] {
+			t.Errorf("Expected URL for link %q to be %s, got %s", link.Title, expectedURLs[i], link.URL)
+		}
 	}
 	catB, ok := categories["#category-b"]
 	if !ok {
 		t.Error("Category B not found")
 	} else {
+		if catB.Title != "Category B" {
+			t.Errorf("Expected title 'Category B', got '%s'", catB.Title)
+		}
 		if len(catB.Links) != 1 {
 			t.Errorf("Expected 1 link in Category B, got %d", len(catB.Links))
 		}
 	}

Also applies to: 255-262

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93eb19b and 752012d.

📒 Files selected for processing (1)
  • category_extraction_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Use standard formatting (gofmt) and idiomatic Go style
Favor small, testable functions in Go
Keep exported Go APIs documented with Go-style comments
Maintain ≥80% coverage for non-data packages and ≥90% for data packages when adding new testable Go code

Files:

  • category_extraction_test.go
{**/*.go,.github/scripts/**/*.js}

📄 CodeRabbit inference engine (AGENTS.md)

Use American English spelling in code comments

Files:

  • category_extraction_test.go
🧬 Code graph analysis (1)
category_extraction_test.go (1)
pkg/markdown/convert.go (1)
  • ToHTML (16-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant