ci: Add script & CI to check dead links #45

Open
wants to merge 17 commits into
base: main
Changes from 14 commits
58 changes: 58 additions & 0 deletions .github/workflows/check-broken-links-schedule.yaml
@@ -0,0 +1,58 @@
name: Scheduled Broken Links Check

on:
  # Run on schedule
Review comment (Member): Please remove the obvious comments.

  schedule:
    - cron: "0 0 * * 0" # Runs at 00:00 UTC every Sunday
  # Manual trigger
  workflow_dispatch:

jobs:
  check-links:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write # Permission needed to create issues

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      # Optional: Cache to reduce API rate limits and duplicate requests
      - name: Restore lychee cache
        uses: actions/cache@v4
        with:
          path: .lycheecache
          key: cache-lychee-${{ github.sha }}
          restore-keys: cache-lychee-

      - name: Link Checker
        id: lychee
        uses: lycheeverse/lychee-action@v2
        env:
          GITHUB_TOKEN: ${{ github.token }}
        with:
          args: >-
            --cache
            --max-cache-age 48h
            --verbose
            --no-progress
            --exclude-path ".git"
            --exclude-path "node_modules"
            --max-retries 5
            --timeout 30
            --max-concurrency 8
            --retry-wait-time 3
            './**/*.md'
            './**/*.html'
            './**/*.txt'
          fail: false
          format: markdown

      - name: Create Issue From File
        if: steps.lychee.outputs.exit_code != 0
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: 🔍 Broken Links Report
          content-filepath: ./lychee/out.md
          labels: bug, documentation
68 changes: 68 additions & 0 deletions .github/workflows/check-broken-links.yaml
Review comment (Member): Why do we need to keep two different workflows? They do pretty much the same thing, so let's refactor them.
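
One possible shape for the refactor requested above is a single workflow that reacts to all three triggers and branches on `github.event_name`. This is only a sketch, not the change the author ended up making: the merged file layout, the choice to fail the job only on pull requests, and the decision to grant `issues: write` to every run are assumptions. The action names, inputs, and lychee arguments are taken from the two workflows in this PR.

```yaml
# Hypothetical merged workflow (a sketch): one file covering pull request,
# weekly, and manual runs instead of two near-identical workflows.
name: Check Broken Links

on:
  pull_request:
    types: [opened, synchronize, reopened]
  schedule:
    - cron: "0 0 * * 0"
  workflow_dispatch:

jobs:
  check-links:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write # Assumption: granted to all runs, used only outside pull requests

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Restore lychee cache
        uses: actions/cache@v4
        with:
          path: .lycheecache
          key: cache-lychee-${{ github.sha }}
          restore-keys: cache-lychee-

      - name: Link Checker
        id: lychee
        uses: lycheeverse/lychee-action@v2
        env:
          GITHUB_TOKEN: ${{ github.token }}
        with:
          args: >-
            --cache
            --max-cache-age 48h
            --verbose
            --no-progress
            --exclude-path ".git"
            --exclude-path "node_modules"
            --max-retries 5
            --timeout 30
            --max-concurrency 8
            --retry-wait-time 3
            './**/*.md'
            './**/*.html'
            './**/*.txt'
          # Assumption: fail the job on pull requests only; scheduled and
          # manual runs report through an issue instead.
          fail: ${{ github.event_name == 'pull_request' }}
          format: markdown

      - name: Create Issue From File
        if: github.event_name != 'pull_request' && steps.lychee.outputs.exit_code != 0
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: 🔍 Broken Links Report
          content-filepath: ./lychee/out.md
          labels: bug, documentation
```

With this layout the lychee arguments live in one place, so a change to timeouts or exclusions cannot drift between the pull request check and the weekly check. The trade-off is that pull request runs carry the `issues: write` permission even though they never use it; splitting the issue step into a separate job with its own permissions would avoid that.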

@@ -0,0 +1,68 @@
name: Check Broken Links

on:
  pull_request:
    types: [opened, synchronize, reopened]
  # Optional: Add scheduled checks
  # schedule:
  #   - cron: "0 0 * * 0" # Runs once every Sunday

jobs:
  check-links:
    runs-on: ubuntu-latest
    permissions:
      contents: read

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      # Set up caching to reduce API requests
      - name: Restore lychee cache
        uses: actions/cache@v4
        with:
          path: .lycheecache
          key: cache-lychee-${{ github.sha }}
          restore-keys: cache-lychee-

      - name: Setup link exclusion patterns (optional)
        id: setup-exclude
        run: |
          if [ -f .lycheeignore ]; then
            echo "Exclusion patterns found in .lycheeignore"
          else
            echo "# Add URL regex patterns to exclude, one per line" > .lycheeignore
            echo "# Example: ^https://example.com" >> .lycheeignore
          fi

      - name: Link Checker
        id: lychee
        uses: lycheeverse/lychee-action@v2
        env:
          GITHUB_TOKEN: ${{ github.token }}
        with:
          args: >-
            --cache
            --max-cache-age 48h
            --verbose
            --no-progress
            --exclude-path ".git"
            --exclude-path "node_modules"
            --max-retries 5
            --timeout 30
            --max-concurrency 8
            --retry-wait-time 3
            './**/*.md'
            './**/*.html'
            './**/*.txt'
          fail: true
          format: markdown
          output: ./lychee-report.md

      # If you want to post check results as PR comments, uncomment the following step
      # - name: Create Comment
      #   uses: peter-evans/create-or-update-comment@v3
      #   if: github.event_name == 'pull_request' && steps.lychee.outputs.exit_code != 0
      #   with:
      #     issue-number: ${{ github.event.pull_request.number }}
      #     body-file: ./lychee-report.md
34 changes: 34 additions & 0 deletions .lycheeignore
@@ -0,0 +1,34 @@
# URL patterns to exclude from checking, one regex pattern per line
# These links will be ignored by lychee

# Example domains
^https?://example\.com
^https?://example\.org

# Common temporary URLs or local development URLs
^https?://localhost
^https?://127\.0\.0\.1
^https?://0\.0\.0\.0

# Social media links (often have anti-scraping measures that may cause checks to fail)
^https?://(www\.)?linkedin\.com
^https?://(www\.)?twitter\.com
^https?://(www\.)?facebook\.com

# Files that may have restricted access
\.pdf$

# Local file paths that exist in production but not in CI environment
file:///home/runner/work/eclipse-edc.github.io/eclipse-edc.github.io/content/en/images/edc.schematic.svg
# Exclude all local SVG files as they may be processed during build
file://.*\.svg$
# Exclude content directory files which may be generated during build
file://.*?/content/.*

# GitHub specific patterns to reduce API rate limiting
# These patterns are specifically for repositories that frequently cause 429 errors
^https?://github\.com/git/git/blob/
^https?://raw\.githubusercontent\.com/git/
^https?://api\.github\.com/

# Add project-specific URL patterns to exclude here