ci: Add script & CI to check dead links #45
Open · liugddx wants to merge 17 commits into eclipse-edc:main from liugddx:ci-1
Commits (17, all by liugddx):

- 90b83b3 fix dead links
- 3225cf3 fix dead links
- 9d2cf21 fix dead links
- dd4139c fix dead links
- 964260d fix dead links
- a035d2a fix dead links
- 8539dde fix dead links
- 8f904d5 Update exclude_patterns.txt
- a6fbaed Improve broken link checking workflow
- 4c833f6 Update link checking workflow and remove README changes
- ec114ff Remove obsolete exclude_patterns.txt
- 5e3a74d Fix local file link checking issues
- 6228070 Add alternative markdown-link-check workflow
- 822a02d Remove alternative workflow and optimize GitHub API rate limits
- bd3aa05 Fix Hugo rendering issues in link checking
- 8c44ee4 Simplify link checking to focus on external URLs only
- 0eb8fac Improve broken link detection workflows
New file (GitHub Actions workflow):

```yaml
name: Check Dead Links

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  check-links:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Verify curl installation
        run: curl --version

      - name: Extract and clean URLs from all documentation
        id: extract_urls
        run: |
          # A URL ends at whitespace or at ) " ' < ` : ,
          # (':' is excluded, so a port and everything after it is cut off).
          REGEX='https?://[^\s)"'"'"'<`:,]+'

          # Collect URLs from Markdown, HTML and plain-text files.
          find . \( -name "*.md" -o -name "*.html" -o -name "*.txt" \) -type f -print0 | \
            xargs -0 grep -oPh "$REGEX" > urls.txt || true

          sort -u urls.txt -o urls.txt

          echo "Total URLs found: $(wc -l < urls.txt)"

          # Each non-blank line of exclude_patterns.txt is an extended regex;
          # blank lines are skipped because an empty alternative matches every URL.
          EXCLUDE_REGEX=""
          if [ -f exclude_patterns.txt ]; then
            EXCLUDE_REGEX=$(grep -v '^[[:space:]]*$' exclude_patterns.txt | paste -sd'|' -)
          fi

          if [ -n "$EXCLUDE_REGEX" ]; then
            # grep exits 1 when every URL is excluded; that is not an error here.
            grep -vE "$EXCLUDE_REGEX" urls.txt > filtered_urls.txt || true
          else
            echo "No exclusion patterns found. No URLs will be excluded."
            cp urls.txt filtered_urls.txt
          fi

          echo "Total URLs after exclusion: $(wc -l < filtered_urls.txt)"

          # Strip trailing punctuation, then de-duplicate again because
          # cleaning can turn two distinct entries into the same URL.
          sed -E 's/[".>,)]+$//' filtered_urls.txt | sort -u > cleaned_urls.txt
          mv cleaned_urls.txt filtered_urls.txt

          echo "Total URLs after cleaning: $(wc -l < filtered_urls.txt)"

          # Expose a step output so the following steps can be skipped when empty.
          if [ -s filtered_urls.txt ]; then
            echo "has_urls=true" >> "$GITHUB_OUTPUT"
          else
            echo "No URLs found to check after applying exclusions."
            echo "has_urls=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Print URLs to be checked
        if: steps.extract_urls.outputs.has_urls == 'true'
        run: |
          echo "===== URLs to be checked ====="
          cat filtered_urls.txt
          echo "=============================="

      - name: Check URLs using curl
        if: steps.extract_urls.outputs.has_urls == 'true'
        shell: bash
        run: |
          set +e

          TOTAL=0
          FAILED=0
          DEAD_LINKS=()

          while IFS= read -r url; do
            TOTAL=$((TOTAL + 1))
            echo "[$TOTAL] Checking URL: $url"

            # One request per URL: -w prints the status code and the final URL
            # after redirects. If no response arrives, curl prints "000".
            RESULT=$(curl -k -s -o /dev/null -L \
              -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)" \
              -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
              -H "Accept-Language: en-US,en;q=0.5" \
              -H "Connection: keep-alive" \
              --connect-timeout 60 --max-time 120 --retry 3 \
              -w "%{http_code} %{url_effective}" "$url")
            HTTP_STATUS=${RESULT%% *}
            HTTP_STATUS=${HTTP_STATUS:-000}
            FINAL_URL=${RESULT#* }

            echo "HTTP status for $url: $HTTP_STATUS"
            echo "Final URL after redirects: $FINAL_URL"

            if [[ "$HTTP_STATUS" == "000" || "$HTTP_STATUS" -ge 400 ]]; then
              echo "❌ Dead link found: $url (HTTP status: $HTTP_STATUS)"
              DEAD_LINKS+=("$url")
              FAILED=$((FAILED + 1))
            else
              echo "✅ Link is valid: $url (HTTP status: $HTTP_STATUS)"
            fi
          done < filtered_urls.txt

          echo "Total links checked: $TOTAL"
          echo "Dead links found: $FAILED"

          if [ "$FAILED" -ne 0 ]; then
            echo "::error::Found $FAILED dead links."
            for dead in "${DEAD_LINKS[@]}"; do
              echo "::error::Dead link: $dead"
            done

            printf "**Found %d dead links:**\n" "$FAILED" > dead_links.md
            for dead in "${DEAD_LINKS[@]}"; do
              # "--" stops printf from parsing the leading "-" as an option.
              printf -- "- %s\n" "$dead" >> dead_links.md
            done

            cat dead_links.md

            exit 1
          else
            echo "All $TOTAL links are valid."
          fi
```
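The URL extraction and cleanup can also be kept as a standalone function, which makes it easy to try the pattern on sample text outside CI. This is a minimal sketch, assuming GNU grep built with `-P` support (as on `ubuntu-latest`); the function name `extract_urls` is hypothetical. It uses the same URL pattern and trailing-punctuation cleanup as the workflow and de-duplicates after cleaning, so `https://a.example/x` and `https://a.example/x.` collapse into one entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract_urls reads text on stdin and prints one
# cleaned, de-duplicated URL per line (assumes GNU grep with -P).
extract_urls() {
  # Same pattern as the workflow: a URL ends at whitespace or at ) " ' < ` : ,
  # so a port such as ":8080" and everything after it is cut off.
  local regex='https?://[^\s)"'"'"'<`:,]+'

  # grep exits 1 when nothing matches; "|| true" keeps that from being an error.
  { grep -oP "$regex" || true; } \
    | sed -E 's/[".>,)]+$//' \
    | LC_ALL=C sort -u
}
```

For example, `printf '%s\n' 'See https://example.org/docs.' | extract_urls` prints `https://example.org/docs`.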
New file (URL exclusion patterns, one per line):

```text
<HOST>
http://callback/url
http://control-plane
http://custom-dataplane-host:3000/dataflows
http://example.com/asset:12345
localhost
provider-address
http://provider/api/dsp
controlplane-host
dataplane-host
example.com
death.star
```
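If this file is the `exclude_patterns.txt` read by the workflow, its lines are joined with `|` and passed to `grep -vE`, so each line is an extended regular expression (the `.` in `example.com` matches any character), and a blank line would add an empty alternative that matches, and therefore drops, every URL. A minimal sketch of a filter that guards against blank lines and a missing file, assuming bash and GNU grep; the function name `filter_excluded` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter_excluded PATTERN_FILE < urls > kept_urls
# Drops every URL read from stdin that matches any extended regex in PATTERN_FILE.
filter_excluded() {
  local patterns_file="$1"
  local exclude_regex=""

  # Join the non-blank lines with '|'; skipping blank lines avoids an empty
  # alternative, which would match every URL.
  if [ -f "$patterns_file" ]; then
    exclude_regex="$(grep -v '^[[:space:]]*$' "$patterns_file" | paste -sd'|' -)" || true
  fi

  if [ -z "$exclude_regex" ]; then
    cat            # no usable patterns: pass every URL through
    return 0
  fi

  # grep exits 1 when every URL is excluded; only exit 2 (a real error) fails.
  grep -vE -- "$exclude_regex" || [ $? -eq 1 ]
}
```

In the workflow this could stand in for the `paste`/`grep -vE` pair, e.g. `filter_excluded exclude_patterns.txt < urls.txt > filtered_urls.txt`.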
Review comment: Why do we need to keep two different workflows? They do pretty much the same thing; let's refactor them.
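One way to act on this comment, assuming both workflows ultimately need the same per-URL check, is to move that check into a single sourced shell script and have one workflow call it. This is a minimal sketch, assuming bash and curl; the helper names `is_dead_status`, `check_url` and `check_all` and the timeout values are hypothetical. It sends one request per URL, reads both the status code and the final URL from a single `-w` format string, and treats curl's `000` (no response) and any 4xx/5xx status as dead, the same rule the workflow applies. The browser-style `-A`/`-H` headers from the workflow are omitted for brevity and could be added as extra curl arguments.

```shell
#!/usr/bin/env bash
# Hypothetical shared helpers that more than one workflow could source.

# Succeeds (exit 0) when a status means "dead": curl's "000" (no HTTP
# response), any 4xx/5xx code, or output that is not a 3-digit code.
is_dead_status() {
  local status="$1"
  [[ "$status" =~ ^[0-9]{3}$ ]] || return 0
  # 10# forces base 10 so "000" is not read as an octal literal.
  (( 10#$status == 0 || 10#$status >= 400 ))
}

# Prints "<status> <final-url>" for one URL using a single curl request.
check_url() {
  local url="$1" out
  # On connection failure curl still writes "000" for %{http_code}, so the
  # exit code is ignored and the printed status decides the outcome.
  out="$(curl -s -o /dev/null -L --connect-timeout 30 --max-time 60 --retry 3 \
          -w '%{http_code} %{url_effective}' "$url")" || true
  printf '%s\n' "${out:-000 $url}"
}

# Reads URLs on stdin, reports dead ones, and fails if any were found.
check_all() {
  local url status final failed=0
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    read -r status final <<< "$(check_url "$url")"
    if is_dead_status "$status"; then
      printf 'Dead link: %s (HTTP %s, final URL %s)\n' "$url" "$status" "$final"
      failed=$((failed + 1))
    fi
  done
  echo "Dead links found: $failed"
  [ "$failed" -eq 0 ]
}
```

A single workflow step could then run something like `source scripts/check-links.sh && check_all < filtered_urls.txt`, where the script path is hypothetical.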