-
Notifications
You must be signed in to change notification settings - Fork 9
ci: Add script &CI to check dead links #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
PTAL @ndr-brt |
added myself as a reviewer, could you put the PR in draft until it will be effectively ready for review? So I will be more proactive in reviewing it ;) EDIT: checks need to be green before having a review |
CI failure is caused by dead links
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failure is caused by dead links
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/ Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/dataFeed Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedFrequency Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedName Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedType Error: Dead link: http://w3id.org/starwars/context.jsonld Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/ Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/faction Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/name Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/person Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/webpage Error: Dead link: https://w3id.org/cx/v0.8/ Error: Dead link: https://w3id.org/idsa/v4.1/HTTP Error: Dead link: https://w3id.org/starwars/v0.0.1/ns/faction Error: Dead link: https://w3id.org/tractusx-trust/v0.8
these are not proper links, these are jsonld namespaces and they don't need to be checked (as any url in a code block I'd say)
Error: Dead link: https://identity.foundation/ Error: Dead link: https://identity.foundation/decentralized-web-node/spec/ Error: Dead link: https://identity.foundation/didcomm-messaging/spec/ Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/ Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/#presentation-definition Error: Dead link: https://identity.foundation/presentation-exchange/submission/v1
these are proper links, why are they listed?
The result returned by testing identity.foundation is 403, I don't know what's causing this. |
This is the latest list.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deep diving into this, I'm usually in favor of home made solutions, but this looks fairly over-complicated.
Why don't rely on already existing tools? With a brief search I found some of them, like lychee (it also has a dedicated github action). I would consider those, because I don't know how much effort we want to put in the future in maintaining this script.
Plus, the exclude_patterns.txt
it contains strings that are not "links", they are only url in the pages, and they shouldn't be taken into consideration at first
Agree. I had a look for an existing tool and found an GitHub Action that uses Linkspector and Reviewdog. I created an example on how this might be used. The action is triggered at PR creation and displays the errors in the workflow log and in the PR itself. There are further configuration options that might be useful (e.g. to limit to specific directories). |
Replace the custom curl-based link checking with the mainstream lychee-based solution. - Add PR-triggered workflow using lychee-action - Create scheduled workflow that creates issues for broken links - Add .lycheeignore file for URL exclusion patterns - Update README with link checking documentation The lychee tool is more efficient and reliable than the previous solution, while providing additional features like caching and exclusion rules.
- Restore README to original state - Convert all comments to English in workflow files - Update the format of .lycheeignore file
The exclusion patterns are now handled by .lycheeignore file in the new lychee-based implementation
- Add patterns to .lycheeignore for SVG and content directory files - Update workflow configurations to handle local file paths better - Ensure consistent settings between PR and scheduled workflows
- Create alternative workflow using markdown-link-check tool - Configure with equivalent ignore patterns - Set as manually triggered workflow for comparison testing
- Remove markdown-link-check alternative implementation - Optimize lychee configuration to handle GitHub API rate limits - Add caching to PR workflow and increase cache time to 48h - Add specific GitHub patterns to .lycheeignore to reduce API requests - Increase timeout, retries and wait time between requests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the PR and identified an issue we are jumping into, especially within this repo. As Hugo is doing some rendering work for us, this unfortunately causes many links to be identified as broken by the GitHub Actions, which actually are not.
This is not only an issue from this PR! It is also not lychee or linespector (same issue)! Scripts are following their natural purpose and find a lot of links that are broken and they are from the plain source files. However, they are not broken after rendering of the website's resources by Hugo.
For example, the Actions lists the logos in the root _index.md as not found
with src="/static/images/logos/huawei.logo.svg"
, which is correct because the logos are in __/static/__images/logos/
. The rendered website displays the images without any issue.
Another example is content/en/documentation/for-contributors/control-plane/_index.md
, where links to entity.md
are identified as not found
. On the website entity.md is rendered by Hugo as there is the sub-directory ./entity
, which includes all linked files and resources. From the website, again links are working fine as ./entity/_index.md
is rendered into entity.md
..
I guess we need to find another approach of checking the links for this repository. Here such actions would just be usable for identifying links to other websites. With this approach, we are not really providing any support for contributors...
Any idea?
- Update workflows to build Hugo site before checking links - Split link checks into external and generated site checks - Check external links in source files - Check local links in the generated Hugo site - Combine reports for better analysis - Fix the problem with Hugo rendering paths differently than source files
- Remove Hugo build steps due to template errors - Focus on checking external URLs only (HTTP/HTTPS) - Add Node.js setup for theme dependencies - Specify stable Hugo version to avoid breaking changes - Add diagnostic steps to check Hugo configuration
1. Update lychee configuration to fix API rate limit issues \n2. Enhance .lycheeignore file to handle Hugo path inconsistencies \n3. Optimize GitHub workflow configurations \n4. Add documentation
I've made some changes.
|
@liugddx thanks a lot for the updates! I'll check asap to get the PR finally merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how far are we to merge this? the implementation looks good enough
@@ -0,0 +1,53 @@ | |||
# Broken Links Checking Workflows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation should stay in the docs
folder.
in addition, I don't think such detail in documenting workflows will pay off: they change often and we cannot expect people to update documentation every time. Personally I would discard it completely, better to add just some meaningful comments in the workflows directly
name: Scheduled Broken Links Check | ||
|
||
on: | ||
# Run on schedule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove obvious comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to keep two different workflows? they do pretty much the same thing, let's refactor them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the .lycheeignore file should document itself, this documentation is totally redundant
What this PR changes/adds
Why it does that
Further notes
Linked Issue(s)
Closes #39