
ci: Add script &CI to check dead links #45


Open
liugddx wants to merge 17 commits into main

Conversation

liugddx
Contributor

@liugddx liugddx commented Dec 19, 2024

What this PR changes/adds

Why it does that

Further notes

Linked Issue(s)

Closes #39

@liugddx liugddx closed this Dec 24, 2024
@liugddx liugddx reopened this Dec 24, 2024
@liugddx
Contributor Author

liugddx commented Dec 24, 2024

PTAL @ndr-brt

@liugddx liugddx closed this Dec 24, 2024
@liugddx liugddx reopened this Dec 24, 2024
@ndr-brt ndr-brt self-requested a review January 8, 2025 13:49
@ndr-brt
Member

ndr-brt commented Jan 8, 2025

I added myself as a reviewer. Could you put the PR in draft until it is actually ready for review? That way I can be more proactive in reviewing it ;)

EDIT: checks need to be green before having a review

@liugddx liugddx marked this pull request as draft January 8, 2025 14:00
@liugddx liugddx marked this pull request as ready for review February 10, 2025 11:22
@liugddx
Contributor Author

liugddx commented Feb 10, 2025

The CI failure is caused by dead links:

Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/dataFeed
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedFrequency
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedName
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedType
Error: Dead link: http://w3id.org/starwars/context.jsonld
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/faction
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/name
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/person
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/webpage
Error: Dead link: https://app.swaggerhub.com/home?type=API
Error: Dead link: https://docs.gradle.org/current/samples/sample_jvm_multi_project_with_code_coverage.html
Error: Dead link: https://docs.sonarqube.org/latest/analysis/github-integration/
Error: Dead link: https://foo-industries.com/subcatalog
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/.github/PULL_REQUEST_TEMPLATE.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/.github/workflows/close-inactive-issues.yml
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/CONTRIBUTING.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/contribution_categories.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/docs/developer/data-plane-signaling/data-plane-signaling-token-handling.md#2-updates-to-thedataaddress-format
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/known_friends.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/pr_etiquette.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/styleguide.md
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/.github/ISSUE_TEMPLATE
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/docs/legal
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/docs/templates
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/extensions/common/api/management-api-json-ld-context
Error: Dead link: https://github.com/eclipse-edc/Samples/blob/main/transfer/transfer-04-open-telemetry/README.md
Error: Dead link: https://github.com/orgs/eclipse-edc/projects/3
Error: Dead link: https://identity.foundation/
Error: Dead link: https://identity.foundation/decentralized-web-node/spec/
Error: Dead link: https://identity.foundation/didcomm-messaging/spec/
Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/
Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/#presentation-definition
Error: Dead link: https://identity.foundation/presentation-exchange/submission/v1
Error: Dead link: https://oss.sonatype.org/content/repositories/snapshots/
Error: Dead link: https://w3id.org/cx/v0.8/
Error: Dead link: https://w3id.org/idsa/v4.1/HTTP
Error: Dead link: https://w3id.org/starwars/v0.0.1/ns/faction
Error: Dead link: https://w3id.org/tractusx-trust/v0.8

@liugddx
Contributor Author

liugddx commented Feb 10, 2025

image

Member

@ndr-brt ndr-brt left a comment

CI failure is caused by dead links

Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/dataFeed
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedFrequency
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedName
Error: Dead link: http://w3id.org/market-systems/v0.0.1/ns/feedType
Error: Dead link: http://w3id.org/starwars/context.jsonld
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/faction
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/name
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/person
Error: Dead link: http://w3id.org/starwars/v0.0.1/ns/webpage
Error: Dead link: https://w3id.org/cx/v0.8/
Error: Dead link: https://w3id.org/idsa/v4.1/HTTP
Error: Dead link: https://w3id.org/starwars/v0.0.1/ns/faction
Error: Dead link: https://w3id.org/tractusx-trust/v0.8

These are not proper links; they are JSON-LD namespaces and don't need to be checked (the same goes for any URL in a code block, I'd say).
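The point about skipping URLs in code blocks could be sketched roughly as follows. This is a minimal illustration, not the PR's script: `extract_prose_urls` and the regular expressions are hypothetical, and the fence detection is deliberately simple (it assumes well-formed, non-nested fences).

```python
import re

# Hypothetical helper, not part of the PR's script: collect URLs from
# Markdown prose while skipping fenced code blocks and inline code spans.
URL_RE = re.compile(r"https?://[^\s)>\"'`]+")
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})")
INLINE_CODE_RE = re.compile(r"`[^`]*`")


def extract_prose_urls(markdown: str) -> list:
    urls = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE_RE.match(line):
            # Toggle on every opening or closing fence line.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # Drop inline code spans before searching for URLs.
        prose = INLINE_CODE_RE.sub("", line)
        urls.extend(URL_RE.findall(prose))
    return urls
```

With a filter like this, a JSON-LD namespace that only appears inside a code sample would never reach the checker, while a link in running text still would.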

Error: Dead link: https://identity.foundation/
Error: Dead link: https://identity.foundation/decentralized-web-node/spec/
Error: Dead link: https://identity.foundation/didcomm-messaging/spec/
Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/
Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/#presentation-definition
Error: Dead link: https://identity.foundation/presentation-exchange/submission/v1

These are proper links. Why are they listed?

@liugddx
Contributor Author

liugddx commented Feb 11, 2025

image

Requests to identity.foundation return HTTP 403; I don't know what's causing this.
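The cause of the 403 is not established in this thread. One common cause (an assumption here, not verified for identity.foundation) is a server refusing automated clients. A minimal sketch of a checker policy that reports such responses separately from dead links; `LinkStatus`, `REFUSED_CODES`, and `classify` are hypothetical names, not part of the PR's script or of any existing tool:

```python
from enum import Enum


class LinkStatus(Enum):
    OK = "ok"
    DEAD = "dead"
    UNVERIFIED = "unverified"


# Assumption: these codes usually mean "the server refused this client",
# not "the page is gone", so they are reported separately.
REFUSED_CODES = {401, 403, 429}


def classify(status_code: int) -> LinkStatus:
    """Map an HTTP status code to a link-check verdict."""
    if 200 <= status_code < 400:
        return LinkStatus.OK
    if status_code in REFUSED_CODES:
        return LinkStatus.UNVERIFIED
    return LinkStatus.DEAD
```

Under this policy the identity.foundation URLs would show up as "unverified" for a human to look at, rather than failing the build as dead links.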

@liugddx
Contributor Author

liugddx commented Feb 11, 2025

This is the latest list.

Error: Dead link: https://docs.gradle.org/current/samples/sample_jvm_multi_project_with_code_coverage.html
Error: Dead link: https://docs.sonarqube.org/latest/analysis/github-integration/
Error: Dead link: https://foo-industries.com/subcatalog
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/.github/PULL_REQUEST_TEMPLATE.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/.github/workflows/close-inactive-issues.yml
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/CONTRIBUTING.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/contribution_categories.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/docs/developer/data-plane-signaling/data-plane-signaling-token-handling.md#2-updates-to-thedataaddress-format
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/known_friends.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/pr_etiquette.md
Error: Dead link: https://github.com/eclipse-edc/Connector/blob/main/styleguide.md
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/.github/ISSUE_TEMPLATE
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/docs/legal
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/docs/templates
Error: Dead link: https://github.com/eclipse-edc/Connector/tree/main/extensions/common/api/management-api-json-ld-context
Error: Dead link: https://github.com/eclipse-edc/Samples/blob/main/transfer/transfer-04-open-telemetry/README.md
Error: Dead link: https://github.com/orgs/eclipse-edc/projects/3
Error: Dead link: https://identity.foundation/
Error: Dead link: https://identity.foundation/decentralized-web-node/spec/
Error: Dead link: https://identity.foundation/didcomm-messaging/spec/
Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/
Error: Dead link: https://identity.foundation/presentation-exchange/spec/v2.0.0/#presentation-definition
Error: Dead link: https://identity.foundation/presentation-exchange/submission/v1
Error: Dead link: https://oss.sonatype.org/content/repositories/snapshots/

Member

@ndr-brt ndr-brt left a comment

Having dug deeper into this: I'm usually in favor of home-made solutions, but this looks fairly over-complicated.
Why not rely on existing tools? A brief search turned up a few, like lychee (it also has a dedicated GitHub Action). I would consider those, because I don't know how much effort we want to put into maintaining this script in the future.

Plus, exclude_patterns.txt contains strings that are not "links"; they are just URLs that appear in the pages, and they shouldn't be taken into consideration in the first place.

@mspiekermann
Contributor

I would consider those, because I don't know how much effort we want to put in the future in maintaining this script.

Agreed. I looked for an existing tool and found a GitHub Action that uses Linkspector and Reviewdog. I created an example of how this might be used. The action is triggered at PR creation and displays the errors in the workflow log and in the PR itself. There are further configuration options that might be useful (e.g. to limit checks to specific directories).

liugddx added 3 commits May 2, 2025 18:19
Replace the custom curl-based link checking with the mainstream lychee-based solution.

- Add PR-triggered workflow using lychee-action

- Create scheduled workflow that creates issues for broken links

- Add .lycheeignore file for URL exclusion patterns

- Update README with link checking documentation

The lychee tool is more efficient and reliable than the previous solution, while providing additional features like caching and exclusion rules.
- Restore README to original state

- Convert all comments to English in workflow files

- Update the format of .lycheeignore file
The exclusion patterns are now handled by .lycheeignore file in the new lychee-based implementation
liugddx added 3 commits May 2, 2025 18:34
- Add patterns to .lycheeignore for SVG and content directory files

- Update workflow configurations to handle local file paths better

- Ensure consistent settings between PR and scheduled workflows
- Create alternative workflow using markdown-link-check tool

- Configure with equivalent ignore patterns

- Set as manually triggered workflow for comparison testing
- Remove markdown-link-check alternative implementation

- Optimize lychee configuration to handle GitHub API rate limits

- Add caching to PR workflow and increase cache time to 48h

- Add specific GitHub patterns to .lycheeignore to reduce API requests

- Increase timeout, retries and wait time between requests
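The idea behind "retries and wait time between requests" in the commit above can be illustrated with a small sketch. It does not reproduce lychee's actual behavior; `check_with_retries`, its parameters, and the choice of retrying on 429 and 5xx are assumptions made for illustration. The `fetch_status` callable stands in for a real HTTP request so the logic can be exercised offline.

```python
import time
from typing import Callable


def check_with_retries(
    fetch_status: Callable[[str], int],
    url: str,
    max_retries: int = 3,
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the last status code seen, retrying on 429 and 5xx responses."""
    status = fetch_status(url)
    for attempt in range(max_retries):
        if status != 429 and status < 500:
            break
        # Wait a little longer after each failed attempt.
        sleep(wait_seconds * (attempt + 1))
        status = fetch_status(url)
    return status
```

Passing `sleep` as a parameter keeps the function testable without real delays; a rate-limited endpoint that recovers after a short pause is then reported with its final status instead of as a dead link.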
@ndr-brt ndr-brt requested a review from mspiekermann May 5, 2025 09:07
Contributor

@mspiekermann mspiekermann left a comment

I checked the PR and identified an issue we are running into, especially within this repo. Because Hugo does some rendering work for us, many links are flagged as broken by the GitHub Actions even though they actually are not.

This is not only an issue with this PR! Nor is it specific to lychee or Linkspector (same issue)! The scripts do what they are meant to do: they find many links that are broken in the plain source files. However, those links are not broken once Hugo has rendered the website's resources.

For example, the Action lists the logos in the root _index.md as not found with src="/static/images/logos/huawei.logo.svg", which is correct because the logos are in /static/images/logos/. The rendered website displays the images without any issue.

Another example is content/en/documentation/for-contributors/control-plane/_index.md, where links to entity.md are reported as not found. On the website, entity.md is rendered by Hugo because there is a sub-directory ./entity that contains all linked files and resources. On the website the links again work fine, as ./entity/_index.md is rendered into entity.md.
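The entity.md case described above could be handled by a source-level checker that accepts the section form of a link as well as the literal path. The sketch below is a hypothetical rule written for illustration; it is not Hugo's actual resolution algorithm, and `resolve_local_link` does not exist in the PR or in any of the tools discussed.

```python
from pathlib import PurePosixPath
from typing import Optional, Set


def resolve_local_link(source: str, target: str, files: Set[str]) -> Optional[str]:
    """Return the source file a relative link most plausibly points to.

    Hypothetical rule: besides the literal path, accept the section form
    <name>/_index.md for a link written as <name>.md.
    """
    base = PurePosixPath(source).parent
    literal = base / target
    candidates = [literal, literal.with_suffix("") / "_index.md"]
    for candidate in candidates:
        if str(candidate) in files:
            return str(candidate)
    return None
```

Given the set of files in the repository, a link to entity.md from control-plane/_index.md would resolve to control-plane/entity/_index.md instead of being reported as missing. Building the site first and checking the rendered output avoids guessing such rules altogether.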

I guess we need to find another approach to checking the links for this repository. Here, such actions would only be usable for checking links to other websites. With this approach, we are not really providing any support for contributors...

Any idea?

@mspiekermann mspiekermann requested a review from ndr-brt May 10, 2025 18:09
liugddx added 3 commits May 11, 2025 11:29
- Update workflows to build Hugo site before checking links

- Split link checks into external and generated site checks

- Check external links in source files

- Check local links in the generated Hugo site

- Combine reports for better analysis

- Fix the problem with Hugo rendering paths differently than source files
- Remove Hugo build steps due to template errors

- Focus on checking external URLs only (HTTP/HTTPS)

- Add Node.js setup for theme dependencies

- Specify stable Hugo version to avoid breaking changes

- Add diagnostic steps to check Hugo configuration
1. Update lychee configuration to fix API rate limit issues
2. Enhance .lycheeignore file to handle Hugo path inconsistencies
3. Optimize GitHub workflow configurations
4. Add documentation
@liugddx
Contributor Author

liugddx commented May 11, 2025

I've made some changes.

  • Focused only on external links (HTTP/HTTPS)
  • Improved caching configuration
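Restricting the check to external links amounts to keeping only absolute HTTP/HTTPS URLs. A minimal sketch of such a filter, assuming a hypothetical helper `is_external` (the actual PR delegates this to the link checker's own configuration):

```python
from urllib.parse import urlsplit


def is_external(link: str) -> bool:
    """True for absolute HTTP/HTTPS URLs; False for relative paths, anchors, mailto:, etc."""
    parts = urlsplit(link)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Relative links such as entity.md or /static/images/logos/huawei.logo.svg are filtered out, which sidesteps the Hugo rendering mismatch at the cost of leaving internal links unchecked.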

@mspiekermann
Contributor

@liugddx thanks a lot for the updates! I'll check asap to get the PR finally merged.

Member

@ndr-brt ndr-brt left a comment

How far are we from merging this? The implementation looks good enough.

@@ -0,0 +1,53 @@
# Broken Links Checking Workflows
Member

Documentation should live in the docs folder.

In addition, I don't think documenting workflows in such detail will pay off: they change often and we cannot expect people to update the documentation every time. Personally I would discard it completely; it's better to just add some meaningful comments directly in the workflows.

name: Scheduled Broken Links Check

on:
# Run on schedule
Member

please remove obvious comments

Member

why do we need to keep two different workflows? they do pretty much the same thing, let's refactor them

Member

the .lycheeignore file should document itself, this documentation is totally redundant

Development

Successfully merging this pull request may close these issues.

Add script &CI to check dead links
3 participants