A crawler for hardware design projects.
(In reality, it is more of a scraper.)
Back in alpha state! Please do not use it for anything other than experiments yet!
It leverages the distributed-ready Open Know-How (OKH) standard/ontology.
We currently search hardware projects on:
- Appropedia
- OSHWA
- Thingiverse
- Custom git repos containing OKH manifests
- Recursive lists of manifest URLs (alpha stage)
We also support projects hosted on these platforms, but we do not search on them, because they intentionally cripple their search APIs to make this infeasible.
The project's raw data is stored as a file. This is either the contents of an OKH meta-data (aka manifest) file, or whatever the hosting platform's API (or alternatively its web interface) provides about the project. The file might already be in the OKH format if the platform supplies that; more likely, it is in JSON. Next to it is another file (always in JSON) containing the scraping meta-data, like the source URL and crawling date.
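The two-file layout described above can be pictured with a minimal Python sketch. It is an illustration only, not the crawler's actual code: the function name, the file naming scheme (`<name>.<ext>` plus `<name>.meta.json`) and the meta-data keys `source_url` and `crawled_at` are all assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def store_raw_project(
    workdir: Path, name: str, raw: str, raw_ext: str, source_url: str
) -> tuple[Path, Path]:
    """Write the raw project data and a JSON side-car with scraping meta-data.

    Hypothetical helper: names and file layout are assumptions.
    """
    workdir.mkdir(parents=True, exist_ok=True)

    # The raw data is stored as-is, e.g. as JSON from a platform API
    # or as an OKH manifest if the platform supplies one.
    raw_path = workdir / f"{name}.{raw_ext}"
    raw_path.write_text(raw, encoding="utf-8")

    # The side-car file is always JSON and records where and when
    # the raw data was obtained.
    meta = {
        "source_url": source_url,
        "crawled_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path = workdir / f"{name}.meta.json"
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")

    return raw_path, meta_path
```

Keeping the scraping meta-data in a separate file means the raw data stays byte-for-byte what the platform delivered.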
In a second step, all these files are parsed, analyzed, sanitized and, if necessary, converted, yielding a manifest file in the TOML format that follows the latest OKH specification. This file is then converted to OKH RDF data, which is also stored in a file.
These files may then be hosted as-is in a git repo, or fed into an RDF Triple-Store like Apache Jena, OxiGraph or neo4j. From there, one can query the data through SPARQL, or access it through a search web-UI.
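As a rough illustration of the SPARQL route, the sketch below builds a query URL following the SPARQL 1.1 Protocol (query passed as the `query` parameter of a GET request). The endpoint URL is an assumption (a triple-store running locally with a dataset named `okh`), and the query deliberately lists arbitrary triples rather than guessing at OKH ontology terms.

```python
from urllib.parse import urlencode

# Assumption: a triple-store running locally that exposes a SPARQL
# endpoint at this URL; adjust to your own setup.
ENDPOINT = "http://localhost:3030/okh/sparql"

# A generic query that lists the first ten triples in the store.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return the URL for a SPARQL query sent via HTTP GET."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


if __name__ == "__main__":
    print(build_sparql_get_url(ENDPOINT, QUERY))
```

The resulting URL can be fetched with any HTTP client; sending the header `Accept: application/sparql-results+json` asks the store for JSON results.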
The project requires the following:
- Python >= 3.10
- Poetry
Once you have poetry installed, you can install the project locally by running the following commands in the repository dir (where the pyproject.toml file is located):
poetry install
eval $(poetry env activate)
Within that shell, you should then have krawl in your PATH. You can verify that with krawl --help.
The application can be used with Docker, which you may need to install first. You first need to acquire the docker image, in either of these ways:
- Build: To build the Docker image yourself, run this in the local dir of the sources of this project:
docker build --tag iopa/okh-krawler:latest .
- Download: Download the latest build of the Docker image with:
docker pull iopa/okh-krawler:latest
Then you can execute:
# display help
docker run --rm --user $UID iopa/okh-krawler:latest --help
# mount config and run some command
docker run --rm --user $UID \
--volume "$PWD/config.yml:/opt/krawler/config.yml" \
--volume "$PWD/workdir:/opt/krawler/workdir" \
iopa/okh-krawler:latest \
fetch url \
--config config.yml \
-vv \
"https://github.com/iop-alliance/OpenKnowHow/blob/master/res/sample_data/okh-OHLOOM.toml"
The application has a convenient CLI that also explains itself when you pass the --help flag to it.
Here is a quick overview of the currently available commands (make sure you are within the poetry shell when executing these):
Command | Description |
---|---|
krawl convert <from> <to> | Convert a manifest file from one format into another. Supported formats: YAML (.yml, .yaml), TOML (.toml), RDF (.ttl) |
krawl fetch url <url1> [<urlN>...] | Download and process metadata of selected projects. |
krawl fetch <platform> | Search for projects on a given platform and download and process their metadata. |
krawl list fetchers | List available fetchers that can be used in the krawl fetch command. |
krawl validate config <file> | Validate a given configuration file. |
krawl validate manifest <file> | Validate a given manifest file. |
Examples:
# fetch selected projects
krawl fetch url -c config.yml -vv "https://github.com/iop-alliance/OpenKnowHow/blob/master/res/sample_data/okh-sample-OHLOOM.toml"
krawl fetch url -c config.yml -vv "https://www.thingiverse.com/thing:3062487"
# search and fetch all projects of a platform
krawl fetch github.com -c config.yml -vv --report report.txt
# convert a local manifest file
krawl convert -vv project.yml project.ttl
# validate a local manifest file
krawl validate manifest -vv project.yml
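The convert command takes only file names, so the format is presumably inferred from the file extension. Below is a minimal sketch of such a lookup, using the extensions from the command table. The function and constant names are hypothetical and not taken from the krawl sources.

```python
from pathlib import Path

# Extensions as listed for `krawl convert`; the mapping itself is a sketch.
FORMAT_BY_EXTENSION = {
    ".yml": "YAML",
    ".yaml": "YAML",
    ".toml": "TOML",
    ".ttl": "RDF",
}


def detect_format(file_name: str) -> str:
    """Return the manifest format implied by the file extension."""
    ext = Path(file_name).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported manifest extension: {ext!r}") from None
```

With this mapping, `project.yml` resolves to YAML and `project.ttl` to RDF, matching the conversion example above.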
A sample configuration file with explanations can be found in sample-config.yml.
Make sure you have the latest version of the bump-my-version tool installed:
pip install --upgrade bump-my-version
# ... or if your system does not allow this, use:
# pipx install bump-my-version
Then, to see what would happen without actually doing anything, try these commands:
bump-my-version bump patch --dry-run -v --allow-dirty
bump-my-version bump minor --dry-run -v --allow-dirty
bump-my-version bump major --dry-run -v --allow-dirty
bump-my-version bump --dry-run -v --allow-dirty --new-version 3.0.0
bump-my-version bump --help
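For readers unfamiliar with the patch/minor/major terminology, the sketch below shows the conventional semantic-versioning arithmetic behind these three parts. It is not bump-my-version's implementation, and the tool's real behavior depends on the project's configuration; the `bump` function is a hypothetical helper for illustration.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version string.

    Bumping a part resets all parts to its right to zero.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, "->", bump("2.4.1", part))
```

Starting from a hypothetical version 2.4.1, a patch bump gives 2.4.2, a minor bump gives 2.5.0, and a major bump gives 3.0.0.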
To actually perform a release, which commits changes to files and creates a git tag, choose one of these commands:
bump-my-version bump patch -v
bump-my-version bump minor -v
bump-my-version bump major -v
bump-my-version bump -v --new-version 3.0.0
And then push the changes; done!