Yardify the repository #13

Merged
merged 5 commits into from
May 16, 2024
2 changes: 1 addition & 1 deletion .gitattributes
@@ -1 +1 @@
marda_registry/data/lfs/*/*.* filter=lfs diff=lfs merge=lfs -text
yard/data/lfs/*/*.* filter=lfs diff=lfs merge=lfs -text
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,5 +1,5 @@
---
exclude: ^(marda_registry/data/)
exclude: ^(yard/data/)
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
23 changes: 12 additions & 11 deletions CONTRIBUTING.md
@@ -2,14 +2,14 @@
<img width="100px" align="center" src="https://avatars.githubusercontent.com/u/74017645?s=200&v=4">
</div>

# <div align="center">MaRDA Metadata Extractors: Registry</div>
# <div align="center">Datatractor Yard</div>
## <div align="center">Contributing guidelines</div>

Contributions to this registry are welcome. We pledge to follow the [Contributor Covenant Code of Conduct](https://www.contributor-covenant.org/version/2/1/code_of_conduct/).

You can contribute `FileType` and `Extractor` entries to this registry by creating pull requests (PRs), which should include YAML files under `./data/filetypes` and `./data/extractors`, respectively. These YAML files must follow the schemas at [marda-alliance/metadata_extractors_schema](https://github.com/marda-alliance/metadata_extractors_schema) (see the [online documentation](https://marda-alliance.github.io/metadata_extractors_schema) for more details).
You can contribute `FileType` and `Extractor` entries to this registry by creating pull requests (PRs), which should include YAML files under `./data/filetypes` and `./data/extractors`, respectively. These YAML files must follow the schemas at [datatractor/schemas](https://github.com/datatractor/schema) (see the [online documentation](https://datatractor.github.io/schema) for more details).

An example of the process can be seen in this [pull request](https://github.com/marda-alliance/metadata_extractors_registry/pull/54). The workflow steps are as follows:
An example of the process can be seen in this [pull request](https://github.com/marda-alliance/metadata_extractors_registry/pull/54) (in the previous iteration of this repository). The workflow steps are as follows:


### Opening a pull request
@@ -20,13 +20,13 @@ An example of the process can be seen in this [pull request](https://github.com/
- Please add the appropriate labels (`Extractor` and `FileType`).

### Example files
- If you are adding new `FileTypes` in this PR, add example files of those `FileTypes` using `git-lfs` into `marda_registry/data/lfs`. Do not commit example files into the repository directly. For example, to add a new file called `xmpl.tpe` (corresponding to the `FileType` with ID `example-type`) you would:
- If you are adding new `FileTypes` in this PR, add example files of those `FileTypes` using `git-lfs` into `yard/data/lfs`. Do not commit example files into the repository directly. For example, to add a new file called `xmpl.tpe` (corresponding to the `FileType` with ID `example-type`) you would:

``` bash
mkdir marda_registry/data/lfs/example-type # create the dir using FileType ID
cp xmpl.tpe marda_registry/data/lfs/example-type # copy the example file
mkdir yard/data/lfs/example-type # create the dir using FileType ID
cp xmpl.tpe yard/data/lfs/example-type # copy the example file
git lfs install # setup git-lfs, only necessary if you haven't used git-lfs before
git add marda_registry/data/lfs/example-type/* # track files
git add yard/data/lfs/example-type/* # track files
...
git lfs ls-files # check that your new example files are tracked before committing
git commit
@@ -40,14 +40,15 @@ An example of the process can be seen in this [pull request](https://github.com/
- `build`, which makes sure the new registry website can be built.

Therefore, it is mainly the `lint` and `validate` actions you will have to pay attention to. In case anything is unclear or you cannot find the error, ping one of the [Registry Maintainers](./README.md#registry-maintainers).
- You can of course validate your definitions locally, by using the [![Schema](https://badgen.net/static/marda-alliance/metadata_extractors_schema/?icon=github)](https://github.com/marda-alliance/metadata_extractors_schema/) package, see the [Usage section](https://marda-alliance.github.io/metadata_extractors_schema/main/usage.html) of the documentation.
- You can of course validate your definitions locally, by using the [![Schema](https://badgen.net/static/datatractor/schema/?icon=github)](https://github.com/datatractor/schema/) package, see the [Usage section](https://datatractor.github.io/schema/main/usage.html) of the documentation.

### Testing of Extraction
- You should check that your `yml` definitions can be used to install the `Extractor` and and extract any example files from the supported `FileTypes` locally, by using our [![Schema](https://badgen.net/static/marda-alliance/metadata_extractors_api/?icon=github)](https://github.com/marda-alliance/metadata_extractors_api/) package, see the [Installation](https://github.com/marda-alliance/metadata_extractors_api#installation) and [Usage](https://github.com/marda-alliance/metadata_extractors_api#usage) sections.
- You should check that your `yml` definitions can be used to install the `Extractor` and and extract any example files from the supported `FileTypes` locally, by using the datatractor [![beam](https://badgen.net/static/datatractor/beam/?icon=github)](https://github.com/datatractor/beam/) package, see the [Installation](https://github.com/datatractor/beam#installation) and [Usage](https://github.com/datatractor/beam#usage) sections.

> At the moment, the above CI is not checking whether the extraction of the example files actually works. See [this issue](https://github.com/marda-alliance/metadata_extractors_registry/issues/34) for further details.
> At the moment, the above CI is not checking whether the extraction of the example files actually works. See [this issue](https://github.com/datatractor/yard/issues/9) for further details.

### Approval and Deployment

- If you are a first-time contributor to the project, we will need to pre-approve your PR so that the CI can be run. Ping one of the [Registry Maintainers](./README.md#registry-maintainers).
- Once the CI passes, and all issues that are raised during the review are addressed, your PR can be merged.
- Once your PR is merged into the Registry, your new definitions should be available at the [Registry Website](https://marda-registry.fly.dev/). Cheers!
- Once your PR is merged into the Registry, your new definitions should be available at the [Registry Website](https://yard.datatractor.org/). Cheers!
8 changes: 5 additions & 3 deletions Dockerfile
@@ -1,5 +1,7 @@
from python:3.11-slim-buster

run apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*

env PORT=8000
workdir /app

@@ -9,11 +11,11 @@ copy CONTRIBUTING.md LICENSE /app/
copy pyproject.toml /app

copy schemas /app/schemas
copy marda_registry /app/marda_registry
copy yard /app/yard
copy README.md /app/

# Needed to grab the VCS version from git tags
run pip install .
run --mount=type=bind,target=/app/.git,source=.git pip install .
copy tasks.py /app

# Regenerate models from the current schemas
@@ -22,7 +24,7 @@ run invoke regenerate-models
# Validate all entries against the schema
run invoke validate-entries

cmd uvicorn marda_registry.app:app --host 0.0.0.0 --port ${PORT}
cmd uvicorn yard.app:app --host 0.0.0.0 --port ${PORT}

healthcheck --interval=5m --timeout=3s --start-period=10s \
cmd curl --fail http://localhost:${PORT}/api || exit 1
1 change: 0 additions & 1 deletion Procfile

This file was deleted.

4 changes: 2 additions & 2 deletions README.md
@@ -12,7 +12,7 @@

</div>

A place to develop and discuss the MaRDA Extractors WG registry.
A place to develop and discuss the Datatractor Yard (formerly the MaRDA Extractors WG registry).
The idea is to collect various file formats used in materials science and chemistry, describe them with metadata, and provide links to software projects that can parse them.

By providing this data in a web API, it hoped that users can discover new extractors more easily and metadata standards can be developed for the output of extractors to enable schemas to proliferate throughout the field.
@@ -42,7 +42,7 @@ invoke regenerate-models
From the repository root directory, launch the server with uvicorn:

```
uvicorn marda_registry.app:app
uvicorn yard.app:app
```

then navigate to http://localhost:5000 to test.
2 changes: 0 additions & 2 deletions marda_registry/__init__.py

This file was deleted.

15 changes: 0 additions & 15 deletions marda_registry/static/background.svg

This file was deleted.

15 changes: 6 additions & 9 deletions pyproject.toml
@@ -1,19 +1,19 @@
[build-system]
requires = ["hatchling", "hatch-vcs >= 0.3.0"]
requires = ["hatchling", "versioningit"]
build-backend = "hatchling.build"

[tool.hatch.version]
path = "marda_registry/__init__.py"
source = "versioningit"

[tool.hatch.metadata]
# required to allow git deps in optional dependencies
allow-direct-references = true

[tool.hatch.build.targets.wheel]
packages = ["marda_registry"]
packages = ["yard"]

[project]
name = "marda-extractors-registry"
name = "datatractor-yard"
readme = "README.md"
dynamic = ["version"]
classifiers = [
@@ -44,20 +44,17 @@ dependencies = [
[project.optional-dependencies]
test = [
"pytest",
"marda-extractors-api[formats] @ git+https://github.com/marda-alliance/metadata_extractors_api.git",
"marda-extractors-api[formats] @ git+https://github.com/datatractor/yard.git",
]

dev = [
"pre-commit",
]

[project.urls]
repository = "https://github.com/marda-alliance/metadata_extractors_registry"
repository = "https://github.com/datatractor/yard"

[tool.ruff]
extend-exclude = [
"providers",
]
target-version = "py310"

[tool.ruff.lint]
23 changes: 8 additions & 15 deletions tasks.py
@@ -2,7 +2,7 @@

from invoke import task

MODEL_DIRECTORY = Path(__file__).parent / "marda_registry" / "models"
MODEL_DIRECTORY = Path(__file__).parent / "yard" / "models"


@task
@@ -35,8 +35,8 @@ def regenerate_models(_):
def validate_entries(_):
print("Validating entries")

from marda_registry.models import Extractor, FileType
from marda_registry.utils import load_registry_collection
from yard.models import Extractor, FileType
from yard.utils import load_registry_collection

counts = {}
errors = []
@@ -52,9 +52,7 @@ def validate_entries(_):
if type_ is Extractor:
filetype_ids = set(
d.stem
for d in Path(__file__).parent.glob(
"./marda_registry/data/filetypes/*.yml"
)
for d in Path(__file__).parent.glob("./yard/data/filetypes/*.yml")
)

for extractor in entries:
@@ -76,12 +74,8 @@ def check_for_yaml(_):

print("Checking for erroneous .yaml files.")

extractors = list(
Path(__file__).parent.glob("./marda_registry/data/extractors/*.yaml")
)
filetypes = list(
Path(__file__).parent.glob("./marda_registry/data/filetypes/*.yaml")
)
extractors = list(Path(__file__).parent.glob("./yard/data/extractors/*.yaml"))
filetypes = list(Path(__file__).parent.glob("./yard/data/filetypes/*.yaml"))

for e in extractors:
print(f"Found {e} with bad file extension (should be .yml here)")
@@ -100,12 +94,11 @@ def validate_lfs_examples(_):
"""Loop through the LFS dir and check that each directory has a corresponding filetype."""

filetype_ids = set(
d.stem
for d in Path(__file__).parent.glob("./marda_registry/data/filetypes/*.yaml")
d.stem for d in Path(__file__).parent.glob("./yard/data/filetypes/*.yaml")
)

lfs_filetype_dirs = set(
d.name for d in Path(__file__).parent.glob("./marda_registry/data/lfs/*")
d.name for d in Path(__file__).parent.glob("./yard/data/lfs/*")
)

errors = []
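
The `validate_lfs_examples` hunk above builds its set of FileType IDs by globbing `*.yaml`, whereas `validate_entries` globs `*.yml` and `check_for_yaml` treats `.yaml` as the erroneous extension. Whether this is intentional cannot be determined from the truncated hunk. As a minimal sketch only, assuming definitions use the `.yml` suffix, the same directory-versus-definition comparison could be written as a standalone helper. The function name `find_orphan_lfs_dirs` and its `suffix` parameter are hypothetical and are not part of `tasks.py`:

```python
import tempfile
from pathlib import Path


def find_orphan_lfs_dirs(data_dir: Path, suffix: str = ".yml") -> set[str]:
    """Return names of LFS example directories without a matching FileType file.

    Assumes the layout used in the diff: `<data_dir>/filetypes/<id><suffix>`
    and `<data_dir>/lfs/<id>/`. The default suffix is an assumption.
    """
    # FileType IDs are the file stems of the definition files.
    filetype_ids = {p.stem for p in (data_dir / "filetypes").glob(f"*{suffix}")}
    # Each LFS example directory is expected to be named after a FileType ID.
    lfs_dirs = {d.name for d in (data_dir / "lfs").glob("*") if d.is_dir()}
    return lfs_dirs - filetype_ids


if __name__ == "__main__":
    # Build a throwaway layout to show the helper in use.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "filetypes").mkdir()
        (data / "lfs" / "example-type").mkdir(parents=True)
        (data / "filetypes" / "example-type.yml").write_text("id: example-type\n")
        print(sorted(find_orphan_lfs_dirs(data)))
```

Passing the suffix explicitly makes the choice of extension visible at the call site, so the two validation tasks cannot silently disagree.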
4 changes: 4 additions & 0 deletions yard/__init__.py
@@ -0,0 +1,4 @@
from importlib.metadata import version

__version__ = version("datatractor-yard")
__api_version__ = __version__
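
The new `yard/__init__.py` reads the version from the installed distribution metadata. `importlib.metadata.version` raises `PackageNotFoundError` when the named distribution is not installed, for example when the package is imported from a source checkout without `pip install .`. The following is a sketch of a defensive variant, not what the PR does. The helper name `resolve_version` and the fallback string are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed version of `dist_name`, or `fallback` if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution metadata is missing (package not installed).
        return fallback


if __name__ == "__main__":
    # "datatractor-yard" is the distribution name set in pyproject.toml.
    print(resolve_version("datatractor-yard"))
```

Whether a fallback is desirable depends on the project: failing loudly, as the PR's version does, can be preferable for a deployed service.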
6 changes: 3 additions & 3 deletions marda_registry/app.py → yard/app.py
@@ -12,14 +12,14 @@
from fastapi.templating import Jinja2Templates
from pydantic import BaseModel

from marda_registry import __api_version__
from yard import __api_version__

from .models import Extractor, FileType
from .utils import load_registry_collection

app = FastAPI(
title="MaRDA extractors registry API",
description=f"This server implements v{__api_version__} of the [MaRDA extractors WG](https://github.com/marda-alliance/metadata_extractors) registry API.", # noqa: E501
title="Datatractor Yard API",
description=f"This server implements v{__api_version__} of the [Datatractor Yard](https://github.com/datatractor/yard/) API.", # noqa: E501
version=__api_version__,
)

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
25 changes: 25 additions & 0 deletions yard/static/background.svg