Description
I'm trying to figure out how to use datalad catalog + metalad + extractors.
I was trying the example from the website, but I got an error when running this:
datalad -l 9 meta-conduct ./pipelines/extract_dataset_pipeline.json --pipeline-help
The output is as follows:
[DEBUG ] Command line args 1st pass for DataLad 0.19.3. Parsed: Namespace() Unparsed: ['meta-conduct', './pipelines/extract_dataset_pipeline.json', '--pipeline-help']
[DEBUG ] Processing entrypoints
[DEBUG ] Loading entrypoint catalog from datalad.extensions
[DEBUG ] Loaded entrypoint catalog from datalad.extensions
[DEBUG ] Loading entrypoint deprecated from datalad.extensions
[DEBUG ] Loaded entrypoint deprecated from datalad.extensions
[DEBUG ] Loading entrypoint metalad from datalad.extensions
[DEBUG ] Loaded entrypoint metalad from datalad.extensions
[DEBUG ] Loading entrypoint neuroimaging from datalad.extensions
[DEBUG ] Loaded entrypoint neuroimaging from datalad.extensions
[DEBUG ] Loading entrypoint next from datalad.extensions
[DEBUG ] Enable posting DataLad config overrides CLI/ENV as GIT_CONFIG items in process ENV
[DEBUG ] Apply datalad-next patch to annexrepo.py:AnnexRepo.enable_remote
[DEBUG ] Building doc for <class 'datalad.local.configuration.Configuration'>
[DEBUG ] Building doc for <class 'datalad_next.patches.configuration.Configuration'>
[DEBUG ] Building doc for <class 'datalad.local.configuration.Configuration'>
[DEBUG ] Apply datalad-next patch to create_sibling_ghlike.py:_GitHubLike._set_request_headers
[DEBUG ] Apply datalad-next patch to interface.(utils|base).py:_execute_command_
[DEBUG ] Building doc for <class 'datalad.core.local.status.Status'>
[DEBUG ] Building doc for <class 'datalad.core.local.diff.Diff'>
[DEBUG ] Building doc for <class 'datalad.core.distributed.push.Push'>
[DEBUG ] Apply patch to datalad.core.distributed.push._transfer_data
[DEBUG ] Patching datalad.core.distributed.push.Push docstring and parameters
[DEBUG ] Building doc for <class 'datalad.core.distributed.push.Push'>
[DEBUG ] Patching datalad.support.AnnexRepo.get_export_records (new method)
[DEBUG ] Apply patch to datalad.core.distributed.push._push
[DEBUG ] Apply patch to datalad.distribution.siblings._enable_remote
[DEBUG ] Building doc for <class 'datalad.distribution.update.Update'>
[DEBUG ] Building doc for <class 'datalad.distribution.siblings.Siblings'>
[DEBUG ] Retrofit `SpecialRemote` with a `close()` handler
[DEBUG ] Replace special remote _main() with datalad-next's progress logging enabled variant
[DEBUG ] Building doc for <class 'datalad.local.subdatasets.Subdatasets'>
[DEBUG ] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'>
[DEBUG ] Apply patch to datalad.distributed.create_sibling_gitlab._proc_dataset
[DEBUG ] Stop advertising discontinued "hierarchy" layout for `create_siblign_gitlab()`
[DEBUG ] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'>
[DEBUG ] Building doc for <class 'datalad.core.local.save.Save'>
[DEBUG ] Building doc for <class 'datalad.core.distributed.clone.Clone'>
[DEBUG ] Building doc for <class 'datalad.distribution.get.Get'>
[DEBUG ] Building doc for <class 'datalad.distribution.install.Install'>
[DEBUG ] Building doc for <class 'datalad.local.unlock.Unlock'>
[DEBUG ] Building doc for <class 'datalad.core.local.run.Run'>
[DEBUG ] Apply patch to datalad.core.local.run.format_command
[DEBUG ] Apply patch to datalad.distribution.update._choose_update_target
[DEBUG ] Apply patch to datalad.support.sshconnector._exec_ssh
[DEBUG ] Apply patch to datalad.support.sshconnector.get
[DEBUG ] Apply patch to datalad.support.sshconnector.put
[DEBUG ] Apply patch to datalad.distributed.ora_remote.url2transport_path
[DEBUG ] Apply patch to datalad.distributed.ora_remote.url2transport_path
[DEBUG ] Apply patch to datalad.distributed.ora_remote.SSHRemoteIO
[DEBUG ] Apply patch to datalad.distributed.ora_remote.close
[DEBUG ] Apply patch to datalad.customremotes.ria_utils.create_store
[DEBUG ] Apply patch to datalad.customremotes.ria_utils.create_ds_in_store
[DEBUG ] Apply patch to datalad.customremotes.ria_utils._ensure_version
[DEBUG ] Apply patch to datalad.distributed.ora_remote.ORARemote
[DEBUG ] Building doc for <class 'datalad_next.patches.replace_create_sibling_ria.CreateSiblingRia'>
[DEBUG ] Apply patch to datalad.distributed.create_sibling_ria.CreateSiblingRia
[DEBUG ] Building doc for <class 'datalad.distributed.create_sibling_ria.CreateSiblingRia'>
[DEBUG ] Loaded entrypoint next from datalad.extensions
[DEBUG ] Done processing entrypoints
[DEBUG ] Building doc for <class 'datalad_metalad.conduct.Conduct'>
[DEBUG ] Parsing known args among ['/Users/fraimondo/miniforge3/envs/junifer/bin/datalad', '-l', '9', 'meta-conduct', './pipelines/extract_dataset_pipeline.json', '--pipeline-help']
[DEBUG ] Determined class of decorated function: <class 'datalad_metalad.conduct.Conduct'>
[DEBUG ] Command parameter validation skipped. <class 'datalad_metalad.con
I followed it to:
datalad-metalad/datalad_metalad/utils.py
Lines 82 to 95 in 1219998
def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            try:
                json_object = json.loads(
                    files("datalad_metalad.pipeline").joinpath(
                        f"pipelines/{path_or_object}_pipeline.json"))
                return json_object
            except FileNotFoundError:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object
What happens is that a PosixPath object ends up being passed to json.loads. I'm using Python 3.11, which seems to be supported.
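For reference, a minimal sketch of the failure mode outside of datalad (the pipeline path is made up for illustration):

import json
from pathlib import Path

# json.loads() parses JSON text; it does not accept a path to a JSON file.
# Handing it a Path object raises a TypeError instead of reading the file.
pipeline_path = Path("pipelines/extract_metadata_pipeline.json")  # hypothetical path
try:
    json.loads(pipeline_path)
except TypeError as exc:
    print(exc)  # the JSON object must be str, bytes or bytearray, not PosixPath

# What the code presumably intends is to parse the file's contents, e.g.:
# json.loads(pipeline_path.read_text())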
Also, unless there is some backwards-compatibility issue, I was taught (and I stand by this premise) never to use exceptions for flow control; use if statements instead.
This is how I fixed it locally:
def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            to_read = files("datalad_metalad.pipeline").joinpath(
                f"pipelines/{path_or_object}_pipeline.json"
            )
            if to_read.exists():
                json_object = json.loads(to_read)
                return json_object
            else:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object
While this works for my case, it is just a workaround and still not suitable for a PR: it will still fail when a packaged pipeline name is given, because json.loads expects a string and not a PosixPath object. So I then ran this to test:
datalad -l 9 meta-conduct extract_metadata --pipeline-help
And, as expected, I got:
[ERROR ] Expecting value: line 1 column 1 (char 0)
So here is my final fix (which I think will help):
def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            to_read = files("datalad_metalad.pipeline").joinpath(
                f"pipelines/{path_or_object}_pipeline.json"
            )
            if to_read.exists():
                metadata_file = open(to_read, "tr")
            else:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object
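If I read the two code paths correctly, this covers both of the invocations above (illustrative only):

# 1. a packaged pipeline name, resolved inside datalad_metalad.pipeline
read_json_object("extract_metadata")

# 2. a path to a pipeline file on disk
read_json_object("./pipelines/extract_dataset_pipeline.json")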
Though I only patched it so that it works in these two cases, and I have no clue about the overall idea behind the read_json_object function and its use cases.
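In case it helps, here is a sketch of what I imagine the intended behaviour to be, written with the importlib.resources Traversable API and plain if statements instead of exception-based flow control (this is an assumption about the intent, not a tested patch; JSONType is stubbed for illustration):

import json
import sys
from importlib.resources import files
from typing import Union

# Rough stand-in for the JSONType alias defined in datalad_metalad (illustration only)
JSONType = Union[dict, list, str, int, float, bool, None]


def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    """Resolve a packaged pipeline name, a file path, '-' (stdin), or a parsed object."""
    if not isinstance(path_or_object, str):
        # Already a parsed JSON object: pass it through unchanged
        return path_or_object
    if path_or_object == "-":
        return json.load(sys.stdin)
    # First, try to resolve the argument as the name of a pipeline shipped with metalad
    packaged = files("datalad_metalad.pipeline").joinpath(
        f"pipelines/{path_or_object}_pipeline.json")
    if packaged.is_file():
        # read_text() works for both filesystem and zipped package resources
        return json.loads(packaged.read_text())
    # Otherwise, treat the argument as a path to a JSON file on disk
    with open(path_or_object, "rt") as metadata_file:
        return json.load(metadata_file)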