
"[ERROR] the JSON object must be str, bytes or bytearray, not PosixPath" when using a path to pipeline.json file in meta-conduct #410

@fraimondo

Description


I'm trying to figure out how to use datalad catalog + metalad + extractors.

I was following the example from the website, but got an error when running this:

datalad -l 9 meta-conduct ./pipelines/extract_dataset_pipeline.json --pipeline-help 

The output is as follows:

[DEBUG  ] Command line args 1st pass for DataLad 0.19.3. Parsed: Namespace() Unparsed: ['meta-conduct', './pipelines/extract_dataset_pipeline.json', '--pipeline-help'] 
[DEBUG  ] Processing entrypoints 
[DEBUG  ] Loading entrypoint catalog from datalad.extensions 
[DEBUG  ] Loaded entrypoint catalog from datalad.extensions 
[DEBUG  ] Loading entrypoint deprecated from datalad.extensions 
[DEBUG  ] Loaded entrypoint deprecated from datalad.extensions 
[DEBUG  ] Loading entrypoint metalad from datalad.extensions 
[DEBUG  ] Loaded entrypoint metalad from datalad.extensions 
[DEBUG  ] Loading entrypoint neuroimaging from datalad.extensions 
[DEBUG  ] Loaded entrypoint neuroimaging from datalad.extensions 
[DEBUG  ] Loading entrypoint next from datalad.extensions 
[DEBUG  ] Enable posting DataLad config overrides CLI/ENV as GIT_CONFIG items in process ENV 
[DEBUG  ] Apply datalad-next patch to annexrepo.py:AnnexRepo.enable_remote 
[DEBUG  ] Building doc for <class 'datalad.local.configuration.Configuration'> 
[DEBUG  ] Building doc for <class 'datalad_next.patches.configuration.Configuration'> 
[DEBUG  ] Building doc for <class 'datalad.local.configuration.Configuration'> 
[DEBUG  ] Apply datalad-next patch to create_sibling_ghlike.py:_GitHubLike._set_request_headers 
[DEBUG  ] Apply datalad-next patch to interface.(utils|base).py:_execute_command_ 
[DEBUG  ] Building doc for <class 'datalad.core.local.status.Status'> 
[DEBUG  ] Building doc for <class 'datalad.core.local.diff.Diff'> 
[DEBUG  ] Building doc for <class 'datalad.core.distributed.push.Push'> 
[DEBUG  ] Apply patch to datalad.core.distributed.push._transfer_data 
[DEBUG  ] Patching datalad.core.distributed.push.Push docstring and parameters 
[DEBUG  ] Building doc for <class 'datalad.core.distributed.push.Push'> 
[DEBUG  ] Patching datalad.support.AnnexRepo.get_export_records (new method) 
[DEBUG  ] Apply patch to datalad.core.distributed.push._push 
[DEBUG  ] Apply patch to datalad.distribution.siblings._enable_remote 
[DEBUG  ] Building doc for <class 'datalad.distribution.update.Update'> 
[DEBUG  ] Building doc for <class 'datalad.distribution.siblings.Siblings'> 
[DEBUG  ] Retrofit `SpecialRemote` with a `close()` handler 
[DEBUG  ] Replace special remote _main() with datalad-next's progress logging enabled variant 
[DEBUG  ] Building doc for <class 'datalad.local.subdatasets.Subdatasets'> 
[DEBUG  ] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'> 
[DEBUG  ] Apply patch to datalad.distributed.create_sibling_gitlab._proc_dataset 
[DEBUG  ] Stop advertising discontinued "hierarchy" layout for `create_siblign_gitlab()` 
[DEBUG  ] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'> 
[DEBUG  ] Building doc for <class 'datalad.core.local.save.Save'> 
[DEBUG  ] Building doc for <class 'datalad.core.distributed.clone.Clone'> 
[DEBUG  ] Building doc for <class 'datalad.distribution.get.Get'> 
[DEBUG  ] Building doc for <class 'datalad.distribution.install.Install'> 
[DEBUG  ] Building doc for <class 'datalad.local.unlock.Unlock'> 
[DEBUG  ] Building doc for <class 'datalad.core.local.run.Run'> 
[DEBUG  ] Apply patch to datalad.core.local.run.format_command 
[DEBUG  ] Apply patch to datalad.distribution.update._choose_update_target 
[DEBUG  ] Apply patch to datalad.support.sshconnector._exec_ssh 
[DEBUG  ] Apply patch to datalad.support.sshconnector.get 
[DEBUG  ] Apply patch to datalad.support.sshconnector.put 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.url2transport_path 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.url2transport_path 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.SSHRemoteIO 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.close 
[DEBUG  ] Apply patch to datalad.customremotes.ria_utils.create_store 
[DEBUG  ] Apply patch to datalad.customremotes.ria_utils.create_ds_in_store 
[DEBUG  ] Apply patch to datalad.customremotes.ria_utils._ensure_version 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.ORARemote 
[DEBUG  ] Building doc for <class 'datalad_next.patches.replace_create_sibling_ria.CreateSiblingRia'> 
[DEBUG  ] Apply patch to datalad.distributed.create_sibling_ria.CreateSiblingRia 
[DEBUG  ] Building doc for <class 'datalad.distributed.create_sibling_ria.CreateSiblingRia'> 
[DEBUG  ] Loaded entrypoint next from datalad.extensions 
[DEBUG  ] Done processing entrypoints 
[DEBUG  ] Building doc for <class 'datalad_metalad.conduct.Conduct'> 
[DEBUG  ] Parsing known args among ['/Users/fraimondo/miniforge3/envs/junifer/bin/datalad', '-l', '9', 'meta-conduct', './pipelines/extract_dataset_pipeline.json', '--pipeline-help'] 
[DEBUG  ] Determined class of decorated function: <class 'datalad_metalad.conduct.Conduct'> 
[DEBUG  ] Command parameter validation skipped. <class 'datalad_metalad.con

I traced the error to this function:

def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            try:
                json_object = json.loads(
                    files("datalad_metalad.pipeline").joinpath(
                        f"pipelines/{path_or_object}_pipeline.json"))
                return json_object
            except FileNotFoundError:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object

What happens is that a PosixPath object is passed to json.loads, which only accepts str, bytes, or bytearray. I'm using Python 3.11, which seems to be supported.
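A minimal repro outside of datalad (the path is just an example):

import json
from pathlib import Path

# json.loads only accepts str, bytes, or bytearray; passing a Path
# reproduces the error from the issue title.
json.loads(Path("pipelines/extract_dataset_pipeline.json"))
# TypeError: the JSON object must be str, bytes or bytearray, not PosixPath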

Also, unless there is some backwards-compatibility issue, I was taught (and I stand by this premise) to never use exceptions for flow control; use if statements instead.
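To illustrate the two styles with hypothetical names (packaged is the bundled pipeline path, user_path the CLI argument):

from pathlib import Path

packaged = Path("pipelines/extract_dataset_pipeline.json")  # hypothetical
user_path = "my_pipeline.json"                              # hypothetical

# Exception-driven flow (what the current code does): attempt, then catch.
try:
    text = packaged.read_text()
except FileNotFoundError:
    text = Path(user_path).read_text()

# Explicit check (what I'd prefer): decide first, then act.
source = packaged if packaged.exists() else Path(user_path)
text = source.read_text()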

This is how I fixed it locally:

def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            to_read = files("datalad_metalad.pipeline").joinpath(
                f"pipelines/{path_or_object}_pipeline.json"
            )
            if to_read.exists():
                json_object = json.loads(to_read)
                return json_object
            else:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object

While this works for my case, it is just a workaround and still not suitable for a PR: the json.loads(to_read) call will still fail, because json.loads expects a string and not a PosixPath object. So I then ran this to test:

datalad -l 9 meta-conduct extract_metadata --pipeline-help

And I got, as expected:

[ERROR  ] Expecting value: line 1 column 1 (char 0) 

So my final fix (which I think will help you):

def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            to_read = files("datalad_metalad.pipeline").joinpath(
                f"pipelines/{path_or_object}_pipeline.json"
            )
            if to_read.exists():
                metadata_file = open(to_read, "tr")
            else:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object

Though I only patched it so that it works in these two cases; I have no clue what the overall idea behind the read_json_object function and its use cases is.
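For what it's worth, here is a sketch of what I'd consider closer to PR-ready, assuming the importlib.resources Traversable API (is_file() and read_text() are available on the object returned by files().joinpath(), which would also keep this working for zip-installed packages). JSONType below is just a stand-in to keep the sketch self-contained:

import json
import sys
from importlib.resources import files
from typing import Union

# Stand-in for metalad's JSONType, just to keep this sketch self-contained
JSONType = Union[dict, list, str, int, float, bool, None]


def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if not isinstance(path_or_object, str):
        # Already a parsed JSON object: pass it through
        return path_or_object
    if path_or_object == "-":
        return json.load(sys.stdin)
    # First interpret the argument as the name of a packaged pipeline
    packaged = files("datalad_metalad.pipeline").joinpath(
        f"pipelines/{path_or_object}_pipeline.json")
    if packaged.is_file():
        # read_text() works for any Traversable (also zip imports) and
        # hands json.loads the str it expects
        return json.loads(packaged.read_text())
    # Otherwise treat the argument as a plain filesystem path
    with open(path_or_object, "rt") as f:
        return json.load(f)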
