Skip to content

Conversation

gabinoumbe
Copy link
Contributor

Allow zipped files as input for SEM mapping

  • SEM mapping now accepts zipped files as input, producing zipped JSON mapping documents as output.
  • When processing a zipped input, only successfully processed files are included in the output archive.

Changes:

  • Added an OutputWriter for image SEM mapping.
  • Updated the mapping service plugin to support zipped input/output types.

Copy link
Contributor

@GGoetzelmann GGoetzelmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested the plugin with the new feature?
I am unsure how the plugin will work in regards to zip output exactly; I haven't tested it myself yet.

I would highly recommend an integration test both for making sure everything is in order and for documenting how to use the feature with the plugin.

I'd assume the README needs an update documenting the new feature as well.

Before addressing these general remarks, please make sure to check the individual code comments firs (because I left comments about the zip output as well, so we may have one or two open discussions first)

Comment on lines 19 to 30
def save_to_zip(file_path_list, zip_file_path):
try:
with zipfile.ZipFile(zip_file_path, 'w', zipfile.ZIP_DEFLATED) as zf:
# "ZIP_DEFLATED" is a lossless compression algorithm, meaning no data is lost during the compression process.
for file_path in file_path_list:
zf.write(file_path, os.path.basename(file_path))
logging.info(f"Files have been zipped into {zip_file_path} sucessfully!")

# Delete the original files after zipping
for file_path in file_path_list:
os.remove(file_path)
logging.info(f"{file_path} has been deleted.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik the provided output is used as a zip file even if it is, for example 'output.json'. This seems to be quite confusing, the user may expect json output but gets a zip file (so the json file will look totally broken)

Suggestions:

  • imho we should fail if the specified output is not zip because it is likely due to a misuse or misunderstanding on the user side.
  • The idea behind the quite extensive InputReader for TOMO was to fail early whenever possible. Under this paradigm it would be recommended to fail early if the input for SEM is a zip but the output specified is not. (but there is likely a caveat regarding the plugin somewhere either way)
  • how about not producing zip output at all but sticking to json, returning a json array with schema compliant json objects? Granted the full file would not be schema compliant then and could not be directly used in something like the metadata editor, I'd assume.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The InputReader now also accepts output_path as an argument. A validation check is performed at the top level to ensure the output_path has the correct file extension, specifically to prevent the misuse of the .json extension when a .zip file is expected.
  • Regarding the case where the output is not a .zip archive but rather a JSON array, I believe the TOMO component appears to be better suited for this scenario. It could be valuable to support an additional option that produces distinct output documents for each image file, instead of putting everything into a single JSON array.

Comment on lines 15 to 16
except Exception as e:
logging.error(f"Failed to save {file_path}: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'd recommend avoiding generic 'except Exception' whenever possible (I know, I do it sometimes too). How do you know you reacted correctly, if you do not know what your problem was exactly?
  2. in case of an exception you only log the error, but you do not react accordingly. This should likely raise a MappingAbortionError to be handled elsewhere.

same comment for save zip function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These exceptions have now been used except (FileNotFoundError, PermissionError, IsADirectoryError, OSError, TypeError, ValueError, zipfile.BadZipFile) , as well as raise MappingAbortionError()

except MappingAbortionError as e:
logging.error(f"MappingAbortionError: {e}")
if reader:
reader.clean_up()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to implement the clean_up function.
Since it is likely identical for tomo and sem, it should ideally be inherited from a common base class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a first quick solution, a separate clean_up function has been added in IO/sem/InputReader. A common Clean_up base class will be implement to unify cleanup logic across reader types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more in the direction of an InputReader base class that either provides the interface for the clean_up method or (more likely) even implements it, since it likely always treats a working_dir used by inputReaders in the same way. Maybe there is even more overlap, especially in regards to parser handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The InputReader base class has been implement in a distinct branch dev_inputreader_base_class.

mapping_cli.py Outdated
if reader:
reader.clean_up()
if reader_:
reader_.clean_up()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume the reader_ never needs clean_up because it does not create temporary files, but please check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! reader_ is initialized inside a loop and is used only on already extracted file paths. No need of clean_up on reader_, so it has been removed.

except MappingAbortionError as e:
#logging.error(f"MappingAbortionError: {e}")
exit(e)
finally:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I was not even aware that 'finally' would run even on exit call. TIL :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants