-
Notifications
You must be signed in to change notification settings - Fork 1
Enable zipped input/output in SEM mapping #83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested the plugin with the new feature?
I am unsure how the plugin will work in regards to zip output exactly; I haven't tested it myself yet.
I would highly recommend an integration test both for making sure everything is in order and for documenting how to use the feature with the plugin.
I'd assume the README needs an update documenting the new feature as well.
Before addressing these general remarks, please make sure to check the individual code comments firs (because I left comments about the zip output as well, so we may have one or two open discussions first)
def save_to_zip(file_path_list, zip_file_path): | ||
try: | ||
with zipfile.ZipFile(zip_file_path, 'w', zipfile.ZIP_DEFLATED) as zf: | ||
# "ZIP_DEFLATED" is a lossless compression algorithm, meaning no data is lost during the compression process. | ||
for file_path in file_path_list: | ||
zf.write(file_path, os.path.basename(file_path)) | ||
logging.info(f"Files have been zipped into {zip_file_path} sucessfully!") | ||
|
||
# Delete the original files after zipping | ||
for file_path in file_path_list: | ||
os.remove(file_path) | ||
logging.info(f"{file_path} has been deleted.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik the provided output is used as a zip file even if it is, for example 'output.json'. This seems to be quite confusing, the user may expect json output but gets a zip file (so the json file will look totally broken)
Suggestions:
- imho we should fail if the specified output is not zip because it is likely due to a misuse or misunderstanding on the user side.
- The idea behind the quite extensive InputReader for TOMO was to fail early whenever possible. Under this paradigm it would be recommended to fail early if the input for SEM is a zip but the output specified is not. (but there is likely a caveat regarding the plugin somewhere either way)
- how about not producing zip output at all but sticking to json, returning a json array with schema compliant json objects? Granted the full file would not be schema compliant then and could not be directly used in something like the metadata editor, I'd assume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The InputReader now also accepts output_path as an argument. A validation check is performed at the top level to ensure the output_path has the correct file extension, specifically to prevent the misuse of the
.json
extension when a.zip
file is expected. - Regarding the case where the output is not a
.zip
archive but rather a JSON array, I believe the TOMO component appears to be better suited for this scenario. It could be valuable to support an additional option that produces distinct output documents for each image file, instead of putting everything into a single JSON array.
src/IO/sem/OutputWriter.py
Outdated
except Exception as e: | ||
logging.error(f"Failed to save {file_path}: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I'd recommend avoiding generic 'except Exception' whenever possible (I know, I do it sometimes too). How do you know you reacted correctly, if you do not know what your problem was exactly?
- in case of an exception you only log the error, but you do not react accordingly. This should likely raise a MappingAbortionError to be handled elsewhere.
same comment for save zip function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These exceptions have now been used except (FileNotFoundError, PermissionError, IsADirectoryError, OSError, TypeError, ValueError, zipfile.BadZipFile)
, as well as raise MappingAbortionError()
except MappingAbortionError as e: | ||
logging.error(f"MappingAbortionError: {e}") | ||
if reader: | ||
reader.clean_up() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to implement the clean_up function.
Since it is likely identical for tomo and sem, it should ideally be inherited from a common base class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a first quick solution, a separate clean_up
function has been added in IO/sem/InputReader
. A common Clean_up
base class will be implement to unify cleanup logic across reader types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking more in the direction of an InputReader base class that either provides the interface for the clean_up method or (more likely) even implements it, since it likely always treats a working_dir used by inputReaders in the same way. Maybe there is even more overlap, especially in regards to parser handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The InputReader base class has been implement in a distinct branch dev_inputreader_base_class.
mapping_cli.py
Outdated
if reader: | ||
reader.clean_up() | ||
if reader_: | ||
reader_.clean_up() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume the reader_ never needs clean_up because it does not create temporary files, but please check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct! reader_
is initialized inside a loop and is used only on already extracted file paths. No need of clean_up on reader_
, so it has been removed.
except MappingAbortionError as e: | ||
#logging.error(f"MappingAbortionError: {e}") | ||
exit(e) | ||
finally: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, I was not even aware that 'finally' would run even on exit call. TIL :)
…extension was expected.
Allow zipped files as input for SEM mapping
Changes: