Enable zipped input/output in SEM mapping #83

gabinoumbe · 2025-08-28T09:23:43Z

Allow zipped files as input for SEM mapping

SEM mapping now accepts zipped files as input, producing zipped JSON mapping documents as output.
When processing a zipped input, only successfully processed files are included in the output archive.

Changes:

Added an OutputWriter for image SEM mapping.
Updated the mapping service plugin to support zipped input/output types.
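
The behaviour described above (zipped input in, zipped JSON documents out, failed files left out of the archive) could look roughly like the following. This is a minimal sketch, not the code in this PR: `map_image` is a hypothetical stand-in for the SEM mapper, and the `.tif` check and the output field names are assumptions made only for illustration.

```python
import json
import logging
import tempfile
import zipfile
from pathlib import Path


def map_image(image_path: Path) -> dict:
    """Hypothetical stand-in for the SEM mapper; raises ValueError on bad input."""
    if image_path.suffix.lower() != ".tif":
        raise ValueError(f"unsupported file: {image_path.name}")
    return {"source": image_path.name, "sizeBytes": image_path.stat().st_size}


def map_zip(input_zip: Path, output_zip: Path) -> list[str]:
    """Map every file in input_zip and store one JSON per successful file."""
    written = []
    with tempfile.TemporaryDirectory() as work_dir:
        # Extract into a temporary directory that is removed automatically.
        with zipfile.ZipFile(input_zip) as zin:
            zin.extractall(work_dir)
        with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as zout:
            for path in sorted(Path(work_dir).rglob("*")):
                if not path.is_file():
                    continue
                try:
                    document = map_image(path)
                except ValueError as err:
                    # Failed files are logged and left out of the archive.
                    logging.warning("Skipping %s: %s", path.name, err)
                    continue
                name = path.stem + ".json"
                zout.writestr(name, json.dumps(document))
                written.append(name)
    return written
```

Writing the JSON documents straight into the archive with `writestr` avoids intermediate files that would need to be deleted afterwards.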


GGoetzelmann

Have you tested the plugin with the new feature?
I am unsure how the plugin will work in regards to zip output exactly; I haven't tested it myself yet.

I would highly recommend an integration test both for making sure everything is in order and for documenting how to use the feature with the plugin.

I'd assume the README needs an update documenting the new feature as well.

Before addressing these general remarks, please check the individual code comments first: I left comments about the zip output as well, so we may have one or two open discussions to settle beforehand.


GGoetzelmann · 2025-08-28T12:51:13Z

src/IO/sem/OutputWriter.py

+    def save_to_zip(file_path_list, zip_file_path):
+        try:
+            with zipfile.ZipFile(zip_file_path, 'w', zipfile.ZIP_DEFLATED) as zf:
+                # "ZIP_DEFLATED" is a lossless compression algorithm, meaning no data is lost during the compression process.
+                for file_path in file_path_list:
+                    zf.write(file_path, os.path.basename(file_path))
+            logging.info(f"Files have been zipped into {zip_file_path} sucessfully!")
+
+            # Delete the original files after zipping
+            for file_path in file_path_list:
+                os.remove(file_path)
+                logging.info(f"{file_path} has been deleted.")


Afaik the provided output path is written as a zip file even if it is named, for example, 'output.json'. This is confusing: the user may expect JSON output but gets a zip archive, so the supposed JSON file will look totally broken.

Suggestions:

imho we should fail if the specified output is not zip because it is likely due to a misuse or misunderstanding on the user side.

The idea behind the quite extensive InputReader for TOMO was to fail early whenever possible. Under this paradigm it would be recommended to fail early if the input for SEM is a zip but the output specified is not. (but there is likely a caveat regarding the plugin somewhere either way)

how about not producing zip output at all but sticking to json, returning a json array with schema compliant json objects? Granted the full file would not be schema compliant then and could not be directly used in something like the metadata editor, I'd assume.

The InputReader now also accepts output_path as an argument. A validation check is performed at the top level to ensure the output_path has the correct file extension, specifically to prevent the misuse of the .json extension when a .zip file is expected.

Regarding the case where the output is a JSON array rather than a .zip archive: I believe the TOMO component is better suited for this scenario. It could be valuable to support an additional option that produces a distinct output document for each image file instead of putting everything into a single JSON array.
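
The fail-early check on the output extension discussed in this thread might be sketched as follows. The function name `check_output_path` is hypothetical, and the local `MappingAbortionError` class only stands in for the project's exception of the same name; the actual check in the InputReader may differ.

```python
from pathlib import Path


class MappingAbortionError(Exception):
    """Stand-in for the project's exception of the same name."""


def check_output_path(input_path: str, output_path: str) -> None:
    """Fail early when a zipped input is paired with a non-zip output path."""
    input_is_zip = Path(input_path).suffix.lower() == ".zip"
    output_suffix = Path(output_path).suffix.lower()
    if input_is_zip and output_suffix != ".zip":
        raise MappingAbortionError(
            f"Zipped input requires a .zip output path, got '{output_path}'."
        )
```

Running this before any extraction or parsing means a mistyped 'output.json' is rejected before temporary files are created.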

GGoetzelmann · 2025-08-28T13:01:54Z

src/IO/sem/OutputWriter.py

+        except Exception as e:
+            logging.error(f"Failed to save {file_path}: {e}")


I'd recommend avoiding generic 'except Exception' whenever possible (I know, I do it sometimes too). How do you know you reacted correctly, if you do not know what your problem was exactly?

in case of an exception you only log the error, but you do not react accordingly. This should likely raise a MappingAbortionError to be handled elsewhere.

same comment for save zip function

The generic handler has been replaced with except (FileNotFoundError, PermissionError, IsADirectoryError, OSError, TypeError, ValueError, zipfile.BadZipFile), and the handler now raises MappingAbortionError().
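
As an illustration of the pattern requested here (catch specific exceptions, log, then raise so the caller can react), a sketch of save_to_zip could look like this. It is an assumption about the shape of the fix, not the committed code, and `MappingAbortionError` is again a local stand-in for the project's class.

```python
import logging
import os
import zipfile


class MappingAbortionError(Exception):
    """Stand-in for the project's exception of the same name."""


def save_to_zip(file_path_list, zip_file_path):
    """Zip the given files, delete the originals, and abort the mapping on failure."""
    try:
        with zipfile.ZipFile(zip_file_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for file_path in file_path_list:
                zf.write(file_path, os.path.basename(file_path))
        # Delete the originals only after the archive has been closed.
        for file_path in file_path_list:
            os.remove(file_path)
    except (OSError, zipfile.BadZipFile) as e:
        # FileNotFoundError, PermissionError and IsADirectoryError are
        # subclasses of OSError, so this clause covers them as well.
        logging.error("Failed to save %s: %s", zip_file_path, e)
        raise MappingAbortionError(f"Could not save {zip_file_path}") from e
```

Using `raise ... from e` keeps the original exception attached as the cause, so the handler in mapping_cli.py can still report what actually went wrong.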

GGoetzelmann · 2025-08-28T13:03:41Z

mapping_cli.py

    except MappingAbortionError as e:
+        logging.error(f"MappingAbortionError: {e}")
+        if reader:
+            reader.clean_up()


you need to implement the clean_up function.
Since it is likely identical for tomo and sem, it should ideally be inherited from a common base class

As a first quick solution, a separate clean_up function has been added in IO/sem/InputReader. A common Clean_up base class will be implemented to unify the cleanup logic across reader types.

I was thinking more in the direction of an InputReader base class that either provides the interface for the clean_up method or (more likely) even implements it, since it likely always treats a working_dir used by inputReaders in the same way. Maybe there is even more overlap, especially in regards to parser handling.

The InputReader base class has been implemented in a separate branch, dev_inputreader_base_class.
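
The base-class idea from this thread could be sketched as below. All names here (`BaseInputReader`, `SemInputReader`, `retrieve_image_info`, the `working_dir` attribute created via `tempfile.mkdtemp`) are hypothetical and chosen for illustration; the implementation in dev_inputreader_base_class may look different.

```python
import logging
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BaseInputReader(ABC):
    """Hypothetical common base class for the TOMO and SEM input readers."""

    def __init__(self):
        # Assumption: each reader extracts zipped input into its own working_dir.
        self.working_dir = Path(tempfile.mkdtemp(prefix="mapping_"))

    @abstractmethod
    def retrieve_image_info(self, input_path):
        """Reader-specific parsing, implemented by each subclass."""

    def clean_up(self):
        """Remove the working directory and everything extracted into it."""
        if self.working_dir.exists():
            shutil.rmtree(self.working_dir)
            logging.info("Removed temporary directory %s", self.working_dir)


class SemInputReader(BaseInputReader):
    """Minimal subclass that inherits clean_up unchanged."""

    def retrieve_image_info(self, input_path):
        return {"file": Path(input_path).name}
```

Because clean_up checks for the directory first, calling it twice (for example once on error and once in a finally block) is harmless.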

GGoetzelmann · 2025-08-28T13:06:05Z

mapping_cli.py

+        if reader:
+            reader.clean_up()
+        if reader_:
+            reader_.clean_up()


I would assume the reader_ never needs clean_up because it does not create temporary files, but please check

Correct! reader_ is initialized inside a loop and is used only on already extracted file paths. It needs no clean_up, so the call has been removed.


GGoetzelmann · 2025-09-09T06:16:02Z

mapping_cli.py

    except MappingAbortionError as e:
        #logging.error(f"MappingAbortionError: {e}")
+        exit(e)
+    finally:


nice, I was not even aware that 'finally' would run even on exit call. TIL :)
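
For readers wondering why this works: exit() and sys.exit() do not terminate the process on the spot, they raise SystemExit, which unwinds the stack like any other exception and therefore runs pending finally blocks. A small self-contained demonstration (the `events` list and `run` function are illustrative only):

```python
import sys

events = []


def run():
    try:
        raise ValueError("mapping aborted")
    except ValueError as e:
        events.append("except")
        sys.exit(str(e))  # raises SystemExit instead of terminating immediately
    finally:
        events.append("finally")  # still runs while SystemExit propagates


try:
    run()
except SystemExit as exit_info:
    events.append(f"exit: {exit_info.code}")

print(events)  # ['except', 'finally', 'exit: mapping aborted']
```

The exception is os._exit(), which ends the process without unwinding and so skips finally blocks.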


gabinoumbe added 3 commits August 22, 2025 16:52

create an outputwriter for sem mapping and allow input/output zipped file

c51e888

correct the variable name input_file to file_path.name for an error raising

c35c9d2

Update inputTypes and outputTypes of mapping service plugin

e5cd04f

GGoetzelmann requested changes Aug 28, 2025


gabinoumbe added 6 commits August 29, 2025 16:36

remove all direct exit(ERROR_CODE) calls and replace with MappingAbortionError raises

25c1e66

define clean_up for SEM main reader; remove unnecessary clean_up from reader_

5cdd205

avoid generic 'except Exception'

1e5270e

reinforce the 'except'

8422de8

ensure reader clean up in all cases (succes and failure)

2cccaf2

Update some debug logging messages

3353b0d

GGoetzelmann reviewed Sep 9, 2025


gabinoumbe added 5 commits September 12, 2025 14:01

ensure failure when the output path has a .json extension but a .zip extension was expected.

51f2b47

ensure early fail if zip file contains no file with valid applicable parser

2347b10

update test_inputreader_sem

67d80a0

implement integration test for zipped input

80ff70d

update README

f69dff7

2 participants