Rework Schema Parsing Functions Logic #78

LswaN58 · 2025-01-13T22:40:12Z

LswaN58
Jan 13, 2025
Maintainer

Parsing Approach

Currently, parsing logic for ParseDict across Schema classes is a little wonky. We should have built-in parsing functions for the major data types, which can be used directly, and leave Schema subclasses to write their own implementations if needed for rarer data types.

Old Approach

Right now, we have parsing generally split between a base-level function called ElementFromDict, and individual, per-class-element functions of the form _parse<ElementName>, e.g. _parseTableName or _parseProjectID.

ElementFromDict takes in the dictionary being parsed, an optional logger, the name (or names) of the element to pull out, a function to parse those elements, and a default value:
```
def ElementFromDict(all_elements, logger, element_names, parser_function, default_value):
   ...
```
This function is responsible for checking that the element exists in the dictionary, and for using the default if the desired name(s) is/are missing.
It will try each element in the element_names list, in order, until it finds one in the dictionary.
If none of the items in element_names is a valid key in all_elements, the function will give a warning and use the default value.
The parser_function is one of those of the form _parse<ElementName>, discussed next.
ElementFromDict will call the parse function, passing in the element retrieved from the dict.
_parse<ElementName> takes just a single value, and is hardcoded to attempt to parse that value as a given type. The parse attempt may vary based on the type of the given value, e.g. if the parse function is meant to return an int, its behavior may be different for a str input than for a float input.

New Approach

Instead of having a weird recursive relationship, where subclasses define parse functions, which are run by a general function from the parent class, we'll keep everything... well, bottom-up from the perspective of class hierarchy.

Each class will still implement a function for each individual item to parse.
However, this will effectively just be a wrapper to call the parent helper function.
These individual functions define the valid names to search for in the dict, the default value, and which type parser to use.

The helper would then do what it currently does, calling the appropriate parsing function after looping over all valid keys to search for and finding one.
The parsing functions will be defined on a type-by-type basis. The helper could potentially operate on some kind of match-case, taking just a type rather than needing the caller to pass in a parser function.
In particular, this match-case could look like the following:

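```
import builtins
import logging
from typing import Any

def ParseValue(all_elements, valid_keys, type) -> Any:
   # find the first key in valid_keys that is also in all_elements
   raw_value = None
   for key in valid_keys:
      if key in all_elements:
         raw_value = all_elements[key]
         break
   if raw_value is None:
      return None  # placeholder: no valid key found, so a default would be used here
   match (type):
      case builtins.str:
         return str(raw_value)    # parse a string
      case builtins.int:
         return int(raw_value)    # parse an int
      case builtins.float:
         return float(raw_value)  # parse a float
      case _:
         # warn that the requested type isn't recognized, return raw value
         logging.warning("Unrecognized type %s, returning raw value", type)
         return raw_value
```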

Note that some experimentation will be needed to figure out how to handle types that are not Python builtins, e.g. datetimes.

With this done, the FromDict of each class will effectively just be a series of calls to the private functions, but with even fewer variables to pass along.
Hopefully, then, these become very simple functions.
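
To make the intended shape concrete, here is a minimal sketch of a subclass whose per-item functions are thin wrappers around a parent helper. The `Schema` stand-in, the `TableSchema` class, its key names, and the extra `default_value` parameter on `ParseValue` are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List

class Schema:
    """Stand-in for the real base class; holds only the shared helper."""
    @staticmethod
    def ParseValue(all_elements: Dict[str, Any], valid_keys: List[str],
                   to_type: type, default_value: Any) -> Any:
        # Use the first valid key present in the dict, else the default.
        for key in valid_keys:
            if key in all_elements:
                return to_type(all_elements[key])
        return default_value

class TableSchema(Schema):
    def __init__(self, name: str, table_name: str, row_count: int):
        self.name = name
        self.table_name = table_name
        self.row_count = row_count

    # Each wrapper only states valid names, the type, and the default.
    @staticmethod
    def _parseTableName(all_elements: Dict[str, Any]) -> str:
        return Schema.ParseValue(all_elements, ["table_name", "table"], str, "UNKNOWN")

    @staticmethod
    def _parseRowCount(all_elements: Dict[str, Any]) -> int:
        return Schema.ParseValue(all_elements, ["row_count", "rows"], int, 0)

    @classmethod
    def FromDict(cls, name: str, all_elements: Dict[str, Any]) -> "TableSchema":
        # FromDict reduces to a series of calls to the private wrappers.
        return cls(name=name,
                   table_name=cls._parseTableName(all_elements),
                   row_count=cls._parseRowCount(all_elements))
```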

Further, __init__ functions should then use the parse functions as fallbacks if there is no value given for a particular param.

Parsing `FromDict` Functionality

The changes above allow us to better solve an existing issue with how FromDict functions.
We have so far had a hell of a time coming up with a good way to use the FromDict approach to parsing/instantiating Schemas in a way that accomplishes the following three important things:

- Schema subclasses cleanly utilize parent class logic for parsing the parent's required items from dictionaries, without redundant implementation.
- External callers of the Schema class constructors have a set of parameters to pass, rather than needing to know a dict format.
- Parsing never uses dicts as intermediaries between functions, since doing so removes good linter type-checking.

Old Approach

Currently, each Schema class is supposed to just implement a FromDict that parses everything needed from a dict, and return an instance of the class.
However, suppose we have a hierarchy where SchemaB inherits from SchemaA:

```mermaid
classDiagram

SchemaA <|-- SchemaB
```

Then SchemaB would need to redundantly parse out everything needed by SchemaA to call the super constructor, e.g.

```
from typing import Any, Dict

# Schema is the project's existing base class
class SchemaA(Schema):
   def __init__(self, name:str, a, other_elements:Dict[str,Any]):
      self._a = a
      super().__init__(name=name, other_elements=other_elements)

   @classmethod
   def FromDict(cls, name:str, all_elements:Dict[str, Any]):
      _a = all_elements['a']
      # all_elements minus element 'a'
      _leftovers = {k: v for k, v in all_elements.items() if k != 'a'}
      return SchemaA(name=name, a=_a, other_elements=_leftovers)

class SchemaB(SchemaA):
   def __init__(self, name:str, a, b, other_elements:Dict[str,Any]):
      self._b = b
      super().__init__(name=name, a=a, other_elements=other_elements)

   @classmethod
   def FromDict(cls, name:str, all_elements:Dict[str, Any]):
      _a = all_elements['a']
      _b = all_elements['b']
      # all_elements minus elements 'a' and 'b'
      _leftovers = {k: v for k, v in all_elements.items() if k not in ('a', 'b')}
      return SchemaB(name=name, a=_a, b=_b, other_elements=_leftovers)
```

Redundancy isn't good, so we try to separate this out by adding a second layer: a _fromDict that accepts pre-parsed params and, after parsing out the subclass elements, passes everything directly to the constructor:

```
from abc import abstractmethod
from typing import Any, Dict

# Schema is the project's existing base class
class SchemaA(Schema):
   def __init__(self, name:str, a, other_elements:Dict[str,Any]):
      self._a = a
      super().__init__(name=name, other_elements=other_elements)

   @classmethod
   def FromDict(cls, name:str, all_elements:Dict[str, Any]):
      _a = all_elements['a']
      # all_elements minus element 'a'
      _leftovers = {k: v for k, v in all_elements.items() if k != 'a'}
      return cls._fromDict(name=name, a=_a, other_elements=_leftovers)

   @classmethod
   @abstractmethod
   def _fromDict(cls, name:str, a, other_elements:Dict[str,Any]):
      pass

class SchemaB(SchemaA):
   def __init__(self, name:str, a, b, other_elements:Dict[str,Any]):
      self._b = b
      super().__init__(name=name, a=a, other_elements=other_elements)

   @classmethod
   def _fromDict(cls, name:str, a, other_elements:Dict[str, Any]):
      _b = other_elements['b']
      # other_elements minus element 'b'
      _leftovers = {k: v for k, v in other_elements.items() if k != 'b'}
      return SchemaB(name=name, a=a, b=_b, other_elements=_leftovers)
```

However, this would require an additional level of "from dict" sub-function for each level of the hierarchy, and that's not clean at all.

An even older approach was to just have the __init__ function act as a de facto FromDict, where all required elements are parsed from a dictionary.
However, this makes for really, really awkward functions that need to call the constructor but don't have a dictionary ready to go.
Writing such functions requires checking and double-checking exactly what is parsed by the init, to understand what keys to use in a custom-built dictionary.
That's not any fun, and it is much, much nicer to just use actual parameters, where linting and autocomplete help ensure the right data is passed in.

New Approach

With the changes outlined to parser functions, we'll have a situation where __init__ functions can easily use a dictionary to look for parameters that were not directly passed in.
Thus, we simply change conventions to have all class-specific parameters to __init__ marked as Optional.
Within the FromDict, we can then call only the specific functions for the class, and pass in None for all the parent params.
The parent __init__ will then get None in those spots, and call its own parser functions as fallbacks.
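
A minimal sketch of this convention, assuming hypothetical `_parseA` and `_parseB` functions and leaving out the real `Schema` base class so the example stands alone:

```python
from typing import Any, Dict, Optional

class SchemaA:
    def __init__(self, name: str, a: Optional[int], other_elements: Dict[str, Any]):
        self._name = name
        # Fall back to the class's own parser when no value was passed in.
        self._a = a if a is not None else self._parseA(other_elements)
        self._other_elements = other_elements

    @staticmethod
    def _parseA(all_elements: Dict[str, Any]) -> int:
        return int(all_elements.get("a", 0))

class SchemaB(SchemaA):
    def __init__(self, name: str, a: Optional[int], b: Optional[str],
                 other_elements: Dict[str, Any]):
        self._b = b if b is not None else self._parseB(other_elements)
        super().__init__(name=name, a=a, other_elements=other_elements)

    @staticmethod
    def _parseB(all_elements: Dict[str, Any]) -> str:
        return str(all_elements.get("b", ""))

    @classmethod
    def FromDict(cls, name: str, all_elements: Dict[str, Any]) -> "SchemaB":
        # Parse only SchemaB's own element; None tells SchemaA to parse 'a' itself.
        return cls(name=name, a=None, b=cls._parseB(all_elements),
                   other_elements=all_elements)
```

External callers can still pass real parameters, e.g. `SchemaB("x", a=1, b="y", other_elements={})`, and get linting and autocomplete on them.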

Alternatively, and probably better, we could just go ahead and call the parent parser functions right within the FromDict, since each should be a very simple one-liner.
In that case, we can pretty directly pass along everything from the Dict.
The only thing we really lose here is the ability to have a super-easy "leftovers" variable to let us remove elements from the dictionary.
However, this could be done by just having the parse function del unparsed_elements[key] for whatever key actually gets used.
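
A minimal sketch of that alternative, assuming a hypothetical shared helper named `ParseAndRemove` and hypothetical `_parseA`/`_parseB` wrappers. The helper deletes whichever key it actually used, so whatever remains in the dict is the "leftovers".

```python
from typing import Any, Dict, List

def ParseAndRemove(unparsed_elements: Dict[str, Any], valid_keys: List[str],
                   to_type: type, default_value: Any) -> Any:
    for key in valid_keys:
        if key in unparsed_elements:
            raw_value = unparsed_elements[key]
            del unparsed_elements[key]  # drop the key that actually got used
            return to_type(raw_value)
    return default_value

class SchemaA:
    def __init__(self, name: str, a: int, other_elements: Dict[str, Any]):
        self._name = name
        self._a = a
        self._other_elements = other_elements

    @staticmethod
    def _parseA(unparsed_elements: Dict[str, Any]) -> int:
        return ParseAndRemove(unparsed_elements, ["a"], int, 0)

class SchemaB(SchemaA):
    def __init__(self, name: str, a: int, b: str, other_elements: Dict[str, Any]):
        self._b = b
        super().__init__(name=name, a=a, other_elements=other_elements)

    @staticmethod
    def _parseB(unparsed_elements: Dict[str, Any]) -> str:
        return ParseAndRemove(unparsed_elements, ["b"], str, "")

    @classmethod
    def FromDict(cls, name: str, all_elements: Dict[str, Any]) -> "SchemaB":
        unparsed_elements = dict(all_elements)  # copy, so the caller's dict is untouched
        a = cls._parseA(unparsed_elements)      # parent parser called directly
        b = cls._parseB(unparsed_elements)
        return cls(name=name, a=a, b=b, other_elements=unparsed_elements)
```

Copying the dict before parsing is a design choice in this sketch, not something the proposal specifies; mutating the caller's dictionary in place would also work if that side effect is acceptable.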

LswaN58 · 2025-04-25T18:36:15Z

LswaN58
Apr 25, 2025
Maintainer Author

This proposal was implemented in #91
