port concepts from mem

[Mem](https://github.com/srp/mem) has a better memoization framework, and I think it is worth considering porting some of its concepts over. Since this would be a long-term project, it is more of a note than a real issue: right now I don't have time, and I doubt anyone else is interested. Overview:

Consider the following (mostly equivalent, from a memoizing standpoint) fbuild functions:

``` python
import fbuild.db

def obj(ctx, target: fbuild.db.DST, source: fbuild.db.SRC): ...
def obj(ctx, source: fbuild.db.SRC) -> fbuild.db.DST: ...
```
1. Fbuild blurs inputs and outputs. Determinism only requires the input path, the input contents, and the output path. Fbuild, however, also uses whether the output path exists. Not only does this muddle the concept of pure, deterministic functions, it also has a major drawback (below).
2. Fbuild doesn't handle target modification. For example, assume `obj` copies `source`->`target`, in this case `test.in`->`test.out`. Consider:
   
   Initial execution:
   
   ```
   $ for i in test*; do echo "$i"; cat "$i"; done
   test.in
   1
   2
   $ fbuild
   Copying test.in to test.out...
   $ for i in test*; do echo "$i"; cat "$i"; done
   test.in
   1
   2
   test.out
   1
   2
   ```
   
   That was the initial memoization, so there is not much to see. Let's try the only condition supported by fbuild's `fbuild.db.DST`: removing `test.out`.
   
   ```
   $ rm test.out
   $ for i in test*; do echo "$i"; cat "$i"; done
   test.in
   1
   2
   $ fbuild
   Copying test.in to test.out...
   $ for i in test*; do echo "$i"; cat "$i"; done
   test.in
   1
   2
   test.out
   1
   2
   ```
   
   While the end result is acceptable, unfortunately fbuild had to rerun the `obj` function. Now let's try modifying `test.out`. In real-world scenarios, this could easily be an unintended side effect of a build command.
   
   ```
   $ echo "44" >test.out
   $ for i in test*; do echo "$i"; cat "$i"; done
   test.in
   1
   2
   test.out
   44
   $ fbuild
   $ for i in test*; do echo "$i"; cat "$i"; done
   test.in
   1
   2
   test.out
   44
   ```
   
   Well, that's not good at all.
   
   In fact, whether the target was removed or modified, the memoized function should **never** be run again. Instead, the target should be restored from the cache if and only if it was modified or removed. Let's compare fbuild to mem:
   - target unmodified: fbuild does not rerun the memoized function. [1/1]
   - target removed: fbuild detects this, but reruns the memoized function. [1/2]
   - target modified: fbuild doesn't detect this. [0/1]
   
   Like fbuild, mem memoizes function outputs. Obviously, no function should be expected to return a byte-for-byte copy of a file suitable for pickling. Instead, mem introduces an extra processing step if the output object defines the functions `hash`, `store`, and `restore`. If the output hasn't been memoized yet, mem calls `store()`. If it has, mem first calls `hash()`: if the hash is unchanged from the cached version, mem does nothing; otherwise, it calls `restore()`. For example, this is mem's file class:
   
   ``` python
   class File:
       def __init__(self, path):
           # notice the file's contents won't be serialized
           self.path = path

       def __hash__(self):
           """ checksum of the contents of self.path """

       def __store(self):
           """ store a copy of the file in the build cache """

       def __restore(self):
           """ restore the file from the build cache """
   ```
3. Fbuild depends on Python annotations to memoize file contents. While helpful, this is also obscure and confusing. Why not rely on the standard object-oriented paradigm, as mem does? Not only is this what users expect, it is also less verbose and simpler:
   
   ``` python
   import mem

   @mem.memoize
   def obj_a(source_path_string, target_path_string):
       # unfortunately, the inputs are plain Python strings: they lack store()/restore(),
       # and their __hash__() doesn't depend on the file contents.
       # so, explicitly add a dependency on each path's contents
       mem.add_dep(source_path_string)
       mem.add_dep(target_path_string)
       # ... process source_path_string, writing the output to target_path_string ...
       return mem.nodes.File(target_path_string)

   @mem.memoize
   def obj_b(source_path_node):
       # the input is already a node object, so mem.add_dep() isn't needed
       pass

   obj_b(obj_a("file.a", "file.b"))
   ```
   
   That said, for convenience and backwards compatibility, I do like parameter annotations:
   
   ``` python
   @mem.memoize
   def obj_a_alternative(source_path_string: fbuild.file.to_node, target_path_string: fbuild.file.to_node):
       pass
   ```
   
   Also, why not add a notation to prevent certain parameters from being memoized? Mem acknowledges this as a shortcoming of its design, but also notes that it has never needed such functionality:
   
   ``` python
   @mem.memoize
   def obj_c(source_path_string: fbuild.file.to_node, dont_memoize: fbuild.db.ignore):
       pass
   ```
4. Fbuild ties the build environment (compiler flags) to a complicated data structure (`list(tuple(set, dict))`) and a complicated class hierarchy. While this simplifies most build targets, the complexity makes edge cases more difficult to implement. Mem, on the other hand, provides a much "flatter" hierarchy.
   1. Mem doesn't differentiate between extraneous and required environment (or between environment and command-line options). The merged dictionary of the shell environment and specific flags (overrides) can be passed to any build target function decorated with `mem.util.with_env`:
      
      ``` python
      import os
      import mem

      @mem.util.with_env(CFLAGS=[])         # only pass in CFLAGS from the environment
      @mem.memoize
      def obj(target, source, CFLAGS):
          pass

      # merge the shell environment with explicit overrides (overrides win)
      obj("test.o", "test.c", env={**os.environ, "CFLAGS": "-O3"})
      ```
      
      The decorator ensures that only the required flags are memoized.
   2. Mem provides a single compile operation and a single link operation. You just need to pass the correct flags to each operation, depending on your needs:
      - build, program: `[]`
      - build, static: `[]`
      - build, shared: `["-fPIC"]`
      - link, program: `[]`
      - link, static: `[]`
      - link, shared: `["-shared"]` (at the very minimum)
      
      Compare this to fbuild's over-engineered `guess_static` and `guess_shared`, used with either `build_lib` or `build_exe`. Admittedly, the `guess_*` functions also serve to find the correct compiler, but deciding static/shared and then lib/exe makes the class hierarchy more complicated than it needs to be. An independent class maintaining a database of compiler flags would be more appropriate.
5. Support for building a single object from multiple sources (link-time optimization):
   
   All mem build targets support multiple sources. If the output target is unspecified, then instead of compiling an object for each input source, mem agglomerates the input sources (link-time optimization) and produces a single optimized output target. Admittedly, because mem is unmaintained, this depends on GCC's outdated `-combine` flag.
6. Just a tiny nitpick, but I find the term "cache" confusing, as the standard and pythonic term is "memoize".
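
The hash/store/restore protocol from point 2 can be sketched as below. This is a minimal, self-contained illustration and not mem's actual code: `CachedFile`, `file_digest`, `memoized_call`, `demo`, and the directory-of-checksums cache are all hypothetical, `checksum()` stands in for the `hash` step, and SHA-256 of the file contents is an assumed choice of fingerprint. The memo key covers exactly the inputs listed in point 1 (input path, input contents, output path), not whether the output exists.

``` python
import hashlib
import os
import shutil
import tempfile


def file_digest(path):
    """SHA-256 of a file's contents, or None if the file does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class CachedFile:
    """Hypothetical output node: only the path is kept, never the contents."""

    def __init__(self, path, cache_dir):
        self.path = path
        self.cache_dir = cache_dir
        self.digest = None  # checksum recorded when the file was stored

    def checksum(self):
        # Plays the role of mem's hash step: fingerprint the current file.
        return file_digest(self.path)

    def store(self):
        # Keep a copy of the freshly built file, named after its checksum.
        self.digest = self.checksum()
        shutil.copyfile(self.path, os.path.join(self.cache_dir, self.digest))

    def restore(self):
        # Recreate a removed target, or overwrite a modified one.
        shutil.copyfile(os.path.join(self.cache_dir, self.digest), self.path)


def memoized_call(memo, key, build):
    """Run build() once per key; later calls only repair the output file."""
    if key not in memo:
        node = build()
        node.store()
        memo[key] = node
        return "built"
    node = memo[key]
    if node.checksum() == node.digest:
        return "unchanged"
    node.restore()
    return "restored"


def demo():
    """Replay the test.in -> test.out scenario from point 2."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "test.in")
        dst = os.path.join(work, "test.out")
        cache_dir = os.path.join(work, "cache")
        os.mkdir(cache_dir)
        with open(src, "w") as f:
            f.write("1\n2\n")

        builds = []

        def copy_obj():
            builds.append(dst)
            shutil.copyfile(src, dst)
            return CachedFile(dst, cache_dir)

        # Memo key: input path, input contents and output path.
        key = (src, file_digest(src), dst)
        memo, statuses = {}, []
        statuses.append(memoized_call(memo, key, copy_obj))  # initial run
        os.remove(dst)
        statuses.append(memoized_call(memo, key, copy_obj))  # target removed
        with open(dst, "w") as f:
            f.write("44\n")
        statuses.append(memoized_call(memo, key, copy_obj))  # target modified
        statuses.append(memoized_call(memo, key, copy_obj))  # nothing changed
        with open(dst) as f:
            return statuses, len(builds), f.read()


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the statuses `['built', 'restored', 'restored', 'unchanged']`, a build count of `1`, and the original contents `"1\n2\n"`: both the removed and the modified target are repaired from the cache without rerunning `copy_obj`.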

Overall, `mem` feels more pythonic. I have only mentioned its advantages, though: in terms of features, mem (as an unmaintained project) lags far behind fbuild:
- path objects
  
  Fbuild's path objects override `__truediv__` for convenience, so paths can be joined with `/`.
- logging
  
  Fbuild provides loggers, though I'm not too impressed. Setting up tee-style redirections might be better.
- commands
  
  Using the provided `execute` function is required to log command output. Once again, I'm not impressed. I'd rather use the `subprocess` module directly and get implicit, program-level logging from tee-style redirections.
- command line options
  
  Fbuild uses argparse, but requires defining a `pre_options` function, which is loaded magically. It would be more transparent to explicitly pass the `ArgumentParser` object to fbuild.
- installing files
  
  I'm not sure whether `install` is memoized; I haven't checked.
- configuration testing
  
  Fbuild really shines here, and I mean _really_.
- command-line targets
  
  This is well implemented: the decorator is sufficient, so interacting directly with argparse isn't necessary.
- Python 3
- supports many more builders
- cross-platform
  
  While mem is technically also cross-platform by virtue of Python, per-platform code must still be written to handle differences in compilers and environments.
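
The tee-style alternative to `execute` mentioned above could look roughly like this. It is a sketch under assumptions, not an fbuild or mem API: `Tee`, `tee_stdout`, and `run` are hypothetical names. Output printed by Python is duplicated to the console and a log file; because a child process writes straight to the underlying file descriptor rather than to `sys.stdout`, `run` captures its output and re-prints it so it passes through the tee.

``` python
import contextlib
import subprocess
import sys


class Tee:
    """Hypothetical file-like object that copies every write to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


@contextlib.contextmanager
def tee_stdout(log_path):
    """Everything printed inside the block goes to the console and to log_path."""
    with open(log_path, "a") as log:
        with contextlib.redirect_stdout(Tee(sys.stdout, log)):
            yield


def run(cmd):
    """Call subprocess directly, routing the command's output through print()."""
    # A child process writes to the real file descriptor, not to sys.stdout,
    # so its output is captured here (stderr merged into stdout) and re-printed.
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    print(result.stdout, end="")
    result.check_returncode()  # raise only after the output has been logged
    return result.stdout
```

A build script would then wrap its whole run in `with tee_stdout("build.log"):` once, and every `print` or `run` call is logged without a dedicated helper. The trade-off of capturing and re-printing is that a command's output only appears once it finishes.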

I don't see the point of requiring a context object to be passed around. If the namespace was becoming too polluted, why not put all configuration into a global container object (a sub-module)?
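
To make that alternative concrete, here is a minimal sketch in which build functions read shared settings from one module-level container instead of a `ctx` parameter. `config` and `compile_command` are hypothetical names; in a real project `config` would be its own sub-module (imported as `import config`), and `types.SimpleNamespace` merely stands in for it here.

``` python
import os
import types

# Hypothetical stand-in for a "config" sub-module: one shared container
# instead of a ctx argument threaded through every build function.
config = types.SimpleNamespace(
    build_dir="build",
    cc="gcc",
    cflags=["-O2"],
)


def compile_command(source):
    # Reads settings from the global container; no ctx parameter needed.
    stem = os.path.splitext(os.path.basename(source))[0]
    target = config.build_dir + "/" + stem + ".o"
    return [config.cc, *config.cflags, "-c", source, "-o", target]
```

For example, `compile_command("src/foo.c")` yields `["gcc", "-O2", "-c", "src/foo.c", "-o", "build/foo.o"]`, and changing `config.cflags` affects every later call, which is the usual convenience (and risk) of global state.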

...

With a memoization framework like mem's, it would be possible to support an uninstall target. Even more impressive, uninstall would be able to restore files overwritten during installation.
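
One way this could work, assuming the install step records each destination together with a backup of anything it overwrote, is sketched below. `Installer`, `install`, `uninstall`, and the numbered backup files are hypothetical, not part of fbuild or mem; in a memoizing framework these records would live alongside the cached outputs, so `uninstall` could be derived from them.

``` python
import os
import shutil


class Installer:
    """Hypothetical install step that records enough state to undo itself."""

    def __init__(self, backup_dir):
        self.backup_dir = backup_dir
        self.records = []  # (installed path, backup path or None), in order

    def install(self, src, dst):
        backup = None
        if os.path.exists(dst):
            # Save the file that is about to be overwritten.
            backup = os.path.join(self.backup_dir, str(len(self.records)))
            shutil.copy2(dst, backup)
        shutil.copy2(src, dst)
        self.records.append((dst, backup))

    def uninstall(self):
        # Undo in reverse order, so a path installed twice unwinds correctly.
        for dst, backup in reversed(self.records):
            if backup is None:
                os.remove(dst)  # the file did not exist before installation
            else:
                shutil.copy2(backup, dst)  # put the original back
        self.records.clear()
```

Replaying the records in reverse is the key design choice: if the same path is installed twice, the second backup (the first installed version) is restored first and then overwritten by the first backup (the original file), leaving the system as it was before any installation.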


port concepts from mem #22
