Mem has a better memoization framework. I think it might be worth considering porting some concepts over. As a long-term project, this is more of a note than a real issue: right now I don't have time, and I doubt anyone else is interested. Overview:
Consider the following fbuild functions (mostly equivalent from a memoizing standpoint):

    def obj(ctx, target:fbuild.db.DST, source:fbuild.db.SRC):
    def obj(ctx, source:fbuild.db.SRC) -> fbuild.db.DST:
-
Fbuild blurs inputs and outputs. Determinism only requires three things: the input path, the input contents, and the output path. Fbuild, however, uses four: the input path, the input contents, the output path, and whether the output path exists. Not only does this muddle the concept of pure, deterministic functions, it also has a major drawback (see the next point).
-
Fbuild doesn't handle target modification. For example, assume `obj` copies `source` -> `target`, in this case `test.in` -> `test.out`. Consider:

Initial execution:

    $ for i in test*; do echo "$i"; cat "$i"; done
    test.in
    1 2
    $ fbuild
    Copying test.in to test.out...
    $ for i in test*; do echo "$i"; cat "$i"; done
    test.in
    1 2
    test.out
    1 2
That was the initial memoization, so not much to see. Let's try the only condition supported by fbuild's `fbuild.db.DST`: removing `test.out`.

    $ rm test.out
    $ for i in test*; do echo "$i"; cat "$i"; done
    test.in
    1 2
    $ fbuild
    Copying test.in to test.out...
    $ for i in test*; do echo "$i"; cat "$i"; done
    test.in
    1 2
    test.out
    1 2
While the end result is acceptable, fbuild unfortunately had to rerun the `obj` function. Now let's try modifying `test.out`. In a real-world scenario, this could easily be an unintended side effect of a build command.

    $ echo "44" >test.out
    $ for i in test*; do echo "$i"; cat "$i"; done
    test.in
    1 2
    test.out
    44
    $ fbuild
    $ for i in test*; do echo "$i"; cat "$i"; done
    test.in
    1 2
    test.out
    44
Well, that's not good at all.
In fact, whether the target was removed or modified, the memoized function should never be run again. Instead, the target should be restored from the cache if and only if it was modified or removed. Let's compare fbuild to mem:
- target unmodified: fbuild does not rerun the memoized function. [1/1]
- target removed: fbuild detects this, but reruns the memoized function. [1/2]
- target modified: fbuild doesn't detect this. [0/1]
Like fbuild, mem memoizes function outputs. Obviously, no function should be expected to return a byte-for-byte copy of a file, suitable for pickling. Instead, mem introduces an extra processing step if the output object defines the functions `hash`, `store`, and `restore`. If the output hasn't been memoized, mem will call `store()`. If it has, mem will first call `hash()`. If the hash remains unchanged from the cached version, mem does nothing. Otherwise, it calls `restore()`. For example, this is mem's file class:

    class File:
        def __init__(self, path):
            # notice the file's contents won't be serialized
            self.path = path

        def __hash__(self):
            """ checksum of self.path """

        def store(self):
            """ store a copy of the file in the build cache """

        def restore(self):
            """ restore the file from the build cache """
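The store/hash/restore flow described above can be sketched in a few lines. This is only an illustration under assumptions, not mem's actual code: `FileNode`, `CACHE_DIR`, `memoize`, and the `_memo` table are hypothetical names, and the memo key uses only the call arguments (a real implementation would also depend on the input contents).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# Hypothetical cache location, created fresh for this sketch.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="memo-cache-"))


class FileNode:
    """Hypothetical stand-in for a file node; not mem's real class."""

    def __init__(self, path):
        self.path = Path(path)

    def _cached(self):
        # One cache slot per target path.
        return CACHE_DIR / hashlib.sha256(str(self.path).encode()).hexdigest()

    def hash(self):
        # Checksum of the current contents, or None if the file is missing.
        if not self.path.exists():
            return None
        return hashlib.sha256(self.path.read_bytes()).hexdigest()

    def store(self):
        shutil.copyfile(self.path, self._cached())

    def restore(self):
        shutil.copyfile(self._cached(), self.path)


_memo = {}  # call key -> (node, hash recorded when the node was stored)


def memoize(func):
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in _memo:
            node = func(*args)  # first call: run the function, then store
            node.store()
            _memo[key] = (node, node.hash())
        else:
            node, recorded = _memo[key]
            if node.hash() != recorded:
                node.restore()  # removed or modified: restore, never rerun
        return _memo[key][0]

    return wrapper
```

With a memoized copy function that returns `FileNode(target)`, removing or overwriting the target and calling the function again brings the file back from the cache without running the copy a second time.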
-
Fbuild depends on python annotations to memoize file contents. While helpful, it is also obfuscating and confusing. Why not depend on the standard object-oriented paradigm, like mem does? Not only is this expected, it is also less verbose and simpler:

    @mem.memoize
    def obj_a(source_path_string, target_path_string):
        # unfortunately, the inputs are python strings, without store()/restore(),
        # and a __hash__() that doesn't depend on contents.
        # so, let's explicitly add a dependency on the path's contents
        mem.add_dep(source_path_string)
        mem.add_dep(target_path_string)
        # process the input, determine the outputs
        output = ...
        return mem.nodes.File(output)

    @mem.memoize
    def obj_b(source_path_node):
        # the inputs are already node objects, no need to use mem.add_dep()
        pass

    obj_b(obj_a("file.a", "file.b"))
Now, for convenience and backwards compatibility, I do like parameter annotations.

    @mem.memoize
    def obj_a_alternative(source_path_string:fbuild.file.to_node, target_path_string:fbuild.file.to_node):
        pass
Also, why not add notation to prevent certain parameters from being memoized? Mem acknowledges this as a shortcoming of its design, but also notes that it has never needed such functionality:

    @mem.memoize
    def obj_c(source_path_string:fbuild.file.to_node, dont_memoize:fbuild.db.ignore):
        pass
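One way annotation-driven memoization like this could work is to inspect the signature at decoration time and build the memo key from the converted arguments. The following is a minimal sketch, not fbuild's or mem's implementation: `ignore`, `file_contents`, and this `memoize` are hypothetical names, and every callable annotation is treated as a converter.

```python
import functools
import hashlib
import inspect
from pathlib import Path


def ignore(value):
    """Hypothetical marker annotation: leave this parameter out of the memo key."""
    return None


def file_contents(path):
    """Hypothetical converter annotation: key on the file's contents, not its name."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def memoize(func):
    sig = inspect.signature(func)
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        parts = []
        for name, value in bound.arguments.items():
            annotation = sig.parameters[name].annotation
            if annotation is ignore:
                continue  # excluded from the key entirely
            if annotation is not inspect.Parameter.empty and callable(annotation):
                value = annotation(value)  # e.g. path -> checksum of contents
            parts.append((name, value))
        key = tuple(parts)  # assumes the converted values are hashable
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper
```

A function declared as `def build(source: file_contents, verbose: ignore = False)` would then rerun only when the contents of `source` change, regardless of the value passed for `verbose`.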
-
Fbuild ties the build environment (compiler flags) to a complicated data structure (`list(tuple(set, dict))`) and a complicated class hierarchy. While this simplifies most build targets, the complexity makes edge cases more difficult to implement. On the other hand, mem provides a much "flatter" hierarchy.
-
Mem doesn't differentiate between extraneous and required environment (or environment and command-line options). The merged dictionary of both shell environment and specific flags (overrides) can be passed to any build target function decorated with `mem.util.with_env`:

    import os

    @mem.util.with_env(CFLAGS=[])  # only pass in CFLAGS from the environment
    @mem.memoize
    def obj(target, source, CFLAGS):
        pass

    obj(target, source, env={k: v for d in (os.environ, {"CFLAGS": "-O3"}) for k, v in d.items()})
The decorator ensures that only the required flags are memoized.
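To make the filtering concrete, here is a rough sketch of how such a decorator might behave. It is a hypothetical re-implementation written for illustration; the real `mem.util.with_env` may differ in signature and behavior. Only the names declared in the decorator are picked out of `env`, so a memoizing layer underneath never sees the rest of the environment.

```python
import functools


def with_env(**defaults):
    """Hypothetical sketch; mem.util.with_env itself may work differently."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, env=None, **kwargs):
            env = env or {}
            # Select only the declared names, falling back to the declared default.
            selected = {name: env.get(name, default) for name, default in defaults.items()}
            # Explicit keyword arguments take precedence over the environment.
            selected.update(kwargs)
            return func(*args, **selected)

        return wrapper

    return decorator


@with_env(CFLAGS=())
def obj(target, source, CFLAGS):
    # Stand-in build step: just report what it was given.
    return (target, source, tuple(CFLAGS))
```

Calling `obj("a.o", "a.c", env={"HOME": "/home/u", "CFLAGS": ["-O3"]})` passes only `CFLAGS` through; `HOME` never reaches the function.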
-
Mem provides a single compile operation, and a single link operation. You just need to make sure you pass the correct flags to each operation, depending on your needs:
- build, program: `[]`
- build, static: `[]`
- build, shared: `["-fPIC"]`
- link, program: `[]`
- link, static: `[]`
- link, shared: `["-shared"]`

(at the very minimum)

Compare to fbuild's over-engineered `guess_static` and `guess_shared` with either `build_lib` or `build_exe`. Yes, the `guess_` function has a secondary use of finding the correct compiler, but the process of deciding static/shared then lib/exe makes the class hierarchy more complicated than it should be. An independent class maintaining a database of compiler flags would be more appropriate.
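As a rough illustration of the "database of compiler flags" idea, the minimum flags listed above fit in a plain lookup table. The names `MINIMUM_FLAGS` and `flags_for` are hypothetical and exist only for this sketch; neither mem nor fbuild is known to provide them.

```python
from typing import Dict, Iterable, List, Tuple

# (operation, kind) -> minimum flags, taken from the list in this issue.
MINIMUM_FLAGS: Dict[Tuple[str, str], List[str]] = {
    ("build", "program"): [],
    ("build", "static"): [],
    ("build", "shared"): ["-fPIC"],
    ("link", "program"): [],
    ("link", "static"): [],
    ("link", "shared"): ["-shared"],
}


def flags_for(operation: str, kind: str, extra: Iterable[str] = ()) -> List[str]:
    """Return the minimum flags for an operation, followed by caller-supplied ones."""
    return list(MINIMUM_FLAGS[(operation, kind)]) + list(extra)
```

A single compile function and a single link function could then call `flags_for("build", "shared", ["-O2"])` or `flags_for("link", "shared")` instead of encoding the static/shared and lib/exe choices in a class hierarchy.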
-
-
Support for building a single object from multiple sources (link-time optimization):
All mem build targets support multiple sources. If the output target is unspecified, instead of compiling an object for each input source, the input sources will be agglomerated (link-time optimization) and a single optimized output target will be produced. Admittedly, because mem is unmaintained, this depends on the outdated '-combine' flag.
-
Just a tiny nitpick, but I find the term "cache" confusing, as the standard and pythonic term is "memoize".
Overall, mem feels more pythonic. I have only mentioned its advantages; in terms of features, as an unmaintained project, mem lags far behind fbuild.
-
path objects
Fbuild overrides `__truediv__` for convenience.
-
logging
Fbuild provides loggers, though I'm not too impressed. Setting up tee-styled redirections might be better.
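One possible reading of "tee-styled redirection" is a small helper around the standard `subprocess` module that echoes a command's output while also appending it to a log file. This is a sketch of that reading only; `run_tee` is a hypothetical name and not part of fbuild or mem.

```python
import subprocess
import sys


def run_tee(cmd, log_path):
    """Run cmd, echoing its combined stdout/stderr and appending it to log_path."""
    with open(log_path, "a", encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # what the user sees
            log.write(line)  # what ends up in the build log
    # Leaving the Popen context waits for the process, so returncode is set.
    return proc.returncode
```

The return value is the exit status of the command, so a build script can still decide how to react to failures.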
-
commands
Using the provided `execute` function is required to log command output. Once again, I'm not impressed. I'd rather use the `subprocess` module directly, and have implicit logging at the program level from tee-style redirections.
-
command line options
Fbuild uses argparse, but requires the definition of the `pre_options` function, which is magically loaded. It would be more transparent to explicitly pass the `ArgumentParser` object to fbuild.
-
installing files
I'm not sure if `install` is memoized - haven't checked.
-
configuration testing
Fbuild really shines here - and I mean really.
-
command-line targets
This is well implemented - the decorator is sufficient, so interacting directly with argparse isn't necessary.
-
python 3
-
supports many more builders
-
cross platform
While mem is technically also cross platform by virtue of python, per platform code must still be written to handle differences of compiler and environment.
I don't see the point of requiring a context object to be passed around. If the namespace was becoming too polluted, why not put all configuration into a global container object (sub-module)?
...
With a memoization framework like mem's, it would be possible to support an uninstall target. Even more impressive, uninstall would be able to restore files overwritten during installation.
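A minimal sketch of how such an uninstall could work, assuming a manifest that records every installed path together with a backup of whatever it overwrote. The function names and the manifest layout are hypothetical and only illustrate the idea; they are not part of mem or fbuild.

```python
import json
import shutil
from pathlib import Path


def install(source, dest, state_dir):
    """Copy source to dest, first saving any file that dest would overwrite."""
    source, dest, state_dir = Path(source), Path(dest), Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = state_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    if str(dest) not in manifest:
        backup = None
        if dest.exists():
            # Keep the pre-existing file so uninstall can bring it back.
            backup = str(state_dir / f"backup-{len(manifest)}")
            shutil.copyfile(dest, backup)
        manifest[str(dest)] = backup
    shutil.copyfile(source, dest)
    manifest_path.write_text(json.dumps(manifest))


def uninstall(state_dir):
    """Remove installed files and restore anything they overwrote."""
    manifest_path = Path(state_dir) / "manifest.json"
    manifest = json.loads(manifest_path.read_text())
    for dest, backup in manifest.items():
        if backup is None:
            Path(dest).unlink(missing_ok=True)  # requires Python 3.8+
        else:
            shutil.copyfile(backup, dest)
    manifest_path.write_text("{}")
```

In a memoizing framework, the backup step would correspond to storing the previous target in the build cache, and uninstall to restoring it.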