-
Notifications
You must be signed in to change notification settings - Fork 841
packaging #2438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
packaging #2438
Changes from all commits
e2ecc84
871faf9
d11c34c
c706438
9cd8e51
e500118
1bc64a4
01d2c16
47080d2
0b47966
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,18 +14,14 @@ | |
from .parameters import current_flow | ||
from .user_configs.config_decorators import CustomStepDecorator | ||
from .user_configs.config_parameters import ( | ||
ConfigValue, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not used |
||
UNPACK_KEY, | ||
resolve_delayed_evaluator, | ||
unpack_delayed_evaluator, | ||
) | ||
|
||
from metaflow._vendor import click | ||
|
||
try: | ||
unicode | ||
except NameError: | ||
unicode = str | ||
basestring = str | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we can pull out Python2 changes in a separate PR There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. THere are definitely other changes. We can revert this but I think we can leave this here for now and yes, do a separate PR to pull out all the other ones. |
||
|
||
# Contains the decorators on which _init was called. We want to ensure it is called | ||
# only once on each decorator and, as the _init() function below can be called in | ||
|
@@ -189,7 +185,7 @@ def make_decorator_spec(self): | |
# escaping but for more complex types (typically dictionaries or lists), | ||
# we dump using JSON. | ||
for k, v in attrs.items(): | ||
if isinstance(v, (int, float, unicode, basestring)): | ||
if isinstance(v, (int, float, str)): | ||
attr_list.append("%s=%s" % (k, str(v))) | ||
else: | ||
attr_list.append("%s=%s" % (k, json.dumps(v).replace('"', '\\"'))) | ||
|
@@ -315,15 +311,26 @@ def package_init(self, flow, step_name, environment): | |
|
||
def add_to_package(self): | ||
""" | ||
Called to add custom packages needed for a decorator. This hook will be | ||
Called to add custom files needed for a decorator. This hook will be | ||
called in the `MetaflowPackage` class where metaflow compiles the code package | ||
tarball. This hook is invoked in the `MetaflowPackage`'s `path_tuples` | ||
function. The `path_tuples` function is a generator that yields a tuple of | ||
`(file_path, arcname)`.`file_path` is the path of the file in the local file system; | ||
the `arcname` is the path of the file in the constructed tarball or the path of the file | ||
after decompressing the tarball. | ||
|
||
Returns a list of tuples where each tuple represents (file_path, arcname) | ||
tarball. This hook can return one of two things: | ||
- a generator yielding a tuple of `(file_path, arcname)` to add files to | ||
the code package. `file_path` is the path to the file on the local filesystem | ||
and `arcname` is the path relative to the packaged code. | ||
- a generator yielding a tuple of `(content, arcname, type)` where: | ||
- type is a AddToPackageType | ||
- for CODE_FILE: | ||
- content: path to the file to include | ||
- arcname: path relative to the code directory in the package | ||
- for CODE_MODULE: | ||
- content: name of the module | ||
- arcame: None (ignored) | ||
- for CONFIG_FILE: | ||
- content: path to the file to include | ||
- arcname: path relative to the config directory in the package | ||
- for CONFIG_CONTENT: | ||
- content: bytes to include | ||
- arcname: path relative to the config directory in the package | ||
Comment on lines
+316
to
+333
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is quite funky - returning one of two things with different types. need to fix this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you suggest? As I mentioned:
|
||
""" | ||
return [] | ||
|
||
|
@@ -685,12 +692,8 @@ def foo(self): | |
f.is_step = True | ||
f.decorators = [] | ||
f.config_decorators = [] | ||
try: | ||
# python 3 | ||
f.name = f.__name__ | ||
except: | ||
# python 2 | ||
f.name = f.__func__.func_name | ||
f.wrappers = [] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is actually not related to this PR. It slipped through. |
||
f.name = f.__name__ | ||
return f | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
# This file serves as a __init__.py for metaflow_extensions when it is packaged | ||
# and needs to remain empty. | ||
# This file serves as a __init__.py for metaflow_extensions or metaflow | ||
# packages when they are packaged and needs to remain empty. |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
import json | ||
import os | ||
|
||
from enum import Enum | ||
from typing import Any, Dict, Optional, Union | ||
|
||
from .util import get_metaflow_root | ||
|
||
_info_file_content = None | ||
_info_file_present = None | ||
_included_dist_info = None | ||
_included_dist_present = None | ||
|
||
# Ideally these would be in package/mfenv.py but that would cause imports to fail so | ||
# moving here. The reason is that this is needed to read extension information which needs | ||
# to happen before mfenv gets packaged. | ||
|
||
MFENV_DIR = ( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. change MFENV to MFCODE. we already have --environment and @Environment - MFENV will only lead to more confusion here and everywhere. |
||
".mfenv" # Directory containing "system" code (metaflow and user dependencies) | ||
) | ||
MFCONF_DIR = ".mfconf" # Directory containing Metaflow's configuration files | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. consider changing the name to something else since it is neither metaflow config not the new config feature There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So we can have |
||
MFENV_MARKER = ( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can't this be just in the info file? will simplify things quite a bit There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would make it less flexible. The marker tells you how to access the info file. The info file is not in the same place already. |
||
".mfenv_install" # Special file containing metadata about how Metaflow is packaged | ||
) | ||
|
||
|
||
class MetaFile(Enum): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for future generations looking at the code, it will be immensely useful to call out in detail what is included in these files. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. I can add this. We didn't add any file (except dist info) but can make it more clear as part of making the code better. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. btw, do we need to have three separate files? any value in merging all/some? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can definitely have them all as one. I vaguely remember you being in favor of separate files in the past but I have no strong preference. |
||
INFO_FILE = "INFO" | ||
CONFIG_FILE = "CONFIG_PARAMETERS" | ||
INCLUDED_DIST_INFO = "INCLUDED_DIST_INFO" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can clarify: it means the information about any distribution that is included. So no, there is no excluded one. I can just hame it DIST_INFO. |
||
|
||
|
||
def meta_file_name(name: Union[MetaFile, str]) -> str: | ||
if isinstance(name, MetaFile): | ||
return name.value | ||
return name | ||
|
||
|
||
def generic_get_filename( | ||
name: Union[MetaFile, str], is_meta: Optional[bool] = None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is_meta is not actually used. consider dropping |
||
) -> Optional[str]: | ||
# We are not in a MFEnv package (so this is an old style package). Everything | ||
# is at metaflow root. There is no distinction between code and config. | ||
real_name = meta_file_name(name) | ||
|
||
path_to_file = os.path.join(get_metaflow_root(), real_name) | ||
if os.path.isfile(path_to_file): | ||
return path_to_file | ||
return None | ||
|
||
|
||
def v1_get_filename( | ||
name: Union[MetaFile, str], | ||
meta_info: Dict[str, Any], | ||
is_meta: Optional[bool] = None, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how does this work if i am interested in a file that unfortunately clashes with an internal reserved name? also, is_meta actually useful? what if we simply had another method that provided the contents of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps by setting is_meta to false actively somewhere? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. let me check on that. The idea of putting stuff in mfconf is that they won't clash with code. |
||
) -> Optional[str]: | ||
if is_meta is None: | ||
is_meta = isinstance(name, MetaFile) | ||
if is_meta: | ||
conf_dir = meta_info.get("conf_dir") | ||
if conf_dir is None: | ||
raise ValueError( | ||
"Invalid package -- package info does not contain conf_dir key" | ||
) | ||
return os.path.join(conf_dir, meta_file_name(name)) | ||
# Not meta -- so code | ||
code_dir = meta_info.get("code_dir") | ||
if code_dir is None: | ||
raise ValueError( | ||
"Invalid package -- package info does not contain code_dir key" | ||
) | ||
return os.path.join(code_dir, meta_file_name(name)) | ||
|
||
|
||
get_filname_map = {1: v1_get_filename} | ||
|
||
|
||
def get_filename( | ||
name: Union[MetaFile, str], is_meta: Optional[bool] = None | ||
) -> Optional[str]: | ||
if os.path.exists(os.path.join(get_metaflow_root(), MFENV_MARKER)): | ||
with open( | ||
os.path.join(get_metaflow_root(), MFENV_MARKER), "r", encoding="utf-8" | ||
) as f: | ||
meta_info = json.load(f) | ||
version = meta_info.get("version") | ||
if version not in get_filname_map: | ||
raise ValueError( | ||
"Unsupported packaging version '%s'. Please update Metaflow" % version | ||
) | ||
Comment on lines
+87
to
+90
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will we actually ever hit this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes -- if a newer version of Metaflow has a newer packaging method and an old client is used to access stuff. |
||
return get_filname_map[version](name, meta_info, is_meta) | ||
return generic_get_filename(name, is_meta) | ||
|
||
|
||
def read_info_file(): | ||
# The info file is a bit special because it needs to be read to determine what | ||
# extensions to load. We need to therefore not load anything yet. This explains | ||
# the slightly wheird code structure where there is a bit of the file loading logic | ||
# here that is then used in MFEnv (it logically belongs in MFEnv but that file can't | ||
# be loaded just yet). | ||
global _info_file_content | ||
global _info_file_present | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we need this extra global
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can make that change. Technically there is an extra global too (the |
||
|
||
if _info_file_present is None: | ||
info_filename = get_filename(MetaFile.INFO_FILE) | ||
if info_filename is not None: | ||
with open(info_filename, "r", encoding="utf-8") as f: | ||
_info_file_content = json.load(f) | ||
_info_file_present = True | ||
else: | ||
_info_file_present = False | ||
if _info_file_present: | ||
return _info_file_content | ||
return None | ||
|
||
|
||
def read_included_dist_info(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. consider renaming to read_dist_info() |
||
global _included_dist_info | ||
global _included_dist_present | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same comment as above |
||
|
||
from metaflow.package.mfenv import MFEnv | ||
|
||
if _included_dist_present is None: | ||
c = MFEnv.get_content(MetaFile.INCLUDED_DIST_INFO) | ||
if c is not None: | ||
_included_dist_info = json.loads(c.decode("utf-8")) | ||
_included_dist_present = True | ||
else: | ||
_included_dist_present = False | ||
if _included_dist_present: | ||
return _included_dist_info | ||
return None |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -460,6 +460,7 @@ | |
"stubgen", | ||
"userconf", | ||
"conda", | ||
"package", | ||
] | ||
|
||
for typ in DEBUG_OPTIONS: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,8 @@ | |
from metaflow.exception import MetaflowException | ||
from metaflow.extension_support import dump_module_info | ||
from metaflow.mflog import BASH_MFLOG, BASH_FLUSH_LOGS | ||
|
||
from .meta_files import MFENV_DIR | ||
from . import R | ||
|
||
|
||
|
@@ -49,8 +51,26 @@ def bootstrap_commands(self, step_name, datastore_type): | |
|
||
def add_to_package(self): | ||
""" | ||
A list of tuples (file, arcname) to add to the job package. | ||
`arcname` is an alternative name for the file in the job package. | ||
Called to add custom files needed for this environment. This hook will be | ||
called in the `MetaflowPackage` class where metaflow compiles the code package | ||
tarball. This hook can return one of two things: | ||
- a generator yielding a tuple of `(file_path, arcname)` to add files to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a bit funky, we shouldn't return two types of tuples There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. already addressed above with a question. |
||
the code package. `file_path` is the path to the file on the local filesystem | ||
and `arcname` is the path relative to the packaged code. | ||
- a generator yielding a tuple of `(content, arcname, type)` where: | ||
- type is a AddToPackageType | ||
- for CODE_FILE: | ||
- content: path to the file to include | ||
- arcname: path relative to the code directory in the package | ||
- for CODE_MODULE: | ||
- content: name of the module | ||
- arcame: None (ignored) | ||
- for CONFIG_FILE: | ||
- content: path to the file to include | ||
- arcname: path relative to the config directory in the package | ||
- for CONFIG_CONTENT: | ||
- content: bytes to include | ||
- arcname: path relative to the config directory in the package | ||
""" | ||
return [] | ||
|
||
|
@@ -177,6 +197,7 @@ def get_package_commands(self, code_package_url, datastore_type): | |
"after 6 tries. Exiting...' && exit 1; " | ||
"fi" % code_package_url, | ||
"TAR_OPTIONS='--warning=no-timestamp' tar xf job.tar", | ||
"export PYTHONPATH=`pwd`/%s:$PYTHONPATH" % MFENV_DIR, | ||
"mflog 'Task is starting.'", | ||
"flush_mflogs", | ||
] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be cleaned up better -
MFEnv.from_archive(self._tar).get_info()