Sorted by earliest year of reference, limited to experimental entries with fewer than 52 sites: https://figshare.com/articles/dataset/Materials_Project_Time_Split_Data/19991516
How does this seem as a potential matminer dataset contribution? For additional context, see "How do I do a time-split of Materials Project entries? e.g. pre-2018 vs. post-2018" and sparks-baird/xtal2png#12 (comment). I'm starting to feel like I'm reinventing the wheel by trying to host it myself.
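For readers unfamiliar with the idea, here is a rough sketch of the kind of time split described above: filter to experimental entries with fewer than 52 sites, sort by earliest reference year, then split at a cutoff year. The toy table, the column names (`material_id`, `year`, `nsites`, `theoretical`), and the `time_split` helper are all hypothetical and are not taken from the figshare dataset or from matminer.

```python
import pandas as pd

# Hypothetical toy table; the column names are assumptions for illustration only.
df = pd.DataFrame(
    {
        "material_id": ["mp-1", "mp-2", "mp-3", "mp-4", "mp-5"],
        "year": [2019, 1998, 2012, 2020, 2005],  # earliest year of reference
        "nsites": [4, 60, 12, 8, 30],
        "theoretical": [False, False, False, True, False],
    }
)


def time_split(frame, cutoff_year, max_sites=52):
    """Return (train, test) frames split at cutoff_year (hypothetical helper)."""
    # Keep experimental entries with fewer than max_sites sites.
    keep = (~frame["theoretical"]) & (frame["nsites"] < max_sites)
    # Sort the remaining entries by earliest reference year.
    ordered = frame[keep].sort_values("year", kind="stable").reset_index(drop=True)
    # Entries before the cutoff form the training set, the rest the test set.
    train = ordered[ordered["year"] < cutoff_year]
    test = ordered[ordered["year"] >= cutoff_year]
    return train, test


train, test = time_split(df, cutoff_year=2018)
```

With a 2018 cutoff this mirrors the "pre-2018 vs. post-2018" split mentioned in the linked discussion.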
In my own code, I've been running into a strange issue where if I use:
Lines 89 to 177 in 76a529b
def load_dataframe_from_json(filename, pbar=True, decode=True):
    """Load pandas dataframe from a json file.

    Automatically decodes and instantiates pymatgen objects in the dataframe.

    Args:
        filename (str): Path to json file. Can be a compressed file (gz and bz2)
            are supported.
        pbar (bool): If true, shows an ASCII progress bar for loading data from disk.
        decode (bool): If true, will automatically decode objects (slow, convenient).
            If false, will return json representations of the objects (fast, inconvenient).

    Returns:
        (Pandas.DataFrame): A pandas dataframe.
    """
    # Progress bar for reading file with hook
    pbar1 = tqdm(desc=f"Reading file {filename}", position=0, leave=True, ascii=True, disable=not pbar)

    def is_monty_object(o):
        """
        Determine if an object can be decoded into json
        by monty.

        Args:
            o (object): An object in dict-form.

        Returns:
            (bool)
        """
        if isinstance(o, dict) and "@class" in o:
            return True
        else:
            return False

    def pbar_hook(obj):
        """
        A hook for a pbar reading the raw data from json, not
        using monty decoding to decode the object.

        Args:
            obj (object): A dict-like

        Returns:
            obj (object)
        """
        if is_monty_object(obj):
            pbar1.update(1)
            sys.stderr.flush()
        return obj

    # Progress bar for decoding objects
    pbar2 = tqdm(desc=f"Decoding objects from {filename}", position=0, leave=True, ascii=True, disable=not pbar)

    class MontyDecoderPbar(MontyDecoder):
        """
        A pbar-friendly version of MontyDecoder.
        """

        def process_decoded(self, d):
            if isinstance(d, dict) and "data" in d and "index" in d and "columns" in d:
                # total number of objects to decode
                # is the number of @class mentions
                pbar2.total = str(d).count("@class")
            elif is_monty_object(d):
                pbar2.update(1)
                sys.stderr.flush()
            return super().process_decoded(d)

    if decode:
        decoder = MontyDecoderPbar if pbar else MontyDecoder
    else:
        decoder = None

    hook = pbar_hook if pbar else lambda x: x

    with zopen(filename, "rb") as f:
        dataframe_data = json.load(f, cls=decoder, object_hook=hook)

    pbar1.close()
    pbar2.close()

    # if only keys are data, columns, index then orient=split
    if isinstance(dataframe_data, dict):
        if set(dataframe_data.keys()) == {"data", "columns", "index"}:
            return pandas.DataFrame(**dataframe_data)
    else:
        return pandas.DataFrame(dataframe_data)
It returns `None` during an uninterrupted debugging run, but if I set a breakpoint and run the line manually in the debug console (VS Code), it returns the expected `DataFrame`.
See https://github.com/sparks-baird/mp-time-split/runs/6739787243?check_suite_focus=true/#step:5:1
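One thing that might be worth checking: a function that ends in an `if`/`else` chain returns `None` whenever the decoded data reaches a branch with no `return`. Below is a minimal sketch, not matminer's implementation, of a final step in which every branch either returns a `DataFrame` or raises, so a silent `None` becomes an explicit error. The name `build_frame` and the choice to raise `TypeError` are my own assumptions, and I have not confirmed that a fall-through is the actual cause of the behavior above.

```python
import pandas as pd

# Keys produced by DataFrame.to_json/to_dict with orient="split".
SPLIT_KEYS = {"data", "columns", "index"}


def build_frame(decoded):
    """Build a DataFrame from decoded JSON data (hypothetical helper).

    Every path either returns a DataFrame or raises, so the caller
    never receives None silently.
    """
    # orient="split" layout: unpack directly into the constructor.
    if isinstance(decoded, dict) and set(decoded) == SPLIT_KEYS:
        return pd.DataFrame(**decoded)
    # Any other dict or list is handed to pandas as-is.
    if isinstance(decoded, (dict, list)):
        return pd.DataFrame(decoded)
    # Anything else (including None) is reported instead of passed along.
    raise TypeError(f"Cannot build a DataFrame from {type(decoded).__name__}")


split_frame = build_frame({"data": [[1, 2], [3, 4]], "columns": ["a", "b"], "index": [0, 1]})
column_frame = build_frame({"a": [1, 2]})

try:
    build_frame(None)
    raised = False
except TypeError:
    raised = True
```

Calling a helper like this on the value returned by `json.load` would at least show which kind of object the uninterrupted run is actually producing.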