Materials Project time split dataset - `load_dataframe_from_json` returns `None` during debugging (conditionally)

Sorted by earliest year of reference, limited to experimental entries with fewer than 52 sites: https://figshare.com/articles/dataset/Materials_Project_Time_Split_Data/19991516
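
To make the selection above concrete, here is a minimal sketch in pandas on made-up data. The column names (`material_id`, `nsites`, `theoretical`, `year`) and the helpers `select_and_sort` and `time_split` are assumptions for illustration only, not the actual schema or code behind the figshare files. The 2018 cutoff is borrowed from the matsci thread title.

```python
import pandas as pd

# Toy stand-in for Materials Project entries; all column names are hypothetical.
entries = pd.DataFrame(
    {
        "material_id": ["mp-1", "mp-2", "mp-3", "mp-4"],
        "nsites": [4, 60, 12, 8],
        "theoretical": [False, False, True, False],
        "year": [2019, 2001, 1995, 1987],  # earliest year among the references
    }
)


def select_and_sort(df, max_sites=52):
    """Keep experimental entries with fewer than `max_sites` sites, oldest first."""
    mask = (~df["theoretical"]) & (df["nsites"] < max_sites)
    return df[mask].sort_values("year").reset_index(drop=True)


def time_split(df, cutoff_year):
    """Split into entries first reported before `cutoff_year` and the rest."""
    before = df[df["year"] < cutoff_year]
    after = df[df["year"] >= cutoff_year]
    return before, after


sorted_entries = select_and_sort(entries)
train, test = time_split(sorted_entries, cutoff_year=2018)
```

With the toy data, `mp-2` is dropped for having 60 sites and `mp-3` for being theoretical, leaving one entry on each side of the cutoff.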

Would this be a suitable matminer dataset contribution? See [How do I do a time-split of Materials Project entries? e.g. pre-2018 vs. post-2018](https://matsci.org/t/how-do-i-do-a-time-split-of-materials-project-entries-e-g-pre-2018-vs-post-2018/42584) and https://github.com/sparks-baird/xtal2png/issues/12#issuecomment-1140316833 for additional context. I'm starting to feel like I'm reinventing the wheel by trying to host it myself.

In my own code, I've been running into a strange issue when I call `load_dataframe_from_json`:

https://github.com/hackingmaterials/matminer/blob/76a529b769055c729d62f11a419d319d8e2f838e/matminer/utils/io.py#L89-L177

It returns `None` during an uninterrupted debugging run, but if I set a breakpoint and run the same line manually in the VS Code debug console, it returns the expected `DataFrame`.
See https://github.com/sparks-baird/mp-time-split/runs/6739787243?check_suite_focus=true/#step:5:1



    def load_dataframe_from_json(filename, pbar=True, decode=True):
        """Load pandas dataframe from a json file.

        Automatically decodes and instantiates pymatgen objects in the dataframe.

        Args:
            filename (str): Path to json file. Can be a compressed file (gz and bz2)
                are supported.
            pbar (bool): If true, shows an ASCII progress bar for loading data from disk.
            decode (bool): If true, will automatically decode objects (slow, convenient).
                If false, will return json representations of the objects (fast, inconvenient).

        Returns:
            (Pandas.DataFrame): A pandas dataframe.
        """
        # Progress bar for reading file with hook
        pbar1 = tqdm(desc=f"Reading file {filename}", position=0, leave=True, ascii=True, disable=not pbar)

        def is_monty_object(o):
            """
            Determine if an object can be decoded into json
            by monty.

            Args:
                o (object): An object in dict-form.

            Returns:
                (bool)

            """
            if isinstance(o, dict) and "@class" in o:
                return True
            else:
                return False

        def pbar_hook(obj):
            """
            A hook for a pbar reading the raw data from json, not
            using monty decoding to decode the object.

            Args:
                obj (object): A dict-like

            Returns:
                obj (object)

            """
            if is_monty_object(obj):
                pbar1.update(1)
                sys.stderr.flush()
            return obj

        # Progress bar for decoding objects
        pbar2 = tqdm(desc=f"Decoding objects from {filename}", position=0, leave=True, ascii=True, disable=not pbar)

        class MontyDecoderPbar(MontyDecoder):
            """
            A pbar-friendly version of MontyDecoder.
            """

            def process_decoded(self, d):
                if isinstance(d, dict) and "data" in d and "index" in d and "columns" in d:
                    # total number of objects to decode
                    # is the number of @class mentions
                    pbar2.total = str(d).count("@class")
                elif is_monty_object(d):
                    pbar2.update(1)
                    sys.stderr.flush()
                return super().process_decoded(d)

        if decode:
            decoder = MontyDecoderPbar if pbar else MontyDecoder
        else:
            decoder = None

        hook = pbar_hook if pbar else lambda x: x

        with zopen(filename, "rb") as f:
            dataframe_data = json.load(f, cls=decoder, object_hook=hook)

        pbar1.close()
        pbar2.close()

        # if only keys are data, columns, index then orient=split
        if isinstance(dataframe_data, dict):
            if set(dataframe_data.keys()) == {"data", "columns", "index"}:
                return pandas.DataFrame(**dataframe_data)
        else:
            return pandas.DataFrame(dataframe_data)
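
One thing worth illustrating about the last few lines of the function: assuming the trailing `else` pairs with the outer `isinstance` check (an assumption about the indentation of the linked source), a dict whose keys are not exactly `data`, `columns`, and `index` matches neither `return` and the function falls through to an implicit `None`. The sketch below mirrors only that branch structure, with plain tuples standing in for DataFrames. `build_frame` and `build_frame_strict` are hypothetical names, not matminer functions, and whether this path is what the debugging run actually hits is not established here.

```python
def build_frame(data):
    """Mirror the final branch structure, returning tuples instead of DataFrames."""
    if isinstance(data, dict):
        if set(data.keys()) == {"data", "columns", "index"}:
            return ("split", data)
    else:
        return ("records", data)
    # A dict with any other key set reaches this point, so Python returns None.


def build_frame_strict(data):
    """Same logic, but fail loudly instead of silently returning None."""
    result = build_frame(data)
    if result is None:
        raise ValueError(f"unexpected top-level keys: {sorted(data)}")
    return result


split_like = build_frame({"data": [], "columns": [], "index": []})
list_like = build_frame([{"a": 1}])
fell_through = build_frame({"data": [], "extra": 1})
```

Wrapping the call in a guard like `build_frame_strict` turns a silent `None` into an error that names the unexpected keys, which is easier to spot in a CI log.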
