Inform stage doesn't work with Parquet input files #20

@hdante

Hello, when running the inform procedure with a Parquet input file, I get the following error:

(base) [henrique.almeida@loginapl01 henrique.almeida]$ rail-train -a tpz train3.pq estimator_tpz.new.pkl 
Start: 2024-04-25 16:33:58.228075
Estimator algorithm: tpz
Bins: 301
HDF5 group name: ""
Column template for magnitude data: "mag_{band}"
Column template for error data: "magerr_{band}"
Starting setup.
Loading all program modules...
Configuring trainer...
Loading input file...
column_list None
Setup done.
Starting training.
self._parallel is mpi, number of processors we will use is 1
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[errname][detmask] = 1.0
using native TPZ decision trees
Traceback (most recent call last):
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-train", line 182, in <module>
    if __name__ == '__main__': main()
                               ^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-train", line 173, in main
    train(cfg, ctx)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-train", line 162, in train
    ctx.trainer.inform(ctx.input)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/informer.py", line 65, in inform
    self.run()
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py", line 190, in run
    npdata = np.array(list(training_data.values()))
                           ^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'numpy.ndarray' object is not callable
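
The traceback is consistent with `training_data` being a pandas DataFrame when the input is a Parquet file, while line 190 of `tpz_lite.py` calls `.values()` as if the table were a dict of arrays. On a DataFrame, `values` is a property that returns an ndarray, so calling it raises this `TypeError`. The following is a minimal sketch of the difference and one possible workaround. The helper `table_to_array` and the sample columns are hypothetical and not part of RAIL, and this is not necessarily how the maintainers would fix it.

```python
import numpy as np
import pandas as pd


def table_to_array(training_data):
    """Hypothetical helper (not part of RAIL): stack the columns of a table
    into a 2-D array of shape (n_columns, n_rows), accepting either a dict
    of arrays or a pandas DataFrame."""
    if isinstance(training_data, pd.DataFrame):
        # DataFrame.values is a property (an ndarray), not a method, so use
        # to_numpy() and transpose to match the dict layout.
        return training_data.to_numpy().T
    return np.array(list(training_data.values()))


# Illustrative data only; the column names follow the "mag_{band}" template.
columns = {
    "mag_g": np.array([24.1, 23.7, 25.0]),
    "mag_r": np.array([23.5, 23.0, 24.2]),
}
from_dict = table_to_array(columns)
from_frame = table_to_array(pd.DataFrame(columns))

# Reproduce the failure from the traceback with a DataFrame input.
message = ""
try:
    np.array(list(pd.DataFrame(columns).values()))
except TypeError as err:
    message = str(err)
```

Both inputs give the same array layout here, so code downstream of that line would see the same shape regardless of the input file format.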

Example Parquet file attached.
train3.pq.gz
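
Separately from the crash, the repeated `ChainedAssignmentError` and `SettingWithCopyWarning` messages point at lines 173 and 174 of `tpz_lite.py`. Below is a minimal sketch of the single-step `.loc` assignment that the warning text recommends, applied to the two statements shown in the log. The sample data, the `mag_limits` value, and the way `detmask` is built are assumptions for illustration; only the two assignment targets come from the log.

```python
import numpy as np
import pandas as pd

# Illustrative data only; 99.0 is an assumed non-detection flag.
training_data = pd.DataFrame(
    {"mag_g": [24.1, 99.0, 23.7], "magerr_g": [0.05, 99.0, 0.04]}
)
mag_limits = {"mag_g": 27.0}  # assumed limiting magnitude
bandname, errname = "mag_g", "magerr_g"

# Boolean mask of non-detected rows (assumed definition).
detmask = np.isclose(training_data[bandname].to_numpy(), 99.0)

# Single-step assignments instead of training_data[col][detmask] = value.
training_data.loc[detmask, bandname] = mag_limits[bandname]
training_data.loc[detmask, errname] = 1.0
```

Written this way, the assignment updates `training_data` itself and should keep working once Copy-on-Write becomes the default in pandas 3.0.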
