Handling data processing functions that output to a grid or table #1536

Open
@weiji14

Description of the issue

In the GMT command-line world, some data processing modules can output to either a NetCDF grid or an ASCII table. Translating to Python/PyGMT, do we want to 1) have a single function that can output to both (depending on some flag), or 2) have two functions/methods, one which outputs to a grid and one which outputs to a table?

This is a list of functions that need to be handled:

Originally posted by @weiji14 in #1433 (comment)

I changed the implementation a bit relative to #731 to support ASCII or pandas.DataFrame output for writing out the equalized histogram.

Still, the code is a bit clunky in order to support four different output types (pandas.DataFrame, xarray.DataArray, netCDF, or ASCII). What would you think about having two PyGMT functions for GMT's grdhisteq module rather than just one? One function could write out the data ranges of histogram equalization to a pd.DataFrame or ASCII table and the other could write out the cumulative distribution statistics to a netCDF file or xarray.DataArray. I guess coming up with the names for these would be harder than the current implementation, but I think it would be more user friendly long-term.

Yeah I've debated a bit on whether to have 2 functions too, something like a pygmt.grdhisteq.to_table() and pygmt.grdhisteq.to_grid() (implemented using Python classmethods), or maybe with an underscore like pygmt.grdhisteq_to_table() and pygmt.grdhisteq_to_grid() (implemented purely using Python functions). Tying this to #1318 (comment), I think the split into 2 may have to happen eventually, especially if we want to support more table-like outputs (ascii/numpy/pandas/geopandas/etc) like what Will is doing at grd2xyz #1284.

Possible implementation styles

This is how each implementation would look, using triangulate as an example.

Single function

def triangulate(data, outgrid=None, outfile=None):
    pass
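
To make the trade-off of the single-function style concrete, here is a minimal sketch. It does not call GMT: the returned dictionary is a hypothetical stand-in for real output, and the rule that `outgrid` and `outfile` are mutually exclusive is an assumption for illustration, not something PyGMT is known to enforce this way.

```python
def triangulate(data, outgrid=None, outfile=None):
    """Toy single-function API; the dict stands in for real GMT output."""
    # Assumption: the two output arguments cannot be combined.
    if outgrid is not None and outfile is not None:
        raise ValueError("Pass either 'outgrid' or 'outfile', not both.")
    npoints = len(data)
    if outgrid is not None:
        # Grid branch: a real wrapper might return an xarray.DataArray here.
        return {"kind": "grid", "target": outgrid, "npoints": npoints}
    # Table branch, also the fallback when neither argument is given.
    return {"kind": "table", "target": outfile, "npoints": npoints}


points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0)]
grid_result = triangulate(points, outgrid="out.nc")
table_result = triangulate(points, outfile="out.txt")
```

The return type depends on which argument is set, so callers have to read the branching logic to know what they get back.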

Two Python functions

Have a common _triangulate function that handles both grid and table outputs, similar to _blockm.

def _triangulate(data, outgrid=None, outfile=None):
    pass

def triangulate_to_grid(data, outgrid=None):
    pass

def triangulate_to_table(data, outfile=None):
    pass

Two methods in a single Python class ✔️

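class triangulate:
    @staticmethod
    def _triangulate(data, outgrid=None, outfile=None):
        # Shared helper that handles both grid and table outputs
        pass

    @staticmethod
    def to_grid(data, outgrid=None):
        pass

    @staticmethod
    def to_table(data, outfile=None):
        pass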
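
A minimal sketch of how the class style would be called, again with a placeholder dictionary instead of a GMT call and a hypothetical `output_type` argument on the helper. The class is used purely as a namespace and is never instantiated:

```python
class triangulate:  # lowercase to mirror the GMT module name
    @staticmethod
    def _triangulate(data, output_type, target=None):
        # Hypothetical shared helper; a real one would call GMT here.
        return {"kind": output_type, "target": target, "npoints": len(data)}

    @staticmethod
    def to_grid(data, outgrid=None):
        return triangulate._triangulate(data, "grid", outgrid)

    @staticmethod
    def to_table(data, outfile=None):
        return triangulate._triangulate(data, "table", outfile)


points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0)]
# Static methods are called on the class itself, without creating an instance.
grid_result = triangulate.to_grid(points, outgrid="out.nc")
table_result = triangulate.to_table(points, outfile="out.txt")
```

Both outputs are grouped under the `triangulate` name, which keeps them next to each other in tab completion and in the API docs.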

Are you willing to help implement and maintain this feature? Vote for which API style you prefer!

  • A. 👍 Single function to do both grid/table output, i.e. pygmt.triangulate(outgrid=True) or pygmt.triangulate(outfile=True)
  • B. 🎉 The 'functional' style, i.e. pygmt.triangulate_to_grid() or pygmt.triangulate_to_table()
  • C. 🚀 The 'class' method style, i.e. pygmt.triangulate.to_grid() or pygmt.triangulate.to_table()
  • D. 👀 Other suggestions on the names, or API design, please comment below!

P.S. Also xref #896 where there is a similar API design discussion on wrapping GMT functions that do either plotting or data processing.
