Minimal support for reading ASDF files from SourceXtractor++ #24

@embray

The original SExtractor is out of scope for now, but SourceXtractor++ is maybe more likely to accept patches and be maintainable (so I'm told). It's also already reasonably well-structured to be able to support other input and output file types, though for image files the only input file type it currently supports is FITS.

From my initial reading of the source code it seems pretty reasonable to add an ASDF image source for SourceXtractor++ input (output is out of scope for this task, but would be possible once write support is added to libasdf).

Note: This task has some subtasks for the minimal (to my thinking) work needed in libasdf to progress on this. Tasks specifically related to the integration into SE++ will be on my fork thereof.

Initial analysis of SourceXtractor++ code

In order to provide an ASDF file reader for SourceXtractor++ we need to implement the following interface:

class ImageSource {
public:

  ImageSource() {}

  virtual ~ImageSource() = default;

  /// Human readable representation of this source
  virtual std::string getRepr() const = 0;

  virtual void saveTile(ImageTile& tile) = 0;
  virtual std::shared_ptr<ImageTile> getImageTile(int x, int y, int width, int height) const = 0;


  /// Returns the width of the image in pixels
  virtual int getWidth() const = 0;

  /// Returns the height of the image in pixels
  virtual int getHeight() const = 0;

  virtual ImageTile::ImageType getType() const = 0;

  /**
   * @return A copy of the metadata set
   */
  virtual const std::map<std::string, MetadataEntry>& getMetadata() const { return m_metadata; }

  virtual void setMetadata(const std::string& key, const MetadataEntry& value) {
    m_metadata[key] = value;
  }

private:
  std::map<std::string, MetadataEntry> m_metadata;
};

Where MetadataEntry is defined as:

struct MetadataEntry {
  typedef boost::variant<bool, char, int64_t, double, std::string> value_t;

  value_t m_value;

  /// Additional metadata about the entry: i.e. comments
  std::map<std::string, std::string> m_extra;
};

The metadata part is the only part that's very FITS-specific, sadly:

  • It assumes metadata is a flat list of key/value pairs (à la FITS)
  • It assumes the values are the simplest value types that can appear in a FITS header: double, int64_t, std::string, bool, or a single char (granted, these are the most common ones anyone would use).

ImageSource.getMetadata is not used in many places, but the places where it is used (most notably the detection image config) are also very FITS-specific: they look for specific FITS keywords, particularly GAIN, SATURATE and FLXSCALE. These are not part of the FITS standard but are apparently commonly enough used.

As a first pass I don't think it's worth worrying too much about the metadata. We could still allow reading scalar values from the top level of the ASDF tree and just reuse the same keywords.
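A minimal sketch of that first pass, assuming the top level of the ASDF tree has already been parsed into some in-memory mapping. `TopLevelTree`, `TreeValue`, `NonScalar` and `flattenTopLevel` are hypothetical names made up for illustration (libasdf defines none of them), and `std::variant` (C++17) stands in for `boost::variant` only so the example compiles without Boost:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <variant>

// Stand-in for SourceXtractor++'s MetadataEntry (std::variant instead of
// boost::variant, purely to keep this sketch free of dependencies).
struct MetadataEntry {
  using value_t = std::variant<bool, char, std::int64_t, double, std::string>;
  value_t m_value;
  std::map<std::string, std::string> m_extra;
};

// Hypothetical view of the top level of a parsed ASDF tree. Anything that is
// not a scalar (mappings, sequences, ndarrays, ...) is collapsed to NonScalar.
struct NonScalar {};
using TreeValue = std::variant<bool, std::int64_t, double, std::string, NonScalar>;
using TopLevelTree = std::map<std::string, TreeValue>;

// Copy only the scalar top-level values into the flat, FITS-like metadata map,
// reusing the tree keys (e.g. GAIN, SATURATE) as metadata keywords.
std::map<std::string, MetadataEntry> flattenTopLevel(const TopLevelTree& tree) {
  std::map<std::string, MetadataEntry> metadata;
  for (const auto& item : tree) {
    const std::string& key = item.first;
    std::visit(
        [&metadata, &key](const auto& v) {
          using T = std::decay_t<decltype(v)>;
          if constexpr (!std::is_same_v<T, NonScalar>) {
            metadata[key].m_value = v;
          }
        },
        item.second);
  }
  // Non-scalar entries are skipped, so the result can never be larger.
  assert(metadata.size() <= tree.size());
  return metadata;
}
```

With something like this, the existing detection image config could keep looking up GAIN, SATURATE and FLXSCALE unchanged, as long as the ASDF file stores them as top-level scalars.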

When it comes to WCS it just passes in the whole FITS header and doesn't use this interface. Supporting reading simple WCS info stored in an ASDF file would be entirely possible too though (it is just passing through to wcslib).

As previously mentioned writing to ASDF can be out of scope for now, and is not strictly required for the basic image source interface (rather, the interface does take a writeable flag but using it with ASDF files could raise an exception and simply not be used in parts of the code that write outputs).
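As a rough illustration of the read-only behaviour, here is a sketch in which both requesting write access and calling saveTile raise an exception. The class name `AsdfImageSource`, its constructor signature and the empty `ImageTile` stand-in are assumptions for this example; the real class would derive from ImageSource and implement the rest of the interface:

```cpp
#include <stdexcept>
#include <string>

// Empty stand-in for SourceXtractor++'s ImageTile, so the sketch compiles
// on its own.
struct ImageTile {};

// Hypothetical read-only ASDF image source. Only the parts relevant to the
// "writeable" flag are shown.
class AsdfImageSource {
public:
  AsdfImageSource(const std::string& path, bool writeable) : m_path(path) {
    if (writeable) {
      throw std::invalid_argument("ASDF image sources are read-only: " + path);
    }
  }

  // Human readable representation of this source
  std::string getRepr() const { return m_path; }

  // Writing is out of scope for now, so saving a tile is always an error.
  void saveTile(ImageTile&) {
    throw std::logic_error("saveTile is not supported for ASDF files: " + m_path);
  }

private:
  std::string m_path;
};
```

Failing loudly this way makes it obvious if some output-writing code path is ever handed an ASDF source by mistake.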

Multi-extension FITS files are supported in various places. In all cases HDUs are addressed simply by their index number (a non-negative integer, though for some reason typed as int; maybe it supports negative indexing too, I'm not sure). I also found some code that appears to support IRAF-style filename.fits[1] HDU addressing, but as far as I can tell it does not actually work.

This could be extended to support std::string as well for addressing an ndarray in an ASDF file (at least from the top-level, or maybe using some JSON Pointer-like syntax). This would be beneficial for FITS as well, to be able to address HDUs by EXTNAME.
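One way this could look, sketched under the assumption that the bracket syntax is reused for both cases: a purely numeric selector is an HDU index, and anything else is a name or path (an EXTNAME for FITS, a tree path for ASDF). `ArraySelector`, `ImageAddress` and `parseImageAddress` are hypothetical names and are not taken from the SourceXtractor++ code:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <variant>

// Either an HDU index (as today) or a name/path string.
using ArraySelector = std::variant<int, std::string>;

struct ImageAddress {
  std::string path;
  ArraySelector selector;  // defaults to index 0
};

// Split "file[selector]" into the file path and the selector. A spec without
// a trailing, non-empty "[...]" is returned unchanged with selector index 0.
// Note: std::stoi throws std::out_of_range for absurdly large indices.
ImageAddress parseImageAddress(const std::string& spec) {
  ImageAddress result;
  result.path = spec;
  if (spec.empty() || spec.back() != ']') {
    return result;
  }
  const std::string::size_type open = spec.rfind('[');
  if (open == std::string::npos) {
    return result;
  }
  const std::string inner = spec.substr(open + 1, spec.size() - open - 2);
  if (inner.empty()) {
    return result;
  }
  result.path = spec.substr(0, open);
  const bool all_digits =
      std::all_of(inner.begin(), inner.end(),
                  [](unsigned char c) { return std::isdigit(c) != 0; });
  if (all_digits) {
    result.selector = std::stoi(inner);
  } else {
    result.selector = inner;
  }
  return result;
}
```

For example, `image.fits[1]` would yield index 1, while `image.asdf[science/data]` would yield the string `science/data`, which the ASDF reader could then resolve against the tree.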

Make ASDF support optional

SourceXtractor++ uses CMake, which can be configured to detect whether libasdf is available and to compile ASDF support only when it is. I think this would make a patch more likely to be accepted upstream, especially as libasdf does not even have a release yet (though when the work below is done I could make an alpha release).
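On the C++ side, the conditional compilation might look roughly like the following sketch. The macro name `WITH_ASDF` is an assumption (it would be whatever definition the CMake detection step adds), and `ImageFormat`, `endsWith` and `selectImageFormat` are hypothetical helpers, not existing SourceXtractor++ code:

```cpp
#include <stdexcept>
#include <string>

enum class ImageFormat { Fits, Asdf };

// True if `name` ends with `suffix`.
inline bool endsWith(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Pick a reader by file extension. ASDF is only selectable when the build
// found libasdf and defined WITH_ASDF (hypothetical macro name); otherwise
// the user gets a clear error instead of a confusing FITS parse failure.
ImageFormat selectImageFormat(const std::string& filename) {
  if (endsWith(filename, ".asdf")) {
#ifdef WITH_ASDF
    return ImageFormat::Asdf;
#else
    throw std::runtime_error("built without ASDF support: " + filename);
#endif
  }
  return ImageFormat::Fits;
}
```

Keeping the `#ifdef` confined to one small dispatch function like this would leave the rest of the code base untouched in builds without libasdf.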

What needs to be done in libasdf

Initial asdf_file_t interface

The existing asdf_parser_t interface that has been (mostly) implemented is intended to be low-level destructuring of the ASDF file. The plan has been to define an asdf_file_t type that is the "high-level" user interface to ASDF files. When reading ASDF files it uses pertinent information from the asdf_parser_t. For the first version this is very simple though, mostly just a wrapper around the parser, and maybe some array of block information.

Read values out of the YAML tree

This is basically just a thin wrapper around libfyaml for now. This status update vaguely discusses plans for handling values in ASDF headers. I don't think this is even needed for the SExtractor project at first; I will just hard-code some basic reading of ndarray metadata, to be replaced later with reading some ASDF ndarray structs through the extension interface.

Basic reading of block data

Given image dimensions and dtype read from the ndarray metadata we need to be able to copy a tile of some size out of the file into a provided buffer. Nothing resembling a more sophisticated set of array functions from libasdf, just copy some bytes. Again, probably to be replaced later with a more formalized API.
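The "just copy some bytes" step could be as simple as the sketch below. It assumes the block is uncompressed, stored in row-major (C) order in native byte order, and already available in memory (for example via a memory map). `copyTile` is a hypothetical helper written for illustration, not part of libasdf:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy a width x height tile whose top-left pixel is (x, y) out of a
// row-major image of image_width x image_height pixels into dst.
// item_size is the size of one pixel in bytes, taken from the ndarray dtype.
// dst must hold at least width * height * item_size bytes.
// Returns false (and copies nothing) if the tile does not fit in the image.
bool copyTile(const unsigned char* src, std::size_t image_width,
              std::size_t image_height, std::size_t item_size,
              std::size_t x, std::size_t y,
              std::size_t width, std::size_t height,
              unsigned char* dst) {
  assert(src != nullptr && dst != nullptr);
  // Written this way to avoid overflow in x + width and y + height.
  if (x > image_width || width > image_width - x) return false;
  if (y > image_height || height > image_height - y) return false;

  const std::size_t src_stride = image_width * item_size;  // bytes per image row
  const std::size_t row_bytes = width * item_size;         // bytes per tile row
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(dst + row * row_bytes,
                src + (y + row) * src_stride + x * item_size,
                row_bytes);
  }
  return true;
}
```

This maps directly onto `getImageTile(x, y, width, height)`: the ImageSource implementation would allocate the tile buffer and hand it to a function like this one.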

WCS support

We can support basic FITS-style WCS at least, via wcslib. Is there a schema for storing FITS WCS in ASDF? Or is the WCS for Roman close enough that it can be massaged into it? Given some schema for finding the right parameters and/or any necessary simple transforms, we could massage it into something wcslib accepts.

I need more information here to be sure what to do.

Support C++

Include extern "C" {} linkage support in the libasdf headers. This is not strictly necessary, but it is nice to have.
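For reference, the usual pattern is a guard around the declarations so that the same header works from both C and C++. The header name and the function `example_block_count` below are placeholders invented for this sketch; they are not real libasdf names:

```cpp
// example_asdf.h (hypothetical header name)
#ifndef EXAMPLE_ASDF_H
#define EXAMPLE_ASDF_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Placeholder declaration for illustration; not a real libasdf function. */
size_t example_block_count(const char* path);

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* EXAMPLE_ASDF_H */

// Dummy definition so the sketch links on its own. In practice this would
// live in the C library, compiled by a C compiler.
extern "C" size_t example_block_count(const char* path) {
  return path != nullptr ? 1 : 0;
}
```

Without the guard, C++ callers such as SourceXtractor++ would have to wrap every `#include` of the library headers in their own `extern "C"` block.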

Is it worth it?

I think so. It's a nice use case to build libasdf around, and provides a not entirely insignificant convenience for users. Since I'm told this is considered rather urgent, an initial prototype can be a bit slapdash, to be improved later as more of libasdf gets fleshed out.

Future steps

If this works, and better yet is accepted by the upstream maintainers, writing catalog files as ASDF would be an obvious next step once write support is added to libasdf.

Another would be defining some variant of the PSFEx format for PSFs in ASDF, and adding support for reading them.
