What is a "field"? #406
Replies: 11 comments 4 replies
-
Dear Patrick, Thanks for these thoughts. I agree that some consistency in this area would be a good thing. When the data model was created, it was thought that some of its language could be used in the main conventions document, but that task was never pursued, probably because it is tricky! One of the reasons is that we have to be very careful not to change the meaning with any new phrasing. In your example on cell methods, for instance, "Each “name: method” pair indicates that for an axis of the data variable identified by name" is not quite right, as the name does not have to be a dimension of the data variable; the new term "computed cell values" is itself ambiguous (at least to me: is a single thermometer measurement "computed"?); the new text removes the explicit statement that cell_methods tells us what the characteristic of the given values is (although it is implied a couple of sentences on), etc. I'm not suggesting that the original text is perfect (for this example or elsewhere), just that if we want to change it we would have to be very careful not to subvert it! Cheers,
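To illustrate the point that a cell_methods name need not be a dimension of the data variable, here is a minimal sketch. It assumes only the simplest "name: method" syntax and ignores "where"/"over"/"within" clauses and parenthesised information; the function names are hypothetical and not part of any CF library.

```python
import re


def parse_cell_methods(text):
    """Split a simple cell_methods string into (name, method) pairs.

    Simplified: handles only 'name: method' and 'name1: name2: method'.
    """
    pairs = []
    # One or more 'name:' tokens followed by a single method word.
    for names, method in re.findall(r"((?:\w+:\s*)+)(\w+)", text):
        for name in re.findall(r"(\w+):", names):
            pairs.append((name, method))
    return pairs


def names_that_are_not_dimensions(cell_methods, dimensions):
    """Return cell_methods names that are not dimensions of the data variable."""
    return [
        name
        for name, _ in parse_cell_methods(cell_methods)
        if name not in dimensions
    ]


pairs = parse_cell_methods("time: mean area: maximum")
not_dims = names_that_are_not_dimensions(
    "time: mean area: maximum", ("time", "lat", "lon")
)
print(pairs)
print(not_dims)
```

Here "area" is a legitimate cell_methods name even though the example data variable has no dimension called "area", which is why wording that ties the name to "an axis of the data variable" would narrow the meaning.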
-
Dear David, I hope I am not coming across as subversive... It is certainly not my intent! Perhaps we could start by defining "data variable" formally, as well as "field" if that term is to remain in the document? I personally like the term "field" as it is also used for physical quantities in meteorology and climatology, as well as other sciences of interest to this community. Following that definition we can then go through the conventions and identify where the language may be amended to be clearer and less ambiguous. One thing that keeps tripping me up is the use of the term "variable" without further qualification. Perhaps unfortunately, netCDF uses the term for one of its data structures, and in the conventions there are now quite a few variants, so being explicit about the variant would be useful to the reader who is not as well versed in the conventions text as you are (or even me). Best,
-
It's also a term used in database lingo -- a "field" is a single value as part of a "record". In that sense, a DB field would be analogous to a netCDF variable, actually -- though the data models really don't match well. I don't think that's how we're using it in CF, though.
Indeed -- and given CF's roots in netCDF, ideally we'd use the word "variable" in the same way as netCDF: that is, an array of data with associated attributes. In the text, "To describe the characteristic of a field that is represented by cell values" implies to me that "field" has the geophysical meaning -- some property that spans an area -- wow, that's hard to write a definition for! In short -- yes, it's a great idea to define these terms (and clarify what they don't mean -- e.g. in CF, a "field" is not a field in the database sense), and to keep the definitions consistent throughout the document. The terminology section seems a good place for this: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.12/cf-conventions.html#terminology I note that in that section, "variable" seems to refer specifically to a netCDF variable. @davidhassell: oddly, I can't seem to find the CF data model written out anywhere -- where can that be found? Maybe these definitions should be there.
-
Thanks! Not sure how I didn't find that myself. I note that in "Table I.1. The elements of the CF-netCDF conventions", the word "variable" is used a LOT -- but not defined. Maybe it should be? "Field" is defined there as: "Field: Scientific data discretised within a domain".
-
By "field" I think we mean "field construct" in the sense of the CF data model, which says "A field construct corresponds to a CF-netCDF data variable with all of its metadata." By "variable" we mean netCDF variable (since this is a netCDF convention). The first sentence which Patrick quotes means "data variable" where it says "variable", i.e. a netCDF variable with a particular role. Out of context, "variable" is unclear, but I suppose the sentence was written with "variable" because the meaning "data variable" seemed obvious from the context. Certainly we should clarify "variable" anywhere it's not currently clear what sort of variable we mean in the context. I agree that it would be helpful to add definitions of field, variable and data variable to the Terminology, and a few other netCDF and CF data model terms. Possibly, as David mentions, there are some other CF data model terms we could use in the convention to improve clarity. For instance we could write "dimension coordinate variable" instead of plain "coordinate variable" when we mean "coordinate variable" in the NUG sense (a 1D variable with the same name as its dimension), and "generic coordinate variable" when we mean dimension coordinate variable, auxiliary coordinate variable or scalar coordinate variable.
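The distinction between these kinds of coordinate variable can be sketched in code. This is only an illustration under simplifying assumptions: `Variable` is a hypothetical stand-in for a netCDF variable (not a real library class), scalar coordinate variables are taken to be zero-dimensional, and any other variable listed in the data variable's `coordinates` attribute is treated as auxiliary.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical stand-in for a netCDF variable: name, dimensions, attributes."""
    name: str
    dimensions: tuple
    attributes: dict = field(default_factory=dict)


def is_nug_coordinate_variable(var):
    # NUG sense: a 1D variable with the same name as its dimension.
    return len(var.dimensions) == 1 and var.dimensions[0] == var.name


def classify(var, data_var):
    """Classify `var` relative to `data_var` (simplified reading of the terms)."""
    listed = data_var.attributes.get("coordinates", "").split()
    if is_nug_coordinate_variable(var) and var.name in data_var.dimensions:
        return "dimension coordinate variable"
    if var.name in listed and len(var.dimensions) == 0:
        return "scalar coordinate variable"
    if var.name in listed:
        return "auxiliary coordinate variable"
    return "not a coordinate of this data variable"


# Illustrative variables; the names are made up for this example.
time = Variable("time", ("time",))
height = Variable("height", ())
lon2d = Variable("lon2d", ("y", "x"))
tas = Variable("tas", ("time", "y", "x"), {"coordinates": "height lon2d"})

for v in (time, height, lon2d):
    print(v.name, "->", classify(v, tas))
```

All three results would fall under the proposed umbrella term "generic coordinate variable"; only the first is a "coordinate variable" in the NUG sense.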
-
@ChrisBarker-NOAA Chris wrote:
In the CF data model, Figure I-1, there are 11 CF constructs that have an "is-a" relationship to the netCDF variable. It is for this reason that I suggest the conventions are explicit about which of the 12 types of variable is meant in the text. There are multiple locations where this is not obvious. Additionally, not every CF variable contains an array, such as a grid mapping variable.
-
@JonathanGregory wrote:
However, the first occurrence of "field" in the conventions is in section 1.1:
Hence my initial question. To me, a useful abstraction from the real world to a netCDF file would go more-or-less like this:
The "field" would be the object that we define
-
Having looked through the CF standard, I would say there are some places where it would be better to write "data variable" than "field". "Data variable" is used quite a lot but not defined because it's a NUG concept. However, it's quite likely that people read CF without having read the NUG; therefore it would be helpful not to depend on the NUG. As I said above, I think there are a few terms we could add to Sect 1.3 Terminology. One such term is "field", as defined and used by the CF data model, as an abstraction corresponding to data variable + metadata in CF-netCDF. The CF document also uses the word "field" with the sense it has in physical science. That's what it means at the very first use of the word, in Sect 1.1, which Patrick quotes. For this meaning, wiktionary says, "A physical phenomenon (such as force, potential or fluid velocity) that pervades a region; a mathematical model of such a phenomenon that associates each point and time with a scalar, vector or tensor quantity." The word "field" is often used with this sense in atmosphere and ocean science. That's why it was chosen for the CF data model. Should this "ordinary" use of the word also be defined in the CF document?
-
Very much so -- yes -- we should define terms like this inline, for sure. As a rule, I'd like to see the CF docs move away from references that are "not inline", like the NUG, or outdated, like COARDS. That is, the CF docs should stand alone as much as possible, and be relevant on their own. Whether something is done because it's specified by the NUG, or because that's what COARDS does, isn't that important to current readers. Back to the topic at hand: sorry I don't have the time to go through the doc now for specifics, but I support: A) Defining all terms (like field) in the CF docs. That being said, I don't care too much exactly which way (of the ones proposed) the specific terms are defined.
-
This is new to me. I have never seen this term in any netCDF documentation. I was always under the assumption that it was specific to CF and that is certainly how it is being presented in the CF data model. The concept of "field" may be defined in terms of how it is used in the CF data model, but that leaves open the "ordinary" use that Jonathan mentioned, and my second level of abstraction. I think it would be useful to have a generic term for physical phenomena, processes, features, in short the thing that a
-
"Data variable" appears once in the NUG of 2008. Maybe we didn't adopt this term from there (as I had thought), but from COARDS, which has a section entitled "Data variable attributes". It doesn't define "data variable"; presumably they supposed its meaning to be obvious, in the context. It would be useful to include a definition of "data variable", I agree. For instance, a data variable is a netCDF variable which contains the data for a field (in the "ordinary" sense), and which may have coordinate variables or auxiliary coordinate variables to locate values of the field within the space, time or other dimensions of its domain. The purpose of all CF metadata is to describe the contents of data variables, except for some netCDF global attributes which refer to the netCDF file as a whole rather than the data variables in it. Patrick asks what property of a field a standard name describes. In Sect 3 we say it identifies the "physical quantity". I would say that "quantity" is another "ordinary" term of physical science. For this meaning of "quantity", wiktionary says, "Property of a phenomenon, body, or substance, where the property has a magnitude that can be expressed as number and a reference." In terms of CF, the number is what goes in the data variable's array, and the reference is given by its units.
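The reading of "quantity" as a number plus a reference can be sketched as follows. This is an illustration only: `DataVariable` and `describe_quantity` are hypothetical names, the values are made up, and the sketch uses plain Python rather than any netCDF library.

```python
from dataclasses import dataclass, field


@dataclass
class DataVariable:
    """Hypothetical stand-in for a CF data variable: an array plus attributes."""
    name: str
    dimensions: tuple
    values: list
    attributes: dict = field(default_factory=dict)


def describe_quantity(var, index):
    """Pair one array element (the number) with the units (the reference)."""
    number = var.values[index]
    reference = var.attributes.get("units", "(no units given)")
    kind = var.attributes.get("standard_name", "(no standard name)")
    return f"{kind} = {number} {reference}"


# Made-up example values, for illustration only.
tas = DataVariable(
    name="tas",
    dimensions=("time",),
    values=[271.3, 272.0],
    attributes={"standard_name": "air_temperature", "units": "K"},
)

summary = describe_quantity(tas, 0)
print(summary)
```

In this picture the standard name says which physical quantity is meant, the array holds the numbers, and the units attribute supplies the reference.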
-
Question
The conventions use the term “field” about 15 times, not counting “bit field” in section 3.5 and “field construct” in Appendix I, but it is not a defined term in section 1.3. So what is a field? From context, it is very close to a “data variable” (which is also lacking a definition in section 1.3), but from a data science perspective they are not the same. As a first attempt at relating the two terms more formally: a data variable is a CF construct to capture the data that embodies the field, including its ancillary data such as axes and attributes. In other words, the field is the physical phenomenon whose quantities and properties are captured in a data variable’s data and axes and attributes, respectively.
As things are now, I am concerned about the conflation of the two terms and their use. The opening paragraph from section 7.3:
The first line contains both terms “field” and “variable” without any obvious connection between the two (for lack of a proper definition of both, also noting that “variable” without any specification is ambiguous). Further on in the paragraph there are terms “data values”, “time means” and “time dimension variable”. We all know what it means when we read it, but it is not very precise or unambiguous. I’d rewrite this paragraph as follows (assuming that both “field” and “data variable” have been defined along the lines of the above; changes in bold):
There are 14 (or so) other references to "field" that might need similar sharpening. Is that something that we should take up?