-
Notifications
You must be signed in to change notification settings - Fork 4
first internal draft for the SIA -> DAP transition. Includes SIA2 upgrades #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
For future editing / reviewing clarity, I'd like to suggest that we get the existing approved errata to SIAv2 merged into |
changing the date
Adding Gregory Dubois-Felsmann as author and editor
What is the status of this PullRequest? It seems to be the basis of any editing to be started in DAP. |
I will try to look at the PR today. I just finished work on a prototype DAP
implementation that I'll share shortly as well.
--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada
…On Thu, 10 Oct 2024 at 07:01, Grégory Mantelet ***@***.***> wrote:
What is the status of this PullRequest? It seems to be the basis of any
editing to be started in DAP.
Are all editors OK with it? If yes, should it approved and merged?
—
Reply to this email directly, view it on GitHub
<#3 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADIEM7X2FHMHEOPNOLJSOIDZ22CB5AVCNFSM6AAAAAAYFV6QQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMBVGE3TSNZTGI>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of very long lines in the original so the diffs are somewhat hard to evaluate. Fixing that will be impossible to evaluate so I will do that in a separate PR that only fixes line folding.
@@ -328,6 +365,9 @@ \subsubsection{POL} | |||
|
|||
The POL parameter constrains values of the pol\_states column of the ObsCore data model; possible values for the POL parameter are also defined by ObsCore. | |||
|
|||
This parameter is case insensitive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that this really helps anyone. The set of allowed pol states is well known/documented and uses specific case (upper case). This probably just means clients can be careless, services have to convert the case of values, and there is another way for people to get things wrong.
The ID parameter is a string-valued parameter that specifies the identifier of dataset(s). Values of the ID parameter are compared to the obs\_publisher\_did column of the ObsCore data model. Note that IVOIDs MUST be compared case-insensitively. As publisher dataset identifiers in the VO generally are IVOIDs, implementations will usually have to use case-insensitive comparisons here. | ||
|
||
The ID parameter is a string-valued parameter that specifies the identifier of dataset(s). Values of the ID parameter are compared to the obs\_publisher\_did column of the ObsCore data model. Note that IVOIDs MUST be compared case-insensitively. As publisher dataset identifiers in the VO generally are IVOIDs, implementations will usually have to use case-insensitive comparisons here. When wildcarding of the end of the ID is needed the expression "extensionof ivo://bla" SHOULD be used. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe, from a quick, search of the doc, that this is the only wildcard searching. I think this is something where we want to discuss a more general solution.
Here I also don't like case-insensitive but that's not entirely in our control. What if an implementor has some other URI scheme (not ivo
) -- are those also case-insensitive or just if the param value is an ivo
URI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some doubts : should we add it also for COLLECTION/INSTRUMENT/FACILITY ?
For the solution, the classical "*" seemed to introduce encoding issues. So I imagined this dirty fix. Ready to reconsider that.
|
||
|
||
The DPTYPE parameter is a string-valued parameter that specifies the type of data. This parameter is case-insensitive. The value is compared with the dataproduct\_type from the ObsCore data model. In contrast to SIA2.0 which allows only dataproduct\_types \textit{image} and \textit{cube}, all the other dataproduct\_types are allowed, in such a way that we can constrain the service to retrieve visibility data, timeseries and event lists. Using DPTYPE with the "spectrum" value can be considered as an upgrade of the SSA protocol. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the accepted values come from ObsCore (or the DataProductType vocabulary) I don't really see what case-insensitive adds here. It's harder to implement and will likely lead to inconsistent implementations. It will get even worse if/when the vocabulary of terms has mixed case terms in it.
DAP.tex
Outdated
The RELEASEDATE parameter specifies the range of release dates to be searched for data. | ||
The limits are compared to the obs\_release\_date optional attribute of the ObsCore data model. | ||
They are expressed as 2 ISO 8601 dates in the general case. A single value is searched for an exact match. | ||
As the obs\_release\_date attribute is optional, the service self description (\ref{sec:selfdesc}) informs the user of the availability of this parameter. | ||
RELEASEDATE queries for services not providing the release\_date attribute SHOULD provide an empty response. | ||
|
||
|
||
|
||
\subsubsection{RETRIEVEMODE (2 solutions)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been better to keep the SIA->DAP transition changes separate from upgrades like this. The main issue I see is that implementation is going to be complex for anyone who relies on DAP -> DataLink to handle 1 product to N files cardinality... if that leads to "only some simple DAP services implement this" then that will be bad.
Prefer: extract this from PR and create an issue to discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont' think it's the only change independant from SIA-> DAP transition. For example the introduction of RELEASEDATe and MOCs are others.
About the discussion on this point I hope to load today on the appropriate the Sydney discussion as I said last friday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used an audio transcription on line facility and revisited it to assign to participants their names. Its' there https://wiki.ivoa.net/internal/IVOA/InterOpMay2024DAL/DAL-2-notes or on the DAL sessions page of the last interop. Discussion was very rich. I see 4 points.
- initial controversy : Direct access to cutout using cutout (SODA) full URL in access_url or SODA service descriptors.
- if cutout direct computation is chosen in combination with DataLink, is "this" item the cutout or the whole image ?
- can we reintroduce with that change the different "INTERSECT" modes we had in SIAV1 instead of the unique OVERLAP mode. Specially the COVERS mode where the full ROI is included in the image.
- When we force one output format for the datasets with the FORMAT= parameter how can this appear in combination with DataLink usage because access_format will contain "datalink" and not the format which has been chosen.
I personnaly think for the first controversy there is none : we can have the citout retrievmode forcing a soda access_url and the SODA service descriptor in the response for intercation with the service at the same time.
For the three other items I am going to write issues
|
||
|
||
The DALI UPLOAD parameter is not used by this version of DAP. The use case of uploading lists of coordinates is covered by the multiple-valued parameters values. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue created to discuss UPLOAD
Thanks Pat for looking into this 🙂
|
I totally agree that we should not introduce more case-insensitivity
than we are already afflicted with: It tends to extend to large
amounts of pain without really helping anyone. If you are not
convinced, see the hoops RegTAP has to jump through.
And it's infectious, as evinced by:
On Thu, Oct 10, 2024 at 10:13:09AM -0700, Patrick Dowler wrote:
> to the obs\_publisher\_did column of the ObsCore data model. Note
> that IVOIDs MUST be compared case-insensitively. As publisher
> dataset identifiers in the VO generally are IVOIDs, implementations
> will usually have to use case-insensitive comparisons here. When
Here I also don't like case-insensitive but that's not entirely in
our control. What if an implementor has some other URI scheme (not
`ivo`) -- are those also case-insensitive or just if the param
value is an `ivo` URI?
URI comparisons are a can of worms, as the URI specification itself
mandates a few parts to be case-insensitive (the scheme and the
authority ("host" by RFC 3986), and %-escapes on top).
Me, I'd say we should side-step all of that and say "on ingestion,
publisher DIDs must be lowercased, while ID is compared
case-sensitively; this means that in DAP, pubDIDs differing only in
case are impossible and that ID parameters containing uppercase
letters will never match. Implementations are advised to raise
warnings when uppercase letters are used in values passed into ID."
Even that has a lot of holes. For one, we still don't have an
interoperable mechanism to communicate warnings (I've lobbied for
that for more than a decade, and perhaps this is the time to tackle
that). And then lowercasing arguably *should* happen before
percent-encoding, such that a %41 would become into a %61. I'd say,
however, that whoever percent-encodes ASCII or uses case-sensitive
non-ASCII in ivoids richly deserves interoperabilty problems, and
there is little we can do to help them.
Take this as another instance of: "case insensitive" is written
easily but almost impossible to do.
They are expressed as 2 ISO 8601 dates in the general case. A
single value is searched for an exact match.
While I'm writing: Let's reference DALI for the time format instead of
ISO 8601, which is another extremely painful standard (that's not
even publicly available).
+\subsubsection{RETRIEVEMODE (2 solutions)}
**Prefer:** extract this from PR and create an issue to discuss further.
+1
|
About case insensitive parameters DALI-1.1 establishes clearly that parameter names are case-insensitive. For the parameter values are case sensitive, unless the contrary is specified in the document. That being said, one of the first issue raised by the P3T is the case-insensitive parameters. The P3T argues all parameters should be case-sensitive, in order to avoid ambiguous behaviors and errors on both sides (client and server). Personally, I also think that all parameter names should be case-sensitive. For the parameters values, this is up to the standard defining them, but the general rule probably tends toward case-sensitive as well. About IVO IDs If I read IVOA Identifiers 2.0:
According to me, the recommandation for standard editors should be to write the parts of IVOIDs being compared case-insensitively (e.g. scheme, host) in lower-case. Parameters of course should be written with the appropriate case-folding (as said above). Implementers should be encouraged to do so as well. |
About DALI saying that parameters are case-INsensitive, I kind of remember that the plan of the P3T is to make this statement change: parameters should be case-sensitive. But let's see how hard it will be to change this major point in DALI (2.0?). |
About introducing case insensitive Parameter values. Moreover I think PyVO or Python programmers were complaining about that For me case sensitiveness was natural, but facing the demand I tried to gigure out where it could be avoided This is the list of string valued parameters MOC --> I think the ascii syntax is case sensitive In conclusion, I don't see any problem to force everything to be case sensitive except the ID (for reasons already explained by you above) but this should be rediscussed with PyVO and P3T people. |
For the ID param, I think advice to use lower case for the scheme and authority is the best way to reduce pain in the long term. I could/would certainly enforce that in CADC software. After that, I don't think we can/should do any conversion of the path, query, etc. because those are often strings that need to be consistent with other values in use and we can't foresee the consequences. For CADC, the name of the collection and the observations within it are not really in our control, although we can/do give data providers advice on how to avoid pain. In addition, wouldn't the use case of ID parameter to be to assign a value you found somewhere else? In that case, you just use the value as you found it and it should work... that has been my experience anyway. |
For any string params where values are a vocabulary, being case insensitive and letting callers ignore the known vocabulary doesn't buy us any value, so those are pretty easy. COLLECTION , FACILITY and INSTRUMENT : the service self-descriptor can list the finite set of local values and callers can use them as is, so case-insenstive doesn't seem as valuable as self-descriptor. In addition, there is work on a facility vocabulary and maybe we'd want to accept those values... but I foresee some pain if the facility vocab uses a form that doesn't match what the data provider uses internally (to be discussed in the context of that vocabulary). TARGET : this is a tough one because self-descriptor isn't really viable for listing all acceptable values, data providers aren't very consistent, and lots of objects have multiple names. I don't know where responsibility for making this work best for users really belongs... |
On Tue, Oct 15, 2024 at 10:22:50AM -0700, Patrick Dowler wrote:
For the ID param, I think advice to use lower case for the scheme
and authority is the best way to reduce pain in the long term. I
"Advice" probably is not strong enough if we want this to work
without requiring manipulation on incoming IDs (or case-insensitive
matching, or ignoring URI rules). I guess DAP will have to say
something like: "Clients MUST lowercase schema and authority before
passing IDs to services, and services MUST lowercase schema and
authority before ingestion" or so. Yes, this means URI parsing, but
at least there's no need for URI parsing in the services at query
time.
could/would certainly enforce that in CADC software. After that, I
don't think we can/should do any conversion of the path, query,
etc. because those are often strings that need to be consistent
with other values in use and we can't foresee the consequences. For
I won't belabour that point, but as a matter of experience I can say
that whenever I did not treat the identifiers as opaque strings that
only have meaning as primary keys into some table, I've ended up
regretting it. Sure, it can save you a database query or two, but
there *is* a price tag on peeking inside of them.
Hence, I'd still prefer if we lowercased the whole thing and left
translation to whatever you would otherwise parse out of the
identifier to database tables, but I can live with just normalising
the parts that RFC 3986 forces us to normalise, too.
In addition, wouldn't the use case of ID parameter to be to assign
a value you found somewhere else? In that case, you just use the
value as you found it and it should work... that has been my
experience anyway.
True. But then let's be defensive, because some component might
unwisely do URI "normalisation", where we can't expect global
consensus on what "normal" might be. Clear rules for how ID is
supposed to work will help guide bother clients and servers to not
exploit the flexibility that generic URIs have.
Oh, talking about normalisation: Do we want to specify that path
components like /./ or foo/../bar are forbidden? Or do we just say
"No path normalisations are admitted"?
|
I think there is nothing good to have such path components. If you say, as registry expert, that it is already not that easy to deal with IVOIDs, then, I would say that complicating them even more does not sound like a good idea. So, my opinion is that they should be forbidden. If we have to say that somewhere, where should it be? My guess would be in the IVOA Identifiers document. |
On Thu, Oct 17, 2024 at 05:17:12AM -0700, Grégory Mantelet wrote:
If we have to say that somewhere, where should it be? My guess
would be in the IVOA Identifiers document.
Oh... good that you mention it. The author of Identifiers 2 clearly
had a bit more brain than me and already wrote in
https://ivoa.net/documents/IVOAIdentifiers/20160523/REC-Identifiers-2.0.html#tth_sEc2.6:
As no hierarchy is implied in any IVOID part, no path segment
normalization is ever performed on IVOIDs.
Similarly, my old nemesis, the percent-encoding is also treated
there. So, I suppose it's enough to reference Identifiers and
mandate lowercasing of schema, authority, and path; the latter is
because I back then had to be backwards-compatible and require:
In addition to scheme and authority as in RFC 3986, in IVOIDs the
resource key is also compared case-insensitively. This means that
Registry references can be case-folded for processing.
But I claim that shouldn't be a problem; potentially case-sensitive
file paths would only turn up in the query part of the PubDID (or
CreatorDID, for that matter). The rest just identifies the
originating resource, and there, DAP cannot mandate
case-sensitiveness.
On the other hand, that's cool, because DAP id normalisation then is
a simple string operation:
id, query = pubdid.split("?", 1)
normalised = "{}?{}".format(id.lower, query)
…-- Markus
|
I created another Pull request (#12) on the main in order to follow Gregory DF suggestion to start DAP discussion on top of SIA version fixed by errata. I also added him as Author/editor and upgraded the title |
Cutting up long lines and commenting RETRIEVEMODE
date
As I wrote 2 weeks ago I created another pull request #12 which is SIA2 + mofied title and authors/editors + 2 SIA errata. By the way the CI exceution are now failing. Some library changed and makes the old CI Obsolete. Some you probably know what to do to solve this, because it surely happened for other specs. For the current PR, I made the following changes in my fork :
I wish you all good ADASS and IVOA meetings |
This PR to show changes proposals for the transition between SIA2 to DAP
Includes also specific SIA2 changes as in https://github.com/ivoa-std/SIA/ , PR #23
Doesn't include #PR 22 yet
Corresponding internal draft is here :
https://wiki.ivoa.net/internal/IVOA/SIAP-2_0-Next/DAP.pdf