first internal draft for the SIA -> DAP transition. Includes SIA2 upgrades #3

Bonnarel · 2023-05-17T22:12:11Z

This PR to show changes proposals for the transition between SIA2 to DAP
Includes also specific SIA2 changes as in https://github.com/ivoa-std/SIA/ , PR #23
Doesn't include #PR 22 yet

Corresponding internal draft is here :

https://wiki.ivoa.net/internal/IVOA/SIAP-2_0-Next/DAP.pdf

…rades

gpdf · 2024-05-21T18:02:17Z

For future editing / reviewing clarity, I'd like to suggest that we get the existing approved errata to SIAv2 merged into main for DAP as separate commits, and then rebase this PR accordingly.

changing the date

Adding Gregory Dubois-Felsmann as author and editor

gmantele · 2024-10-10T14:00:38Z

What is the status of this PullRequest? It seems to be the basis of any editing to be started in DAP.
Are all editors OK with it? If yes, should it approved and merged?

pdowler · 2024-10-10T15:43:10Z

I will try to look at the PR today. I just finished work on a prototype DAP implementation that I'll share shortly as well. -- Patrick Dowler Canadian Astronomy Data Centre Victoria, BC, Canada

…

On Thu, 10 Oct 2024 at 07:01, Grégory Mantelet ***@***.***> wrote: What is the status of this PullRequest? It seems to be the basis of any editing to be started in DAP. Are all editors OK with it? If yes, should it approved and merged? — Reply to this email directly, view it on GitHub <#3 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADIEM7X2FHMHEOPNOLJSOIDZ22CB5AVCNFSM6AAAAAAYFV6QQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMBVGE3TSNZTGI> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

pdowler

There are a lot of very long lines in the original so the diffs are somewhat hard to evaluate. Fixing that will be impossible to evaluate so I will do that in a separate PR that only fixes line folding.

pdowler · 2024-10-10T16:47:28Z

DAP.tex

@@ -328,6 +365,9 @@ \subsubsection{POL}

 The POL parameter constrains values of the pol\_states column of the ObsCore data model; possible values for the POL parameter are also defined by ObsCore.

+This parameter is case insensitive.


Not sure that this really helps anyone. The set of allowed pol states is well known/documented and uses specific case (upper case). This probably just means clients can be careless, services have to convert the case of values, and there is another way for people to get things wrong.

pdowler · 2024-10-10T16:57:29Z

DAP.tex

-The ID parameter is a string-valued parameter that specifies the identifier of dataset(s). Values of the ID parameter are compared to the obs\_publisher\_did column of the ObsCore data model. Note that IVOIDs MUST be compared case-insensitively. As publisher dataset identifiers in the VO generally are IVOIDs, implementations will usually have to use case-insensitive comparisons here.
+
+The ID parameter is a string-valued parameter that specifies the identifier of dataset(s). Values of the ID parameter are compared to the obs\_publisher\_did column of the ObsCore data model. Note that IVOIDs MUST be compared case-insensitively. As publisher dataset identifiers in the VO generally are IVOIDs, implementations will usually have to use case-insensitive comparisons here. When wildcarding of the end of the  ID is needed the expression "extensionof ivo://bla" SHOULD be used.
+


I believe, from a quick, search of the doc, that this is the only wildcard searching. I think this is something where we want to discuss a more general solution.

Here I also don't like case-insensitive but that's not entirely in our control. What if an implementor has some other URI scheme (not ivo) -- are those also case-insensitive or just if the param value is an ivo URI?

I had some doubts : should we add it also for COLLECTION/INSTRUMENT/FACILITY ?

For the solution, the classical "*" seemed to introduce encoding issues. So I imagined this dirty fix. Ready to reconsider that.

pdowler · 2024-10-10T17:01:53Z

DAP.tex

+
+
+The DPTYPE parameter is a string-valued parameter that specifies the type of data. This parameter is case-insensitive. The value is compared with the dataproduct\_type from the ObsCore data model.  In contrast to SIA2.0 which allows only dataproduct\_types  \textit{image} and \textit{cube}, all the other dataproduct\_types are allowed, in such a way that we can constrain the service to retrieve visibility data, timeseries and event lists. Using DPTYPE with the "spectrum" value can be considered as an upgrade of the SSA protocol.
+


Since the accepted values come from ObsCore (or the DataProductType vocabulary) I don't really see what case-insensitive adds here. It's harder to implement and will likely lead to inconsistent implementations. It will get even worse if/when the vocabulary of terms has mixed case terms in it.

pdowler · 2024-10-10T17:08:27Z

DAP.tex

 The RELEASEDATE parameter specifies the range of release dates to be searched for data. 
 The limits are compared to the obs\_release\_date optional attribute of the ObsCore data model. 
 They are expressed as 2 ISO 8601 dates in the general case. A single value is searched for an exact match.
 As the obs\_release\_date attribute is optional, the service self description (\ref{sec:selfdesc})  informs the user of the availability of this parameter.
 RELEASEDATE queries for services not providing the release\_date attribute SHOULD provide an empty response.

+
+
+\subsubsection{RETRIEVEMODE (2 solutions)}


It would have been better to keep the SIA->DAP transition changes separate from upgrades like this. The main issue I see is that implementation is going to be complex for anyone who relies on DAP -> DataLink to handle 1 product to N files cardinality... if that leads to "only some simple DAP services implement this" then that will be bad.

Prefer: extract this from PR and create an issue to discuss further.

I dont' think it's the only change independant from SIA-> DAP transition. For example the introduction of RELEASEDATe and MOCs are others.

About the discussion on this point I hope to load today on the appropriate the Sydney discussion as I said last friday.

I used an audio transcription on line facility and revisited it to assign to participants their names. Its' there https://wiki.ivoa.net/internal/IVOA/InterOpMay2024DAL/DAL-2-notes or on the DAL sessions page of the last interop. Discussion was very rich. I see 4 points.

initial controversy : Direct access to cutout using cutout (SODA) full URL in access_url or SODA service descriptors.

if cutout direct computation is chosen in combination with DataLink, is "this" item the cutout or the whole image ?

can we reintroduce with that change the different "INTERSECT" modes we had in SIAV1 instead of the unique OVERLAP mode. Specially the COVERS mode where the full ROI is included in the image.

When we force one output format for the datasets with the FORMAT= parameter how can this appear in combination with DataLink usage because access_format will contain "datalink" and not the format which has been chosen.

I personnaly think for the first controversy there is none : we can have the citout retrievmode forcing a soda access_url and the SODA service descriptor in the response for intercation with the service at the same time.

For the three other items I am going to write issues

pdowler · 2024-10-10T17:10:21Z

DAP.tex

+
+
+ The DALI UPLOAD parameter is not used by this version of DAP. The use case of uploading lists of coordinates is covered by the multiple-valued parameters values.
+


issue created to discuss UPLOAD

gmantele · 2024-10-11T07:57:15Z

Thanks Pat for looking into this 🙂

I will try to look at the PR today. I just finished work on a prototype DAP implementation that I'll share shortly as well. -- Patrick Dowler Canadian Astronomy Data Centre Victoria, BC, Canada

msdemlei · 2024-10-11T08:14:58Z

I totally agree that we should not introduce more case-insensitivity than we are already afflicted with: It tends to extend to large amounts of pain without really helping anyone. If you are not convinced, see the hoops RegTAP has to jump through. And it's infectious, as evinced by:

On Thu, Oct 10, 2024 at 10:13:09AM -0700, Patrick Dowler wrote: > to the obs\_publisher\_did column of the ObsCore data model. Note > that IVOIDs MUST be compared case-insensitively. As publisher > dataset identifiers in the VO generally are IVOIDs, implementations > will usually have to use case-insensitive comparisons here. When Here I also don't like case-insensitive but that's not entirely in our control. What if an implementor has some other URI scheme (not `ivo`) -- are those also case-insensitive or just if the param value is an `ivo` URI?

URI comparisons are a can of worms, as the URI specification itself mandates a few parts to be case-insensitive (the scheme and the authority ("host" by RFC 3986), and %-escapes on top). Me, I'd say we should side-step all of that and say "on ingestion, publisher DIDs must be lowercased, while ID is compared case-sensitively; this means that in DAP, pubDIDs differing only in case are impossible and that ID parameters containing uppercase letters will never match. Implementations are advised to raise warnings when uppercase letters are used in values passed into ID." Even that has a lot of holes. For one, we still don't have an interoperable mechanism to communicate warnings (I've lobbied for that for more than a decade, and perhaps this is the time to tackle that). And then lowercasing arguably *should* happen before percent-encoding, such that a %41 would become into a %61. I'd say, however, that whoever percent-encodes ASCII or uses case-sensitive non-ASCII in ivoids richly deserves interoperabilty problems, and there is little we can do to help them. Take this as another instance of: "case insensitive" is written easily but almost impossible to do.

They are expressed as 2 ISO 8601 dates in the general case. A single value is searched for an exact match.

While I'm writing: Let's reference DALI for the time format instead of ISO 8601, which is another extremely painful standard (that's not even publicly available).

+\subsubsection{RETRIEVEMODE (2 solutions)} **Prefer:** extract this from PR and create an issue to discuss further.

+1

gmantele · 2024-10-11T09:06:58Z

About case insensitive parameters

DALI-1.1 establishes clearly that parameter names are case-insensitive. For the parameter values are case sensitive, unless the contrary is specified in the document.

That being said, one of the first issue raised by the P3T is the case-insensitive parameters. The P3T argues all parameters should be case-sensitive, in order to avoid ambiguous behaviors and errors on both sides (client and server).

Personally, I also think that all parameter names should be case-sensitive. For the parameters values, this is up to the standard defining them, but the general rule probably tends toward case-sensitive as well.

About IVO IDs

If I read IVOA Identifiers 2.0:

In addition to scheme and authority as in RFC 3986, in IVOIDs the resource key is also compared case-insensitively. This means that Registry references can be case-folded for processing.
[...]
For instance, given the IVOID
ivo://example.com/res/key1?par=U%20Pic#Part1,
the IVOID
IVO://EXAMPLE.COM/RES/KEY1?par=U%20Pic#Part1
must compare equal

According to me, the recommandation for standard editors should be to write the parts of IVOIDs being compared case-insensitively (e.g. scheme, host) in lower-case. Parameters of course should be written with the appropriate case-folding (as said above). Implementers should be encouraged to do so as well.

gmantele · 2024-10-11T09:08:38Z

About DALI saying that parameters are case-INsensitive, I kind of remember that the plan of the P3T is to make this statement change: parameters should be case-sensitive. But let's see how hard it will be to change this major point in DALI (2.0?).

Bonnarel · 2024-10-14T13:30:59Z

About introducing case insensitive Parameter values.
Before P3T, the initial demand came from Adrian Damian (see my slides at May 2020 Interop : https://wiki.ivoa.net/internal/IVOA/InterOpMay2020DAL/SIA2-SODA-nextPyVO-support.pdf and discussion https://wiki.ivoa.net/internal/IVOA/InterOpMay2020DAL/IVOA_Virtual2020_sia2sodapy.txt)

Moreover I think PyVO or Python programmers were complaining about that

For me case sensitiveness was natural, but facing the demand I tried to gigure out where it could be avoided

This is the list of string valued parameters

MOC --> I think the ascii syntax is case sensitive
POL --> Case sensitive was not a must because there seems to be no ambiguity.
ID --> I agree with the comments above
COLLECTION , FACILITY and INSTRUMENT : It seemed diificult to allow case insensitive because these names are strongly dependant from data providers and capital or small letters may be important to them.
DPTYPE : seemed to be possible to relax the case sensitive constraint
TARGET : absolutly impossible to relax there.
FORMAT : seemed to be possible to relax there
RELEASEDATE : Is ISO8601 string. I don't know if capital letters for T and Z are mandatory.

In conclusion, I don't see any problem to force everything to be case sensitive except the ID (for reasons already explained by you above) but this should be rediscussed with PyVO and P3T people.

pdowler · 2024-10-15T17:22:29Z

For the ID param, I think advice to use lower case for the scheme and authority is the best way to reduce pain in the long term. I could/would certainly enforce that in CADC software. After that, I don't think we can/should do any conversion of the path, query, etc. because those are often strings that need to be consistent with other values in use and we can't foresee the consequences. For CADC, the name of the collection and the observations within it are not really in our control, although we can/do give data providers advice on how to avoid pain.

In addition, wouldn't the use case of ID parameter to be to assign a value you found somewhere else? In that case, you just use the value as you found it and it should work... that has been my experience anyway.

pdowler · 2024-10-15T17:29:23Z

For any string params where values are a vocabulary, being case insensitive and letting callers ignore the known vocabulary doesn't buy us any value, so those are pretty easy.

COLLECTION , FACILITY and INSTRUMENT : the service self-descriptor can list the finite set of local values and callers can use them as is, so case-insenstive doesn't seem as valuable as self-descriptor. In addition, there is work on a facility vocabulary and maybe we'd want to accept those values... but I foresee some pain if the facility vocab uses a form that doesn't match what the data provider uses internally (to be discussed in the context of that vocabulary).

TARGET : this is a tough one because self-descriptor isn't really viable for listing all acceptable values, data providers aren't very consistent, and lots of objects have multiple names. I don't know where responsibility for making this work best for users really belongs...

msdemlei · 2024-10-16T08:11:53Z

On Tue, Oct 15, 2024 at 10:22:50AM -0700, Patrick Dowler wrote: For the ID param, I think advice to use lower case for the scheme and authority is the best way to reduce pain in the long term. I

"Advice" probably is not strong enough if we want this to work without requiring manipulation on incoming IDs (or case-insensitive matching, or ignoring URI rules). I guess DAP will have to say something like: "Clients MUST lowercase schema and authority before passing IDs to services, and services MUST lowercase schema and authority before ingestion" or so. Yes, this means URI parsing, but at least there's no need for URI parsing in the services at query time.

could/would certainly enforce that in CADC software. After that, I don't think we can/should do any conversion of the path, query, etc. because those are often strings that need to be consistent with other values in use and we can't foresee the consequences. For

I won't belabour that point, but as a matter of experience I can say that whenever I did not treat the identifiers as opaque strings that only have meaning as primary keys into some table, I've ended up regretting it. Sure, it can save you a database query or two, but there *is* a price tag on peeking inside of them. Hence, I'd still prefer if we lowercased the whole thing and left translation to whatever you would otherwise parse out of the identifier to database tables, but I can live with just normalising the parts that RFC 3986 forces us to normalise, too.

In addition, wouldn't the use case of ID parameter to be to assign a value you found somewhere else? In that case, you just use the value as you found it and it should work... that has been my experience anyway.

True. But then let's be defensive, because some component might unwisely do URI "normalisation", where we can't expect global consensus on what "normal" might be. Clear rules for how ID is supposed to work will help guide bother clients and servers to not exploit the flexibility that generic URIs have. Oh, talking about normalisation: Do we want to specify that path components like /./ or foo/../bar are forbidden? Or do we just say "No path normalisations are admitted"?

gmantele · 2024-10-17T12:16:49Z

Oh, talking about normalisation: Do we want to specify that path components like /./ or foo/../bar are forbidden? Or do we just say "No path normalisations are admitted"?

I think there is nothing good to have such path components. If you say, as registry expert, that it is already not that easy to deal with IVOIDs, then, I would say that complicating them even more does not sound like a good idea. So, my opinion is that they should be forbidden.

If we have to say that somewhere, where should it be? My guess would be in the IVOA Identifiers document.

msdemlei · 2024-10-17T13:00:31Z

On Thu, Oct 17, 2024 at 05:17:12AM -0700, Grégory Mantelet wrote: If we have to say that somewhere, where should it be? My guess would be in the IVOA Identifiers document.

Oh... good that you mention it. The author of Identifiers 2 clearly had a bit more brain than me and already wrote in https://ivoa.net/documents/IVOAIdentifiers/20160523/REC-Identifiers-2.0.html#tth_sEc2.6: As no hierarchy is implied in any IVOID part, no path segment normalization is ever performed on IVOIDs. Similarly, my old nemesis, the percent-encoding is also treated there. So, I suppose it's enough to reference Identifiers and mandate lowercasing of schema, authority, and path; the latter is because I back then had to be backwards-compatible and require: In addition to scheme and authority as in RFC 3986, in IVOIDs the resource key is also compared case-insensitively. This means that Registry references can be case-folded for processing. But I claim that shouldn't be a problem; potentially case-sensitive file paths would only turn up in the query part of the PubDID (or CreatorDID, for that matter). The rest just identifies the originating resource, and there, DAP cannot mandate case-sensitiveness. On the other hand, that's cool, because DAP id normalisation then is a simple string operation: id, query = pubdid.split("?", 1) normalised = "{}?{}".format(id.lower, query)

…

-- Markus

Bonnarel · 2024-10-24T17:01:03Z

I created another Pull request (#12) on the main in order to follow Gregory DF suggestion to start DAP discussion on top of SIA version fixed by errata. I also added him as Author/editor and upgraded the title
Pat can you review and merge that one before we go on, on the main one ?
I plan to make some of the changes asked in the last two weeks today.

Cutting up long lines and commenting RETRIEVEMODE

date

Bonnarel · 2024-11-08T21:05:13Z

As I wrote 2 weeks ago I created another pull request #12 which is SIA2 + mofied title and authors/editors + 2 SIA errata.
It is the required ground level to figure out the changes we discuss to move to DAP.
Pat , please can you merge that #12 one ?

By the way the CI exceution are now failing. Some library changed and makes the old CI Obsolete. Some you probably know what to do to solve this, because it surely happened for other specs.

For the current PR, I made the following changes in my fork :

+  Cut the lines in the latex source  which were too long
+ Comment out the RETRIEVEMODE section in the source. I didn't remove it because, contrary to Markus I'm not confident that going back to a previous githuIb version will be done. I did in such a way that we can still discuss and decomment / modify to go to the final solution later
 + For FORMAT I made a more self-consistent change but did not remove it. Fill free to do it if this is the final decision
 + for case sensitiveness, I don't care. Fill free to do what is consistent with DALI, PyVO and P3T

I wish you all good ADASS and IVOA meetings

BONNAREL FRANCOIS added 2 commits May 18, 2023 00:03

first internal draft for the SIA -> DAP transition. Includes SIA2 upg…

58bff03

…rades

changing the Makefile

cf8ee7d

This was referenced May 21, 2024

extension of SIA-style protocol usage outside the image/cube "camp" ivoa-std/SIA#10

Open

2023 and before 1 changes ivoa-std/SIA#23

Draft

Bonnarel added 2 commits May 22, 2024 21:36

Update Makefile

d85bbc0

changing the date

Update DAP.tex

6add96d

Adding Gregory Dubois-Felsmann as author and editor

pdowler requested changes Oct 10, 2024

View reviewed changes

Bonnarel added 4 commits October 28, 2024 21:29

Update DAP.tex

e953aa2

Update DAP.tex

281d48c

Cutting up long lines and commenting RETRIEVEMODE

Changing FORMAT parameter text

edb2d6a

Update Makefile

f1bb805

date

		@@ -328,6 +365,9 @@ \subsubsection{POL}

		The POL parameter constrains values of the pol\_states column of the ObsCore data model; possible values for the POL parameter are also defined by ObsCore.

		This parameter is case insensitive.

		The ID parameter is a string-valued parameter that specifies the identifier of dataset(s). Values of the ID parameter are compared to the obs\_publisher\_did column of the ObsCore data model. Note that IVOIDs MUST be compared case-insensitively. As publisher dataset identifiers in the VO generally are IVOIDs, implementations will usually have to use case-insensitive comparisons here.

		The ID parameter is a string-valued parameter that specifies the identifier of dataset(s). Values of the ID parameter are compared to the obs\_publisher\_did column of the ObsCore data model. Note that IVOIDs MUST be compared case-insensitively. As publisher dataset identifiers in the VO generally are IVOIDs, implementations will usually have to use case-insensitive comparisons here. When wildcarding of the end of the ID is needed the expression "extensionof ivo://bla" SHOULD be used.



		The DPTYPE parameter is a string-valued parameter that specifies the type of data. This parameter is case-insensitive. The value is compared with the dataproduct\_type from the ObsCore data model. In contrast to SIA2.0 which allows only dataproduct\_types \textit{image} and \textit{cube}, all the other dataproduct\_types are allowed, in such a way that we can constrain the service to retrieve visibility data, timeseries and event lists. Using DPTYPE with the "spectrum" value can be considered as an upgrade of the SSA protocol.



		The DALI UPLOAD parameter is not used by this version of DAP. The use case of uploading lists of coordinates is covered by the multiple-valued parameters values.

first internal draft for the SIA -> DAP transition. Includes SIA2 upgrades #3

Are you sure you want to change the base?

first internal draft for the SIA -> DAP transition. Includes SIA2 upgrades #3

Uh oh!

Conversation

Bonnarel commented May 17, 2023

Uh oh!

gpdf commented May 21, 2024

Uh oh!

gmantele commented Oct 10, 2024

Uh oh!

pdowler commented Oct 10, 2024 via email

Uh oh!

pdowler left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gmantele commented Oct 11, 2024

Uh oh!

msdemlei commented Oct 11, 2024 via email

Uh oh!

gmantele commented Oct 11, 2024

Uh oh!

gmantele commented Oct 11, 2024

Uh oh!

Bonnarel commented Oct 14, 2024

Uh oh!

pdowler commented Oct 15, 2024

Uh oh!

pdowler commented Oct 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

msdemlei commented Oct 16, 2024 via email

Uh oh!

gmantele commented Oct 17, 2024

Uh oh!

msdemlei commented Oct 17, 2024 via email

Uh oh!

Bonnarel commented Oct 24, 2024

Uh oh!

Bonnarel commented Nov 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

pdowler commented Oct 15, 2024 •

edited

Loading

Bonnarel commented Nov 8, 2024 •

edited

Loading