Description
Assume that TESK is deployed with a config file that sets the output endpoint to some S3 instance using an http or https URL:
[default]
endpoint_url=http://some.endpoint.com
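(For illustration only, a minimal sketch of how such an endpoint setting might be consumed, assuming the file is read with configparser and the value handed to a boto3 client; the file path and variable names here are hypothetical and the actual tesk-core code path may differ.)

import configparser
import boto3

config = configparser.ConfigParser()
config.read('/path/to/config')  # hypothetical location of the config file above

# "http://some.endpoint.com" from the [default] section
endpoint = config['default']['endpoint_url']

# Every S3 call is then pointed at that endpoint instead of AWS.
s3 = boto3.client('s3', endpoint_url=endpoint)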
Then, in the job JSON, the "url" for outputs is set to the following:
# This works!
"url": "s3://output",
The s3:// scheme means "output" gets treated as the bucket name.
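(For reference, this is simply how Python's urllib.parse splits that URL; with the s3:// form the whole name lands in the netloc:)

from urllib.parse import urlparse

print(urlparse("s3://output"))
# ParseResult(scheme='s3', netloc='output', path='', params='', query='', fragment='')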
# These all fail!
"url": "s3://outputs3",
"url": "s3://s3output",
# Here's a less contrived example to show how this can happen even when you don't intentionally use "s3" to mean S3
"url": "s3://shoulders3486output",
The s3:// scheme is detected, but because the bucket name also contains "s3", it falsely triggers this regex:
tesk-core/src/tesk_core/filer_s3.py (line 64 at 1a7b810):
match = re.search('^([^.]+).s3', self.netloc)
This mangles the bucket name, leading to a bucket-not-found error.
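(To make the mangling concrete, here is what that regex returns for the netlocs above; the bucket name is presumably taken from group(1) when the regex matches, so none of these come out right:)

import re

PATTERN = '^([^.]+).s3'  # same expression as in filer_s3.py

for netloc in ['output', 'outputs3', 's3output', 'shoulders3486output']:
    match = re.search(PATTERN, netloc)
    print(netloc, '->', match.group(1) if match else None)

# output -> None                   (no match, name left alone)
# outputs3 -> outpu                (mangled)
# s3output -> None                 (no match here; presumably this case fails elsewhere)
# shoulders3486output -> shoulde   (mangled)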
But we can trick it...
# This works!
"url": "http://s3.foo.bar.baz/shoulders3486output",
HTTP is detected as the scheme, but the netloc part of the URL contains "s3", so the transfer is treated as s3 due to this logic:
tesk-core/src/tesk_core/filer.py (lines 416 to 417 at 1a7b810):
if 's3' in netloc:
    return S3Transput
The bucket name is now part of the URL "path", not the URL "netloc", so it doesn't get mangled. The netloc part (s3.foo.bar.baz) is never actually used for anything other than detecting whether it's an s3 transfer or an http transfer.
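(Again just urllib.parse for reference: with the http form the bucket name sits in the path, "s3" appears only in the hostname, and the bucket-extraction regex above does not even match that netloc:)

import re
from urllib.parse import urlparse

u = urlparse("http://s3.foo.bar.baz/shoulders3486output")
print(u.scheme)  # http
print(u.netloc)  # s3.foo.bar.baz -> "'s3' in netloc" is True, so S3Transput is selected
print(u.path)    # /shoulders3486output -> the bucket name, untouched
print(re.search('^([^.]+).s3', u.netloc))  # None -> nothing for the regex to mangle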