Commit 5611c81

Merge pull request #8 from sat-utils/develop
deploy
2 parents 3fb49a9 + 2c69d00 commit 5611c81

6 files changed: 135 additions and 25 deletions

README.md

Lines changed: 73 additions & 1 deletion
@@ -1,6 +1,78 @@
 # sat-stac-sentinel
 
-This is a Python repository for the creation of a STAC Sentinel catalog using the data and metadata files from s3://sentinel2-l1c/
+This is a repository used for the creation and maintenance of a [STAC](https://github.com/radiantearth/stac-spec) compliant [Sentinel catalog](https://sentinel-stac.s3.amazonaws.com/catalog.json) for data from the [Sentinel on AWS project](https://registry.opendata.aws/sentinel-2/) (located at s3://sentinel-s2-l1c/).
+
+There are two pieces of this repository:
+
+- A Python library (satstac.sentinel) and CLI containing functions for reading Sentinel metadata, transforming to STAC Items, and adding to the Sentinel catalog.
+- An AWS Lambda handler that accepts an SNS message containing the s3 URL for a new Sentinel scene, transforms it, and adds it to the catalog.
+
+The sat-stac-sentinel CLI was used to create the initial catalog of historical data, located at https://sentinel-stac.s3.amazonaws.com/catalog.json. The deployed Lambda function keeps the catalog up to date with new scenes.
+
+## Installation
+
+
+
+## Usage
+
+A command line tool is available for ingesting the existing Sentinel data on s3 and creating/adding to a STAC catalog.
+
+```bash
+$ sat-stac-sentinel -h
+usage: sat-stac-sentinel [-h] {ingest,inventory} ...
+
+sat-stac-sentinel (v0.1.0)
+
+positional arguments:
+{ingest,inventory}
+ingest Ingest records into catalog
+inventory Get latest inventory of tileInfo.json files
+
+optional arguments:
+-h, --help show this help message and exit
+```
+
+There are two available commands:
+
+### `inventory`
+
+This will fetch the latest inventory files from s3://sentinel-inventory/sentinel-s2-l1c/sentinel-s2-l1c-inventory and save them to a local file. This step is optional, since records can be ingested directly from the latest inventory files, but saving the file first allows it to be split up and run as several jobs.
+
+### `ingest`
+
+This will ingest records either from a local inventory file or, if not provided, the latest bucket inventory files.
+
+
+The `catalog` argument is the URL to the root catalog which contains a child collection called 'sentinel-2-l1c'. If the 'sentinel-2-l1c' Collection does not exist in the Catalog it will be added. In the case of the catalog maintained by this repo it is located at https://sentinel-stac.s3.amazonaws.com/catalog.json.
+
+If `start` and/or `end` are provided, all records are scanned and only those within the date range are ingested.
+
+
+## Transforming Sentinel metadata to STAC
+
+The data that is ingested by the sat-stac-sentinel CLI starts with
+
+In addition to the inventories, an SNS message is published (arn:aws:sns:us-west-2:274514004127:NewSceneHTML) whenever a new `index.html` appears in the bucket. The sat-stac-sentinel Lambda function listens for this message to get the link of the s3 path with the new scene.
+
+
+
+
+## Development
+
+The `master` branch is the latest versioned release, while the `develop` branch is the latest development version. When making a new release:
+
+- Update the [version](satstac.sentinel.version.py)
+- Update [CHANGELOG.md](CHANGELOG.md)
+- Create PR and merge to master
+- Create new tag with the version and push to GitHub:
+
+```bash
+$ git tag `<version>`
+$ git push origin `<version>`
+```
+
+On a release (merge to `master`) CircleCI will package the Lambda code and deploy it to the production Lambda function that listens (via SNS) for new Sentinel scenes, creates STAC Items and adds them to the Catalog.
+
 
 ## About
 [sat-stac-sentinel](https://github.com/sat-utils/sat-stac-sentinel) was created by [Development Seed](<http://developmentseed.org>) and is part of a collection of tools called [sat-utils](https://github.com/sat-utils).
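
The README's `inventory` section mentions that a saved inventory file can be broken up and run as several jobs. The following is a minimal sketch of one way to do that split, assuming the `datetime,path` header row that the `inventory` command writes. `split_inventory` is a hypothetical helper and is not part of satstac.sentinel.

```python
import os


def split_inventory(filename, n_parts, outdir):
    """Split an inventory CSV into at most n_parts files, each with the header row."""
    with open(filename) as f:
        header = f.readline()
        rows = f.readlines()
    # ceiling division so every row lands in exactly one chunk
    size = max(1, -(-len(rows) // n_parts))
    outputs = []
    for n, start in enumerate(range(0, len(rows), size)):
        fout = os.path.join(outdir, 'part-%03d.csv' % n)
        with open(fout, 'w') as f:
            f.write(header)
            f.writelines(rows[start:start + size])
        outputs.append(fout)
    return outputs
```

Each resulting part could then be handed to a separate `ingest` job through its `--filename` option.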

lambda/lambda_function.py

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ def lambda_handler(event, context):
     # transform to STAC
     item = transform(metadata)
     logger.info('Item: %s' % json.dumps(item.data))
-    collection.add_item(item, path=SETTINGS['path_pattern'], filename=SETTINGS['fname_pattern'])
-    logger.info('Added %s as %s' % (item, item.filename))
+    #collection.add_item(item, path=SETTINGS['path_pattern'], filename=SETTINGS['fname_pattern'])
+    #logger.info('Added %s as %s' % (item, item.filename))
     client.publish(TopicArn=sns_arn, Message=json.dumps(item.data))
     logger.info('Published to %s' % sns_arn)
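
For context on the handler's input: a Lambda function subscribed to an SNS topic receives each notification wrapped in an event with a `Records` list, with the payload stored as a string under `Sns.Message`. The sketch below unpacks that envelope. The shape of the message body published by the NewSceneHTML topic is not shown in this commit, so treating it as JSON is an assumption, and `sns_messages` is a hypothetical helper rather than code from this repository.

```python
import json


def sns_messages(event):
    """Yield the decoded payload of every SNS record in a Lambda event."""
    for record in event.get('Records', []):
        body = record['Sns']['Message']
        try:
            # assumption: the publisher sends a JSON document
            yield json.loads(body)
        except ValueError:
            # not JSON: hand back the raw string unchanged
            yield body
```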

satstac/sentinel/cli.py

Lines changed: 17 additions & 3 deletions
@@ -35,11 +35,16 @@ def parse_args(args):
     valid_date = lambda d: datetime.strptime(d, '%Y-%m-%d').date()
     parser.add_argument('--start', help='Start date of ingestion', default=None, type=valid_date)
     parser.add_argument('--end', help='End date of ingestion', default=None, type=valid_date)
+    parser.add_argument('--prefix', help='Only ingest scenes with a path starting with prefix', default=None)
     parser.add_argument('--s3meta', help='Get metadata directly from S3 (requestor pays)', default=False, action='store_true')
+    parser.add_argument('--filename', help='Inventory filename to use (default to fetch latest from bucket Inventory files)', default=None)
+    parser.add_argument('--publish', help='ARN to publish new Items to', default=None)
 
     # command 2
-    #parser = subparsers.add_parser('cmd2', parents=[pparser], help='Command 2', formatter_class=dhf)
-    # parser.add_argument()
+    h = 'Get latest inventory of tileInfo.json files'
+    parser = subparsers.add_parser('inventory', parents=[pparser], help=h, formatter_class=dhf)
+    fout = str(datetime.now().date()) + '.csv'
+    parser.add_argument('--filename', help='Filename to save', default=fout)
 
     # turn Namespace into dictinary
     parsed_args = vars(parser0.parse_args(args))
@@ -54,7 +59,16 @@ def cli():
 
     if cmd == 'ingest':
         cat = Catalog.open(args['catalog'])
-        sentinel.add_items(cat, start_date=args['start'], end_date=args['end'], s3meta=args['s3meta'])
+        if args['filename'] is not None:
+            records = sentinel.read_inventory(args['filename'])
+        else:
+            records = sentinel.latest_inventory()
+        sentinel.add_items(cat, records, start_date=args['start'], end_date=args['end'],
+                           prefix=args['prefix'], s3meta=args['s3meta'], publish=args['publish'])
+    elif cmd == 'inventory':
+        with open(args['filename'], 'w') as f:
+            f.write('datetime,path\n')
+            [f.write('%s,%s\n' % (i['datetime'], i['path'])) for i in sentinel.latest_inventory()]
 
 
 if __name__ == "__main__":
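
The `--start` and `--end` options rely on an argparse `type` callable to turn `YYYY-MM-DD` strings into dates. A standalone sketch of that pattern follows. The parser here is a reduced stand-in with a made-up program name, not the real `parse_args`, and the named `valid_date` function with its custom error message is one possible variation on the lambda used in the diff.

```python
import argparse
from datetime import datetime


def valid_date(value):
    """Parse YYYY-MM-DD, raising an argparse-friendly error otherwise."""
    try:
        return datetime.strptime(value, '%Y-%m-%d').date()
    except ValueError:
        raise argparse.ArgumentTypeError('expected YYYY-MM-DD, got %r' % value)


def build_parser():
    """Build a small parser with only the date and prefix options."""
    parser = argparse.ArgumentParser(prog='ingest-sketch')
    parser.add_argument('--start', default=None, type=valid_date)
    parser.add_argument('--end', default=None, type=valid_date)
    parser.add_argument('--prefix', default=None)
    return parser


args = vars(build_parser().parse_args(['--start', '2018-01-01', '--prefix', 'tiles/10/']))
```

After parsing, `args['start']` is a `datetime.date` and `args['end']` stays `None` because it was not supplied.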

satstac/sentinel/main.py

Lines changed: 29 additions & 5 deletions
@@ -34,7 +34,7 @@
 }
 
 
-def add_items(catalog, start_date=None, end_date=None, s3meta=False):
+def add_items(catalog, records, start_date=None, end_date=None, s3meta=False, prefix=None, publish=None):
     """ Stream records to a collection with a transform function
 
     Keyword arguments:
@@ -50,17 +50,26 @@ def add_items(catalog, start_date=None, end_date=None, s3meta=False):
     cols = {c.id: c for c in catalog.collections()}
     collection = cols['sentinel-2-l1c']
 
+    client = None
+    if publish:
+        parts = publish.split(':')
+        client = boto3.client('sns', region_name=parts[3])
+
     duration = []
     # iterate through records
-    for i, record in enumerate(records()):
+    for i, record in enumerate(records):
         start = datetime.now()
+        if i % 50000 == 0:
+            logger.info('%s: Scanned %s records' % (start, str(i)))
         dt = record['datetime'].date()
+        if prefix is not None:
+            # if path doesn't match provided prefix skip to next record
+            if record['path'][:len(prefix)] != prefix:
+                continue
         if s3meta:
             url = op.join(SETTINGS['s3_url'], record['path'])
         else:
             url = op.join(SETTINGS['roda_url'], record['path'])
-        if i % 10000 == 0:
-            print('Scanned %s records' % str(i+1))
         #if i == 10:
         # break
         if (start_date is not None and dt < start_date) or (end_date is not None and dt > end_date):
@@ -79,13 +88,28 @@ def add_items(catalog, start_date=None, end_date=None, s3meta=False):
             continue
         try:
             collection.add_item(item, path=SETTINGS['path_pattern'], filename=SETTINGS['fname_pattern'])
+            if client:
+                client.publish(TopicArn=publish, Message=json.dumps(item.data))
             duration.append((datetime.now()-start).total_seconds())
             logger.info('Ingested %s in %s' % (item.filename, duration[-1]))
         except Exception as err:
             logger.error('Error adding %s: %s' % (item.id, err))
     logger.info('Read in %s records averaging %4.2f sec (%4.2f stddev)' % (i, np.mean(duration), np.std(duration)))
 
-def records():
+
+def read_inventory(filename):
+    """ Create generator from inventory file """
+    with open(filename) as f:
+        f.readline()
+        for line in f.readlines():
+            parts = line.split(',')
+            yield {
+                'datetime': parse(parts[0]),
+                'path': parts[1].strip('\n')
+            }
+
+
+def latest_inventory():
     """ Return generator function for list of scenes """
     s3 = boto3.client('s3')
     # get latest file
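
Two pieces of logic in `add_items` can be illustrated in isolation: reading the region out of the `--publish` ARN (the fourth colon-separated field) and skipping records by prefix and date window. The sketch below mirrors those checks as pure functions, with no S3 or SNS access. `region_from_arn` and `select_records` are hypothetical names, and `str.startswith` is used in place of the slice comparison in the diff.

```python
from datetime import date, datetime


def region_from_arn(arn):
    """Return the region field (fourth colon-separated part) of an ARN."""
    return arn.split(':')[3]


def select_records(records, start_date=None, end_date=None, prefix=None):
    """Yield only records matching the optional prefix and date window."""
    for record in records:
        if prefix is not None and not record['path'].startswith(prefix):
            continue
        dt = record['datetime'].date()
        if start_date is not None and dt < start_date:
            continue
        if end_date is not None and dt > end_date:
            continue
        yield record


# made-up records for illustration only
sample = [
    {'datetime': datetime(2018, 1, 1, 10, 0), 'path': 'tiles/10/S/DG/2018/1/1/0'},
    {'datetime': datetime(2018, 1, 5, 10, 0), 'path': 'tiles/10/S/DG/2018/1/5/0'},
    {'datetime': datetime(2018, 1, 5, 11, 0), 'path': 'tiles/11/S/KA/2018/1/5/0'},
]
kept = list(select_records(sample, start_date=date(2018, 1, 2), prefix='tiles/10/'))
```

Because every record is still visited, filtering this way scans the full inventory, which matches the README's note that all records are scanned when dates are given.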

satstac/sentinel/sentinel-2-l1c.json

Lines changed: 13 additions & 13 deletions
@@ -59,86 +59,86 @@
         "eo:off_nadir": 0,
         "eo:bands": [
             {
-                "id": "B01",
+                "name": "B01",
                 "common_name": "coastal",
                 "gsd": 60.0,
                 "center_wavelength": 0.4439,
                 "full_width_half_max": 0.027
             },
             {
-                "id": "B02",
+                "name": "B02",
                 "common_name": "blue",
                 "gsd": 10.0,
                 "center_wavelength": 0.4966,
                 "full_width_half_max": 0.098
             },
             {
-                "id": "B03",
+                "name": "B03",
                 "common_name": "green",
                 "gsd": 10.0,
                 "center_wavelength": 0.56,
                 "full_width_half_max": 0.045
             },
             {
-                "id": "B04",
+                "name": "B04",
                 "common_name": "red",
                 "gsd": 10.0,
                 "center_wavelength": 0.6645,
                 "full_width_half_max": 0.038
             },
             {
-                "id": "B05",
+                "name": "B05",
                 "gsd": 20.0,
                 "center_wavelength": 0.7039,
                 "full_width_half_max": 0.019
             },
             {
-                "id": "B06",
+                "name": "B06",
                 "gsd": 20.0,
                 "center_wavelength": 0.7402,
                 "full_width_half_max": 0.018
             },
             {
-                "id": "B07",
+                "name": "B07",
                 "gsd": 20.0,
                 "center_wavelength": 0.7825,
                 "full_width_half_max": 0.028
             },
             {
-                "id": "B08",
+                "name": "B08",
                 "common_name": "nir",
                 "gsd": 10.0,
                 "center_wavelength": 0.8351,
                 "full_width_half_max": 0.145
             },
             {
-                "id": "B8A",
+                "name": "B8A",
                 "gsd": 20.0,
                 "center_wavelength": 0.8648,
                 "full_width_half_max": 0.033
             },
             {
-                "id": "B09",
+                "name": "B09",
                 "gsd": 60.0,
                 "center_wavelength": 0.945,
                 "full_width_half_max": 0.026
             },
             {
-                "id": "B10",
+                "name": "B10",
                 "common_name": "cirrus",
                 "gsd": 60.0,
                 "center_wavelength": 1.3735,
                 "full_width_half_max": 0.075
             },
             {
-                "id": "B11",
+                "name": "B11",
                 "common_name": "swir16",
                 "gsd": 20.0,
                 "center_wavelength": 1.6137,
                 "full_width_half_max": 0.143
             },
             {
-                "id": "B12",
+                "name": "B12",
                 "common_name": "swir22",
                 "gsd": 20.0,
                 "center_wavelength": 2.22024,

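The change to `sentinel-2-l1c.json` renames each band's `id` key to `name`. If already-written documents needed the same rename, it could be applied programmatically along these lines. This is only a sketch: `rename_band_key` is a hypothetical helper, and it operates on a plain list of band dictionaries rather than on any satstac object.

```python
def rename_band_key(bands, old='id', new='name'):
    """Return copies of the band dicts with the `old` key renamed to `new`."""
    renamed = []
    for band in bands:
        band = dict(band)  # shallow copy so the input list is left untouched
        if old in band:
            band[new] = band.pop(old)
        renamed.append(band)
    return renamed


bands = [
    {'id': 'B01', 'common_name': 'coastal', 'gsd': 60.0},
    {'name': 'B02', 'common_name': 'blue', 'gsd': 10.0},
]
updated = rename_band_key(bands)
```

Bands that already use `name` pass through unchanged, so the function is safe to run more than once.
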
test/test_main.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ def test_main(self):
         #fout = sentinel.main(sentinel, start_date=dt(2013, 10, 1).date())
 
     def test_records(self):
-        for r in sentinel.records():
+        for r in sentinel.latest_inventory():
             assert('datetime' in r)
             assert('path' in r)
             break
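
`test_records` exercises `latest_inventory()`, which reads from S3. One option for an offline check is to round-trip the `datetime,path` CSV format in memory. The sketch below is a stdlib-only stand-in, not the repository's `read_inventory`: it uses the `csv` module and `datetime.fromisoformat` (Python 3.7+) instead of the `parse` call in the diff, and assumes ISO-formatted timestamps.

```python
import csv
import io
from datetime import datetime


def write_inventory(records, fileobj):
    """Write records as CSV rows under a datetime,path header."""
    writer = csv.writer(fileobj, lineterminator='\n')
    writer.writerow(['datetime', 'path'])
    for r in records:
        writer.writerow([r['datetime'].isoformat(), r['path']])


def read_inventory_sketch(fileobj):
    """Yield dicts with a parsed datetime and the path from an inventory CSV."""
    for row in csv.DictReader(fileobj):
        yield {
            'datetime': datetime.fromisoformat(row['datetime']),
            'path': row['path'],
        }


# made-up record for illustration only
records = [{'datetime': datetime(2018, 1, 1, 10, 30), 'path': 'tiles/10/S/DG/2018/1/1/0'}]
buf = io.StringIO()
write_inventory(records, buf)
buf.seek(0)
roundtrip = list(read_inventory_sketch(buf))
```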
