Commit 7b6f15d

Custom object embeddings with annotation uploads (#274)
Finally here! + 2 tests
1 parent ccbd330 commit 7b6f15d

8 files changed, with 149 additions and 25 deletions.


README.md

Lines changed: 29 additions & 23 deletions
@@ -17,12 +17,13 @@ Nucleus is a new way—the right way—to develop ML models, helping us move awa
 
 `$ pip install scale-nucleus`
 
-
 ## CLI installation
+
 We recommend installing the CLI via `pipx` (https://pypa.github.io/pipx/installation/). This makes sure that
 the CLI does not interfere with you system packages and is accessible from your favorite terminal.
 
 For MacOS:
+
 ```bash
 brew install pipx
 pipx ensurepath
@@ -32,6 +33,7 @@ nu install-completions
 ```
 
 Otherwise, install via pip (requires pip 19.0 or later):
+
 ```bash
 python3 -m pip install --user pipx
 python3 -m pipx ensurepath
@@ -45,6 +47,7 @@ nu install-completions
 ### Outdated Client
 
 Nucleus is iterating rapidly and as a result we do not always perfectly preserve backwards compatibility with older versions of the client. If you run into any unexpected error, it's a good idea to upgrade your version of the client by running
+
 ```
 pip install --upgrade scale-nucleus
 ```
@@ -87,33 +90,34 @@ poetry run pytest -m "not integration"
 
 ## Pydantic Models
 
-Prefer using [Pydantic](https://pydantic-docs.helpmanual.io/usage/models/) models rather than creating raw dictionaries
-or dataclasses to send or receive over the wire as JSONs. Pydantic is created with data validation in mind and provides very clear error
+Prefer using [Pydantic](https://pydantic-docs.helpmanual.io/usage/models/) models rather than creating raw dictionaries
+or dataclasses to send or receive over the wire as JSONs. Pydantic is created with data validation in mind and provides very clear error
 messages when it encounters a problem with the payload.
 
 The Pydantic model(s) should mirror the payload to send. To represent a JSON payload that looks like this:
+
 ```json
 {
   "example_json_with_info": {
-"metadata": {
-"frame": 0
-},
-"reference_id": "frame0",
-"url": "s3://example/scale_nucleus/2021/lidar/0038711321865000.json",
-"type": "pointcloud"
+    "metadata": {
+      "frame": 0
     },
+    "reference_id": "frame0",
+    "url": "s3://example/scale_nucleus/2021/lidar/0038711321865000.json",
+    "type": "pointcloud"
+  },
   "example_image_with_info": {
-"metadata": {
-"author": "Picasso"
-},
-"reference_id": "frame0",
-"url": "s3://bucket/0038711321865000.jpg",
-"type": "image"
+    "metadata": {
+      "author": "Picasso"
     },
+    "reference_id": "frame0",
+    "url": "s3://bucket/0038711321865000.jpg",
+    "type": "image"
+  }
 }
 ```
 
-Could be represented as the following structure. Note that the field names map to the JSON keys and the usage of field
+Could be represented as the following structure. Note that the field names map to the JSON keys and the usage of field
 validators (`@validator`).
 
 ```python
@@ -161,29 +165,31 @@ parsed_model = ExampleNestedModel.parse_obj(payload.json())
 requests.post("example/post_to", json=parsed_model.dict())
 ```
 
-
 ### Migrating to Pydantic
+
 - When migrating an interface from a dictionary use `nucleus.pydantic_base.DictCompatibleModel`. That allows you to get
-the benefits of Pydantic but maintaints backwards compatibility with a Python dictionary by delegating `__getitem__` to
-fields.
-- When migrating a frozen dataclass use `nucleus.pydantic_base.ImmutableModel`. That is a base class set up to be
-immutable after initialization.
+  the benefits of Pydantic but maintaints backwards compatibility with a Python dictionary by delegating `__getitem__` to
+  fields.
+- When migrating a frozen dataclass use `nucleus.pydantic_base.ImmutableModel`. That is a base class set up to be
+  immutable after initialization.
 
 **Updating documentation:**
 We use [Sphinx](https://www.sphinx-doc.org/en/master/) to autogenerate our API Reference from docstrings.
 
 To test your local docstring changes, run the following commands from the repository's root directory:
+
 ```
 poetry shell
 cd docs
 sphinx-autobuild . ./_build/html --watch ../nucleus
 ```
-`sphinx-autobuild` will spin up a server on localhost (port 8000 by default) that will watch for and automatically rebuild a version of the API reference based on your local docstring changes.
 
+`sphinx-autobuild` will spin up a server on localhost (port 8000 by default) that will watch for and automatically rebuild a version of the API reference based on your local docstring changes.
 
 ## Custom Metrics using Shapely in scale-validate
 
 Certain metrics use `shapely` which is added as an optional dependency.
+
 ```bash
 pip install scale-nucleus[metrics]
 ```
@@ -199,4 +205,4 @@ apt-get install libgeos-dev
 
 To develop it locally use
 
-`poetry install --extra shapely`
+`poetry install --extras shapely`
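
Review note on the README's "Migrating to Pydantic" section: the idea of keeping dictionary-style access by delegating `__getitem__` to fields can be sketched roughly as below. This is only an illustration under stated assumptions. `DictCompatibleSketch` is a hypothetical name, and it uses a plain dataclass rather than Pydantic so that it runs without third-party packages; the real `nucleus.pydantic_base.DictCompatibleModel` may be implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictCompatibleSketch:
    """Hypothetical stand-in, not the real DictCompatibleModel."""

    reference_id: str
    url: str

    def __getitem__(self, key: str):
        # Delegate dict-style lookups to attributes so old call sites that
        # index the object like a dictionary keep working.
        try:
            return getattr(self, key)
        except AttributeError as error:
            # Mimic dict behaviour for unknown keys.
            raise KeyError(key) from error


item = DictCompatibleSketch(
    reference_id="frame0", url="s3://bucket/0038711321865000.jpg"
)
```

Callers can then mix both styles during a migration, for example `item["reference_id"]` next to `item.url`.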

nucleus/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -809,6 +809,7 @@ def make_request(
         Returns:
             Response payload as JSON dict.
         """
+        print(payload, route)
         if payload is None:
             payload = {}
         if requests_command is requests.get:

nucleus/annotation.py

Lines changed: 17 additions & 2 deletions
@@ -12,6 +12,7 @@
     CATEGORY_TYPE,
     CUBOID_TYPE,
     DIMENSIONS_KEY,
+    EMBEDDING_VECTOR_KEY,
     GEOMETRY_KEY,
     HEIGHT_KEY,
     INDEX_KEY,
@@ -97,7 +98,8 @@ class BoxAnnotation(Annotation): # pylint: disable=R0902
             height=10,
             reference_id="image_1",
             annotation_id="image_1_car_box_1",
-            metadata={"vehicle_color": "red"}
+            metadata={"vehicle_color": "red"},
+            embedding_vector=[0.1423, 1.432, ...3.829],
         )
 
     Parameters:
@@ -121,6 +123,9 @@ class BoxAnnotation(Annotation): # pylint: disable=R0902
             attach to this annotation. Strings, floats and ints are supported best
             by querying and insights features within Nucleus. For more details see
             our `metadata guide <https://nucleus.scale.com/docs/upload-metadata>`_.
+        embedding_vector: Custom embedding vector for this object annotation.
+            If any custom object embeddings have been uploaded previously to this dataset,
+            this vector must match the dimensions of the previously ingested vectors.
     """
 
     label: str
@@ -131,6 +136,7 @@ class BoxAnnotation(Annotation): # pylint: disable=R0902
     reference_id: str
     annotation_id: Optional[str] = None
     metadata: Optional[Dict] = None
+    embedding_vector: Optional[list] = None
 
     def __post_init__(self):
         self.metadata = self.metadata if self.metadata else {}
@@ -149,6 +155,7 @@ def from_json(cls, payload: dict):
             reference_id=payload[REFERENCE_ID_KEY],
             annotation_id=payload.get(ANNOTATION_ID_KEY, None),
             metadata=payload.get(METADATA_KEY, {}),
+            embedding_vector=payload.get(EMBEDDING_VECTOR_KEY, None),
         )
 
     def to_payload(self) -> dict:
@@ -164,6 +171,7 @@ def to_payload(self) -> dict:
             REFERENCE_ID_KEY: self.reference_id,
             ANNOTATION_ID_KEY: self.annotation_id,
             METADATA_KEY: self.metadata,
+            EMBEDDING_VECTOR_KEY: self.embedding_vector,
         }
 
 

@@ -282,7 +290,8 @@ class PolygonAnnotation(Annotation):
             vertices=[Point(100, 100), Point(150, 200), Point(200, 100)],
             reference_id="image_2",
             annotation_id="image_2_bus_polygon_1",
-            metadata={"vehicle_color": "yellow"}
+            metadata={"vehicle_color": "yellow"},
+            embedding_vector=[0.1423, 1.432, ...3.829],
         )
 
     Parameters:
@@ -298,13 +307,17 @@ class PolygonAnnotation(Annotation):
             attach to this annotation. Strings, floats and ints are supported best
             by querying and insights features within Nucleus. For more details see
             our `metadata guide <https://nucleus.scale.com/docs/upload-metadata>`_.
+        embedding_vector: Custom embedding vector for this object annotation.
+            If any custom object embeddings have been uploaded previously to this dataset,
+            this vector must match the dimensions of the previously ingested vectors.
     """
 
     label: str
     vertices: List[Point]
     reference_id: str
     annotation_id: Optional[str] = None
     metadata: Optional[Dict] = None
+    embedding_vector: Optional[list] = None
 
     def __post_init__(self):
         self.metadata = self.metadata if self.metadata else {}
@@ -333,6 +346,7 @@ def from_json(cls, payload: dict):
             reference_id=payload[REFERENCE_ID_KEY],
             annotation_id=payload.get(ANNOTATION_ID_KEY, None),
             metadata=payload.get(METADATA_KEY, {}),
+            embedding_vector=payload.get(EMBEDDING_VECTOR_KEY, None),
         )
 
     def to_payload(self) -> dict:
@@ -345,6 +359,7 @@ def to_payload(self) -> dict:
             REFERENCE_ID_KEY: self.reference_id,
             ANNOTATION_ID_KEY: self.annotation_id,
             METADATA_KEY: self.metadata,
+            EMBEDDING_VECTOR_KEY: self.embedding_vector,
         }
         return payload
 
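
Review note on `nucleus/annotation.py`: the new field is threaded through `from_json` and `to_payload` in the same way for boxes and polygons. A minimal sketch of that round trip follows, assuming a trimmed-down stand-in class. `MiniBox` and its string keys are hypothetical and only mirror the pattern in the diff; it is not the real `BoxAnnotation`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

EMBEDDING_VECTOR_KEY = "embedding_vector"  # same value as in nucleus/constants.py


@dataclass
class MiniBox:
    """Hypothetical, trimmed-down stand-in for BoxAnnotation."""

    label: str
    reference_id: str
    metadata: Optional[Dict] = None
    embedding_vector: Optional[List[float]] = None

    def __post_init__(self):
        self.metadata = self.metadata if self.metadata else {}

    @classmethod
    def from_json(cls, payload: dict):
        return cls(
            label=payload["label"],
            reference_id=payload["reference_id"],
            metadata=payload.get("metadata", {}),
            # A missing key falls back to None, as with payload.get(...) in the diff.
            embedding_vector=payload.get(EMBEDDING_VECTOR_KEY, None),
        )

    def to_payload(self) -> dict:
        return {
            "label": self.label,
            "reference_id": self.reference_id,
            "metadata": self.metadata,
            # Always emitted, so an unset vector is serialized as None.
            EMBEDDING_VECTOR_KEY: self.embedding_vector,
        }


with_vector = MiniBox("car", "image_1", embedding_vector=[0.1, 0.2, 0.3])
without_vector = MiniBox.from_json({"label": "car", "reference_id": "image_1"})
round_tripped = MiniBox.from_json(with_vector.to_payload())
```

One consequence of this pattern is that payloads for annotations without an embedding still carry the `embedding_vector` key with a `None` value. Whether the server treats that the same as an absent key is not shown in this diff.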

nucleus/constants.py

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@
 DEFAULT_ANNOTATION_UPDATE_MODE = False
 DEFAULT_NETWORK_TIMEOUT_SEC = 120
 DIMENSIONS_KEY = "dimensions"
+EMBEDDING_VECTOR_KEY = "embedding_vector"
 EMBEDDINGS_URL_KEY = "embeddings_urls"
 EMBEDDING_DIMENSION_KEY = "embedding_dimension"
 ERRORS_KEY = "errors"

nucleus/prediction.py

Lines changed: 13 additions & 0 deletions
@@ -26,6 +26,7 @@
     CONFIDENCE_KEY,
     CUBOID_TYPE,
     DIMENSIONS_KEY,
+    EMBEDDING_VECTOR_KEY,
     GEOMETRY_KEY,
     HEIGHT_KEY,
     LABEL_KEY,
@@ -153,6 +154,9 @@ class BoxPrediction(BoxAnnotation):
             annotation. Each value should be between 0 and 1 (inclusive), and sum up to
             1 as a complete distribution. This can be useful for computing entropy to
             surface places where the model is most uncertain.
+        embedding_vectorOptional[List]): Custom embedding vector for this object annotation.
+            If any custom object embeddings have been uploaded previously to this dataset,
+            this vector must match the dimensions of the previously ingested vectors.
     """
 
     def __init__(
@@ -167,6 +171,7 @@ def __init__(
         annotation_id: Optional[str] = None,
         metadata: Optional[Dict] = None,
         class_pdf: Optional[Dict] = None,
+        embedding_vector: Optional[list] = None,
     ):
         super().__init__(
             label=label,
@@ -177,6 +182,7 @@ def __init__(
             reference_id=reference_id,
             annotation_id=annotation_id,
             metadata=metadata,
+            embedding_vector=embedding_vector,
         )
         self.confidence = confidence
         self.class_pdf = class_pdf
@@ -204,6 +210,7 @@ def from_json(cls, payload: dict):
             annotation_id=payload.get(ANNOTATION_ID_KEY, None),
             metadata=payload.get(METADATA_KEY, {}),
             class_pdf=payload.get(CLASS_PDF_KEY, None),
+            embedding_vector=payload.get(EMBEDDING_VECTOR_KEY, None),
         )
 
 

@@ -296,6 +303,9 @@ class PolygonPrediction(PolygonAnnotation):
             annotation. Each value should be between 0 and 1 (inclusive), and sum up to
             1 as a complete distribution. This can be useful for computing entropy to
             surface places where the model is most uncertain.
+        embedding_vector: Custom embedding vector for this object annotation.
+            If any custom object embeddings have been uploaded previously to this dataset,
+            this vector must match the dimensions of the previously ingested vectors.
     """
 
     def __init__(
@@ -307,13 +317,15 @@ def __init__(
         annotation_id: Optional[str] = None,
         metadata: Optional[Dict] = None,
         class_pdf: Optional[Dict] = None,
+        embedding_vector: Optional[list] = None,
     ):
         super().__init__(
             label=label,
             vertices=vertices,
             reference_id=reference_id,
             annotation_id=annotation_id,
             metadata=metadata,
+            embedding_vector=embedding_vector,
         )
         self.confidence = confidence
         self.class_pdf = class_pdf
@@ -340,6 +352,7 @@ def from_json(cls, payload: dict):
             annotation_id=payload.get(ANNOTATION_ID_KEY, None),
             metadata=payload.get(METADATA_KEY, {}),
             class_pdf=payload.get(CLASS_PDF_KEY, None),
+            embedding_vector=payload.get(EMBEDDING_VECTOR_KEY, None),
         )
 
 
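
Review note on the docstrings: they state that a new vector "must match the dimensions of the previously ingested vectors", but nothing in this diff checks that on the client side. A possible pre-upload check could look like the sketch below. `check_embedding_dimensions` is a hypothetical helper and not part of the client, and how the server reports a mismatch is not shown in this commit.

```python
from typing import Optional, Sequence


def check_embedding_dimensions(
    vectors: Sequence[Optional[Sequence[float]]],
    expected_dim: Optional[int] = None,
) -> Optional[int]:
    """Return the shared dimension of all non-None vectors, or raise ValueError.

    Hypothetical helper: expected_dim would be the dimension of vectors already
    uploaded to the dataset, if the caller knows it.
    """
    dim = expected_dim
    for index, vector in enumerate(vectors):
        if vector is None:
            # Annotations without an embedding are allowed and skipped.
            continue
        if dim is None:
            dim = len(vector)
        elif len(vector) != dim:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {dim}"
            )
    return dim


consistent_dim = check_embedding_dimensions([[0, 0, 0], None, [1, 1, 1]])

try:
    check_embedding_dimensions([[0, 0, 0], [1, 1]])
    mismatch_error = None
except ValueError as error:
    mismatch_error = str(error)
```

Running a check like this before an upload would surface a malformed batch locally instead of after a network round trip.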

tests/helpers.py

Lines changed: 21 additions & 0 deletions
@@ -177,6 +177,13 @@ def reference_id_from_url(url):
     for i in range(len(TEST_IMG_URLS))
 ]
 
+
+TEST_BOX_ANNOTATIONS_EMBEDDINGS = [
+    {**ann, "embedding_vector": [i, i, i]}
+    for i, ann in enumerate(TEST_BOX_ANNOTATIONS)
+]
+
+
 TEST_LINE_ANNOTATIONS = [
     {
         "label": f"[Pytest] Line Annotation ${i}",
@@ -355,6 +362,20 @@ def reference_id_from_url(url):
     for i in range(len(TEST_BOX_ANNOTATIONS))
 ]
 
+TEST_BOX_PREDICTIONS_EMBEDDINGS = [
+    {
+        **TEST_BOX_ANNOTATIONS_EMBEDDINGS[i],
+        "confidence": 0.10 * i,
+        "class_pdf": TEST_BOX_MODEL_PDF,
+    }
+    if i != 0
+    else {
+        **TEST_BOX_ANNOTATIONS_EMBEDDINGS[i],
+        "confidence": 0.10 * i,
+    }
+    for i in range(len(TEST_BOX_ANNOTATIONS_EMBEDDINGS))
+]
+
 TEST_LINE_PREDICTIONS = [
     {
         **TEST_LINE_ANNOTATIONS[i],
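
Review note on `tests/helpers.py`: the two new fixtures are built with dictionary unpacking, so the original box fixtures are left untouched and each copy gains a three-element vector derived from its index. The sketch below reproduces the comprehension shapes with small stand-in data. `box_annotations` and `model_pdf` are hypothetical placeholders for `TEST_BOX_ANNOTATIONS` and `TEST_BOX_MODEL_PDF`, whose real contents are not shown in this diff.

```python
# Stand-in for TEST_BOX_ANNOTATIONS; the real fixtures carry more fields.
box_annotations = [
    {"label": f"box_{i}", "reference_id": f"image_{i}"} for i in range(3)
]

# Same comprehension shape as TEST_BOX_ANNOTATIONS_EMBEDDINGS in the diff.
box_annotations_embeddings = [
    {**ann, "embedding_vector": [i, i, i]}
    for i, ann in enumerate(box_annotations)
]

# Hypothetical stand-in for TEST_BOX_MODEL_PDF.
model_pdf = {"car": 0.7, "bus": 0.3}

# Same shape as TEST_BOX_PREDICTIONS_EMBEDDINGS: index 0 gets no class_pdf.
box_predictions_embeddings = [
    {
        **box_annotations_embeddings[i],
        "confidence": 0.10 * i,
        "class_pdf": model_pdf,
    }
    if i != 0
    else {
        **box_annotations_embeddings[i],
        "confidence": 0.10 * i,
    }
    for i in range(len(box_annotations_embeddings))
]
```

Because every fixture uses a vector of length three, the set stays consistent with the dimension requirement stated in the docstrings.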
