
Commit 66a763c

Author: Diego Ardila
Message: Add readme + ability to deselect slow tests relying on stepfunctions
Parent: acd98ec

File tree: 3 files changed, 51 additions, 8 deletions


README.md

Lines changed: 45 additions & 8 deletions
@@ -6,15 +6,13 @@ Aggregate metrics in ML are not good enough. To improve production ML, you need
 
 Scale Nucleus helps you:
 
-* Visualize your data
-* Curate interesting slices within your dataset
-* Review and manage annotations
-* Measure and debug your model performance
+- Visualize your data
+- Curate interesting slices within your dataset
+- Review and manage annotations
+- Measure and debug your model performance
 
 Nucleus is a new way—the right way—to develop ML models, helping us move away from the concept of one dataset and towards a paradigm of collections of scenarios.
 
-
-
 ## Installation
 
 `$ pip install scale-nucleus`
@@ -26,65 +24,83 @@ The client abstractions serves to authenticate the user and act as the gateway
 for users to interact with their datasets, models, and model runs.
 
 ### Create a client object
+
 ```python
 import nucleus
 client = nucleus.NucleusClient("YOUR_API_KEY_HERE")
 ```
 
 ### Create Dataset
+
 ```python
 dataset = client.create_dataset("My Dataset")
 ```
 
 ### List Datasets
+
 ```python
 datasets = client.list_datasets()
 ```
 
 ### Delete a Dataset
+
 By specifying target dataset id.
 A response code of 200 indicates successful deletion.
+
 ```python
 client.delete_dataset("YOUR_DATASET_ID")
 ```
 
 ### Append Items to a Dataset
+
 You can append both local images and images from the web. Simply specify the location and Nucleus will automatically infer if it's remote or a local file.
+
 ```python
 dataset_item_1 = DatasetItem(image_location="./1.jpeg", reference_id="1", metadata={"key": "value"})
 dataset_item_2 = DatasetItem(image_location="s3://srikanth-nucleus/9-1.jpg", reference_id="2", metadata={"key": "value"})
 ```
 
 The append function expects a list of `DatasetItem` objects to upload, like this:
+
 ```python
 response = dataset.append([dataset_item_1, dataset_item_2])
 ```
 
 ### Get Dataset Info
+
 Tells us the dataset name, number of dataset items, model_runs, and slice_ids.
+
 ```python
 dataset.info
 ```
 
 ### Access Dataset Items
+
 There are three methods to access individual Dataset Items:
 
 (1) Dataset Items are accessible by reference id
+
 ```python
 item = dataset.refloc("my_img_001.png")
 ```
+
 (2) Dataset Items are accessible by index
+
 ```python
 item = dataset.iloc(0)
 ```
+
 (3) Dataset Items are accessible by the dataset_item_id assigned internally
+
 ```python
 item = dataset.loc("dataset_item_id")
 ```
 
 ### Add Annotations
+
 Upload groundtruth annotations for the items in your dataset.
 Box2DAnnotation has same format as https://dashboard.scale.com/nucleus/docs/api#add-ground-truth
+
 ```python
 annotation_1 = BoxAnnotation(reference_id="1", label="label", x=0, y=0, width=10, height=10, annotation_id="ann_1", metadata={})
 annotation_2 = BoxAnnotation(reference_id="2", label="label", x=0, y=0, width=10, height=10, annotation_id="ann_2", metadata={})
@@ -94,6 +110,7 @@ response = dataset.annotate([annotation_1, annotation_2])
 For particularly large payloads, please reference the accompanying scripts in **references**
 
 ### Add Model
+
 The model abstraction is intended to represent a unique architecture.
 Models are independent of any dataset.
 
@@ -102,10 +119,12 @@ model = client.add_model(name="My Model", reference_id="newest-cnn-its-new", met
 ```
 
 ### Upload Predictions to ModelRun
+
 This method populates the model_run object with predictions. `ModelRun` objects need to reference a `Dataset` that has been created.
 Returns the associated model_id, human-readable name of the run, status, and user specified metadata.
 Takes a list of Box2DPredictions within the payload, where Box2DPrediction
 is formulated as in https://dashboard.scale.com/nucleus/docs/api#upload-model-outputs
+
 ```python
 prediction_1 = BoxPrediction(reference_id="1", label="label", x=0, y=0, width=10, height=10, annotation_id="pred_1", confidence=0.9)
 prediction_2 = BoxPrediction(reference_id="2", label="label", x=0, y=0, width=10, height=10, annotation_id="pred_2", confidence=0.2)
@@ -114,39 +133,51 @@ model_run = model.create_run(name="My Model Run", metadata={"timestamp": "121012
 ```
 
 ### Commit ModelRun
+
 The commit action indicates that the user is finished uploading predictions associated
-with this model run. Committing a model run kicks off Nucleus internal processes
+with this model run. Committing a model run kicks off Nucleus internal processes
 to calculate performance metrics like IoU. After being committed, a ModelRun object becomes immutable.
+
 ```python
 model_run.commit()
 ```
 
 ### Get ModelRun Info
+
 Returns the associated model_id, human-readable name of the run, status, and user specified metadata.
+
 ```python
 model_run.info
 ```
 
 ### Accessing ModelRun Predictions
+
 You can access the modelRun predictions for an individual dataset_item through three methods:
 
 (1) user specified reference_id
+
 ```python
 model_run.refloc("my_img_001.png")
 ```
+
 (2) Index
+
 ```python
 model_run.iloc(0)
 ```
+
 (3) Internally maintained dataset_item_id
+
 ```python
 model_run.loc("dataset_item_id")
 ```
 
 ### Delete ModelRun
+
 Delete a model run using the target model_run_id.
 
 A response code of 200 indicates successful deletion.
+
 ```python
 client.delete_model_run("model_run_id")
 ```
@@ -163,14 +194,20 @@ poetry install
 ```
 
 Please install the pre-commit hooks by running the following command:
+
 ```python
 poetry run pre-commit install
 ```
 
 **Best practices for testing:**
 (1). Please run pytest from the root directory of the repo, i.e.
+
 ```
-poetry pytest tests/test_dataset.py
+poetry run pytest tests/test_dataset.py
 ```
 
+(2) To skip slow integration tests that have to wait for an async job to start.
 
+```
+poetry run pytest -m "not integration"
+```
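Note on the new `-m "not integration"` deselection: it relies on pytest recognizing the custom `integration` marker applied in the test files below. A minimal sketch of how such a marker could be registered, assuming a `conftest.py` at the repo root (this registration is not shown in the commit; `pytest_configure` and `config.addinivalue_line` are standard pytest hooks):

```python
# conftest.py (hypothetical sketch; not part of this commit)
# Registering the custom "integration" marker lets
# `pytest -m "not integration"` deselect the slow stepfunctions-backed
# tests without emitting unknown-marker warnings.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "integration: slow tests that wait on asynchronous jobs",
    )
```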

tests/test_dataset.py

Lines changed: 4 additions & 0 deletions
@@ -144,6 +144,7 @@ def test_dataset_append_local(CLIENT, dataset):
     assert ERROR_PAYLOAD not in resp_json
 
 
+@pytest.mark.integration
 def test_dataset_append_async(dataset: Dataset):
     job = dataset.append(make_dataset_items(), asynchronous=True)
     job.sleep_until_complete()

@@ -173,6 +174,7 @@ def test_dataset_append_async_with_local_path(dataset: Dataset):
     dataset.append(ds_items, asynchronous=True)
 
 
+@pytest.mark.integration
 def test_dataset_append_async_with_1_bad_url(dataset: Dataset):
     ds_items = make_dataset_items()
     ds_items[0].image_location = "https://looks.ok.but.is.not.accessible"

@@ -248,6 +250,7 @@ def test_dataset_export_autotag_scores(CLIENT):
     assert len(scores[column]) > 0
 
 
+@pytest.mark.integration
 def test_annotate_async(dataset: Dataset):
     dataset.append(make_dataset_items())
     semseg = SegmentationAnnotation.from_json(TEST_SEGMENTATION_ANNOTATIONS[0])

@@ -281,6 +284,7 @@ def test_annotate_async(dataset: Dataset):
     }
 
 
+@pytest.mark.integration
 def test_annotate_async_with_error(dataset: Dataset):
     dataset.append(make_dataset_items())
     semseg = SegmentationAnnotation.from_json(TEST_SEGMENTATION_ANNOTATIONS[0])

tests/test_prediction.py

Lines changed: 2 additions & 0 deletions
@@ -271,6 +271,7 @@ def test_mixed_pred_upload(model_run):
     )
 
 
+@pytest.mark.integration
 def test_mixed_pred_upload_async(model_run: ModelRun):
     prediction_semseg = SegmentationPrediction.from_json(
         TEST_SEGMENTATION_PREDICTIONS[0]

@@ -307,6 +308,7 @@ def test_mixed_pred_upload_async(model_run: ModelRun):
     }
 
 
+@pytest.mark.integration
 def test_mixed_pred_upload_async_with_error(model_run: ModelRun):
     prediction_semseg = SegmentationPrediction.from_json(
         TEST_SEGMENTATION_PREDICTIONS[0]
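Conversely, to run only the stepfunctions-backed tests marked above, pytest's standard marker selection applies (a usage note, not part of this commit):

```
poetry run pytest -m integration
```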
