Commit c2a99bd

Merge pull request #139 from scaleapi/da-document-dataset-item
Docstrings for dataset item class
2 parents ba1adcc + 818d36a commit c2a99bd

File tree

1 file changed: +81 -2 lines

nucleus/dataset_item.py

Lines changed: 81 additions & 2 deletions
@@ -31,6 +31,17 @@
 
 @dataclass
 class Quaternion:
+    """Quaternion objects are used to represent rotation.
+    We use the Hamilton quaternion convention, where i^2 = j^2 = k^2 = ijk = -1, i.e. the right-handed convention.
+    The quaternion represented by the tuple (x, y, z, w) is equal to w + x*i + y*j + z*k.
+
+    Attributes:
+        x: x value
+        y: y value
+        z: z value
+        w: w value
+    """
+
     x: float
     y: float
     z: float
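
To make the (x, y, z, w) convention concrete, here is a minimal sketch of constructing a heading quaternion for a rotation about the z-axis. It assumes only what the diff shows: a dataclass named Quaternion in nucleus/dataset_item.py whose generated constructor takes the four fields by keyword; the angle and variable names are illustrative.

import math

from nucleus.dataset_item import Quaternion  # class defined in the file shown in this diff

# A rotation by angle theta about the z-axis is, under the Hamilton convention,
# q = cos(theta/2) + sin(theta/2)*k, i.e. (x, y, z, w) = (0, 0, sin(theta/2), cos(theta/2)).
theta = math.pi / 2  # 90 degrees
heading = Quaternion(x=0.0, y=0.0, z=math.sin(theta / 2), w=math.cos(theta / 2))

# The identity rotation (no rotation at all) is simply (0, 0, 0, 1).
identity = Quaternion(x=0.0, y=0.0, z=0.0, w=1.0)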
@@ -53,6 +64,20 @@ def to_payload(self) -> dict:
 
 @dataclass
 class CameraParams:
+    """CameraParams objects represent the camera position/heading used to record the image.
+
+    Attributes:
+        position: Vector3 world-normalized position of the camera
+        heading: Vector <x, y, z, w> indicating the quaternion of the camera direction;
+            note that the z-axis of the camera frame represents the camera's optical axis.
+            See `Heading Examples <https://docs.scale.com/reference/data-types-and-the-frame-objects#heading-examples>`_
+            for examples.
+        fx: focal length in x direction (in pixels)
+        fy: focal length in y direction (in pixels)
+        cx: principal point x value
+        cy: principal point y value
+    """
+
     position: Point3D
     heading: Quaternion
     fx: float
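
As a rough illustration of what the intrinsics (fx, fy, cx, cy) mean, the sketch below builds a CameraParams object and applies the standard pinhole projection to a point expressed in the camera frame, whose z-axis is the optical axis per the docstring above. The Point3D argument order and the numbers are assumptions, and Nucleus performs the actual projection server-side; the helper function is purely illustrative.

from nucleus.dataset_item import CameraParams, Point3D, Quaternion

# Placeholder intrinsics for a 1920x1080 camera: fx/fy are focal lengths in pixels
# and (cx, cy) is the principal point. The identity quaternion (0, 0, 0, 1) means
# the camera frame is aligned with the world frame.
params = CameraParams(
    position=Point3D(0.0, 0.0, 1.5),  # assumed (x, y, z) argument order
    heading=Quaternion(x=0.0, y=0.0, z=0.0, w=1.0),
    fx=1400.0,
    fy=1400.0,
    cx=960.0,
    cy=540.0,
)

def project_to_pixel(cam: CameraParams, x: float, y: float, z: float) -> tuple:
    # Standard pinhole model: z is the depth along the camera's optical axis.
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    return u, v

print(project_to_pixel(params, x=0.5, y=-0.2, z=10.0))  # pixel coordinates (u, v)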
@@ -89,14 +114,68 @@ class DatasetItemType(Enum):
 
 @dataclass # pylint: disable=R0902
 class DatasetItem: # pylint: disable=R0902
+    """A dataset item is an image or pointcloud that has associated metadata.
+
+    Note: for 3D data, please include a :class:`.CameraParams` object under a key named
+    "camera_params" within the metadata dictionary. This will allow for projecting
+    3D annotations to any image within a scene.
+
+    Attributes:
+        image_location: Required if pointcloud_location is not present: the location
+            containing the image for the given row of data. This can be a local path or a
+            remote URL. Remote formats supported include any URL (http:// or https://) and
+            URIs for AWS S3, Azure, or GCS (i.e. s3://, gcs://).
+        reference_id: (required) A user-specified identifier to reference the item. The
+            default value is present only to avoid changing the argument order, and it
+            must be replaced.
+        metadata: Extra information about the particular dataset item. Int, float, and
+            string values will be made searchable in the query bar by the key in this dict.
+            For example, {"animal": "dog"} will become searchable via
+            metadata.animal = "dog".
+
+            Categorical data can be passed as a string and will be treated categorically
+            by Nucleus if there are fewer than 250 unique values in the dataset. This enables
+            histograms of values in the "Insights" section and autocomplete
+            within the query bar.
+
+            Numerical metadata will generate histograms in the "Insights" section, allow
+            for sorting the results of any query, and can be used with the modulo operator.
+            For example: metadata.frame_number % 5 = 0
+
+            All other types of metadata will be visible from the dataset item detail view.
+
+            It is important that string and numerical metadata fields are consistent: if
+            a metadata field has a string value, then all metadata fields with the same
+            key should also have string values, and vice versa for numerical metadata.
+            If conflicting types are found, Nucleus will return an error during upload.
+
+            The recommended way of adding or updating existing metadata is to re-run the
+            ingestion (dataset.append) with update=True, which will replace any existing
+            metadata with whatever your new ingestion run uses. This will delete any
+            metadata keys that are not present in the new ingestion run. We have a cache
+            based on image_location that will skip the need for a re-upload of the images,
+            so your second ingestion will be faster than your first.
+            TODOC(Shorten this once we have a guide migrated for metadata, or maybe link
+            from other places to here.)
+        pointcloud_location: Required if image_location is not present: the remote URL
+            containing the pointcloud JSON. Remote formats supported include any URL
+            (http:// or https://) and URIs for AWS S3, Azure, or GCS (i.e. s3://, gcs://).
+        upload_to_scale: Set this to false in order to use
+            `privacy mode <https://dashboard.scale.com/nucleus/docs/api#privacy-mode>`_.
+            TODOC (update this once guide is migrated).
+            Setting this to false means the actual data within the item
+            (i.e. the image or pointcloud) will not be uploaded to Scale, meaning that
+            you can send in links that are only accessible to certain users, and not to
+            Scale.
+    """
+
     image_location: Optional[str] = None
-    reference_id: Optional[str] = None
+    reference_id: str = "DUMMY_VALUE"  # Done in order to preserve argument ordering and not break old clients.
     metadata: Optional[dict] = None
     pointcloud_location: Optional[str] = None
     upload_to_scale: Optional[bool] = True
 
     def __post_init__(self):
-        assert self.reference_id is not None, "reference_id is required."
+        assert self.reference_id != "DUMMY_VALUE", "reference_id is required."
        assert bool(self.image_location) != bool(
            self.pointcloud_location
        ), "Must specify exactly one of the image_location, pointcloud_location parameters"
