This repository was archived by the owner on Jul 29, 2024. It is now read-only.

Commit af9d32a - Add files via upload (1 parent 0f02e2e)

File tree: 1 file changed, +349 −0 lines

create_data_rows_example.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f0948f3e-77f5-471f-8062-eb00328636bd",
   "metadata": {},
   "source": [
    "<td>\n",
    " <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=190/></a>\n",
    "</td>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e34e7399-7bdc-49b9-9173-c15de66d8ddb",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Labelpandas - The Labelbox <> Pandas Connector\n",
    "***Instantly Load CSVs (and other Tables) into Labelbox***\n",
    "\n",
    "---\n",
    "\n",
    "This notebook covers basic usage of the Labelpandas Python SDK.\n",
    "\n",
    "**Pandas** is a Python library for loading and manipulating CSVs and other tabular data efficiently. It is one of the most widely used Python libraries.\n",
    "\n",
    "**Labelpandas** combines Labelbox and Pandas to make uploading CSVs and tabular data to Labelbox straightforward. It handles both local file assets and cloud-hosted assets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5c4bb8a6-44d4-4076-9728-f85bad8aeaba",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install labelpandas --upgrade -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7ef12ac7-5c85-497a-987c-1e41af2dd715",
   "metadata": {},
   "outputs": [],
   "source": [
    "import labelpandas as lbpd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1dfe1eae-e28f-4928-9058-796d097f38cc",
   "metadata": {},
   "source": [
    "# Set up Labelpandas Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "de523a2e-484a-4e16-b976-f2c4edd7efe3",
   "metadata": {},
   "outputs": [],
   "source": [
    "labelbox_api_key = \"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5faabd84-8ae0-4b5b-8e0e-1a807317b0a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "client = lbpd.Client(lb_api_key=labelbox_api_key)"
   ]
  },
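  {
   "cell_type": "markdown",
   "id": "env-api-key-note",
   "metadata": {},
   "source": [
    "Optionally, you can read the API key from an environment variable instead of hardcoding it in the notebook. This is just a sketch; the variable name `LABELBOX_API_KEY` is a conventional choice for this example, not something Labelpandas requires."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "env-api-key-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Sketch: read the key from an environment variable instead of hardcoding it.\n",
    "# LABELBOX_API_KEY is an arbitrary name chosen for this example.\n",
    "labelbox_api_key = os.environ.get(\"LABELBOX_API_KEY\", \"\")\n",
    "client = lbpd.Client(lb_api_key=labelbox_api_key)"
   ]
  },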
  {
   "cell_type": "markdown",
   "id": "b2255bcf-3df6-43c7-ab78-262df2798e64",
   "metadata": {},
   "source": [
    "# Load CSV\n",
    "\n",
    "To upload data rows from a CSV, your CSV **must** have the following:\n",
    "\n",
    "- A column containing your **row data** as a string value - either an asset URL (pointing to cloud storage) or a local file path\n",
    "\n",
    "- A column containing your **global key** as a string value - an externally facing ID that must be unique (Labelbox enforces uniqueness)\n",
    "  - If you attempt to upload a data row with an existing global key, Labelpandas will either append a suffix such as \"_1\" to make the key unique or skip the row entirely\n",
    "\n",
    "**To upload data rows with metadata, your CSV must have one column per metadata field name.** Labelpandas matches the column names to Labelbox metadata names when uploading metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "83fec8c5-7456-4dd3-a5e1-b0605db3ff35",
   "metadata": {},
   "outputs": [],
   "source": [
    "from io import StringIO\n",
    "import uuid\n",
    "\n",
    "demo_csv = f\"\"\"global_key,row_data,split\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000569539.jpg,train\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000288451.jpg,train\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000240902.jpg,train\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000428116.jpg,train\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000459566.jpg,train\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000442982.jpg,train\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000569538.jpg,valid\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000022415.jpg,valid\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000146981.jpg,test\n",
    "{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000173046.jpg,test\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "fb03e1e3-4682-4738-8e87-974c4acb9a8c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>global_key</th>\n",
       "      <th>row_data</th>\n",
       "      <th>split</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>99aad74a-0ce1-41d4-b172-97abbe4ae8b2</td>\n",
       "      <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
       "      <td>train</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>b47b1358-3d81-4384-920a-d4a08cbe7ffe</td>\n",
       "      <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
       "      <td>train</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2a75b633-2266-4ec0-8441-8e41374a04e2</td>\n",
       "      <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
       "      <td>train</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7c75170d-26c8-4a8b-8e74-3d1483d7719b</td>\n",
       "      <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
       "      <td>train</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>53d6ecd1-cc80-42ef-9e11-2be9b7cf879d</td>\n",
       "      <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
       "      <td>train</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " global_key \\\n",
       "0 99aad74a-0ce1-41d4-b172-97abbe4ae8b2 \n",
       "1 b47b1358-3d81-4384-920a-d4a08cbe7ffe \n",
       "2 2a75b633-2266-4ec0-8441-8e41374a04e2 \n",
       "3 7c75170d-26c8-4a8b-8e74-3d1483d7719b \n",
       "4 53d6ecd1-cc80-42ef-9e11-2be9b7cf879d \n",
       "\n",
       " row_data split \n",
       "0 https://storage.googleapis.com/diagnostics-dem... train \n",
       "1 https://storage.googleapis.com/diagnostics-dem... train \n",
       "2 https://storage.googleapis.com/diagnostics-dem... train \n",
       "3 https://storage.googleapis.com/diagnostics-dem... train \n",
       "4 https://storage.googleapis.com/diagnostics-dem... train "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can load CSVs into pandas with df = pd.read_csv(file_path_as_string);\n",
    "# here we read the demo CSV from an in-memory string instead\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(StringIO(demo_csv))\n",
    "df.head()"
   ]
  },
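  {
   "cell_type": "markdown",
   "id": "global-key-check-note",
   "metadata": {},
   "source": [
    "Optionally, since Labelbox enforces unique global keys, you can sanity-check the DataFrame before uploading. This is a sketch of one possible check, not a required step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "global-key-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional pre-upload checks (sketch): unique global keys, no missing row data\n",
    "assert df[\"global_key\"].is_unique, \"Duplicate global keys found\"\n",
    "assert df[\"row_data\"].notna().all(), \"Missing row_data values\"\n",
    "print(f\"{len(df)} rows ready to upload\")"
   ]
  },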
  {
   "cell_type": "markdown",
   "id": "8031341e-b42f-4838-9a8a-ad06271fdcdf",
   "metadata": {},
   "source": [
    "# Create a `metadata_index`\n",
    "\n",
    "* Your `metadata_index` is a dictionary where {key=`column_name` : value=`metadata_field_type`}\n",
    "  * `column_name` must correspond to a Labelbox metadata field name. Labelpandas uses these names to sync data.\n",
    "  * `metadata_field_type` must be one of the following string values: **\"datetime\", \"enum\", \"string\", \"number\"**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "167ceb94-d3aa-40bb-a310-1158fd6d2e71",
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata_index = {\n",
    "    \"split\" : \"enum\"\n",
    "}"
   ]
  },
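  {
   "cell_type": "markdown",
   "id": "metadata-index-types-note",
   "metadata": {},
   "source": [
    "For reference, a `metadata_index` covering all four supported field types might look like the sketch below. The column names other than `split` (`captured_at`, `camera_id`, `confidence`) are hypothetical and would need matching metadata fields in your Labelbox workspace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "metadata-index-types-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical example covering all four metadata field types;\n",
    "# each key must match a metadata field name in your Labelbox workspace.\n",
    "example_metadata_index = {\n",
    "    \"split\" : \"enum\",\n",
    "    \"captured_at\" : \"datetime\",\n",
    "    \"camera_id\" : \"string\",\n",
    "    \"confidence\" : \"number\"\n",
    "}"
   ]
  },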
  {
   "cell_type": "markdown",
   "id": "e7e2121e-a3db-4734-b034-77f365a1c20a",
   "metadata": {},
   "source": [
    "# Get or Create a Labelbox Dataset\n",
    "\n",
    "* Labelpandas creates data rows in existing datasets. If you don't have a dataset, create one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7cb9f610-d083-416d-920b-df8cf6240ca6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating a Labelbox dataset with name Labelpandas Demo Dataset and the default delegated access integration setting\n",
      "Created a new dataset with ID clchxm7q011xs073n6kxe3otq\n"
     ]
    }
   ],
   "source": [
    "dataset_name = \"Labelpandas Demo Dataset\" # Desired or existing dataset name\n",
    "integration_name = \"DEFAULT\" # Desired delegated access integration name (ignored if using an existing dataset)\n",
    "\n",
    "dataset = client.base_client.get_or_create_dataset(name=dataset_name, integration=integration_name, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6ad8272-1eb2-4ac3-ae44-072c40675646",
   "metadata": {},
   "source": [
    "# Upload Data Rows from CSV to Labelbox\n",
    "\n",
    "**`client.create_data_rows_from_table()`** has the following arguments:\n",
    "```\n",
    "df : Required (pandas.core.frame.DataFrame) - Pandas DataFrame\n",
    "lb_dataset : Required (labelbox.schema.dataset.Dataset) - Labelbox dataset to add data rows to\n",
    "row_data_col : Required (str) - Column containing the asset URL or file path\n",
    "global_key_col : Optional (str) - Column containing the data row global key - defaults to the row data column\n",
    "external_id_col : Optional (str) - Column containing the data row external ID - defaults to the global key column\n",
    "metadata_index : Optional (dict) - Dictionary where {key=column_name : value=metadata_type}\n",
    "local_files : Optional (bool) - If True, generates URLs for local files; if False, treats `row_data_col` values as URLs\n",
    "skip_duplicates : Optional (bool) - If True, skips duplicate global keys; otherwise generates a unique global key with a suffix\n",
    "verbose : Optional (bool) - If True, prints information about code execution\n",
    "```\n",
    "This function returns a list of errors, if any."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "849f87af-a809-44ca-b909-4bcb2ed4b74a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Valid metadata_index\n",
      "Creating upload list - 10 rows in Pandas DataFrame\n",
      "Generated upload list - 10 data rows to upload\n",
      "Beginning data row upload: uploading 10 data rows\n",
      "Batch #1: 10 data rows\n",
      "Success: upload batch number 1 complete\n",
      "Upload complete\n"
     ]
    }
   ],
   "source": [
    "upload_results = client.create_data_rows_from_table(\n",
    "    df=df,\n",
    "    lb_dataset=dataset,\n",
    "    row_data_col=\"row_data\",\n",
    "    global_key_col=\"global_key\",\n",
    "    external_id_col=None,\n",
    "    metadata_index=metadata_index,\n",
    "    local_files=False,\n",
    "    skip_duplicates=False,\n",
    "    verbose=True)"
   ]
  }
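  ,
  {
   "cell_type": "markdown",
   "id": "upload-results-note",
   "metadata": {},
   "source": [
    "Since `create_data_rows_from_table()` returns a list of errors (if any), a simple follow-up check might look like this sketch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "upload-results-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the error list returned by the upload above (sketch)\n",
    "if upload_results:\n",
    "    print(f\"{len(upload_results)} errors during upload:\")\n",
    "    for error in upload_results:\n",
    "        print(error)\n",
    "else:\n",
    "    print(\"Upload completed with no errors reported\")"
   ]
  }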
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
