create_data_rows_example.ipynb

{
"cells": [
{
"cell_type": "markdown",
"id": "f0948f3e-77f5-471f-8062-eb00328636bd",
"metadata": {},
"source": [
"<td>\n",
" <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=190/></a>\n",
"</td>"
]
},
{
"cell_type": "markdown",
"id": "e34e7399-7bdc-49b9-9173-c15de66d8ddb",
"metadata": {},
"source": [
"# Labelpandas - The Labelbox <> Pandas Connector\n",
"***Instantly Load CSVs (and other Tables) into Labelbox***\n",
"_______________________________________\n",
"This notebook walks through basic usage of the Labelpandas Python SDK.\n",
"\n",
"**Pandas** is a Python library for loading and manipulating CSVs and other tabular data efficiently. It is one of the most widely used Python libraries in the world.\n",
"\n",
"**Labelpandas** combines Labelbox and Pandas to make uploading CSVs and tabular data to Labelbox straightforward. It can handle both local file assets and cloud-hosted assets."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5c4bb8a6-44d4-4076-9728-f85bad8aeaba",
"metadata": {},
"outputs": [],
"source": [
"!pip install labelpandas --upgrade -q"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7ef12ac7-5c85-497a-987c-1e41af2dd715",
"metadata": {},
"outputs": [],
"source": [
"import labelpandas as lbpd"
]
},
{
"cell_type": "markdown",
"id": "1dfe1eae-e28f-4928-9058-796d097f38cc",
"metadata": {},
"source": [
"# Set up Labelpandas Client"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "de523a2e-484a-4e16-b976-f2c4edd7efe3",
"metadata": {},
"outputs": [],
"source": [
"labelbox_api_key = \"\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5faabd84-8ae0-4b5b-8e0e-1a807317b0a5",
"metadata": {},
"outputs": [],
"source": [
"client = lbpd.Client(lb_api_key=labelbox_api_key)"
]
},
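{
"cell_type": "markdown",
"id": "api-key-env-var-note",
"metadata": {},
"source": [
"*Optional:* rather than pasting the key directly into the notebook, you can read it from an environment variable. The sketch below assumes a variable named `LABELBOX_API_KEY`; the name is only an example, not something Labelpandas requires."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "api-key-env-var-example",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: read the Labelbox API key from an environment variable instead of\n",
"# hardcoding it. \"LABELBOX_API_KEY\" is an example variable name.\n",
"import os\n",
"\n",
"labelbox_api_key = os.environ.get(\"LABELBOX_API_KEY\", \"\")\n",
"client = lbpd.Client(lb_api_key=labelbox_api_key)"
]
},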
{
"cell_type": "markdown",
"id": "b2255bcf-3df6-43c7-ab78-262df2798e64",
"metadata": {},
"source": [
"# Load CSV\n",
"\n",
"To upload data rows from a CSV, your CSV **must** have the following:\n",
"\n",
"- Column consisting of your **row data** as a string value - either your asset URL (pointing to cloud storage) or a local file path\n",
" \n",
"- Column consisting of your **global key** as a string value - this is an externally facing ID that must be unique (Labelbox enforces uniqueness)\n",
"    - If you attempt to upload a data row with a global key that already exists, the row will either be skipped or given a unique global key with a suffix such as \"_1\", depending on the `skip_duplicates` setting\n",
" \n",
"**To upload data rows with metadata, your CSV must have one column per metadata field name**. Labelpandas matches the column names to Labelbox metadata names when uploading metadata."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "83fec8c5-7456-4dd3-a5e1-b0605db3ff35",
"metadata": {},
"outputs": [],
"source": [
"from io import StringIO\n",
"import uuid\n",
"\n",
"demo_csv = f\"\"\"global_key,row_data,split\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000569539.jpg,train\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000288451.jpg,train\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000240902.jpg,train\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000428116.jpg,train\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000459566.jpg,train\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000442982.jpg,train\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000569538.jpg,valid\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000022415.jpg,valid\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_val2014_000000146981.jpg,test\n",
"{str(uuid.uuid4())},https://storage.googleapis.com/diagnostics-demo-data/coco/COCO_train2014_000000173046.jpg,test\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "fb03e1e3-4682-4738-8e87-974c4acb9a8c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>global_key</th>\n",
" <th>row_data</th>\n",
" <th>split</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>99aad74a-0ce1-41d4-b172-97abbe4ae8b2</td>\n",
" <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
" <td>train</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>b47b1358-3d81-4384-920a-d4a08cbe7ffe</td>\n",
" <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
" <td>train</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2a75b633-2266-4ec0-8441-8e41374a04e2</td>\n",
" <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
" <td>train</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>7c75170d-26c8-4a8b-8e74-3d1483d7719b</td>\n",
" <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
" <td>train</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>53d6ecd1-cc80-42ef-9e11-2be9b7cf879d</td>\n",
" <td>https://storage.googleapis.com/diagnostics-dem...</td>\n",
" <td>train</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" global_key \\\n",
"0 99aad74a-0ce1-41d4-b172-97abbe4ae8b2 \n",
"1 b47b1358-3d81-4384-920a-d4a08cbe7ffe \n",
"2 2a75b633-2266-4ec0-8441-8e41374a04e2 \n",
"3 7c75170d-26c8-4a8b-8e74-3d1483d7719b \n",
"4 53d6ecd1-cc80-42ef-9e11-2be9b7cf879d \n",
"\n",
" row_data split \n",
"0 https://storage.googleapis.com/diagnostics-dem... train \n",
"1 https://storage.googleapis.com/diagnostics-dem... train \n",
"2 https://storage.googleapis.com/diagnostics-dem... train \n",
"3 https://storage.googleapis.com/diagnostics-dem... train \n",
"4 https://storage.googleapis.com/diagnostics-dem... train "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## You can load CSVs into pandas with df = pd.read_csv(file_path_as_string)\n",
"import pandas as pd\n",
"\n",
"df = pd.read_csv(StringIO(demo_csv))\n",
"df.head()"
]
},
212+
{
213+
"cell_type": "markdown",
214+
"id": "8031341e-b42f-4838-9a8a-ad06271fdcdf",
215+
"metadata": {},
216+
"source": [
217+
"# Create a `metadata_index`\n",
218+
"\n",
219+
"* Your metadata_index is a dictionary where {key=`column_name` : value=`metadata_field_type`}\n",
220+
" * `column_name` must correspond to Labelbox metadata field names. Labelpandas uses these names to sync data.\n",
221+
" * `metadata_field_type` must be one of the following string values: **\"datetime\" \"enum\" \"string\" \"number\"**"
222+
]
223+
},
224+
{
225+
"cell_type": "code",
226+
"execution_count": 7,
227+
"id": "167ceb94-d3aa-40bb-a310-1158fd6d2e71",
228+
"metadata": {},
229+
"outputs": [],
230+
"source": [
231+
"metadata_index={ \n",
232+
" \"split\" : \"enum\"\n",
233+
"}"
234+
]
235+
},
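{
"cell_type": "markdown",
"id": "metadata-index-types-note",
"metadata": {},
"source": [
"For reference, a `metadata_index` can mix any of the four supported field types. The sketch below uses hypothetical column names (`captured_at`, `camera_id`, `frame_number`); they are examples only and would need to exist both as columns in your DataFrame and as metadata fields in Labelbox."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "metadata-index-types-example",
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: a metadata_index covering all four supported field types.\n",
"# The column names here are hypothetical -- replace them with columns that\n",
"# exist in your CSV and match metadata field names configured in Labelbox.\n",
"example_metadata_index = {\n",
"    \"split\" : \"enum\",\n",
"    \"captured_at\" : \"datetime\",\n",
"    \"camera_id\" : \"string\",\n",
"    \"frame_number\" : \"number\"\n",
"}"
]
},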
{
"cell_type": "markdown",
"id": "e7e2121e-a3db-4734-b034-77f365a1c20a",
"metadata": {},
"source": [
"# Get or Create a Labelbox Dataset\n",
"\n",
"* Labelpandas will create data rows for you in existing datasets. If you don't have a dataset, create one."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7cb9f610-d083-416d-920b-df8cf6240ca6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating a Labelbox dataset with name Labelpandas Demo Dataset and the default delegated access integration setting\n",
"Created a new dataset with ID clchxm7q011xs073n6kxe3otq\n"
]
}
],
"source": [
"dataset_name = \"Labelpandas Demo Dataset\" # Desired or existing dataset name\n",
"integration_name = \"DEFAULT\" # Desired delegated access integration name (ignore if using an existing dataset)\n",
"\n",
"dataset = client.base_client.get_or_create_dataset(name=dataset_name, integration=integration_name, verbose=True)"
]
},
{
"cell_type": "markdown",
"id": "c6ad8272-1eb2-4ac3-ae44-072c40675646",
"metadata": {},
"source": [
"# Upload Data Rows from CSV to Labelbox\n",
"\n",
"**`client.create_data_rows_from_table()`** has the following arguments:\n",
"```\n",
"df              : Required (pandas.core.frame.DataFrame) - Pandas DataFrame\n",
"lb_dataset      : Required (labelbox.schema.dataset.Dataset) - Labelbox dataset to add data rows to\n",
"row_data_col    : Required (str) - Column containing the asset URL or file path\n",
"global_key_col  : Optional (str) - Column containing the data row global key - defaults to the row data\n",
"external_id_col : Optional (str) - Column containing the data row external ID - defaults to the global key\n",
"metadata_index  : Optional (dict) - Dictionary where {key=column_name : value=metadata_type}\n",
"local_files     : Optional (bool) - If True, creates URLs for local files; if False, uploads `row_data_col` values as URLs\n",
"skip_duplicates : Optional (bool) - If True, skips duplicate global keys; if False, generates a unique global key with a suffix\n",
"verbose         : Optional (bool) - If True, prints information about code execution\n",
"```\n",
"This function returns a list of errors, if any occurred."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "849f87af-a809-44ca-b909-4bcb2ed4b74a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Valid metadata_index\n",
"Creating upload list - 10 rows in Pandas DataFrame\n",
"Generated upload list - 10 data rows to upload\n",
"Beginning data row upload: uploading 10 data rows\n",
"Batch #1: 10 data rows\n",
"Success: upload batch number 1 complete\n",
"Upload complete\n"
]
}
],
"source": [
"upload_results = client.create_data_rows_from_table(\n",
"    df=df,\n",
"    lb_dataset=dataset,\n",
"    row_data_col=\"row_data\",\n",
"    global_key_col=\"global_key\",\n",
"    external_id_col=None,\n",
"    metadata_index=metadata_index,\n",
"    local_files=False,\n",
"    skip_duplicates=False,\n",
"    verbose=True)"
]
},
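{
"cell_type": "markdown",
"id": "upload-results-check-note",
"metadata": {},
"source": [
"As noted above, `create_data_rows_from_table()` returns a list of errors. A minimal sketch for inspecting that return value is shown below; it only assumes the result is list-like, as described in the argument documentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "upload-results-check-example",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: inspect the errors returned by the upload.\n",
"# An empty (or falsy) result means no errors were reported.\n",
"if upload_results:\n",
"    print(f\"{len(upload_results)} error(s) returned:\")\n",
"    for error in upload_results:\n",
"        print(error)\n",
"else:\n",
"    print(\"No errors returned\")"
]
}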
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
