
[SN-147] Created quick start guide geared to new users #1640


Merged 17 commits on May 31, 2024
Changes from 15 commits
5 changes: 5 additions & 0 deletions examples/README.md
@@ -46,6 +46,11 @@
<td><a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/data_row_metadata.ipynb" target="_blank"><img src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="Open In Github"></a></td>
<td><a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/data_row_metadata.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td>
</tr>
<tr>
<td>Quick Start</td>
<td><a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/quick_start.ipynb" target="_blank"><img src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="Open In Github"></a></td>
<td><a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/quick_start.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td>
</tr>
<tr>
<td>Basics</td>
<td><a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/basics.ipynb" target="_blank"><img src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="Open In Github"></a></td>
195 changes: 195 additions & 0 deletions examples/basics/quick_start.ipynb
@@ -0,0 +1,195 @@
{
@paulnoirel (Contributor), May 30, 2024:

Prefer shell commands (!) over magic commands (%).

Collaborator Author:

Magic installs it to the correct env. My linter actually goes nuts over that, lol.
@paulnoirel (Contributor), May 30, 2024:

Line #2.

Don't use the default integration if it is not relevant:

dataset = client.create_dataset(name="Quick Start Example Dataset", iam_integration=None)

@paulnoirel (Contributor), May 30, 2024:

Suggestion: use export_v2() instead, which is easier to grasp.

@Gabefire (Collaborator Author), May 30, 2024:

Engineering has repeatedly said the long-term plan is to remove export_v2 (no exact plans yet). Because of this, I want to get people used to export. Fewer people on export_v2 means fewer people causing chaos once they make the decision. It is also less confusing overall, since export looks like V1 and it is not, lol. Moving the docs in that direction now will mean less work in the future.

@paulnoirel (Contributor), May 31, 2024:

Minor one: "our SDK" -> "the Labelbox SDK"

@paulnoirel (Contributor), May 31, 2024:

"install the labelbox library" -> "install the labelbox library"

@paulnoirel (Contributor), May 31, 2024:

Same as earlier.

@paulnoirel (Contributor), May 31, 2024:

I think you can remove the comment "specify the media type" since it doesn't add much.

"nbformat": 4,
"nbformat_minor": 2,
"metadata": {},
"cells": [
{
"metadata": {},
"source": [
"<td>",
" <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>",
"</td>\n"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"<td>\n",
"<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/quick_start.ipynb\" target=\"_blank\"><img\n",
"src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"</td>\n",
"\n",
"<td>\n",
"<a href=\"https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/quick_start.ipynb\" target=\"_blank\"><img\n",
"src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
"</td>"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"# Quick Start\n",
"\n",
"This notebook is a quick overview of the Labelbox Python SDK that demonstrates a simple but common workflow.\n",
"\n",
"In this guide, we will be:\n",
"\n",
"1. Creating a dataset and importing an image data row\n",
"2. Creating an ontology\n",
"3. Creating a project and attaching our ontology\n",
"4. Sending our data row to our project by creating a batch\n",
"5. Exporting our image data row from our project\n",
"\n",
"This notebook is geared towards new users of the Labelbox SDK."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"## Setup\n",
"\n",
"We first need to install the labelbox library and then import the SDK module. It is recommended to install `\"labelbox[data]\"` over `labelbox` to obtain all the correct dependencies. We will also import the Python `uuid` library to generate universally unique IDs for the objects created in this notebook."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "%pip install -q \"labelbox[data]\"",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": "import labelbox as lb\nimport uuid",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## API Key and Client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Step 1: Create Dataset and Import Data Row\n",
"\n",
"Below, we will create a dataset and then attach a publicly hosted image data row. Typically, you would either import data rows hosted on a cloud provider (_recommended_) or import them locally. For more information, visit our [import image data section](https://docs.labelbox.com/reference/image) in our developer guides.\n",
"\n",
"- A data row is the internal representation of an asset in Labelbox. It contains the asset to be labeled and all of the relevant information about that asset.\n",
"- A dataset is a collection of data rows imported into Labelbox. They live inside the [_Catalog_](https://docs.labelbox.com/docs/catalog-overview) section of Labelbox."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Create dataset from client\ndataset = client.create_dataset(name=\"Quick Start Example Dataset\")\n\nglobal_key = str(uuid.uuid4())  # Unique user specified ID\n\n# Data row structure\nimage_data_rows = [{\n    \"row_data\":\n        \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n    \"global_key\":\n        global_key,\n    \"media_type\":\n        \"IMAGE\",\n}]\n\n# Bulk import data row\ntask = dataset.create_data_rows(image_data_rows)  # List of data rows\ntask.wait_till_done()\nprint(task.errors)  # Print any errors",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Step 2: Creating an Ontology\n",
"\n",
"Before we send our data row to a labeling project, we must first create an ontology. In the example below, we create a simple ontology with a bounding box tool and a checklist classification feature. For more information, visit the [ontology section](https://docs.labelbox.com/reference/ontology) inside our developer guides.\n",
"\n",
"* An ontology is a collection of annotations and their relationships (also known as a taxonomy). Ontologies can be reused across different projects. They are essential for data labeling, model training, and evaluation. Created ontologies and their associated features are located in the _Schema_ section of Labelbox."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Bounding box feature\nobject_features = [\n    lb.Tool(\n        tool=lb.Tool.Type.BBOX,\n        name=\"regulatory-sign\",\n        color=\"#ff0000\",\n    )\n]\n\n# Checklist feature\nclassification_features = [\n    lb.Classification(\n        class_type=lb.Classification.Type.CHECKLIST,\n        name=\"Quality Issues\",\n        options=[\n            lb.Option(value=\"blurry\", label=\"Blurry\"),\n            lb.Option(value=\"distorted\", label=\"Distorted\"),\n        ],\n    )\n]\n\n# Builder function\nontology_builder = lb.OntologyBuilder(tools=object_features,\n                                      classifications=classification_features)\n\n# Create ontology\nontology = client.create_ontology(\n    \"Ontology from new features\",\n    ontology_builder.asdict(),\n    media_type=lb.MediaType.Image,\n)",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Step 3: Creating a Project and Attaching our Ontology\n",
"\n",
"Now that we have made our ontology, we are ready to create a project where we can label our data row.\n",
"\n",
"* Projects are labeling environments in Labelbox, similar to a factory assembly line for producing annotations. A project can start from raw data, pre-existing ground truth, or pre-labeled data."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Create a new project\nproject = client.create_project(\n    name=\"Quick Start Example Project\",\n    media_type=lb.MediaType.Image,\n)\n\n# Attach created ontology\nproject.setup_editor(ontology)",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Step 4: Sending our Data Row to our Project by Creating a Batch\n",
"\n",
"With our project created, we can send our data rows by creating a batch. Our data rows will start in the initial labeling queue, where labelers are able to annotate our data row.\n",
"\n",
"* A batch is a curated selection of data rows you can send to a project for labeling. You can create a batch with a combination of data rows within any dataset. For more information on creating batches, review the [batches section](https://docs.labelbox.com/reference/batch#create-a-batch) of our developer guides."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "project.create_batch(\n    name=\"Quick Start Example Batch\" + str(uuid.uuid4()),\n    global_keys=[\n        global_key\n    ],  # Global key we assigned to our data row in Step 1\n    priority=5,\n)",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Step 5: Exporting from our Project\n",
"\n",
"We have now successfully set up a project for labeling using only the SDK! \ud83d\ude80 \n",
"\n",
"From here, you can either label our data row directly inside the [labeling queue](https://docs.labelbox.com/docs/labeling-queue) or [import annotations](https://docs.labelbox.com/reference/import-image-annotations) through our SDK. Below, we demonstrate the final step of this guide by exporting from our project. Since we did not label any data rows or import annotations in this guide, the exported data row will not contain any labels. For a full overview of exporting, visit our [export overview](https://docs.labelbox.com/reference/label-export) developer guide."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Start export from project\nexport_task = project.export()\nexport_task.wait_till_done()\n\n# Print any errors reported by the export task\nif export_task.has_errors():\n    export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n        stream_handler=lambda error: print(error))\n\n# Start export stream\nstream = export_task.get_buffered_stream()\n\n# Iterate through data rows\nfor data_row in stream:\n    print(data_row.json)",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Clean Up\n",
"\n",
"This section serves as an optional clean-up step to delete the Labelbox assets created within this guide. You will need to uncomment the delete methods shown."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# project.delete()\n# client.delete_unused_ontology(ontology.uid)\n# dataset.delete()",
"cell_type": "code",
"outputs": [],
"execution_count": null
}
]
}
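
For readers skimming this diff, the payload handling from Steps 1 and 4 of the notebook can be sketched without a Labelbox client or API key. This is a minimal illustration, not part of the PR: `make_image_data_row` and `unique_batch_name` are hypothetical helpers introduced here, and the idea that the UUID suffix exists to keep batch names from colliding across repeated runs is an assumption rather than something the notebook states.

```python
import uuid


def make_image_data_row(url: str) -> dict:
    """Build one data row payload in the shape used in Step 1 (hypothetical helper)."""
    return {
        "row_data": url,
        "global_key": str(uuid.uuid4()),  # unique, user-specified ID
        "media_type": "IMAGE",
    }


def unique_batch_name(prefix: str) -> str:
    """Append a UUID to a batch name (hypothetical helper; assumed to avoid name collisions)."""
    return f"{prefix} {uuid.uuid4()}"


urls = [
    "https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg",
]

# One payload per asset, each with its own global key
image_data_rows = [make_image_data_row(url) for url in urls]

# The same global keys are what Step 4 passes when creating the batch
global_keys = [row["global_key"] for row in image_data_rows]
batch_name = unique_batch_name("Quick Start Example Batch")

print(image_data_rows)
print(batch_name)
```

In the notebook itself, these values would be passed to `dataset.create_data_rows(image_data_rows)` and `project.create_batch(name=batch_name, global_keys=global_keys, priority=5)`. Keeping the global keys in a list makes it easy to send several data rows to a project in one batch.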
2 changes: 1 addition & 1 deletion examples/pyproject.toml
@@ -23,7 +23,7 @@ dev-dependencies = [
"black[jupyter]>=24.4.2",
"databooks>=1.3.10",
# higher versions dont support python 3.8
-"pandas>=1.5.3",
+"pandas>=2.0.3",
]

[tool.rye.scripts]