Date: Wed, 5 Jun 2024 17:34:57 +0000
Subject: [PATCH 05/19] :memo: README updated
---
examples/README.md | 142 ++++++++++++++++++++++++---------------------
1 file changed, 76 insertions(+), 66 deletions(-)
diff --git a/examples/README.md b/examples/README.md
index fdf1907e7..c60c98904 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -17,29 +17,29 @@
- Ontologies |
-  |
-  |
-
-
- Data Rows |
-  |
-  |
+ Basics |
+  |
+  |
Batches |
 |
 |
+
+ Custom Embeddings |
+  |
+  |
+
Projects |
 |
 |
- Custom Embeddings |
-  |
-  |
+ User Management |
+  |
+  |
Data Row Metadata |
@@ -52,14 +52,14 @@
 |
- Basics |
-  |
-  |
+ Ontologies |
+  |
+  |
- User Management |
-  |
-  |
+ Data Rows |
+  |
+  |
@@ -75,6 +75,11 @@
+
+ Exporting to CSV |
+  |
+  |
+
Composite Mask Export |
 |
@@ -114,6 +119,11 @@
 |
 |
+
+ Live Multimodal Chat Project |
+  |
+  |
+
Queue Management |
 |
@@ -134,9 +144,9 @@
- Conversational LLM Data Generation |
-  |
-  |
+ Audio |
+  |
+  |
Video |
@@ -149,9 +159,9 @@
 |
- Audio |
-  |
-  |
+ Tiled |
+  |
+  |
Conversational |
@@ -164,9 +174,9 @@
 |
- Image |
-  |
-  |
+ Conversational LLM Data Generation |
+  |
+  |
DICOM |
@@ -174,9 +184,9 @@
 |
- Conversational LLM |
-  |
-  |
+ Image |
+  |
+  |
HTML |
@@ -184,9 +194,9 @@
 |
- Tiled |
-  |
-  |
+ Conversational LLM |
+  |
+  |
@@ -203,14 +213,9 @@
- Huggingface Custom Embeddings |
-  |
-  |
-
-
- Langchain |
-  |
-  |
+ Meta SAM |
+  |
+  |
Meta SAM Video |
@@ -218,9 +223,14 @@
 |
- Meta SAM |
-  |
-  |
+ Langchain |
+  |
+  |
+
+
+ Huggingface Custom Embeddings |
+  |
+  |
@@ -236,11 +246,6 @@
-
- Custom Metrics Demo |
-  |
-  |
-
Model Slices |
 |
@@ -251,6 +256,11 @@
 |
 |
+
+ Custom Metrics Demo |
+  |
+  |
+
Model Predictions to Project |
 |
@@ -270,46 +280,46 @@
-
- PDF Predictions |
-  |
-  |
-
-
- HTML Predictions |
-  |
-  |
-
Conversational Predictions |
 |
 |
-
- Image Predictions |
-  |
-  |
-
Text Predictions |
 |
 |
- Geospatial Predictions |
-  |
-  |
+ HTML Predictions |
+  |
+  |
Conversational LLM Predictions |
 |
 |
+
+ Geospatial Predictions |
+  |
+  |
+
+
+ PDF Predictions |
+  |
+  |
+
Video Predictions |
 |
 |
+
+ Image Predictions |
+  |
+  |
+
From 8ece1ef5d80e3aae67cad2d6c779142a1944d34a Mon Sep 17 00:00:00 2001
From: Gabe <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 12:35:34 -0500
Subject: [PATCH 06/19] Update exporting_to_CSV.ipynb
---
examples/exports/exporting_to_CSV.ipynb | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/examples/exports/exporting_to_CSV.ipynb b/examples/exports/exporting_to_CSV.ipynb
index 4ddc2c8aa..1150be9fa 100644
--- a/examples/exports/exporting_to_CSV.ipynb
+++ b/examples/exports/exporting_to_CSV.ipynb
@@ -78,7 +78,7 @@
},
{
"metadata": {},
- "source": "API_KEY = \"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJjbHAxZ2w2OGIwMDBkMDc3eGRrcnI5azhmIiwib3JnYW5pemF0aW9uSWQiOiJjbDVibjhxdnExYXY5MDd4dGIzYnA4cTYwIiwiYXBpS2V5SWQiOiJjbHgyMDk4bmYwM3Q1MDd6MjBueXI5N2x2Iiwic2VjcmV0IjoiYzhhYzlkZDJmZTgyZTcwNmM4MzQ4MzZhNDIwMWVjZjEiLCJpYXQiOjE3MTc2MDI2MjYsImV4cCI6MjM0ODc1NDYyNn0.JxwtcWEOjS_sush7FhH2m7KZylPWUOYPcxO3z8Hg058\"\nclient = lb.Client(api_key=API_KEY)",
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
"outputs": [],
"execution_count": null
@@ -363,4 +363,4 @@
"execution_count": null
}
]
-}
\ No newline at end of file
+}
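Note: with the hardcoded key replaced by `API_KEY = None`, users must supply their own key when running the notebook. A minimal sketch of a safer pattern, assuming the key is exported in a `LABELBOX_API_KEY` environment variable (an assumption for illustration; nothing in these patches sets this up):

    import os
    import labelbox as lb

    # Hypothetical: read the API key from the environment rather than
    # hardcoding it in the notebook. LABELBOX_API_KEY is an assumed name.
    API_KEY = os.environ.get("LABELBOX_API_KEY")
    client = lb.Client(api_key=API_KEY)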
From 0603a0aea131f7b82125c844eb753ac6d4c7bad2 Mon Sep 17 00:00:00 2001
From: Gabe <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 12:44:04 -0500
Subject: [PATCH 07/19] Typos
---
examples/exports/exporting_to_CSV.ipynb | 46 ++++++++++++-------------
1 file changed, 23 insertions(+), 23 deletions(-)
diff --git a/examples/exports/exporting_to_CSV.ipynb b/examples/exports/exporting_to_CSV.ipynb
index 1150be9fa..8d19ca8ca 100644
--- a/examples/exports/exporting_to_CSV.ipynb
+++ b/examples/exports/exporting_to_CSV.ipynb
@@ -41,9 +41,9 @@
"source": [
"## Advance approach\n",
"\n",
- "For a more abstracted approach please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps that are about to be shown. In addition, this library support importing CSV data. \n",
+ "For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
- "We strongly encourage collaboration - please free to fork this repo and tweak the code base to work for you own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
+ "We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
],
"cell_type": "markdown"
},
@@ -88,7 +88,7 @@
"source": [
"## Create or select example project\n",
"\n",
- "The below steps will setup a project that can be used for this demo. Please feel free to comment out the below code block and uncomment the code block to receive your own project directly. For more information on this set up visit our quick start guide."
+ "The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our quick start guide."
],
"cell_type": "markdown"
},
@@ -136,7 +136,7 @@
"]\n",
"```\n",
"\n",
- "Essentially, we need to get our JSON data towards a list of Python dictionaries with each Python dictionary representing one row, each key representing a column and each value being an individual cell of our CSV table. Once we have our data to this format it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox export JSON towards this format."
+ "Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
],
"cell_type": "markdown"
},
@@ -145,9 +145,9 @@
"source": [
"## Labelbox JSON format\n",
"\n",
- "Labelbox JSON format is centralized at the individual data row of your export. This format allows expandability when things evolve and provides a centralized view of fields such as metadata or data row details. The main labels are located inside the projects key and can be nested which can make it difficult to parse. For complete samples of our project export format visit our export quick reference page. \n",
+ "Labelbox JSON format is centralized at the individual data row of your export. This format allows expandability when things evolve and provides a centralized view of fields such as metadata or data row details. The main labels are located inside the project key and can be nested, which can make it difficult to parse. For complete samples of our project export format visit our export quick reference page. \n",
"\n",
- "To get Labelbox export JSON format to our CSV format we established, we must do the following:\n",
+ "To get Labelbox export JSON format to our CSV format, we established, we must do the following:\n",
"\n",
"1. Establish our base data row columns (project_id, data_row_id, global_key etc)\n",
"2. Create our columns for label fields (label detail and annotations we care about)\n",
@@ -163,7 +163,7 @@
"source": [
"## Step 1: Establish our base columns\n",
"\n",
- "We first establish our base columns that represent individual data row details. Typically, this columns information can be received from within one or two levels of a Labelbox export per data row. \n",
+ "We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
],
@@ -181,7 +181,7 @@
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
- "In this step, we define our label details base columns we want to include in our CSV. In this case, we will use the following:"
+ "In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
],
"cell_type": "markdown"
},
@@ -195,7 +195,7 @@
{
"metadata": {},
"source": [
- "We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order of what our columns will be presented. You can approach getting the annotations in a list in a number of ways including hard defining the columns. We will be doing a mapping between `feature_schema_Id` and the our column name. The reason introduce this mapping is the annotation name can be the same in certain situations but `feature_schema_ids` are completely unique. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the name of the features. In the next step of this guide we will provide more information on recursion in context of parsing through JSON or Python dictionaries."
+ "We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_Id` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
],
"cell_type": "markdown"
},
@@ -218,7 +218,7 @@
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
- "Now that we have our columns defined we need to come up with a strategy of navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns it is always best to first check if a key exists in your data row before populating a column this is especially import for optional fields. In this demo, we will populate the value `None` for anything not present which will result in a blank cell our our CSV.\n"
+ "Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
],
"cell_type": "markdown"
},
@@ -226,7 +226,7 @@
"metadata": {},
"source": [
"### Data row detail base columns\n",
- "The data row details can be access within a depth of one or two keys. Below is a function we will use the access the columns we defined. The parameters are the data row itself, dictionary row that will be used to make our list and our base columns list."
+ "The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
],
"cell_type": "markdown"
},
@@ -241,7 +241,7 @@
"metadata": {},
"source": [
"### Label detail base columns\n",
- "The label details are similar to data row details but they exist at a label level of our export. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying and the label detail column list we created."
+ "The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
],
"cell_type": "markdown"
},
@@ -256,21 +256,21 @@
"metadata": {},
"source": [
"### Label annotation columns\n",
- "The label annotations are the final columns we will need to obtain. The approach of obtaining these fields are more challenging then the approach we made for our detail columns. If we attempt to obtain the fields with conditional statements and hard defined paths we will run into issues as each label can have annotations in different orders, annotations can be at different depths or not present at all. This will quickly create a mess especially when we want our methods to work for more then one ontology. The best and cleanest way of obtaining these annotations inside our export data is through recursive function.\n",
+ "The label annotations are the final columns we will need to obtain. The approach to obtaining these fields is more challenging than the approach we made for our detail columns. Suppose we attempt to obtain the fields with conditional statements and hard-defined paths. In that case, we will run into issues as each label can have annotations in different orders, at different depths, or not present at all. This will quickly create a mess, especially when we want our methods to work for more than one ontology. The best and cleanest way of obtaining these annotations inside our export data is through a recursive function.\n",
"\n",
"#### Recursion\n",
- "A recursive function can be defined as a routine that calls itself directly or indirectly. They solve a problems by solving smaller instances of the same problem. This technique is commonly used in programming to solve problems that can be broken down into simpler, similar subproblem. Our sub problem in this case is obtaining the each individual annotations. A recursive function is divided into two components:\n",
+ "A recursive function can be defined as a routine that calls itself directly or indirectly. They solve problems by solving smaller instances of the same problem. This technique is commonly used in programming to solve problems that can be broken down into simpler, similar subproblems. Our sub-problem, in this case, is obtaining each individual annotation. A recursive function is divided into two components:\n",
"\n",
"- **Base case:** This is a termination condition that prevents the function from calling itself indefinitely.\n",
"\n",
- "- **Recursive case:** In the recursive case, the function calls itself with the modified arguments. The recursive case should move closer to the base case with each iteration.\n",
+ "- **Recursive case:** The function calls itself with the modified arguments in the recursive case. The recursive case should move closer to the base case with each iteration.\n",
"\n",
- "For our example, our base case will be either the annotation exists on the label (return the value/answer) or it does not (return `None`). Our recursive case would be finding more classifications to parse.\n",
+ "For our example, our base case will be either the annotation exists on the label (return the value/answer), or it does not (return `None`). Our recursive case would be finding more classifications to parse.\n",
"\n",
- "In the below code block, I will highlight a few important details inside our function. Essentially, we will be navigating through our JSON file by moving one classification key at a time until we find our annotation or if everything has been searched returning `None` which will populate a blank cell on our CSV table. \n",
+ "In the below code block, I will highlight a few important details inside our function. Essentially, we will be navigating through our JSON file by moving one classification key at a time until we find our annotation or, if everything has been searched, returning `None`, which will populate a blank cell on our CSV table. \n",
"\n",
"#### Tools\n",
- "Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tool are at the base level of a label and each tool has a different value key name we will only be searching for bounding boxes for this tutorial. If you wanted to include other tools reference our export guide for your data type and find the appropriate key to add on."
+ "Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
],
"cell_type": "markdown"
},
@@ -285,7 +285,7 @@
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
- "Before we can start exporting we need to set up our main data row handler. This function will be feed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
+ "Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
],
"cell_type": "markdown"
},
@@ -300,7 +300,7 @@
"metadata": {},
"source": [
"## Step 5: Export our data\n",
- "Now that we have defined functions and strategies, we are ready to export. Below we are exporting directly from our project and feeding in our main function we created above."
+ "Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
],
"cell_type": "markdown"
},
@@ -314,7 +314,7 @@
{
"metadata": {},
"source": [
- "If everything went through correctly you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
+ "If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
],
"cell_type": "markdown"
},
@@ -328,9 +328,9 @@
{
"metadata": {},
"source": [
- "## Step 6: Convert to desired format\n",
+ "## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!\ud83d\ude80 Now that your have your export in a flatten format you can now easily convert to a CSV or a Pandas DataFrame!"
+ "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
],
"cell_type": "markdown"
},
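Note: the recursion strategy in this patch (base case: the feature is found or the list is exhausted; recursive case: descend into nested classifications) can be illustrated with a minimal, self-contained sketch. The data shape below is simplified and hypothetical, not the exact Labelbox export schema:

    # Minimal sketch of the recursive search described in the notebook text.
    # The annotation shape here is a simplified assumption.
    def find_answer(feature_id: str, annotations: list) -> str | None:
        for ann in annotations:
            if ann.get("feature_schema_id") == feature_id:  # Base case: found
                return ann.get("value")
            nested = ann.get("classifications", [])
            if nested:  # Recursive case: descend into nested classifications
                value = find_answer(feature_id, nested)
                if value is not None:
                    return value
        return None  # Base case: everything searched, nothing found

    annotations = [{
        "feature_schema_id": "a1",
        "value": "top_level_answer",
        "classifications": [{"feature_schema_id": "a2", "value": "nested_answer"}],
    }]
    print(find_answer("a2", annotations))  # -> "nested_answer"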
From 1f217f1fdfb1bb5212788f88f47e62c12bfafa52 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 12:53:03 -0500
Subject: [PATCH 08/19] added links
---
examples/exports/exporting_to_CSV.ipynb | 628 ++++++++++++++++++++----
1 file changed, 520 insertions(+), 108 deletions(-)
diff --git a/examples/exports/exporting_to_CSV.ipynb b/examples/exports/exporting_to_CSV.ipynb
index 8d19ca8ca..4be20602f 100644
--- a/examples/exports/exporting_to_CSV.ipynb
+++ b/examples/exports/exporting_to_CSV.ipynb
@@ -1,18 +1,16 @@
{
- "nbformat": 4,
- "nbformat_minor": 2,
- "metadata": {},
"cells": [
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "",
- " ",
+ " | \n",
+ " \n",
" | \n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -24,19 +22,19 @@
" \n",
" | "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
"This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://github.com/Labelbox/labelpandas) friendly format. "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -44,83 +42,267 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "%pip install -q --upgrade \"Labelbox[data]\"\n",
+ "%pip install -q pandas"
+ ]
},
{
- "metadata": {},
- "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "import labelbox as lb\n",
+ "import labelbox.types as lb_types\n",
+ "import uuid\n",
+ "from pprint import pprint\n",
+ "import csv\n",
+ "import pandas as pd"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "API_KEY = None\n",
+ "client = lb.Client(api_key=API_KEY)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
- "The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our quick start guide."
- ],
- "cell_type": "markdown"
+ "The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row towards our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # paginated collection of data row objects, list of data row ids or global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and imported it towards our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n 
end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Create dataset with image data row\n",
+ "global_key = str(uuid.uuid4())\n",
+ "\n",
+ "test_img_url = {\n",
+ " \"row_data\":\n",
+ " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
+ " \"global_key\":\n",
+ " global_key,\n",
+ "}\n",
+ "\n",
+ "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
+ "task = dataset.create_data_rows([test_img_url])\n",
+ "task.wait_till_done()\n",
+ "print(\"Errors:\", task.errors)\n",
+ "print(\"Failed data rows:\", task.failed_data_rows)\n",
+ "\n",
+ "# Create ontology\n",
+ "ontology_builder = lb.OntologyBuilder(\n",
+ " classifications=[ # List of Classification objects\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"radio_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_radio_answer\"),\n",
+ " lb.Option(value=\"second_radio_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.CHECKLIST,\n",
+ " name=\"checklist_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_checklist_answer\"),\n",
+ " lb.Option(value=\"second_checklist_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
+ " name=\"free_text\"),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"nested_radio_question\",\n",
+ " options=[\n",
+ " lb.Option(\n",
+ " \"first_radio_answer\",\n",
+ " options=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(\"first_sub_radio_answer\")],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ " tools=[ # List of Tool objects\n",
+ " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
+ " lb.Tool(\n",
+ " tool=lb.Tool.Type.BBOX,\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " classifications=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
+ " ),\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "ontology = client.create_ontology(\n",
+ " \"Image CSV Demo Ontology\",\n",
+ " ontology_builder.asdict(),\n",
+ " media_type=lb.MediaType.Image,\n",
+ ")\n",
+ "\n",
+ "# Set up project and connect ontology\n",
+ "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
+ " media_type=lb.MediaType.Image)\n",
+ "project.setup_editor(ontology)\n",
+ "\n",
+ "# Send data row towards our project\n",
+ "batch = project.create_batch(\n",
+ " \"image-demo-batch\",\n",
+ " global_keys=[\n",
+ " global_key\n",
+ " ], # paginated collection of data row objects, list of data row ids or global keys\n",
+ " priority=1,\n",
+ ")\n",
+ "\n",
+ "print(f\"Batch: {batch}\")\n",
+ "\n",
+ "# Create a label and imported it towards our project\n",
+ "radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"second_radio_answer\")),\n",
+ ")\n",
+ "checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
+ " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
+ " ]),\n",
+ ")\n",
+ "text_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"free_text\",\n",
+ " value=lb_types.Text(answer=\"sample text\"),\n",
+ ")\n",
+ "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ " )),\n",
+ ")\n",
+ "bbox_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bounding_box\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=1690, y=977),\n",
+ " end=lb_types.Point(x=1915, y=1307),\n",
+ " ),\n",
+ ")\n",
+ "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
+ " end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n",
+ " ),\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"tool_first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "label = []\n",
+ "annotations = [\n",
+ " radio_annotation,\n",
+ " nested_radio_annotation,\n",
+ " checklist_annotation,\n",
+ " text_annotation,\n",
+ " bbox_annotation,\n",
+ " bbox_with_radio_subclass_annotation,\n",
+ "]\n",
+ "\n",
+ "label.append(\n",
+ " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
+ "\n",
+ "upload_job = lb.LabelImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=project.uid,\n",
+ " name=\"label_import_job\" + str(uuid.uuid4()),\n",
+ " labels=label,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# PROJECT_ID = None\n",
+ "# project = client.get_project(PROJECT_ID)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -137,15 +319,15 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
"\n",
- "Labelbox JSON format is centralized at the individual data row of your export. This format allows expandability when things evolve and provides a centralized view of fields such as metadata or data row details. The main labels are located inside the project key and can be nested, which can make it difficult to parse. For complete samples of our project export format visit our export quick reference page. \n",
+ "Labelbox JSON format is centralized at the individual data row of your export. This format allows expandability when things evolve and provides a centralized view of fields such as metadata or data row details. The main labels are located inside the project key and can be nested, which can make it difficult to parse. For complete samples of our project export format visit our [export overview](https://docs.labelbox.com/reference/label-export) page. \n",
"\n",
"To get Labelbox export JSON format to our CSV format, we established, we must do the following:\n",
"\n",
@@ -155,10 +337,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -166,93 +348,191 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "data_row_base_columns = [\n",
+ " \"Data Row ID\",\n",
+ " \"Global Key\",\n",
+ " \"External ID\",\n",
+ " \"Project ID\",\n",
+ "]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_Id` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_classification_features(classifications: list, class_list=[]) -> None:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> None:\n \"\"\"Creates list of tool names from ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_classification_features(classifications: list, class_list=[]) -> None:\n",
+ " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
+ " for classification in classifications:\n",
+ " if \"name\" in classification:\n",
+ " class_list.append({\n",
+ " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
+ " \"column_name\": classification[\"instructions\"],\n",
+ " })\n",
+ " if \"options\" in classification:\n",
+ " get_classification_features(classification[\"options\"], class_list)\n",
+ " return class_list\n",
+ "\n",
+ "\n",
+ "def get_tool_features(tools: list) -> None:\n",
+ " \"\"\"Creates list of tool names from ontology\"\"\"\n",
+ " tool_list = []\n",
+ " for tool in tools:\n",
+ " tool_list.append({\n",
+ " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
+ " \"column_name\": tool[\"name\"],\n",
+ " })\n",
+ " if \"classifications\" in tool:\n",
+ " tool_list = get_classification_features(tool[\"classifications\"],\n",
+ " tool_list)\n",
+ " return tool_list"
+ ]
},
{
- "metadata": {},
- "source": "# Get ontology from project and normalized towards python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])\n\npprint(class_annotation_columns)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Get ontology from project and normalized towards python dictionary\n",
+ "ontology = project.ontology().normalized\n",
+ "\n",
+ "class_annotation_columns = get_classification_features(\n",
+ " ontology[\"classifications\"])\n",
+ "tool_annotation_columns = get_tool_features(ontology[\"tools\"])\n",
+ "\n",
+ "pprint(class_annotation_columns)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n base_columns: list[str]) -> dict[str:str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n",
+ " base_columns: list[str]) -> dict[str:str]:\n",
+ " for base_column in base_columns:\n",
+ " if base_column == \"Data Row ID\":\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
+ "\n",
+ " elif base_column == \"Global Key\":\n",
+ " if (\"global_key\"\n",
+ " in data_row[\"data_row\"]): # Check if global key exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"External ID\":\n",
+ " if (\"external_id\"\n",
+ " in data_row[\"data_row\"]): # Check if external_id exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"Project ID\":\n",
+ " csv_row[base_column] = project.uid\n",
+ " return csv_row"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n label_base_columns: list[str]) -> dict[str:str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_columns == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details is present. This field can be omitted in export\n csv_row[label_base_column] = label_base_columns[\n \"label_details\"][\"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n",
+ " label_base_columns: list[str]) -> dict[str:str]:\n",
+ " for label_base_column in label_base_columns:\n",
+ " if label_base_column == \"Label ID\":\n",
+ " csv_row[label_base_column] = label[\"id\"]\n",
+ "\n",
+ " elif label_base_columns == \"Created By\":\n",
+ " if (\n",
+ " \"label_details\" in label\n",
+ " ): # Check if label details is present. This field can be omitted in export\n",
+ " csv_row[label_base_column] = label_base_columns[\n",
+ " \"label_details\"][\"created_by\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " elif label_base_column == \"Skipped\":\n",
+ " if (\n",
+ " \"performance_details\" in label\n",
+ " ): # Check if performance details are present. This field can be omitted in export.\n",
+ " csv_row[label_base_column] = label[\"performance_details\"][\n",
+ " \"skipped\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " return csv_row"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -271,96 +551,228 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "from pprint import pprint\n\n\ndef get_feature_answers(feature: str,\n annotations: list[dict[str:str]]) -> None | str:\n \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n\n Args:\n feature (str): feature we are searching\n classifications (list[dict[str:str]]): annotation list we are looking for the feature\n\n Returns:\n None | str: The answer/value of the feature returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n print(annotation)\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call function again return value if answer found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # case for if tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "from pprint import pprint\n",
+ "\n",
+ "\n",
+ "def get_feature_answers(feature: str,\n",
+ " annotations: list[dict[str:str]]) -> None | str:\n",
+ " \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n",
+ "\n",
+ " Args:\n",
+ " feature (str): feature we are searching\n",
+ " classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with. \n",
+ "\n",
+ " Returns:\n",
+ " None | str: The answer/value of the feature returns None if nothing is found\n",
+ " \"\"\"\n",
+ " for annotation in annotations:\n",
+ " print(annotation)\n",
+ " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
+ " ): # Base conditions (found feature)\n",
+ " if \"text_answer\" in annotation:\n",
+ " return annotation[\"text_answer\"][\"content\"]\n",
+ " if \"radio_answer\" in annotation:\n",
+ " return annotation[\"radio_answer\"][\"value\"]\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n",
+ " return \", \".join([\n",
+ " check_list_ans[\"value\"]\n",
+ " for check_list_ans in annotation[\"checklist_answers\"]\n",
+ " ])\n",
+ " if \"bounding_box\" in annotation:\n",
+ " return annotation[\"bounding_box\"]\n",
+ " # Add more tools here with similar pattern as above\n",
+ "\n",
+ " # Recursion cases (found more classifications to search through)\n",
+ " if \"radio_answer\" in annotation:\n",
+ " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
+ " ) # Call function again return value if answer found\n",
+ " if value:\n",
+ " return value\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " for checklist_ans in annotation[\"checklist_answers\"]:\n",
+ " if len(checklist_ans[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, checklist_ans[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ " if (\"classifications\"\n",
+ " in annotation): # case for if tool has classifications\n",
+ " if len(annotation[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(feature,\n",
+ " annotation[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ "\n",
+ " return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n pprint(label)\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n pprint(tool_annotation_columns)\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "GLOBAL_CSV_LIST = []\n",
+ "\n",
+ "\n",
+ "def main(output: lb.BufferedJsonConverterOutput):\n",
+ "\n",
+ " # Navigate to our label list\n",
+ " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
+ " for label in labels:\n",
+ " # Define our CSV \"row\"\n",
+ " csv_row = dict()\n",
+ "\n",
+ " # Start with data row base columns\n",
+ " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
+ " data_row_base_columns)\n",
+ "\n",
+ " # Add our label details\n",
+ " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
+ "\n",
+ " pprint(label)\n",
+ " # Add classification features\n",
+ " for classification in class_annotation_columns:\n",
+ " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
+ " classification, label[\"annotations\"][\"classifications\"])\n",
+ "\n",
+ " pprint(tool_annotation_columns)\n",
+ " # Add tools features\n",
+ " for tool in tool_annotation_columns:\n",
+ " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
+ " tool, label[\"annotations\"][\"objects\"])\n",
+ "\n",
+ " # Append to global csv list\n",
+ " GLOBAL_CSV_LIST.append(csv_row)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT).start(\n stream_handler=main\n ) # Feeding our data row handler directly into export",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Params required to obtain all fields we need\n",
+ "params = {\"performance_details\": True, \"label_details\": True}\n",
+ "\n",
+ "export_task = project.export(params=params)\n",
+ "export_task.wait_till_done()\n",
+ "\n",
+ "# Conditional for if export task has errors\n",
+ "if export_task.has_errors():\n",
+ " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
+ " stream_handler=lambda error: print(error))\n",
+ "\n",
+ "if export_task.has_result():\n",
+ " export_json = export_task.get_buffered_stream(\n",
+ " stream_type=lb.StreamType.RESULT).start(\n",
+ " stream_handler=main\n",
+ " ) # Feeding our data row handler directly into export"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "pprint(GLOBAL_CSV_LIST)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "pprint(GLOBAL_CSV_LIST)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ],
- "cell_type": "markdown"
+ "The hard part is now completed!🚀 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
+ " # Columns\n",
+ " fieldnames = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
+ "\n",
+ " writer.writeheader()\n",
+ "\n",
+ " for row in GLOBAL_CSV_LIST:\n",
+ " writer.writerow(row)"
+ ]
},
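As a quick sanity check on the file written above, the CSV can be read back with pandas (standard `pandas.read_csv`; `file.csv` is the name used in the cell):

```python
import pandas as pd

# Read the CSV back and confirm the header matches the fieldnames we wrote
df = pd.read_csv("file.csv")
print(df.columns.tolist())
print(len(df), "rows")
```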
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "columns = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
}
- ]
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
}
From 4e476b36c37ec9e3f6a2ada9782062ff535466b9 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 12:54:33 -0500
Subject: [PATCH 09/19] modified readme generator script
---
examples/exports/exporting_to_CSV.ipynb | 8 +++++++-
examples/scripts/generate_readme.py | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/examples/exports/exporting_to_CSV.ipynb b/examples/exports/exporting_to_CSV.ipynb
index 4be20602f..95ec69e04 100644
--- a/examples/exports/exporting_to_CSV.ipynb
+++ b/examples/exports/exporting_to_CSV.ipynb
@@ -769,8 +769,14 @@
}
],
"metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
"language_info": {
- "name": "python"
+ "name": "python",
+ "version": "3.11.9"
}
},
"nbformat": 4,
diff --git a/examples/scripts/generate_readme.py b/examples/scripts/generate_readme.py
index 80939acfc..135f9421d 100644
--- a/examples/scripts/generate_readme.py
+++ b/examples/scripts/generate_readme.py
@@ -69,7 +69,7 @@ def create_title(link: str) -> str:
# List of words to lowercase and list of acronyms to keep capitalized
lower_case_words = ["to"]
- acronyms = ["html", "pdf", "llm", "dicom", "sam"]
+ acronyms = ["html", "pdf", "llm", "dicom", "sam", "csv"]
for word in split_link:
if word.lower() in acronyms:
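For context, these two lists drive how `create_title` cases each word of a notebook filename. A toy reimplementation of that casing rule (hypothetical; the real function lives in `examples/scripts/generate_readme.py`) shows why `"csv"` needed to be added:

```python
def title_from_link(link: str) -> str:
    """Toy sketch: build a README title from a notebook path."""
    lower_case_words = ["to"]
    acronyms = ["html", "pdf", "llm", "dicom", "sam", "csv"]
    stem = link.split("/")[-1].removesuffix(".ipynb")
    words = []
    for word in stem.split("_"):
        if word.lower() in acronyms:
            words.append(word.upper())       # keep acronyms fully capitalized
        elif word.lower() in lower_case_words:
            words.append(word.lower())       # keep connector words lowercase
        else:
            words.append(word.capitalize())
    return " ".join(words)

print(title_from_link("examples/exports/exporting_to_CSV.ipynb"))  # Exporting to CSV
```

Without `"csv"` in the acronyms list, the generated title would come out as "Exporting to Csv", which is exactly the README entry this patch series corrects.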
From b1f88e59b4687c7a96c9f04e61ae9fea7533da89 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:06:50 -0500
Subject: [PATCH 10/19] fixed
---
examples/README.md | 87 +++-
examples/exports/exporting_to_CSV.ipynb | 636 ++++--------------------
2 files changed, 188 insertions(+), 535 deletions(-)
diff --git a/examples/README.md b/examples/README.md
index c60c98904..e1cda2598 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -20,6 +20,9 @@
Basics |
 |
 |
+ Basics |
+  |
+  |
Batches |
@@ -31,6 +34,11 @@
 |
 |
+
+ Custom Embeddings |
+  |
+  |
+
Projects |
 |
@@ -40,6 +48,9 @@
User Management |
 |
 |
+ User Management |
+  |
+  |
Data Row Metadata |
@@ -55,11 +66,17 @@
Ontologies |
 |
 |
+ Ontologies |
+  |
+  |
Data Rows |
 |
 |
+ Data Rows |
+  |
+  |
@@ -76,9 +93,9 @@
- Exporting to Csv |
-  |
-  |
+ Exporting to CSV |
+  |
+  |
Composite Mask Export |
@@ -90,6 +107,11 @@
 |
 |
+
+ Export V1 to V2 Migration Support |
+  |
+  |
+
@@ -104,11 +126,6 @@
-
- Live Multimodal Chat Project |
-  |
-  |
-
Project Setup |
 |
@@ -129,6 +146,11 @@
 |
 |
+
+ Model Chat Evaluation Project |
+  |
+  |
+
@@ -147,6 +169,9 @@
Audio |
 |
 |
+ Audio |
+  |
+  |
Video |
@@ -162,6 +187,9 @@
Tiled |
 |
 |
+ Tiled |
+  |
+  |
Conversational |
@@ -177,6 +205,9 @@
Conversational LLM Data Generation |
 |
 |
+ Conversational LLM Data Generation |
+  |
+  |
DICOM |
@@ -187,6 +218,9 @@
Image |
 |
 |
+ Image |
+  |
+  |
HTML |
@@ -197,6 +231,9 @@
Conversational LLM |
 |
 |
+ Conversational LLM |
+  |
+  |
@@ -216,6 +253,9 @@
Meta SAM |
 |
 |
+ Meta SAM |
+  |
+  |
Meta SAM Video |
@@ -227,6 +267,14 @@
 |
 |
+
+ Huggingface Custom Embeddings |
+  |
+  |
+ Langchain |
+  |
+  |
+
Huggingface Custom Embeddings |
 |
@@ -261,6 +309,11 @@
 |
 |
+
+ Custom Metrics Demo |
+  |
+  |
+
Model Predictions to Project |
 |
@@ -294,6 +347,9 @@
HTML Predictions |
 |
 |
+ HTML Predictions |
+  |
+  |
Conversational LLM Predictions |
@@ -310,6 +366,16 @@
 |
 |
+
+ Geospatial Predictions |
+  |
+  |
+
+
+ PDF Predictions |
+  |
+  |
+
Video Predictions |
 |
@@ -320,6 +386,11 @@
 |
 |
+
+ Image Predictions |
+  |
+  |
+
diff --git a/examples/exports/exporting_to_CSV.ipynb b/examples/exports/exporting_to_CSV.ipynb
index 95ec69e04..9ce718bce 100644
--- a/examples/exports/exporting_to_CSV.ipynb
+++ b/examples/exports/exporting_to_CSV.ipynb
@@ -1,40 +1,42 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {},
"cells": [
{
- "cell_type": "markdown",
"metadata": {},
"source": [
- "\n",
- " \n",
+ " | ",
+ " ",
" | \n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
- "![]() \n",
" | \n",
"\n",
"\n",
- "![]() \n",
" | "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
"This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://github.com/Labelbox/labelpandas) friendly format. "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -42,267 +44,83 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "%pip install -q --upgrade \"Labelbox[data]\"\n",
- "%pip install -q pandas"
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "import labelbox as lb\n",
- "import labelbox.types as lb_types\n",
- "import uuid\n",
- "from pprint import pprint\n",
- "import csv\n",
- "import pandas as pd"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "API_KEY = None\n",
- "client = lb.Client(api_key=API_KEY)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
"The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row towards our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # paginated collection of data row objects, list of data row ids or global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and imported it towards our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n 
end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Create dataset with image data row\n",
- "global_key = str(uuid.uuid4())\n",
- "\n",
- "test_img_url = {\n",
- " \"row_data\":\n",
- " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
- " \"global_key\":\n",
- " global_key,\n",
- "}\n",
- "\n",
- "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
- "task = dataset.create_data_rows([test_img_url])\n",
- "task.wait_till_done()\n",
- "print(\"Errors:\", task.errors)\n",
- "print(\"Failed data rows:\", task.failed_data_rows)\n",
- "\n",
- "# Create ontology\n",
- "ontology_builder = lb.OntologyBuilder(\n",
- " classifications=[ # List of Classification objects\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"radio_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_radio_answer\"),\n",
- " lb.Option(value=\"second_radio_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.CHECKLIST,\n",
- " name=\"checklist_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_checklist_answer\"),\n",
- " lb.Option(value=\"second_checklist_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
- " name=\"free_text\"),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"nested_radio_question\",\n",
- " options=[\n",
- " lb.Option(\n",
- " \"first_radio_answer\",\n",
- " options=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(\"first_sub_radio_answer\")],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " ],\n",
- " tools=[ # List of Tool objects\n",
- " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
- " lb.Tool(\n",
- " tool=lb.Tool.Type.BBOX,\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " classifications=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
- " ),\n",
- " ],\n",
- " ),\n",
- " ],\n",
- ")\n",
- "\n",
- "ontology = client.create_ontology(\n",
- " \"Image CSV Demo Ontology\",\n",
- " ontology_builder.asdict(),\n",
- " media_type=lb.MediaType.Image,\n",
- ")\n",
- "\n",
- "# Set up project and connect ontology\n",
- "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
- " media_type=lb.MediaType.Image)\n",
- "project.setup_editor(ontology)\n",
- "\n",
- "# Send data row towards our project\n",
- "batch = project.create_batch(\n",
- " \"image-demo-batch\",\n",
- " global_keys=[\n",
- " global_key\n",
- " ], # paginated collection of data row objects, list of data row ids or global keys\n",
- " priority=1,\n",
- ")\n",
- "\n",
- "print(f\"Batch: {batch}\")\n",
- "\n",
- "# Create a label and imported it towards our project\n",
- "radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"second_radio_answer\")),\n",
- ")\n",
- "checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
- " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
- " ]),\n",
- ")\n",
- "text_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"free_text\",\n",
- " value=lb_types.Text(answer=\"sample text\"),\n",
- ")\n",
- "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- " )),\n",
- ")\n",
- "bbox_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bounding_box\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=1690, y=977),\n",
- " end=lb_types.Point(x=1915, y=1307),\n",
- " ),\n",
- ")\n",
- "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
- " end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n",
- " ),\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"tool_first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- ")\n",
- "\n",
- "label = []\n",
- "annotations = [\n",
- " radio_annotation,\n",
- " nested_radio_annotation,\n",
- " checklist_annotation,\n",
- " text_annotation,\n",
- " bbox_annotation,\n",
- " bbox_with_radio_subclass_annotation,\n",
- "]\n",
- "\n",
- "label.append(\n",
- " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
- "\n",
- "upload_job = lb.LabelImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=project.uid,\n",
- " name=\"label_import_job\" + str(uuid.uuid4()),\n",
- " labels=label,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# PROJECT_ID = None\n",
- "# project = client.get_project(PROJECT_ID)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -319,10 +137,10 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ]
+ ],
+ "cell_type": "markdown"
},
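To make the target shape concrete, here is a tiny hand-written example (hypothetical values) of that list-of-dictionaries format and the two conversions it enables:

```python
import csv
import pandas as pd

# Each dictionary is one CSV row; keys are columns, values are cells
rows = [
    {"Data Row ID": "dr_1", "Global Key": "key_1", "radio_question": "second_radio_answer"},
    {"Data Row ID": "dr_2", "Global Key": "key_2", "radio_question": None},
]

print(pd.DataFrame(rows))  # DataFrame built directly from the list

with open("example.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```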
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
@@ -337,10 +155,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -348,191 +166,93 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "data_row_base_columns = [\n",
- " \"Data Row ID\",\n",
- " \"Global Key\",\n",
- " \"External ID\",\n",
- " \"Project ID\",\n",
- "]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_Id` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_classification_features(classifications: list, class_list=[]) -> None:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> None:\n \"\"\"Creates list of tool names from ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_classification_features(classifications: list, class_list=[]) -> None:\n",
- " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
- " for classification in classifications:\n",
- " if \"name\" in classification:\n",
- " class_list.append({\n",
- " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
- " \"column_name\": classification[\"instructions\"],\n",
- " })\n",
- " if \"options\" in classification:\n",
- " get_classification_features(classification[\"options\"], class_list)\n",
- " return class_list\n",
- "\n",
- "\n",
- "def get_tool_features(tools: list) -> None:\n",
- " \"\"\"Creates list of tool names from ontology\"\"\"\n",
- " tool_list = []\n",
- " for tool in tools:\n",
- " tool_list.append({\n",
- " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
- " \"column_name\": tool[\"name\"],\n",
- " })\n",
- " if \"classifications\" in tool:\n",
- " tool_list = get_classification_features(tool[\"classifications\"],\n",
- " tool_list)\n",
- " return tool_list"
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Get ontology from project and normalized towards python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])\n\npprint(class_annotation_columns)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Get ontology from project and normalized towards python dictionary\n",
- "ontology = project.ontology().normalized\n",
- "\n",
- "class_annotation_columns = get_classification_features(\n",
- " ontology[\"classifications\"])\n",
- "tool_annotation_columns = get_tool_features(ontology[\"tools\"])\n",
- "\n",
- "pprint(class_annotation_columns)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ]
+ ],
+ "cell_type": "markdown"
},
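The existence check described above is easy to see in isolation; a minimal sketch with a toy data row (note that `dict.get` is an equivalent shorthand that also yields `None` for missing keys):

```python
data_row = {"data_row": {"id": "dr_1"}}  # toy: no "global_key" present

# Explicit membership check, as used throughout this guide
if "global_key" in data_row["data_row"]:
    value = data_row["data_row"]["global_key"]
else:
    value = None  # becomes a blank cell in the CSV

# Equivalent shorthand
assert value == data_row["data_row"].get("global_key")
print(value)  # None
```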
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n base_columns: list[str]) -> dict[str:str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n",
- " base_columns: list[str]) -> dict[str:str]:\n",
- " for base_column in base_columns:\n",
- " if base_column == \"Data Row ID\":\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
- "\n",
- " elif base_column == \"Global Key\":\n",
- " if (\"global_key\"\n",
- " in data_row[\"data_row\"]): # Check if global key exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"External ID\":\n",
- " if (\"external_id\"\n",
- " in data_row[\"data_row\"]): # Check if external_id exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"Project ID\":\n",
- " csv_row[base_column] = project.uid\n",
- " return csv_row"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n label_base_columns: list[str]) -> dict[str:str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_columns == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details is present. This field can be omitted in export\n csv_row[label_base_column] = label_base_columns[\n \"label_details\"][\"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n",
- " label_base_columns: list[str]) -> dict[str:str]:\n",
- " for label_base_column in label_base_columns:\n",
- " if label_base_column == \"Label ID\":\n",
- " csv_row[label_base_column] = label[\"id\"]\n",
- "\n",
- " elif label_base_columns == \"Created By\":\n",
- " if (\n",
- " \"label_details\" in label\n",
- " ): # Check if label details is present. This field can be omitted in export\n",
- " csv_row[label_base_column] = label_base_columns[\n",
- " \"label_details\"][\"created_by\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " elif label_base_column == \"Skipped\":\n",
- " if (\n",
- " \"performance_details\" in label\n",
- " ): # Check if performance details are present. This field can be omitted in export.\n",
- " csv_row[label_base_column] = label[\"performance_details\"][\n",
- " \"skipped\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " return csv_row"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -551,234 +271,96 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "from pprint import pprint\n\n\ndef get_feature_answers(feature: str,\n annotations: list[dict[str:str]]) -> None | str:\n \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n\n Args:\n feature (str): feature we are searching\n classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n\n Returns:\n None | str: The answer/value of the feature returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n print(annotation)\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call function again return value if answer found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # case for if tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "from pprint import pprint\n",
- "\n",
- "\n",
- "def get_feature_answers(feature: str,\n",
- " annotations: list[dict[str:str]]) -> None | str:\n",
- " \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n",
- "\n",
- " Args:\n",
- " feature (str): feature we are searching\n",
- " classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with. \n",
- "\n",
- " Returns:\n",
- " None | str: The answer/value of the feature returns None if nothing is found\n",
- " \"\"\"\n",
- " for annotation in annotations:\n",
- " print(annotation)\n",
- " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
- " ): # Base conditions (found feature)\n",
- " if \"text_answer\" in annotation:\n",
- " return annotation[\"text_answer\"][\"content\"]\n",
- " if \"radio_answer\" in annotation:\n",
- " return annotation[\"radio_answer\"][\"value\"]\n",
- " if \"checklist_answers\" in annotation:\n",
- " # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n",
- " return \", \".join([\n",
- " check_list_ans[\"value\"]\n",
- " for check_list_ans in annotation[\"checklist_answers\"]\n",
- " ])\n",
- " if \"bounding_box\" in annotation:\n",
- " return annotation[\"bounding_box\"]\n",
- " # Add more tools here with similar pattern as above\n",
- "\n",
- " # Recursion cases (found more classifications to search through)\n",
- " if \"radio_answer\" in annotation:\n",
- " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
- " ) # Call function again return value if answer found\n",
- " if value:\n",
- " return value\n",
- " if \"checklist_answers\" in annotation:\n",
- " for checklist_ans in annotation[\"checklist_answers\"]:\n",
- " if len(checklist_ans[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, checklist_ans[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- " if (\"classifications\"\n",
- " in annotation): # case for if tool has classifications\n",
- " if len(annotation[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(feature,\n",
- " annotation[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- "\n",
- " return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n pprint(label)\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n pprint(tool_annotation_columns)\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "GLOBAL_CSV_LIST = []\n",
- "\n",
- "\n",
- "def main(output: lb.BufferedJsonConverterOutput):\n",
- "\n",
- " # Navigate to our label list\n",
- " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
- " for label in labels:\n",
- " # Define our CSV \"row\"\n",
- " csv_row = dict()\n",
- "\n",
- " # Start with data row base columns\n",
- " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
- " data_row_base_columns)\n",
- "\n",
- " # Add our label details\n",
- " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
- "\n",
- " pprint(label)\n",
- " # Add classification features\n",
- " for classification in class_annotation_columns:\n",
- " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
- " classification, label[\"annotations\"][\"classifications\"])\n",
- "\n",
- " pprint(tool_annotation_columns)\n",
- " # Add tools features\n",
- " for tool in tool_annotation_columns:\n",
- " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
- " tool, label[\"annotations\"][\"objects\"])\n",
- "\n",
- " # Append to global csv list\n",
- " GLOBAL_CSV_LIST.append(csv_row)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT).start(\n stream_handler=main\n ) # Feeding our data row handler directly into export",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Params required to obtain all fields we need\n",
- "params = {\"performance_details\": True, \"label_details\": True}\n",
- "\n",
- "export_task = project.export(params=params)\n",
- "export_task.wait_till_done()\n",
- "\n",
- "# Conditional for if export task has errors\n",
- "if export_task.has_errors():\n",
- " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
- " stream_handler=lambda error: print(error))\n",
- "\n",
- "if export_task.has_result():\n",
- " export_json = export_task.get_buffered_stream(\n",
- " stream_type=lb.StreamType.RESULT).start(\n",
- " stream_handler=main\n",
- " ) # Feeding our data row handler directly into export"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "pprint(GLOBAL_CSV_LIST)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "pprint(GLOBAL_CSV_LIST)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!🚀 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ]
+ "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
- " # Columns\n",
- " fieldnames = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
- "\n",
- " writer.writeheader()\n",
- "\n",
- " for row in GLOBAL_CSV_LIST:\n",
- " writer.writerow(row)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "columns = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "name": "python",
- "version": "3.11.9"
+ "execution_count": null
}
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ ]
+}
\ No newline at end of file
From 5594b3ec5e21d69236b7788e92350d45f2178120 Mon Sep 17 00:00:00 2001
From: Gabe <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 12:55:12 -0500
Subject: [PATCH 11/19] Rename exporting_to_CSV.ipynb to exporting_to_csv.ipynb
---
.../exports/{exporting_to_CSV.ipynb => exporting_to_csv.ipynb} | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename examples/exports/{exporting_to_CSV.ipynb => exporting_to_csv.ipynb} (100%)
diff --git a/examples/exports/exporting_to_CSV.ipynb b/examples/exports/exporting_to_csv.ipynb
similarity index 100%
rename from examples/exports/exporting_to_CSV.ipynb
rename to examples/exports/exporting_to_csv.ipynb
From fbb884f2c69dd5491753620fd8b70d9ffede17ed Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:09:08 -0500
Subject: [PATCH 12/19] forcing workflow to run
---
examples/exports/exporting_to_csv.ipynb | 651 ++++++++++++++++++++----
1 file changed, 543 insertions(+), 108 deletions(-)
diff --git a/examples/exports/exporting_to_csv.ipynb b/examples/exports/exporting_to_csv.ipynb
index 9ce718bce..94fd8ed77 100644
--- a/examples/exports/exporting_to_csv.ipynb
+++ b/examples/exports/exporting_to_csv.ipynb
@@ -1,18 +1,16 @@
{
- "nbformat": 4,
- "nbformat_minor": 2,
- "metadata": {},
"cells": [
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "",
- " ",
+ " | \n",
+ " \n",
" | \n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -24,19 +22,19 @@
" \n",
" | "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
"This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://github.com/Labelbox/labelpandas) friendly format. "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -44,83 +42,276 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
"cell_type": "code",
- "outputs": [],
- "execution_count": null
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Note: you may need to restart the kernel to use updated packages.\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "%pip install -q --upgrade \"Labelbox[data]\"\n",
+ "%pip install -q pandas"
+ ]
},
{
- "metadata": {},
- "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "import labelbox as lb\n",
+ "import labelbox.types as lb_types\n",
+ "import uuid\n",
+ "from pprint import pprint\n",
+ "import csv\n",
+ "import pandas as pd"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "API_KEY = None\n",
+ "client = lb.Client(api_key=API_KEY)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
"The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row towards our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # paginated collection of data row objects, list of data row ids or global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and imported it towards our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n 
end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Create dataset with image data row\n",
+ "global_key = str(uuid.uuid4())\n",
+ "\n",
+ "test_img_url = {\n",
+ " \"row_data\":\n",
+ " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
+ " \"global_key\":\n",
+ " global_key,\n",
+ "}\n",
+ "\n",
+ "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
+ "task = dataset.create_data_rows([test_img_url])\n",
+ "task.wait_till_done()\n",
+ "print(\"Errors:\", task.errors)\n",
+ "print(\"Failed data rows:\", task.failed_data_rows)\n",
+ "\n",
+ "# Create ontology\n",
+ "ontology_builder = lb.OntologyBuilder(\n",
+ " classifications=[ # List of Classification objects\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"radio_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_radio_answer\"),\n",
+ " lb.Option(value=\"second_radio_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.CHECKLIST,\n",
+ " name=\"checklist_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_checklist_answer\"),\n",
+ " lb.Option(value=\"second_checklist_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
+ " name=\"free_text\"),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"nested_radio_question\",\n",
+ " options=[\n",
+ " lb.Option(\n",
+ " \"first_radio_answer\",\n",
+ " options=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(\"first_sub_radio_answer\")],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ " tools=[ # List of Tool objects\n",
+ " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
+ " lb.Tool(\n",
+ " tool=lb.Tool.Type.BBOX,\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " classifications=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
+ " ),\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "ontology = client.create_ontology(\n",
+ " \"Image CSV Demo Ontology\",\n",
+ " ontology_builder.asdict(),\n",
+ " media_type=lb.MediaType.Image,\n",
+ ")\n",
+ "\n",
+ "# Set up project and connect ontology\n",
+ "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
+ " media_type=lb.MediaType.Image)\n",
+ "project.setup_editor(ontology)\n",
+ "\n",
+ "# Send data row towards our project\n",
+ "batch = project.create_batch(\n",
+ " \"image-demo-batch\",\n",
+ " global_keys=[\n",
+ " global_key\n",
+ " ], # paginated collection of data row objects, list of data row ids or global keys\n",
+ " priority=1,\n",
+ ")\n",
+ "\n",
+ "print(f\"Batch: {batch}\")\n",
+ "\n",
+ "# Create a label and imported it towards our project\n",
+ "radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"second_radio_answer\")),\n",
+ ")\n",
+ "checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
+ " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
+ " ]),\n",
+ ")\n",
+ "text_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"free_text\",\n",
+ " value=lb_types.Text(answer=\"sample text\"),\n",
+ ")\n",
+ "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ " )),\n",
+ ")\n",
+ "bbox_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bounding_box\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=1690, y=977),\n",
+ " end=lb_types.Point(x=1915, y=1307),\n",
+ " ),\n",
+ ")\n",
+ "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
+ " end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n",
+ " ),\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"tool_first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "label = []\n",
+ "annotations = [\n",
+ " radio_annotation,\n",
+ " nested_radio_annotation,\n",
+ " checklist_annotation,\n",
+ " text_annotation,\n",
+ " bbox_annotation,\n",
+ " bbox_with_radio_subclass_annotation,\n",
+ "]\n",
+ "\n",
+ "label.append(\n",
+ " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
+ "\n",
+ "upload_job = lb.LabelImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=project.uid,\n",
+ " name=\"label_import_job\" + str(uuid.uuid4()),\n",
+ " labels=label,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# PROJECT_ID = None\n",
+ "# project = client.get_project(PROJECT_ID)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -137,10 +328,10 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ],
- "cell_type": "markdown"
+ ]
},
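+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the target format concrete, here is a minimal sketch (using made-up rows, not real export data) of how a list of Python dictionaries maps onto CSV cells and a Pandas DataFrame:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Minimal sketch: hypothetical rows in the list-of-dictionaries format\n",
+ "example_rows = [\n",
+ "    {\"Data Row ID\": \"dr-1\", \"free_text\": \"sample text\"},\n",
+ "    {\"Data Row ID\": \"dr-2\", \"free_text\": None},  # None becomes a blank cell\n",
+ "]\n",
+ "\n",
+ "with open(\"example.csv\", \"w\", newline=\"\") as f:\n",
+ "    writer = csv.DictWriter(f, fieldnames=[\"Data Row ID\", \"free_text\"])\n",
+ "    writer.writeheader()\n",
+ "    writer.writerows(example_rows)\n",
+ "\n",
+ "pd.DataFrame(example_rows)"
+ ]
+ },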
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
@@ -155,10 +346,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -166,93 +357,191 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "data_row_base_columns = [\n",
+ " \"Data Row ID\",\n",
+ " \"Global Key\",\n",
+ " \"External ID\",\n",
+ " \"Project ID\",\n",
+ "]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_Id` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_classification_features(classifications: list, class_list=[]) -> None:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> None:\n \"\"\"Creates list of tool names from ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_classification_features(classifications: list, class_list=[]) -> None:\n",
+ " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
+ " for classification in classifications:\n",
+ " if \"name\" in classification:\n",
+ " class_list.append({\n",
+ " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
+ " \"column_name\": classification[\"instructions\"],\n",
+ " })\n",
+ " if \"options\" in classification:\n",
+ " get_classification_features(classification[\"options\"], class_list)\n",
+ " return class_list\n",
+ "\n",
+ "\n",
+ "def get_tool_features(tools: list) -> None:\n",
+ " \"\"\"Creates list of tool names from ontology\"\"\"\n",
+ " tool_list = []\n",
+ " for tool in tools:\n",
+ " tool_list.append({\n",
+ " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
+ " \"column_name\": tool[\"name\"],\n",
+ " })\n",
+ " if \"classifications\" in tool:\n",
+ " tool_list = get_classification_features(tool[\"classifications\"],\n",
+ " tool_list)\n",
+ " return tool_list"
+ ]
},
{
- "metadata": {},
- "source": "# Get ontology from project and normalized towards python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])\n\npprint(class_annotation_columns)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Get ontology from project and normalized towards python dictionary\n",
+ "ontology = project.ontology().normalized\n",
+ "\n",
+ "class_annotation_columns = get_classification_features(\n",
+ " ontology[\"classifications\"])\n",
+ "tool_annotation_columns = get_tool_features(ontology[\"tools\"])\n",
+ "\n",
+ "pprint(class_annotation_columns)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ],
- "cell_type": "markdown"
+ ]
},
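+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sketch of that `None` behavior (the column names below are hypothetical), the standard library's `csv` writer renders `None` as an empty string, which is exactly what produces the blank cell:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# None values become empty cells when written with the csv module\n",
+ "with open(\"blank_cell_demo.csv\", \"w\", newline=\"\") as f:\n",
+ "    writer = csv.DictWriter(f, fieldnames=[\"present\", \"missing\"])\n",
+ "    writer.writeheader()\n",
+ "    writer.writerow({\"present\": \"value\", \"missing\": None})\n",
+ "\n",
+ "with open(\"blank_cell_demo.csv\") as f:\n",
+ "    print(f.read())  # the \"missing\" cell is blank"
+ ]
+ },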
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n base_columns: list[str]) -> dict[str:str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n",
+ " base_columns: list[str]) -> dict[str:str]:\n",
+ " for base_column in base_columns:\n",
+ " if base_column == \"Data Row ID\":\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
+ "\n",
+ " elif base_column == \"Global Key\":\n",
+ " if (\"global_key\"\n",
+ " in data_row[\"data_row\"]): # Check if global key exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"External ID\":\n",
+ " if (\"external_id\"\n",
+ " in data_row[\"data_row\"]): # Check if external_id exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"Project ID\":\n",
+ " csv_row[base_column] = project.uid\n",
+ " return csv_row"
+ ]
},
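+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a hedged sketch of how this function behaves, we can call it on a hand-written fragment that mimics the export structure (the ID and key below are placeholders, not real values; `external_id` is omitted on purpose so its column falls back to `None`):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical, trimmed-down data row mimicking the export structure\n",
+ "sample_data_row = {\n",
+ "    \"data_row\": {\n",
+ "        \"id\": \"sample-data-row-id\",\n",
+ "        \"global_key\": \"sample-global-key\",\n",
+ "    }\n",
+ "}\n",
+ "\n",
+ "pprint(get_base_data_row_columns(sample_data_row, {}, data_row_base_columns))"
+ ]
+ },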
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n label_base_columns: list[str]) -> dict[str:str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_columns == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details is present. This field can be omitted in export\n csv_row[label_base_column] = label_base_columns[\n \"label_details\"][\"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n",
+ " label_base_columns: list[str]) -> dict[str:str]:\n",
+ " for label_base_column in label_base_columns:\n",
+ " if label_base_column == \"Label ID\":\n",
+ " csv_row[label_base_column] = label[\"id\"]\n",
+ "\n",
+ " elif label_base_columns == \"Created By\":\n",
+ " if (\n",
+ " \"label_details\" in label\n",
+ " ): # Check if label details is present. This field can be omitted in export\n",
+ " csv_row[label_base_column] = label_base_columns[\n",
+ " \"label_details\"][\"created_by\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " elif label_base_column == \"Skipped\":\n",
+ " if (\n",
+ " \"performance_details\" in label\n",
+ " ): # Check if performance details are present. This field can be omitted in export.\n",
+ " csv_row[label_base_column] = label[\"performance_details\"][\n",
+ " \"skipped\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " return csv_row"
+ ]
},
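+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Again as a hedged sketch (the label ID below is a placeholder, not a real export value), note how the optional `label_details` field being absent makes the \"Created By\" column fall back to `None`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical label fragment; \"label_details\" is omitted on purpose\n",
+ "sample_label = {\n",
+ "    \"id\": \"sample-label-id\",\n",
+ "    \"performance_details\": {\"skipped\": False},\n",
+ "}\n",
+ "\n",
+ "pprint(get_base_label_columns(sample_label, {}, label_base_columns))"
+ ]
+ },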
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -271,96 +560,242 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "from pprint import pprint\n\n\ndef get_feature_answers(feature: str,\n annotations: list[dict[str:str]]) -> None | str:\n \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n\n Args:\n feature (str): feature we are searching\n classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n\n Returns:\n None | str: The answer/value of the feature returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n print(annotation)\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call function again return value if answer found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # case for if tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "from pprint import pprint\n",
+ "\n",
+ "\n",
+ "def get_feature_answers(feature: str,\n",
+ " annotations: list[dict[str:str]]) -> None | str:\n",
+ " \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n",
+ "\n",
+ " Args:\n",
+ " feature (str): feature we are searching\n",
+ " classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n",
+ "\n",
+ " Returns:\n",
+ " None | str: The answer/value of the feature returns None if nothing is found\n",
+ " \"\"\"\n",
+ " for annotation in annotations:\n",
+ " print(annotation)\n",
+ " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
+ " ): # Base conditions (found feature)\n",
+ " if \"text_answer\" in annotation:\n",
+ " return annotation[\"text_answer\"][\"content\"]\n",
+ " if \"radio_answer\" in annotation:\n",
+ " return annotation[\"radio_answer\"][\"value\"]\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n",
+ " return \", \".join([\n",
+ " check_list_ans[\"value\"]\n",
+ " for check_list_ans in annotation[\"checklist_answers\"]\n",
+ " ])\n",
+ " if \"bounding_box\" in annotation:\n",
+ " return annotation[\"bounding_box\"]\n",
+ " # Add more tools here with similar pattern as above\n",
+ "\n",
+ " # Recursion cases (found more classifications to search through)\n",
+ " if \"radio_answer\" in annotation:\n",
+ " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
+ " ) # Call function again return value if answer found\n",
+ " if value:\n",
+ " return value\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " for checklist_ans in annotation[\"checklist_answers\"]:\n",
+ " if len(checklist_ans[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, checklist_ans[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ " if (\"classifications\"\n",
+ " in annotation): # case for if tool has classifications\n",
+ " if len(annotation[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(feature,\n",
+ " annotation[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ "\n",
+ " return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize"
+ ]
},
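+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the recursion concrete, here is a small sketch on a hand-written annotation list; the `feature_schema_id` values are placeholders rather than real IDs:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Placeholder feature/annotation pair mimicking the export's shape\n",
+ "sample_feature = {\n",
+ "    \"feature_schema_id\": \"placeholder-schema-id\",\n",
+ "    \"column_name\": \"radio_question\",\n",
+ "}\n",
+ "sample_annotations = [{\n",
+ "    \"feature_schema_id\": \"placeholder-schema-id\",\n",
+ "    \"radio_answer\": {\"value\": \"second_radio_answer\", \"classifications\": []},\n",
+ "}]\n",
+ "\n",
+ "print(get_feature_answers(sample_feature, sample_annotations))"
+ ]
+ },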
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n pprint(label)\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n pprint(tool_annotation_columns)\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "GLOBAL_CSV_LIST = []\n",
+ "\n",
+ "\n",
+ "def main(output: lb.BufferedJsonConverterOutput):\n",
+ "\n",
+ " # Navigate to our label list\n",
+ " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
+ " for label in labels:\n",
+ " # Define our CSV \"row\"\n",
+ " csv_row = dict()\n",
+ "\n",
+ " # Start with data row base columns\n",
+ " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
+ " data_row_base_columns)\n",
+ "\n",
+ " # Add our label details\n",
+ " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
+ "\n",
+ " pprint(label)\n",
+ " # Add classification features\n",
+ " for classification in class_annotation_columns:\n",
+ " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
+ " classification, label[\"annotations\"][\"classifications\"])\n",
+ "\n",
+ " pprint(tool_annotation_columns)\n",
+ " # Add tools features\n",
+ " for tool in tool_annotation_columns:\n",
+ " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
+ " tool, label[\"annotations\"][\"objects\"])\n",
+ "\n",
+ " # Append to global csv list\n",
+ " GLOBAL_CSV_LIST.append(csv_row)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT).start(\n stream_handler=main\n ) # Feeding our data row handler directly into export",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Params required to obtain all fields we need\n",
+ "params = {\"performance_details\": True, \"label_details\": True}\n",
+ "\n",
+ "export_task = project.export(params=params)\n",
+ "export_task.wait_till_done()\n",
+ "\n",
+ "# Conditional for if export task has errors\n",
+ "if export_task.has_errors():\n",
+ " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
+ " stream_handler=lambda error: print(error))\n",
+ "\n",
+ "if export_task.has_result():\n",
+ " export_json = export_task.get_buffered_stream(\n",
+ " stream_type=lb.StreamType.RESULT).start(\n",
+ " stream_handler=main\n",
+ " ) # Feeding our data row handler directly into export"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "pprint(GLOBAL_CSV_LIST)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "pprint(GLOBAL_CSV_LIST)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ],
- "cell_type": "markdown"
+ "The hard part is now completed!🚀 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
+ " # Columns\n",
+ " fieldnames = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
+ "\n",
+ " writer.writeheader()\n",
+ "\n",
+ " for row in GLOBAL_CSV_LIST:\n",
+ " writer.writerow(row)"
+ ]
},
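+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check, the freshly written `file.csv` can be read straight back into Pandas:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Read the CSV back to verify the rows were written as expected\n",
+ "pd.read_csv(\"file.csv\")"
+ ]
+ },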
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "columns = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.9"
}
- ]
-}
\ No newline at end of file
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
From f9f85ab96da981945ff1b6600143ac1efc9fdc21 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:12:10 -0500
Subject: [PATCH 13/19] redid readme
---
examples/README.md | 76 ----------------------------------------------
1 file changed, 76 deletions(-)
diff --git a/examples/README.md b/examples/README.md
index e1cda2598..666506bab 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -20,9 +20,6 @@
Basics |
 |
 |
- Basics |
-  |
-  |
Batches |
@@ -34,11 +31,6 @@
 |
 |
-
- Custom Embeddings |
-  |
-  |
-
Projects |
 |
@@ -48,9 +40,6 @@
User Management |
 |
 |
- User Management |
-  |
-  |
Data Row Metadata |
@@ -66,17 +55,11 @@
Ontologies |
 |
 |
- Ontologies |
-  |
-  |
Data Rows |
 |
 |
- Data Rows |
-  |
-  |
@@ -107,11 +90,6 @@
 |
 |
-
- Export V1 to V2 Migration Support |
-  |
-  |
-
@@ -146,11 +124,6 @@
 |
 |
-
- Model Chat Evaluation Project |
-  |
-  |
-
@@ -169,9 +142,6 @@
Audio |
 |
 |
- Audio |
-  |
-  |
Video |
@@ -187,9 +157,6 @@
Tiled |
 |
 |
- Tiled |
-  |
-  |
Conversational |
@@ -205,9 +172,6 @@
Conversational LLM Data Generation |
 |
 |
- Conversational LLM Data Generation |
-  |
-  |
DICOM |
@@ -218,9 +182,6 @@
Image |
 |
 |
- Image |
-  |
-  |
HTML |
@@ -231,9 +192,6 @@
Conversational LLM |
 |
 |
- Conversational LLM |
-  |
-  |
@@ -253,9 +211,6 @@
Meta SAM |
 |
 |
- Meta SAM |
-  |
-  |
Meta SAM Video |
@@ -267,14 +222,6 @@
 |
 |
-
- Huggingface Custom Embeddings |
-  |
-  |
- Langchain |
-  |
-  |
-
Huggingface Custom Embeddings |
 |
@@ -309,11 +256,6 @@
 |
 |
-
- Custom Metrics Demo |
-  |
-  |
-
Model Predictions to Project |
 |
@@ -347,9 +289,6 @@
HTML Predictions |
 |
 |
- HTML Predictions |
-  |
-  |
Conversational LLM Predictions |
@@ -366,16 +305,6 @@
 |
 |
-
- Geospatial Predictions |
-  |
-  |
-
-
- PDF Predictions |
-  |
-  |
-
Video Predictions |
 |
@@ -386,11 +315,6 @@
 |
 |
-
- Image Predictions |
-  |
-  |
-
From ab422cbdc52dfd884bc1e03a2f595a39d7943b00 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Wed, 5 Jun 2024 18:13:16 +0000
Subject: [PATCH 14/19] :art: Cleaned
---
examples/exports/exporting_to_csv.ipynb | 651 ++++--------------------
1 file changed, 108 insertions(+), 543 deletions(-)
diff --git a/examples/exports/exporting_to_csv.ipynb b/examples/exports/exporting_to_csv.ipynb
index 94fd8ed77..9ce718bce 100644
--- a/examples/exports/exporting_to_csv.ipynb
+++ b/examples/exports/exporting_to_csv.ipynb
@@ -1,16 +1,18 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {},
"cells": [
{
- "cell_type": "markdown",
"metadata": {},
"source": [
- "\n",
- " \n",
+ " | ",
+ " ",
" | \n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -22,19 +24,19 @@
" \n",
" | "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
"This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://github.com/Labelbox/labelpandas) friendly format. "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -42,276 +44,83 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ]
+ ],
+ "cell_type": "markdown"
},
{
+ "metadata": {},
+ "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
"cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Note: you may need to restart the kernel to use updated packages.\n",
- "Note: you may need to restart the kernel to use updated packages.\n"
- ]
- }
- ],
- "source": [
- "%pip install -q --upgrade \"Labelbox[data]\"\n",
- "%pip install -q pandas"
- ]
+ "outputs": [],
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "import labelbox as lb\n",
- "import labelbox.types as lb_types\n",
- "import uuid\n",
- "from pprint import pprint\n",
- "import csv\n",
- "import pandas as pd"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "API_KEY = None\n",
- "client = lb.Client(api_key=API_KEY)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
"The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row towards our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # paginated collection of data row objects, list of data row ids or global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and imported it towards our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n 
end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Create dataset with image data row\n",
- "global_key = str(uuid.uuid4())\n",
- "\n",
- "test_img_url = {\n",
- " \"row_data\":\n",
- " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
- " \"global_key\":\n",
- " global_key,\n",
- "}\n",
- "\n",
- "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
- "task = dataset.create_data_rows([test_img_url])\n",
- "task.wait_till_done()\n",
- "print(\"Errors:\", task.errors)\n",
- "print(\"Failed data rows:\", task.failed_data_rows)\n",
- "\n",
- "# Create ontology\n",
- "ontology_builder = lb.OntologyBuilder(\n",
- " classifications=[ # List of Classification objects\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"radio_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_radio_answer\"),\n",
- " lb.Option(value=\"second_radio_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.CHECKLIST,\n",
- " name=\"checklist_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_checklist_answer\"),\n",
- " lb.Option(value=\"second_checklist_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
- " name=\"free_text\"),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"nested_radio_question\",\n",
- " options=[\n",
- " lb.Option(\n",
- " \"first_radio_answer\",\n",
- " options=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(\"first_sub_radio_answer\")],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " ],\n",
- " tools=[ # List of Tool objects\n",
- " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
- " lb.Tool(\n",
- " tool=lb.Tool.Type.BBOX,\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " classifications=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
- " ),\n",
- " ],\n",
- " ),\n",
- " ],\n",
- ")\n",
- "\n",
- "ontology = client.create_ontology(\n",
- " \"Image CSV Demo Ontology\",\n",
- " ontology_builder.asdict(),\n",
- " media_type=lb.MediaType.Image,\n",
- ")\n",
- "\n",
- "# Set up project and connect ontology\n",
- "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
- " media_type=lb.MediaType.Image)\n",
- "project.setup_editor(ontology)\n",
- "\n",
- "# Send data row towards our project\n",
- "batch = project.create_batch(\n",
- " \"image-demo-batch\",\n",
- " global_keys=[\n",
- " global_key\n",
- " ], # paginated collection of data row objects, list of data row ids or global keys\n",
- " priority=1,\n",
- ")\n",
- "\n",
- "print(f\"Batch: {batch}\")\n",
- "\n",
- "# Create a label and imported it towards our project\n",
- "radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"second_radio_answer\")),\n",
- ")\n",
- "checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
- " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
- " ]),\n",
- ")\n",
- "text_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"free_text\",\n",
- " value=lb_types.Text(answer=\"sample text\"),\n",
- ")\n",
- "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- " )),\n",
- ")\n",
- "bbox_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bounding_box\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=1690, y=977),\n",
- " end=lb_types.Point(x=1915, y=1307),\n",
- " ),\n",
- ")\n",
- "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
- " end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n",
- " ),\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"tool_first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- ")\n",
- "\n",
- "label = []\n",
- "annotations = [\n",
- " radio_annotation,\n",
- " nested_radio_annotation,\n",
- " checklist_annotation,\n",
- " text_annotation,\n",
- " bbox_annotation,\n",
- " bbox_with_radio_subclass_annotation,\n",
- "]\n",
- "\n",
- "label.append(\n",
- " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
- "\n",
- "upload_job = lb.LabelImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=project.uid,\n",
- " name=\"label_import_job\" + str(uuid.uuid4()),\n",
- " labels=label,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# PROJECT_ID = None\n",
- "# project = client.get_project(PROJECT_ID)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -328,10 +137,10 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
@@ -346,10 +155,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -357,191 +166,93 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "data_row_base_columns = [\n",
- " \"Data Row ID\",\n",
- " \"Global Key\",\n",
- " \"External ID\",\n",
- " \"Project ID\",\n",
- "]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_Id` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_classification_features(classifications: list, class_list=[]) -> None:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> None:\n \"\"\"Creates list of tool names from ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_classification_features(classifications: list, class_list=[]) -> None:\n",
- " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
- " for classification in classifications:\n",
- " if \"name\" in classification:\n",
- " class_list.append({\n",
- " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
- " \"column_name\": classification[\"instructions\"],\n",
- " })\n",
- " if \"options\" in classification:\n",
- " get_classification_features(classification[\"options\"], class_list)\n",
- " return class_list\n",
- "\n",
- "\n",
- "def get_tool_features(tools: list) -> None:\n",
- " \"\"\"Creates list of tool names from ontology\"\"\"\n",
- " tool_list = []\n",
- " for tool in tools:\n",
- " tool_list.append({\n",
- " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
- " \"column_name\": tool[\"name\"],\n",
- " })\n",
- " if \"classifications\" in tool:\n",
- " tool_list = get_classification_features(tool[\"classifications\"],\n",
- " tool_list)\n",
- " return tool_list"
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Get ontology from project and normalized towards python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])\n\npprint(class_annotation_columns)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Get ontology from project and normalized towards python dictionary\n",
- "ontology = project.ontology().normalized\n",
- "\n",
- "class_annotation_columns = get_classification_features(\n",
- " ontology[\"classifications\"])\n",
- "tool_annotation_columns = get_tool_features(ontology[\"tools\"])\n",
- "\n",
- "pprint(class_annotation_columns)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n base_columns: list[str]) -> dict[str:str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n",
- " base_columns: list[str]) -> dict[str:str]:\n",
- " for base_column in base_columns:\n",
- " if base_column == \"Data Row ID\":\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
- "\n",
- " elif base_column == \"Global Key\":\n",
- " if (\"global_key\"\n",
- " in data_row[\"data_row\"]): # Check if global key exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"External ID\":\n",
- " if (\"external_id\"\n",
- " in data_row[\"data_row\"]): # Check if external_id exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"Project ID\":\n",
- " csv_row[base_column] = project.uid\n",
- " return csv_row"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n label_base_columns: list[str]) -> dict[str:str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_columns == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details is present. This field can be omitted in export\n csv_row[label_base_column] = label_base_columns[\n \"label_details\"][\"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n",
- " label_base_columns: list[str]) -> dict[str:str]:\n",
- " for label_base_column in label_base_columns:\n",
- " if label_base_column == \"Label ID\":\n",
- " csv_row[label_base_column] = label[\"id\"]\n",
- "\n",
- " elif label_base_columns == \"Created By\":\n",
- " if (\n",
- " \"label_details\" in label\n",
- " ): # Check if label details is present. This field can be omitted in export\n",
- " csv_row[label_base_column] = label_base_columns[\n",
- " \"label_details\"][\"created_by\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " elif label_base_column == \"Skipped\":\n",
- " if (\n",
- " \"performance_details\" in label\n",
- " ): # Check if performance details are present. This field can be omitted in export.\n",
- " csv_row[label_base_column] = label[\"performance_details\"][\n",
- " \"skipped\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " return csv_row"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -560,242 +271,96 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "from pprint import pprint\n\n\ndef get_feature_answers(feature: str,\n annotations: list[dict[str:str]]) -> None | str:\n \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n\n Args:\n feature (str): feature we are searching\n classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n\n Returns:\n None | str: The answer/value of the feature returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n print(annotation)\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call function again return value if answer found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # case for if tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "from pprint import pprint\n",
- "\n",
- "\n",
- "def get_feature_answers(feature: str,\n",
- " annotations: list[dict[str:str]]) -> None | str:\n",
- " \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n",
- "\n",
- " Args:\n",
- " feature (str): feature we are searching\n",
- " classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n",
- "\n",
- " Returns:\n",
- " None | str: The answer/value of the feature returns None if nothing is found\n",
- " \"\"\"\n",
- " for annotation in annotations:\n",
- " print(annotation)\n",
- " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
- " ): # Base conditions (found feature)\n",
- " if \"text_answer\" in annotation:\n",
- " return annotation[\"text_answer\"][\"content\"]\n",
- " if \"radio_answer\" in annotation:\n",
- " return annotation[\"radio_answer\"][\"value\"]\n",
- " if \"checklist_answers\" in annotation:\n",
- " # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n",
- " return \", \".join([\n",
- " check_list_ans[\"value\"]\n",
- " for check_list_ans in annotation[\"checklist_answers\"]\n",
- " ])\n",
- " if \"bounding_box\" in annotation:\n",
- " return annotation[\"bounding_box\"]\n",
- " # Add more tools here with similar pattern as above\n",
- "\n",
- " # Recursion cases (found more classifications to search through)\n",
- " if \"radio_answer\" in annotation:\n",
- " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
- " ) # Call function again return value if answer found\n",
- " if value:\n",
- " return value\n",
- " if \"checklist_answers\" in annotation:\n",
- " for checklist_ans in annotation[\"checklist_answers\"]:\n",
- " if len(checklist_ans[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, checklist_ans[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- " if (\"classifications\"\n",
- " in annotation): # case for if tool has classifications\n",
- " if len(annotation[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(feature,\n",
- " annotation[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- "\n",
- " return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n pprint(label)\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n pprint(tool_annotation_columns)\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "GLOBAL_CSV_LIST = []\n",
- "\n",
- "\n",
- "def main(output: lb.BufferedJsonConverterOutput):\n",
- "\n",
- " # Navigate to our label list\n",
- " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
- " for label in labels:\n",
- " # Define our CSV \"row\"\n",
- " csv_row = dict()\n",
- "\n",
- " # Start with data row base columns\n",
- " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
- " data_row_base_columns)\n",
- "\n",
- " # Add our label details\n",
- " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
- "\n",
- " pprint(label)\n",
- " # Add classification features\n",
- " for classification in class_annotation_columns:\n",
- " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
- " classification, label[\"annotations\"][\"classifications\"])\n",
- "\n",
- " pprint(tool_annotation_columns)\n",
- " # Add tools features\n",
- " for tool in tool_annotation_columns:\n",
- " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
- " tool, label[\"annotations\"][\"objects\"])\n",
- "\n",
- " # Append to global csv list\n",
- " GLOBAL_CSV_LIST.append(csv_row)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT).start(\n stream_handler=main\n ) # Feeding our data row handler directly into export",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Params required to obtain all fields we need\n",
- "params = {\"performance_details\": True, \"label_details\": True}\n",
- "\n",
- "export_task = project.export(params=params)\n",
- "export_task.wait_till_done()\n",
- "\n",
- "# Conditional for if export task has errors\n",
- "if export_task.has_errors():\n",
- " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
- " stream_handler=lambda error: print(error))\n",
- "\n",
- "if export_task.has_result():\n",
- " export_json = export_task.get_buffered_stream(\n",
- " stream_type=lb.StreamType.RESULT).start(\n",
- " stream_handler=main\n",
- " ) # Feeding our data row handler directly into export"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "pprint(GLOBAL_CSV_LIST)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "pprint(GLOBAL_CSV_LIST)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!🚀 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ]
+ "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
- " # Columns\n",
- " fieldnames = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
- "\n",
- " writer.writeheader()\n",
- "\n",
- " for row in GLOBAL_CSV_LIST:\n",
- " writer.writerow(row)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "columns = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.9"
+ "execution_count": null
}
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ ]
+}
\ No newline at end of file
From 8f2a4f4a6643982bd00870c5b75a0e466d39879d Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Wed, 5 Jun 2024 18:13:54 +0000
Subject: [PATCH 15/19] :memo: README updated
---
examples/README.md | 152 ++++++++++++++++++++++-----------------------
1 file changed, 76 insertions(+), 76 deletions(-)
diff --git a/examples/README.md b/examples/README.md
index 666506bab..faf0b39a2 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -17,29 +17,29 @@
- Basics |
-  |
-  |
+ Ontologies |
+  |
+  |
+
+
+ Data Rows |
+  |
+  |
Batches |
 |
 |
-
- Custom Embeddings |
-  |
-  |
-
Projects |
 |
 |
- User Management |
-  |
-  |
+ Custom Embeddings |
+  |
+  |
Data Row Metadata |
@@ -52,14 +52,14 @@
 |
- Ontologies |
-  |
-  |
+ Basics |
+  |
+  |
- Data Rows |
-  |
-  |
+ User Management |
+  |
+  |
@@ -75,11 +75,6 @@
-
- Exporting to CSV |
-  |
-  |
-
Composite Mask Export |
 |
@@ -90,6 +85,11 @@
 |
 |
+
+ Exporting to CSV |
+  |
+  |
+
@@ -104,6 +104,11 @@
+
+ Live Multimodal Chat Project |
+  |
+  |
+
Project Setup |
 |
@@ -114,11 +119,6 @@
 |
 |
-
- Live Multimodal Chat Project |
-  |
-  |
-
Queue Management |
 |
@@ -139,9 +139,9 @@
- Audio |
-  |
-  |
+ Conversational LLM Data Generation |
+  |
+  |
Video |
@@ -154,9 +154,9 @@
 |
- Tiled |
-  |
-  |
+ Audio |
+  |
+  |
Conversational |
@@ -169,9 +169,9 @@
 |
- Conversational LLM Data Generation |
-  |
-  |
+ Image |
+  |
+  |
DICOM |
@@ -179,9 +179,9 @@
 |
- Image |
-  |
-  |
+ Conversational LLM |
+  |
+  |
HTML |
@@ -189,9 +189,9 @@
 |
- Conversational LLM |
-  |
-  |
+ Tiled |
+  |
+  |
@@ -208,14 +208,9 @@
- Meta SAM |
-  |
-  |
-
-
- Meta SAM Video |
-  |
-  |
+ Huggingface Custom Embeddings |
+  |
+  |
Langchain |
@@ -223,9 +218,14 @@
 |
- Huggingface Custom Embeddings |
-  |
-  |
+ Meta SAM Video |
+  |
+  |
+
+
+ Meta SAM |
+  |
+  |
@@ -241,6 +241,11 @@
+
+ Custom Metrics Demo |
+  |
+  |
+
Model Slices |
 |
@@ -251,11 +256,6 @@
 |
 |
-
- Custom Metrics Demo |
-  |
-  |
-
Model Predictions to Project |
 |
@@ -275,46 +275,46 @@
+
+ PDF Predictions |
+  |
+  |
+
+
+ HTML Predictions |
+  |
+  |
+
Conversational Predictions |
 |
 |
+
+ Image Predictions |
+  |
+  |
+
Text Predictions |
 |
 |
-
- HTML Predictions |
-  |
-  |
-
-
- Conversational LLM Predictions |
-  |
-  |
-
Geospatial Predictions |
 |
 |
- PDF Predictions |
-  |
-  |
+ Conversational LLM Predictions |
+  |
+  |
Video Predictions |
 |
 |
-
- Image Predictions |
-  |
-  |
-
From b853e230ff9523ff76e39d13b1bed51931bd3448 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:43:15 -0500
Subject: [PATCH 16/19] typos
---
examples/exports/exporting_to_csv.ipynb | 628 ++++++++++++++++++++----
1 file changed, 519 insertions(+), 109 deletions(-)
diff --git a/examples/exports/exporting_to_csv.ipynb b/examples/exports/exporting_to_csv.ipynb
index 9ce718bce..dcee03f6a 100644
--- a/examples/exports/exporting_to_csv.ipynb
+++ b/examples/exports/exporting_to_csv.ipynb
@@ -1,18 +1,16 @@
{
- "nbformat": 4,
- "nbformat_minor": 2,
- "metadata": {},
"cells": [
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "",
- " ",
+ " | \n",
+ " \n",
" | \n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -24,19 +22,19 @@
" \n",
" | "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
- "This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://github.com/Labelbox/labelpandas) friendly format. "
- ],
- "cell_type": "markdown"
+ "This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://pandas.pydata.org/) friendly format. "
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -44,83 +42,267 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "%pip install -q --upgrade \"Labelbox[data]\"\n",
+ "%pip install -q pandas"
+ ]
},
{
- "metadata": {},
- "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "import labelbox as lb\n",
+ "import labelbox.types as lb_types\n",
+ "import uuid\n",
+ "from pprint import pprint\n",
+ "import csv\n",
+ "import pandas as pd"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "API_KEY = None\n",
+ "client = lb.Client(api_key=API_KEY)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
"The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row towards our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # paginated collection of data row objects, list of data row ids or global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and imported it towards our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n 
end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Create dataset with image data row\n",
+ "global_key = str(uuid.uuid4())\n",
+ "\n",
+ "test_img_url = {\n",
+ " \"row_data\":\n",
+ " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
+ " \"global_key\":\n",
+ " global_key,\n",
+ "}\n",
+ "\n",
+ "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
+ "task = dataset.create_data_rows([test_img_url])\n",
+ "task.wait_till_done()\n",
+ "print(\"Errors:\", task.errors)\n",
+ "print(\"Failed data rows:\", task.failed_data_rows)\n",
+ "\n",
+ "# Create ontology\n",
+ "ontology_builder = lb.OntologyBuilder(\n",
+ " classifications=[ # List of Classification objects\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"radio_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_radio_answer\"),\n",
+ " lb.Option(value=\"second_radio_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.CHECKLIST,\n",
+ " name=\"checklist_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_checklist_answer\"),\n",
+ " lb.Option(value=\"second_checklist_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
+ " name=\"free_text\"),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"nested_radio_question\",\n",
+ " options=[\n",
+ " lb.Option(\n",
+ " \"first_radio_answer\",\n",
+ " options=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(\"first_sub_radio_answer\")],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ " tools=[ # List of Tool objects\n",
+ " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
+ " lb.Tool(\n",
+ " tool=lb.Tool.Type.BBOX,\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " classifications=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
+ " ),\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "ontology = client.create_ontology(\n",
+ " \"Image CSV Demo Ontology\",\n",
+ " ontology_builder.asdict(),\n",
+ " media_type=lb.MediaType.Image,\n",
+ ")\n",
+ "\n",
+ "# Set up project and connect ontology\n",
+ "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
+ " media_type=lb.MediaType.Image)\n",
+ "project.setup_editor(ontology)\n",
+ "\n",
+ "# Send data row towards our project\n",
+ "batch = project.create_batch(\n",
+ " \"image-demo-batch\",\n",
+ " global_keys=[\n",
+ " global_key\n",
+ " ], # paginated collection of data row objects, list of data row ids or global keys\n",
+ " priority=1,\n",
+ ")\n",
+ "\n",
+ "print(f\"Batch: {batch}\")\n",
+ "\n",
+ "# Create a label and imported it towards our project\n",
+ "radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"second_radio_answer\")),\n",
+ ")\n",
+ "checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
+ " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
+ " ]),\n",
+ ")\n",
+ "text_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"free_text\",\n",
+ " value=lb_types.Text(answer=\"sample text\"),\n",
+ ")\n",
+ "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ " )),\n",
+ ")\n",
+ "bbox_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bounding_box\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=1690, y=977),\n",
+ " end=lb_types.Point(x=1915, y=1307),\n",
+ " ),\n",
+ ")\n",
+ "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
+ " end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n",
+ " ),\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"tool_first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "label = []\n",
+ "annotations = [\n",
+ " radio_annotation,\n",
+ " nested_radio_annotation,\n",
+ " checklist_annotation,\n",
+ " text_annotation,\n",
+ " bbox_annotation,\n",
+ " bbox_with_radio_subclass_annotation,\n",
+ "]\n",
+ "\n",
+ "label.append(\n",
+ " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
+ "\n",
+ "upload_job = lb.LabelImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=project.uid,\n",
+ " name=\"label_import_job\" + str(uuid.uuid4()),\n",
+ " labels=label,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# PROJECT_ID = None\n",
+ "# project = client.get_project(PROJECT_ID)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -137,10 +319,10 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ],
- "cell_type": "markdown"
+ ]
},
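As a quick illustration of that target shape, a minimal sketch (the rows and file name below are made-up examples, not real export values):

    import csv

    # One dict per row, one key per column, one value per cell.
    rows = [
        {"Data Row ID": "ck111", "Global Key": "key-a", "Skipped": False},
        {"Data Row ID": "ck222", "Global Key": None, "Skipped": True},
    ]

    with open("example.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Data Row ID", "Global Key", "Skipped"])
        writer.writeheader()
        writer.writerows(rows)  # None is written as an empty cell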
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
@@ -155,10 +337,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ],
- "cell_type": "markdown"
+ ]
},
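For orientation while following the steps below, a rough sketch of the nesting this guide navigates (keys abbreviated and values made up; see the sample export linked in Step 3 for the authoritative structure):

    # Abbreviated shape of one exported data row:
    export_row = {
        "data_row": {"id": "ck111", "global_key": "key-a", "external_id": "img.jpg"},
        "projects": {
            "<project_id>": {
                "labels": [{
                    "id": "label-1",
                    "annotations": {"classifications": [...], "objects": [...]},
                }],
            },
        },
    }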
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -166,93 +348,189 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "data_row_base_columns = [\n",
+ " \"Data Row ID\",\n",
+ " \"Global Key\",\n",
+ " \"External ID\",\n",
+ " \"Project ID\",\n",
+ "]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_Id` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ],
- "cell_type": "markdown"
+ "We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_ids` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
+ ]
},
{
- "metadata": {},
- "source": "def get_classification_features(classifications: list, class_list=[]) -> None:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> None:\n \"\"\"Creates list of tool names from ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_classification_features(classifications: list, class_list=[]) -> None:\n",
+ " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
+ " for classification in classifications:\n",
+ " if \"name\" in classification:\n",
+ " class_list.append({\n",
+ " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
+ " \"column_name\": classification[\"instructions\"],\n",
+ " })\n",
+ " if \"options\" in classification:\n",
+ " get_classification_features(classification[\"options\"], class_list)\n",
+ " return class_list\n",
+ "\n",
+ "\n",
+ "def get_tool_features(tools: list) -> None:\n",
+ " \"\"\"Creates list of tool names from ontology\"\"\"\n",
+ " tool_list = []\n",
+ " for tool in tools:\n",
+ " tool_list.append({\n",
+ " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
+ " \"column_name\": tool[\"name\"],\n",
+ " })\n",
+ " if \"classifications\" in tool:\n",
+ " tool_list = get_classification_features(tool[\"classifications\"],\n",
+ " tool_list)\n",
+ " return tool_list"
+ ]
},
{
- "metadata": {},
- "source": "# Get ontology from project and normalized towards python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])\n\npprint(class_annotation_columns)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Get ontology from project and normalized towards python dictionary\n",
+ "ontology = project.ontology().normalized\n",
+ "\n",
+ "class_annotation_columns = get_classification_features(\n",
+ " ontology[\"classifications\"])\n",
+ "tool_annotation_columns = get_tool_features(ontology[\"tools\"])"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n base_columns: list[str]) -> dict[str:str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n",
+ " base_columns: list[str]) -> dict[str:str]:\n",
+ " for base_column in base_columns:\n",
+ " if base_column == \"Data Row ID\":\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
+ "\n",
+ " elif base_column == \"Global Key\":\n",
+ " if (\"global_key\"\n",
+ " in data_row[\"data_row\"]): # Check if global key exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"External ID\":\n",
+ " if (\"external_id\"\n",
+ " in data_row[\"data_row\"]): # Check if external_id exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"Project ID\":\n",
+ " csv_row[base_column] = project.uid\n",
+ " return csv_row"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n label_base_columns: list[str]) -> dict[str:str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_columns == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details is present. This field can be omitted in export\n csv_row[label_base_column] = label_base_columns[\n \"label_details\"][\"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n",
+ " label_base_columns: list[str]) -> dict[str:str]:\n",
+ " for label_base_column in label_base_columns:\n",
+ " if label_base_column == \"Label ID\":\n",
+ " csv_row[label_base_column] = label[\"id\"]\n",
+ "\n",
+ " elif label_base_columns == \"Created By\":\n",
+ " if (\n",
+ " \"label_details\" in label\n",
+ " ): # Check if label details is present. This field can be omitted in export.\n",
+ " csv_row[label_base_column] = label_base_columns[\n",
+ " \"label_details\"][\"created_by\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " elif label_base_column == \"Skipped\":\n",
+ " if (\n",
+ " \"performance_details\" in label\n",
+ " ): # Check if performance details are present. This field can be omitted in export.\n",
+ " csv_row[label_base_column] = label[\"performance_details\"][\n",
+ " \"skipped\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " return csv_row"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -271,96 +549,228 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "from pprint import pprint\n\n\ndef get_feature_answers(feature: str,\n annotations: list[dict[str:str]]) -> None | str:\n \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n\n Args:\n feature (str): feature we are searching\n classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n\n Returns:\n None | str: The answer/value of the feature returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n print(annotation)\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call function again return value if answer found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # case for if tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "from pprint import pprint\n",
+ "\n",
+ "\n",
+ "def get_feature_answers(feature: str,\n",
+ " annotations: list[dict[str:str]]) -> None | str:\n",
+ " \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n",
+ "\n",
+ " Args:\n",
+ " feature (str): feature we are searching\n",
+ " classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n",
+ "\n",
+ " Returns:\n",
+ " None | str: The answer/value of the feature returns None if nothing is found\n",
+ " \"\"\"\n",
+ " for annotation in annotations:\n",
+ " print(annotation)\n",
+ " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
+ " ): # Base conditions (found feature)\n",
+ " if \"text_answer\" in annotation:\n",
+ " return annotation[\"text_answer\"][\"content\"]\n",
+ " if \"radio_answer\" in annotation:\n",
+ " return annotation[\"radio_answer\"][\"value\"]\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n",
+ " return \", \".join([\n",
+ " check_list_ans[\"value\"]\n",
+ " for check_list_ans in annotation[\"checklist_answers\"]\n",
+ " ])\n",
+ " if \"bounding_box\" in annotation:\n",
+ " return annotation[\"bounding_box\"]\n",
+ " # Add more tools here with similar pattern as above\n",
+ "\n",
+ " # Recursion cases (found more classifications to search through)\n",
+ " if \"radio_answer\" in annotation:\n",
+ " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
+ " ) # Call function again return value if answer found\n",
+ " if value:\n",
+ " return value\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " for checklist_ans in annotation[\"checklist_answers\"]:\n",
+ " if len(checklist_ans[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, checklist_ans[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ " if (\"classifications\"\n",
+ " in annotation): # case for if tool has classifications\n",
+ " if len(annotation[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(feature,\n",
+ " annotation[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ "\n",
+ " return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize"
+ ]
},
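As a quick sanity check of `get_feature_answers`, a hypothetical call (the schema id and annotation below are made-up stand-ins for what Step 2 and the export produce):

    feature = {"feature_schema_id": "cka0001", "column_name": "radio_question"}
    annotations = [{
        "feature_schema_id": "cka0001",
        "radio_answer": {"value": "second_radio_answer", "classifications": []},
    }]
    print(get_feature_answers(feature, annotations))  # -> second_radio_answer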
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n pprint(label)\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n pprint(tool_annotation_columns)\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "GLOBAL_CSV_LIST = []\n",
+ "\n",
+ "\n",
+ "def main(output: lb.BufferedJsonConverterOutput):\n",
+ "\n",
+ " # Navigate to our label list\n",
+ " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
+ " for label in labels:\n",
+ " # Define our CSV \"row\"\n",
+ " csv_row = dict()\n",
+ "\n",
+ " # Start with data row base columns\n",
+ " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
+ " data_row_base_columns)\n",
+ "\n",
+ " # Add our label details\n",
+ " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
+ "\n",
+ " pprint(label)\n",
+ " # Add classification features\n",
+ " for classification in class_annotation_columns:\n",
+ " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
+ " classification, label[\"annotations\"][\"classifications\"])\n",
+ "\n",
+ " pprint(tool_annotation_columns)\n",
+ " # Add tools features\n",
+ " for tool in tool_annotation_columns:\n",
+ " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
+ " tool, label[\"annotations\"][\"objects\"])\n",
+ "\n",
+ " # Append to global csv list\n",
+ " GLOBAL_CSV_LIST.append(csv_row)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT).start(\n stream_handler=main\n ) # Feeding our data row handler directly into export",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Params required to obtain all fields we need\n",
+ "params = {\"performance_details\": True, \"label_details\": True}\n",
+ "\n",
+ "export_task = project.export(params=params)\n",
+ "export_task.wait_till_done()\n",
+ "\n",
+ "# Conditional for if export task has errors\n",
+ "if export_task.has_errors():\n",
+ " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
+ " stream_handler=lambda error: print(error))\n",
+ "\n",
+ "if export_task.has_result():\n",
+ " export_json = export_task.get_buffered_stream(\n",
+ " stream_type=lb.StreamType.RESULT).start(\n",
+ " stream_handler=main # Feeding our data row handler directly into export\n",
+ " )"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "pprint(GLOBAL_CSV_LIST)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "pprint(GLOBAL_CSV_LIST)"
+ ]
},
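For orientation, each entry in `GLOBAL_CSV_LIST` is one flat dictionary per label. A hypothetical sketch of what a single row could look like for the demo ontology above (all IDs, column names, and values below are made-up placeholders, not real export output):

```python
# Illustrative only -- placeholder IDs and values, shaped like one flattened "row"
example_row = {
    "Data Row ID": "cl...datarow",        # data_row["id"]
    "Global Key": "8b0a...uuid",          # global key set at upload time
    "External ID": None,                  # None becomes a blank CSV cell
    "Project ID": "cl...project",
    "Label ID": "cl...label",
    "Created By": "user@example.com",
    "Skipped": False,
    "radio_question": "second_radio_answer",
    "checklist_question": "first_checklist_answer, second_checklist_answer",
    "free_text": "sample text",
    "bounding_box": {"top": 977, "left": 1690, "height": 330, "width": 225},
}
```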
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ],
- "cell_type": "markdown"
+ "The hard part is now completed!🚀 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
+ " # Columns\n",
+ " fieldnames = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
+ "\n",
+ " writer.writeheader()\n",
+ "\n",
+ " for row in GLOBAL_CSV_LIST:\n",
+ " writer.writerow(row)"
+ ]
},
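A possible hardening of the writer above, in case some rows are missing keys or carry unexpected extras; `restval` and `extrasaction` are standard `csv.DictWriter` parameters (this sketch assumes the `fieldnames` and `GLOBAL_CSV_LIST` defined above):

```python
import csv

# Variant of the writer above: fill absent columns with an empty string and
# skip unexpected keys instead of raising ValueError
with open("file.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(GLOBAL_CSV_LIST)  # writerows handles the whole list at once
```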
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "columns = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
+ ]
+ }
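As a follow-up, the DataFrame route can also write the CSV file directly; `to_csv` is standard pandas:

```python
df = pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)
df.to_csv("file.csv", index=False)  # index=False drops pandas' numeric row index
```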
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
}
- ]
-}
\ No newline at end of file
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
From 693e2c8d6995283b7caaaa1d3b30074931a6a960 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Wed, 5 Jun 2024 18:44:15 +0000
Subject: [PATCH 17/19] :art: Cleaned
---
examples/exports/exporting_to_csv.ipynb | 624 ++++--------------------
1 file changed, 107 insertions(+), 517 deletions(-)
diff --git a/examples/exports/exporting_to_csv.ipynb b/examples/exports/exporting_to_csv.ipynb
index dcee03f6a..a09e3a9ee 100644
--- a/examples/exports/exporting_to_csv.ipynb
+++ b/examples/exports/exporting_to_csv.ipynb
@@ -1,16 +1,18 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {},
"cells": [
{
- "cell_type": "markdown",
"metadata": {},
"source": [
- "\n",
- " \n",
+ " | ",
+ " ",
" | \n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -22,19 +24,19 @@
" \n",
" | "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
"This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://pandas.pydata.org/) friendly format. "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -42,267 +44,83 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "%pip install -q --upgrade \"Labelbox[data]\"\n",
- "%pip install -q pandas"
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "import labelbox as lb\n",
- "import labelbox.types as lb_types\n",
- "import uuid\n",
- "from pprint import pprint\n",
- "import csv\n",
- "import pandas as pd"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "API_KEY = None\n",
- "client = lb.Client(api_key=API_KEY)"
- ]
+ "execution_count": null
},
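If you would rather keep the key out of the notebook, one common pattern is reading it from an environment variable; a minimal sketch, assuming you exported `LABELBOX_API_KEY` in your shell beforehand (the variable name is your choice):

```python
import os

# Assumes the key was exported beforehand, e.g. `export LABELBOX_API_KEY=...`
API_KEY = os.environ.get("LABELBOX_API_KEY")
client = lb.Client(api_key=API_KEY)
```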
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
"The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row towards our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # paginated collection of data row objects, list of data row ids or global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and imported it towards our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n 
end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Create dataset with image data row\n",
- "global_key = str(uuid.uuid4())\n",
- "\n",
- "test_img_url = {\n",
- " \"row_data\":\n",
- " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
- " \"global_key\":\n",
- " global_key,\n",
- "}\n",
- "\n",
- "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
- "task = dataset.create_data_rows([test_img_url])\n",
- "task.wait_till_done()\n",
- "print(\"Errors:\", task.errors)\n",
- "print(\"Failed data rows:\", task.failed_data_rows)\n",
- "\n",
- "# Create ontology\n",
- "ontology_builder = lb.OntologyBuilder(\n",
- " classifications=[ # List of Classification objects\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"radio_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_radio_answer\"),\n",
- " lb.Option(value=\"second_radio_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.CHECKLIST,\n",
- " name=\"checklist_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_checklist_answer\"),\n",
- " lb.Option(value=\"second_checklist_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
- " name=\"free_text\"),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"nested_radio_question\",\n",
- " options=[\n",
- " lb.Option(\n",
- " \"first_radio_answer\",\n",
- " options=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(\"first_sub_radio_answer\")],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " ],\n",
- " tools=[ # List of Tool objects\n",
- " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
- " lb.Tool(\n",
- " tool=lb.Tool.Type.BBOX,\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " classifications=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
- " ),\n",
- " ],\n",
- " ),\n",
- " ],\n",
- ")\n",
- "\n",
- "ontology = client.create_ontology(\n",
- " \"Image CSV Demo Ontology\",\n",
- " ontology_builder.asdict(),\n",
- " media_type=lb.MediaType.Image,\n",
- ")\n",
- "\n",
- "# Set up project and connect ontology\n",
- "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
- " media_type=lb.MediaType.Image)\n",
- "project.setup_editor(ontology)\n",
- "\n",
- "# Send data row towards our project\n",
- "batch = project.create_batch(\n",
- " \"image-demo-batch\",\n",
- " global_keys=[\n",
- " global_key\n",
- " ], # paginated collection of data row objects, list of data row ids or global keys\n",
- " priority=1,\n",
- ")\n",
- "\n",
- "print(f\"Batch: {batch}\")\n",
- "\n",
- "# Create a label and imported it towards our project\n",
- "radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"second_radio_answer\")),\n",
- ")\n",
- "checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
- " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
- " ]),\n",
- ")\n",
- "text_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"free_text\",\n",
- " value=lb_types.Text(answer=\"sample text\"),\n",
- ")\n",
- "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- " )),\n",
- ")\n",
- "bbox_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bounding_box\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=1690, y=977),\n",
- " end=lb_types.Point(x=1915, y=1307),\n",
- " ),\n",
- ")\n",
- "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
- " end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n",
- " ),\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"tool_first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- ")\n",
- "\n",
- "label = []\n",
- "annotations = [\n",
- " radio_annotation,\n",
- " nested_radio_annotation,\n",
- " checklist_annotation,\n",
- " text_annotation,\n",
- " bbox_annotation,\n",
- " bbox_with_radio_subclass_annotation,\n",
- "]\n",
- "\n",
- "label.append(\n",
- " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
- "\n",
- "upload_job = lb.LabelImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=project.uid,\n",
- " name=\"label_import_job\" + str(uuid.uuid4()),\n",
- " labels=label,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# PROJECT_ID = None\n",
- "# project = client.get_project(PROJECT_ID)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -319,10 +137,10 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
@@ -337,10 +155,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -348,189 +166,93 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "data_row_base_columns = [\n",
- " \"Data Row ID\",\n",
- " \"Global Key\",\n",
- " \"External ID\",\n",
- " \"Project ID\",\n",
- "]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_ids` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_classification_features(classifications: list, class_list=[]) -> None:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> None:\n \"\"\"Creates list of tool names from ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_classification_features(classifications: list, class_list=[]) -> None:\n",
- " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
- " for classification in classifications:\n",
- " if \"name\" in classification:\n",
- " class_list.append({\n",
- " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
- " \"column_name\": classification[\"instructions\"],\n",
- " })\n",
- " if \"options\" in classification:\n",
- " get_classification_features(classification[\"options\"], class_list)\n",
- " return class_list\n",
- "\n",
- "\n",
- "def get_tool_features(tools: list) -> None:\n",
- " \"\"\"Creates list of tool names from ontology\"\"\"\n",
- " tool_list = []\n",
- " for tool in tools:\n",
- " tool_list.append({\n",
- " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
- " \"column_name\": tool[\"name\"],\n",
- " })\n",
- " if \"classifications\" in tool:\n",
- " tool_list = get_classification_features(tool[\"classifications\"],\n",
- " tool_list)\n",
- " return tool_list"
- ]
+ "execution_count": null
},
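To make the mapping concrete, both helpers return a list of small dictionaries in this shape (the schema IDs below are placeholders; real ones come from your ontology):

```python
# Hypothetical output of get_classification_features / get_tool_features
class_annotation_columns = [
    {"feature_schema_id": "cl...radio", "column_name": "radio_question"},
    {"feature_schema_id": "cl...checklist", "column_name": "checklist_question"},
]
```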
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Get ontology from project and normalized towards python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Get ontology from project and normalized towards python dictionary\n",
- "ontology = project.ontology().normalized\n",
- "\n",
- "class_annotation_columns = get_classification_features(\n",
- " ontology[\"classifications\"])\n",
- "tool_annotation_columns = get_tool_features(ontology[\"tools\"])"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n base_columns: list[str]) -> dict[str:str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n",
- " base_columns: list[str]) -> dict[str:str]:\n",
- " for base_column in base_columns:\n",
- " if base_column == \"Data Row ID\":\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
- "\n",
- " elif base_column == \"Global Key\":\n",
- " if (\"global_key\"\n",
- " in data_row[\"data_row\"]): # Check if global key exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"External ID\":\n",
- " if (\"external_id\"\n",
- " in data_row[\"data_row\"]): # Check if external_id exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"Project ID\":\n",
- " csv_row[base_column] = project.uid\n",
- " return csv_row"
- ]
+ "execution_count": null
},
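As a design note, the explicit membership checks above can be collapsed with `dict.get`, which returns `None` for missing keys. A minimal sketch under the same export structure (`project` is the project object from earlier; the compact helper name is hypothetical):

```python
def get_base_data_row_columns_compact(data_row: dict, csv_row: dict,
                                      base_columns: list[str]) -> dict:
    # dict.get() yields None for absent optional fields, i.e. a blank CSV cell
    values = {
        "Data Row ID": data_row["data_row"]["id"],
        "Global Key": data_row["data_row"].get("global_key"),
        "External ID": data_row["data_row"].get("external_id"),
        "Project ID": project.uid,
    }
    for base_column in base_columns:
        csv_row[base_column] = values[base_column]
    return csv_row
```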
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n label_base_columns: list[str]) -> dict[str:str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_columns == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details is present. This field can be omitted in export.\n csv_row[label_base_column] = label_base_columns[\n \"label_details\"][\"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n",
- " label_base_columns: list[str]) -> dict[str:str]:\n",
- " for label_base_column in label_base_columns:\n",
- " if label_base_column == \"Label ID\":\n",
- " csv_row[label_base_column] = label[\"id\"]\n",
- "\n",
- " elif label_base_columns == \"Created By\":\n",
- " if (\n",
- " \"label_details\" in label\n",
- " ): # Check if label details is present. This field can be omitted in export.\n",
- " csv_row[label_base_column] = label_base_columns[\n",
- " \"label_details\"][\"created_by\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " elif label_base_column == \"Skipped\":\n",
- " if (\n",
- " \"performance_details\" in label\n",
- " ): # Check if performance details are present. This field can be omitted in export.\n",
- " csv_row[label_base_column] = label[\"performance_details\"][\n",
- " \"skipped\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " return csv_row"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -549,228 +271,96 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "from pprint import pprint\n\n\ndef get_feature_answers(feature: str,\n annotations: list[dict[str:str]]) -> None | str:\n \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n\n Args:\n feature (str): feature we are searching\n classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n\n Returns:\n None | str: The answer/value of the feature returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n print(annotation)\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call function again return value if answer found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # case for if tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "from pprint import pprint\n",
- "\n",
- "\n",
- "def get_feature_answers(feature: str,\n",
- " annotations: list[dict[str:str]]) -> None | str:\n",
- " \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n",
- "\n",
- " Args:\n",
- " feature (str): feature we are searching\n",
- " classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n",
- "\n",
- " Returns:\n",
- " None | str: The answer/value of the feature returns None if nothing is found\n",
- " \"\"\"\n",
- " for annotation in annotations:\n",
- " print(annotation)\n",
- " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
- " ): # Base conditions (found feature)\n",
- " if \"text_answer\" in annotation:\n",
- " return annotation[\"text_answer\"][\"content\"]\n",
- " if \"radio_answer\" in annotation:\n",
- " return annotation[\"radio_answer\"][\"value\"]\n",
- " if \"checklist_answers\" in annotation:\n",
- " # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n",
- " return \", \".join([\n",
- " check_list_ans[\"value\"]\n",
- " for check_list_ans in annotation[\"checklist_answers\"]\n",
- " ])\n",
- " if \"bounding_box\" in annotation:\n",
- " return annotation[\"bounding_box\"]\n",
- " # Add more tools here with similar pattern as above\n",
- "\n",
- " # Recursion cases (found more classifications to search through)\n",
- " if \"radio_answer\" in annotation:\n",
- " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
- " ) # Call function again return value if answer found\n",
- " if value:\n",
- " return value\n",
- " if \"checklist_answers\" in annotation:\n",
- " for checklist_ans in annotation[\"checklist_answers\"]:\n",
- " if len(checklist_ans[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, checklist_ans[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- " if (\"classifications\"\n",
- " in annotation): # case for if tool has classifications\n",
- " if len(annotation[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(feature,\n",
- " annotation[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- "\n",
- " return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize"
- ]
+ "execution_count": null
},
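For example, to pick up polygon tools as well, you could add one more base condition next to the `bounding_box` branch. A sketch under the assumption that polygon tools export under a `polygon` key for image data (double-check the export guide for your data type; the helper name here is hypothetical and only mirrors the base conditions above):

```python
def extract_tool_value(annotation: dict):
    """Sketch of the tool base conditions with a polygon branch added."""
    if "bounding_box" in annotation:
        return annotation["bounding_box"]
    if "polygon" in annotation:
        return annotation["polygon"]  # assumed: list of {x, y} points
    return None
```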
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n pprint(label)\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n pprint(tool_annotation_columns)\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "GLOBAL_CSV_LIST = []\n",
- "\n",
- "\n",
- "def main(output: lb.BufferedJsonConverterOutput):\n",
- "\n",
- " # Navigate to our label list\n",
- " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
- " for label in labels:\n",
- " # Define our CSV \"row\"\n",
- " csv_row = dict()\n",
- "\n",
- " # Start with data row base columns\n",
- " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
- " data_row_base_columns)\n",
- "\n",
- " # Add our label details\n",
- " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
- "\n",
- " pprint(label)\n",
- " # Add classification features\n",
- " for classification in class_annotation_columns:\n",
- " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
- " classification, label[\"annotations\"][\"classifications\"])\n",
- "\n",
- " pprint(tool_annotation_columns)\n",
- " # Add tools features\n",
- " for tool in tool_annotation_columns:\n",
- " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
- " tool, label[\"annotations\"][\"objects\"])\n",
- "\n",
- " # Append to global csv list\n",
- " GLOBAL_CSV_LIST.append(csv_row)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT\n ).start(\n stream_handler=main # Feeding our data row handler directly into export\n )",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Params required to obtain all fields we need\n",
- "params = {\"performance_details\": True, \"label_details\": True}\n",
- "\n",
- "export_task = project.export(params=params)\n",
- "export_task.wait_till_done()\n",
- "\n",
- "# Conditional for if export task has errors\n",
- "if export_task.has_errors():\n",
- " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
- " stream_handler=lambda error: print(error))\n",
- "\n",
- "if export_task.has_result():\n",
- " export_json = export_task.get_buffered_stream(\n",
- " stream_type=lb.StreamType.RESULT).start(\n",
- " stream_handler=main # Feeding our data row handler directly into export\n",
- " )"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "pprint(GLOBAL_CSV_LIST)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "pprint(GLOBAL_CSV_LIST)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!🚀 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ]
+ "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
- " # Columns\n",
- " fieldnames = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
- "\n",
- " writer.writeheader()\n",
- "\n",
- " for row in GLOBAL_CSV_LIST:\n",
- " writer.writerow(row)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "columns = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
- ]
- }
- ],
- "metadata": {
- "language_info": {
- "name": "python"
+ "execution_count": null
}
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ ]
+}
\ No newline at end of file
From bc54a17a14ddafc41e98cc0593adf256bb9a3a99 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Thu, 6 Jun 2024 08:10:04 -0500
Subject: [PATCH 18/19] removed some print statements
---
examples/exports/exporting_to_csv.ipynb | 620 ++++++++++++++++++++----
1 file changed, 513 insertions(+), 107 deletions(-)
diff --git a/examples/exports/exporting_to_csv.ipynb b/examples/exports/exporting_to_csv.ipynb
index a09e3a9ee..a7885866f 100644
--- a/examples/exports/exporting_to_csv.ipynb
+++ b/examples/exports/exporting_to_csv.ipynb
@@ -1,18 +1,16 @@
{
- "nbformat": 4,
- "nbformat_minor": 2,
- "metadata": {},
"cells": [
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "",
- " ",
+ " | \n",
+ " \n",
" | \n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -24,19 +22,19 @@
" \n",
" | "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
"This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://pandas.pydata.org/) friendly format. "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -44,83 +42,267 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "%pip install -q --upgrade \"Labelbox[data]\"\n",
+ "%pip install -q pandas"
+ ]
},
{
- "metadata": {},
- "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "import labelbox as lb\n",
+ "import labelbox.types as lb_types\n",
+ "import uuid\n",
+ "from pprint import pprint\n",
+ "import csv\n",
+ "import pandas as pd"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "API_KEY = None\n",
+ "client = lb.Client(api_key=API_KEY)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
"The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row towards our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # paginated collection of data row objects, list of data row ids or global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and imported it towards our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n 
end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Create dataset with image data row\n",
+ "global_key = str(uuid.uuid4())\n",
+ "\n",
+ "test_img_url = {\n",
+ " \"row_data\":\n",
+ " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
+ " \"global_key\":\n",
+ " global_key,\n",
+ "}\n",
+ "\n",
+ "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
+ "task = dataset.create_data_rows([test_img_url])\n",
+ "task.wait_till_done()\n",
+ "print(\"Errors:\", task.errors)\n",
+ "print(\"Failed data rows:\", task.failed_data_rows)\n",
+ "\n",
+ "# Create ontology\n",
+ "ontology_builder = lb.OntologyBuilder(\n",
+ " classifications=[ # List of Classification objects\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"radio_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_radio_answer\"),\n",
+ " lb.Option(value=\"second_radio_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.CHECKLIST,\n",
+ " name=\"checklist_question\",\n",
+ " options=[\n",
+ " lb.Option(value=\"first_checklist_answer\"),\n",
+ " lb.Option(value=\"second_checklist_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
+ " name=\"free_text\"),\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"nested_radio_question\",\n",
+ " options=[\n",
+ " lb.Option(\n",
+ " \"first_radio_answer\",\n",
+ " options=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(\"first_sub_radio_answer\")],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ " tools=[ # List of Tool objects\n",
+ " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
+ " lb.Tool(\n",
+ " tool=lb.Tool.Type.BBOX,\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " classifications=[\n",
+ " lb.Classification(\n",
+ " class_type=lb.Classification.Type.RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
+ " ),\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "ontology = client.create_ontology(\n",
+ " \"Image CSV Demo Ontology\",\n",
+ " ontology_builder.asdict(),\n",
+ " media_type=lb.MediaType.Image,\n",
+ ")\n",
+ "\n",
+ "# Set up project and connect ontology\n",
+ "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
+ " media_type=lb.MediaType.Image)\n",
+ "project.setup_editor(ontology)\n",
+ "\n",
+ "# Send data row towards our project\n",
+ "batch = project.create_batch(\n",
+ " \"image-demo-batch\",\n",
+ " global_keys=[\n",
+ " global_key\n",
+ " ], # paginated collection of data row objects, list of data row ids or global keys\n",
+ " priority=1,\n",
+ ")\n",
+ "\n",
+ "print(f\"Batch: {batch}\")\n",
+ "\n",
+ "# Create a label and imported it towards our project\n",
+ "radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"second_radio_answer\")),\n",
+ ")\n",
+ "checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
+ " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
+ " ]),\n",
+ ")\n",
+ "text_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"free_text\",\n",
+ " value=lb_types.Text(answer=\"sample text\"),\n",
+ ")\n",
+ "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ " )),\n",
+ ")\n",
+ "bbox_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bounding_box\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=1690, y=977),\n",
+ " end=lb_types.Point(x=1915, y=1307),\n",
+ " ),\n",
+ ")\n",
+ "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
+ " name=\"bbox_with_radio_subclass\",\n",
+ " value=lb_types.Rectangle(\n",
+ " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
+ " end=lb_types.Point(x=871, y=1124), # x = left + width, y = top + height\n",
+ " ),\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"tool_first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "label = []\n",
+ "annotations = [\n",
+ " radio_annotation,\n",
+ " nested_radio_annotation,\n",
+ " checklist_annotation,\n",
+ " text_annotation,\n",
+ " bbox_annotation,\n",
+ " bbox_with_radio_subclass_annotation,\n",
+ "]\n",
+ "\n",
+ "label.append(\n",
+ " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
+ "\n",
+ "upload_job = lb.LabelImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=project.uid,\n",
+ " name=\"label_import_job\" + str(uuid.uuid4()),\n",
+ " labels=label,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# PROJECT_ID = None\n",
+ "# project = client.get_project(PROJECT_ID)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -137,10 +319,10 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ],
- "cell_type": "markdown"
+ ]
},
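+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the target format concrete, here is a minimal, hypothetical sketch of such a list of dictionaries (the column names and values below are placeholders, not real export data) and its trivial conversion to a DataFrame or CSV:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Minimal sketch: one dictionary per row, keys become columns\n",
+ "example_rows = [\n",
+ " {\"Data Row ID\": \"dr-1\", \"Global Key\": \"gk-1\", \"radio_question\": \"second_radio_answer\"},\n",
+ " {\"Data Row ID\": \"dr-2\", \"Global Key\": \"gk-2\", \"radio_question\": None}, # None renders as a blank cell\n",
+ "]\n",
+ "\n",
+ "import pandas as pd # already imported above; repeated so this cell stands alone\n",
+ "\n",
+ "example_df = pd.DataFrame(example_rows)\n",
+ "example_df.to_csv(\"example.csv\", index=False) # or write straight to a CSV file\n",
+ "print(example_df)"
+ ]
+ },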
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
@@ -155,10 +337,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -166,93 +348,189 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "data_row_base_columns = [\n",
+ " \"Data Row ID\",\n",
+ " \"Global Key\",\n",
+ " \"External ID\",\n",
+ " \"Project ID\",\n",
+ "]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_ids` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_classification_features(classifications: list, class_list=[]) -> None:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> None:\n \"\"\"Creates list of tool names from ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_classification_features(classifications: list, class_list=None) -> list:\n",
+ " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
+ " if class_list is None: # Avoid a shared mutable default argument\n",
+ " class_list = []\n",
+ " for classification in classifications:\n",
+ " if \"name\" in classification:\n",
+ " class_list.append({\n",
+ " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
+ " \"column_name\": classification[\"instructions\"],\n",
+ " })\n",
+ " if \"options\" in classification:\n",
+ " get_classification_features(classification[\"options\"], class_list)\n",
+ " return class_list\n",
+ "\n",
+ "\n",
+ "def get_tool_features(tools: list) -> list:\n",
+ " \"\"\"Creates a list of tool names from the ontology\"\"\"\n",
+ " tool_list = []\n",
+ " for tool in tools:\n",
+ " tool_list.append({\n",
+ " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
+ " \"column_name\": tool[\"name\"],\n",
+ " })\n",
+ " if \"classifications\" in tool:\n",
+ " tool_list = get_classification_features(tool[\"classifications\"],\n",
+ " tool_list)\n",
+ " return tool_list"
+ ]
},
{
- "metadata": {},
- "source": "# Get ontology from project and normalized towards python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Get ontology from the project, normalized to a Python dictionary\n",
+ "ontology = project.ontology().normalized\n",
+ "\n",
+ "class_annotation_columns = get_classification_features(\n",
+ " ontology[\"classifications\"])\n",
+ "tool_annotation_columns = get_tool_features(ontology[\"tools\"])"
+ ]
},
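+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For reference, each entry in these lists is a small dictionary pairing a `feature_schema_id` with a column name, along the lines of the hypothetical sketch below (the schema IDs are placeholders, not real values):\n",
+ "\n",
+ "```python\n",
+ "class_annotation_columns = [\n",
+ " {\"feature_schema_id\": \"<radio-question-schema-id>\", \"column_name\": \"radio_question\"},\n",
+ " {\"feature_schema_id\": \"<sub-radio-question-schema-id>\", \"column_name\": \"sub_radio_question\"},\n",
+ "]\n",
+ "```"
+ ]
+ },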
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n base_columns: list[str]) -> dict[str:str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_data_row_columns(data_row: dict[str, str], csv_row: dict[str, str],\n",
+ " base_columns: list[str]) -> dict[str, str]:\n",
+ " for base_column in base_columns:\n",
+ " if base_column == \"Data Row ID\":\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
+ "\n",
+ " elif base_column == \"Global Key\":\n",
+ " if (\"global_key\"\n",
+ " in data_row[\"data_row\"]): # Check if global key exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If global key does not exist on the data row, set the cell to None. This will create a blank cell in your CSV\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"External ID\":\n",
+ " if (\"external_id\"\n",
+ " in data_row[\"data_row\"]): # Check if external_id exists\n",
+ " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
+ " else:\n",
+ " csv_row[base_column] = (\n",
+ " None # If external ID does not exist on the data row, set the cell to None. This will create a blank cell in your CSV\n",
+ " )\n",
+ "\n",
+ " elif base_column == \"Project ID\":\n",
+ " csv_row[base_column] = project.uid\n",
+ " return csv_row"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n label_base_columns: list[str]) -> dict[str:str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_columns == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details is present. This field can be omitted in export.\n csv_row[label_base_column] = label_base_columns[\n \"label_details\"][\"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_base_label_columns(label: dict[str, str], csv_row: dict[str, str],\n",
+ " label_base_columns: list[str]) -> dict[str, str]:\n",
+ " for label_base_column in label_base_columns:\n",
+ " if label_base_column == \"Label ID\":\n",
+ " csv_row[label_base_column] = label[\"id\"]\n",
+ "\n",
+ " elif label_base_column == \"Created By\":\n",
+ " if (\n",
+ " \"label_details\" in label\n",
+ " ): # Check if label details are present. This field can be omitted in export.\n",
+ " csv_row[label_base_column] = label[\"label_details\"][\n",
+ " \"created_by\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " elif label_base_column == \"Skipped\":\n",
+ " if (\n",
+ " \"performance_details\" in label\n",
+ " ): # Check if performance details are present. This field can be omitted in export.\n",
+ " csv_row[label_base_column] = label[\"performance_details\"][\n",
+ " \"skipped\"]\n",
+ " else:\n",
+ " csv_row[label_base_column] = None\n",
+ "\n",
+ " return csv_row"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -271,96 +549,224 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "from pprint import pprint\n\n\ndef get_feature_answers(feature: str,\n annotations: list[dict[str:str]]) -> None | str:\n \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n\n Args:\n feature (str): feature we are searching\n classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n\n Returns:\n None | str: The answer/value of the feature returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n print(annotation)\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call function again return value if answer found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # case for if tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "def get_feature_answers(feature: dict[str, str],\n",
+ " annotations: list[dict[str, str]]) -> None | str | dict:\n",
+ " \"\"\"Returns the answer of the provided feature by navigating through a label's annotation list. Returns None if no answer is found.\n",
+ "\n",
+ " Args:\n",
+ " feature (dict[str, str]): feature we are searching for\n",
+ " annotations (list[dict[str, str]]): annotation list that we will be searching for our feature in.\n",
+ "\n",
+ " Returns:\n",
+ " None | str | dict: The answer/value of the feature; returns None if nothing is found\n",
+ " \"\"\"\n",
+ " for annotation in annotations:\n",
+ " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
+ " ): # Base conditions (found feature)\n",
+ " if \"text_answer\" in annotation:\n",
+ " return annotation[\"text_answer\"][\"content\"]\n",
+ " if \"radio_answer\" in annotation:\n",
+ " return annotation[\"radio_answer\"][\"value\"]\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " # Since checklist classifications can have more than one answer, this combines all answers separated by a comma. Feel free to modify.\n",
+ " return \", \".join([\n",
+ " check_list_ans[\"value\"]\n",
+ " for check_list_ans in annotation[\"checklist_answers\"]\n",
+ " ])\n",
+ " if \"bounding_box\" in annotation:\n",
+ " return annotation[\"bounding_box\"]\n",
+ " # Add more tools here with a similar pattern as above\n",
+ "\n",
+ " # Recursion cases (found more classifications to search through)\n",
+ " if \"radio_answer\" in annotation:\n",
+ " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
+ " ) # Call the function again; return the value if an answer is found\n",
+ " if value:\n",
+ " return value\n",
+ " if \"checklist_answers\" in annotation:\n",
+ " for checklist_ans in annotation[\"checklist_answers\"]:\n",
+ " if len(checklist_ans[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(\n",
+ " feature, checklist_ans[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ " if (\"classifications\"\n",
+ " in annotation): # Case for when a tool has classifications\n",
+ " if len(annotation[\"classifications\"]) > 0:\n",
+ " value = get_feature_answers(feature,\n",
+ " annotation[\"classifications\"])\n",
+ " if value:\n",
+ " return value\n",
+ "\n",
+ " return None # Base case: nothing was found after searching all classifications (end of JSON). This can be omitted but is included for clarity"
+ ]
},
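+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a sketch of how the `# Add more tools here` comment above could be filled in, the helper below mirrors the base conditions for tools. The `polygon` and `point` keys are assumptions based on other tool types; verify the exact key names in the export guide for your data type before relying on them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical sketch of extra base conditions for more tool types.\n",
+ "def extract_tool_value(annotation: dict) -> None | str | dict | list:\n",
+ " \"\"\"Returns the value of a tool annotation, mirroring the base conditions above.\"\"\"\n",
+ " if \"bounding_box\" in annotation:\n",
+ " return annotation[\"bounding_box\"] # dict of coordinates\n",
+ " if \"polygon\" in annotation: # assumed key name\n",
+ " return annotation[\"polygon\"] # assumed: list of {x, y} vertices\n",
+ " if \"point\" in annotation: # assumed key name\n",
+ " return annotation[\"point\"] # assumed: {x, y} coordinates\n",
+ " return None\n",
+ "\n",
+ "\n",
+ "# Example with a fabricated annotation dict (placeholder values only)\n",
+ "print(extract_tool_value({\"bounding_box\": {\"top\": 977, \"left\": 1690, \"height\": 330, \"width\": 225}}))"
+ ]
+ },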
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n pprint(label)\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n pprint(tool_annotation_columns)\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "GLOBAL_CSV_LIST = []\n",
+ "\n",
+ "\n",
+ "def main(output: lb.BufferedJsonConverterOutput):\n",
+ "\n",
+ " # Navigate to our label list\n",
+ " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
+ " for label in labels:\n",
+ " # Define our CSV \"row\"\n",
+ " csv_row = dict()\n",
+ "\n",
+ " # Start with data row base columns\n",
+ " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
+ " data_row_base_columns)\n",
+ "\n",
+ " # Add our label details\n",
+ " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
+ "\n",
+ " # Add classification features\n",
+ " for classification in class_annotation_columns:\n",
+ " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
+ " classification, label[\"annotations\"][\"classifications\"])\n",
+ "\n",
+ " # Add tool features\n",
+ " for tool in tool_annotation_columns:\n",
+ " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
+ " tool, label[\"annotations\"][\"objects\"])\n",
+ "\n",
+ " # Append to global csv list\n",
+ " GLOBAL_CSV_LIST.append(csv_row)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT\n ).start(\n stream_handler=main # Feeding our data row handler directly into export\n )",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# Params required to obtain all fields we need\n",
+ "params = {\"performance_details\": True, \"label_details\": True}\n",
+ "\n",
+ "export_task = project.export(params=params)\n",
+ "export_task.wait_till_done()\n",
+ "\n",
+ "# Check whether the export task has errors\n",
+ "if export_task.has_errors():\n",
+ " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
+ " stream_handler=lambda error: print(error))\n",
+ "\n",
+ "if export_task.has_result():\n",
+ " export_json = export_task.get_buffered_stream(\n",
+ " stream_type=lb.StreamType.RESULT\n",
+ " ).start(\n",
+ " stream_handler=main # Feeding our data row handler directly into export\n",
+ " )"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "pprint(GLOBAL_CSV_LIST)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "pprint(GLOBAL_CSV_LIST)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ],
- "cell_type": "markdown"
+ "The hard part is now complete! 🚀 Now that you have your export in a flattened format, you can easily convert it to a CSV or a Pandas DataFrame!"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
+ " # Columns\n",
+ " fieldnames = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
+ "\n",
+ " writer.writeheader()\n",
+ "\n",
+ " for row in GLOBAL_CSV_LIST:\n",
+ " writer.writerow(row)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "columns = (data_row_base_columns + label_base_columns +\n",
+ " [name[\"column_name\"] for name in class_annotation_columns] +\n",
+ " [name[\"column_name\"] for name in tool_annotation_columns])\n",
+ "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
+ ]
}
- ]
-}
\ No newline at end of file
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
From 004e59c5445bff2947f2a36398599aa77cdab960 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Thu, 6 Jun 2024 13:10:58 +0000
Subject: [PATCH 19/19] :art: Cleaned
---
examples/exports/exporting_to_csv.ipynb | 620 ++++--------------------
1 file changed, 107 insertions(+), 513 deletions(-)
diff --git a/examples/exports/exporting_to_csv.ipynb b/examples/exports/exporting_to_csv.ipynb
index a7885866f..80d906c37 100644
--- a/examples/exports/exporting_to_csv.ipynb
+++ b/examples/exports/exporting_to_csv.ipynb
@@ -1,16 +1,18 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {},
"cells": [
{
- "cell_type": "markdown",
"metadata": {},
"source": [
- "\n",
- " \n",
+ " | ",
+ " ",
" | \n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -22,19 +24,19 @@
" \n",
" | "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"# Export to CSV or Pandas format\n",
"\n",
"This notebook serves as a simplified How-To guide and provides examples of converting Labelbox export JSON to a CSV and [Pandas](https://pandas.pydata.org/) friendly format. "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Advance approach\n",
@@ -42,267 +44,83 @@
"For a more abstract approach, please visit our [LabelPandas](https://github.com/Labelbox/labelpandas) library. You can use this library to abstract the steps to be shown. In addition, this library supports importing CSV data. \n",
"\n",
"We strongly encourage collaboration - please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "%pip install -q --upgrade \"Labelbox[data]\"\n%pip install -q pandas",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "%pip install -q --upgrade \"Labelbox[data]\"\n",
- "%pip install -q pandas"
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid\nfrom pprint import pprint\nimport csv\nimport pandas as pd",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "import labelbox as lb\n",
- "import labelbox.types as lb_types\n",
- "import uuid\n",
- "from pprint import pprint\n",
- "import csv\n",
- "import pandas as pd"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## API key and client\n",
"Provide a valid API key below to connect to the Labelbox client properly. For more information, please review the [Create API Key](https://docs.labelbox.com/reference/create-api-key) guide."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "API_KEY = None\n",
- "client = lb.Client(api_key=API_KEY)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Create or select example project\n",
"\n",
"The below steps will set up a project that can be used for this demo. Please feel free to delete the code block below and uncomment the code block that fetches your own project directly. For more information on this setup, visit our [quick start guide](https://docs.labelbox.com/reference/quick-start)."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Create Project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Create dataset with image data row\nglobal_key = str(uuid.uuid4())\n\ntest_img_url = {\n \"row_data\":\n \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n \"global_key\":\n global_key,\n}\n\ndataset = client.create_dataset(name=\"image-demo-dataset\")\ntask = dataset.create_data_rows([test_img_url])\ntask.wait_till_done()\nprint(\"Errors:\", task.errors)\nprint(\"Failed data rows:\", task.failed_data_rows)\n\n# Create ontology\nontology_builder = lb.OntologyBuilder(\n classifications=[ # List of Classification objects\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_question\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_question\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"free_text\"),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"nested_radio_question\",\n options=[\n lb.Option(\n \"first_radio_answer\",\n options=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(\"first_sub_radio_answer\")],\n )\n ],\n )\n ],\n ),\n ],\n tools=[ # List of Tool objects\n lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n lb.Tool(\n tool=lb.Tool.Type.BBOX,\n name=\"bbox_with_radio_subclass\",\n classifications=[\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"sub_radio_question\",\n options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n ),\n ],\n ),\n ],\n)\n\nontology = client.create_ontology(\n \"Image CSV Demo Ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Image,\n)\n\n# Set up project and connect ontology\nproject = client.create_project(name=\"Image Annotation Import Demo\",\n media_type=lb.MediaType.Image)\nproject.setup_editor(ontology)\n\n# Send data row to our project\nbatch = project.create_batch(\n \"image-demo-batch\",\n global_keys=[\n global_key\n ], # A paginated collection of data row objects, a list of data row IDs, or a list of global keys\n priority=1,\n)\n\nprint(f\"Batch: {batch}\")\n\n# Create a label and import it into our project\nradio_annotation = lb_types.ClassificationAnnotation(\n name=\"radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"second_radio_answer\")),\n)\nchecklist_annotation = lb_types.ClassificationAnnotation(\n name=\"checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n ]),\n)\ntext_annotation = lb_types.ClassificationAnnotation(\n name=\"free_text\",\n value=lb_types.Text(answer=\"sample text\"),\n)\nnested_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\nbbox_annotation = lb_types.ObjectAnnotation(\n name=\"bounding_box\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=1690, y=977),\n end=lb_types.Point(x=1915, y=1307),\n ),\n)\nbbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n name=\"bbox_with_radio_subclass\",\n value=lb_types.Rectangle(\n start=lb_types.Point(x=541, y=933), # x = left, y = top\n end=lb_types.Point(x=871, y=1124), # x = left + width, y = top + height\n ),\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"tool_first_sub_radio_answer\")),\n )\n ],\n)\n\nlabel = []\nannotations = [\n radio_annotation,\n nested_radio_annotation,\n checklist_annotation,\n text_annotation,\n bbox_annotation,\n bbox_with_radio_subclass_annotation,\n]\n\nlabel.append(\n lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n\nupload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Create dataset with image data row\n",
- "global_key = str(uuid.uuid4())\n",
- "\n",
- "test_img_url = {\n",
- " \"row_data\":\n",
- " \"https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg\",\n",
- " \"global_key\":\n",
- " global_key,\n",
- "}\n",
- "\n",
- "dataset = client.create_dataset(name=\"image-demo-dataset\")\n",
- "task = dataset.create_data_rows([test_img_url])\n",
- "task.wait_till_done()\n",
- "print(\"Errors:\", task.errors)\n",
- "print(\"Failed data rows:\", task.failed_data_rows)\n",
- "\n",
- "# Create ontology\n",
- "ontology_builder = lb.OntologyBuilder(\n",
- " classifications=[ # List of Classification objects\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"radio_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_radio_answer\"),\n",
- " lb.Option(value=\"second_radio_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.CHECKLIST,\n",
- " name=\"checklist_question\",\n",
- " options=[\n",
- " lb.Option(value=\"first_checklist_answer\"),\n",
- " lb.Option(value=\"second_checklist_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.Classification(class_type=lb.Classification.Type.TEXT,\n",
- " name=\"free_text\"),\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"nested_radio_question\",\n",
- " options=[\n",
- " lb.Option(\n",
- " \"first_radio_answer\",\n",
- " options=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(\"first_sub_radio_answer\")],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " ],\n",
- " tools=[ # List of Tool objects\n",
- " lb.Tool(tool=lb.Tool.Type.BBOX, name=\"bounding_box\"),\n",
- " lb.Tool(\n",
- " tool=lb.Tool.Type.BBOX,\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " classifications=[\n",
- " lb.Classification(\n",
- " class_type=lb.Classification.Type.RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.Option(value=\"tool_first_sub_radio_answer\")],\n",
- " ),\n",
- " ],\n",
- " ),\n",
- " ],\n",
- ")\n",
- "\n",
- "ontology = client.create_ontology(\n",
- " \"Image CSV Demo Ontology\",\n",
- " ontology_builder.asdict(),\n",
- " media_type=lb.MediaType.Image,\n",
- ")\n",
- "\n",
- "# Set up project and connect ontology\n",
- "project = client.create_project(name=\"Image Annotation Import Demo\",\n",
- " media_type=lb.MediaType.Image)\n",
- "project.setup_editor(ontology)\n",
- "\n",
- "# Send data row towards our project\n",
- "batch = project.create_batch(\n",
- " \"image-demo-batch\",\n",
- " global_keys=[\n",
- " global_key\n",
- " ], # paginated collection of data row objects, list of data row ids or global keys\n",
- " priority=1,\n",
- ")\n",
- "\n",
- "print(f\"Batch: {batch}\")\n",
- "\n",
- "# Create a label and imported it towards our project\n",
- "radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"second_radio_answer\")),\n",
- ")\n",
- "checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(name=\"first_checklist_answer\"),\n",
- " lb_types.ClassificationAnswer(name=\"second_checklist_answer\"),\n",
- " ]),\n",
- ")\n",
- "text_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"free_text\",\n",
- " value=lb_types.Text(answer=\"sample text\"),\n",
- ")\n",
- "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- " )),\n",
- ")\n",
- "bbox_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bounding_box\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=1690, y=977),\n",
- " end=lb_types.Point(x=1915, y=1307),\n",
- " ),\n",
- ")\n",
- "bbox_with_radio_subclass_annotation = lb_types.ObjectAnnotation(\n",
- " name=\"bbox_with_radio_subclass\",\n",
- " value=lb_types.Rectangle(\n",
- " start=lb_types.Point(x=541, y=933), # x = left, y = top\n",
- " end=lb_types.Point(x=871, y=1124), # x= left + width , y = top + height\n",
- " ),\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"tool_first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- ")\n",
- "\n",
- "label = []\n",
- "annotations = [\n",
- " radio_annotation,\n",
- " nested_radio_annotation,\n",
- " checklist_annotation,\n",
- " text_annotation,\n",
- " bbox_annotation,\n",
- " bbox_with_radio_subclass_annotation,\n",
- "]\n",
- "\n",
- "label.append(\n",
- " lb_types.Label(data={\"global_key\": global_key}, annotations=annotations))\n",
- "\n",
- "upload_job = lb.LabelImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=project.uid,\n",
- " name=\"label_import_job\" + str(uuid.uuid4()),\n",
- " labels=label,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Select project"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# PROJECT_ID = None\n# project = client.get_project(PROJECT_ID)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# PROJECT_ID = None\n",
- "# project = client.get_project(PROJECT_ID)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## CSV format overview\n",
@@ -319,10 +137,10 @@
"```\n",
"\n",
"Essentially, we need to get our JSON data towards a list of Python dictionaries, with each Python dictionary representing one row, each key representing a column, and each value is an individual cell of our CSV table. Once we have our data in this format, it is trivial to create Pandas DataFrames or write our CSV file. The tricky part is getting Labelbox to export JSON towards this format."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Labelbox JSON format\n",
@@ -337,10 +155,10 @@
"4. Setting up our main data row handler function\n",
"5. Export our data\n",
"6. Convert to our desired format"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Establish our base columns\n",
@@ -348,189 +166,93 @@
"We first establish our base columns that represent individual data row details. Typically, this column's information can be received from within one or two levels of a Labelbox export per data row. \n",
"\n",
"Please feel free to modify the below columns if you want to include more. You will need to update the code later in this guide to pick up any additional columns."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "data_row_base_columns = [\n \"Data Row ID\",\n \"Global Key\",\n \"External ID\",\n \"Project ID\",\n]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "data_row_base_columns = [\n",
- " \"Data Row ID\",\n",
- " \"Global Key\",\n",
- " \"External ID\",\n",
- " \"Project ID\",\n",
- "]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create our columns for label fields\n",
"\n",
"In this step, we define the label details base columns we want to include in our CSV. In this case, we will use the following:"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "label_base_columns = [\"Label ID\", \"Created By\", \"Skipped\"]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"We then need to establish the annotations we want to include in our columns. The order of our list matters since that is the order in which our columns will be presented. You can approach getting the annotations in a list in a number of ways, including hard defining the columns. We will be mapping between `feature_schema_ids` and our column name. The reason for introducing this mapping is the annotation name can be the same in certain situations, but `feature_schema_ids` are completely unique. This also allows you to change the column names to something other than what is included in the ontology. In the code below, I will be recursively going through the ontology we created to get our `feature_schema_ids` and column names based on the names of the features. In the next step of this guide, we will provide more information on recursion in the context of parsing through JSON or Python dictionaries."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_classification_features(classifications: list, class_list=None) -> list:\n \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n if class_list is None: # Avoid a shared mutable default argument\n class_list = []\n for classification in classifications:\n if \"name\" in classification:\n class_list.append({\n \"feature_schema_id\": classification[\"featureSchemaId\"],\n \"column_name\": classification[\"instructions\"],\n })\n if \"options\" in classification:\n get_classification_features(classification[\"options\"], class_list)\n return class_list\n\n\ndef get_tool_features(tools: list) -> list:\n \"\"\"Creates a list of tool names from the ontology\"\"\"\n tool_list = []\n for tool in tools:\n tool_list.append({\n \"feature_schema_id\": tool[\"featureSchemaId\"],\n \"column_name\": tool[\"name\"],\n })\n if \"classifications\" in tool:\n tool_list = get_classification_features(tool[\"classifications\"],\n tool_list)\n return tool_list",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_classification_features(classifications: list, class_list=[]) -> None:\n",
- " \"\"\"Finds classification features inside an ontology recursively and returns them in a list\"\"\"\n",
- " for classification in classifications:\n",
- " if \"name\" in classification:\n",
- " class_list.append({\n",
- " \"feature_schema_id\": classification[\"featureSchemaId\"],\n",
- " \"column_name\": classification[\"instructions\"],\n",
- " })\n",
- " if \"options\" in classification:\n",
- " get_classification_features(classification[\"options\"], class_list)\n",
- " return class_list\n",
- "\n",
- "\n",
- "def get_tool_features(tools: list) -> None:\n",
- " \"\"\"Creates list of tool names from ontology\"\"\"\n",
- " tool_list = []\n",
- " for tool in tools:\n",
- " tool_list.append({\n",
- " \"feature_schema_id\": tool[\"featureSchemaId\"],\n",
- " \"column_name\": tool[\"name\"],\n",
- " })\n",
- " if \"classifications\" in tool:\n",
- " tool_list = get_classification_features(tool[\"classifications\"],\n",
- " tool_list)\n",
- " return tool_list"
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Get ontology from the project, normalized to a Python dictionary\nontology = project.ontology().normalized\n\nclass_annotation_columns = get_classification_features(\n ontology[\"classifications\"])\ntool_annotation_columns = get_tool_features(ontology[\"tools\"])",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Get ontology from project and normalized towards python dictionary\n",
- "ontology = project.ontology().normalized\n",
- "\n",
- "class_annotation_columns = get_classification_features(\n",
- " ontology[\"classifications\"])\n",
- "tool_annotation_columns = get_tool_features(ontology[\"tools\"])"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Define our functions and strategy used to parse through our data\n",
"\n",
"Now that we have our columns defined, we need to come up with a strategy for navigating our export data. Review this [sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export) to follow along. While creating our columns, it is always best to first check if a key exists in your data row before populating a column. This is especially important for optional fields. In this demo, we will populate the value `None` for anything not present, which will result in a blank cell our CSV.\n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Data row detail base columns\n",
"The data row details can be accessed within a depth of one or two keys. Below is a function we will use to access the columns we defined. The parameters are the data row itself, the dictionary row that will be used to make our list, and our base columns list."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_data_row_columns(data_row: dict[str, str], csv_row: dict[str, str],\n base_columns: list[str]) -> dict[str, str]:\n for base_column in base_columns:\n if base_column == \"Data Row ID\":\n csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n\n elif base_column == \"Global Key\":\n if (\"global_key\"\n in data_row[\"data_row\"]): # Check if global key exists\n csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n else:\n csv_row[base_column] = (\n None # If global key does not exist on the data row, set the cell to None. This will create a blank cell in your CSV\n )\n\n elif base_column == \"External ID\":\n if (\"external_id\"\n in data_row[\"data_row\"]): # Check if external_id exists\n csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n else:\n csv_row[base_column] = (\n None # If external ID does not exist on the data row, set the cell to None. This will create a blank cell in your CSV\n )\n\n elif base_column == \"Project ID\":\n csv_row[base_column] = project.uid\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_data_row_columns(data_row: dict[str:str], csv_row: dict[str:str],\n",
- " base_columns: list[str]) -> dict[str:str]:\n",
- " for base_column in base_columns:\n",
- " if base_column == \"Data Row ID\":\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"id\"]\n",
- "\n",
- " elif base_column == \"Global Key\":\n",
- " if (\"global_key\"\n",
- " in data_row[\"data_row\"]): # Check if global key exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"global_key\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If global key does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"External ID\":\n",
- " if (\"external_id\"\n",
- " in data_row[\"data_row\"]): # Check if external_id exists\n",
- " csv_row[base_column] = data_row[\"data_row\"][\"external_id\"]\n",
- " else:\n",
- " csv_row[base_column] = (\n",
- " None # If external id does not exist on data row set cell to None. This will create a blank cell on your csv\n",
- " )\n",
- "\n",
- " elif base_column == \"Project ID\":\n",
- " csv_row[base_column] = project.uid\n",
- " return csv_row"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label detail base columns\n",
"The label details are similar to data row details but exist at our export's label level. Later in the guide we will demonstrate how to get our exported data row at this level. The function below shows the process of obtaining the details we defined above. The parameters are the label, the dictionary row that we will be modifying, and the label detail column list we created."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_base_label_columns(label: dict[str, str], csv_row: dict[str, str],\n label_base_columns: list[str]) -> dict[str, str]:\n for label_base_column in label_base_columns:\n if label_base_column == \"Label ID\":\n csv_row[label_base_column] = label[\"id\"]\n\n elif label_base_column == \"Created By\":\n if (\n \"label_details\" in label\n ): # Check if label details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"label_details\"][\n \"created_by\"]\n else:\n csv_row[label_base_column] = None\n\n elif label_base_column == \"Skipped\":\n if (\n \"performance_details\" in label\n ): # Check if performance details are present. This field can be omitted in export.\n csv_row[label_base_column] = label[\"performance_details\"][\n \"skipped\"]\n else:\n csv_row[label_base_column] = None\n\n return csv_row",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_base_label_columns(label: dict[str:str], csv_row: dict[str:str],\n",
- " label_base_columns: list[str]) -> dict[str:str]:\n",
- " for label_base_column in label_base_columns:\n",
- " if label_base_column == \"Label ID\":\n",
- " csv_row[label_base_column] = label[\"id\"]\n",
- "\n",
- " elif label_base_columns == \"Created By\":\n",
- " if (\n",
- " \"label_details\" in label\n",
- " ): # Check if label details is present. This field can be omitted in export.\n",
- " csv_row[label_base_column] = label_base_columns[\n",
- " \"label_details\"][\"created_by\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " elif label_base_column == \"Skipped\":\n",
- " if (\n",
- " \"performance_details\" in label\n",
- " ): # Check if performance details are present. This field can be omitted in export.\n",
- " csv_row[label_base_column] = label[\"performance_details\"][\n",
- " \"skipped\"]\n",
- " else:\n",
- " csv_row[label_base_column] = None\n",
- "\n",
- " return csv_row"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Label annotation columns\n",
@@ -549,224 +271,96 @@
"\n",
"#### Tools\n",
"Tools are not nested but they can have nested classifications we will use or `get_feature_answers` function below to find the nested classification. Since tools are at the base level of a label and each tool has a different value key name, we will only be searching for bounding boxes for this tutorial. If you want to include other tools, reference our export guide for your data type and find the appropriate key to add on."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "def get_feature_answers(feature: dict[str, str],\n annotations: list[dict[str, str]]) -> None | str | dict:\n \"\"\"Returns the answer of the provided feature by navigating through a label's annotation list. Returns None if no answer is found.\n\n Args:\n feature (dict[str, str]): feature we are searching for\n annotations (list[dict[str, str]]): annotation list that we will be searching for our feature in.\n\n Returns:\n None | str | dict: The answer/value of the feature; returns None if nothing is found\n \"\"\"\n for annotation in annotations:\n if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n ): # Base conditions (found feature)\n if \"text_answer\" in annotation:\n return annotation[\"text_answer\"][\"content\"]\n if \"radio_answer\" in annotation:\n return annotation[\"radio_answer\"][\"value\"]\n if \"checklist_answers\" in annotation:\n # Since checklist classifications can have more than one answer, this combines all answers separated by a comma. Feel free to modify.\n return \", \".join([\n check_list_ans[\"value\"]\n for check_list_ans in annotation[\"checklist_answers\"]\n ])\n if \"bounding_box\" in annotation:\n return annotation[\"bounding_box\"]\n # Add more tools here with a similar pattern as above\n\n # Recursion cases (found more classifications to search through)\n if \"radio_answer\" in annotation:\n if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, annotation[\"radio_answer\"][\"classifications\"]\n ) # Call the function again; return the value if an answer is found\n if value:\n return value\n if \"checklist_answers\" in annotation:\n for checklist_ans in annotation[\"checklist_answers\"]:\n if len(checklist_ans[\"classifications\"]) > 0:\n value = get_feature_answers(\n feature, checklist_ans[\"classifications\"])\n if value:\n return value\n if (\"classifications\"\n in annotation): # Case for when a tool has classifications\n if len(annotation[\"classifications\"]) > 0:\n value = get_feature_answers(feature,\n annotation[\"classifications\"])\n if value:\n return value\n\n return None # Base case: nothing was found after searching all classifications (end of JSON). This can be omitted but is included for clarity",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "def get_feature_answers(feature: str,\n",
- " annotations: list[dict[str:str]]) -> None | str:\n",
- " \"\"\"Returns answer of feature provided by navigating through a label's annotation list. Will return None if answer is not found.\n",
- "\n",
- " Args:\n",
- " feature (str): feature we are searching\n",
- " classifications (list[dict[str:str]]): annotation list that we will be searching for our feature with.\n",
- "\n",
- " Returns:\n",
- " None | str: The answer/value of the feature returns None if nothing is found\n",
- " \"\"\"\n",
- " for annotation in annotations:\n",
- " print(annotation)\n",
- " if (annotation[\"feature_schema_id\"] == feature[\"feature_schema_id\"]\n",
- " ): # Base conditions (found feature)\n",
- " if \"text_answer\" in annotation:\n",
- " return annotation[\"text_answer\"][\"content\"]\n",
- " if \"radio_answer\" in annotation:\n",
- " return annotation[\"radio_answer\"][\"value\"]\n",
- " if \"checklist_answers\" in annotation:\n",
- " # Since classifications can have more then one answer. This is set up to combine all classifications separated by a comma. Feel free to modify.\n",
- " return \", \".join([\n",
- " check_list_ans[\"value\"]\n",
- " for check_list_ans in annotation[\"checklist_answers\"]\n",
- " ])\n",
- " if \"bounding_box\" in annotation:\n",
- " return annotation[\"bounding_box\"]\n",
- " # Add more tools here with similar pattern as above\n",
- "\n",
- " # Recursion cases (found more classifications to search through)\n",
- " if \"radio_answer\" in annotation:\n",
- " if len(annotation[\"radio_answer\"][\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, annotation[\"radio_answer\"][\"classifications\"]\n",
- " ) # Call function again return value if answer found\n",
- " if value:\n",
- " return value\n",
- " if \"checklist_answers\" in annotation:\n",
- " for checklist_ans in annotation[\"checklist_answers\"]:\n",
- " if len(checklist_ans[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(\n",
- " feature, checklist_ans[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- " if (\"classifications\"\n",
- " in annotation): # case for if tool has classifications\n",
- " if len(annotation[\"classifications\"]) > 0:\n",
- " value = get_feature_answers(feature,\n",
- " annotation[\"classifications\"])\n",
- " if value:\n",
- " return value\n",
- "\n",
- " return None # Base case if searched through classifications and nothing was found (end of JSON). This can be omitted but included to visualize"
- ]
+ "execution_count": null
},
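To make the traversal concrete, here is a minimal usage sketch of `get_feature_answers`. The schema IDs and values are hypothetical, and the annotation shape is inferred from the keys the function reads, so treat it as an illustration rather than a real export payload:

```python
# Hypothetical example: a bounding-box tool annotation carrying a nested
# radio classification. All IDs and values below are made up for illustration.
feature = {"column_name": "is_blurry", "feature_schema_id": "radio_schema_id"}
annotations = [{
    "feature_schema_id": "bbox_schema_id",
    "bounding_box": {"top": 10, "left": 20, "height": 30, "width": 40},
    "classifications": [{
        "feature_schema_id": "radio_schema_id",
        "radio_answer": {"value": "yes", "classifications": []},
    }],
}]

print(get_feature_answers(feature, annotations))  # -> "yes"
```

The outer annotation does not match the feature, so the function recurses into its `classifications` list and returns the nested radio answer.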
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Setting up our main data row handler function\n",
"Before we can start exporting, we need to set up our main data row handler. This function will be fed straight into our export. This function will put everything together and connect all the pieces. We will also be defining our global dictionary list that will be used to create our CSVs. The output parameter represents each data row."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "GLOBAL_CSV_LIST = []\n\n\ndef main(output: lb.BufferedJsonConverterOutput):\n\n # Navigate to our label list\n labels = output.json[\"projects\"][project.uid][\"labels\"]\n for label in labels:\n # Define our CSV \"row\"\n csv_row = dict()\n\n # Start with data row base columns\n csv_row = get_base_data_row_columns(output.json, csv_row,\n data_row_base_columns)\n\n # Add our label details\n csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n\n # Add classification features\n for classification in class_annotation_columns:\n csv_row[classification[\"column_name\"]] = get_feature_answers(\n classification, label[\"annotations\"][\"classifications\"])\n\n # Add tools features\n for tool in tool_annotation_columns:\n csv_row[tool[\"column_name\"]] = get_feature_answers(\n tool, label[\"annotations\"][\"objects\"])\n\n # Append to global csv list\n GLOBAL_CSV_LIST.append(csv_row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "GLOBAL_CSV_LIST = []\n",
- "\n",
- "\n",
- "def main(output: lb.BufferedJsonConverterOutput):\n",
- "\n",
- " # Navigate to our label list\n",
- " labels = output.json[\"projects\"][project.uid][\"labels\"]\n",
- " for label in labels:\n",
- " # Define our CSV \"row\"\n",
- " csv_row = dict()\n",
- "\n",
- " # Start with data row base columns\n",
- " csv_row = get_base_data_row_columns(output.json, csv_row,\n",
- " data_row_base_columns)\n",
- "\n",
- " # Add our label details\n",
- " csv_row = get_base_label_columns(label, csv_row, label_base_columns)\n",
- "\n",
- " # Add classification features\n",
- " for classification in class_annotation_columns:\n",
- " csv_row[classification[\"column_name\"]] = get_feature_answers(\n",
- " classification, label[\"annotations\"][\"classifications\"])\n",
- " \n",
- " # Add tools features\n",
- " for tool in tool_annotation_columns:\n",
- " csv_row[tool[\"column_name\"]] = get_feature_answers(\n",
- " tool, label[\"annotations\"][\"objects\"])\n",
- "\n",
- " # Append to global csv list\n",
- " GLOBAL_CSV_LIST.append(csv_row)"
- ]
+ "execution_count": null
},
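One caveat: because `GLOBAL_CSV_LIST` lives at module level, re-running the export cell appends rows on top of the previous run. A one-line guard before re-exporting avoids duplicate rows:

```python
# Clear any rows collected by a previous run before re-exporting.
GLOBAL_CSV_LIST.clear()
```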
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Export our data\n",
"Now that we have defined functions and strategies, we are ready to export. Below, we are exporting directly from our project and feeding in the main function we created above."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Params required to obtain all fields we need\nparams = {\"performance_details\": True, \"label_details\": True}\n\nexport_task = project.export(params=params)\nexport_task.wait_till_done()\n\n# Conditional for if export task has errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nif export_task.has_result():\n export_json = export_task.get_buffered_stream(\n stream_type=lb.StreamType.RESULT\n ).start(\n stream_handler=main # Feeding our data row handler directly into export\n )",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Params required to obtain all fields we need\n",
- "params = {\"performance_details\": True, \"label_details\": True}\n",
- "\n",
- "export_task = project.export(params=params)\n",
- "export_task.wait_till_done()\n",
- "\n",
- "# Conditional for if export task has errors\n",
- "if export_task.has_errors():\n",
- " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
- " stream_handler=lambda error: print(error))\n",
- "\n",
- "if export_task.has_result():\n",
- " export_json = export_task.get_buffered_stream(\n",
- " stream_type=lb.StreamType.RESULT\n",
- " ).start(\n",
- " stream_handler=main # Feeding our data row handler directly into export\n",
- " )"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"If everything went through correctly, you should see your `GLOBAL_CSV_LIST` printed out below with all your \"rows\" filled out."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "pprint(GLOBAL_CSV_LIST)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "pprint(GLOBAL_CSV_LIST)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Convert to our desired format\n",
"\n",
- "The hard part is now completed!🚀 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
- ]
+ "The hard part is now completed!\ud83d\ude80 Now that you have your export in a flattened format, you can easily convert to a CSV or a Pandas DataFrame!"
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option A: CSV writer"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n # Columns\n fieldnames = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\n writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n\n writer.writeheader()\n\n for row in GLOBAL_CSV_LIST:\n writer.writerow(row)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "with open(\"file.csv\", \"w\", newline=\"\") as csvfile:\n",
- " # Columns\n",
- " fieldnames = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n",
- "\n",
- " writer.writeheader()\n",
- "\n",
- " for row in GLOBAL_CSV_LIST:\n",
- " writer.writerow(row)"
- ]
+ "execution_count": null
},
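As a quick sanity check, you can read the file back with the standard library's `csv.DictReader` and inspect the first row; a minimal sketch:

```python
import csv

# Optional sanity check: read the CSV back and print the first row.
with open("file.csv", newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    print(next(reader, None))
```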
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Option B: Pandas DataFrame"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "columns = (data_row_base_columns + label_base_columns +\n [name[\"column_name\"] for name in class_annotation_columns] +\n [name[\"column_name\"] for name in tool_annotation_columns])\npd.DataFrame(GLOBAL_CSV_LIST, columns=columns)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "columns = (data_row_base_columns + label_base_columns +\n",
- " [name[\"column_name\"] for name in class_annotation_columns] +\n",
- " [name[\"column_name\"] for name in tool_annotation_columns])\n",
- "pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)"
- ]
+ "execution_count": null
}
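If you take the DataFrame route, you can also write the CSV straight from pandas instead of using `csv.DictWriter`; a minimal sketch (the `file_pandas.csv` filename is arbitrary):

```python
# Sketch: write the same flattened rows to disk via pandas.
df = pd.DataFrame(GLOBAL_CSV_LIST, columns=columns)
df.to_csv("file_pandas.csv", index=False)
```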
- ],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ ]
+}
\ No newline at end of file