From 666920e3e08acdc91b7fb3c1cb964fa088cc39e5 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Fri, 26 Jul 2024 12:58:13 -0500
Subject: [PATCH 01/12] added prompt response create project method to notebook
---
annotation_import/prompt_response.ipynb | 605 ++++++++++++++++++++++++
1 file changed, 605 insertions(+)
create mode 100644 annotation_import/prompt_response.ipynb
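
Note on Step 3 of the notebook: the NDJSON branch attaches the same data row reference to every annotation dictionary before upload. The sketch below illustrates that step with plain dictionaries and no Labelbox calls. `attach_data_row` and the global key value are hypothetical, chosen only for illustration, and the helper copies each dictionary instead of updating it in place as the notebook cell does.

```python
import copy
import json


def attach_data_row(annotations, global_key):
    """Return copies of NDJSON-style annotation dicts tagged with a data row."""
    tagged = []
    for annotation in annotations:
        entry = copy.deepcopy(annotation)  # leave the caller's dicts untouched
        entry["dataRow"] = {"globalKey": global_key}
        tagged.append(entry)
    return tagged


# Two payloads copied from the notebook's annotation examples.
prompt = {
    "name": "Follow the prompt and select answers",
    "answer": "This is an example of a prompt",
}
radio = {
    "name": "response radio feature",
    "answer": {"name": "first_radio_answer"},
}

# "hypothetical-global-key" stands in for a key read from the project export.
label_ndjson = attach_data_row([prompt, radio], "hypothetical-global-key")

# NDJSON is one JSON document per line.
ndjson_text = "\n".join(json.dumps(entry) for entry in label_ndjson)
print(ndjson_text)
```

Copying keeps the original annotation dictionaries reusable for a second data row, which in-place updates would overwrite.
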
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
new file mode 100644
index 0000000..de8d13b
--- /dev/null
+++ b/annotation_import/prompt_response.ipynb
@@ -0,0 +1,605 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " | \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " | \n",
+ "\n",
+ "\n",
+ " \n",
+ " | "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Prompt and response projects with MAL and Ground Truth\n",
+ "\n",
+ "This notebook shows how to import prompts and responses for fine-tuning large language models (LLMs) using MAL and ground truth."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Annotation payload types\n",
+ "\n",
+ "Labelbox supports two formats for the annotations payload:\n",
+ "\n",
+ "- Python annotation types (recommended)\n",
+ " - Provides a seamless transition between third-party platforms, machine learning pipelines, and Labelbox.\n",
+ " - Allows you to build annotations locally with local file paths, numpy arrays, or URLs.\n",
+ " - Supports easy conversion to NDJSON format to quickly import annotations to Labelbox.\n",
+ " - Supports one-level nested classification (radio, checklist, or free-form text) under a tool or classification annotation.\n",
+ "- JSON\n",
+ " - Skips the step of formatting the annotation payload as Labelbox Python annotation types.\n",
+ " - Supports any level of nested classification (radio, checklist, or free-form text) under a tool or classification annotation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Label Import Types\n",
+ "\n",
+ "Labelbox supports two types of label imports:\n",
+ "\n",
+ "- [Model-assisted labeling (MAL)](https://docs.labelbox.com/docs/model-assisted-labeling)\n",
+ " - This workflow allows you to import computer-generated predictions (or simply annotations created outside of Labelbox) as pre-labels on an asset.\n",
+ "- [Ground truth](https://docs.labelbox.com/docs/import-ground-truth)\n",
+ " - This workflow allows you to bulk import your ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Set up "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install -q \"labelbox[data]\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import labelbox as lb\n",
+ "import labelbox.types as lb_types\n",
+ "import uuid"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Replace with your API key\n",
+ "\n",
+ "Replace the value of `API_KEY` with a valid [API key](https://docs.labelbox.com/reference/create-api-key) to connect to the Labelbox client."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "API_KEY = None\n",
+ "client = lb.Client(api_key=API_KEY)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Supported Annotations\n",
+ "\n",
+ "The following annotations are supported for prompt and response projects:\n",
+ "\n",
+ "- Prompt and response creation projects\n",
+ " - Prompt text\n",
+ " - Radio\n",
+ " - Checklist\n",
+ " - Response text\n",
+ "\n",
+ "- Prompt creation projects\n",
+ " - Prompt text\n",
+ "\n",
+ "- Response creation projects\n",
+ " - Radio\n",
+ " - Checklist"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Prompt:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Free-form text"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt_annotation = lb_types.PromptClassificationAnnotation(\n",
+ " name = \"Follow the prompt and select answers\",\n",
+ " value = lb_types.PromptText(answer = \"This is an example of a prompt\")\n",
+ ")\n",
+ "\n",
+ "prompt_annotation_ndjson = {\n",
+ " \"name\": \"Follow the prompt and select answers\",\n",
+ " \"answer\": \"This is an example of a prompt\",\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Responses:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Radio (single-choice)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"response radio feature\",\n",
+ " value=lb_types.Radio(answer = \n",
+ " lb_types.ClassificationAnswer(name = \"first_radio_answer\")\n",
+ " )\n",
+ ")\n",
+ "\n",
+ "response_radio_annotation_ndjson = {\n",
+ " \"name\": \"response radio feature\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_radio_answer\"\n",
+ " },\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Checklist (multi-choice)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"response checklist feature\",\n",
+ " value=lb_types.Checklist(answer = [\n",
+ " lb_types.ClassificationAnswer(name = \"option_1\"),\n",
+ " lb_types.ClassificationAnswer(name = \"option_2\"),\n",
+ " ])\n",
+ " )\n",
+ "\n",
+ "response_checklist_annotation_ndjson = {\n",
+ " \"name\": \"response checklist feature\",\n",
+ " \"answer\": [\n",
+ " {\n",
+ " \"name\": \"option_1\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"option_2\"\n",
+ " }\n",
+ " ]\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Free-form text"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response_text_annotation = lb_types.ClassificationAnnotation(\n",
+ " name = \"Provide a reason for your choice\", \n",
+ " value = lb_types.Text(answer = \"This is an example of a response text\")\n",
+ ")\n",
+ "\n",
+ "response_text_annotation_ndjson = {\n",
+ " \"name\": \"Provide a reason for your choice\",\n",
+ " \"answer\": \"This is an example of a response text\"\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Nested classifications"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_response_radio_question\",\n",
+ " value=lb_types.Radio(\n",
+ " answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(\n",
+ " answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_radio_answer\"\n",
+ " )\n",
+ " )\n",
+ " )\n",
+ " ]\n",
+ " )\n",
+ " )\n",
+ ")\n",
+ "\n",
+ "nested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_response_checklist_question\",\n",
+ " value=lb_types.Checklist(\n",
+ " answer=[lb_types.ClassificationAnswer(\n",
+ " name=\"first_checklist_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_checklist_question\",\n",
+ " value=lb_types.Checklist(\n",
+ " answer=[lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_checklist_answer\"\n",
+ " )]\n",
+ " ))\n",
+ " ]\n",
+ " )]\n",
+ " )\n",
+ ")\n",
+ "\n",
+ "nested_response_radio_annotation_ndjson = {\n",
+ " \"name\": \"nested_response_radio_question\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_radio_answer\", \n",
+ " \"classifications\" : [\n",
+ " {\n",
+ " \"name\": \"sub_radio_question\", \n",
+ " \"answer\": {\"name\": \"first_sub_radio_answer\"}\n",
+ " } \n",
+ " ] \n",
+ " }\n",
+ "}\n",
+ "\n",
+ "nested_response_checklist_annotation_ndjson = {\n",
+ " \"name\": \"nested_response_checklist_question\",\n",
+ " \"answer\": [{\n",
+ " \"name\": \"first_checklist_answer\", \n",
+ " \"classifications\" : [\n",
+ " {\n",
+ " \"name\": \"sub_checklist_question\", \n",
+ " \"answer\": {\"name\": \"first_sub_checklist_answer\"}\n",
+ " } \n",
+ " ] \n",
+ " }]\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 1: Create a project and data rows in Labelbox UI\n",
+ "\n",
+ "Depending on the prompt and response project type, this step could look different. Review the [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) creation guide for more details on the differences. \n",
+ "\n",
+ "In this tutorial, we import annotations only for a prompt response creation project, but the process is similar for prompt creation and response creation projects. Review the corresponding [developer guide](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) for more examples covering the other project types."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Prompt response and prompt creation\n",
+ "\n",
+ "For prompt response and prompt creation projects, empty data rows are generated for you when the project is created. After your project is created, you need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows. This can be done by exporting from the newly created project."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt_response_project = client.create_model_evaluation_project(\n",
+ " name=\"Demo prompt response project\",\n",
+ " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
+ " dataset_name=\"Demo prompt response dataset\",\n",
+ " data_row_count=1,\n",
+ ")\n",
+ "\n",
+ "export_task = prompt_response_project.export()\n",
+ "export_task.wait_till_done()\n",
+ "\n",
+ "\n",
+ "# Check export for any errors\n",
+ "if export_task.has_errors():\n",
+ " export_task.get_buffered_stream(\n",
+ " stream_type=lb.StreamType.ERRORS\n",
+ " ).start(stream_handler=lambda error: print(error))\n",
+ "\n",
+ "stream = export_task.get_buffered_stream()\n",
+ "\n",
+ "# Obtain global keys to be used later on\n",
+ "global_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 2: Set up ontology\n",
+ "\n",
+ "Your project ontology should support the classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the `name` parameter should match the value of the `name` field in your annotation. \n",
+ "\n",
+ "For example, when we created the annotations above, we gave each one a `name`, such as `response radio feature`. When we set up the ontology, the matching classification must use exactly the same `name`. The same alignment must hold for every other classification we create in the ontology.\n",
+ "\n",
+ "This example shows how to create an ontology containing all [annotation types](#supported-annotations) supported by prompt and response projects."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ontology_builder = lb.OntologyBuilder(\n",
+ " tools=[],\n",
+ " classifications=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.PROMPT,\n",
+ " name=\"Follow the prompt and select answers\",\n",
+ " character_min = 1, # Minimum character count of prompt field (optional)\n",
+ " character_max = 50, # Maximum character count of prompt field (optional)\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
+ " name=\"response checklist feature\",\n",
+ " options=[\n",
+ " lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n",
+ " lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
+ " name=\"response radio feature\",\n",
+ " options=[\n",
+ " lb.ResponseOption(value=\"first_radio_answer\"),\n",
+ " lb.ResponseOption(value=\"second_radio_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n",
+ " name=\"Provide a reason for your choice\",\n",
+ " character_min = 1, # Minimum character count of response text field (optional)\n",
+ " character_max = 50, # Maximum character count of response text field (optional)\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
+ " name=\"nested_response_radio_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\"first_radio_answer\",\n",
+ " options=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[lb.ResponseOption(\"first_sub_radio_answer\")]\n",
+ " )\n",
+ " ])\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
+ " name=\"nested_response_checklist_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\"first_checklist_answer\",\n",
+ " options=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
+ " name=\"sub_checklist_question\",\n",
+ " options=[lb.ResponseOption(\"first_sub_checklist_answer\")]\n",
+ " )\n",
+ " ])\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "# Create ontology\n",
+ "ontology = client.create_ontology(\n",
+ " \"Prompt and response ontology\",\n",
+ " ontology_builder.asdict(),\n",
+ " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 3: Create the annotations payload\n",
+ "\n",
+ "For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the `predictions` parameter. For ground truths, pass the payload to the `labels` parameter."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Python annotation objects\n",
+ "label = []\n",
+ "annotations = [\n",
+ " prompt_annotation,\n",
+ " response_radio_annotation,\n",
+ " response_checklist_annotation,\n",
+ " response_text_annotation,\n",
+ " nested_response_radio_annotation,\n",
+ " nested_response_checklist_annotation\n",
+ "]\n",
+ "label.append(\n",
+ " lb_types.Label(data={\"global_key\" : global_keys[0] },\n",
+ " annotations=annotations)\n",
+ " )\n",
+ "\n",
+ "# NDJSON\n",
+ "label_ndjson = []\n",
+ "annotations = [\n",
+ " prompt_annotation_ndjson,\n",
+ " response_radio_annotation_ndjson,\n",
+ " response_checklist_annotation_ndjson,\n",
+ " response_text_annotation_ndjson,\n",
+ " nested_response_radio_annotation_ndjson,\n",
+ " nested_response_checklist_annotation_ndjson\n",
+ "]\n",
+ "for annotation in annotations:\n",
+ " annotation.update({\n",
+ " \"dataRow\": {\n",
+ " \"globalKey\": global_keys[0]\n",
+ " },\n",
+ " })\n",
+ " label_ndjson.append(annotation)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Option A: Upload as [prelabels (model-assisted labeling)](https://docs.labelbox.com/docs/model-assisted-labeling)\n",
+ "\n",
+ "This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "upload_job = lb.MALPredictionImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=prompt_response_project.uid,\n",
+ " name=f\"mal_job-{str(uuid.uuid4())}\",\n",
+ " predictions=label,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Option B: Upload to a labeling project as [ground truth](https://docs.labelbox.com/docs/import-ground-truth)\n",
+ "\n",
+ "This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "upload_job = lb.LabelImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=prompt_response_project.uid,\n",
+ " name=\"label_import_job\" + str(uuid.uuid4()),\n",
+ " labels=label_ndjson,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Clean up\n",
+ "\n",
+ "Uncomment and run the cell below to optionally delete the Labelbox objects created."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# prompt_response_project.delete()\n",
+ "# client.delete_unused_ontology(ontology.uid)"
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
From 20ab1c5a3140876c84a4e05043a099109d46233a Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Fri, 26 Jul 2024 13:02:38 -0500
Subject: [PATCH 02/12] added prompt response
---
annotation_import/prompt_response.ipynb | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index de8d13b..f9a7ebc 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -144,7 +144,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "#### Free-form text"
+ "#### Prompt text"
]
},
{
@@ -237,7 +237,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "#### Free-form text"
+ "#### Response text"
]
},
{
From 2800b6e91787a964e9e557c59d9846fa62fbee16 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Fri, 26 Jul 2024 13:07:29 -0500
Subject: [PATCH 03/12] fixed workflow
---
.github/workflows/notebooks.yml | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/notebooks.yml b/.github/workflows/notebooks.yml
index 61ec646..04e20f2 100644
--- a/.github/workflows/notebooks.yml
+++ b/.github/workflows/notebooks.yml
@@ -4,11 +4,11 @@ on:
push:
branches: [main]
paths:
- - ./**/*.ipynb
+ - **.ipynb
pull_request:
branches: [main]
paths:
- - ./**/*.ipynb
+ - **.ipynb
permissions:
contents: write
From 55f02555decf908bcce1e0d9220c71a8b1a285d6 Mon Sep 17 00:00:00 2001
From: Gabefire <33893811+Gabefire@users.noreply.github.com>
Date: Fri, 26 Jul 2024 13:08:45 -0500
Subject: [PATCH 04/12] fixed workflow
---
.github/workflows/notebooks.yml | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/notebooks.yml b/.github/workflows/notebooks.yml
index 04e20f2..62529ef 100644
--- a/.github/workflows/notebooks.yml
+++ b/.github/workflows/notebooks.yml
@@ -4,11 +4,11 @@ on:
push:
branches: [main]
paths:
- - **.ipynb
+ - "**.ipynb"
pull_request:
branches: [main]
paths:
- - **.ipynb
+ - "**.ipynb"
permissions:
contents: write
From 6a0f7a303880e52c45dc91dd63900db9917137ed Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Fri, 26 Jul 2024 18:09:33 +0000
Subject: [PATCH 05/12] :art: Cleaned
---
annotation_import/prompt_response.ipynb | 477 +++++-------------------
scripts/generate_readme.py | 12 +-
2 files changed, 106 insertions(+), 383 deletions(-)
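
Note on the notebook changes in this commit: most of the churn comes from each code cell's `source` moving from a list of line strings to a single string. The notebook format accepts both forms, so the cell contents are unchanged. The sketch below shows the equivalence on one cell taken from this diff; `cell_source_text` is a hypothetical helper written for illustration, not part of any library.

```python
def cell_source_text(cell):
    """Return a notebook cell's source as one string, whichever form it uses."""
    source = cell["source"]
    if isinstance(source, list):
        # List form: each item already carries its own trailing newline.
        return "".join(source)
    return source


# The import cell before the cleanup (list of lines).
before = {
    "cell_type": "code",
    "source": [
        "import labelbox as lb\n",
        "import labelbox.types as lb_types\n",
        "import uuid",
    ],
}

# The same cell after the cleanup (single string).
after = {
    "cell_type": "code",
    "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid",
}

same_content = cell_source_text(before) == cell_source_text(after)
print(same_content)
```
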
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index f9a7ebc..4aceea4 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -1,16 +1,18 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {},
"cells": [
{
- "cell_type": "markdown",
"metadata": {},
"source": [
- "\n",
- " \n",
+ " | ",
+ " ",
" | \n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -22,19 +24,19 @@
" \n",
" | "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"# Prompt and response projects with MAL and Ground Truth\n",
"\n",
"This notebook shows how to import prompts and responses for fine-tuning large language models (LLMs) using MAL and ground truth."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Annotation payload types\n",
@@ -49,10 +51,10 @@
"- JSON\n",
" - Skips the step of formatting the annotation payload as Labelbox Python annotation types.\n",
" - Supports any level of nested classification (radio, checklist, or free-form text) under a tool or classification annotation."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Label Import Types\n",
@@ -63,56 +65,47 @@
" - This workflow allows you to import computer-generated predictions (or simply annotations created outside of Labelbox) as pre-labels on an asset.\n",
"- [Ground truth](https://docs.labelbox.com/docs/import-ground-truth)\n",
" - This workflow allows you to bulk import your ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "%pip install -q \"labelbox[data]\"",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "%pip install -q \"labelbox[data]\""
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "import labelbox as lb\n",
- "import labelbox.types as lb_types\n",
- "import uuid"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Replace with your API key\n",
"\n",
"Replace the value of `API_KEY` with a valid [API key](https://docs.labelbox.com/reference/create-api-key) to connect to the Labelbox client."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "API_KEY = None\n",
- "client = lb.Client(api_key=API_KEY)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Supported Annotations\n",
@@ -131,211 +124,94 @@
"- Response creation projects\n",
" - Radio\n",
" - Checklist"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt:"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Prompt text"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "prompt_annotation = lb_types.PromptClassificationAnnotation(\n name=\"Follow the prompt and select answers\",\n value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n)\n\nprompt_annotation_ndjson = {\n \"name\": \"Follow the prompt and select answers\",\n \"answer\": \"This is an example of a prompt\",\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "prompt_annotation = lb_types.PromptClassificationAnnotation(\n",
- " name = \"Follow the prompt and select answers\",\n",
- " value = lb_types.PromptText(answer = \"This is an example of a prompt\")\n",
- ")\n",
- "\n",
- "prompt_annotation_ndjson = {\n",
- " \"name\": \"Follow the prompt and select answers\",\n",
- " \"answer\": \"This is an example of a prompt\",\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Responses:"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Radio (single-choice)"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"response radio feature\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\")),\n)\n\nresponse_radio_annotation_ndjson = {\n \"name\": \"response radio feature\",\n \"answer\": {\n \"name\": \"first_radio_answer\"\n },\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"response radio feature\",\n",
- " value=lb_types.Radio(answer = \n",
- " lb_types.ClassificationAnswer(name = \"first_radio_answer\")\n",
- " )\n",
- ")\n",
- "\n",
- "response_radio_annotation_ndjson = {\n",
- " \"name\": \"response radio feature\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_radio_answer\"\n",
- " },\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Checklist (multi-choice)"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"response checklist feature\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"option_1\"),\n lb_types.ClassificationAnswer(name=\"option_2\"),\n ]),\n)\n\nresponse_checklist_annotation_ndjson = {\n \"name\": \"response checklist feature\",\n \"answer\": [{\n \"name\": \"option_1\"\n }, {\n \"name\": \"option_2\"\n }],\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"response checklist feature\",\n",
- " value=lb_types.Checklist(answer = [\n",
- " lb_types.ClassificationAnswer(name = \"option_1\"),\n",
- " lb_types.ClassificationAnswer(name = \"option_2\"),\n",
- " ])\n",
- " )\n",
- "\n",
- "response_checklist_annotation_ndjson = {\n",
- " \"name\": \"response checklist feature\",\n",
- " \"answer\": [\n",
- " {\n",
- " \"name\": \"option_1\"\n",
- " },\n",
- " {\n",
- " \"name\": \"option_2\"\n",
- " }\n",
- " ]\n",
- "}"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": null
+ },
+ {
"metadata": {},
"source": [
"#### Response text"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_text_annotation = lb_types.ClassificationAnnotation(\n name=\"Provide a reason for your choice\",\n value=lb_types.Text(answer=\"This is an example of a response text\"),\n)\n\nresponse_text_annotation_ndjson = {\n \"name\": \"Provide a reason for your choice\",\n \"answer\": \"This is an example of a response text\",\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_text_annotation = lb_types.ClassificationAnnotation(\n",
- " name = \"Provide a reason for your choice\", \n",
- " value = lb_types.Text(answer = \"This is an example of a response text\")\n",
- ")\n",
- "\n",
- "response_text_annotation_ndjson = {\n",
- " \"name\": \"Provide a reason for your choice\",\n",
- " \"answer\": \"This is an example of a response text\"\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Nested classifications"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\n\nnested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_checklist_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_sub_checklist_answer\")\n ]),\n )\n ],\n )\n ]),\n)\n\nnested_response_radio_annotation_ndjson = {\n \"name\":\n \"nested_response_radio_question\",\n \"answer\": {\n \"name\":\n \"first_radio_answer\",\n \"classifications\": [{\n \"name\": \"sub_radio_question\",\n \"answer\": {\n \"name\": \"first_sub_radio_answer\"\n },\n }],\n },\n}\n\nnested_response_checklist_annotation_ndjson = {\n \"name\":\n \"nested_response_checklist_question\",\n \"answer\": [{\n \"name\":\n \"first_checklist_answer\",\n \"classifications\": [{\n \"name\": \"sub_checklist_question\",\n \"answer\": {\n \"name\": \"first_sub_checklist_answer\"\n },\n }],\n }],\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_response_radio_question\",\n",
- " value=lb_types.Radio(\n",
- " answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(\n",
- " answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_radio_answer\"\n",
- " )\n",
- " )\n",
- " )\n",
- " ]\n",
- " )\n",
- " )\n",
- ")\n",
- "\n",
- "nested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_response_checklist_question\",\n",
- " value=lb_types.Checklist(\n",
- " answer=[lb_types.ClassificationAnswer(\n",
- " name=\"first_checklist_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_checklist_question\",\n",
- " value=lb_types.Checklist(\n",
- " answer=[lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_checklist_answer\"\n",
- " )]\n",
- " ))\n",
- " ]\n",
- " )]\n",
- " )\n",
- ")\n",
- "\n",
- "nested_response_radio_annotation_ndjson = {\n",
- " \"name\": \"nested_response_radio_question\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_radio_answer\", \n",
- " \"classifications\" : [\n",
- " {\n",
- " \"name\": \"sub_radio_question\", \n",
- " \"answer\": {\"name\": \"first_sub_radio_answer\"}\n",
- " } \n",
- " ] \n",
- " }\n",
- "}\n",
- "\n",
- "nested_response_checklist_annotation_ndjson = {\n",
- " \"name\": \"nested_response_checklist_question\",\n",
- " \"answer\": [{\n",
- " \"name\": \"first_checklist_answer\", \n",
- " \"classifications\" : [\n",
- " {\n",
- " \"name\": \"sub_checklist_question\", \n",
- " \"answer\": {\"name\": \"first_sub_checklist_answer\"}\n",
- " } \n",
- " ] \n",
- " }]\n",
- "}"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": null
+ },
+ {
"metadata": {},
"source": [
"## Step 1: Create a project and data rows in Labelbox UI\n",
@@ -343,48 +219,26 @@
"Depending on what prompt response project type this step could look different. Review [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) creation guide for more details on the differences. \n",
"\n",
"In this tutorial, we will just be importing annotations for a prompt response creation project. But the process will look similar for prompt creation and response creation projects. Review the corresponding [developer guide](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) to this tutorial for more examples on the other project types."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt response and prompt creation\n",
"\n",
"For prompt response and prompt creation empty data rows are generated for you on project creation. After your projects are created you will need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows. This can been done by exporting from the newly created project."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "prompt_response_project = client.create_model_evaluation_project(\n name=\"Demo prompt response project\",\n media_type=lb.MediaType.LLMPromptResponseCreation,\n dataset_name=\"Demo prompt response dataset\",\n data_row_count=1,\n)\n\nexport_task = prompt_response_project.export()\nexport_task.wait_till_done()\n\n# Check export for any errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nstream = export_task.get_buffered_stream()\n\n# Obtain global keys to be used later on\nglobal_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "prompt_response_project = client.create_model_evaluation_project(\n",
- " name=\"Demo prompt response project\",\n",
- " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
- " dataset_name=\"Demo prompt response dataset\",\n",
- " data_row_count=1,\n",
- ")\n",
- "\n",
- "export_task = prompt_response_project.export()\n",
- "export_task.wait_till_done()\n",
- "\n",
- "\n",
- "# Check export for any errors\n",
- "if export_task.has_errors():\n",
- " export_task.get_buffered_stream(\n",
- " stream_type=lb.StreamType.ERRORS\n",
- " ).start(stream_handler=lambda error: print(error))\n",
- "\n",
- "stream = export_task.get_buffered_stream()\n",
- "\n",
- "# Obtain global keys to be used later on\n",
- "global_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Set up ontology\n",
@@ -394,212 +248,79 @@
"For example, when we created an annotation above, we provided a name`annotation_name`. Now, when we set up our ontology, we must ensure that the name of our bounding box tool is also `anotations_name`. The same alignment must hold true for the other tools and classifications we create in our ontology.\n",
"\n",
"This example shows how to create an ontology containing all supported by prompt and response projects [annotation types](#supported-annotations)."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "ontology_builder = lb.OntologyBuilder(\n tools=[],\n classifications=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.PROMPT,\n name=\"prompt text\",\n character_min=1, # Minimum character count of prompt field (optional)\n character_max=\n 20, # Maximum character count of prompt field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"response checklist feature\",\n options=[\n lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"response radio feature\",\n options=[\n lb.ResponseOption(value=\"first_radio_answer\"),\n lb.ResponseOption(value=\"second_radio_answer\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n name=\"response text\",\n character_min=\n 1, # Minimum character count of response text field (optional)\n character_max=\n 20, # Maximum character count of response text field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"nested_response_radio_question\",\n options=[\n lb.ResponseOption(\n \"first_radio_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_RADIO,\n name=\"sub_radio_question\",\n options=[\n lb.ResponseOption(\"first_sub_radio_answer\")\n ],\n )\n ],\n )\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"nested_response_checklist_question\",\n options=[\n lb.ResponseOption(\n \"first_checklist_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_CHECKLIST,\n name=\"sub_checklist_question\",\n options=[\n lb.ResponseOption(\"first_sub_checklist_answer\")\n ],\n )\n ],\n )\n ],\n ),\n ],\n)\n\n# Create ontology\nontology = client.create_ontology(\n \"Prompt and response ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.LLMPromptResponseCreation,\n)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "ontology_builder = lb.OntologyBuilder(\n",
- " tools=[],\n",
- " classifications=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.PROMPT,\n",
- " name=\"prompt text\",\n",
- " character_min = 1, # Minimum character count of prompt field (optional)\n",
- " character_max = 20, # Maximum character count of prompt field (optional)\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
- " name=\"response checklist feature\",\n",
- " options=[\n",
- " lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n",
- " lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
- " name=\"response radio feature\",\n",
- " options=[\n",
- " lb.ResponseOption(value=\"first_radio_answer\"),\n",
- " lb.ResponseOption(value=\"second_radio_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n",
- " name=\"response text\",\n",
- " character_min = 1, # Minimum character count of response text field (optional)\n",
- " character_max = 20, # Maximum character count of response text field (optional)\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
- " name=\"nested_response_radio_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\"first_radio_answer\",\n",
- " options=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.RESPONSE_RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[lb.ResponseOption(\"first_sub_radio_answer\")]\n",
- " )\n",
- " ])\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
- " name=\"nested_response_checklist_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\"first_checklist_answer\",\n",
- " options=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.RESPONSE_CHECKLIST,\n",
- " name=\"sub_checklist_question\",\n",
- " options=[lb.ResponseOption(\"first_sub_checklist_answer\")]\n",
- " )\n",
- " ])\n",
- " ],\n",
- " ),\n",
- " ],\n",
- ")\n",
- "\n",
- "# Create ontology\n",
- "ontology = client.create_ontology(\n",
- " \"Prompt and response ontology\",\n",
- " ontology_builder.asdict(),\n",
- " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
- ")"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Create the annotations payload\n",
"\n",
"For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the `predictions` parameter. For ground truths, pass the payload to the `labels` parameter."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Python annotation objects\nlabel = []\nannotations = [\n prompt_annotation,\n response_radio_annotation,\n response_checklist_annotation,\n response_text_annotation,\n nested_response_radio_annotation,\n nested_response_checklist_annotation,\n]\nlabel.append(\n lb_types.Label(data={\"global_key\": global_keys[0]},\n annotations=annotations))\n\n# NDJSON\nlabel_ndjson = []\nannotations = [\n prompt_annotation_ndjson,\n response_radio_annotation_ndjson,\n response_checklist_annotation_ndjson,\n response_text_annotation_ndjson,\n nested_response_radio_annotation_ndjson,\n nested_response_checklist_annotation_ndjson,\n]\nfor annotation in annotations:\n annotation.update({\n \"dataRow\": {\n \"globalKey\": global_keys[0]\n },\n })\n label_ndjson.append(annotation)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Python annotation objects\n",
- "label = []\n",
- "annotations = [\n",
- " prompt_annotation,\n",
- " response_radio_annotation,\n",
- " response_checklist_annotation,\n",
- " response_text_annotation,\n",
- " nested_response_radio_annotation,\n",
- " nested_response_checklist_annotation\n",
- "]\n",
- "label.append(\n",
- " lb_types.Label(data={\"global_key\" : global_keys[0] },\n",
- " annotations=annotations)\n",
- " )\n",
- "\n",
- "# NDJSON\n",
- "label_ndjson = []\n",
- "annotations = [\n",
- " prompt_annotation_ndjson,\n",
- " response_radio_annotation_ndjson,\n",
- " response_checklist_annotation_ndjson,\n",
- " response_text_annotation_ndjson,\n",
- " nested_response_radio_annotation_ndjson,\n",
- " nested_response_checklist_annotation_ndjson\n",
- "]\n",
- "for annotation in annotations:\n",
- " annotation.update({\n",
- " \"dataRow\": {\n",
- " \"globalKey\": global_keys[0]\n",
- " },\n",
- " })\n",
- " label_ndjson.append(annotation)"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": null
+ },
+ {
"metadata": {},
"source": [
"#### Option A: Upload as [prelabels (model assisted labeling)](doc:model-assisted-labeling)\n",
"\n",
"This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "upload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=f\"mal_job-{str(uuid.uuid4())}\",\n predictions=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "upload_job = lb.MALPredictionImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=prompt_response_project.uid,\n",
- " name=f\"mal_job-{str(uuid.uuid4())}\",\n",
- " predictions=label,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Option B: Upload to a labeling project as [ground truth](doc:import-ground-truth)\n",
"\n",
"This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "upload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label_ndjson,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "upload_job = lb.LabelImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=prompt_response_project.uid,\n",
- " name=\"label_import_job\" + str(uuid.uuid4()),\n",
- " labels=label_ndjson,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"\n",
"Uncomment and run the cell below to optionally delete Labelbox objects created"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# project.delete()\n# client.delete_unused_ontology(ontology.uid)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# project.delete()\n",
- "# client.delete_unused_ontology(ontology.uid)"
- ]
+ "execution_count": null
}
- ],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ ]
+}
\ No newline at end of file
diff --git a/scripts/generate_readme.py b/scripts/generate_readme.py
index 7542cb6..db02293 100644
--- a/scripts/generate_readme.py
+++ b/scripts/generate_readme.py
@@ -45,21 +45,23 @@
"""
COLAB_TEMPLATE = "https://colab.research.google.com/github/Labelbox/labelbox-notebooks/blob/main/{filename}"
-GITHUB_TEMPLATE = "https://github.com/Labelbox/labelbox-notebooks/tree/main/{filename}"
+GITHUB_TEMPLATE = (
+ "https://github.com/Labelbox/labelbox-notebooks/tree/main/{filename}"
+)
+
def special_order(link_dict: Dict[str, list]) -> Dict:
- """This is used to add a special order to certain sections. It makes a copy of the link dict provided then loops through items inside the link dict to create a specified order. (Not random) anything not found in the global variable for the section is just tacked on to the end.
- """
+ """Apply a fixed order to certain sections. Copies the provided link dict, then walks its items to build the specified (non-random) order; any link not listed in the section's global order variable is appended to the end."""
modified_link_dict = copy.deepcopy(link_dict)
for section, links in link_dict.items():
-
+
if section == "basics":
basic_order = BASICS_ORDER
for link_name in links:
if link_name not in BASICS_ORDER:
basic_order.append(link_name)
modified_link_dict[section] = basic_order
-
+
return modified_link_dict
From e63c901ea80cf3b4fceb515e27c9ff15dc666f7b Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Fri, 26 Jul 2024 18:10:13 +0000
Subject: [PATCH 06/12] :memo: README updated
---
README.md | 135 ++++++++++++++++++++++++++++--------------------------
1 file changed, 70 insertions(+), 65 deletions(-)
diff --git a/README.md b/README.md
index 614cfce..9031bec 100644
--- a/README.md
+++ b/README.md
@@ -77,11 +77,6 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
 |
-
- Composite mask export |
-  |
-  |
-
Export data |
 |
@@ -92,6 +87,11 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
 |
+
+ Composite mask export |
+  |
+  |
+
@@ -111,6 +111,11 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
 |
+
+ Queue management |
+  |
+  |
+
Webhooks |
 |
@@ -121,11 +126,6 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
 |
-
- Queue management |
-  |
-  |
-
@@ -141,29 +141,29 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
- Audio |
-  |
-  |
+ Tiled |
+  |
+  |
- Video |
-  |
-  |
+ Conversational LLM |
+  |
+  |
- Text |
-  |
-  |
+ HTML |
+  |
+  |
- Tiled |
-  |
-  |
+ Conversational LLM data generation |
+  |
+  |
- Conversational |
-  |
-  |
+ Image |
+  |
+  |
PDF |
@@ -171,9 +171,9 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
- Conversational LLM data generation |
-  |
-  |
+ Prompt response |
+  |
+  |
DICOM |
@@ -181,19 +181,24 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
- Image |
-  |
-  |
+ Text |
+  |
+  |
- HTML |
-  |
-  |
+ Audio |
+  |
+  |
- Conversational LLM |
-  |
-  |
+ Conversational |
+  |
+  |
+
+
+ Video |
+  |
+  |
@@ -210,15 +215,20 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
- Meta SAM |
-  |
-  |
+ Import YOLOv8 annotations |
+  |
+  |
Meta SAM video |
 |
 |
+
+ Meta SAM |
+  |
+  |
+
Langchain |
 |
@@ -229,11 +239,6 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
 |
-
- Import YOLOv8 annotations |
-  |
-  |
-
@@ -248,6 +253,11 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
+
+ Custom metrics demo |
+  |
+  |
+
Model slices |
 |
@@ -258,11 +268,6 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
 |
-
- Custom metrics demo |
-  |
-  |
-
Model predictions to project |
 |
@@ -282,6 +287,21 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
+
+ Video predictions |
+  |
+  |
+
+
+ HTML predictions |
+  |
+  |
+
+
+ Geospatial predictions |
+  |
+  |
+
Conversational predictions |
 |
@@ -292,31 +312,16 @@ Welcome to Labelbox Notebooks! These documents are directly linked from our Labe
 |
 |
-
- HTML predictions |
-  |
-  |
-
Conversational LLM predictions |
 |
 |
-
- Geospatial predictions |
-  |
-  |
-
PDF predictions |
 |
 |
-
- Video predictions |
-  |
-  |
-
Image predictions |
 |
From 5b680706646b44bbf83fdde906956a5fa6053dd0 Mon Sep 17 00:00:00 2001
From: x-eun
Date: Sun, 28 Jul 2024 23:42:38 -0700
Subject: [PATCH 07/12] Update prompt_response.ipynb
---
annotation_import/prompt_response.ipynb | 500 ++++++++++++++++++------
1 file changed, 391 insertions(+), 109 deletions(-)
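
Reviewer note (not part of the patch): the notebook text says that annotation names must match the ontology feature names. The sketch below is one way to check that before uploading, assuming NDJSON-style dict payloads. `missing_names` is a hypothetical helper, not a Labelbox SDK function, and the two name lists are copied from the ontology and NDJSON cells as they appear in this patch series.

```python
from typing import Iterable, List


def missing_names(annotations: Iterable[dict],
                  ontology_names: Iterable[str]) -> List[str]:
    """Return top-level annotation names with no same-named ontology feature."""
    known = set(ontology_names)
    return [a["name"] for a in annotations if a["name"] not in known]


# Feature names copied from the ontology cell of the notebook.
ontology_names = [
    "prompt text",
    "response checklist feature",
    "response radio feature",
    "response text",
    "nested_response_radio_question",
    "nested_response_checklist_question",
]

# Top-level names copied from the NDJSON cells of the notebook.
ndjson_annotations = [
    {"name": "Follow the prompt and select answers"},
    {"name": "response radio feature"},
    {"name": "response checklist feature"},
    {"name": "Provide a reason for your choice"},
    {"name": "nested_radio_question"},
    {"name": "nested_checklist_question"},
]

# Print every payload name that the ontology does not define.
print(missing_names(ndjson_annotations, ontology_names))
```

Any name this check reports would presumably need to be aligned, either in the payload or in the ontology, before the import can be expected to succeed.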
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index 4aceea4..dc17d77 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -1,18 +1,16 @@
{
- "nbformat": 4,
- "nbformat_minor": 2,
- "metadata": {},
"cells": [
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "",
- " ",
+ " | \n",
+ " \n",
" | \n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -24,19 +22,19 @@
" \n",
" | "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"# Prompt and response projects with MAL and Ground Truth\n",
"\n",
"This notebook is meant to showcase how to generate prompts and responses to fine-tune large language models (LLMs) using MAL and Ground truth"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Annotation payload types\n",
@@ -51,66 +49,73 @@
"- JSON\n",
" - Skips formatting annotation payload in the Labelbox Python annotation type.\n",
" - Supports any levels of nested classification (radio, checklist, or free-form text) under a tool or classification annotation."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Label Import Types\n",
"\n",
"Labelbox supports two types of label imports:\n",
"\n",
- "- [Model-assisted labeling (MAL)](https://docs.labelbox.com/docs/model-assisted-labeling)\n",
- " - This workflow allows you to import computer-generated predictions (or simply annotations created outside of Labelbox) as pre-labels on an asset.\n",
- "- [Ground truth](hhttps://docs.labelbox.com/docs/import-ground-truth)\n",
- " - This workflow functionality allows you to bulk import your ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth."
- ],
- "cell_type": "markdown"
+ "- [Model-assisted labeling (MAL)](https://docs.labelbox.com/docs/model-assisted-labeling) allows you to import computer-generated predictions, or annotations created outside of Labelbox, as pre-labels on an asset.\n",
+ "- [Ground truth](https://docs.labelbox.com/docs/import-ground-truth) allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data can consolidate and migrate all annotations into Labelbox as a single source of truth."
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up "
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "%pip install -q \"labelbox[data]\"",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "%pip install -q \"labelbox[data]\""
+ ]
},
{
- "metadata": {},
- "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "import labelbox as lb\n",
+ "import labelbox.types as lb_types\n",
+ "import uuid"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Replace with your API key\n",
"\n",
"Replace the value of `API_KEY` with a valid [API key]([ref:create-api-key](https://docs.labelbox.com/reference/create-api-key)) to connect to the Labelbox client."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "API_KEY = None\n",
+ "client = lb.Client(api_key=API_KEY)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Supported Annotations\n",
"\n",
- "The following annotations are supported for an prompt and response generated project:\n",
+ "Prompt and response generation projects support the following annotations:\n",
"\n",
"- Prompt and response creation projects\n",
" - Prompt text\n",
@@ -124,203 +129,480 @@
"- Response creation projects\n",
" - Radio\n",
" - Checklist"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "### Prompt:"
- ],
- "cell_type": "markdown"
+ "### Prompt"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Prompt text"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "prompt_annotation = lb_types.PromptClassificationAnnotation(\n name=\"Follow the prompt and select answers\",\n value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n)\n\nprompt_annotation_ndjson = {\n \"name\": \"Follow the prompt and select answers\",\n \"answer\": \"This is an example of a prompt\",\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "prompt_annotation = lb_types.PromptClassificationAnnotation(\n",
+ " name=\"Follow the prompt and select answers\",\n",
+ " value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n",
+ ")\n",
+ "\n",
+ "prompt_annotation_ndjson = {\n",
+ " \"name\": \"Follow the prompt and select answers\",\n",
+ " \"answer\": \"This is an example of a prompt\",\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "### Responses:"
- ],
- "cell_type": "markdown"
+ "### Responses"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Radio (single-choice)"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"response radio feature\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\")),\n)\n\nresponse_radio_annotation_ndjson = {\n \"name\": \"response radio feature\",\n \"answer\": {\n \"name\": \"first_radio_answer\"\n },\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "response_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"response radio feature\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\")),\n",
+ ")\n",
+ "\n",
+ "response_radio_annotation_ndjson = {\n",
+ " \"name\": \"response radio feature\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_radio_answer\"\n",
+ " },\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Checklist (multi-choice)"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"response checklist feature\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"option_1\"),\n lb_types.ClassificationAnswer(name=\"option_2\"),\n ]),\n)\n\nresponse_checklist_annotation_ndjson = {\n \"name\": \"response checklist feature\",\n \"answer\": [{\n \"name\": \"option_1\"\n }, {\n \"name\": \"option_2\"\n }],\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"response checklist feature\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(name=\"option_1\"),\n",
+ " lb_types.ClassificationAnswer(name=\"option_2\"),\n",
+ " ]),\n",
+ ")\n",
+ "\n",
+ "response_checklist_annotation_ndjson = {\n",
+ " \"name\": \"response checklist feature\",\n",
+ " \"answer\": [{\n",
+ " \"name\": \"option_1\"\n",
+ " }, {\n",
+ " \"name\": \"option_2\"\n",
+ " }],\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Response text"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "response_text_annotation = lb_types.ClassificationAnnotation(\n name=\"Provide a reason for your choice\",\n value=lb_types.Text(answer=\"This is an example of a response text\"),\n)\n\nresponse_text_annotation_ndjson = {\n \"name\": \"Provide a reason for your choice\",\n \"answer\": \"This is an example of a response text\",\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "response_text_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"Provide a reason for your choice\",\n",
+ " value=lb_types.Text(answer=\"This is an example of a response text\"),\n",
+ ")\n",
+ "\n",
+ "response_text_annotation_ndjson = {\n",
+ " \"name\": \"Provide a reason for your choice\",\n",
+ " \"answer\": \"This is an example of a response text\",\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Nested classifications"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\n\nnested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_checklist_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_sub_checklist_answer\")\n ]),\n )\n ],\n )\n ]),\n)\n\nnested_response_radio_annotation_ndjson = {\n \"name\":\n \"nested_radio_question\",\n \"answer\": [{\n \"name\":\n \"first_radio_answer\",\n \"classifications\": [{\n \"name\": \"sub_radio_question\",\n \"answer\": {\n \"name\": \"first_sub_radio_answer\"\n },\n }],\n }],\n}\n\nnested_response_checklist_annotation_ndjson = {\n \"name\":\n \"nested_checklist_question\",\n \"answer\": [{\n \"name\":\n \"first_checklist_answer\",\n \"classifications\": [{\n \"name\": \"sub_checklist_question\",\n \"answer\": {\n \"name\": \"first_sub_checklist_answer\"\n },\n }],\n }],\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
- },
- {
+ "source": [
+ "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_response_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ " )),\n",
+ ")\n",
+ "\n",
+ "nested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_response_checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(\n",
+ " name=\"first_checklist_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_checklist_answer\")\n",
+ " ]),\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ]),\n",
+ ")\n",
+ "\n",
+ "nested_response_radio_annotation_ndjson = {\n",
+ " \"name\":\n",
+ " \"nested_radio_question\",\n",
+ " \"answer\": [{\n",
+ " \"name\":\n",
+ " \"first_radio_answer\",\n",
+ " \"classifications\": [{\n",
+ " \"name\": \"sub_radio_question\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_sub_radio_answer\"\n",
+ " },\n",
+ " }],\n",
+ " }],\n",
+ "}\n",
+ "\n",
+ "nested_response_checklist_annotation_ndjson = {\n",
+ " \"name\":\n",
+ " \"nested_checklist_question\",\n",
+ " \"answer\": [{\n",
+ " \"name\":\n",
+ " \"first_checklist_answer\",\n",
+ " \"classifications\": [{\n",
+ " \"name\": \"sub_checklist_question\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_sub_checklist_answer\"\n",
+ " },\n",
+ " }],\n",
+ " }],\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Create a project and data rows in Labelbox UI\n",
"\n",
- "Depending on what prompt response project type this step could look different. Review [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) creation guide for more details on the differences. \n",
+ "Each type of prompt and response generation project requires a different setup. See [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) for more details on the differences.\n",
"\n",
- "In this tutorial, we will just be importing annotations for a prompt response creation project. But the process will look similar for prompt creation and response creation projects. Review the corresponding [developer guide](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) to this tutorial for more examples on the other project types."
- ],
- "cell_type": "markdown"
+ "In this tutorial, we will show how to import annotations for a **humans generate prompts and responses** project. The process is also similar for **humans generate prompts** and **humans generate responses to uploaded prompts** projects. See [import prompt and response annotations](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) for a tutorial and more examples on other project types."
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt response and prompt creation\n",
"\n",
- "For prompt response and prompt creation empty data rows are generated for you on project creation. After your projects are created you will need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows. This can been done by exporting from the newly created project."
- ],
- "cell_type": "markdown"
+ "A **humans generate prompts and responses** project automatically generates empty data rows upon creation. You will then need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows by exporting them from the created project."
+ ]
},
{
- "metadata": {},
- "source": "prompt_response_project = client.create_model_evaluation_project(\n name=\"Demo prompt response project\",\n media_type=lb.MediaType.LLMPromptResponseCreation,\n dataset_name=\"Demo prompt response dataset\",\n data_row_count=1,\n)\n\nexport_task = prompt_response_project.export()\nexport_task.wait_till_done()\n\n# Check export for any errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nstream = export_task.get_buffered_stream()\n\n# Obtain global keys to be used later on\nglobal_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "prompt_response_project = client.create_model_evaluation_project(\n",
+ " name=\"Demo prompt response project\",\n",
+ " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
+ " dataset_name=\"Demo prompt response dataset\",\n",
+ " data_row_count=1,\n",
+ ")\n",
+ "\n",
+ "export_task = prompt_response_project.export()\n",
+ "export_task.wait_till_done()\n",
+ "\n",
+ "# Check export for any errors\n",
+ "if export_task.has_errors():\n",
+ " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
+ " stream_handler=lambda error: print(error))\n",
+ "\n",
+ "stream = export_task.get_buffered_stream()\n",
+ "\n",
+ "# Obtain global keys to be used later on\n",
+ "global_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Set up ontology\n",
"\n",
- "Your project ontology should support the classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the `name` parameter should match the value of the `name` field in your annotation. \n",
+ "Your project ontology should support the classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the `name` parameter needs to match the value of the `name` field in your annotation. \n",
"\n",
- "For example, when we created an annotation above, we provided a name`annotation_name`. Now, when we set up our ontology, we must ensure that the name of our bounding box tool is also `anotations_name`. The same alignment must hold true for the other tools and classifications we create in our ontology.\n",
+ "For example, if you name an annotation `annotation_name`, the corresponding feature in your ontology must also be named `annotation_name`. The same alignment must hold true for every other classification that you create in the ontology.\n",
"\n",
"This example shows how to create an ontology containing all supported by prompt and response projects [annotation types](#supported-annotations)."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "ontology_builder = lb.OntologyBuilder(\n tools=[],\n classifications=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.PROMPT,\n name=\"prompt text\",\n character_min=1, # Minimum character count of prompt field (optional)\n character_max=\n 20, # Maximum character count of prompt field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"response checklist feature\",\n options=[\n lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"response radio feature\",\n options=[\n lb.ResponseOption(value=\"first_radio_answer\"),\n lb.ResponseOption(value=\"second_radio_answer\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n name=\"response text\",\n character_min=\n 1, # Minimum character count of response text field (optional)\n character_max=\n 20, # Maximum character count of response text field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"nested_response_radio_question\",\n options=[\n lb.ResponseOption(\n \"first_radio_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_RADIO,\n name=\"sub_radio_question\",\n options=[\n lb.ResponseOption(\"first_sub_radio_answer\")\n ],\n )\n ],\n )\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"nested_response_checklist_question\",\n options=[\n lb.ResponseOption(\n \"first_checklist_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_CHECKLIST,\n name=\"sub_checklist_question\",\n options=[\n lb.ResponseOption(\"first_sub_checklist_answer\")\n ],\n )\n ],\n )\n ],\n ),\n ],\n)\n\n# Create ontology\nontology = client.create_ontology(\n \"Prompt and response ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.LLMPromptResponseCreation,\n)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "ontology_builder = lb.OntologyBuilder(\n",
+ " tools=[],\n",
+ " classifications=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.PROMPT,\n",
+ " name=\"prompt text\",\n",
+ " character_min=1, # Minimum character count of prompt field (optional)\n",
+ " character_max=\n",
+ " 20, # Maximum character count of prompt field (optional)\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
+ " name=\"response checklist feature\",\n",
+ " options=[\n",
+ " lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n",
+ " lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
+ " name=\"response radio feature\",\n",
+ " options=[\n",
+ " lb.ResponseOption(value=\"first_radio_answer\"),\n",
+ " lb.ResponseOption(value=\"second_radio_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n",
+ " name=\"response text\",\n",
+ " character_min=\n",
+ " 1, # Minimum character count of response text field (optional)\n",
+ " character_max=\n",
+ " 20, # Maximum character count of response text field (optional)\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
+ " name=\"nested_response_radio_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\n",
+ " \"first_radio_answer\",\n",
+ " options=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.\n",
+ " RESPONSE_RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\"first_sub_radio_answer\")\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
+ " name=\"nested_response_checklist_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\n",
+ " \"first_checklist_answer\",\n",
+ " options=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.\n",
+ " RESPONSE_CHECKLIST,\n",
+ " name=\"sub_checklist_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\"first_sub_checklist_answer\")\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "# Create ontology\n",
+ "ontology = client.create_ontology(\n",
+ " \"Prompt and response ontology\",\n",
+ " ontology_builder.asdict(),\n",
+ " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
+ ")"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Create the annotations payload\n",
"\n",
"For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the `predictions` parameter. For ground truths, pass the payload to the `labels` parameter."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Python annotation objects\nlabel = []\nannotations = [\n prompt_annotation,\n response_radio_annotation,\n response_checklist_annotation,\n response_text_annotation,\n nested_response_radio_annotation,\n nested_response_checklist_annotation,\n]\nlabel.append(\n lb_types.Label(data={\"global_key\": global_keys[0]},\n annotations=annotations))\n\n# NDJSON\nlabel_ndjson = []\nannotations = [\n prompt_annotation_ndjson,\n response_radio_annotation_ndjson,\n response_checklist_annotation_ndjson,\n response_text_annotation_ndjson,\n nested_response_radio_annotation_ndjson,\n nested_response_checklist_annotation_ndjson,\n]\nfor annotation in annotations:\n annotation.update({\n \"dataRow\": {\n \"globalKey\": global_keys[0]\n },\n })\n label_ndjson.append(annotation)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
- },
- {
+ "source": [
+ "# Python annotation objects\n",
+ "label = []\n",
+ "annotations = [\n",
+ " prompt_annotation,\n",
+ " response_radio_annotation,\n",
+ " response_checklist_annotation,\n",
+ " response_text_annotation,\n",
+ " nested_response_radio_annotation,\n",
+ " nested_response_checklist_annotation,\n",
+ "]\n",
+ "label.append(\n",
+ " lb_types.Label(data={\"global_key\": global_keys[0]},\n",
+ " annotations=annotations))\n",
+ "\n",
+ "# NDJSON\n",
+ "label_ndjson = []\n",
+ "annotations = [\n",
+ " prompt_annotation_ndjson,\n",
+ " response_radio_annotation_ndjson,\n",
+ " response_checklist_annotation_ndjson,\n",
+ " response_text_annotation_ndjson,\n",
+ " nested_response_radio_annotation_ndjson,\n",
+ " nested_response_checklist_annotation_ndjson,\n",
+ "]\n",
+ "for annotation in annotations:\n",
+ " annotation.update({\n",
+ " \"dataRow\": {\n",
+ " \"globalKey\": global_keys[0]\n",
+ " },\n",
+ " })\n",
+ " label_ndjson.append(annotation)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Option A: Upload as [prelabels (model assisted labeling)](doc:model-assisted-labeling)\n",
"\n",
"This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "upload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=f\"mal_job-{str(uuid.uuid4())}\",\n predictions=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "upload_job = lb.MALPredictionImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=prompt_response_project.uid,\n",
+ " name=f\"mal_job-{str(uuid.uuid4())}\",\n",
+ " predictions=label,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Option B: Upload to a labeling project as [ground truth](doc:import-ground-truth)\n",
"\n",
"This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "upload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label_ndjson,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "upload_job = lb.LabelImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=prompt_response_project.uid,\n",
+ " name=\"label_import_job\" + str(uuid.uuid4()),\n",
+ " labels=label_ndjson,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"\n",
"Uncomment and run the cell below to optionally delete Labelbox objects created"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# project.delete()\n# client.delete_unused_ontology(ontology.uid)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# prompt_response_project.delete()\n",
+ "# client.delete_unused_ontology(ontology.uid)"
+ ]
}
- ]
-}
\ No newline at end of file
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
From 2fe912d093c07108c8cdee296326deec6da6f2d5 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Mon, 29 Jul 2024 06:43:27 +0000
Subject: [PATCH 08/12] :art: Cleaned
---
annotation_import/prompt_response.ipynb | 478 +++++-------------------
1 file changed, 97 insertions(+), 381 deletions(-)
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index dc17d77..c77d800 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -1,16 +1,18 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {},
"cells": [
{
- "cell_type": "markdown",
"metadata": {},
"source": [
- "\n",
- " \n",
+ " | ",
+ " ",
" | \n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -22,19 +24,19 @@
" \n",
" | "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"# Prompt and response projects with MAL and Ground Truth\n",
"\n",
"This notebook is meant to showcase how to generate prompts and responses to fine-tune large language models (LLMs) using MAL and Ground truth"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Annotation payload types\n",
@@ -49,10 +51,10 @@
"- JSON\n",
" - Skips formatting annotation payload in the Labelbox Python annotation type.\n",
" - Supports any levels of nested classification (radio, checklist, or free-form text) under a tool or classification annotation."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Label Import Types\n",
@@ -61,56 +63,47 @@
"\n",
"- [Model-assisted labeling (MAL)](https://docs.labelbox.com/docs/model-assisted-labeling) allows you to import computer-generated predictions and simple annotations created outside of Labelbox as pre-labels on an asset.\n",
"- [Ground truth](hhttps://docs.labelbox.com/docs/import-ground-truth) allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data can consolidate and migrate all annotations into Labelbox as a single source of truth."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "%pip install -q \"labelbox[data]\"",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "%pip install -q \"labelbox[data]\""
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "import labelbox as lb\n",
- "import labelbox.types as lb_types\n",
- "import uuid"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Replace with your API key\n",
"\n",
"Replace the value of `API_KEY` with a valid [API key]([ref:create-api-key](https://docs.labelbox.com/reference/create-api-key)) to connect to the Labelbox client."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "API_KEY = None\n",
- "client = lb.Client(api_key=API_KEY)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Supported Annotations\n",
@@ -129,206 +122,94 @@
"- Response creation projects\n",
" - Radio\n",
" - Checklist"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Prompt text"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "prompt_annotation = lb_types.PromptClassificationAnnotation(\n name=\"prompt text\",\n value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n)\n\nprompt_annotation_ndjson = {\n \"name\": \"prompt text\",\n \"answer\": \"This is an example of a prompt\",\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "prompt_annotation = lb_types.PromptClassificationAnnotation(\n",
- " name=\"Follow the prompt and select answers\",\n",
- " value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n",
- ")\n",
- "\n",
- "prompt_annotation_ndjson = {\n",
- " \"name\": \"Follow the prompt and select answers\",\n",
- " \"answer\": \"This is an example of a prompt\",\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Responses"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Radio (single-choice)"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"response radio feature\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\")),\n)\n\nresponse_radio_annotation_ndjson = {\n \"name\": \"response radio feature\",\n \"answer\": {\n \"name\": \"first_radio_answer\"\n },\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"response radio feature\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\")),\n",
- ")\n",
- "\n",
- "response_radio_annotation_ndjson = {\n",
- " \"name\": \"response radio feature\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_radio_answer\"\n",
- " },\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Checklist (multi-choice)"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"response checklist feature\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"option_1\"),\n lb_types.ClassificationAnswer(name=\"option_2\"),\n ]),\n)\n\nresponse_checklist_annotation_ndjson = {\n \"name\": \"response checklist feature\",\n \"answer\": [{\n \"name\": \"option_1\"\n }, {\n \"name\": \"option_2\"\n }],\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"response checklist feature\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(name=\"option_1\"),\n",
- " lb_types.ClassificationAnswer(name=\"option_2\"),\n",
- " ]),\n",
- ")\n",
- "\n",
- "response_checklist_annotation_ndjson = {\n",
- " \"name\": \"response checklist feature\",\n",
- " \"answer\": [{\n",
- " \"name\": \"option_1\"\n",
- " }, {\n",
- " \"name\": \"option_2\"\n",
- " }],\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Response text"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_text_annotation = lb_types.ClassificationAnnotation(\n name=\"response text\",\n value=lb_types.Text(answer=\"This is an example of a response text\"),\n)\n\nresponse_text_annotation_ndjson = {\n \"name\": \"response text\",\n \"answer\": \"This is an example of a response text\",\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_text_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"Provide a reason for your choice\",\n",
- " value=lb_types.Text(answer=\"This is an example of a response text\"),\n",
- ")\n",
- "\n",
- "response_text_annotation_ndjson = {\n",
- " \"name\": \"Provide a reason for your choice\",\n",
- " \"answer\": \"This is an example of a response text\",\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Nested classifications"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\n\nnested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_checklist_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_sub_checklist_answer\")\n ]),\n )\n ],\n )\n ]),\n)\n\nnested_response_radio_annotation_ndjson = {\n \"name\":\n \"nested_response_radio_question\",\n \"answer\": [{\n \"name\":\n \"first_radio_answer\",\n \"classifications\": [{\n \"name\": \"sub_radio_question\",\n \"answer\": {\n \"name\": \"first_sub_radio_answer\"\n },\n }],\n }],\n}\n\nnested_response_checklist_annotation_ndjson = {\n \"name\":\n \"nested_response_checklist_question\",\n \"answer\": [{\n \"name\":\n \"first_checklist_answer\",\n \"classifications\": [{\n \"name\": \"sub_checklist_question\",\n \"answer\": {\n \"name\": \"first_sub_checklist_answer\"\n },\n }],\n }],\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_response_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- " )),\n",
- ")\n",
- "\n",
- "nested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_response_checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(\n",
- " name=\"first_checklist_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_checklist_answer\")\n",
- " ]),\n",
- " )\n",
- " ],\n",
- " )\n",
- " ]),\n",
- ")\n",
- "\n",
- "nested_response_radio_annotation_ndjson = {\n",
- " \"name\":\n",
- " \"nested_radio_question\",\n",
- " \"answer\": [{\n",
- " \"name\":\n",
- " \"first_radio_answer\",\n",
- " \"classifications\": [{\n",
- " \"name\": \"sub_radio_question\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_sub_radio_answer\"\n",
- " },\n",
- " }],\n",
- " }],\n",
- "}\n",
- "\n",
- "nested_response_checklist_annotation_ndjson = {\n",
- " \"name\":\n",
- " \"nested_checklist_question\",\n",
- " \"answer\": [{\n",
- " \"name\":\n",
- " \"first_checklist_answer\",\n",
- " \"classifications\": [{\n",
- " \"name\": \"sub_checklist_question\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_sub_checklist_answer\"\n",
- " },\n",
- " }],\n",
- " }],\n",
- "}"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": null
+ },
+ {
"metadata": {},
"source": [
"## Step 1: Create a project and data rows in Labelbox UI\n",
@@ -336,46 +217,26 @@
"Each type of the prompt and response generation project requires different setup. See [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) for more details on the differences.\n",
"\n",
"In this tutorial, we will show how to import annotations for a **humans generate prompts and responses** project. The process is also similar for **humans generate prompts** and **humans generate responses to uploaded prompts** projects. See [import prompt and response annotations](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) for a tutorial and more examples on other project types."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt response and prompt creation\n",
"\n",
"A **humans generate prompts and responses** project automatically generates empty data rows upon creation. You will then need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows by exporting them from the created project."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "prompt_response_project = client.create_model_evaluation_project(\n name=\"Demo prompt response project\",\n media_type=lb.MediaType.LLMPromptResponseCreation,\n dataset_name=\"Demo prompt response dataset\",\n data_row_count=1,\n)\n\nexport_task = prompt_response_project.export()\nexport_task.wait_till_done()\n\n# Check export for any errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nstream = export_task.get_buffered_stream()\n\n# Obtain global keys to be used later on\nglobal_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "prompt_response_project = client.create_model_evaluation_project(\n",
- " name=\"Demo prompt response project\",\n",
- " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
- " dataset_name=\"Demo prompt response dataset\",\n",
- " data_row_count=1,\n",
- ")\n",
- "\n",
- "export_task = prompt_response_project.export()\n",
- "export_task.wait_till_done()\n",
- "\n",
- "# Check export for any errors\n",
- "if export_task.has_errors():\n",
- " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
- " stream_handler=lambda error: print(error))\n",
- "\n",
- "stream = export_task.get_buffered_stream()\n",
- "\n",
- "# Obtain global keys to be used later on\n",
- "global_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Set up ontology\n",
@@ -385,224 +246,79 @@
"For example, if you provide a name`annotation_name` for your created annotation, you need to name the bounding box tool as `anotations_name` when setting up your ontology. The same alignment must hold true for the other tools and classifications that you create in the ontology.\n",
"\n",
"This example shows how to create an ontology containing all supported by prompt and response projects [annotation types](#supported-annotations)."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "ontology_builder = lb.OntologyBuilder(\n tools=[],\n classifications=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.PROMPT,\n name=\"prompt text\",\n character_min=1, # Minimum character count of prompt field (optional)\n character_max=\n 20, # Maximum character count of prompt field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"response checklist feature\",\n options=[\n lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"response radio feature\",\n options=[\n lb.ResponseOption(value=\"first_radio_answer\"),\n lb.ResponseOption(value=\"second_radio_answer\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n name=\"response text\",\n character_min=\n 1, # Minimum character count of response text field (optional)\n character_max=\n 20, # Maximum character count of response text field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"nested_response_radio_question\",\n options=[\n lb.ResponseOption(\n \"first_radio_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.\n RESPONSE_RADIO,\n name=\"sub_radio_question\",\n options=[\n lb.ResponseOption(\"first_sub_radio_answer\")\n ],\n )\n ],\n )\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"nested_response_checklist_question\",\n options=[\n lb.ResponseOption(\n \"first_checklist_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.\n RESPONSE_CHECKLIST,\n name=\"sub_checklist_question\",\n options=[\n lb.ResponseOption(\"first_sub_checklist_answer\")\n ],\n )\n ],\n )\n ],\n ),\n ],\n)\n\n# Create ontology\nontology = client.create_ontology(\n \"Prompt and response ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.LLMPromptResponseCreation,\n)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "ontology_builder = lb.OntologyBuilder(\n",
- " tools=[],\n",
- " classifications=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.PROMPT,\n",
- " name=\"prompt text\",\n",
- " character_min=1, # Minimum character count of prompt field (optional)\n",
- " character_max=\n",
- " 20, # Maximum character count of prompt field (optional)\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
- " name=\"response checklist feature\",\n",
- " options=[\n",
- " lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n",
- " lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
- " name=\"response radio feature\",\n",
- " options=[\n",
- " lb.ResponseOption(value=\"first_radio_answer\"),\n",
- " lb.ResponseOption(value=\"second_radio_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n",
- " name=\"response text\",\n",
- " character_min=\n",
- " 1, # Minimum character count of response text field (optional)\n",
- " character_max=\n",
- " 20, # Maximum character count of response text field (optional)\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
- " name=\"nested_response_radio_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\n",
- " \"first_radio_answer\",\n",
- " options=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.\n",
- " RESPONSE_RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\"first_sub_radio_answer\")\n",
- " ],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
- " name=\"nested_response_checklist_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\n",
- " \"first_checklist_answer\",\n",
- " options=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.\n",
- " RESPONSE_CHECKLIST,\n",
- " name=\"sub_checklist_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\"first_sub_checklist_answer\")\n",
- " ],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " ],\n",
- ")\n",
- "\n",
- "# Create ontology\n",
- "ontology = client.create_ontology(\n",
- " \"Prompt and response ontology\",\n",
- " ontology_builder.asdict(),\n",
- " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
- ")"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Create the annotations payload\n",
"\n",
"For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the `predictions` parameter. For ground truths, pass the payload to the `labels` parameter."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Python annotation objects\nlabel = []\nannotations = [\n prompt_annotation,\n response_radio_annotation,\n response_checklist_annotation,\n response_text_annotation,\n nested_response_radio_annotation,\n nested_response_checklist_annotation,\n]\nlabel.append(\n lb_types.Label(data={\"global_key\": global_keys[0]},\n annotations=annotations))\n\n# NDJSON\nlabel_ndjson = []\nannotations = [\n prompt_annotation_ndjson,\n response_radio_annotation_ndjson,\n response_checklist_annotation_ndjson,\n response_text_annotation_ndjson,\n nested_response_radio_annotation_ndjson,\n nested_response_checklist_annotation_ndjson,\n]\nfor annotation in annotations:\n annotation.update({\n \"dataRow\": {\n \"globalKey\": global_keys[0]\n },\n })\n label_ndjson.append(annotation)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Python annotation objects\n",
- "label = []\n",
- "annotations = [\n",
- " prompt_annotation,\n",
- " response_radio_annotation,\n",
- " response_checklist_annotation,\n",
- " response_text_annotation,\n",
- " nested_response_radio_annotation,\n",
- " nested_response_checklist_annotation,\n",
- "]\n",
- "label.append(\n",
- " lb_types.Label(data={\"global_key\": global_keys[0]},\n",
- " annotations=annotations))\n",
- "\n",
- "# NDJSON\n",
- "label_ndjson = []\n",
- "annotations = [\n",
- " prompt_annotation_ndjson,\n",
- " response_radio_annotation_ndjson,\n",
- " response_checklist_annotation_ndjson,\n",
- " response_text_annotation_ndjson,\n",
- " nested_response_radio_annotation_ndjson,\n",
- " nested_response_checklist_annotation_ndjson,\n",
- "]\n",
- "for annotation in annotations:\n",
- " annotation.update({\n",
- " \"dataRow\": {\n",
- " \"globalKey\": global_keys[0]\n",
- " },\n",
- " })\n",
- " label_ndjson.append(annotation)"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": null
+ },
+ {
"metadata": {},
"source": [
"#### Option A: Upload as [prelabels (model assisted labeling)](doc:model-assisted-labeling)\n",
"\n",
"This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "upload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=f\"mal_job-{str(uuid.uuid4())}\",\n predictions=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "upload_job = lb.MALPredictionImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=prompt_response_project.uid,\n",
- " name=f\"mal_job-{str(uuid.uuid4())}\",\n",
- " predictions=label,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Option B: Upload to a labeling project as [ground truth](doc:import-ground-truth)\n",
"\n",
"This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "upload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label_ndjson,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "upload_job = lb.LabelImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=prompt_response_project.uid,\n",
- " name=\"label_import_job\" + str(uuid.uuid4()),\n",
- " labels=label_ndjson,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"\n",
"Uncomment and run the cell below to optionally delete Labelbox objects created"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# project.delete()\n# client.delete_unused_ontology(ontology.uid)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# project.delete()\n",
- "# client.delete_unused_ontology(ontology.uid)"
- ]
+ "execution_count": null
}
- ],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ ]
+}
\ No newline at end of file
From 27dd6b5fb228cf3366c2a42ba2e7b403382b1b40 Mon Sep 17 00:00:00 2001
From: x-eun
Date: Sun, 28 Jul 2024 23:46:52 -0700
Subject: [PATCH 09/12] tweak
---
annotation_import/prompt_response.ipynb | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index dc17d77..cc19d5f 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -331,7 +331,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Step 1: Create a project and data rows in Labelbox UI\n",
+ "## Step 1: Create a project and data rows using the Labelbox UI\n",
"\n",
"Each type of the prompt and response generation project requires different setup. See [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) for more details on the differences.\n",
"\n",
From b9cb6907e5f23a6aa14c38fb5e95dcdc9e295d47 Mon Sep 17 00:00:00 2001
From: x-eun
Date: Sun, 28 Jul 2024 23:52:28 -0700
Subject: [PATCH 10/12] tweak
---
annotation_import/prompt_response.ipynb | 482 +++++++++++++++++++-----
1 file changed, 383 insertions(+), 99 deletions(-)
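
Step 2 of the notebook states that every annotation name must match the name of a feature in the ontology. The sketch below shows one way to check that before uploading. It assumes NDJSON-style dicts with a top-level "name" key; `find_unmatched_names` is a hypothetical helper and not part of the Labelbox SDK, and the payload fragments are made up for illustration (one name is deliberately wrong).

```python
from typing import Iterable, List


def find_unmatched_names(annotations: Iterable[dict],
                         ontology_names: Iterable[str]) -> List[str]:
    """Return top-level annotation names with no same-named ontology feature."""
    known = set(ontology_names)
    return [a["name"] for a in annotations if a["name"] not in known]


# Feature names as declared in the notebook's ontology cell.
ontology_names = [
    "prompt text",
    "response checklist feature",
    "response radio feature",
    "response text",
    "nested_response_radio_question",
    "nested_response_checklist_question",
]

# Hypothetical payload fragments; the second name does not exist in the ontology.
payload = [
    {"name": "response radio feature", "answer": {"name": "first_radio_answer"}},
    {"name": "nested_radio_question", "answer": {"name": "first_radio_answer"}},
]

print(find_unmatched_names(payload, ontology_names))
```

An empty result means every top-level name has a counterpart in the ontology. The check does not descend into nested classifications or answer options, so those still need a manual look.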
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index d5f4b56..f0193e4 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -1,18 +1,16 @@
{
- "nbformat": 4,
- "nbformat_minor": 2,
- "metadata": {},
"cells": [
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
- "",
- " ",
+ " | \n",
+ " \n",
" | \n"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -24,19 +22,19 @@
" \n",
" | "
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"# Prompt and response projects with MAL and Ground Truth\n",
"\n",
"This notebook is meant to showcase how to generate prompts and responses to fine-tune large language models (LLMs) using MAL and Ground truth"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Annotation payload types\n",
@@ -51,10 +49,10 @@
"- JSON\n",
" - Skips formatting annotation payload in the Labelbox Python annotation type.\n",
" - Supports any levels of nested classification (radio, checklist, or free-form text) under a tool or classification annotation."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Label Import Types\n",
@@ -63,47 +61,56 @@
"\n",
"- [Model-assisted labeling (MAL)](https://docs.labelbox.com/docs/model-assisted-labeling) allows you to import computer-generated predictions and simple annotations created outside of Labelbox as pre-labels on an asset.\n",
"- [Ground truth](hhttps://docs.labelbox.com/docs/import-ground-truth) allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data can consolidate and migrate all annotations into Labelbox as a single source of truth."
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up "
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "%pip install -q \"labelbox[data]\"",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "%pip install -q \"labelbox[data]\""
+ ]
},
{
- "metadata": {},
- "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "import labelbox as lb\n",
+ "import labelbox.types as lb_types\n",
+ "import uuid"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Replace with your API key\n",
"\n",
"Replace the value of `API_KEY` with a valid [API key]([ref:create-api-key](https://docs.labelbox.com/reference/create-api-key)) to connect to the Labelbox client."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "API_KEY = None\n",
+ "client = lb.Client(api_key=API_KEY)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Supported Annotations\n",
@@ -122,121 +129,253 @@
"- Response creation projects\n",
" - Radio\n",
" - Checklist"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Prompt text"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "prompt_annotation = lb_types.PromptClassificationAnnotation(\n name=\"Follow the prompt and select answers\",\n value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n)\n\nprompt_annotation_ndjson = {\n \"name\": \"Follow the prompt and select answers\",\n \"answer\": \"This is an example of a prompt\",\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "prompt_annotation = lb_types.PromptClassificationAnnotation(\n",
+ " name=\"prompt text\",\n",
+ " value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n",
+ ")\n",
+ "\n",
+ "prompt_annotation_ndjson = {\n",
+ " \"name\": \"prompt text\",\n",
+ " \"answer\": \"This is an example of a prompt\",\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Responses"
- ],
- "cell_type": "markdown"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Radio (single-choice)"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"response radio feature\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\")),\n)\n\nresponse_radio_annotation_ndjson = {\n \"name\": \"response radio feature\",\n \"answer\": {\n \"name\": \"first_radio_answer\"\n },\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "response_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"response radio feature\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\")),\n",
+ ")\n",
+ "\n",
+ "response_radio_annotation_ndjson = {\n",
+ " \"name\": \"response radio feature\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_radio_answer\"\n",
+ " },\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Checklist (multi-choice)"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"response checklist feature\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"option_1\"),\n lb_types.ClassificationAnswer(name=\"option_2\"),\n ]),\n)\n\nresponse_checklist_annotation_ndjson = {\n \"name\": \"response checklist feature\",\n \"answer\": [{\n \"name\": \"option_1\"\n }, {\n \"name\": \"option_2\"\n }],\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"response checklist feature\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(name=\"option_1\"),\n",
+ " lb_types.ClassificationAnswer(name=\"option_2\"),\n",
+ " ]),\n",
+ ")\n",
+ "\n",
+ "response_checklist_annotation_ndjson = {\n",
+ " \"name\": \"response checklist feature\",\n",
+ " \"answer\": [{\n",
+ " \"name\": \"option_1\"\n",
+ " }, {\n",
+ " \"name\": \"option_2\"\n",
+ " }],\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Response text"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "response_text_annotation = lb_types.ClassificationAnnotation(\n name=\"Provide a reason for your choice\",\n value=lb_types.Text(answer=\"This is an example of a response text\"),\n)\n\nresponse_text_annotation_ndjson = {\n \"name\": \"Provide a reason for your choice\",\n \"answer\": \"This is an example of a response text\",\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "response_text_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"response text\",\n",
+ " value=lb_types.Text(answer=\"This is an example of a response text\"),\n",
+ ")\n",
+ "\n",
+ "response_text_annotation_ndjson = {\n",
+ " \"name\": \"response text\",\n",
+ " \"answer\": \"This is an example of a response text\",\n",
+ "}"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Nested classifications"
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\n\nnested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_checklist_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_sub_checklist_answer\")\n ]),\n )\n ],\n )\n ]),\n)\n\nnested_response_radio_annotation_ndjson = {\n \"name\":\n \"nested_radio_question\",\n \"answer\": [{\n \"name\":\n \"first_radio_answer\",\n \"classifications\": [{\n \"name\": \"sub_radio_question\",\n \"answer\": {\n \"name\": \"first_sub_radio_answer\"\n },\n }],\n }],\n}\n\nnested_response_checklist_annotation_ndjson = {\n \"name\":\n \"nested_checklist_question\",\n \"answer\": [{\n \"name\":\n \"first_checklist_answer\",\n \"classifications\": [{\n \"name\": \"sub_checklist_question\",\n \"answer\": {\n \"name\": \"first_sub_checklist_answer\"\n },\n }],\n }],\n}",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
- },
- {
+ "source": [
+ "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_response_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_radio_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_radio_question\",\n",
+ " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_radio_answer\")),\n",
+ " )\n",
+ " ],\n",
+ " )),\n",
+ ")\n",
+ "\n",
+ "nested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
+ " name=\"nested_response_checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(\n",
+ " name=\"first_checklist_answer\",\n",
+ " classifications=[\n",
+ " lb_types.ClassificationAnnotation(\n",
+ " name=\"sub_checklist_question\",\n",
+ " value=lb_types.Checklist(answer=[\n",
+ " lb_types.ClassificationAnswer(\n",
+ " name=\"first_sub_checklist_answer\")\n",
+ " ]),\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ]),\n",
+ ")\n",
+ "\n",
+ "nested_response_radio_annotation_ndjson = {\n",
+ " \"name\":\n",
+ " \"nested_response_radio_question\",\n",
+ " \"answer\": {\n",
+ " \"name\":\n",
+ " \"first_radio_answer\",\n",
+ " \"classifications\": [{\n",
+ " \"name\": \"sub_radio_question\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_sub_radio_answer\"\n",
+ " },\n",
+ " }],\n",
+ " },\n",
+ "}\n",
+ "\n",
+ "nested_response_checklist_annotation_ndjson = {\n",
+ " \"name\":\n",
+ " \"nested_response_checklist_question\",\n",
+ " \"answer\": [{\n",
+ " \"name\":\n",
+ " \"first_checklist_answer\",\n",
+ " \"classifications\": [{\n",
+ " \"name\": \"sub_checklist_question\",\n",
+ " \"answer\": {\n",
+ " \"name\": \"first_sub_checklist_answer\"\n",
+ " },\n",
+ " }],\n",
+ " }],\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Create a project and data rows using the Labelbox UI\n",
"\n",
"Each type of the prompt and response generation project requires different setup. See [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) for more details on the differences.\n",
"\n",
- "In this tutorial, we will show how to import annotations for a **humans generate prompts and responses** project. The process is also similar for **humans generate prompts** and **humans generate responses to uploaded prompts** projects. See [import prompt and response annotations](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) for a tutorial and more examples on other project types."
- ],
- "cell_type": "markdown"
+ "In this tutorial, we show how to import annotations for a prompt and response creation project, in which humans generate both prompts and responses. The process is similar for prompt creation (humans generate prompts) and response creation (humans generate responses to uploaded prompts) projects. See [import prompt and response annotations](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) for a tutorial and more examples covering the other project types."
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt response and prompt creation\n",
"\n",
- "A **humans generate prompts and responses** project automatically generates empty data rows upon creation. You will then need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows by exporting them from the created project."
- ],
- "cell_type": "markdown"
+ "A prompt and response creation project automatically generates empty data rows upon creation. You then need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows by exporting them from the created project."
+ ]
},
{
- "metadata": {},
- "source": "prompt_response_project = client.create_model_evaluation_project(\n name=\"Demo prompt response project\",\n media_type=lb.MediaType.LLMPromptResponseCreation,\n dataset_name=\"Demo prompt response dataset\",\n data_row_count=1,\n)\n\nexport_task = prompt_response_project.export()\nexport_task.wait_till_done()\n\n# Check export for any errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nstream = export_task.get_buffered_stream()\n\n# Obtain global keys to be used later on\nglobal_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "prompt_response_project = client.create_model_evaluation_project(\n",
+ " name=\"Demo prompt response project\",\n",
+ " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
+ " dataset_name=\"Demo prompt response dataset\",\n",
+ " data_row_count=1,\n",
+ ")\n",
+ "\n",
+ "export_task = prompt_response_project.export()\n",
+ "export_task.wait_till_done()\n",
+ "\n",
+ "# Check export for any errors\n",
+ "if export_task.has_errors():\n",
+ " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
+ " stream_handler=lambda error: print(error))\n",
+ "\n",
+ "stream = export_task.get_buffered_stream()\n",
+ "\n",
+ "# Obtain global keys to be used later on\n",
+ "global_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Set up ontology\n",
@@ -246,79 +385,224 @@
"For example, if you provide a name`annotation_name` for your created annotation, you need to name the bounding box tool as `anotations_name` when setting up your ontology. The same alignment must hold true for the other tools and classifications that you create in the ontology.\n",
"\n",
"This example shows how to create an ontology containing all supported by prompt and response projects [annotation types](#supported-annotations)."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "ontology_builder = lb.OntologyBuilder(\n tools=[],\n classifications=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.PROMPT,\n name=\"prompt text\",\n character_min=1, # Minimum character count of prompt field (optional)\n character_max=\n 20, # Maximum character count of prompt field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"response checklist feature\",\n options=[\n lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"response radio feature\",\n options=[\n lb.ResponseOption(value=\"first_radio_answer\"),\n lb.ResponseOption(value=\"second_radio_answer\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n name=\"response text\",\n character_min=\n 1, # Minimum character count of response text field (optional)\n character_max=\n 20, # Maximum character count of response text field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"nested_response_radio_question\",\n options=[\n lb.ResponseOption(\n \"first_radio_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_RADIO,\n name=\"sub_radio_question\",\n options=[\n lb.ResponseOption(\"first_sub_radio_answer\")\n ],\n )\n ],\n )\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"nested_response_checklist_question\",\n options=[\n lb.ResponseOption(\n \"first_checklist_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_CHECKLIST,\n name=\"sub_checklist_question\",\n 
options=[\n lb.ResponseOption(\"first_sub_checklist_answer\")\n ],\n )\n ],\n )\n ],\n ),\n ],\n)\n\n# Create ontology\nontology = client.create_ontology(\n \"Prompt and response ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.LLMPromptResponseCreation,\n)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "ontology_builder = lb.OntologyBuilder(\n",
+ " tools=[],\n",
+ " classifications=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.PROMPT,\n",
+ " name=\"prompt text\",\n",
+ " character_min=1, # Minimum character count of prompt field (optional)\n",
+ " character_max=\n",
+ " 20, # Maximum character count of prompt field (optional)\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
+ " name=\"response checklist feature\",\n",
+ " options=[\n",
+ " lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n",
+ " lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
+ " name=\"response radio feature\",\n",
+ " options=[\n",
+ " lb.ResponseOption(value=\"first_radio_answer\"),\n",
+ " lb.ResponseOption(value=\"second_radio_answer\"),\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n",
+ " name=\"response text\",\n",
+ " character_min=\n",
+ " 1, # Minimum character count of response text field (optional)\n",
+ " character_max=\n",
+ " 20, # Maximum character count of response text field (optional)\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
+ " name=\"nested_response_radio_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\n",
+ " \"first_radio_answer\",\n",
+ " options=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.\n",
+ " RESPONSE_RADIO,\n",
+ " name=\"sub_radio_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\"first_sub_radio_answer\")\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
+ " name=\"nested_response_checklist_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\n",
+ " \"first_checklist_answer\",\n",
+ " options=[\n",
+ " lb.PromptResponseClassification(\n",
+ " class_type=lb.PromptResponseClassification.Type.\n",
+ " RESPONSE_CHECKLIST,\n",
+ " name=\"sub_checklist_question\",\n",
+ " options=[\n",
+ " lb.ResponseOption(\"first_sub_checklist_answer\")\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " )\n",
+ " ],\n",
+ " ),\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "# Create ontology\n",
+ "ontology = client.create_ontology(\n",
+ " \"Prompt and response ontology\",\n",
+ " ontology_builder.asdict(),\n",
+ " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
+ ")"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Create the annotations payload\n",
"\n",
"For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the `predictions` parameter. For ground truths, pass the payload to the `labels` parameter."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# Python annotation objects\nlabel = []\nannotations = [\n prompt_annotation,\n response_radio_annotation,\n response_checklist_annotation,\n response_text_annotation,\n nested_response_radio_annotation,\n nested_response_checklist_annotation,\n]\nlabel.append(\n lb_types.Label(data={\"global_key\": global_keys[0]},\n annotations=annotations))\n\n# NDJSON\nlabel_ndjson = []\nannotations = [\n prompt_annotation_ndjson,\n response_radio_annotation_ndjson,\n response_checklist_annotation_ndjson,\n response_text_annotation_ndjson,\n nested_response_radio_annotation_ndjson,\n nested_response_checklist_annotation_ndjson,\n]\nfor annotation in annotations:\n annotation.update({\n \"dataRow\": {\n \"globalKey\": global_keys[0]\n },\n })\n label_ndjson.append(annotation)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
- },
- {
+ "source": [
+ "# Python annotation objects\n",
+ "label = []\n",
+ "annotations = [\n",
+ " prompt_annotation,\n",
+ " response_radio_annotation,\n",
+ " response_checklist_annotation,\n",
+ " response_text_annotation,\n",
+ " nested_response_radio_annotation,\n",
+ " nested_response_checklist_annotation,\n",
+ "]\n",
+ "label.append(\n",
+ " lb_types.Label(data={\"global_key\": global_keys[0]},\n",
+ " annotations=annotations))\n",
+ "\n",
+ "# NDJSON\n",
+ "label_ndjson = []\n",
+ "annotations = [\n",
+ " prompt_annotation_ndjson,\n",
+ " response_radio_annotation_ndjson,\n",
+ " response_checklist_annotation_ndjson,\n",
+ " response_text_annotation_ndjson,\n",
+ " nested_response_radio_annotation_ndjson,\n",
+ " nested_response_checklist_annotation_ndjson,\n",
+ "]\n",
+ "for annotation in annotations:\n",
+ " annotation.update({\n",
+ " \"dataRow\": {\n",
+ " \"globalKey\": global_keys[0]\n",
+ " },\n",
+ " })\n",
+ " label_ndjson.append(annotation)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Option A: Upload as [prelabels (model-assisted labeling)](https://docs.labelbox.com/docs/model-assisted-labeling)\n",
"\n",
"This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "upload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=f\"mal_job-{str(uuid.uuid4())}\",\n predictions=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "upload_job = lb.MALPredictionImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=prompt_response_project.uid,\n",
+ " name=f\"mal_job-{str(uuid.uuid4())}\",\n",
+ " predictions=label,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"#### Option B: Upload to a labeling project as [ground truth](https://docs.labelbox.com/docs/import-ground-truth)\n",
"\n",
"This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "upload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label_ndjson,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "upload_job = lb.LabelImport.create_from_objects(\n",
+ " client=client,\n",
+ " project_id=prompt_response_project.uid,\n",
+ " name=\"label_import_job\" + str(uuid.uuid4()),\n",
+ " labels=label_ndjson,\n",
+ ")\n",
+ "\n",
+ "upload_job.wait_until_done()\n",
+ "print(\"Errors:\", upload_job.errors)\n",
+ "print(\"Status of uploads: \", upload_job.statuses)"
+ ]
},
{
+ "cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"\n",
"Optionally, uncomment and run the cell below to delete the Labelbox objects created in this notebook."
- ],
- "cell_type": "markdown"
+ ]
},
{
- "metadata": {},
- "source": "# project.delete()\n# client.delete_unused_ontology(ontology.uid)",
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
"outputs": [],
- "execution_count": null
+ "source": [
+ "# prompt_response_project.delete()\n",
+ "# client.delete_unused_ontology(ontology.uid)"
+ ]
}
- ]
-}
\ No newline at end of file
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
From 4a04b76a11093a4ec97a4f77cbc7c19d4bf256a5 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Mon, 29 Jul 2024 06:53:12 +0000
Subject: [PATCH 11/12] :art: Cleaned
---
annotation_import/prompt_response.ipynb | 478 +++++-------------------
1 file changed, 97 insertions(+), 381 deletions(-)
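Reviewer note on the Step 3 cell: the NDJSON loop calls `annotation.update(...)` on the shared annotation dictionaries, so reusing them for a second data row would overwrite the first `dataRow` entry. The following is a minimal sketch of a non-mutating variant, assuming the annotations are plain dictionaries. `attach_data_row` and the sample global key are hypothetical names for illustration, not part of the Labelbox SDK.

```python
import copy


def attach_data_row(annotations, global_key):
    """Return copies of the NDJSON annotations with a dataRow reference added."""
    payload = []
    for annotation in annotations:
        # Deep-copy so the original dictionaries can be reused for other data rows.
        row = copy.deepcopy(annotation)
        row["dataRow"] = {"globalKey": global_key}
        payload.append(row)
    return payload


# Sample annotations shaped like the ones in the notebook
prompt_annotation_ndjson = {
    "name": "prompt text",
    "answer": "This is an example of a prompt",
}
response_radio_annotation_ndjson = {
    "name": "response radio feature",
    "answer": {"name": "first_radio_answer"},
}

# "example-global-key" is a placeholder; the notebook reads real keys from the export.
label_ndjson = attach_data_row(
    [prompt_annotation_ndjson, response_radio_annotation_ndjson],
    "example-global-key",
)
print(label_ndjson)
```

Because each row is a copy, the same annotation dictionaries can be passed again with a different global key without affecting an earlier payload.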
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index f0193e4..b0136af 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -1,16 +1,18 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {},
"cells": [
{
- "cell_type": "markdown",
"metadata": {},
"source": [
- "\n",
- " \n",
+ " | ",
+ " ",
" | \n"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"\n",
@@ -22,19 +24,19 @@
" \n",
" | "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"# Prompt and response projects with MAL and Ground Truth\n",
"\n",
"This notebook is meant to showcase how to generate prompts and responses to fine-tune large language models (LLMs) using MAL and Ground truth"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Annotation payload types\n",
@@ -49,10 +51,10 @@
"- JSON\n",
" - Skips formatting annotation payload in the Labelbox Python annotation type.\n",
" - Supports any levels of nested classification (radio, checklist, or free-form text) under a tool or classification annotation."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Label Import Types\n",
@@ -61,56 +63,47 @@
"\n",
"- [Model-assisted labeling (MAL)](https://docs.labelbox.com/docs/model-assisted-labeling) allows you to import computer-generated predictions and simple annotations created outside of Labelbox as pre-labels on an asset.\n",
"- [Ground truth](https://docs.labelbox.com/docs/import-ground-truth) allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data can consolidate and migrate all annotations into Labelbox as a single source of truth."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Set up "
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "%pip install -q \"labelbox[data]\"",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "%pip install -q \"labelbox[data]\""
- ]
+ "execution_count": null
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "import labelbox as lb\nimport labelbox.types as lb_types\nimport uuid",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "import labelbox as lb\n",
- "import labelbox.types as lb_types\n",
- "import uuid"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Replace with your API key\n",
"\n",
"Replace the value of `API_KEY` with a valid [API key](https://docs.labelbox.com/reference/create-api-key) to connect to the Labelbox client."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "API_KEY = None\nclient = lb.Client(api_key=API_KEY)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "API_KEY = None\n",
- "client = lb.Client(api_key=API_KEY)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Supported Annotations\n",
@@ -129,206 +122,94 @@
"- Response creation projects\n",
" - Radio\n",
" - Checklist"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Prompt text"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "prompt_annotation = lb_types.PromptClassificationAnnotation(\n name=\"prompt text\",\n value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n)\n\nprompt_annotation_ndjson = {\n \"name\": \"prompt text\",\n \"answer\": \"This is an example of a prompt\",\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "prompt_annotation = lb_types.PromptClassificationAnnotation(\n",
- " name=\"Follow the prompt and select answers\",\n",
- " value=lb_types.PromptText(answer=\"This is an example of a prompt\"),\n",
- ")\n",
- "\n",
- "prompt_annotation_ndjson = {\n",
- " \"name\": \"Follow the prompt and select answers\",\n",
- " \"answer\": \"This is an example of a prompt\",\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Responses"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Radio (single-choice)"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"response radio feature\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\")),\n)\n\nresponse_radio_annotation_ndjson = {\n \"name\": \"response radio feature\",\n \"answer\": {\n \"name\": \"first_radio_answer\"\n },\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"response radio feature\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\")),\n",
- ")\n",
- "\n",
- "response_radio_annotation_ndjson = {\n",
- " \"name\": \"response radio feature\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_radio_answer\"\n",
- " },\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Checklist (multi-choice)"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"response checklist feature\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(name=\"option_1\"),\n lb_types.ClassificationAnswer(name=\"option_2\"),\n ]),\n)\n\nresponse_checklist_annotation_ndjson = {\n \"name\": \"response checklist feature\",\n \"answer\": [{\n \"name\": \"option_1\"\n }, {\n \"name\": \"option_2\"\n }],\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"response checklist feature\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(name=\"option_1\"),\n",
- " lb_types.ClassificationAnswer(name=\"option_2\"),\n",
- " ]),\n",
- ")\n",
- "\n",
- "response_checklist_annotation_ndjson = {\n",
- " \"name\": \"response checklist feature\",\n",
- " \"answer\": [{\n",
- " \"name\": \"option_1\"\n",
- " }, {\n",
- " \"name\": \"option_2\"\n",
- " }],\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Response text"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "response_text_annotation = lb_types.ClassificationAnnotation(\n name=\"response text\",\n value=lb_types.Text(answer=\"This is an example of a response text\"),\n)\n\nresponse_text_annotation_ndjson = {\n \"name\": \"response text\",\n \"answer\": \"This is an example of a response text\",\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "response_text_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"Provide a reason for your choice\",\n",
- " value=lb_types.Text(answer=\"This is an example of a response text\"),\n",
- ")\n",
- "\n",
- "response_text_annotation_ndjson = {\n",
- " \"name\": \"Provide a reason for your choice\",\n",
- " \"answer\": \"This is an example of a response text\",\n",
- "}"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Nested classifications"
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_radio_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_radio_question\",\n value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n name=\"first_sub_radio_answer\")),\n )\n ],\n )),\n)\n\nnested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n name=\"nested_response_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_checklist_answer\",\n classifications=[\n lb_types.ClassificationAnnotation(\n name=\"sub_checklist_question\",\n value=lb_types.Checklist(answer=[\n lb_types.ClassificationAnswer(\n name=\"first_sub_checklist_answer\")\n ]),\n )\n ],\n )\n ]),\n)\n\nnested_response_radio_annotation_ndjson = {\n \"name\":\n \"nested_response_radio_question\",\n \"answer\": {\n \"name\":\n \"first_radio_answer\",\n \"classifications\": [{\n \"name\": \"sub_radio_question\",\n \"answer\": {\n \"name\": \"first_sub_radio_answer\"\n },\n }],\n },\n}\n\nnested_response_checklist_annotation_ndjson = {\n \"name\":\n \"nested_response_checklist_question\",\n \"answer\": [{\n \"name\":\n \"first_checklist_answer\",\n \"classifications\": [{\n \"name\": \"sub_checklist_question\",\n \"answer\": {\n \"name\": \"first_sub_checklist_answer\"\n },\n }],\n }],\n}",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "nested_response_radio_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_response_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_radio_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_radio_question\",\n",
- " value=lb_types.Radio(answer=lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_radio_answer\")),\n",
- " )\n",
- " ],\n",
- " )),\n",
- ")\n",
- "\n",
- "nested_response_checklist_annotation = lb_types.ClassificationAnnotation(\n",
- " name=\"nested_response_checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(\n",
- " name=\"first_checklist_answer\",\n",
- " classifications=[\n",
- " lb_types.ClassificationAnnotation(\n",
- " name=\"sub_checklist_question\",\n",
- " value=lb_types.Checklist(answer=[\n",
- " lb_types.ClassificationAnswer(\n",
- " name=\"first_sub_checklist_answer\")\n",
- " ]),\n",
- " )\n",
- " ],\n",
- " )\n",
- " ]),\n",
- ")\n",
- "\n",
- "nested_response_radio_annotation_ndjson = {\n",
- " \"name\":\n",
- " \"nested_radio_question\",\n",
- " \"answer\": [{\n",
- " \"name\":\n",
- " \"first_radio_answer\",\n",
- " \"classifications\": [{\n",
- " \"name\": \"sub_radio_question\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_sub_radio_answer\"\n",
- " },\n",
- " }],\n",
- " }],\n",
- "}\n",
- "\n",
- "nested_response_checklist_annotation_ndjson = {\n",
- " \"name\":\n",
- " \"nested_checklist_question\",\n",
- " \"answer\": [{\n",
- " \"name\":\n",
- " \"first_checklist_answer\",\n",
- " \"classifications\": [{\n",
- " \"name\": \"sub_checklist_question\",\n",
- " \"answer\": {\n",
- " \"name\": \"first_sub_checklist_answer\"\n",
- " },\n",
- " }],\n",
- " }],\n",
- "}"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": null
+ },
+ {
"metadata": {},
"source": [
"## Step 1: Create a project and data rows using the Labelbox UI\n",
@@ -336,46 +217,26 @@
"Each type of prompt and response generation project requires a different setup. See [prompt and response projects](https://docs.labelbox.com/reference/prompt-and-response-projects) for more details on the differences.\n",
"\n",
"In this tutorial, we will show how to import annotations for a prompt and response creation (humans generate prompts and responses) project. The process is also similar for prompt creation (humans generate prompts) and response creation (humans generate responses to uploaded prompts) projects. See [import prompt and response annotations](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) for a tutorial and more examples on other project types."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt response and prompt creation\n",
"\n",
"A prompt and response creation project automatically generates empty data rows upon creation. You then need to obtain either the `global_keys` or `data_row_ids` attached to the generated data rows by exporting them from the created project."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "prompt_response_project = client.create_model_evaluation_project(\n name=\"Demo prompt response project\",\n media_type=lb.MediaType.LLMPromptResponseCreation,\n dataset_name=\"Demo prompt response dataset\",\n data_row_count=1,\n)\n\nexport_task = prompt_response_project.export()\nexport_task.wait_till_done()\n\n# Check export for any errors\nif export_task.has_errors():\n export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n stream_handler=lambda error: print(error))\n\nstream = export_task.get_buffered_stream()\n\n# Obtain global keys to be used later on\nglobal_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "prompt_response_project = client.create_model_evaluation_project(\n",
- " name=\"Demo prompt response project\",\n",
- " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
- " dataset_name=\"Demo prompt response dataset\",\n",
- " data_row_count=1,\n",
- ")\n",
- "\n",
- "export_task = prompt_response_project.export()\n",
- "export_task.wait_till_done()\n",
- "\n",
- "# Check export for any errors\n",
- "if export_task.has_errors():\n",
- " export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(\n",
- " stream_handler=lambda error: print(error))\n",
- "\n",
- "stream = export_task.get_buffered_stream()\n",
- "\n",
- "# Obtain global keys to be used later on\n",
- "global_keys = [dr.json[\"data_row\"][\"global_key\"] for dr in stream]"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Set up ontology\n",
@@ -385,224 +246,79 @@
"For example, if you provide a name`annotation_name` for your created annotation, you need to name the bounding box tool as `anotations_name` when setting up your ontology. The same alignment must hold true for the other tools and classifications that you create in the ontology.\n",
"\n",
"This example shows how to create an ontology containing all [annotation types](#supported-annotations) supported by prompt and response projects."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "ontology_builder = lb.OntologyBuilder(\n tools=[],\n classifications=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.PROMPT,\n name=\"prompt text\",\n character_min=1, # Minimum character count of prompt field (optional)\n character_max=\n 20, # Maximum character count of prompt field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"response checklist feature\",\n options=[\n lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"response radio feature\",\n options=[\n lb.ResponseOption(value=\"first_radio_answer\"),\n lb.ResponseOption(value=\"second_radio_answer\"),\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n name=\"response text\",\n character_min=\n 1, # Minimum character count of response text field (optional)\n character_max=\n 20, # Maximum character count of response text field (optional)\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n name=\"nested_response_radio_question\",\n options=[\n lb.ResponseOption(\n \"first_radio_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_RADIO,\n name=\"sub_radio_question\",\n options=[\n lb.ResponseOption(\"first_sub_radio_answer\")\n ],\n )\n ],\n )\n ],\n ),\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n name=\"nested_response_checklist_question\",\n options=[\n lb.ResponseOption(\n \"first_checklist_answer\",\n options=[\n lb.PromptResponseClassification(\n class_type=lb.PromptResponseClassification.\n RESPONSE_CHECKLIST,\n name=\"sub_checklist_question\",\n 
options=[\n lb.ResponseOption(\"first_sub_checklist_answer\")\n ],\n )\n ],\n )\n ],\n ),\n ],\n)\n\n# Create ontology\nontology = client.create_ontology(\n \"Prompt and response ontology\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.LLMPromptResponseCreation,\n)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "ontology_builder = lb.OntologyBuilder(\n",
- " tools=[],\n",
- " classifications=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.PROMPT,\n",
- " name=\"prompt text\",\n",
- " character_min=1, # Minimum character count of prompt field (optional)\n",
- " character_max=\n",
- " 20, # Maximum character count of prompt field (optional)\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
- " name=\"response checklist feature\",\n",
- " options=[\n",
- " lb.ResponseOption(value=\"option_1\", label=\"option_1\"),\n",
- " lb.ResponseOption(value=\"option_2\", label=\"option_2\"),\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
- " name=\"response radio feature\",\n",
- " options=[\n",
- " lb.ResponseOption(value=\"first_radio_answer\"),\n",
- " lb.ResponseOption(value=\"second_radio_answer\"),\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,\n",
- " name=\"response text\",\n",
- " character_min=\n",
- " 1, # Minimum character count of response text field (optional)\n",
- " character_max=\n",
- " 20, # Maximum character count of response text field (optional)\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,\n",
- " name=\"nested_response_radio_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\n",
- " \"first_radio_answer\",\n",
- " options=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.\n",
- " RESPONSE_RADIO,\n",
- " name=\"sub_radio_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\"first_sub_radio_answer\")\n",
- " ],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,\n",
- " name=\"nested_response_checklist_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\n",
- " \"first_checklist_answer\",\n",
- " options=[\n",
- " lb.PromptResponseClassification(\n",
- " class_type=lb.PromptResponseClassification.\n",
- " RESPONSE_CHECKLIST,\n",
- " name=\"sub_checklist_question\",\n",
- " options=[\n",
- " lb.ResponseOption(\"first_sub_checklist_answer\")\n",
- " ],\n",
- " )\n",
- " ],\n",
- " )\n",
- " ],\n",
- " ),\n",
- " ],\n",
- ")\n",
- "\n",
- "# Create ontology\n",
- "ontology = client.create_ontology(\n",
- " \"Prompt and response ontology\",\n",
- " ontology_builder.asdict(),\n",
- " media_type=lb.MediaType.LLMPromptResponseCreation,\n",
- ")"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Create the annotations payload\n",
"\n",
"For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the `predictions` parameter. For ground truths, pass the payload to the `labels` parameter."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# Python annotation objects\nlabel = []\nannotations = [\n prompt_annotation,\n response_radio_annotation,\n response_checklist_annotation,\n response_text_annotation,\n nested_response_radio_annotation,\n nested_response_checklist_annotation,\n]\nlabel.append(\n lb_types.Label(data={\"global_key\": global_keys[0]},\n annotations=annotations))\n\n# NDJSON\nlabel_ndjson = []\nannotations = [\n prompt_annotation_ndjson,\n response_radio_annotation_ndjson,\n response_checklist_annotation_ndjson,\n response_text_annotation_ndjson,\n nested_response_radio_annotation_ndjson,\n nested_response_checklist_annotation_ndjson,\n]\nfor annotation in annotations:\n annotation.update({\n \"dataRow\": {\n \"globalKey\": global_keys[0]\n },\n })\n label_ndjson.append(annotation)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# Python annotation objects\n",
- "label = []\n",
- "annotations = [\n",
- " prompt_annotation,\n",
- " response_radio_annotation,\n",
- " response_checklist_annotation,\n",
- " response_text_annotation,\n",
- " nested_response_radio_annotation,\n",
- " nested_response_checklist_annotation,\n",
- "]\n",
- "label.append(\n",
- " lb_types.Label(data={\"global_key\": global_keys[0]},\n",
- " annotations=annotations))\n",
- "\n",
- "# NDJSON\n",
- "label_ndjson = []\n",
- "annotations = [\n",
- " prompt_annotation_ndjson,\n",
- " response_radio_annotation_ndjson,\n",
- " response_checklist_annotation_ndjson,\n",
- " response_text_annotation_ndjson,\n",
- " nested_response_radio_annotation_ndjson,\n",
- " nested_response_checklist_annotation_ndjson,\n",
- "]\n",
- "for annotation in annotations:\n",
- " annotation.update({\n",
- " \"dataRow\": {\n",
- " \"globalKey\": global_keys[0]\n",
- " },\n",
- " })\n",
- " label_ndjson.append(annotation)"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": null
+ },
+ {
"metadata": {},
"source": [
"#### Option A: Upload as [prelabels (model-assisted labeling)](https://docs.labelbox.com/docs/model-assisted-labeling)\n",
"\n",
"This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "upload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=f\"mal_job-{str(uuid.uuid4())}\",\n predictions=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "upload_job = lb.MALPredictionImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=prompt_response_project.uid,\n",
- " name=f\"mal_job-{str(uuid.uuid4())}\",\n",
- " predictions=label,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"#### Option B: Upload to a labeling project as [ground truth](https://docs.labelbox.com/docs/import-ground-truth)\n",
"\n",
"This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "upload_job = lb.LabelImport.create_from_objects(\n client=client,\n project_id=prompt_response_project.uid,\n name=\"label_import_job\" + str(uuid.uuid4()),\n labels=label_ndjson,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "upload_job = lb.LabelImport.create_from_objects(\n",
- " client=client,\n",
- " project_id=prompt_response_project.uid,\n",
- " name=\"label_import_job\" + str(uuid.uuid4()),\n",
- " labels=label_ndjson,\n",
- ")\n",
- "\n",
- "upload_job.wait_until_done()\n",
- "print(\"Errors:\", upload_job.errors)\n",
- "print(\"Status of uploads: \", upload_job.statuses)"
- ]
+ "execution_count": null
},
{
- "cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"\n",
"Optionally, uncomment and run the cell below to delete the Labelbox objects created in this notebook."
- ]
+ ],
+ "cell_type": "markdown"
},
{
- "cell_type": "code",
- "execution_count": null,
"metadata": {},
+ "source": "# prompt_response_project.delete()\n# client.delete_unused_ontology(ontology.uid)",
+ "cell_type": "code",
"outputs": [],
- "source": [
- "# project.delete()\n",
- "# client.delete_unused_ontology(ontology.uid)"
- ]
+ "execution_count": null
}
- ],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ ]
+}
\ No newline at end of file
From 6a882cb2b4818dcac5840c23adf95d37f0eb103e Mon Sep 17 00:00:00 2001
From: x-eun
Date: Mon, 29 Jul 2024 10:12:10 -0700
Subject: [PATCH 12/12] Update prompt_response.ipynb
---
annotation_import/prompt_response.ipynb | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
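Reviewer note on the name-alignment rule reworded in this patch: the requirement that each annotation `name` matches an ontology feature `name` can be checked before uploading. The following is a minimal sketch that assumes the ontology is available as plain nested dictionaries. `collect_ontology_names`, `find_unmatched_names`, and the dictionary layout are hypothetical and are not Labelbox SDK calls. The sample names echo ones used in the notebook.

```python
def collect_ontology_names(classifications):
    """Recursively gather every classification name in a plain-dict ontology."""
    names = set()
    for classification in classifications:
        names.add(classification["name"])
        for option in classification.get("options", []):
            # In this sketch, nested classifications hang off an option.
            names |= collect_ontology_names(option.get("options", []))
    return names


def find_unmatched_names(annotations, ontology_names):
    """Return top-level annotation names that have no matching ontology feature."""
    return [a["name"] for a in annotations if a["name"] not in ontology_names]


# Hypothetical stand-in for part of the ontology built in Step 2
ontology_classifications = [
    {
        "name": "response radio feature",
        "options": [{"value": "first_radio_answer"}],
    },
    {
        "name": "nested_response_radio_question",
        "options": [{
            "value": "first_radio_answer",
            "options": [{
                "name": "sub_radio_question",
                "options": [{"value": "first_sub_radio_answer"}],
            }],
        }],
    },
]

# The second annotation name deliberately differs from the ontology name.
annotations = [
    {"name": "response radio feature", "answer": {"name": "first_radio_answer"}},
    {"name": "nested_radio_question", "answer": {"name": "first_radio_answer"}},
]

ontology_names = collect_ontology_names(ontology_classifications)
unmatched = find_unmatched_names(annotations, ontology_names)
print("Annotation names missing from the ontology:", unmatched)
```

Running a check like this before calling the import job surfaces mismatched names locally instead of as per-row upload errors.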
diff --git a/annotation_import/prompt_response.ipynb b/annotation_import/prompt_response.ipynb
index f0193e4..c1e9b31 100644
--- a/annotation_import/prompt_response.ipynb
+++ b/annotation_import/prompt_response.ipynb
@@ -380,9 +380,9 @@
"source": [
"## Step 2: Set up ontology\n",
"\n",
- "Your project ontology should support the classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the `name` parameter need to match the value of the `name` field in your annotation. \n",
+ "Your project ontology needs to support the classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the `name` parameter needs to match the value of the `name` field in your annotation. \n",
"\n",
- "For example, if you provide a name`annotation_name` for your created annotation, you need to name the bounding box tool as `anotations_name` when setting up your ontology. The same alignment must hold true for the other tools and classifications that you create in the ontology.\n",
+ "For example, if you provide a name `annotation_name` for your created annotation, you need to name the bounding box tool as `annotation_name` when setting up your ontology. The same alignment must hold true for the other tools and classifications that you create in the ontology.\n",
"\n",
"This example shows how to create an ontology containing all [annotation types](#supported-annotations) supported by prompt and response projects."
]