diff --git a/docs/credits.rst b/docs/credits.rst
index bb7b7ddcc..0248d8a54 100644
--- a/docs/credits.rst
+++ b/docs/credits.rst
@@ -43,7 +43,6 @@ To purchase credits, navigate to the `Credits `_.
+
+.. code-block:: python
+
+   # Import modules from EDSL
+   from edsl import (
+       QuestionYesNo,
+       QuestionNumerical,
+       QuestionLinearScale,
+       Survey,
+       Agent,
+       Model,
+       Coop
+   )
+
+   # Create a survey with different question types
+   q1 = QuestionYesNo(
+       question_name="drive",
+       question_text="Do you drive?"
+   )
+
+   q2 = QuestionNumerical(
+       question_name="count",
+       question_text="How many vehicles do you currently own or lease?",
+   )
+
+   q3 = QuestionLinearScale(
+       question_name="enjoy",
+       question_text="On a scale from 1 to 10, how much do you enjoy driving?",
+       question_options=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+       option_labels={1: "Hate it", 10: "Love it"},
+   )
+
+   # Create a survey with the questions
+   survey = Survey(questions=[q1, q2, q3])
+
+   # Create an AI agent to respond to the survey
+   agent = Agent(
+       traits={
+           "persona": "You are a middle-aged mom working on a software startup.",
+           "location": "Massachusetts",
+       }
+   )
+
+   # Select a language model to generate the responses
+   model = Model("gemini-1.5-pro", service_name="google")
+
+   # Run the survey with the AI agent and model
+   results = survey.by(agent).by(model).run()
+
+   # Generate a web-based version of the survey for human respondents
+   web_survey_info = survey.humanize()
+
+   # Create a Coop instance
+   coop = Coop()
+
+   # Get human responses from Coop
+   human_responses = coop.get_project_human_responses(web_survey_info["uuid"])
+
+   # Combine results (you can add Results objects for the same survey)
+   combined_results = results + human_responses
+
+
+*We are continually adding features for launching hybrid LLM and human surveys, so check back for updates!*
+*If you are interested in testing new features, please reach out at any time for credits and access.*
+
diff --git a/docs/index.rst b/docs/index.rst
index
1b672e2b3..4bbf2cfbe 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -5,11 +5,11 @@ Expected Parrot: Tools for AI-Powered Research Expected Parrot delivers powerful tools for conducting research with human and artificial intelligences. -This page provides documentation for **Expected Parrot Domain-Specific Language (EDSL)**, a Python package for performing research with AI agents and language models, -and **Coop**, a platform for creating, storing and sharing AI-based research projects. +This page provides documentation for **Expected Parrot Domain-Specific Language (EDSL)**, an open-source Python package for performing research with AI agents and language models, +and **Coop**, a platform for creating, storing and sharing AI research projects. * EDSL is available to download from `PyPI `_ (run `pip install edsl`). The source code is available at `GitHub `_. -* `Create an account `_ to post and share content, run surveys and store results at the Expected Parrot survey. Learn more about `how it works `_ and start `exploring `_. +* `Create an account `_ to post and share content, run surveys with LLMs and humans, and store results at the Expected Parrot server. Learn more about `how it works `_ and start `exploring `_. Key features @@ -94,6 +94,10 @@ Please see the links in the steps below for more details: Read the :ref:`starter_tutorial` and `download a notebook `_ to create a survey and run it. See examples for many use cases and `tips `_ on using EDSL effectively in the documentation. +5. **Validate with real respondents.** + + You can run surveys with real respondents using the Coop platform or at your workspace. + Learn about methods for generating web-based surveys and collecting responses in the :ref:`survey_builder` and :ref:`humanize` sections. Join our `Discord channel `_ to ask questions and chat with other users! 
@@ -142,6 +146,7 @@ Coop It is fully integrated with EDSL and provides access to special features for working with AI agents and language models, free storage and collaboration tools, including: - :ref:`survey_builder`: A user-friendly no-code interface for creating surveys and gathering responses from humans and AI agents. +- :ref:`humanize`: Generate web-based surveys and collect responses from human respondents. - :ref:`remote_inference`: Access all available language models and run surveys at the Expected Parrot server. - :ref:`remote_caching`: Automatically store results and API calls at the Expected Parrot server. - :ref:`notebooks` & :ref:`colab_notebooks`: Easily post and share `.ipynb` and `.py` files to the Coop and access with Colab. @@ -173,6 +178,7 @@ Examples of special methods and use cases for EDSL, including: - Conducting agent conversations - Converting surveys into EDSL - Cognitive testing +- Validating LLM answers with humans - Research methods @@ -227,6 +233,7 @@ Links :hidden: results + humanize dataset data exceptions @@ -285,6 +292,8 @@ Links :caption: Notebooks :hidden: + notebooks/data_labeling_validation_example.ipynb + notebooks/reasoning_model_example.ipynb notebooks/next_token_probs.ipynb notebooks/summarizing_transcripts.ipynb notebooks/analyze_evaluations.ipynb diff --git a/docs/language_models.rst b/docs/language_models.rst index d8451af4f..f7be3716f 100644 --- a/docs/language_models.rst +++ b/docs/language_models.rst @@ -56,6 +56,9 @@ Output: * - xai +*Note*: We recently added support for OpenAI reasoning models. See an example notebook for usage `here `_. +Use `service_name = "openai_v2"` when using these models. The `Results` that are generated with reasoning models include additional fields for reasoning summaries. + .. Available models .. 
---------------- diff --git a/docs/notebooks/data_labeling_validation_example.ipynb b/docs/notebooks/data_labeling_validation_example.ipynb new file mode 100644 index 000000000..0e053eb03 --- /dev/null +++ b/docs/notebooks/data_labeling_validation_example.ipynb @@ -0,0 +1,2497 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "76703153-7227-4be0-a3d3-e1595ea094bf", + "metadata": {}, + "source": [ + "# Data labeling with LLMs, validating with humans\n", + "This notebook provides example [EDSL](https://github.com/expectedparrot/edsl) code for conducting a data labeling task with large language models and validating responses with humans.\n", + "The example below consists of the following steps, which can be conducted entirely in EDSL code or interactively at your [Coop account](https://www.expectedparrot.com):\n", + "\n", + "* Construct questions about a dataset, using a placeholder in each question for the individual piece of data to be labeled (each answer is a \"label\" for a piece of data)\n", + "* Combine the questions in a survey to administer them together\n", + "* *Optionally* create AI agent personas to answer the questions (e.g., if there is relevant expertise or background for the task)\n", + "* Select language models to generate the answers (for the agents, or without referencing any AI personas)\n", + "* Run the survey with the data, agents and models to generate a formatted dataset of results\n", + "* Select questions and data that you want to validate with humans to create a subset of your survey (or leave it unchanged to run the entire survey with humans)\n", + "* Send a web-based version of the survey to human respondents\n", + "* Compare LLM and human answers, and iterate on the data labeling survey as needed!\n", + "\n", + "Before running the code below please see instructions on [getting started](https://www.expectedparrot.com/en/latest/getting-started) using Expected Parrot tools for AI research." 
+ ] + }, + { + "cell_type": "markdown", + "id": "25a4ef4a-7bd1-47bd-a7da-70a9cb79286d", + "metadata": {}, + "source": [ + "## Construct questions about a dataset\n", + "We start by creating questions about a dataset, where each answer will provide a \"label\" for each piece of data. \n", + "EDSL comes with many [common question types](https://docs.expectedparrot.com/en/latest/questions.html) that we can choose from based on the form of the response that we want to get back from a model (multiple choice, linear scale, matrix, etc.).\n", + "\n", + "We use a \"scenario\" placeholder in each question text for data that we want to add to it.\n", + "This method allows us to efficiently readminister a question for each piece of data.\n", + "[Scenarios](https://docs.expectedparrot.com/en/latest/scenarios.html) can be created from many types of data, including PNG, PDF, CSV, docs, lists, tables, videos, and other types.\n", + "\n", + "We combine the questions in a [survey](https://docs.expectedparrot.com/en/latest/surveys.html) in order to administer them together, asynchronously by default, or else according to any [logic or rules](https://docs.expectedparrot.com/en/latest/surveys.html#survey-rules-logic) that we want to add (e.g., skip/stop rules)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "8df971a0-20d0-4f48-a7c6-ef12ecf429d6", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import ScenarioList, QuestionList, QuestionNumerical, Survey\n", + "\n", + "q1 = QuestionList(\n", + " question_name = \"characters\",\n", + " question_text = \"Name all of the characters in this show: {{ scenario.show }}\"\n", + ")\n", + "\n", + "q2 = QuestionNumerical(\n", + " question_name = \"years\",\n", + " question_text = \"Identify the year this show first aired: {{ scenario.show }}\"\n", + ")\n", + "\n", + "scenarios = ScenarioList.from_source(\"list\", \"show\", [\"The Simpsons\", \"South Park\", \"I Love Lucy\"])\n", + "\n", + "questions = q1.loop(scenarios) + q2.loop(scenarios)\n", + "\n", + "survey = Survey(questions)" + ] + }, + { + "cell_type": "markdown", + "id": "5c21b4a8-c4ba-4826-bf0d-7b151456a231", + "metadata": {}, + "source": [ + "## Generate data \"labels\" using LLMs\n", + "EDSL allows us to [specify the models](https://docs.expectedparrot.com/en/latest/language_models.html) that we want to use to answer the questions, and optionally [design AI agent personas](https://docs.expectedparrot.com/en/latest/agents.html) for the models to reference in answering the questions.\n", + "This can be useful if you want to reference specific expertise that is relevant to the labeling task.\n", + "\n", + "We administer the questions by adding the scenarios, agents and models to the survey and calling the `run()` method.\n", + "This generates a formatted dataset of `Results` that we can analyze with [built-in methods for working with results](https://docs.expectedparrot.com/en/latest/results.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ba31b583-d56b-4d23-b98b-5cf2846c44a7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + "
Job Status 🦜: Completed (6 completed, 0 failed)\n", + "
Identifiers\n", + "
Results UUID: 964a459b...a492. Use Results.pull(uuid) to fetch results.\n", + "
Job UUID: e9fc9270...bbff. Use Jobs.pull(uuid) to fetch job.\n", + "
✓ Status: Completed\n", + "
Last updated: 2025-05-27 17:25:28\n", + "
17:25:28 Job completed and Results stored on Coop. View Results\n", + "
17:25:22 Job status: running - last update: 2025-05-27 05:25:22 PM\n", + "
17:25:18 Job status: running - last update: 2025-05-27 05:25:18 PM\n", + "
17:25:14 Job status: queued - last update: 2025-05-27 05:25:14 PM\n", + "
17:25:13 View job progress here\n", + "
17:25:13 Job details are available at your Coop account. Go to Remote Inference page\n", + "
17:25:13 Job sent to server. (Job uuid=e9fc9270-b411-4fb3-a855-fb020446bbff).\n", + "
17:25:13 Your survey is running at the Expected Parrot server...\n", + "
17:25:13 Remote inference activated. Sending job to server...\n", + "
Model Costs ($0.0197 / 1.97 credits total)\n", + "
Service | Model | Input Tokens | Input Cost | Output Tokens | Output Cost | Total Cost | Total Credits\n", + "
google | gemini-1.5-flash | 1,995 | $0.0002 | 1,854 | $0.0006 | $0.0008 | 0.08\n", + "
openai | gpt-4o | 2,112 | $0.0053 | 1,353 | $0.0136 | $0.0189 | 1.89\n", + "
Totals | | 4,107 | $0.0055 | 3,207 | $0.0142 | $0.0197 | 1.97\n", + "
You can obtain the total credit cost by multiplying the total USD cost by 100. A lower credit cost indicates that you saved money by retrieving responses from the universal remote cache.
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from edsl import Agent, AgentList, Model, ModelList\n", + "\n", + "agents = AgentList([\n", + " Agent(traits = {\"persona\":\"You watch a lot of TV.\"})\n", + "])\n", + "\n", + "models = ModelList([\n", + " Model(\"gemini-1.5-flash\", service_name = \"google\"),\n", + " Model(\"gpt-4o\", service_name = \"openai\")\n", + "])\n", + "\n", + "results = survey.by(scenarios).by(agents).by(models).run()" + ] + }, + { + "cell_type": "markdown", + "id": "683c8a23-719d-481a-ae79-10dd44e7d24a", + "metadata": {}, + "source": [ + "Results are accessible at your Coop account (see link above ^) and at your workspace. \n", + "We can inspect a list of all the components of the results:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "aeb218bf-0a77-43b8-8cb7-c89b8604b5d7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
 0
0 | agent.agent_index
1 | agent.agent_instruction
2 | agent.agent_name
3 | agent.persona
4 | answer.characters_0
5 | answer.characters_1
6 | answer.characters_2
7 | answer.years_0
8 | answer.years_1
9 | answer.years_2
10 | cache_keys.characters_0_cache_key
11 | cache_keys.characters_1_cache_key
12 | cache_keys.characters_2_cache_key
13 | cache_keys.years_0_cache_key
14 | cache_keys.years_1_cache_key
15 | cache_keys.years_2_cache_key
16 | cache_used.characters_0_cache_used
17 | cache_used.characters_1_cache_used
18 | cache_used.characters_2_cache_used
19 | cache_used.years_0_cache_used
20 | cache_used.years_1_cache_used
21 | cache_used.years_2_cache_used
22 | comment.characters_0_comment
23 | comment.characters_1_comment
24 | comment.characters_2_comment
25 | comment.years_0_comment
26 | comment.years_1_comment
27 | comment.years_2_comment
28 | generated_tokens.characters_0_generated_tokens
29 | generated_tokens.characters_1_generated_tokens
30 | generated_tokens.characters_2_generated_tokens
31 | generated_tokens.years_0_generated_tokens
32 | generated_tokens.years_1_generated_tokens
33 | generated_tokens.years_2_generated_tokens
34 | iteration.iteration
35 | model.frequency_penalty
36 | model.inference_service
37 | model.logprobs
38 | model.maxOutputTokens
39 | model.max_tokens
40 | model.model
41 | model.model_index
42 | model.presence_penalty
43 | model.stopSequences
44 | model.temperature
45 | model.topK
46 | model.topP
47 | model.top_logprobs
48 | model.top_p
49 | prompt.characters_0_system_prompt
50 | prompt.characters_0_user_prompt
51 | prompt.characters_1_system_prompt
52 | prompt.characters_1_user_prompt
53 | prompt.characters_2_system_prompt
54 | prompt.characters_2_user_prompt
55 | prompt.years_0_system_prompt
56 | prompt.years_0_user_prompt
57 | prompt.years_1_system_prompt
58 | prompt.years_1_user_prompt
59 | prompt.years_2_system_prompt
60 | prompt.years_2_user_prompt
61 | question_options.characters_0_question_options
62 | question_options.characters_1_question_options
63 | question_options.characters_2_question_options
64 | question_options.years_0_question_options
65 | question_options.years_1_question_options
66 | question_options.years_2_question_options
67 | question_text.characters_0_question_text
68 | question_text.characters_1_question_text
69 | question_text.characters_2_question_text
70 | question_text.years_0_question_text
71 | question_text.years_1_question_text
72 | question_text.years_2_question_text
73 | question_type.characters_0_question_type
74 | question_type.characters_1_question_type
75 | question_type.characters_2_question_type
76 | question_type.years_0_question_type
77 | question_type.years_1_question_type
78 | question_type.years_2_question_type
79 | raw_model_response.characters_0_cost
80 | raw_model_response.characters_0_input_price_per_million_tokens
81 | raw_model_response.characters_0_input_tokens
82 | raw_model_response.characters_0_one_usd_buys
83 | raw_model_response.characters_0_output_price_per_million_tokens
84 | raw_model_response.characters_0_output_tokens
85 | raw_model_response.characters_0_raw_model_response
86 | raw_model_response.characters_1_cost
87 | raw_model_response.characters_1_input_price_per_million_tokens
88 | raw_model_response.characters_1_input_tokens
89 | raw_model_response.characters_1_one_usd_buys
90 | raw_model_response.characters_1_output_price_per_million_tokens
91 | raw_model_response.characters_1_output_tokens
92 | raw_model_response.characters_1_raw_model_response
93 | raw_model_response.characters_2_cost
94 | raw_model_response.characters_2_input_price_per_million_tokens
95 | raw_model_response.characters_2_input_tokens
96 | raw_model_response.characters_2_one_usd_buys
97 | raw_model_response.characters_2_output_price_per_million_tokens
98 | raw_model_response.characters_2_output_tokens
99 | raw_model_response.characters_2_raw_model_response
100 | raw_model_response.years_0_cost
101 | raw_model_response.years_0_input_price_per_million_tokens
102 | raw_model_response.years_0_input_tokens
103 | raw_model_response.years_0_one_usd_buys
104 | raw_model_response.years_0_output_price_per_million_tokens
105 | raw_model_response.years_0_output_tokens
106 | raw_model_response.years_0_raw_model_response
107 | raw_model_response.years_1_cost
108 | raw_model_response.years_1_input_price_per_million_tokens
109 | raw_model_response.years_1_input_tokens
110 | raw_model_response.years_1_one_usd_buys
111 | raw_model_response.years_1_output_price_per_million_tokens
112 | raw_model_response.years_1_output_tokens
113 | raw_model_response.years_1_raw_model_response
114 | raw_model_response.years_2_cost
115 | raw_model_response.years_2_input_price_per_million_tokens
116 | raw_model_response.years_2_input_tokens
117 | raw_model_response.years_2_one_usd_buys
118 | raw_model_response.years_2_output_price_per_million_tokens
119 | raw_model_response.years_2_output_tokens
120 | raw_model_response.years_2_raw_model_response
121 | reasoning_summary.characters_0_reasoning_summary
122 | reasoning_summary.characters_1_reasoning_summary
123 | reasoning_summary.characters_2_reasoning_summary
124 | reasoning_summary.years_0_reasoning_summary
125 | reasoning_summary.years_1_reasoning_summary
126 | reasoning_summary.years_2_reasoning_summary
127 | scenario.scenario_index
128 | scenario.show
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "PrettyList(['agent.agent_index',\n", + " 'agent.agent_instruction',\n", + " 'agent.agent_name',\n", + " 'agent.persona',\n", + " 'answer.characters_0',\n", + " 'answer.characters_1',\n", + " 'answer.characters_2',\n", + " 'answer.years_0',\n", + " 'answer.years_1',\n", + " 'answer.years_2',\n", + " 'cache_keys.characters_0_cache_key',\n", + " 'cache_keys.characters_1_cache_key',\n", + " 'cache_keys.characters_2_cache_key',\n", + " 'cache_keys.years_0_cache_key',\n", + " 'cache_keys.years_1_cache_key',\n", + " 'cache_keys.years_2_cache_key',\n", + " 'cache_used.characters_0_cache_used',\n", + " 'cache_used.characters_1_cache_used',\n", + " 'cache_used.characters_2_cache_used',\n", + " 'cache_used.years_0_cache_used',\n", + " 'cache_used.years_1_cache_used',\n", + " 'cache_used.years_2_cache_used',\n", + " 'comment.characters_0_comment',\n", + " 'comment.characters_1_comment',\n", + " 'comment.characters_2_comment',\n", + " 'comment.years_0_comment',\n", + " 'comment.years_1_comment',\n", + " 'comment.years_2_comment',\n", + " 'generated_tokens.characters_0_generated_tokens',\n", + " 'generated_tokens.characters_1_generated_tokens',\n", + " 'generated_tokens.characters_2_generated_tokens',\n", + " 'generated_tokens.years_0_generated_tokens',\n", + " 'generated_tokens.years_1_generated_tokens',\n", + " 'generated_tokens.years_2_generated_tokens',\n", + " 'iteration.iteration',\n", + " 'model.frequency_penalty',\n", + " 'model.inference_service',\n", + " 'model.logprobs',\n", + " 'model.maxOutputTokens',\n", + " 'model.max_tokens',\n", + " 'model.model',\n", + " 'model.model_index',\n", + " 'model.presence_penalty',\n", + " 'model.stopSequences',\n", + " 'model.temperature',\n", + " 'model.topK',\n", + " 'model.topP',\n", + " 'model.top_logprobs',\n", + " 'model.top_p',\n", + " 'prompt.characters_0_system_prompt',\n", + " 'prompt.characters_0_user_prompt',\n", + " 'prompt.characters_1_system_prompt',\n", + " 
'prompt.characters_1_user_prompt',\n", + " 'prompt.characters_2_system_prompt',\n", + " 'prompt.characters_2_user_prompt',\n", + " 'prompt.years_0_system_prompt',\n", + " 'prompt.years_0_user_prompt',\n", + " 'prompt.years_1_system_prompt',\n", + " 'prompt.years_1_user_prompt',\n", + " 'prompt.years_2_system_prompt',\n", + " 'prompt.years_2_user_prompt',\n", + " 'question_options.characters_0_question_options',\n", + " 'question_options.characters_1_question_options',\n", + " 'question_options.characters_2_question_options',\n", + " 'question_options.years_0_question_options',\n", + " 'question_options.years_1_question_options',\n", + " 'question_options.years_2_question_options',\n", + " 'question_text.characters_0_question_text',\n", + " 'question_text.characters_1_question_text',\n", + " 'question_text.characters_2_question_text',\n", + " 'question_text.years_0_question_text',\n", + " 'question_text.years_1_question_text',\n", + " 'question_text.years_2_question_text',\n", + " 'question_type.characters_0_question_type',\n", + " 'question_type.characters_1_question_type',\n", + " 'question_type.characters_2_question_type',\n", + " 'question_type.years_0_question_type',\n", + " 'question_type.years_1_question_type',\n", + " 'question_type.years_2_question_type',\n", + " 'raw_model_response.characters_0_cost',\n", + " 'raw_model_response.characters_0_input_price_per_million_tokens',\n", + " 'raw_model_response.characters_0_input_tokens',\n", + " 'raw_model_response.characters_0_one_usd_buys',\n", + " 'raw_model_response.characters_0_output_price_per_million_tokens',\n", + " 'raw_model_response.characters_0_output_tokens',\n", + " 'raw_model_response.characters_0_raw_model_response',\n", + " 'raw_model_response.characters_1_cost',\n", + " 'raw_model_response.characters_1_input_price_per_million_tokens',\n", + " 'raw_model_response.characters_1_input_tokens',\n", + " 'raw_model_response.characters_1_one_usd_buys',\n", + " 
'raw_model_response.characters_1_output_price_per_million_tokens',\n", + " 'raw_model_response.characters_1_output_tokens',\n", + " 'raw_model_response.characters_1_raw_model_response',\n", + " 'raw_model_response.characters_2_cost',\n", + " 'raw_model_response.characters_2_input_price_per_million_tokens',\n", + " 'raw_model_response.characters_2_input_tokens',\n", + " 'raw_model_response.characters_2_one_usd_buys',\n", + " 'raw_model_response.characters_2_output_price_per_million_tokens',\n", + " 'raw_model_response.characters_2_output_tokens',\n", + " 'raw_model_response.characters_2_raw_model_response',\n", + " 'raw_model_response.years_0_cost',\n", + " 'raw_model_response.years_0_input_price_per_million_tokens',\n", + " 'raw_model_response.years_0_input_tokens',\n", + " 'raw_model_response.years_0_one_usd_buys',\n", + " 'raw_model_response.years_0_output_price_per_million_tokens',\n", + " 'raw_model_response.years_0_output_tokens',\n", + " 'raw_model_response.years_0_raw_model_response',\n", + " 'raw_model_response.years_1_cost',\n", + " 'raw_model_response.years_1_input_price_per_million_tokens',\n", + " 'raw_model_response.years_1_input_tokens',\n", + " 'raw_model_response.years_1_one_usd_buys',\n", + " 'raw_model_response.years_1_output_price_per_million_tokens',\n", + " 'raw_model_response.years_1_output_tokens',\n", + " 'raw_model_response.years_1_raw_model_response',\n", + " 'raw_model_response.years_2_cost',\n", + " 'raw_model_response.years_2_input_price_per_million_tokens',\n", + " 'raw_model_response.years_2_input_tokens',\n", + " 'raw_model_response.years_2_one_usd_buys',\n", + " 'raw_model_response.years_2_output_price_per_million_tokens',\n", + " 'raw_model_response.years_2_output_tokens',\n", + " 'raw_model_response.years_2_raw_model_response',\n", + " 'reasoning_summary.characters_0_reasoning_summary',\n", + " 'reasoning_summary.characters_1_reasoning_summary',\n", + " 'reasoning_summary.characters_2_reasoning_summary',\n", + " 
'reasoning_summary.years_0_reasoning_summary',\n", + " 'reasoning_summary.years_1_reasoning_summary',\n", + " 'reasoning_summary.years_2_reasoning_summary',\n", + " 'scenario.scenario_index',\n", + " 'scenario.show'])" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.columns" + ] + }, + { + "cell_type": "markdown", + "id": "285ea482-6aca-4fdc-9945-2eebe545ff65", + "metadata": {}, + "source": [ + "Here we select components to display in a table:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "2f592156-3ec5-4dc9-bd8c-ae4924b3ad38", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
 model.modelagent.personaanswer.characters_0answer.years_0answer.characters_1answer.years_1answer.characters_2answer.years_2
0gemini-1.5-flashYou watch a lot of TV.['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Grandpa Simpson', 'Apu Nahasapeemapetilon', 'Ned Flanders', 'Moe Szyslak', 'Barney Gumble', 'Chief Wiggum', 'Krusty the Clown', 'Milhouse Van Houten', 'Nelson Muntz', 'Lenny Leonard', 'Carl Carlson', 'Smithers', 'Burns', 'Sideshow Bob', 'Ralph Wiggum']1989['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Randy Marsh', 'Sharon Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Butters Stotch', 'Chef', 'Mr. Garrison', 'Mr. Mackey', 'Jimbo Kern', 'Ned Gerblansky', 'Officer Barbrady', 'Liane Cartman', 'Scott Tenorman', 'Wendy Testaburger', 'Heidi Turner', 'Token Black', 'Clyde Donovan', 'Craig Tucker', 'Tweek Tweak', 'Timmy Burch', 'Kevin Stoley', 'Ike Broflovski', 'Leopold Stotch']1997['Lucy Ricardo', 'Ricky Ricardo', 'Fred Mertz', 'Ethel Mertz', 'Little Ricky Ricardo', 'Mr. and Mrs. Howard']1951
1gpt-4oYou watch a lot of TV.['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abe Simpson', 'Ned Flanders', 'Milhouse Van Houten', 'Mr. Burns', 'Waylon Smithers', 'Krusty the Clown', 'Principal Skinner', 'Moe Szyslak', 'Barney Gumble', 'Chief Wiggum', 'Ralph Wiggum', 'Apu Nahasapeemapetilon', 'Sideshow Bob', 'Edna Krabappel', 'Patty Bouvier', 'Selma Bouvier', 'Comic Book Guy', 'Nelson Muntz', 'Groundskeeper Willie', 'Lenny Leonard', 'Carl Carlson', 'Dr. Hibbert', 'Reverend Lovejoy', 'Mayor Quimby', 'Martin Prince']1989['Eric Cartman', 'Stan Marsh', 'Kyle Broflovski', 'Kenny McCormick', 'Randy Marsh', 'Butters Stotch', 'Mr. Garrison', 'Chef', 'Wendy Testaburger', 'Mr. Mackey']1997['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz']1951
2gemini-1.5-flashYou watch a lot of TV.['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abe Simpson', 'Clancy Wiggum', 'Ned Flanders', 'Moe Szyslak', 'Apu Nahasapeemapetilon', 'Krusty the Clown', 'Milhouse Van Houten', 'Barney Gumble', 'Lenny Leonard', 'Carl Carlson']1989['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Randy Marsh', 'Sharon Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Butters Stotch', 'Chef', 'Mr. Garrison', 'Mr. Mackey', 'Jimbo Kern', 'Ned Gerblansky', 'Officer Barbrady', 'Liane Cartman', 'Scott Tenorman', 'Wendy Testaburger', 'Heidi Turner', 'Token Black', 'Clyde Donovan', 'Craig Tucker', 'Tweek Tweak', 'Timmy Burch', 'Kevin Stoley', 'Ike Broflovski', 'Leopold Stotch']1997['Lucy Ricardo', 'Ricky Ricardo', 'Fred Mertz', 'Ethel Mertz', 'Little Ricky Ricardo', 'Mr. Mooney']1951
3gpt-4oYou watch a lot of TV.['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abe Simpson', 'Ned Flanders', 'Mr. Burns', 'Waylon Smithers', 'Moe Szyslak', 'Barney Gumble', 'Seymour Skinner', 'Edna Krabappel', 'Nelson Muntz', 'Milhouse Van Houten', 'Ralph Wiggum', 'Chief Wiggum', 'Apu Nahasapeemapetilon', 'Krusty the Clown', 'Sideshow Bob']1989['Eric Cartman', 'Stan Marsh', 'Kyle Broflovski', 'Kenny McCormick', 'Butters Stotch', 'Randy Marsh', 'Mr. Garrison', 'Chef', 'Mr. Mackey', 'Wendy Testaburger', 'Jimmy Valmer', 'Timmy Burch', 'Terrance', 'Phillip', 'Token Black', 'Tweek Tweak', 'Craig Tucker', 'Clyde Donovan', 'Bebe Stevens', 'Ike Broflovski', 'Shelley Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Sharon Marsh', 'Liane Cartman', 'Principal Victoria', 'Officer Barbrady']1997['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz']1951
4gemini-1.5-flashYou watch a lot of TV.['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Grandpa Simpson', 'Apu Nahasapeemapetilon', 'Ned Flanders', 'Moe Szyslak', 'Barney Gumble', 'Chief Wiggum', 'Krusty the Clown', 'Milhouse Van Houten', 'Nelson Muntz', 'Principal Skinner', 'Superintendent Chalmers', 'Lenny Leonard', 'Carl Carlson', 'Smithers', 'Burns']1989['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Randy Marsh', 'Sharon Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Butters Stotch', 'Chef', 'Mr. Garrison', 'Mr. Mackey', 'Jimbo Kern', 'Ned Gerblansky', 'Officer Barbrady', 'Terrance', 'Phillip', 'Satan', 'Jesus', 'God']1997['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz', 'Little Ricky Ricardo', 'Mr. and Mrs. Howard']1951
5gpt-4oYou watch a lot of TV.['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Ned Flanders', 'Mr. Burns', 'Waylon Smithers', 'Apu Nahasapeemapetilon', 'Moe Szyslak', 'Krusty the Clown', 'Chief Wiggum', 'Milhouse Van Houten', 'Seymour Skinner', 'Edna Krabappel', 'Barney Gumble', 'Ralph Wiggum', 'Nelson Muntz', 'Comic Book Guy', 'Groundskeeper Willie', 'Patty Bouvier', 'Selma Bouvier', 'Sideshow Bob', 'Lenny Leonard', 'Carl Carlson', 'Dr. Hibbert', 'Reverend Lovejoy', 'Mayor Quimby']1989['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Butters Stotch', 'Randy Marsh', 'Mr. Garrison', 'Mr. Mackey', 'Chef', 'Wendy Testaburger', 'Token Black', 'Tweek Tweak', 'Jimmy Valmer', 'Timmy Burch', 'Bebe Stevens']1997['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz']1951
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "Dataset([{'model.model': ['gemini-1.5-flash', 'gpt-4o', 'gemini-1.5-flash', 'gpt-4o', 'gemini-1.5-flash', 'gpt-4o']}, {'agent.persona': ['You watch a lot of TV.', 'You watch a lot of TV.', 'You watch a lot of TV.', 'You watch a lot of TV.', 'You watch a lot of TV.', 'You watch a lot of TV.']}, {'answer.characters_0': [['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Grandpa Simpson', 'Apu Nahasapeemapetilon', 'Ned Flanders', 'Moe Szyslak', 'Barney Gumble', 'Chief Wiggum', 'Krusty the Clown', 'Milhouse Van Houten', 'Nelson Muntz', 'Lenny Leonard', 'Carl Carlson', 'Smithers', 'Burns', 'Sideshow Bob', 'Ralph Wiggum'], ['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abe Simpson', 'Ned Flanders', 'Milhouse Van Houten', 'Mr. Burns', 'Waylon Smithers', 'Krusty the Clown', 'Principal Skinner', 'Moe Szyslak', 'Barney Gumble', 'Chief Wiggum', 'Ralph Wiggum', 'Apu Nahasapeemapetilon', 'Sideshow Bob', 'Edna Krabappel', 'Patty Bouvier', 'Selma Bouvier', 'Comic Book Guy', 'Nelson Muntz', 'Groundskeeper Willie', 'Lenny Leonard', 'Carl Carlson', 'Dr. Hibbert', 'Reverend Lovejoy', 'Mayor Quimby', 'Martin Prince'], ['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abe Simpson', 'Clancy Wiggum', 'Ned Flanders', 'Moe Szyslak', 'Apu Nahasapeemapetilon', 'Krusty the Clown', 'Milhouse Van Houten', 'Barney Gumble', 'Lenny Leonard', 'Carl Carlson'], ['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Abe Simpson', 'Ned Flanders', 'Mr. 
Burns', 'Waylon Smithers', 'Moe Szyslak', 'Barney Gumble', 'Seymour Skinner', 'Edna Krabappel', 'Nelson Muntz', 'Milhouse Van Houten', 'Ralph Wiggum', 'Chief Wiggum', 'Apu Nahasapeemapetilon', 'Krusty the Clown', 'Sideshow Bob'], ['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Grandpa Simpson', 'Apu Nahasapeemapetilon', 'Ned Flanders', 'Moe Szyslak', 'Barney Gumble', 'Chief Wiggum', 'Krusty the Clown', 'Milhouse Van Houten', 'Nelson Muntz', 'Principal Skinner', 'Superintendent Chalmers', 'Lenny Leonard', 'Carl Carlson', 'Smithers', 'Burns'], ['Homer Simpson', 'Marge Simpson', 'Bart Simpson', 'Lisa Simpson', 'Maggie Simpson', 'Ned Flanders', 'Mr. Burns', 'Waylon Smithers', 'Apu Nahasapeemapetilon', 'Moe Szyslak', 'Krusty the Clown', 'Chief Wiggum', 'Milhouse Van Houten', 'Seymour Skinner', 'Edna Krabappel', 'Barney Gumble', 'Ralph Wiggum', 'Nelson Muntz', 'Comic Book Guy', 'Groundskeeper Willie', 'Patty Bouvier', 'Selma Bouvier', 'Sideshow Bob', 'Lenny Leonard', 'Carl Carlson', 'Dr. Hibbert', 'Reverend Lovejoy', 'Mayor Quimby']]}, {'answer.years_0': [1989, 1989, 1989, 1989, 1989, 1989]}, {'answer.characters_1': [['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Randy Marsh', 'Sharon Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Butters Stotch', 'Chef', 'Mr. Garrison', 'Mr. Mackey', 'Jimbo Kern', 'Ned Gerblansky', 'Officer Barbrady', 'Liane Cartman', 'Scott Tenorman', 'Wendy Testaburger', 'Heidi Turner', 'Token Black', 'Clyde Donovan', 'Craig Tucker', 'Tweek Tweak', 'Timmy Burch', 'Kevin Stoley', 'Ike Broflovski', 'Leopold Stotch'], ['Eric Cartman', 'Stan Marsh', 'Kyle Broflovski', 'Kenny McCormick', 'Randy Marsh', 'Butters Stotch', 'Mr. Garrison', 'Chef', 'Wendy Testaburger', 'Mr. Mackey'], ['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Randy Marsh', 'Sharon Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Butters Stotch', 'Chef', 'Mr. Garrison', 'Mr. 
Mackey', 'Jimbo Kern', 'Ned Gerblansky', 'Officer Barbrady', 'Liane Cartman', 'Scott Tenorman', 'Wendy Testaburger', 'Heidi Turner', 'Token Black', 'Clyde Donovan', 'Craig Tucker', 'Tweek Tweak', 'Timmy Burch', 'Kevin Stoley', 'Ike Broflovski', 'Leopold Stotch'], ['Eric Cartman', 'Stan Marsh', 'Kyle Broflovski', 'Kenny McCormick', 'Butters Stotch', 'Randy Marsh', 'Mr. Garrison', 'Chef', 'Mr. Mackey', 'Wendy Testaburger', 'Jimmy Valmer', 'Timmy Burch', 'Terrance', 'Phillip', 'Token Black', 'Tweek Tweak', 'Craig Tucker', 'Clyde Donovan', 'Bebe Stevens', 'Ike Broflovski', 'Shelley Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Sharon Marsh', 'Liane Cartman', 'Principal Victoria', 'Officer Barbrady'], ['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Randy Marsh', 'Sharon Marsh', 'Gerald Broflovski', 'Sheila Broflovski', 'Butters Stotch', 'Chef', 'Mr. Garrison', 'Mr. Mackey', 'Jimbo Kern', 'Ned Gerblansky', 'Officer Barbrady', 'Terrance', 'Phillip', 'Satan', 'Jesus', 'God'], ['Stan Marsh', 'Kyle Broflovski', 'Eric Cartman', 'Kenny McCormick', 'Butters Stotch', 'Randy Marsh', 'Mr. Garrison', 'Mr. Mackey', 'Chef', 'Wendy Testaburger', 'Token Black', 'Tweek Tweak', 'Jimmy Valmer', 'Timmy Burch', 'Bebe Stevens']]}, {'answer.years_1': [1997, 1997, 1997, 1997, 1997, 1997]}, {'answer.characters_2': [['Lucy Ricardo', 'Ricky Ricardo', 'Fred Mertz', 'Ethel Mertz', 'Little Ricky Ricardo', 'Mr. and Mrs. Howard'], ['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz'], ['Lucy Ricardo', 'Ricky Ricardo', 'Fred Mertz', 'Ethel Mertz', 'Little Ricky Ricardo', 'Mr. Mooney'], ['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz'], ['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz', 'Little Ricky Ricardo', 'Mr. and Mrs. 
Howard'], ['Lucy Ricardo', 'Ricky Ricardo', 'Ethel Mertz', 'Fred Mertz']]}, {'answer.years_2': [1951, 1951, 1951, 1951, 1951, 1951]}])" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.select(\"model\", \"persona\", \"characters_0\", \"years_0\", \"characters_1\", \"years_1\", \"characters_2\", \"years_2\")" + ] + }, + { + "cell_type": "markdown", + "id": "00d4073c-6126-4d42-949c-f1ecd48dd33e", + "metadata": {}, + "source": [ + "## Run the survey with human respondents\n", + "We can validate some or all of the responses with human respondents by calling the `humanize()` method on the version of the survey that we want to validate with humans.\n", + "This method generates a shareable URL for a web-based version of the survey that you can distribute, together with a URL for tracking the responses at your Coop account.\n", + "\n", + "Here we create a new version of the survey that adds some screening and background questions for the human respondents:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4f93ddd4-0cdb-4306-abaf-815c23169879", + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import QuestionLinearScale\n", + "\n", + "q3 = QuestionLinearScale(\n", + " question_name = \"tv_viewing\",\n", + " question_text = \"On a scale from 1 to 5, how much tv would you say that you've watched in your life?\",\n", + " question_options = [1,2,3,4,5],\n", + " option_labels = {\n", + " 1:\"None at all\",\n", + " 5:\"A ton\"\n", + " }\n", + ")\n", + "\n", + "q4 = QuestionNumerical(\n", + " question_name = \"age\",\n", + " question_text = \"How old are you (in years)?\"\n", + ")\n", + "\n", + "new_questions = [q3, q4]\n", + "\n", + "human_survey = Survey(questions + new_questions)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "db2cd0f4-4b72-43a1-a463-fa9568b86911", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + 
"skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'project_name': 'Project',\n", + " 'uuid': 'bbb84776-3364-4bc9-b028-0119cd84d480',\n", + " 'admin_url': 'https://www.expectedparrot.com/home/projects/bbb84776-3364-4bc9-b028-0119cd84d480',\n", + " 'respondent_url': 'https://www.expectedparrot.com/respond/bbb84776-3364-4bc9-b028-0119cd84d480'}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "human_survey.humanize()" + ] + }, + { + "cell_type": "markdown", + "id": "f7c1c5da-0185-4cea-8c7c-e53a9920270d", + "metadata": {}, + "source": [ + "Responses automatically appear at your Coop account, and you can import them into your workspace using `Coop` methods:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "e8f0681c-5af6-401b-8630-147243a593b0", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/html": [ + "

Results observations: 2; agents: 2; models: 1; scenarios: 1; questions: 8; Survey question names: ['characters_0', 'characters_1', 'characters_2', 'years_0', 'years_1', 'years_2', ...];

\n", + "
 characters_1ageyears_1tv_viewingyears_0characters_0years_2characters_2scenario_indexagent_indexagent_instructionagent_namemodeltemperatureinference_servicemodel_indexcharacters_2_user_promptcharacters_2_system_promptyears_1_system_promptyears_2_user_promptyears_2_system_promptyears_1_user_promptcharacters_0_user_promptyears_0_user_promptcharacters_1_system_promptcharacters_0_system_promptyears_0_system_promptcharacters_1_user_prompttv_viewing_user_prompttv_viewing_system_promptage_user_promptage_system_promptcharacters_2_input_tokensyears_1_one_usd_buysage_output_price_per_million_tokenstv_viewing_costyears_1_input_tokensyears_1_output_price_per_million_tokenscharacters_1_output_tokensyears_2_one_usd_buysyears_1_costcharacters_2_output_price_per_million_tokenscharacters_0_one_usd_buysyears_2_input_price_per_million_tokensyears_1_raw_model_responseyears_0_input_tokenscharacters_1_output_price_per_million_tokenscharacters_0_input_tokensyears_2_raw_model_responsetv_viewing_one_usd_buyscharacters_1_input_price_per_million_tokenscharacters_2_input_price_per_million_tokensage_input_tokensyears_2_input_tokenstv_viewing_raw_model_responseyears_2_output_tokenscharacters_2_raw_model_responseage_input_price_per_million_tokensyears_0_one_usd_buyscharacters_1_raw_model_responseyears_2_output_price_per_million_tokensyears_0_output_tokenstv_viewing_input_tokensyears_0_costage_costcharacters_2_output_tokenscharacters_1_input_tokensage_one_usd_buystv_viewing_output_tokenscharacters_2_costcharacters_0_output_price_per_million_tokenscharacters_0_output_tokensyears_1_input_price_per_million_tokensyears_1_output_tokenscharacters_0_input_price_per_million_tokensage_output_tokenscharacters_1_costyears_0_input_price_per_million_tokenscharacters_1_one_usd_buyscharacters_0_costcharacters_0_raw_model_responseyears_0_raw_model_responsetv_viewing_output_price_per_million_tokensage_raw_model_responseyears_2_costyears_0_output_price_per_million_tokenscharacters_2_one_usd_buystv_viewing_input_pri
ce_per_million_tokensiterationcharacters_1_question_textcharacters_2_question_texttv_viewing_question_textyears_1_question_textyears_2_question_textcharacters_0_question_textyears_0_question_textage_question_textcharacters_0_question_optionsyears_1_question_optionscharacters_2_question_optionstv_viewing_question_optionscharacters_1_question_optionsyears_0_question_optionsyears_2_question_optionsage_question_optionscharacters_1_question_typecharacters_2_question_typeage_question_typetv_viewing_question_typecharacters_0_question_typeyears_1_question_typeyears_2_question_typeyears_0_question_typeage_commenttv_viewing_commentcharacters_2_commentcharacters_0_commentyears_0_commentcharacters_1_commentyears_1_commentyears_2_commentcharacters_1_generated_tokensyears_0_generated_tokenstv_viewing_generated_tokenscharacters_2_generated_tokenscharacters_0_generated_tokensyears_1_generated_tokensage_generated_tokensyears_2_generated_tokenscharacters_2_cache_usedtv_viewing_cache_usedcharacters_1_cache_usedyears_0_cache_usedyears_1_cache_usedcharacters_0_cache_usedage_cache_usedyears_2_cache_usedcharacters_2_cache_keyyears_0_cache_keyage_cache_keyyears_2_cache_keycharacters_0_cache_keycharacters_1_cache_keyyears_1_cache_keytv_viewing_cache_keycharacters_2_reasoning_summarytv_viewing_reasoning_summaryyears_1_reasoning_summaryyears_2_reasoning_summaryyears_0_reasoning_summarycharacters_1_reasoning_summarycharacters_0_reasoning_summaryage_reasoning_summary
0['Cartman', 'Stewie']46199851989['Homer', 'Marge', 'Randall', 'Bort']1953['Lucy ', 'Dezi']00nana782895b-e5dc-41cb-80e8-8a956db1ee18test0.500000test0nannannannannannannannannannannannannannannannannannannannannannannannannannannannanNot ApplicablenannannanNot ApplicablenannannannannanNot ApplicablenanNot ApplicablenannanNot ApplicablenannannannannannannannannannannannannannannannannannannannanNot ApplicableNot ApplicablenanNot Applicablenannannannan0Name all of the characters in this show: South ParkName all of the characters in this show: I Love LucyOn a scale from 1 to 5, how much tv would you say that you've watched in your life?Identify the year this show first aired: South ParkIdentify the year this show first aired: I Love LucyName all of the characters in this show: The SimpsonsIdentify the year this show first aired: The SimpsonsHow old are you (in years)?nannannan[1, 2, 3, 4, 5]nannannannanlistlistnumericallinear_scalelistnumericalnumericalnumericalThis is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.Not ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot Applicablenannannannannannannannan
1[\"I don't know\"]11200041989['Homer', 'Marge', 'Bart', 'Lisa', 'Maggie', 'Grandpa', 'Mr. Burns', 'Apu', 'Snake', 'Moe', 'Krusty', 'Lenny', 'Carl', 'Barney', 'Smithers', 'Sideshow Mel', 'Patty', 'Selma', 'Martin', 'Nelson', 'Ralph', 'Chief Wiggum', 'Reverend Lovejoy', \"Santa's Little Helper\", 'Snowball II', 'Miss Crabapple', 'Miss Hoover', 'Principal Skinner', 'Willie', 'Superintendent Chalmers', 'Lou', 'Comic Book Guy', 'Sherry', 'Terry', 'Fat Tony', 'Johnny Tightlips', 'Jimmy the Squealer', 'Mayor Quimby', 'Sideshow Bob', 'Luigi', 'Spiderpig', 'Duffman', 'Larry', 'Grandma', 'Mr. Teasy']2010['Uhhhhh no']01nan083d74cb-64c6-4560-809c-1a09cc9d8955test0.500000test0nannannannannannannannannannannannannannannannannannannannannannannannannannannannanNot ApplicablenannannanNot ApplicablenannannannannanNot ApplicablenanNot ApplicablenannanNot ApplicablenannannannannannannannannannannannannannannannannannannannanNot ApplicableNot ApplicablenanNot Applicablenannannannan0Name all of the characters in this show: South ParkName all of the characters in this show: I Love LucyOn a scale from 1 to 5, how much tv would you say that you've watched in your life?Identify the year this show first aired: South ParkIdentify the year this show first aired: I Love LucyName all of the characters in this show: The SimpsonsIdentify the year this show first aired: The SimpsonsHow old are you (in years)?nannannan[1, 2, 3, 4, 5]nannannannanlistlistnumericallinear_scalelistnumericalnumericalnumericalThis is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.This is a real survey response from a human.Not ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot 
ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot ApplicableNot Applicablenannannannannannannannan
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "Results(data = [Result(agent=Agent(name = \"\"\"a782895b-e5dc-41cb-80e8-8a956db1ee18\"\"\", traits = {}, instruction = \"\"\"\"\"\"), scenario=Scenario({'scenario_index': 0}), model=Model(model_name = 'test', service_name = 'test', temperature = 0.5), iteration=0, answer={'characters_0': ['Homer', 'Marge', 'Randall', 'Bort'], 'characters_1': ['Cartman', 'Stewie'], 'characters_2': ['Lucy ', 'Dezi'], 'years_0': 1989, 'years_1': 1998, 'years_2': 1953, 'tv_viewing': 5, 'age': 46}, prompt={'characters_0_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_0_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_1_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_1_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_2_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_2_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_0_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_0_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_1_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_1_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_2_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_2_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'tv_viewing_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'tv_viewing_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'age_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'age_system_prompt': Prompt(text=\"\"\"NA\"\"\")}, raw_model_response={'characters_0_raw_model_response': 'Not Applicable', 'characters_0_input_tokens': None, 'characters_0_output_tokens': None, 'characters_0_input_price_per_million_tokens': None, 'characters_0_output_price_per_million_tokens': None, 'characters_0_cost': None, 'characters_0_one_usd_buys': 'NA', 'characters_1_raw_model_response': 'Not Applicable', 'characters_1_input_tokens': None, 'characters_1_output_tokens': None, 'characters_1_input_price_per_million_tokens': None, 'characters_1_output_price_per_million_tokens': None, 'characters_1_cost': None, 
'characters_1_one_usd_buys': 'NA', 'characters_2_raw_model_response': 'Not Applicable', 'characters_2_input_tokens': None, 'characters_2_output_tokens': None, 'characters_2_input_price_per_million_tokens': None, 'characters_2_output_price_per_million_tokens': None, 'characters_2_cost': None, 'characters_2_one_usd_buys': 'NA', 'years_0_raw_model_response': 'Not Applicable', 'years_0_input_tokens': None, 'years_0_output_tokens': None, 'years_0_input_price_per_million_tokens': None, 'years_0_output_price_per_million_tokens': None, 'years_0_cost': None, 'years_0_one_usd_buys': 'NA', 'years_1_raw_model_response': 'Not Applicable', 'years_1_input_tokens': None, 'years_1_output_tokens': None, 'years_1_input_price_per_million_tokens': None, 'years_1_output_price_per_million_tokens': None, 'years_1_cost': None, 'years_1_one_usd_buys': 'NA', 'years_2_raw_model_response': 'Not Applicable', 'years_2_input_tokens': None, 'years_2_output_tokens': None, 'years_2_input_price_per_million_tokens': None, 'years_2_output_price_per_million_tokens': None, 'years_2_cost': None, 'years_2_one_usd_buys': 'NA', 'tv_viewing_raw_model_response': 'Not Applicable', 'tv_viewing_input_tokens': None, 'tv_viewing_output_tokens': None, 'tv_viewing_input_price_per_million_tokens': None, 'tv_viewing_output_price_per_million_tokens': None, 'tv_viewing_cost': None, 'tv_viewing_one_usd_buys': 'NA', 'age_raw_model_response': 'Not Applicable', 'age_input_tokens': None, 'age_output_tokens': None, 'age_input_price_per_million_tokens': None, 'age_output_price_per_million_tokens': None, 'age_cost': None, 'age_one_usd_buys': 'NA'}, question_to_attributes={'characters_0': {'question_text': 'Name all of the characters in this show: The Simpsons', 'question_type': 'list', 'question_options': None}, 'characters_1': {'question_text': 'Name all of the characters in this show: South Park', 'question_type': 'list', 'question_options': None}, 'characters_2': {'question_text': 'Name all of the characters in this show: I 
Love Lucy', 'question_type': 'list', 'question_options': None}, 'years_0': {'question_text': 'Identify the year this show first aired: The Simpsons', 'question_type': 'numerical', 'question_options': None}, 'years_1': {'question_text': 'Identify the year this show first aired: South Park', 'question_type': 'numerical', 'question_options': None}, 'years_2': {'question_text': 'Identify the year this show first aired: I Love Lucy', 'question_type': 'numerical', 'question_options': None}, 'tv_viewing': {'question_text': \"On a scale from 1 to 5, how much tv would you say that you've watched in your life?\", 'question_type': 'linear_scale', 'question_options': [1, 2, 3, 4, 5]}, 'age': {'question_text': 'How old are you (in years)?', 'question_type': 'numerical', 'question_options': None}}, generated_tokens={'characters_0_generated_tokens': 'Not Applicable', 'characters_1_generated_tokens': 'Not Applicable', 'characters_2_generated_tokens': 'Not Applicable', 'years_0_generated_tokens': 'Not Applicable', 'years_1_generated_tokens': 'Not Applicable', 'years_2_generated_tokens': 'Not Applicable', 'tv_viewing_generated_tokens': 'Not Applicable', 'age_generated_tokens': 'Not Applicable'}, comments_dict={'characters_0_comment': 'This is a real survey response from a human.', 'characters_1_comment': 'This is a real survey response from a human.', 'characters_2_comment': 'This is a real survey response from a human.', 'years_0_comment': 'This is a real survey response from a human.', 'years_1_comment': 'This is a real survey response from a human.', 'years_2_comment': 'This is a real survey response from a human.', 'tv_viewing_comment': 'This is a real survey response from a human.', 'age_comment': 'This is a real survey response from a human.'}, reasoning_summaries_dict={'characters_0_reasoning_summary': None, 'characters_1_reasoning_summary': None, 'characters_2_reasoning_summary': None, 'years_0_reasoning_summary': None, 'years_1_reasoning_summary': None, 
'years_2_reasoning_summary': None, 'tv_viewing_reasoning_summary': None, 'age_reasoning_summary': None}, cache_used_dict={'characters_0': 'Not Applicable', 'characters_1': 'Not Applicable', 'characters_2': 'Not Applicable', 'years_0': 'Not Applicable', 'years_1': 'Not Applicable', 'years_2': 'Not Applicable', 'tv_viewing': 'Not Applicable', 'age': 'Not Applicable'}, cache_keys={'characters_0': 'Not Applicable', 'characters_1': 'Not Applicable', 'characters_2': 'Not Applicable', 'years_0': 'Not Applicable', 'years_1': 'Not Applicable', 'years_2': 'Not Applicable', 'tv_viewing': 'Not Applicable', 'age': 'Not Applicable'}), Result(agent=Agent(name = \"\"\"083d74cb-64c6-4560-809c-1a09cc9d8955\"\"\", traits = {}, instruction = \"\"\"\"\"\"), scenario=Scenario({'scenario_index': 0}), model=Model(model_name = 'test', service_name = 'test', temperature = 0.5), iteration=0, answer={'characters_0': ['Homer', 'Marge', 'Bart', 'Lisa', 'Maggie', 'Grandpa', 'Mr. Burns', 'Apu', 'Snake', 'Moe', 'Krusty', 'Lenny', 'Carl', 'Barney', 'Smithers', 'Sideshow Mel', 'Patty', 'Selma', 'Martin', 'Nelson', 'Ralph', 'Chief Wiggum', 'Reverend Lovejoy', \"Santa's Little Helper\", 'Snowball II', 'Miss Crabapple', 'Miss Hoover', 'Principal Skinner', 'Willie', 'Superintendent Chalmers', 'Lou', 'Comic Book Guy', 'Sherry', 'Terry', 'Fat Tony', 'Johnny Tightlips', 'Jimmy the Squealer', 'Mayor Quimby', 'Sideshow Bob', 'Luigi', 'Spiderpig', 'Duffman', 'Larry', 'Grandma', 'Mr. 
Teasy'], 'characters_1': [\"I don't know\"], 'characters_2': ['Uhhhhh no'], 'years_0': 1989, 'years_1': 2000, 'years_2': 2010, 'tv_viewing': 4, 'age': 11}, prompt={'characters_0_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_0_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_1_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_1_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_2_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'characters_2_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_0_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_0_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_1_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_1_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_2_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'years_2_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'tv_viewing_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'tv_viewing_system_prompt': Prompt(text=\"\"\"NA\"\"\"), 'age_user_prompt': Prompt(text=\"\"\"NA\"\"\"), 'age_system_prompt': Prompt(text=\"\"\"NA\"\"\")}, raw_model_response={'characters_0_raw_model_response': 'Not Applicable', 'characters_0_input_tokens': None, 'characters_0_output_tokens': None, 'characters_0_input_price_per_million_tokens': None, 'characters_0_output_price_per_million_tokens': None, 'characters_0_cost': None, 'characters_0_one_usd_buys': 'NA', 'characters_1_raw_model_response': 'Not Applicable', 'characters_1_input_tokens': None, 'characters_1_output_tokens': None, 'characters_1_input_price_per_million_tokens': None, 'characters_1_output_price_per_million_tokens': None, 'characters_1_cost': None, 'characters_1_one_usd_buys': 'NA', 'characters_2_raw_model_response': 'Not Applicable', 'characters_2_input_tokens': None, 'characters_2_output_tokens': None, 'characters_2_input_price_per_million_tokens': None, 'characters_2_output_price_per_million_tokens': None, 'characters_2_cost': None, 'characters_2_one_usd_buys': 'NA', 'years_0_raw_model_response': 'Not Applicable', 
'years_0_input_tokens': None, 'years_0_output_tokens': None, 'years_0_input_price_per_million_tokens': None, 'years_0_output_price_per_million_tokens': None, 'years_0_cost': None, 'years_0_one_usd_buys': 'NA', 'years_1_raw_model_response': 'Not Applicable', 'years_1_input_tokens': None, 'years_1_output_tokens': None, 'years_1_input_price_per_million_tokens': None, 'years_1_output_price_per_million_tokens': None, 'years_1_cost': None, 'years_1_one_usd_buys': 'NA', 'years_2_raw_model_response': 'Not Applicable', 'years_2_input_tokens': None, 'years_2_output_tokens': None, 'years_2_input_price_per_million_tokens': None, 'years_2_output_price_per_million_tokens': None, 'years_2_cost': None, 'years_2_one_usd_buys': 'NA', 'tv_viewing_raw_model_response': 'Not Applicable', 'tv_viewing_input_tokens': None, 'tv_viewing_output_tokens': None, 'tv_viewing_input_price_per_million_tokens': None, 'tv_viewing_output_price_per_million_tokens': None, 'tv_viewing_cost': None, 'tv_viewing_one_usd_buys': 'NA', 'age_raw_model_response': 'Not Applicable', 'age_input_tokens': None, 'age_output_tokens': None, 'age_input_price_per_million_tokens': None, 'age_output_price_per_million_tokens': None, 'age_cost': None, 'age_one_usd_buys': 'NA'}, question_to_attributes={'characters_0': {'question_text': 'Name all of the characters in this show: The Simpsons', 'question_type': 'list', 'question_options': None}, 'characters_1': {'question_text': 'Name all of the characters in this show: South Park', 'question_type': 'list', 'question_options': None}, 'characters_2': {'question_text': 'Name all of the characters in this show: I Love Lucy', 'question_type': 'list', 'question_options': None}, 'years_0': {'question_text': 'Identify the year this show first aired: The Simpsons', 'question_type': 'numerical', 'question_options': None}, 'years_1': {'question_text': 'Identify the year this show first aired: South Park', 'question_type': 'numerical', 'question_options': None}, 'years_2': {'question_text': 
'Identify the year this show first aired: I Love Lucy', 'question_type': 'numerical', 'question_options': None}, 'tv_viewing': {'question_text': \"On a scale from 1 to 5, how much tv would you say that you've watched in your life?\", 'question_type': 'linear_scale', 'question_options': [1, 2, 3, 4, 5]}, 'age': {'question_text': 'How old are you (in years)?', 'question_type': 'numerical', 'question_options': None}}, generated_tokens={'characters_0_generated_tokens': 'Not Applicable', 'characters_1_generated_tokens': 'Not Applicable', 'characters_2_generated_tokens': 'Not Applicable', 'years_0_generated_tokens': 'Not Applicable', 'years_1_generated_tokens': 'Not Applicable', 'years_2_generated_tokens': 'Not Applicable', 'tv_viewing_generated_tokens': 'Not Applicable', 'age_generated_tokens': 'Not Applicable'}, comments_dict={'characters_0_comment': 'This is a real survey response from a human.', 'characters_1_comment': 'This is a real survey response from a human.', 'characters_2_comment': 'This is a real survey response from a human.', 'years_0_comment': 'This is a real survey response from a human.', 'years_1_comment': 'This is a real survey response from a human.', 'years_2_comment': 'This is a real survey response from a human.', 'tv_viewing_comment': 'This is a real survey response from a human.', 'age_comment': 'This is a real survey response from a human.'}, reasoning_summaries_dict={'characters_0_reasoning_summary': None, 'characters_1_reasoning_summary': None, 'characters_2_reasoning_summary': None, 'years_0_reasoning_summary': None, 'years_1_reasoning_summary': None, 'years_2_reasoning_summary': None, 'tv_viewing_reasoning_summary': None, 'age_reasoning_summary': None}, cache_used_dict={'characters_0': 'Not Applicable', 'characters_1': 'Not Applicable', 'characters_2': 'Not Applicable', 'years_0': 'Not Applicable', 'years_1': 'Not Applicable', 'years_2': 'Not Applicable', 'tv_viewing': 'Not Applicable', 'age': 'Not Applicable'}, cache_keys={'characters_0': 
'Not Applicable', 'characters_1': 'Not Applicable', 'characters_2': 'Not Applicable', 'years_0': 'Not Applicable', 'years_1': 'Not Applicable', 'years_2': 'Not Applicable', 'tv_viewing': 'Not Applicable', 'age': 'Not Applicable'})], survey = Survey(questions=[Question('list', question_name = \"\"\"characters_0\"\"\", question_text = \"\"\"Name all of the characters in this show: The Simpsons\"\"\", max_list_items = None, min_list_items = None), Question('list', question_name = \"\"\"characters_1\"\"\", question_text = \"\"\"Name all of the characters in this show: South Park\"\"\", max_list_items = None, min_list_items = None), Question('list', question_name = \"\"\"characters_2\"\"\", question_text = \"\"\"Name all of the characters in this show: I Love Lucy\"\"\", max_list_items = None, min_list_items = None), Question('numerical', question_name = \"\"\"years_0\"\"\", question_text = \"\"\"Identify the year this show first aired: The Simpsons\"\"\", min_value = None, max_value = None), Question('numerical', question_name = \"\"\"years_1\"\"\", question_text = \"\"\"Identify the year this show first aired: South Park\"\"\", min_value = None, max_value = None), Question('numerical', question_name = \"\"\"years_2\"\"\", question_text = \"\"\"Identify the year this show first aired: I Love Lucy\"\"\", min_value = None, max_value = None), Question('linear_scale', question_name = \"\"\"tv_viewing\"\"\", question_text = \"\"\"On a scale from 1 to 5, how much tv would you say that you've watched in your life?\"\"\", question_options = [1, 2, 3, 4, 5], option_labels = {1: 'None at all', 5: 'A ton'}), Question('numerical', question_name = \"\"\"age\"\"\", question_text = \"\"\"How old are you (in years)?\"\"\", min_value = None, max_value = None)], memory_plan={}, rule_collection=RuleCollection(rules=[Rule(current_q=0, expression=\"True\", next_q=1, priority=-1, question_name_to_index={'characters_0': 0}, before_rule=False), Rule(current_q=1, expression=\"True\", next_q=2, 
priority=-1, question_name_to_index={'characters_0': 0, 'characters_1': 1}, before_rule=False), Rule(current_q=2, expression=\"True\", next_q=3, priority=-1, question_name_to_index={'characters_0': 0, 'characters_1': 1, 'characters_2': 2}, before_rule=False), Rule(current_q=3, expression=\"True\", next_q=4, priority=-1, question_name_to_index={'characters_0': 0, 'characters_1': 1, 'characters_2': 2, 'years_0': 3}, before_rule=False), Rule(current_q=4, expression=\"True\", next_q=5, priority=-1, question_name_to_index={'characters_0': 0, 'characters_1': 1, 'characters_2': 2, 'years_0': 3, 'years_1': 4}, before_rule=False), Rule(current_q=5, expression=\"True\", next_q=6, priority=-1, question_name_to_index={'characters_0': 0, 'characters_1': 1, 'characters_2': 2, 'years_0': 3, 'years_1': 4, 'years_2': 5}, before_rule=False), Rule(current_q=6, expression=\"True\", next_q=7, priority=-1, question_name_to_index={'characters_0': 0, 'characters_1': 1, 'characters_2': 2, 'years_0': 3, 'years_1': 4, 'years_2': 5, 'tv_viewing': 6}, before_rule=False), Rule(current_q=7, expression=\"True\", next_q=8, priority=-1, question_name_to_index={'characters_0': 0, 'characters_1': 1, 'characters_2': 2, 'years_0': 3, 'years_1': 4, 'years_2': 5, 'tv_viewing': 6, 'age': 7}, before_rule=False)], num_questions=8), question_groups={}, questions_to_randomize=[]), created_columns = [])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from edsl import Coop\n", + "\n", + "human_results = Coop().get_project_human_responses(\"bbb84776-3364-4bc9-b028-0119cd84d480\")\n", + "human_results" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "bc271f97-3581-4c33-a5b1-fada63d6d0a2", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  | answer.age | answer.tv_viewing | answer.characters_0 | answer.years_0 | answer.characters_1 | answer.years_1 | answer.characters_2 | answer.years_2
0 | 46 | 5 | ['Homer', 'Marge', 'Randall', 'Bort'] | 1989 | ['Cartman', 'Stewie'] | 1998 | ['Lucy ', 'Dezi'] | 1953
1 | 11 | 4 | ['Homer', 'Marge', 'Bart', 'Lisa', 'Maggie', 'Grandpa', 'Mr. Burns', 'Apu', 'Snake', 'Moe', 'Krusty', 'Lenny', 'Carl', 'Barney', 'Smithers', 'Sideshow Mel', 'Patty', 'Selma', 'Martin', 'Nelson', 'Ralph', 'Chief Wiggum', 'Reverend Lovejoy', \"Santa's Little Helper\", 'Snowball II', 'Miss Crabapple', 'Miss Hoover', 'Principal Skinner', 'Willie', 'Superintendent Chalmers', 'Lou', 'Comic Book Guy', 'Sherry', 'Terry', 'Fat Tony', 'Johnny Tightlips', 'Jimmy the Squealer', 'Mayor Quimby', 'Sideshow Bob', 'Luigi', 'Spiderpig', 'Duffman', 'Larry', 'Grandma', 'Mr. Teasy'] | 1989 | [\"I don't know\"] | 2000 | ['Uhhhhh no'] | 2010
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "Dataset([{'answer.age': [46, 11]}, {'answer.tv_viewing': [5, 4]}, {'answer.characters_0': [['Homer', 'Marge', 'Randall', 'Bort'], ['Homer', 'Marge', 'Bart', 'Lisa', 'Maggie', 'Grandpa', 'Mr. Burns', 'Apu', 'Snake', 'Moe', 'Krusty', 'Lenny', 'Carl', 'Barney', 'Smithers', 'Sideshow Mel', 'Patty', 'Selma', 'Martin', 'Nelson', 'Ralph', 'Chief Wiggum', 'Reverend Lovejoy', \"Santa's Little Helper\", 'Snowball II', 'Miss Crabapple', 'Miss Hoover', 'Principal Skinner', 'Willie', 'Superintendent Chalmers', 'Lou', 'Comic Book Guy', 'Sherry', 'Terry', 'Fat Tony', 'Johnny Tightlips', 'Jimmy the Squealer', 'Mayor Quimby', 'Sideshow Bob', 'Luigi', 'Spiderpig', 'Duffman', 'Larry', 'Grandma', 'Mr. Teasy']]}, {'answer.years_0': [1989, 1989]}, {'answer.characters_1': [['Cartman', 'Stewie'], [\"I don't know\"]]}, {'answer.years_1': [1998, 2000]}, {'answer.characters_2': [['Lucy ', 'Dezi'], ['Uhhhhh no']]}, {'answer.years_2': [1953, 2010]}])" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "human_results.select(\"age\", \"tv_viewing\", \"characters_0\", \"years_0\", \"characters_1\", \"years_1\", \"characters_2\", \"years_2\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/notebooks/reasoning_model_example.ipynb b/docs/notebooks/reasoning_model_example.ipynb new file mode 100644 index 000000000..a9b29f2fa --- /dev/null +++ b/docs/notebooks/reasoning_model_example.ipynb @@ -0,0 +1,1076 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using EDSL 
with a reasoning model" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from edsl import Model, QuestionMultipleChoice, Survey" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note the different `service_name` parameter:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "model = Model(\"o3-mini\", service_name = \"openai_v2\") # other models use \"openai\"" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "q = QuestionMultipleChoice(\n", + " question_name = \"movie\",\n", + " question_text = \"\"\"My son is 11 years old. How many times do you think he has watched the movie\n", + " 'The Nightmare Before Christmas'?\"\"\",\n", + " question_options = [\"Never\", \"Once\", \"A couple times\", \"Many times\", \"Dozens of times\"]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "survey = Survey(questions = [q])" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + "
Job Status: Completed (1 completed, 0 failed)
Results UUID: 3d3ffaf1...6a24 (use Results.pull(uuid) to fetch results)
Job UUID: 70ff61ed...8b84 (use Jobs.pull(uuid) to fetch job)
Status: Completed. Last updated: 2025-05-23 08:20:57
08:20:57 Job completed and Results stored on Coop.
08:20:52 Job status: running - last update: 2025-05-23 08:20:52 AM
08:20:47 Job status: running - last update: 2025-05-23 08:20:47 AM
08:20:43 Job status: running - last update: 2025-05-23 08:20:43 AM
08:20:38 Job status: queued - last update: 2025-05-23 08:20:38 AM
08:20:38 Job details are available at your Coop account.
08:20:38 Job sent to server. (Job uuid=70ff61ed-049b-4212-a4fa-77f16cc78b84).
08:20:38 Your survey is running at the Expected Parrot server...
08:20:38 Remote inference activated. Sending job to server...
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "results = survey.by(model).run()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The `answer` and `comment` fields are available for all models" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  | answer.movie
0 | Many times
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "Dataset([{'answer.movie': ['Many times']}])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.select(\"answer.movie\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  | comment.movie_comment
0 | I chose this option assuming that an 11-year-old who enjoys the film would likely have watched it repeatedly, given its popularity and seasonal appeal.
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "Dataset([{'comment.movie_comment': ['I chose this option assuming that an 11-year-old who enjoys the film would likely have watched it repeatedly, given its popularity and seasonal appeal.']}])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.select(\"comment.movie_comment\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reasoning model results include summaries" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'**Choosing viewing frequency**\\n\\nI\\'m thinking about how to answer the question: \"How many times do you think he watched \\'The Nightmare Before Christmas\\'?\" Since he\\'s 11 and that movie has Halloween vibes, my first guess is \"Many times.\" \\n\\nMany kids love this movie, but maybe he hasn\\'t seen it dozens of times—unless he\\'s really into it! I could also go with \"A couple times,\" since he might not watch it repeatedly. Ultimately, I\\'m leaning towards \"Many times\" due to its popularity among kids.\\n\\n**Finalizing answer choice**\\n\\nI’m considering that since he’s 11, he’s likely watched the movie more than just a couple of times if he enjoys it. The answer options range from \"Never\" to \"Dozens of times.\" The idea of \"A couple times\" suggests maybe 2-3 viewings, while \"many times\" implies regular rewatching. Given its seasonal nature and popularity, I\\'ll choose \"Many times.\" I\\'ll also include a comment about how I based this on the movie\\'s appeal to kids and his love for it.'" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results[0][\"reasoning_summaries_dict\"][\"movie_reasoning_summary\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
  | reasoning_summary.movie_reasoning_summary
0 | **Choosing viewing frequency**\n", + "\n", + "I'm thinking about how to answer the question: \"How many times do you think he watched 'The Nightmare Before Christmas'?\" Since he's 11 and that movie has Halloween vibes, my first guess is \"Many times.\" \n", + "\n", + "Many kids love this movie, but maybe he hasn't seen it dozens of times—unless he's really into it! I could also go with \"A couple times,\" since he might not watch it repeatedly. Ultimately, I'm leaning towards \"Many times\" due to its popularity among kids.\n", + "\n", + "**Finalizing answer choice**\n", + "\n", + "I’m considering that since he’s 11, he’s likely watched the movie more than just a couple of times if he enjoys it. The answer options range from \"Never\" to \"Dozens of times.\" The idea of \"A couple times\" suggests maybe 2-3 viewings, while \"many times\" implies regular rewatching. Given its seasonal nature and popularity, I'll choose \"Many times.\" I'll also include a comment about how I based this on the movie's appeal to kids and his love for it.
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "Dataset([{'reasoning_summary.movie_reasoning_summary': ['**Choosing viewing frequency**\\n\\nI\\'m thinking about how to answer the question: \"How many times do you think he watched \\'The Nightmare Before Christmas\\'?\" Since he\\'s 11 and that movie has Halloween vibes, my first guess is \"Many times.\" \\n\\nMany kids love this movie, but maybe he hasn\\'t seen it dozens of times—unless he\\'s really into it! I could also go with \"A couple times,\" since he might not watch it repeatedly. Ultimately, I\\'m leaning towards \"Many times\" due to its popularity among kids.\\n\\n**Finalizing answer choice**\\n\\nI’m considering that since he’s 11, he’s likely watched the movie more than just a couple of times if he enjoys it. The answer options range from \"Never\" to \"Dozens of times.\" The idea of \"A couple times\" suggests maybe 2-3 viewings, while \"many times\" implies regular rewatching. Given its seasonal nature and popularity, I\\'ll choose \"Many times.\" I\\'ll also include a comment about how I based this on the movie\\'s appeal to kids and his love for it.']}])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results.select(\"reasoning_summary.movie_reasoning_summary\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Posting this notebook to Coop" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [ + "skip-execution" + ] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'description': 'Reasoning model example',\n", + " 'object_type': 'notebook',\n", + " 'url': 'https://www.expectedparrot.com/content/29e99f21-72fb-416f-b12b-994206a0f5ae',\n", + " 'alias_url': 'https://www.expectedparrot.com/content/RobinHorton/reasoning-model-example',\n", + " 'uuid': '29e99f21-72fb-416f-b12b-994206a0f5ae',\n", + " 'version': 
'0.1.61.dev1',\n", + " 'visibility': 'public'}" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from edsl import Notebook\n", + "\n", + "notebook = Notebook(\"reasoning_model_example.ipynb\")\n", + "\n", + "notebook.push(\n", + " description = \"Reasoning model example\",\n", + " alias = \"reasoning-model-example\",\n", + " visibility = \"public\"\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/questions.rst b/docs/questions.rst index 60c84c686..0a5439aeb 100644 --- a/docs/questions.rst +++ b/docs/questions.rst @@ -353,6 +353,9 @@ We can combine multiple questions into a survey by passing them as a list to a ` This allows us to administer multiple questions at once, either asynchronously (by default) or according to specified logic (e.g., skip or stop rules). To learn more about designing surveys with conditional logic, please see the :ref:`surveys` section. +*Note:* If you want multiple choice question options to be randomized, you can pass an optional parameter `questions_to_randomize` (a list of the relevant question names) to the `Survey` object when it is created. +See more details about `QuestionMultipleChoice` below and the :ref:`surveys` section on randomizing question options. + Simulating a response --------------------- @@ -923,7 +926,23 @@ An example can also be created using the `example` method: QuestionMultipleChoice.example() -Note: Question options can be strings of any length, but if they are long or complex, it may be useful to add the `use_code` parameter to the question. 
+If you want the question options to be randomized, you can pass an optional parameter `questions_to_randomize` (a list of the relevant question names) to the `Survey` object when it is created. +For example: + +.. code-block:: python + + from edsl import QuestionMultipleChoice, Survey + + q = QuestionMultipleChoice( + question_name = "color", + question_text = "What is your favorite color?", + question_options = ["Red", "Blue", "Green", "Yellow"] + ) + + survey = Survey([q], questions_to_randomize=["color"]) + + +*Note:* Question options can be strings of any length, but if they are long or complex, it may be useful to add the `use_code` parameter to the question. This will add an instruction to the `user_prompt` for the model to provide the code number of the question option that it selects as its answer (i.e., 0, 1, 2, etc.) instead of the value of the option. This can be useful when the question options are long or complex, or include formatting that a model may make errors in reproducing to provide an answer, resulting in a validation error that may be avoidable by returning the code number of the option instead. The code is then translated back to the option value in the survey results. diff --git a/docs/results.rst b/docs/results.rst index ef6b95783..e63be75f6 100644 --- a/docs/results.rst +++ b/docs/results.rst @@ -406,6 +406,9 @@ Note that the cost of a result for a question is specific to the components (sce * **scenario.scenario_index**: The index of the scenario. * **scenario.topic**: The values provided for the "topic" scenario for the questions. +*Note*: We recently added support for OpenAI reasoning models. See an example notebook for usage `here `_. +The `Results` that are generated with reasoning models include additional fields for reasoning summaries. + Creating tables by selecting columns ------------------------------------
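In the reasoning model example notebook, the summary for a question named `movie` is stored under the key `movie_reasoning_summary` and selected as the column `reasoning_summary.movie_reasoning_summary`. The plain-Python sketch below illustrates that naming pattern only. The `result` dictionary and both helper functions are hypothetical stand-ins written for this illustration and are not part of EDSL; with real `Results` objects the notebook calls `results.select(...)` directly.

```python
# Hypothetical stand-in for a single result (NOT an EDSL object).
# Real results are produced by survey.by(model).run().
result = {
    "answer": {"movie": "Many times"},
    "reasoning_summaries_dict": {
        "movie_reasoning_summary": "placeholder summary text",
    },
}


def reasoning_summary_column(question_name: str) -> str:
    """Build the column name for a question's reasoning summary (illustrative helper)."""
    return f"reasoning_summary.{question_name}_reasoning_summary"


def get_reasoning_summary(result: dict, question_name: str):
    """Return the stored summary, or None if the result does not have one."""
    key = f"{question_name}_reasoning_summary"
    return result.get("reasoning_summaries_dict", {}).get(key)


column = reasoning_summary_column("movie")
summary = get_reasoning_summary(result, "movie")
print(column)
print(summary)
```

With a real `Results` object, the column name built here would be passed to `results.select(...)`, as the notebook does. In the human-response results shown in the other notebook, the summary values are `None`, which is why the lookup above tolerates a missing entry.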