20 changes: 5 additions & 15 deletions README.md
@@ -1,26 +1,16 @@
![structured-logprobs](images/logo.png)
![structured-logprobs](docs/images/logo.png)

# structured-logprobs

This Python library is designed to enhance OpenAI chat completion responses by adding detailed information about token log probabilities.
This library works with OpenAI [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs), which is a feature that ensures the model will always generate responses that adhere to your supplied JSON Schema, so you don't need to worry about the model omitting a required key, or hallucinating an invalid enum value.
It provides utilities to analyze and incorporate token-level log probabilities into structured outputs, helping developers understand the reliability of structured data extracted from OpenAI models.

## Purpose
## Objective

![structured-logprobs](images/pitch.png)
![structured-logprobs](docs/images/pitch.png)

The primary goal of `structured-logprobs` is to provide insights into the **reliability** of extracted data. By analyzing token-level log probabilities, the library enables:

- Understanding how likely each token is based on the model's predictions.
- Detecting low-confidence areas in responses for further review.

## Prerequisites

Before using this library, one should be familiar with:

- the OpenAI API and its client.
- the concept of log probabilities, a measure of the likelihood assigned to each token by the model.
The primary goal of `structured-logprobs` is to provide insights into the **reliability** of extracted data. By analyzing token-level log probabilities, the library helps assess how likely each value generated from an LLM's structured outputs is.
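Since raw log probabilities are hard to read at a glance, it can help to convert them back into plain probabilities with `exp`. A minimal sketch (the `token_logprob` value below is made up for illustration; real values come from the OpenAI response when `logprobs=True` is set):

```python
import math

# Hypothetical log probability reported for one generated token.
token_logprob = -0.1054

# exp(logprob) recovers the probability the model assigned to that token.
probability = math.exp(token_logprob)
print(f"{probability:.2f}")  # roughly 0.90
```

Values close to 1.0 indicate the model was confident in that token; values much lower flag spots worth reviewing.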

## Key Features

@@ -76,7 +66,7 @@ print(chat_completion_inline.choices[0].message.content)

## Example JSON Schema

The `response_format` in the request body is an object specifying the format that the model must output. Setting it to `{ "type": "json_schema", "json_schema": {...} }` ensures the model's output will match your supplied JSON schema.
The `response_format` in the request body is an object specifying the format that the model must output. Setting it to `{ "type": "json_schema", "json_schema": {...} }` ensures the model's output will match your supplied [JSON schema](https://json-schema.org/overview/what-is-jsonschema).
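For orientation, such a `response_format` object can be built as a plain Python dict before passing it to the client. This is an invented minimal sketch (the schema name and the `capital` field are hypothetical, not taken from this repository):

```python
import json

# A minimal, invented response_format of the shape the API expects.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "example",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {"capital": {"type": "string"}},
            "required": ["capital"],
            "additionalProperties": False,
        },
    },
}

# The inner schema is ordinary JSON and can be serialized or stored in a file.
print(json.dumps(response_format["json_schema"]["schema"], indent=2))
```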

Below is an example of the JSON file that defines the schema used for validating the responses.

Binary file added docs/images/logo.png
1 change: 1 addition & 0 deletions docs/images/logo.svg
Binary file added docs/images/pitch.png
1 change: 1 addition & 0 deletions docs/images/pitch.svg
57 changes: 54 additions & 3 deletions docs/index.md
@@ -1,3 +1,54 @@
{%
include-markdown "../README.md"
%}
**structured-logprobs** is an open-source Python library that enhances OpenAI's structured outputs by providing detailed information about token log probabilities.

![structured-logprobs](images/pitch.png)

This library is designed to offer valuable insights into the **reliability of an LLM's structured outputs**. It works with OpenAI's [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs), a feature that ensures the model consistently generates responses adhering to a supplied JSON Schema. This eliminates concerns about missing required keys or hallucinating invalid values.

## Installation

Simply install it with `pip install structured-logprobs`.

Then use it this way:

```python
from openai import OpenAI
from structured_logprobs.main import add_logprobs

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Please output JSON-formatted metadata about the 'structured-logprobs' library."
            ),
        }
    ],
    logprobs=True,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "answer",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "version": {"type": "string"},
                },
            },
        },
    },
)
chat_completion = add_logprobs(completion)
print(chat_completion)
```

For more details, visit [Getting Started](https://github.com/arena-ai/structured-logprobs/blob/4-finalize-the-packaging/docs/notebooks/notebook.ipynb).

## Key Features

The module contains a function for mapping characters to token indices (`map_characters_to_token_indices`) and two methods for incorporating log probabilities:

1. Adding log probabilities as a separate field in the response (`add_logprobs`).
2. Embedding log probabilities inline within the message content (`add_logprobs_inline`).
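To make the mapping idea concrete, here is a simplified toy sketch of what a character-to-token-index mapping does. This is not the library's actual implementation, only an illustration of the general technique: each character of the reassembled message is assigned the index of the token it came from, so token-level log probabilities can later be attached to spans of the parsed JSON.

```python
def map_chars_to_tokens(tokens: list[str]) -> list[int]:
    """Toy sketch: for each character of "".join(tokens), record which
    token index produced it."""
    mapping: list[int] = []
    for idx, tok in enumerate(tokens):
        mapping.extend([idx] * len(tok))
    return mapping

# Invented token split of a small JSON message, for illustration only.
tokens = ['{"', 'name', '":', ' "', 'demo', '"}']
mapping = map_chars_to_tokens(tokens)
text = "".join(tokens)

# Every character now points back at exactly one token.
print(len(text), len(mapping))
```

With such a mapping in hand, a parser that reports character offsets for each JSON value is enough to look up the log probabilities of the tokens that produced it.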
1 change: 0 additions & 1 deletion docs/modules.md

This file was deleted.
