
Commit 773bd30

Author: Preetam Joshi
Adding the Aimon Rely README, images, the postman collection, a simple client and examples.
1 parent 2b5efe4 commit 773bd30

14 files changed

+600 -2 lines changed

.gitignore

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
# Pytest
2+
*.pytest_cache
3+
4+
# pycache
5+
*__pycache__*
6+
7+
# Packages
8+
*.egg
9+
*.egg-info
10+
build
11+
eggs
12+
parts
13+
bin
14+
var
15+
sdist
16+
develop-eggs
17+
.installed.cfg
18+
lib
19+
lib64
20+
21+
# Installer logs
22+
pip-log.txt

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
The MIT License (MIT)
2+
3+
Copyright (c) Microsoft Corporation. All rights reserved.
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 99 additions & 2 deletions
@@ -1,2 +1,99 @@
1-
# aimon-rely
2-
Aimon Rely is a state-of-the-art, multi-model system for detecting LLM quality issues both offline and online.
1+
# 🎉**Welcome to Aimon Rely**
2+
3+
Aimon Rely is a state-of-the-art, multi-model system for detecting LLM quality issues both offline and online. Our
4+
Hallucination detector performs well on the popular industry benchmarks. It is available via a convenient hosted API
5+
(currently in beta).
6+
7+
Read our [blog post](https://aimon.ai/blogs/introducing-rely) for more details.
8+
9+
**Join our [Discord](https://discord.gg/Cp6YZ9qTdm) or reach out to us at info@aimon.ai to get your API key.**
10+
11+
<div align="center">
12+
<img src="images/aimon-rely-image.png" alt="Aimon Rely" width="550" height="450">
13+
</div>
14+
15+
## Metrics Supported
16+
17+
The table below lists the quality metrics offered through the API. Metrics marked ⌛ are in progress and will be
18+
available in a future release.
19+
20+
| Metric | Status |
21+
|--------------------------------------------------|--------------------------------------------------------------|
22+
| Model Hallucination (Passage and Sentence Level) | <span style="font-size: 24px; color: green;">&#10003;</span> |
23+
| Semantic Similarity | <span style="font-size: 24px;">⌛</span> |
24+
| Completeness | <span style="font-size: 24px;">⌛</span> |
25+
| Conciseness | <span style="font-size: 24px;">⌛</span> |
26+
| Toxicity | <span style="font-size: 24px;">⌛</span> |
27+
| Sentiment | <span style="font-size: 24px;">⌛</span> |
28+
| Coherence | <span style="font-size: 24px;">⌛</span> |
29+
| Sensitive Data (PII/PHI/PCI) | <span style="font-size: 24px;">⌛</span> |
30+
31+
## Quick Usage
32+
33+
### Sandbox
34+
35+
You can try the [Sandbox](https://aimon.ai/tryproduct) on our website.
36+
37+
### API
38+
39+
Here is how to try the API:
40+
41+
- Step 1: Get your API key by requesting it on our [Discord](https://discord.gg/Cp6YZ9qTdm) or by sending an email
42+
to info@aimon.ai
43+
- Step 2: Try the API using either of these methods:
44+
- [OPTION 1] Try the simple langchain summarization application that is augmented with Aimon Rely to detect
45+
hallucinations at the sentence level.
46+
- Step 1: Run `cd src`, then `pip install -r requirements.txt && python setup.py install`
47+
- Step 2: Add your Aimon API key to `langchain_summarization_app.py`
48+
- Step 3: Run `cd ..`, then `streamlit run src/examples/langchain_summarization_app.py`
49+
- [OPTION 2] Download the Postman collection specified below to access the API
50+
- Model Hallucination (Passage and Sentence
51+
Level): [Postman Collection](postman_collections/aimon_hallucination_detection_beta_march_2024.postman_collection.json)
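
For readers who prefer a script to Postman, the request described by the collection can be sketched in Python. This is a minimal sketch rather than the official client: it assumes the endpoint accepts a JSON array of `context`/`generated_text` objects with a bearer token, as the Postman collection does. The function names `build_request` and `detect`, the explicit `Content-Type` header, and the 30-second timeout are illustrative assumptions.

```python
import json
import urllib.request

# Batch limit stated in the Postman collection description.
MAX_BATCH_SIZE = 25
REQUIRED_KEYS = {"context", "generated_text"}


def build_request(api_url, api_key, items):
    """Build (but do not send) a POST request for a batch of items.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not 1 <= len(items) <= MAX_BATCH_SIZE:
        raise ValueError(f"batch must contain 1 to {MAX_BATCH_SIZE} items")
    for item in items:
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"item is missing keys: {sorted(missing)}")
    return urllib.request.Request(
        api_url,
        data=json.dumps(items).encode("utf-8"),
        method="POST",
        headers={
            # Postman's bearer auth type sends the token in this form.
            "Authorization": f"Bearer {api_key}",
            # Assumption: Postman adds this header automatically for raw JSON bodies.
            "Content-Type": "application/json",
        },
    )


def detect(api_url, api_key, items, timeout=30):
    """Send the batch and return the decoded JSON response."""
    request = build_request(api_url, api_key, items)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, pass the endpoint URL and key you received (for example, read from environment variables named after the collection variables `AIMON_HALLUCINATION_API_URL` and `AIMON_API_KEY`) together with a list of items to `detect`. Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.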
52+
53+
**The GIF below shows a simple LangChain-based document summarization application augmented with Aimon Rely,
54+
illustrating the ease of integration.**
55+
56+
![Simple Langchain App with Aimon Rely](images/aimon-rely-langchain-app.gif)
57+
58+
## Benchmarks
59+
60+
To demonstrate the effectiveness of our system, we evaluated it on popular industry benchmarks for the
61+
hallucination detection task. The table below shows our results.
62+
63+
A few key takeaways:
64+
65+
✅ Aimon Rely is **10x cheaper** than GPT-4 Turbo.
66+
67+
✅ Aimon Rely is **4x faster** than GPT-4 Turbo.
68+
69+
✅ Aimon Rely provides the convenience of a fully hosted API that includes baked-in explainability.
70+
71+
✅ Support for a context length of up to 32,000 tokens (with plans to further expand this in the near future).
72+
73+
Overall, Aimon Rely is 10 times cheaper, 4 times faster, and close to or even **better than GPT-4** on the benchmarks,
74+
making it a suitable choice for both offline and online detection of hallucinations.
75+
76+
<div align="center">
77+
<img src="images/hallucination-benchmarks.png" alt="Hallucination Benchmarks">
78+
</div>
79+
80+
## Pricing
81+
82+
We offer a generous free tier and a low-cost, low-latency API.
83+
84+
### Model Hallucination (Passage and Sentence Level)
85+
86+
*Only the input payload is used for pricing calculations.*
87+
88+
| Number of tokens | Price per 1M tokens |
89+
|----------------------------|---------------------|
90+
| First 5M tokens | **FREE** |
91+
| Subsequent price/1M tokens | $1 |
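
The table above reduces to simple arithmetic, sketched here in Python. This is an illustration under stated assumptions, not the billing implementation: it assumes the 5M free tokens are counted cumulatively over input tokens only, and that usage beyond the free tier is billed pro rata rather than rounded up to whole millions. The README does not specify either detail, and the name `estimate_cost_usd` is hypothetical.

```python
FREE_TOKENS = 5_000_000        # "First 5M tokens: FREE"
PRICE_PER_MILLION_USD = 1.0    # "$1" per subsequent 1M tokens


def estimate_cost_usd(input_tokens):
    """Estimate the charge in USD for a cumulative count of input tokens.

    Assumes pro-rata billing past the free tier (not confirmed by the README).
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    billable = max(0, input_tokens - FREE_TOKENS)
    return billable / 1_000_000 * PRICE_PER_MILLION_USD
```

Under these assumptions, any usage up to 5M input tokens costs nothing, and 7.5M input tokens would be billed for the 2.5M tokens beyond the free tier.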
92+
93+
## Future Work
94+
95+
- We are working on additional metrics as detailed in the table above.
96+
- In addition, we are working on something awesome to make the offline evaluation and continuous model quality
97+
monitoring experience more seamless.
98+
99+
Join our [Discord](https://discord.gg/Cp6YZ9qTdm) for the latest updates and discussions on generative AI reliability.

images/aimon-rely-image.png

394 KB

images/aimon-rely-langchain-app.gif

209 KB

images/hallucination-benchmarks.png

80.9 KB

postman_collections/aimon_hallucination_detection_beta_march_2024.postman_collection.json

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
1+
{
2+
"info": {
3+
"_postman_id": "09bcf8ba-8811-43b2-a410-4a61783f4364",
4+
"name": "[Beta] Aimon Hallucination Detection",
5+
"description": "## Overview\n\nThis is a beta version of **Aimon Rely.** It includes our proprietary hallucination detector. This is an beta-release, so please treat it as such. Check with us (send a note to [info@aimon.ai](https://mailto:info@aimon.ai)) before using this API in a production setting. There are limited uptime guarantees at the moment. Please report any issues to the Aimon team (at [info@aimon.ai](https://mailto:info@aimon.ai)).\n\n> Use the APIs with caution - do not send sensitive or protected data to this API. \n \n\n## Features\n\nGiven a context and the generated text, we are able to detect 2 different types of model hallucinations: intrinsic and extrinsic.\n\n- The \"is_hallucinated\" field indicates whether the \"generated_text\" (passed in the input) is hallucinated.\n- A top level passage level \"score\" indicates if the entire set of sentences contain any hallucinations. The score is a probabilty measure of how hallucinated the text is compared to the context. A score >= 0.5 can be classified as a hallucination.\n- We also provide sentence level scores to help with explanability.\n \n\n## **Limitations**\n\n- Input payloads with context sizes greater than 32,000 tokens will not work at the moment.\n- Maximum batch size is 25 items at the moment.",
6+
"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
7+
"_exporter_id": "30634528",
8+
"_collection_link": "https://aimon-trailblazers.postman.co/workspace/0c99cd4f-6ba5-41e9-9cbf-4f942a218086/collection/30634662-09bcf8ba-8811-43b2-a410-4a61783f4364?action=share&source=collection_link&creator=30634528"
9+
},
10+
"item": [
11+
{
12+
"name": "Batch Inference",
13+
"event": [
14+
{
15+
"listen": "test",
16+
"script": {
17+
"exec": [
18+
"pm.test(\"Response status code is 200\", function () {",
19+
" pm.response.to.have.status(200);",
20+
"});",
21+
"",
22+
"",
23+
"pm.test(\"Verify that the response is an array with at least one element\", function () {",
24+
" const responseData = pm.response.json();",
25+
"",
26+
" pm.expect(responseData).to.be.an('array').and.to.have.lengthOf.at.least(1);",
27+
"});",
28+
"",
29+
"",
30+
"pm.test(\"Each element in the response has the 'score' and 'sentences' fields\", function () {",
31+
" const responseData = pm.response.json();",
32+
" ",
33+
" pm.expect(responseData).to.be.an('array').that.is.not.empty;",
34+
" ",
35+
" responseData.forEach(function(element) {",
36+
" pm.expect(element).to.have.property('score');",
37+
" pm.expect(element).to.have.property('sentences');",
38+
" });",
39+
"});",
40+
"",
41+
"",
42+
"pm.test(\"Validate that the 'score' field in each element is a number\", function () {",
43+
" const responseData = pm.response.json();",
44+
"",
45+
" pm.expect(responseData).to.be.an('array');",
46+
"",
47+
" responseData.forEach(function(element) {",
48+
" pm.expect(element.score).to.be.a('number');",
49+
" });",
50+
"});",
51+
"",
52+
"",
53+
"pm.test(\"Verify that the 'sentences' field in each element is an array with at least one element\", function () {",
54+
" const responseData = pm.response.json();",
55+
"",
56+
" pm.expect(responseData).to.be.an('array');",
57+
"",
58+
" responseData.forEach(function (element) {",
59+
" pm.expect(element.sentences).to.be.an('array').with.lengthOf.at.least(1);",
60+
" });",
61+
"});"
62+
],
63+
"type": "text/javascript"
64+
}
65+
}
66+
],
67+
"request": {
68+
"auth": {
69+
"type": "bearer",
70+
"bearer": [
71+
{
72+
"key": "token",
73+
"value": "{{AIMON_API_KEY}}",
74+
"type": "string"
75+
}
76+
]
77+
},
78+
"method": "POST",
79+
"header": [],
80+
"body": {
81+
"mode": "raw",
82+
"raw": "[\n {\n \"context\": \"the abc have reported that those who receive centrelink payments made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\",\n \"generated_text\": \"those who receive centrelink payments that themselves were up 20% made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\"\n },\n {\n \"context\": \"the abc have reported that those who receive centrelink payments made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\",\n \"generated_text\": \"the abc have reported that those who receive centrelink payments that themselves were up 20% made up three fourths of radio rental's income last year.\"\n },\n {\n \"context\": \"the abc have reported that those who receive centrelink payments made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\",\n \"generated_text\": \"the abc have reported that those who receive centrelink payments made up three fourths of radio rental's income last year. Centrelink payments themselves were 20% up.\"\n }\n]",
83+
"options": {
84+
"raw": {
85+
"language": "json"
86+
}
87+
}
88+
},
89+
"url": {
90+
"raw": "{{AIMON_HALLUCINATION_API_URL}}",
91+
"host": [
92+
"{{AIMON_HALLUCINATION_API_URL}}"
93+
]
94+
},
95+
"description": "This request consists of an array of 3 items. The first item does not contain hallucinations but the 2nd and the 3rd items do contain hallucinations."
96+
},
97+
"response": [
98+
{
99+
"name": "Response",
100+
"originalRequest": {
101+
"method": "POST",
102+
"header": [],
103+
"body": {
104+
"mode": "raw",
105+
"raw": "[\n {\n \"context\": \"the abc have reported that those who receive centrelink payments made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\",\n \"generated_text\": \"those who receive centrelink payments that themselves were up 20% made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\"\n },\n {\n \"context\": \"the abc have reported that those who receive centrelink payments made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\",\n \"generated_text\": \"the abc have reported that those who receive centrelink payments that themselves were up 20% made up three fourths of radio rental's income last year.\"\n },\n {\n \"context\": \"the abc have reported that those who receive centrelink payments made up half of radio rental's income last year. Centrelink payments themselves were up 20%.\",\n \"generated_text\": \"the abc have reported that those who receive centrelink payments made up three fourths of radio rental's income last year. Centrelink payments themselves were 20% up.\"\n }\n]",
106+
"options": {
107+
"raw": {
108+
"language": "json"
109+
}
110+
}
111+
},
112+
"url": {
113+
"raw": "{{AIMON_HALLUCINATION_API_URL}}",
114+
"host": [
115+
"{{AIMON_HALLUCINATION_API_URL}}"
116+
]
117+
}
118+
},
119+
"status": "OK",
120+
"code": 200,
121+
"_postman_previewlanguage": "json",
122+
"header": [
123+
{
124+
"key": "Date",
125+
"value": "Mon, 11 Mar 2024 07:11:23 GMT"
126+
},
127+
{
128+
"key": "Content-Type",
129+
"value": "application/json"
130+
},
131+
{
132+
"key": "Content-Length",
133+
"value": "786"
134+
},
135+
{
136+
"key": "Connection",
137+
"value": "keep-alive"
138+
},
139+
{
140+
"key": "Access-Control-Allow-Origin",
141+
"value": "*"
142+
},
143+
{
144+
"key": "Strict-Transport-Security",
145+
"value": "max-age=15724800; includeSubDomains"
146+
}
147+
],
148+
"cookie": [],
149+
"body": "[\n {\n \"is_hallucinated\": \"False\",\n \"score\": 0.07905,\n \"sentences\": [\n {\n \"score\": 0.07905,\n \"text\": \"those who receive centrelink payments that themselves were up 20% made up half of radio rental's income last year.\"\n },\n {\n \"score\": 0.01975,\n \"text\": \"Centrelink payments themselves were up 20%.\"\n }\n ]\n },\n {\n \"is_hallucinated\": \"True\",\n \"score\": 0.90819,\n \"sentences\": [\n {\n \"score\": 0.90819,\n \"text\": \"the abc have reported that those who receive centrelink payments that themselves were up 20% made up three fourths of radio rental's income last year.\"\n }\n ]\n },\n {\n \"is_hallucinated\": \"True\",\n \"score\": 0.99121,\n \"sentences\": [\n {\n \"score\": 0.99121,\n \"text\": \"the abc have reported that those who receive centrelink payments made up three fourths of radio rental's income last year.\"\n },\n {\n \"score\": 0.02108,\n \"text\": \"Centrelink payments themselves were 20% up.\"\n }\n ]\n }\n]"
150+
}
151+
]
152+
},
153+
{
154+
"name": "Point Inference",
155+
"request": {
156+
"auth": {
157+
"type": "bearer",
158+
"bearer": [
159+
{
160+
"key": "token",
161+
"value": "{{AIMON_API_KEY}}",
162+
"type": "string"
163+
}
164+
]
165+
},
166+
"method": "POST",
167+
"header": [],
168+
"body": {
169+
"mode": "raw",
170+
"raw": "[\n {\n \"context\": \"five ambitious clubs are locked in a scramble for two champions league places behind chelsea and manchester city and it could prove more tense and exciting than the barclays premier league title race itself.\",\n \"generated_text\": \"five ambitious clubs are locked in a scramble for two champions league places.\"\n }\n]\n",
171+
"options": {
172+
"raw": {
173+
"language": "json"
174+
}
175+
}
176+
},
177+
"url": {
178+
"raw": "{{AIMON_HALLUCINATION_API_URL}}",
179+
"host": [
180+
"{{AIMON_HALLUCINATION_API_URL}}"
181+
]
182+
},
183+
"description": "This is a point inference example i.e., the payload only contains one item for inference."
184+
},
185+
"response": [
186+
{
187+
"name": "Response",
188+
"originalRequest": {
189+
"method": "POST",
190+
"header": [],
191+
"body": {
192+
"mode": "raw",
193+
"raw": "[\n {\n \"context\": \"five ambitious clubs are locked in a scramble for two champions league places behind chelsea and manchester city and it could prove more tense and exciting than the barclays premier league title race itself.\",\n \"generated_text\": \"five ambitious clubs are locked in a scramble for two champions league places.\"\n }\n]\n",
194+
"options": {
195+
"raw": {
196+
"language": "json"
197+
}
198+
}
199+
},
200+
"url": {
201+
"raw": "{{AIMON_HALLUCINATION_API_URL}}",
202+
"host": [
203+
"{{AIMON_HALLUCINATION_API_URL}}"
204+
]
205+
}
206+
},
207+
"status": "OK",
208+
"code": 200,
209+
"_postman_previewlanguage": "json",
210+
"header": [
211+
{
212+
"key": "Date",
213+
"value": "Mon, 11 Mar 2024 07:53:18 GMT"
214+
},
215+
{
216+
"key": "Content-Type",
217+
"value": "application/json"
218+
},
219+
{
220+
"key": "Content-Length",
221+
"value": "166"
222+
},
223+
{
224+
"key": "Connection",
225+
"value": "keep-alive"
226+
},
227+
{
228+
"key": "Access-Control-Allow-Origin",
229+
"value": "*"
230+
},
231+
{
232+
"key": "Strict-Transport-Security",
233+
"value": "max-age=15724800; includeSubDomains"
234+
}
235+
],
236+
"cookie": [],
237+
"body": "[\n {\n \"is_hallucinated\": \"False\",\n \"score\": 0.01675,\n \"sentences\": [\n {\n \"score\": 0.01675,\n \"text\": \"five ambitious clubs are locked in a scramble for two champions league places.\"\n }\n ]\n }\n]"
238+
}
239+
]
240+
}
241+
],
242+
"variable": [
243+
{
244+
"key": "AIMON_HALLUCINATION_API_URL",
245+
"value": "https://am-hd-m1-ser-2380-7615d7e0-wkx4g8t7.onporter.run/inference"
246+
}
247+
]
248+
}
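
The sample responses in the collection can be post-processed along the following lines. This is a hedged sketch, not part of the commit: the helper names `parse_flag` and `summarize` are hypothetical. The 0.5 threshold comes from the collection description, which states it for the passage-level score; applying the same threshold to individual sentences is an assumption. Note that the sample responses return `is_hallucinated` as the strings `"True"` and `"False"`, so a plain `bool()` conversion would treat both as true.

```python
THRESHOLD = 0.5  # passage-level cut-off given in the collection description


def parse_flag(value):
    """Convert the API's "True"/"False" strings (or real booleans) to bool."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def summarize(response_items, threshold=THRESHOLD):
    """Reduce each response item to its flags and the sentences that exceed the threshold."""
    summaries = []
    for item in response_items:
        flagged = [
            sentence["text"]
            for sentence in item["sentences"]
            # Assumption: the passage-level threshold is reused per sentence.
            if sentence["score"] >= threshold
        ]
        summaries.append(
            {
                "score": item["score"],
                "flag_from_api": parse_flag(item["is_hallucinated"]),
                "flag_from_score": item["score"] >= threshold,
                "flagged_sentences": flagged,
            }
        )
    return summaries
```

Pass the decoded JSON array from a response to `summarize` to obtain one summary per input item. Keeping both the flag reported by the API and the flag derived from the score makes it easy to spot any disagreement between the two.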

src/aimon_rely_client/__init__.py

Whitespace-only changes.
