Commit 8700cb2

Alex readme changes (#38)
* Update README.md
* Update README.md: reduce logo size
1 parent 1bb22d4 commit 8700cb2

1 file changed: +13 -4 lines changed

README.md

Lines changed: 13 additions & 4 deletions
@@ -5,7 +5,7 @@ AIMon helps developers build, ship, and monitor LLM Apps more confidently and re
 **Join our community on [Slack](https://join.slack.com/t/generativeair/shared_invite/zt-2jab62lsj-xM9a_s~Qweu8lf3YS2cANg)**
 
 <div align="center">
-<img src="images/aimon-rely-image.png" alt="AIMon" width="650" height="350">
+<img src="images/aimon-rely-image.png" alt="AIMon" width="325" height="175">
 </div>
 
 ## Metrics Supported
@@ -81,9 +81,18 @@ A few key takeaways:
 Overall, AIMon is 10 times cheaper, 4 times faster, and close to or even **better than GPT-4** on the benchmarks
 making it a suitable choice for both offline and online detection of hallucinations.
 
-<div align="center">
-<img src="images/hallucination-benchmarks.png" alt="Hallucination Benchmarks">
-</div>
+| Metric | Aimon Rely v1 | GPT-4 Turbo (LLM-as-a-judge) |
+|---------------------------------------------------------------|------------------------|----------------------------------|
+| Context Length | 32,000 | **128,000** |
+| TRUE Dataset Precision/Recall | 0.808 / 0.922 | **0.810 / 0.926** |
+| SummaC (test) Balanced Accuracy | **0.778** | 0.756 |
+| SummaC (test) AUC | **0.809** | 0.780 |
+| AnyScale Ranking Test for Hallucinations Accuracy | 0.665 | **0.741** |
+| AnyScale Ranking Test for Hallucinations Rel. Accuracy | 0.804 | **0.855** |
+| Avg. Latency | **417ms** | 1800ms |
+| Cost (15M tokens across all benchmark datasets) excluding free tier | **$15** | $158 |
+| Fully Hosted | :white_check_mark: | :white_check_mark: |
+| Explainability | **Automatic sentence-level Scores** | Detailed reasoning with additional prompt engineering |
 
 ### Benchmarks on other Detectors
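
Review note: the README's "10 times cheaper, 4 times faster" summary can be cross-checked against the latency and cost rows of the table added in this commit. The snippet below is a minimal sketch that only divides those published figures; the variable names and the `ratio` helper are illustrative and do not come from the AIMon codebase.

```python
# Figures copied from the "Avg. Latency" and "Cost" rows of the new table.
aimon_latency_ms = 417
gpt4_latency_ms = 1800
aimon_cost_usd = 15
gpt4_cost_usd = 158


def ratio(baseline, candidate):
    """Return how many times larger the baseline value is than the candidate."""
    return baseline / candidate


speedup = ratio(gpt4_latency_ms, aimon_latency_ms)
cost_ratio = ratio(gpt4_cost_usd, aimon_cost_usd)

print(f"latency: {speedup:.1f}x, cost: {cost_ratio:.1f}x")
# prints "latency: 4.3x, cost: 10.5x"
```

Both ratios round to the figures quoted in the prose, so the summary sentence and the new table agree with each other.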
