@@ -5,7 +5,7 @@ AIMon helps developers build, ship, and monitor LLM Apps more confidently and re
✨ **Join our community on [Slack](https://join.slack.com/t/generativeair/shared_invite/zt-2jab62lsj-xM9a_s~Qweu8lf3YS2cANg)**
<div align="center">
- <img src="images/aimon-rely-image.png" alt="AIMon" width="650" height="350">
+ <img src="images/aimon-rely-image.png" alt="AIMon" width="325" height="175">
</div>
## Metrics Supported
@@ -81,9 +81,18 @@ A few key takeaways:
Overall, AIMon is 10 times cheaper, 4 times faster, and close to or even **better than GPT-4** on the benchmarks,
making it a suitable choice for both offline and online detection of hallucinations.
- <div align="center">
- <img src="images/hallucination-benchmarks.png" alt="Hallucination Benchmarks">
- </div>
+ | Metric | Aimon Rely v1 | GPT-4 Turbo (LLM-as-a-judge) |
+ | --- | --- | --- |
+ | Context Length | 32,000 | **128,000** |
+ | TRUE Dataset Precision/Recall | 0.808 / 0.922 | **0.810 / 0.926** |
+ | SummaC (test) Balanced Accuracy | **0.778** | 0.756 |
+ | SummaC (test) AUC | **0.809** | 0.780 |
+ | AnyScale Ranking Test for Hallucinations Accuracy | 0.665 | **0.741** |
+ | AnyScale Ranking Test for Hallucinations Rel. Accuracy | 0.804 | **0.855** |
+ | Avg. Latency | **417ms** | 1800ms |
+ | Cost (15M tokens across all benchmark datasets) excluding free tier | **$15** | $158 |
+ | Fully Hosted | :white_check_mark: | :white_check_mark: |
+ | Explainability | **Automatic sentence-level Scores** | Detailed reasoning with additional prompt engineering |
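As a quick sanity check, the "10 times cheaper, 4 times faster" summary can be reproduced from the latency and cost rows of the table. The snippet below is only an illustrative sketch: the variable names and the `ratio` helper are made up for this example, and the numbers are copied by hand from the table rather than pulled from any AIMon API.

```python
# Figures copied by hand from the benchmark table (not fetched from any API).
aimon_latency_ms = 417
gpt4_latency_ms = 1800
aimon_cost_usd = 15
gpt4_cost_usd = 158


def ratio(baseline: float, candidate: float) -> float:
    """Return how many times larger the baseline value is than the candidate."""
    return baseline / candidate


speedup = ratio(gpt4_latency_ms, aimon_latency_ms)
cost_ratio = ratio(gpt4_cost_usd, aimon_cost_usd)

print(f"Latency: about {speedup:.1f}x faster")
print(f"Cost: about {cost_ratio:.1f}x cheaper")
```

With the table's figures this works out to roughly 4.3x on latency and 10.5x on cost, consistent with the rounded claims in the summary.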
### Benchmarks on other Detectors