John Snow Labs Releases LangTest 2.6.0: De-biasing Data Augmentation, Structured Output Evaluation, Med Halt Confidence Tests, Expanded QA & Summarization Support, and Enhanced Security #1187
chakravarthik27 announced in Announcements
📢 Highlights
We are excited to introduce the latest LangTest release, bringing you a suite of improvements designed to streamline model evaluation and enhance overall performance:
🛠 De-biasing Data Augmentation:
We’ve integrated de-biasing techniques into our data augmentation process, ensuring more equitable and representative model assessments.
🔄 Evaluation with Structured Outputs:
LangTest now supports structured output APIs for OpenAI, Azure OpenAI, and Ollama, offering greater flexibility and precision when processing model responses.
🏥 Confidence Testing with Med Halt Tests:
Introducing Med Halt tests for confidence evaluation, enabling more robust insights into your LLMs’ reliability under diverse conditions.
📖 Expanded Task Support for JSL LLM Models:
QA and Summarization tasks are now fully supported for JSL LLM models, enhancing their capabilities for real-world applications.
🔒 Security Enhancements:
Critical vulnerabilities and security issues have been addressed, reinforcing LangTest’s overall stability and safety.
🐛 Resolved Bugs:
We’ve fixed issues with templatic augmentation to ensure consistent, accurate, and reliable outputs across your workflows.
🔥 Key Enhancements
🛠 De-biasing Data Augmentation
We’ve integrated de-biasing techniques into our data augmentation process, ensuring more equitable and representative model assessments.
Key Features:
How it works:
To load the dataset:
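As a minimal sketch of the augmentation flow (the task, model, and file names below are illustrative assumptions, not part of this release's exact example), the tests are run first and `Harness.augment` then uses the report to produce a de-biased copy of the training data:

```python
from langtest import Harness

# Run the bias tests first: augmentation uses the resulting report to
# decide which samples need rebalancing. Task, model, and file names
# here are illustrative assumptions.
harness = Harness(
    task="ner",
    model={"model": "en_core_web_sm", "hub": "spacy"},
    data={"data_source": "train.conll"},
)
harness.generate().run().report()

# Write a de-biased, augmented copy of the training data.
harness.augment(
    training_data={"data_source": "train.conll"},
    save_data_path="augmented_train.conll",
    export_mode="add",  # append augmented samples to the original data
)
```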
🔄 Evaluation with Structured Outputs
Now supporting structured output APIs for OpenAI, Ollama, and Azure-OpenAI, offering greater flexibility and precision when processing model responses.
Key Features:
How it works:
Pydantic Model Setup:
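A minimal example of a response schema; the class and field names are illustrative assumptions, not a schema mandated by LangTest:

```python
from pydantic import BaseModel


class AnswerSchema(BaseModel):
    """Structure every model response must follow (fields are illustrative)."""

    answer: str
    reasoning: str
```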
Harness Setup:
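A sketch of one way to wire the schema into a harness. `model_parameters` is the documented LangTest config key for provider options; the `response_format` key forwarding the Pydantic class to the structured-output API is an assumption:

```python
from langtest import Harness

harness = Harness(
    task="question-answering",
    # "azure-openai" and "ollama" hubs follow the same pattern.
    model={"model": "gpt-4o-mini", "hub": "openai"},
    data={"data_source": "BoolQ", "split": "dev-tiny"},
    config={
        "model_parameters": {
            # Assumed key: passes the Pydantic schema through to the
            # provider's structured-output API.
            "response_format": AnswerSchema,
        },
        "tests": {
            "defaults": {"min_pass_rate": 0.65},
            "robustness": {"uppercase": {"min_pass_rate": 0.66}},
        },
    },
)
```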
🏥 Confidence Testing with Med Halt Tests
Gain deeper insights into your LLMs’ robustness and reliability under diverse conditions with our upgraded Med Halt tests. This release focuses on refining confidence assessments in LLMs.
Key Features:
- False Confidence Test: probes whether the model simply endorses an answer that is presented alongside a question, even when that answer is wrong.
- Fake Questions Test: probes whether the model attempts to answer fabricated or nonsensical questions instead of flagging them.
How it works:
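First, configure a harness with the Med Halt tests enabled. The model, dataset, and the exact test-category keys below are illustrative assumptions:

```python
from langtest import Harness

harness = Harness(
    task="question-answering",
    model={"model": "gpt-4o-mini", "hub": "openai"},
    data={"data_source": "MedQA", "split": "test-tiny"},
    config={
        "tests": {
            "defaults": {"min_pass_rate": 0.65},
            # Category and test keys are assumptions mirroring the
            # Med Halt tests described above.
            "clinical": {
                "false_confidence_test": {"min_pass_rate": 0.65},
                "fake_questions_test": {"min_pass_rate": 0.65},
            },
        }
    },
)
```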
Generate and Execute the test cases:
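With the harness configured as sketched above, generate the test cases and run them against the model:

```python
harness.generate().run()
```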
Report
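Finally, summarize pass rates per test:

```python
harness.report()
```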
📖 QA and Summarization Support for JSL LLM Models
JSL LLM models now support both Question Answering (QA) and Summarization tasks, making testing more practical in real-world scenarios.
Key Features:
How it works:
Pipeline Setup:
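A sketch of a Spark NLP for Healthcare pipeline; the LLM annotator and pretrained model name are illustrative assumptions for this example:

```python
from johnsnowlabs import nlp, medical

spark = nlp.start()

document_assembler = (
    nlp.DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
)

# Annotator and pretrained model name are illustrative assumptions.
medical_llm = (
    medical.MedicalLLM.pretrained("jsl_medm_q8_v1", "en", "clinical/models")
    .setInputCols(["document"])
    .setOutputCol("completions")
)

pipeline = nlp.Pipeline(stages=[document_assembler, medical_llm])
```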
Harness Setup:
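The pipeline from the previous step then plugs into LangTest through the `johnsnowlabs` hub (the dataset choice here is an illustrative assumption):

```python
from langtest import Harness

# task="summarization" is also supported for JSL models in this release.
harness = Harness(
    task="question-answering",
    model={"model": pipeline, "hub": "johnsnowlabs"},
    data={"data_source": "MedMCQA", "split": "test-tiny"},
)
```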
Generate and run the test cases:
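```python
harness.generate().run()
```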
Results
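`generated_results()` returns the per-sample outputs for inspection:

```python
harness.generated_results()
```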


Report
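And the aggregated pass rates:

```python
harness.report()
```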
🔒 Security Enhancements
Critical vulnerabilities and security issues have been resolved, reinforcing the overall stability and safety of our platform. In this update, we upgraded dependencies to fix vulnerabilities, ensuring a more secure and reliable environment for our users.
🐛 Fixes
⚡ Enhancements
What's Changed
Full Changelog: 2.5.0...2.6.0