A Vitest custom matcher that uses LLMs (via Ollama) to validate whether AI-generated responses make sense in production contexts.
When building AI agents, chatbots, or any system that generates text, you need to ensure the outputs are:
- Coherent and contextually appropriate
- Free from hallucinations or nonsensical content
- Consistent with the intended personality or voice
- Safe for production use
This matcher helps catch issues like:
- Responses that contradict themselves
- Impossible claims presented as facts
- Random word salad that sounds AI-generated
- Responses that break character or voice
```bash
npm install --save-dev @loqwai/to-make-sense
```

This package requires Ollama to be installed and running locally:
- Install Ollama: https://ollama.ai
- Pull a model (we recommend `gemma2:2b` for speed): `ollama pull gemma2:2b`
- Ensure Ollama is running (it starts automatically on most systems)
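To confirm Ollama is reachable before running the suite, you can query its HTTP API (this assumes the default port, 11434):

```bash
# Lists installed models as JSON if Ollama is up
curl http://localhost:11434/api/tags
```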
```ts
import { expect } from 'vitest'
import '@loqwai/to-make-sense'

// The matcher is now available globally
```
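If you'd rather not repeat the import in every file, Vitest's `setupFiles` option can register the matcher once for the whole suite. A minimal sketch, assuming a setup file at `./vitest.setup.ts`:

```ts
// vitest.setup.ts: registers the matcher for every test file
import '@loqwai/to-make-sense'
```

```ts
// vitest.config.ts
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    setupFiles: ['./vitest.setup.ts'],
  },
})
```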
A basic test passes a conversation object to the matcher:

```ts
import { describe, it, expect } from 'vitest'
import '@loqwai/to-make-sense'
describe('AI Assistant', () => {
  it('should generate coherent responses', async () => {
    const conversation = {
      messages: [
        { role: 'user', content: 'What is the capital of France?' },
        { role: 'assistant', content: 'The capital of France is Paris.' }
      ]
    }
    
    await expect(conversation).toMakeSense()
  })
  it('should reject nonsensical responses', async () => {
    const conversation = {
      messages: [
        { role: 'user', content: 'How do I reset my password?' },
        { role: 'assistant', content: 'Purple monkey dishwasher in the quantum realm!' }
      ]
    }
    
    await expect(conversation).not.toMakeSense()
  })
})
```

The matcher accepts an options object to customize validation:

```ts
await expect(conversation).toMakeSense({
  model: 'gemma2:2b',              // Ollama model to use
  temperature: 0.3,                 // LLM temperature (0-1)
  endpoint: 'http://localhost:11434/api/chat', // Ollama endpoint
  systemPrompt: 'Custom prompt...'  // Override the default validation prompt
})
```
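For example, a domain-specific suite might steer the judge with its own prompt. A sketch (the prompt text here is illustrative, not the package's built-in default):

```ts
await expect(conversation).toMakeSense({
  model: 'gemma2:2b',
  temperature: 0.3,
  // Illustrative prompt; replace with the criteria that matter for your domain
  systemPrompt:
    'You are reviewing a customer-support transcript. ' +
    'Judge whether the final assistant reply is coherent, on-topic, and safe.',
})
```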
Because the judge evaluates responses in context, the matcher can also validate character voice:

```ts
describe('Fantasy Game NPC', () => {
  it('should maintain character voice', async () => {
    const mysticalKeeper = {
      messages: [
        { role: 'user', content: 'Where can I find healing potions?' },
        { role: 'assistant', content: '*sighs with ancient weariness* Seven vials remain in the eastern chamber, though my incorporeal form can no longer grasp them. The third shelf, behind the cobwebs of centuries...' }
      ]
    }
    
    // This should pass - maintains mystical character
    await expect(mysticalKeeper).toMakeSense()
  })
  it('should reject out-of-character responses', async () => {
    const brokenNPC = {
      messages: [
        { role: 'user', content: 'Where can I find healing potions?' },
        { role: 'assistant', content: 'Yo dawg, check aisle 3 at the supermarket lol' }
      ]
    }
    
    // This should fail - breaks character
    await expect(brokenNPC).not.toMakeSense()
  })
})
```

The matcher sends the conversation to an LLM with a carefully crafted prompt that instructs it to evaluate whether the response "makes sense" given the context. The LLM considers:
- **Logical Coherence**: Does the response follow logically from the question?
- **Contextual Appropriateness**: Is the response suitable for the context?
- **Consistency**: Are there internal contradictions?
- **Realism**: Are claims plausible within the established context?
The matcher distinguishes between creative fiction (which can "make sense" within its context) and true nonsense or hallucinations.
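Conceptually, the round-trip looks something like the following simplified sketch (not the package's actual source; it assumes Ollama's standard `/api/chat` endpoint and a yes/no verdict from the judge):

```ts
// Simplified sketch of the validation round-trip; illustrative only
type Message = { role: string; content: string }

const judge = async (conversation: { messages: Message[] }): Promise<boolean> => {
  const response = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gemma2:2b',
      stream: false, // request a single JSON response instead of a stream
      messages: [
        {
          role: 'system',
          content: 'Judge whether the final assistant reply makes sense in context. Answer YES or NO.',
        },
        { role: 'user', content: JSON.stringify(conversation.messages) },
      ],
    }),
  })
  const data = await response.json()
  // Non-streaming /api/chat responses have the shape { message: { role, content }, ... }
  return /\bYES\b/i.test(data.message.content)
}
```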
- LLM calls take time (typically 1-5 seconds with `gemma2:2b`)
- Tests run with a 20-second timeout by default
- Consider using smaller, faster models for testing
- Run tests in sequence to avoid overloading Ollama (see the config sketch below)
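One way to apply both recommendations in Vitest (the values are illustrative; tune them to your hardware):

```ts
// vitest.config.ts
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    testTimeout: 20_000,     // headroom for LLM round-trips
    fileParallelism: false,  // run test files one at a time so Ollama isn't overloaded
  },
})
```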
```bash
# Clone the repository
git clone https://github.com/loqwai/to-make-sense.git
cd to-make-sense

# Install dependencies
npm install

# Run tests (requires Ollama)
npm test

# Type checking
npm run typecheck

# Build
npm run build

# Deploy to npm (runs tests first)
npm run deploy
```

This project follows a "no mocking" philosophy. All tests use real LLM integrations to ensure we're validating actual behavior, not our assumptions about how LLMs work.
MIT
Contributions are welcome! Please ensure:
- All tests pass with real Ollama integration
- No mocking of LLM calls
- Follow the existing code style
- Add tests for new features
Created by @loqwai for the Loqwai project.