This release adds 3 notable features for engineering teams evaluating rollout.
Published 5mo
Developer Productivity
✓ No known CVEs patched in this version
Topics
agent-benchmark
agent-evaluation
agentic-ai
ai-agents
anthropic
autogen
cli
crewai
evaluation
langchain-agent
langgraph
llm
mcp
openai-assistants
pytest
python
regression-testing
testing
Summary
AI summary: Ollama support enables free local LLM-as-judge evaluations.
Full changelog
What's New
Ollama Support (Free Local Evaluation)
- Ollama as LLM-as-judge - Run evaluations locally with zero API costs
- Auto-detection - Automatically detects Ollama running on localhost:11434
- New adapter - Test LangGraph agents powered by local Llama models
# Free local evaluation
evalview run --judge-provider ollama --judge-model llama3.2
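The auto-detection item above amounts to probing a local HTTP endpoint. The following is a minimal sketch of such a probe, not EvalView's actual implementation: the function name `ollama_models` is hypothetical, and the only outside fact it relies on is that Ollama lists locally pulled models at `GET /api/tags`.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the model names a local Ollama server reports, or None if unreachable.

    Hypothetical helper for illustration; not part of EvalView.
    """
    # Skip any configured HTTP proxy, since the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # GET /api/tags is Ollama's endpoint for listing locally available models.
        with opener.open(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not detected".
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


if __name__ == "__main__":
    models = ollama_models()
    if models is None:
        print("Ollama not detected on localhost:11434")
    else:
        print("Ollama detected; models:", models)
```

Returning the model list rather than a bare boolean lets a caller also check that the judge model it intends to use (for example `llama3.2`) has been pulled before starting a run.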
Improved Hallucination Detection
- Reduced false positives for local models
- Unit conversions and formatting no longer flagged as hallucinations
- Confidence threshold: 90% for Ollama, 70% for cloud providers
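One way to read the provider-specific thresholds above is as a gate on the judge's confidence before a finding is reported. The sketch below assumes that reading: the 90% and 70% values come from the release notes, while the names `should_flag`, `THRESHOLDS`, and `DEFAULT_CLOUD_THRESHOLD`, and the choice of `>=` as the comparison, are assumptions made for illustration and do not describe EvalView's internals.

```python
# Threshold values are from the release notes; the structure is hypothetical.
THRESHOLDS = {"ollama": 0.90}
DEFAULT_CLOUD_THRESHOLD = 0.70  # applied to any provider not listed above


def should_flag(judge_confidence, provider):
    """Report a hallucination only when the judge's confidence meets the
    threshold for its provider (assumed semantics)."""
    threshold = THRESHOLDS.get(provider.lower(), DEFAULT_CLOUD_THRESHOLD)
    return judge_confidence >= threshold


if __name__ == "__main__":
    # The same 0.80 confidence is reported for a cloud judge but not for Ollama.
    print(should_flag(0.80, "openai"))
    print(should_flag(0.80, "ollama"))
```

Under this reading, the stricter bar for local models trades some recall for fewer false positives, which matches the stated goal of the change.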
README Updates
- Added "Who is EvalView for?" section
- Added LangSmith/Langfuse complement positioning
- New Ollama example in /examples/ollama/
Fixes
- Fixed mypy type annotation error
- Fixed action.yml description length for Marketplace
Full Changelog: https://github.com/hidai25/eval-view/compare/v0.1.3...v0.1.4
About hidai25/eval-view
Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.