hidai25/eval-view

v0.1.4 Feature

This release adds three notable features for engineering teams evaluating a rollout.

Published 7mo Developer Productivity

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen

cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

Ollama support enables free local LLM-as-judge evaluations.

Full changelog

What's New

Ollama Support (Free Local Evaluation)

- Ollama as LLM-as-judge: run evaluations locally with zero API costs
- Auto-detection: automatically detects Ollama running on localhost:11434
- New adapter: test LangGraph agents powered by local Llama models

# Free local evaluation
evalview run --judge-provider ollama --judge-model llama3.2
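
The auto-detection mentioned above could work roughly like the following minimal sketch. It only probes the default Ollama address over HTTP; the function name `ollama_is_running` and the timeout value are hypothetical, and this is not taken from EvalView's source, which may detect Ollama differently.

```python
import urllib.request

# Default Ollama address, as stated in the release notes.
OLLAMA_URL = "http://localhost:11434"


def ollama_is_running(base_url: str = OLLAMA_URL, timeout: float = 1.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200.

    Hypothetical helper for illustration; not part of EvalView's API.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib.error.URLError.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama detected; a local judge can be used.")
    else:
        print("Ollama not reachable; start it or use a cloud judge.")
```

A check like this is cheap enough to run before every evaluation, so a missing local server can be reported early instead of failing mid-run.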

Improved Hallucination Detection

- Reduced false positives for local models
- Unit conversions and formatting no longer flagged as hallucinations
- Confidence threshold: 90% for Ollama, 70% for cloud providers
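
One way to read the per-provider thresholds is sketched below. It assumes that a hallucination is flagged only when the judge's confidence meets the provider's threshold, which would explain the stricter value for local models. The names `should_flag_hallucination` and `CONFIDENCE_THRESHOLDS` are hypothetical, and EvalView's actual logic may differ.

```python
# Threshold values come from the release notes; how they are applied is assumed.
CONFIDENCE_THRESHOLDS = {"ollama": 0.90}
DEFAULT_CLOUD_THRESHOLD = 0.70


def should_flag_hallucination(provider: str, judge_confidence: float) -> bool:
    """Flag a hallucination only if the judge is confident enough.

    Hypothetical helper: any provider other than Ollama is treated as a
    cloud provider and gets the 70% threshold.
    """
    threshold = CONFIDENCE_THRESHOLDS.get(provider.lower(), DEFAULT_CLOUD_THRESHOLD)
    return judge_confidence >= threshold


if __name__ == "__main__":
    # The same 80% confidence is ignored for Ollama but flagged for a cloud judge.
    print(should_flag_hallucination("ollama", 0.80))
    print(should_flag_hallucination("openai", 0.80))
```

Under this reading, raising the bar for local models trades some recall for fewer false positives, which matches the stated goal of the change.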

README Updates

- Added "Who is EvalView for?" section
- Added a section positioning EvalView as a complement to LangSmith/Langfuse
- New Ollama example in /examples/ollama/

Fixes

- Fixed mypy type annotation error
- Fixed action.yml description length for Marketplace

Full Changelog: https://github.com/hidai25/eval-view/compare/v0.1.3...v0.1.4

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
