Define qualitative evaluation criteria and let an LLM judge whether each response passes. Useful for testing AI agents, comparing models, and evaluating subjective qualities such as tone or helpfulness.
Eric Stiens
January 5, 2026 10:43pm
MIT
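
The workflow in the description (write criteria in plain language, then have an LLM return a pass/fail verdict for each) might look roughly like the sketch below. This is not this tool's actual API: `Judge`, `Verdict`, `build_prompt`, `evaluate`, and `stub_judge` are hypothetical names chosen for illustration, and the judge is a plain callable so that a real model client can be swapped in.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type for illustration; not the API of any particular tool.
# A judge takes a prompt and returns the judge model's raw text reply.
Judge = Callable[[str], str]


@dataclass
class Verdict:
    criterion: str
    passed: bool


def build_prompt(criterion: str, response: str) -> str:
    """Ask the judge for a one-word PASS/FAIL verdict on a single criterion."""
    return (
        "You are grading a response against one criterion.\n"
        f"Criterion: {criterion}\n"
        f"Response: {response}\n"
        "Answer with exactly PASS or FAIL."
    )


def evaluate(response: str, criteria: list[str], judge: Judge) -> list[Verdict]:
    """Run the judge once per criterion and parse its reply into a boolean."""
    verdicts = []
    for criterion in criteria:
        reply = judge(build_prompt(criterion, response)).strip().upper()
        # Any reply that does not start with PASS is treated as a failure.
        verdicts.append(Verdict(criterion, reply.startswith("PASS")))
    return verdicts


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call: passes only if the response apologizes."""
    response_line = next(
        line for line in prompt.splitlines() if line.startswith("Response: ")
    )
    return "PASS" if "sorry" in response_line.lower() else "FAIL"
```

Calling `evaluate("Sorry about the delay.", ["Apologizes to the customer"], stub_judge)` returns one `Verdict` per criterion. Treating any reply other than PASS as a failure is a deliberately conservative choice, so an off-format judge reply never counts as a pass.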