Define qualitative evaluation criteria and let an LLM judge whether responses pass. Well suited to testing AI agents, comparing models, and evaluating subjective qualities.
Required Ruby Version
>= 3.3.0
Authors
Eric Stiens