DeepEval

Pytest-style LLM evaluation framework: 50+ metrics, CI/CD integration, red-teaming

"Pytest for LLMs" with 12.6K+ ⭐. A specialized unit-testing framework for LLM outputs, offering 50+ research-based metrics (G-Eval, hallucination detection, faithfulness, relevancy, RAGAS). Integrates with CI...
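The "Pytest for LLMs" idea works like this. Each LLM output becomes a test case, one or more metrics score it, and the test fails when any score falls below that metric's threshold. The sketch below shows this pattern in plain Python under stated assumptions. `EvalCase`, `ContextOverlapMetric`, and `assert_case` are hypothetical names and are not DeepEval's API (DeepEval's own equivalents include `LLMTestCase` and `assert_test`). The word-overlap score is also only a crude stand-in for LLM-as-judge metrics such as G-Eval.

```python
from dataclasses import dataclass, field


# Hypothetical test-case container (NOT a DeepEval class).
@dataclass
class EvalCase:
    input: str
    actual_output: str
    retrieval_context: list[str] = field(default_factory=list)


def _words(text: str) -> list[str]:
    # Lowercase and strip trailing/leading punctuation from each token.
    return [w.lower().strip(".,!?") for w in text.split()]


# Hypothetical toy metric: fraction of output words that also appear in the
# retrieval context. Real frameworks typically score with an LLM judge instead.
@dataclass
class ContextOverlapMetric:
    threshold: float = 0.5
    score: float = 0.0

    def measure(self, case: EvalCase) -> float:
        context = {w for chunk in case.retrieval_context for w in _words(chunk)}
        output = _words(case.actual_output)
        if not output:
            self.score = 0.0
        else:
            self.score = sum(w in context for w in output) / len(output)
        return self.score

    def is_successful(self) -> bool:
        return self.score >= self.threshold


# Hypothetical assertion helper: fail if any metric is below its threshold.
def assert_case(case: EvalCase, metrics: list) -> None:
    for metric in metrics:
        metric.measure(case)
        assert metric.is_successful(), (
            f"{type(metric).__name__} scored {metric.score:.2f} "
            f"(threshold {metric.threshold})"
        )


# A pytest-collectable test: every output word is grounded in the context.
def test_refund_answer() -> None:
    case = EvalCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund.",
        retrieval_context=["We offer all customers a 30-day full refund."],
    )
    assert_case(case, [ContextOverlapMetric(threshold=0.7)])
```

Running `pytest` on a file containing this code collects `test_refund_answer` like any other unit test, which is what makes this style easy to wire into CI. Swapping the toy metric for an LLM-judged one changes how each score is computed but not the assert-against-threshold structure.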

v7.0