DeepEval
Pytest-style LLM evaluation framework: 50+ metrics, CI/CD integration, red-teaming
"Pytest for LLMs" with 12.6K+ ⭐. A unit-testing framework specialized for LLM outputs, with 50+ research-backed metrics (G-Eval, hallucination detection, faithfulness, relevancy, RAGAS). Integrates with CI...
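The "Pytest for LLMs" idea is that each LLM interaction becomes a test case, each metric produces a score, and a test fails when a score falls below the metric's threshold. The sketch below shows that pattern in plain Python under stated assumptions: `LLMCase`, `KeywordCoverageMetric`, and `assert_passes` are hypothetical names for illustration and are not DeepEval's API. The keyword-coverage metric is a deliberately simple, deterministic stand-in; metrics such as G-Eval instead use an LLM as the judge.

```python
from dataclasses import dataclass, field


@dataclass
class LLMCase:
    """Hypothetical container for one LLM interaction under test."""
    input: str
    actual_output: str
    expected_keywords: list[str] = field(default_factory=list)


class KeywordCoverageMetric:
    """Hypothetical toy metric (not an LLM judge).

    Scores the fraction of expected keywords found in the output
    (case-insensitive) and passes if that score meets the threshold.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.score = 0.0

    def measure(self, case: LLMCase) -> float:
        if not case.expected_keywords:
            self.score = 1.0  # nothing required, so trivially covered
        else:
            output = case.actual_output.lower()
            hits = sum(1 for kw in case.expected_keywords if kw.lower() in output)
            self.score = hits / len(case.expected_keywords)
        return self.score

    def is_successful(self) -> bool:
        return self.score >= self.threshold


def assert_passes(case: LLMCase, metrics: list) -> None:
    """Hypothetical helper: fail like a unit test if any metric is below threshold."""
    failures = []
    for metric in metrics:
        score = metric.measure(case)
        if not metric.is_successful():
            failures.append(f"{type(metric).__name__}: {score:.2f} < {metric.threshold}")
    assert not failures, "; ".join(failures)


def test_refund_answer():
    # A plain test function, so pytest can discover and run it.
    case = LLMCase(
        input="What is your refund window?",
        actual_output="You can request a full refund within 30 days of purchase.",
        expected_keywords=["refund", "30 days"],
    )
    assert_passes(case, [KeywordCoverageMetric(threshold=0.75)])
```

Because the check is an ordinary `assert` inside a `test_*` function, a failing score surfaces as a failing test, which is what lets this style of evaluation gate a CI pipeline.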