Ensure high-quality LLM outputs with automatic evals. Use a representative sample of user inputs to reduce subjectivity when tuning prompts. Use built-in metrics, LLM-graded evals, or define your own custom metrics. Compare prompts and model outputs side-by-side, or integrate the library into your existing test/CI workflow. Use OpenAI, Anthropic, and open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API.
Features
- Create a list of test cases
- Set up evaluation metrics
- Select the best prompt & model
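The workflow above — test cases, metrics, and prompt/model comparison — is typically driven from a single configuration file. The following is a minimal sketch of what such a config might look like, assuming promptfoo's YAML format; the prompt text, variable names, and model identifiers here are illustrative placeholders, not part of the official documentation:

```yaml
# promptfooconfig.yaml — illustrative sketch, not an official example
prompts:
  - "Summarize the following text in one sentence: {{text}}"

# Compare multiple providers side-by-side
providers:
  - openai:gpt-4o-mini
  - anthropic:claude-3-haiku-20240307

# Each test case supplies variables and assertions (built-in or LLM-graded)
tests:
  - vars:
      text: "Promptfoo is a tool for evaluating LLM outputs."
    assert:
      - type: contains
        value: "evaluat"
      - type: llm-rubric
        value: "The summary is a single accurate sentence."
```

Running the evaluation (e.g. with `npx promptfoo@latest eval`) then produces a side-by-side comparison of outputs per prompt and provider, which can also be asserted on in a CI pipeline.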
Categories
Large Language Models (LLM)

License
MIT License