Chapter 1. Overview of evaluating AI systems
Use the following TrustyAI tools to evaluate your AI systems and analyze your model’s capabilities:
- EvalHub: Use EvalHub to automate, standardize, and scale LLM evaluation across multiple frameworks. You can evaluate AI artifacts such as prompts, models, AI agents, and datasets, and you can also assess AI risk.
- LM-Eval: Use TrustyAI to evaluate your LLM against a range of evaluation tasks and to verify the accuracy and quality of its output. LM-Eval assesses aspects such as summarization, language toxicity, and question-answering accuracy, and the results help you tune your model parameters.