Public benchmarks are designed to evaluate general LLM capabilities. Custom evals measure LLM performance on specific tasks.
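A custom eval can be very small. You need a set of task-specific cases, a way to score each model output, and an aggregate score. The sketch below shows that shape in plain Python. Several parts are assumptions for illustration and do not come from the text above: the names `EvalCase`, `exact_match`, `run_eval`, and `stub_model`, the sentiment-labeling task, and exact match as the scoring rule. `stub_model` is a hypothetical stand-in that returns fixed answers; in practice you would replace it with a call to your own model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One task-specific test case: an input prompt and the answer we expect."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """Score one output: exact match after trimming whitespace and ignoring case."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(
    model: Callable[[str], str],
    cases: list[EvalCase],
    scorer: Callable[[str, str], bool] = exact_match,
) -> float:
    """Run every case through the model and return the fraction scored correct."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(scorer(model(case.prompt), case.expected) for case in cases)
    return passed / len(cases)


# Hypothetical cases for one narrow task: labeling the sentiment of user feedback.
CASES = [
    EvalCase("Sentiment of: 'The new release fixed my crash.'", "positive"),
    EvalCase("Sentiment of: 'Export is broken again.'", "negative"),
    EvalCase("Sentiment of: 'Docs moved to a new URL.'", "neutral"),
]


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned answers."""
    canned = {
        CASES[0].prompt: "Positive",
        CASES[1].prompt: " negative ",
        CASES[2].prompt: "negative",  # deliberately wrong, so the score is not perfect
    }
    return canned.get(prompt, "")


if __name__ == "__main__":
    score = run_eval(stub_model, CASES)
    print(f"accuracy: {score:.2f}")  # prints "accuracy: 0.67" (2 of 3 cases pass)
```

Exact match is only one possible scoring rule. Open-ended tasks often need a rubric, a similarity threshold, or a second model acting as a judge. Because `run_eval` accepts any `scorer` with the same signature, you can swap in a different rule without changing the loop.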
Educators can use Microsoft's generative AI assistant, Copilot, to automate lesson planning and mapping quickly ...