Archives for MMAU
02 Aug
Apple Unveils MMAU: A New Benchmark for Evaluating Language Model Agents Across Diverse Domains



The MMAU benchmark comprises 20 tasks and over 3,000 prompts, designed to assess LLM capabilities in detail and to pinpoint which specific skills a model fails at.
The post Apple Unveils MMAU: A New Benchmark for Evaluating Language Model Agents Across Diverse Domains appeared first on AIM.