BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Drupal//recurring_events_ical//2.0//EN
BEGIN:VEVENT
UID:4f18dc43-eba3-40ba-b493-758d504c4c9d@support.access-ci.org
DTSTAMP:20251009T091218Z
DTSTART:20251107T150000Z
DTEND:20251107T160000Z
SUMMARY:Evaluating LLMs: Benchmarks & Metrics *CANCELED*
DESCRIPTION:Who Should Attend\nThis session is for practitioners,
 researchers, and students who are working with large language models
 and want to better understand how to measure their strengths and
 weaknesses. It’s relevant whether you’re fine-tuning models,
 deploying them in applications, or simply interested in how the field
 defines “good performance.”\n\nWhat You’ll Learn\nWe’ll explore
 evaluation across multiple dimensions, like reasoning, language
 understanding, safety, and efficiency. You’ll see how benchmarks like
 MMLU, HumanEval, and HELM are used, what quantitative metrics (e.g.,
 perplexity, latency, throughput) tell us, and why human evaluation
 still matters. The goal is to provide a structured overview of the
 evaluation landscape so you can think critically about LLM
 performance in your own context.\n\nLevel\nIntermediate. Assumes some
 familiarity with NLP or machine learning concepts, but no deep
 background in evaluation research is required.
URL:https://support.access-ci.org/events/8570
END:VEVENT
END:VCALENDAR