Language Model Evaluation

LLM Evaluation is Key to Accurate, Reliable, Effective GenAI

Enter large language model (LLM) evaluation. The purpose of LLM evaluation is to analyze and refine GenAI outputs to improve their accuracy and reliability while avoiding bias. The evaluation process ...

ascopubs.org

Simulation-Based Evaluation of a Large Language Model–Enabled Clinical Decision Support Platform in Oncology

In a remote, within-participant simulation, 26 oncologists from the United Kingdom, United States, Spain, and Singapore reviewed synthetic breast cancer cases and created comprehensive summaries for ...

European Medical Journal

Large Language Models in Glaucoma Need Guardrails

Scoping review finds large language models can support glaucoma education and decision support, but accuracy and multimodal limits persist.

ZDNet

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

Forbes

Augmenting The American Psychiatric Association App Evaluation Model To Include AI-Based Mental Health Apps

Forbes contributors publish independent expert analyses and insights. Dr. Lance B. Eliot is a world-renowned AI scientist and consultant. In today’s column, I examine an existing formalized evaluation ...

News Medical

Study finds health care evaluations of large language models lacking in real patient data and bias assessment

A new systematic review reveals that only 5% of health care evaluations for large language models use real patient data, with significant gaps in assessing bias, fairness, and a wide range of tasks, ...

How Large Scale Speech Models Will Impact Voice AI

A duplex speech-to-speech model changes the premise: The intelligence layer consumes audio and produces audio directly. The model can attend to what was said and how it was said—content and delivery ...

Slator

Academia and Hyperscalers Building the Core Infrastructure for African Language AI

New translation models, open speech datasets, and automatic speech recognition benchmarks aim to expand AI support for African languages.

ERR News

Estonia looking into AI grading for native language exams

The Education and Youth Board (Harno) is discussing the possibility of using artificial intelligence to grade mother tongue ...
