Can AI read papers like a scientist? A new benchmark shows where LLMs fail

To stay up to date and push their fields forward, scientists need both ready access to and a working command of thousands of published studies. Large language models (LLMs) show promise as tools for exploring the vast scientific literature, but can they be trusted to give complete and scientifically accurate answers to complex questions in specialized fields?