The Download: Rethinking AI benchmarks, and the ethics of AI agents

26 November 2024


Every time a new AI model is released, it’s typically touted as acing its performance against a series of benchmarks. OpenAI’s GPT-4o, for example, was launched in May with a compilation of results that showed its performance topping every other AI company’s latest model in several tests.

The problem is that these benchmarks are poorly designed, the results hard to replicate, and the metrics they use are frequently arbitrary, according to new research. That matters because AI models’ scores against these benchmarks determine the level of scrutiny they receive.

AI companies frequently cite benchmarks as testament to a new model’s success, and those benchmarks already form part of some governments’ plans for regulating AI. But right now, they might not be good enough to use that way, and researchers have some ideas for how they should be improved.

—Scott J Mulligan

We need to start wrestling with the ethics of AI agents

Generative AI models have become remarkably good at conversing with us, and creating images, videos, and music for us, but they’re not all that good at doing things for us.

AI agents promise to change that. Last week researchers published a new paper explaining how they trained simulation agents to replicate 1,000 people’s personalities with stunning accuracy.

AI models that mimic you could go out and act on your behalf in the near future. If such tools become cheap and easy to build, it will raise lots of new ethical concerns, but two in particular stand out. Read the full story.

—James O’Donnell
