Turing Award Goes to A.I. Pioneers Andrew Barto and Richard Sutton


In 1977, Andrew Barto, as a researcher at the University of Massachusetts, Amherst, began exploring a new theory that neurons behaved like hedonists. The basic idea was that the human brain was driven by billions of nerve cells that were each trying to maximize pleasure and minimize pain.

A year later, he was joined by another young researcher, Richard Sutton. Together, they worked to explain human intelligence using this simple concept and applied it to artificial intelligence. The result was “reinforcement learning,” a way for A.I. systems to learn from the digital equivalent of pleasure and pain.

On Wednesday, the Association for Computing Machinery, the world’s largest society of computing professionals, announced that Dr. Barto and Dr. Sutton had won this year’s Turing Award for their work on reinforcement learning. The Turing Award, which was introduced in 1966, is often called the Nobel Prize of computing. The two scientists will share the $1 million prize that comes with the award.

Over the past decade, reinforcement learning has played a vital role in the rise of artificial intelligence, including breakthrough technologies such as Google’s AlphaGo and OpenAI’s ChatGPT. The techniques that powered these systems were rooted in the work of Dr. Barto and Dr. Sutton.

“They are the undisputed pioneers of reinforcement learning,” said Oren Etzioni, a professor emeritus of computer science at the University of Washington and founding chief executive of the Allen Institute for Artificial Intelligence. “They generated the key ideas — and they wrote the book on the subject.”

Their book, “Reinforcement Learning: An Introduction,” which was published in 1998, remains the definitive exploration of an idea that many experts say is only beginning to realize its potential.

Psychologists have long studied the ways that humans and animals learn from their experiences. In the 1940s, the pioneering British computer scientist Alan Turing suggested that machines could learn in much the same way.

But it was Dr. Barto and Dr. Sutton who began exploring the mathematics of how this might work, building on a theory that A. Harry Klopf, a computer scientist working for the government, had proposed. Dr. Barto went on to build a lab at UMass Amherst dedicated to the idea, while Dr. Sutton founded a similar kind of lab at the University of Alberta in Canada.

“It is kind of an obvious idea when you’re talking about humans and animals,” said Dr. Sutton, who is also a research scientist at Keen Technologies, an A.I. start-up, and a fellow at the Alberta Machine Intelligence Institute, one of Canada’s three national A.I. labs. “As we revived it, it was about machines.”

This remained an academic pursuit until the arrival of AlphaGo in 2016. Most experts believed that another 10 years would pass before anyone built an A.I. system that could beat the world’s best players at the game of Go.

But during a match in Seoul, South Korea, AlphaGo beat Lee Sedol, the best Go player of the past decade. The trick was that the system had played millions of games against itself, learning by trial and error. It learned which moves brought success (pleasure) and which brought failure (pain).
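The trial-and-error loop described here can be sketched in a few lines of Python. This is a toy illustration of the idea, not AlphaGo's actual code: an agent repeatedly tries moves in a simple game, and its value estimates drift toward the moves that bring reward ("pleasure") and away from those that bring none ("pain"). The function name and win probabilities are invented for the example.

```python
import random

def learn_best_move(win_probs, episodes=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy reinforcement learning over a fixed set of moves.

    win_probs: the hidden probability that each move wins.
    Returns the agent's learned value estimate for each move.
    """
    rng = random.Random(seed)
    values = [0.0] * len(win_probs)  # estimated worth of each move
    counts = [0] * len(win_probs)
    for _ in range(episodes):
        # Mostly exploit the best-looking move, occasionally explore.
        if rng.random() < epsilon:
            move = rng.randrange(len(win_probs))
        else:
            move = max(range(len(win_probs)), key=lambda m: values[m])
        # Play the move and observe a win (1.0) or a loss (0.0).
        reward = 1.0 if rng.random() < win_probs[move] else 0.0
        counts[move] += 1
        # Nudge the estimate toward the observed reward (running average).
        values[move] += (reward - values[move]) / counts[move]
    return values

values = learn_best_move([0.2, 0.8, 0.5])  # move 1 secretly wins most often
best = max(range(3), key=lambda m: values[m])
```

After a few thousand self-played rounds, the agent's estimates single out the move that wins most often, even though it was never told the hidden probabilities.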

The Google team that built the system was led by David Silver, a researcher who had studied reinforcement learning under Dr. Sutton at the University of Alberta.

For years, many experts questioned whether reinforcement learning could work outside of games. Game winnings are determined by points, which makes it easy for machines to distinguish between success and failure.

But reinforcement learning has also played an essential role in online chatbots.

Leading up to the release of ChatGPT in the fall of 2022, OpenAI hired hundreds of people to use an early version and provide precise suggestions that could hone its skills. They showed the chatbot how to respond to particular questions, rated its responses and corrected its mistakes. By analyzing those suggestions, ChatGPT learned to be a better chatbot.

Researchers call this “reinforcement learning from human feedback,” or R.L.H.F. And it is one of the key reasons that today’s chatbots respond in surprisingly lifelike ways.

(The New York Times has sued OpenAI and its partner, Microsoft, for copyright infringement of news content related to A.I. systems. OpenAI and Microsoft have denied those claims.)

More recently, companies like OpenAI and the Chinese start-up DeepSeek have developed a form of reinforcement learning that allows chatbots to learn from themselves — much as AlphaGo did. By working through various math problems, for instance, a chatbot can learn which methods lead to the right answer and which do not.

If it repeats this process with an enormous number of problems, the bot can learn to mimic the way humans reason — at least in some ways. The result is so-called reasoning systems like OpenAI’s o1 or DeepSeek’s R1.
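That recipe — reward only the attempts whose final answer checks out — can be sketched with a toy simulation. This is an assumption about the general technique, not OpenAI's or DeepSeek's actual training code: a "model" chooses between two hypothetical solution strategies for arithmetic problems, and because only correct answers earn reward, its preference shifts toward the reliable strategy.

```python
import math
import random

def solve(strategy, a, b):
    # Two invented strategies: one is always right, one errs half the time.
    return a + b if strategy == "careful" else a + b + random.choice([0, 1])

def train(steps=2000, lr=0.1, seed=0):
    random.seed(seed)
    prefs = {"careful": 0.0, "sloppy": 0.0}  # learnable preferences
    for _ in range(steps):
        a, b = random.randrange(100), random.randrange(100)
        # Sample a strategy from a softmax over the current preferences.
        z = {s: math.exp(p) for s, p in prefs.items()}
        strategy = "careful" if random.random() * sum(z.values()) < z["careful"] else "sloppy"
        # The reward is checkable: 1.0 only if the answer is actually correct.
        reward = 1.0 if solve(strategy, a, b) == a + b else 0.0
        # Reinforce the chosen strategy relative to a fixed 0.5 baseline.
        prefs[strategy] += lr * (reward - 0.5)
    return prefs

prefs = train()
```

The "careful" strategy always earns reward, so its preference grows steadily; the "sloppy" one is right only half the time, so its updates cancel out on average. No human ever labels the attempts — the arithmetic itself is the judge.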

Dr. Barto and Dr. Sutton say these systems hint at the ways machines will learn in the future. Eventually, they say, robots imbued with A.I. will learn from trial and error in the real world, as humans and animals do.

“Learning to control a body through reinforcement learning — that is a very natural thing,” Dr. Barto said.


