At a recent meeting held in the rooms of the Royal Society in London, several dozen graduate students were introduced to a large language model (LLM), a kind of AI designed to hold useful conversations. LLMs are often programmed with guardrails to stop them from giving answers they deem harmful: instructions for making Semtex in a bathtub, say, or confident assertions of "facts" that are not actually true.
The session, organized by the Royal Society in partnership with Humane Intelligence, an American non-profit, aimed to break those guardrails. Some of the results were merely daft: one participant got the chatbot to claim that ducks could be used as indicators of air quality (apparently, they readily absorb lead). Another prompted it to claim that health authorities back lavender oil as a treatment for long Covid. (They do not.) But the most successful attempts were those that coaxed the machine into producing the titles, publication dates and host journals of non-existent academic articles. "This is one of the easiest challenges we've set," said Jutta Williams of Humane Intelligence.
AI has the potential to be a great boon to science. Optimists talk of machines producing readable summaries of complicated areas of research; tirelessly analyzing oceans of data to suggest new drugs or exotic materials; and even, one day, coming up with hypotheses of their own. But AI comes with downsides, too. It can make it easier for scientists to game the system, or even to commit outright fraud. And the models themselves are subject to subtle biases.
Start with the simplest problem: academic misconduct. Some journals allow researchers to use LLMs to help write papers, provided they say so. Not everybody is willing to admit it, though. Sometimes the use of an LLM is obvious. Guillaume Cabanac, a computer scientist at the University of Toulouse, has uncovered dozens of papers that contain phrases such as "regenerate response": the text of a button in some versions of ChatGPT that instructs the program to rewrite its most recent answer, presumably copied into the manuscript by mistake.
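Screening of this sort can begin with something as simple as searching manuscripts for chatbot boilerplate. The sketch below is a hypothetical illustration, not Dr Cabanac's actual tooling, and its phrase list is an assumption for demonstration purposes:

```python
# Toy screen for chatbot boilerplate accidentally pasted into manuscripts.
# The phrase list is illustrative, not an established database of telltales.
TELLTALES = [
    "regenerate response",
    "as an ai language model",
]

def flag_telltales(text):
    """Return any telltale phrases found in text, matched case-insensitively."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALES if phrase in lowered]
```

Real screening tools also have to cope with hyphenation, line breaks and translated text, all of which a literal string match like this would miss.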
It is impossible to know the scale of the problem. But indirect measures shed some light. Between 2021 and 2022, when LLMs were available only to those in the know, the number of research-integrity cases investigated by Taylor & Francis, a large publisher of scientific papers, rose from about 800 to about 2,900. Early figures for 2023 suggest that number is on track to double. One possible telltale is quirky synonyms: "fog figuring" as another way of saying "cloud computing", for example, or "fake consciousness" instead of "AI".
Even honest researchers can find themselves working with data that have been corrupted by AI. Last year Robert West and his students at the Swiss Federal Institute of Technology recruited remote workers through Mechanical Turk, a website on which users can list odd jobs, to shorten long stretches of text. In a paper published in June, though not yet peer-reviewed, the team revealed that more than a third of the responses they received had been generated with the help of chatbots.
Dr West's team was able to compare the responses it received with another dataset that was entirely human-generated, putting it in a good position to detect the fraud. Not all scientists using Mechanical Turk will be so lucky. Many disciplines, especially in the social sciences, rely on similar platforms to find respondents willing to answer questionnaires. The quality of their research seems unlikely to improve if many of those responses come from machines rather than real people. Dr West plans to run similar tests on other crowdsourcing platforms, which he prefers not to name.
Nor is it just text that can be doctored. Between 2016 and 2020 Elisabeth Bik, a microbiologist at Stanford University and an authority on dodgy images in scientific papers, identified dozens of papers containing images that appeared to have identical features despite coming from different labs. More than a thousand other such papers have since been identified, by Dr Bik and others. Dr Bik's best guess is that the images were produced by AI, and made deliberately to support a paper's conclusions.
For now there is no reliable way to identify machine-generated content, be it text or images. In a paper published last year, Rahul Kumar, a researcher at Brock University in Canada, found that academics could correctly identify only about a quarter of computer-generated text. AI companies have tried embedding "watermarks" in their models' output, but these have proved easy to dodge. "We might now be at the point where we can no longer distinguish genuine from fake photos," said Dr Bik.
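To make the watermarking idea concrete, here is a minimal sketch in the spirit of published academic "green list" schemes; the vocabulary, bias level and statistics here are assumptions for illustration, not any company's actual implementation:

```python
import hashlib
import random

# Toy "green list" watermark: a hash of each (previous, current) token pair
# marks half of all pairs "green". A watermarking generator prefers green
# tokens; a detector, knowing only the hash, counts how many pairs are green.
# Vocabulary and bias level are illustrative assumptions.
VOCAB = [f"tok{i}" for i in range(1000)]

def is_green(prev_token, token):
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def generate(n, bias=0.9, seed=42):
    # Watermarked generation: resample non-green candidates most of the time.
    rng = random.Random(seed)
    out = ["tok0"]
    while len(out) <= n:
        cand = rng.choice(VOCAB)
        if is_green(out[-1], cand) or rng.random() > bias:
            out.append(cand)
    return out

def green_fraction(tokens):
    # Detection: watermarked text scores near 0.9, ordinary text near 0.5.
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)
```

The weakness the article alludes to is visible in the design: paraphrasing the text re-rolls each token pair, dragging the green fraction back towards chance and erasing the signal.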
Producing dodgy papers is not the only problem. There may be subtler issues with AI models, especially if they are used in the process of scientific discovery itself. The data used to train them, for example, will necessarily be somewhat out of date. That risks models getting stuck behind the cutting edge in fast-moving fields.
Another problem arises when AI models are trained on AI-generated data. Training a machine on synthetic MRI scans, for example, can get around patient-privacy concerns. But sometimes such data are ingested unintentionally. LLMs are trained on text scraped from the internet. As they churn out more of it, the risk of LLMs inhaling their own output grows.
That can cause "model collapse". In 2023 Ilia Shumailov, a computer scientist at the University of Oxford, co-authored a paper (yet to be peer-reviewed) in which a model was fed handwritten digits and asked to generate digits of its own, which were in turn fed back to it, over many cycles. The computer's numbers gradually became more or less illegible. After 20 iterations it could produce only rough circles or blurry lines. Models trained on their own outputs, says Dr Shumailov, produce results that are significantly less rich and varied than their original training data.
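The effect can be reproduced in miniature with a far simpler model than one for handwritten digits. In the sketch below (a toy setup assumed for illustration, not the paper's actual experiment), a one-dimensional Gaussian is repeatedly refitted to samples drawn from its own previous fit; the small estimation error in each refit compounds generation after generation:

```python
import random
import statistics

random.seed(0)

def next_generation(data, n=100):
    # Fit a Gaussian to the current data, then replace the dataset
    # with fresh samples drawn from that fitted model.
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return [random.gauss(mu, sigma) for _ in range(n)]

data = [random.gauss(0.0, 1.0) for _ in range(100)]  # the "real" data
initial_spread = statistics.stdev(data)
for _ in range(5000):  # generations trained on their own output
    data = next_generation(data)
final_spread = statistics.stdev(data)
```

Because the fitting errors compound multiplicatively, the estimated spread drifts towards zero: after enough generations the "model" emits only near-identical values, the analogue of Dr Shumailov's blurry circles.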
Some worry about computer-generated insights coming from models whose inner workings cannot be understood. Machine-learning systems are "black boxes" whose reasoning is hard for humans to decipher. Unexplainable models are not useless, says David Leslie of the Alan Turing Institute, an AI-research outfit in London, but their outputs will need rigorous testing in the real world. That is perhaps less unnerving than it sounds. Checking models against reality is what science is supposed to be about, after all. Because no one fully understands how the human body works, for example, new drugs must be tested in clinical trials to find out whether they work.
For now, at least, the questions outnumber the answers. What is certain is that many of the perverse incentives currently present in science are ripe for exploitation. An emphasis on judging academic performance by how many papers a researcher can publish, for example, is a strong incentive to game the system at best, and to cheat at worst. The threats that machines pose to the scientific method are, at the end of the day, the same threats posed by humans. AI can accelerate the production of fraud and nonsense just as much as it can accelerate good science. As the Royal Society has it, nullius in verba: take nobody's word for it. Nothing's, either. ■