Australian researchers launch new tool to outsmart rapidly evolving audio deepfakes

Australian researchers have developed a powerful new method to detect audio deepfakes, offering a major safeguard as synthetic voice scams grow more sophisticated.

Scientists from CSIRO, Federation University Australia and RMIT University have created a technique called Rehearsal with Auxiliary-Informed Sampling (RAIS), designed to determine whether an audio clip is genuine or artificially generated — and crucially, to keep pace as deepfake attack styles change.

Audio deepfakes have become a serious cybercrime threat, with criminals using cloned voices to bypass voice biometrics, impersonate public figures and spread disinformation. In Italy earlier this year, an AI-generated clone of the country’s Defence Minister demanded a €1 million “ransom” from prominent business leaders, convincing some to pay.

Dr Kristen Moore from CSIRO’s Data61, a co-author of the study, said current detection systems struggle because new deepfake techniques differ significantly from older ones.

“We want these detection systems to learn the new deepfakes without having to train the model again from scratch. If you just fine-tune on the new samples, it will cause the model to forget the older deepfakes it knew before,” Dr Moore said.

“RAIS solves this by automatically selecting and storing a small but diverse set of past examples, including hidden audio traits that humans may not even notice, to help the AI learn the new deepfake styles without forgetting the old ones.”

RAIS uses a selection process powered by a network that generates auxiliary labels — additional markers that go beyond simple “fake” or “real” tags. These labels help ensure the system retains a rich, representative mix of audio samples, boosting its ability to remember, adapt and stay accurate over time.
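In continual-learning terms, that selection step amounts to building a small rehearsal buffer guided by extra labels rather than plain real/fake tags. The Python sketch below illustrates the general idea under simplifying assumptions: the function and variable names are hypothetical, the auxiliary labels are fixed stand-ins rather than outputs of a learned network, and the authors’ actual implementation is the one published on GitHub.

```python
# Illustrative sketch only: a generic "rehearsal with label-aware sampling"
# loop in the spirit of RAIS. Names are hypothetical, not from the paper's code.

import random
from collections import defaultdict

def select_rehearsal_samples(samples, aux_labels, buffer_size):
    """Keep a small but diverse memory buffer of past examples.

    samples     -- past training examples (here, just clip identifiers)
    aux_labels  -- one auxiliary label per sample, beyond plain real/fake,
                   standing in for the hidden audio traits RAIS exploits
    buffer_size -- maximum number of examples to store
    """
    # Group past examples by auxiliary label so every "style" is represented.
    groups = defaultdict(list)
    for sample, label in zip(samples, aux_labels):
        groups[label].append(sample)

    # Shuffle each group, then draw round-robin until the buffer is full.
    # This keeps the stored mix diverse instead of dominated by one style.
    pools = [random.sample(group, len(group)) for group in groups.values()]
    buffer = []
    while len(buffer) < buffer_size and any(pools):
        for pool in pools:
            if pool and len(buffer) < buffer_size:
                buffer.append(pool.pop())
    return buffer

# Toy usage: 100 clips spread across 4 auxiliary "styles", buffer of 12.
clips = [f"clip_{i}" for i in range(100)]
styles = [i % 4 for i in range(100)]
memory = select_rehearsal_samples(clips, styles, buffer_size=12)
print(memory)  # roughly three clips per style
```

In the real system, the auxiliary labels come from a network that infers them automatically, which is what lets the buffer capture audio traits humans may not notice rather than a hand-picked grouping like the one above.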

In testing, RAIS outperformed competing methods, recording the lowest average error rate — just 1.95 per cent — across five consecutive learning experiences. The model remains effective even with a small memory buffer, making it well-suited to real-world deployment as attackers develop more advanced techniques.

“Audio deepfakes are evolving rapidly, and traditional detection methods can’t keep up,” said Falih Gozi Febrinanto, a recent PhD graduate of Federation University Australia.

“RAIS helps the model retain what it has learned and adapt to new attacks. Overall, it reduces the risk of forgetting and enhances its ability to detect deepfakes.”

Dr Moore added that the approach “not only boosts detection performance, but also makes continual learning practical for real-world applications. By capturing the full diversity of audio signals, RAIS sets a new standard for efficiency and reliability.”

The code has been made publicly available on GitHub.
