Over 160,000 new virus species discovered by AI - The University of Sydney
This is the second paper today to show the apparent obsession creationism's putative designer has with creating viruses, if you believe that superstition.
The first paper dealt with the discovery of some 600 different viruses on used toothbrushes and shower heads in US bathrooms; this one reports a discovery that makes that finding pale into insignificance: the identification, using AI-based machine learning, of 161,979 new viruses!
And this is just the tip of the iceberg. The authors say their method merely scratches the surface of viral biodiversity, with millions more viruses still to be discovered.
Researchers at the University of Sydney have used AI to discover over 160,000 new viruses.

How is this done?

Researchers from the University of Sydney, along with international collaborators, used artificial intelligence to identify over 160,000 previously unknown RNA viruses. They did this by developing a deep learning algorithm called LucaProt, designed to search the massive amounts of genomic data stored in public databases for sequences of RNA-dependent RNA polymerase (RdRP), the highly conserved enzyme essential for viral replication, including RdRP sequences so divergent that earlier tools missed them.
Traditionally, bioinformatics tools were limited in detecting viruses because of the diversity of RNA viruses and their complex genomic structures. Many viral sequences, referred to as "sequence dark matter," were present in public databases but remained unidentified because of their extreme divergence. LucaProt overcame this by recognizing not only the sequences themselves but also the predicted structural features of the RdRP protein these viruses use for replication.
The study uncovered viruses from a wide range of environments, including extreme locations like hydrothermal vents and hot springs. This discovery highlights the vast and largely unexplored diversity of the virosphere. The research opens up new opportunities for further viral discoveries, as the AI tool can potentially be applied to other organisms like bacteria and parasites.
So new species of virus are identified by unique sequences in RNA-dependent RNA polymerase?
Yes, new species of RNA viruses are often identified by analyzing unique sequences in the RNA-dependent RNA polymerase (RdRP) enzyme. This enzyme is essential for the replication of RNA viruses and is highly conserved across different virus species. However, there are enough variations in the RdRP sequences to allow researchers to distinguish between different viral species.
In the study you mentioned, the AI tool LucaProt was specifically designed to identify divergent RdRP sequences from metagenomic data, enabling the discovery of more than 160,000 new RNA virus species. These sequences were previously classified as "dark matter" because they were too divergent to be recognized using traditional methods.[1.1,2.1,3.1]
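As a rough illustration of how RdRP sequences can be grouped into putative species once they have been found, here is a minimal Python sketch of greedy clustering by percent identity. It is not LucaProt and not the study's pipeline: it assumes the sequences are already aligned to the same length, and the 90% cut-off and the toy sequence fragments are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap ('-') are skipped; a gap opposite
    a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> List[List[str]]:
    """Put each sequence in the first cluster whose representative it matches
    at >= threshold identity; otherwise it founds a new cluster.

    The 0.9 default is an assumed, illustrative species-level cut-off.
    """
    clusters: List[List[str]] = []
    reps: List[str] = []  # representative (first) sequence of each cluster
    for name, seq in seqs.items():
        for i, rep in enumerate(reps):
            if percent_identity(seq, rep) >= threshold:
                clusters[i].append(name)
                break
        else:  # no representative was close enough
            reps.append(seq)
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    # Toy, made-up protein fragments standing in for aligned RdRP regions.
    toy = {
        "v1": "GDDNLKAMTS",
        "v2": "GDDNLKAMTA",  # 9/10 identical to v1
        "v3": "SDDQWRVLTE",  # 3/10 identical to v1
    }
    print(greedy_cluster(toy))
```

Greedy, representative-based clustering like this is cheap but order-dependent; real analyses use dedicated alignment and clustering tools rather than a hand-rolled loop.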
They have just published their findings, open access, in the Cell Press journal Cell. Their work is explained in a University of Sydney news release:
Over 160,000 new virus species discovered by AI
Largest discovery of new virus species sheds light on hidden virosphere
Artificial intelligence (AI) has been used to reveal details of a diverse and fundamental branch of life living right under our feet and in every corner of the globe.
161,979 new species of RNA virus have been discovered using a machine learning tool that researchers believe will vastly improve the mapping of life on Earth and could aid in the identification of many millions more viruses yet to be characterised. Published in Cell and conducted by an international team of researchers, the study is the largest virus species discovery paper ever published.
We have been offered a window into an otherwise hidden part of life on earth, revealing remarkable biodiversity. This is the largest number of new virus species discovered in a single study, massively expanding our knowledge of the viruses that live among us. To find this many new viruses in one fell swoop is mind-blowing, and it just scratches the surface, opening up a world of discovery. There are millions more to be discovered, and we can apply this same approach to identifying bacteria and parasites.
Professor Edward C. Holmes, senior author
School of Medical Sciences
Faculty of Medicine and Health
University of Sydney, Sydney, NSW, Australia
Although RNA viruses are commonly associated with human disease, they are also found in extreme environments around the world and may even play key roles in global ecosystems. In this study they were found living in the atmosphere, hot springs and hydrothermal vents.
That extreme environments carry so many types of viruses is just another example of their phenomenal diversity and tenacity to live in the harshest settings, potentially giving us clues on how viruses and other elemental life-forms came to be.
Professor Edward C. Holmes
How the AI tool worked
The researchers built a deep learning algorithm, LucaProt, to compute vast troves of genetic sequence data, including lengthy virus genomes of up to 47,250 nucleotides and genomically complex information, to discover more than 160,000 viruses.

The vast majority of these viruses had been sequenced already and were on public databases, but they were so divergent that no one knew what they were. They comprised what is often referred to as sequence ‘dark matter’. Our AI method was able to organise and categorise all this disparate information, shedding light on the meaning of this dark matter for the first time.
Professor Edward C. Holmes
The AI tool was trained to compute the dark matter and identify viruses based on sequences and the secondary structures of the protein that all RNA viruses use for replication.
It was able to significantly fast-track virus discovery, which would be time-intensive using traditional methods.
We used to rely on tedious bioinformatics pipelines for virus discovery, which limited the diversity we could explore. Now, we have a much more effective AI-based model that offers exceptional sensitivity and specificity, and at the same time allows us to delve much deeper into viral diversity. We plan to apply this model across various applications.
Professor Mang Shi, co-author.
National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases
State Key Laboratory for Biocontrol
School of Medicine
Shenzhen Campus, Sun Yat-sen University, Shenzhen, China.
LucaProt represents a significant integration of cutting-edge AI technology and virology, demonstrating that AI can effectively accomplish tasks in biological exploration. This integration provides valuable insights and encouragement for further decoding of biological sequences and the deconstruction of biological systems from a new perspective. We will also continue our research in the field of AI for virology.
Dr Zhao-Rong Li, co-author
Apsara Lab
Alibaba Cloud Intelligence
Alibaba Group, Hangzhou, China.

The obvious next step is to train our method to find even more of this amazing diversity, and who knows what extra surprises are in store.
Professor Edward C. Holmes
I'll make creationists the same offer I made on my previous post regarding all these newly-discovered viruses.

Highlights
- AI-based metagenomic mining greatly expands the diversity of the global RNA virosphere
- Developed a deep learning model that integrates sequence and structural information
- 161,979 putative RNA virus species and 180 RNA virus supergroups were identified
- RNA viruses are ubiquitous and are even found in the most extreme global environments
Summary
Current metagenomic tools can fail to identify highly divergent RNA viruses. We developed a deep learning algorithm, termed LucaProt, to discover highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes generated from diverse global ecosystems. LucaProt integrates both sequence and predicted structural information, enabling the accurate detection of RdRP sequences. Using this approach, we identified 161,979 potential RNA virus species and 180 RNA virus supergroups, including many previously poorly studied groups, as well as RNA virus genomes of exceptional length (up to 47,250 nucleotides) and genomic complexity. A subset of these novel RNA viruses was confirmed by RT-PCR and RNA/DNA sequencing. Newly discovered RNA viruses were present in diverse environments, including air, hot springs, and hydrothermal vents, with virus diversity and abundance varying substantially among ecosystems. This study advances virus discovery, highlights the scale of the virosphere, and provides computational tools to better document the global RNA virome.
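The summary says LucaProt "integrates both sequence and predicted structural information". The sketch below shows one generic way two feature sets can be combined for a yes/no decision: concatenate the vectors and pass them through a logistic (linear plus sigmoid) layer. This is a hypothetical illustration, not LucaProt's architecture; the function names, vector sizes, weights and the 0.5 decision threshold are all assumptions made for the example.

```python
import math
from typing import List


def stable_sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_and_score(seq_features: List[float], struct_features: List[float],
                   weights: List[float], bias: float) -> float:
    """Hypothetical RdRP score: concatenate sequence and structure features,
    apply one linear layer, then a sigmoid to get a value in [0, 1]."""
    fused = seq_features + struct_features  # fusion by simple concatenation
    if len(fused) != len(weights):
        raise ValueError("need one weight per fused feature")
    logit = sum(f * w for f, w in zip(fused, weights)) + bias
    return stable_sigmoid(logit)


def is_rdrp(score: float, threshold: float = 0.5) -> bool:
    """Assumed decision rule: call the protein an RdRP if the score reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    seq = [0.8, -0.1, 0.3]  # toy stand-in for a learned sequence embedding
    struct = [0.6, 0.2]     # toy stand-in for predicted structural features
    w = [1.5, -0.4, 0.7, 1.2, 0.3]  # made-up weights; a real model learns these
    s = fuse_and_score(seq, struct, w, bias=-1.0)
    print(round(s, 3), is_rdrp(s))
```

Concatenation keeps the two information sources separate until the final layer, so a protein whose sequence is too divergent to score well on its own can still be pushed over the threshold by its structural features.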
Introduction
RNA viruses infect a diverse array of host species. Despite their omnipresence, the pivotal role of RNA viruses as major constituents of global ecosystems has only recently garnered recognition due to large-scale virus discovery initiatives in animals,1,2 plants,3 fungi,4 aquatic environments,5 marine environments,6 soil environments,7 and planetary metatranscriptomes.8 A common characteristic of these studies is their reliance on the analysis of RNA-dependent RNA polymerase (RdRP) sequences, a canonical component of RNA virus genomes. Collectively, these studies have led to the identification of tens of thousands of novel virus species, resulting in at least a 10-fold expansion of the virosphere and the proposal of new phylum-level virus groups such as the “Taraviricota” (i.e., “quenyaviruses”).6,9 Similarly, the data mining of metatranscriptomes from diverse ecosystems has revealed several divergent clades of RNA bacteriophage,10,11 while recent metatranscriptomic studies have led to a remarkable 5-fold expansion in the diversity of viroid-like circular RNAs.12,13,14 Despite such progress in uncovering RNA virus diversity through ecological sampling and sequencing, it is probable that more divergent groups of RNA viruses remain to be discovered.9,15 This is in part because the current tools for metagenomic identification of RNA viruses can miss some highly divergent RdRPs.16 It is therefore imperative to develop innovative strategies for the efficient identification of the full spectrum of RNA virus diversity.
Over the past decade, artificial intelligence (AI)-related approaches, especially deep learning algorithms, have had a major impact on various research fields in the life sciences, such as molecular docking, compound screening and interaction, protein structure prediction and functional annotation, and infectious disease modeling.17,18,19,20,21,22 This progress can be attributed to the advantages of deep learning algorithms over classic bioinformatic approaches, including enhanced accuracy, superior performance, reduced reliance on feature engineering, flexible model architectures, and self-learning capabilities.23,24 Recently, deep learning approaches, such as CHEER, VirHunter, Virtifier, and RNN-VirSeeker, have been developed and applied to the identification of viruses from genomic and metagenomic data.25,26,27,28
These tools employ convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs are specifically designed for processing spatial data such as images and leverage convolutions to exploit local correlations,29 whereas RNNs are adept at handling sequential data by capturing temporal dependencies and serial order memory.30 Despite their versatility, both face limitations in processing biological sequences: CNNs may encounter challenges with inputs of varying lengths and capturing global correlations, while RNNs struggle with longer sequences due to vanishing or exploding gradients and difficulties in capturing long-term dependencies. It is imperative to consider these shortcomings when assessing their appropriateness for specific tasks. In addition, many of these methodologies exclusively focus on nucleotide sequences, disregarding protein sequences or structural information, thereby constraining their capacity to identify highly divergent RNA viruses. Recently, the transformer architecture has emerged as a powerful alternative for protein function predictions based on sequence data, effectively accommodating sequences of varying lengths and efficiently capturing both local and long-range relationships across sequence positions, surpassing the capabilities of CNNs and RNNs.31,32,33,34 Consequently, the transformer architecture can be leveraged to design better tools for identifying highly divergent RNA viruses.
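The paragraph above contrasts transformers with CNNs and RNNs. The core operation that lets a transformer relate distant sequence positions directly is scaled dot-product self-attention, sketched below in plain Python for a single head. The toy embeddings and identity projection matrices are assumptions chosen for readability; a real protein language model learns these weights and stacks many such layers.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.

    x: L x d inputs (one embedding per residue); wq, wk, wv: d x d_k projections.
    Returns (outputs L x d_k, attention weights L x L).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    scores = matmul(q, transpose(k))  # L x L: every position against every other
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # 4 toy "residues"
    identity = [[1.0, 0.0], [0.0, 1.0]]
    out, w = self_attention(x, identity, identity, identity)
    for row in w:
        print([round(p, 3) for p in row])
```

Every position is scored against every other position in a single step, so the first residue can draw on the last one without the signal passing through many recurrent steps, as it would in an RNN. The cost is that the score matrix grows as L × L with sequence length.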
Herein, we present a transformer-based tool for RNA virus discovery that utilizes protein sequences and the structural characteristics of viral RdRP sequences. This tool was applied to a dataset comprising 10,487 metatranscriptomes from diverse ecological systems. To validate and perform comparative analysis, the same dataset was processed using other available bioinformatics tools, and 50 samples were analyzed using both DNA and RNA sequencing. By employing this tool in conjunction with extensive sequence data, we demonstrate how AI can accurately and efficiently detect RNA viruses exhibiting genetic divergence beyond the capabilities of traditional similarity-based methods, revealing previously unrecognized viral diversity.
Please explain why a creator would go in for such massive overkill in the number of viruses it creates. You can exclude the traditional excuse of 'Sin' causing 'genetic entropy', which leads to the biological absurdity of 'devolution' from an initial created perfection, because we are not dealing with mutations and virulence here. We are dealing with the number of species of virus, and even if the biologically nonsensical notion of devolution from an initial created perfection had any merit, those species would need to have been created in the first place.
So, the question remains: why so many, if this is not the result of a creator with an obsessive compulsive disorder?
The Malevolent Designer: Why Nature's God is Not Good
Illustrated by Catherine Webber-Hounslow.
The Unintelligent Designer: Refuting The Intelligent Design Hoax