Plaspline – a comprehensive tool for analysing metagenomic data

26 Oct 2022

For this article, SIMBA Press and Communications Officer Sarah Sarsfield spoke to project partner and PhD candidate Wanli He from the University of Copenhagen about his work on Plaspline – a comprehensive tool for analysing plasmids.

What is Plaspline?

Plaspline is a tool for analysing genetic information about plasmids in bacterial samples. It is a Snakemake-based workflow that aims to provide a comprehensive analysis of bacterial plasmids from shotgun sequencing data, at both the gene level and the plasmid-community level. Plasmids are DNA molecules that replicate independently of the chromosome and can give microorganisms (mainly bacteria and other single-celled organisms) genetic advantages such as antibiotic resistance. They are therefore a good place to start when studying microbial community composition and function.

With Plaspline, users feed in their sequencing data and get plasmidome information back: for example, how many plasmids are in the data, what they are, which genes they carry, and whether they could be mobilised (transferred between cells). Combining bacterial and plasmidome analyses gives researchers a clear and comprehensive picture of the microbiome in a sample.

How was Plaspline developed?

The first step in creating Plaspline was to select existing, state-of-the-art plasmid detection tools that could be integrated into the workflow. Numerous tools are available online, developed around distinct strategies for detecting plasmids in metagenomes: for example, checking whether contigs (contiguous DNA sequences assembled from overlapping reads) have circular ends, predicting plasmid contigs with computational models and machine learning, and searching for plasmid backbone genes. We therefore needed to benchmark them and select the best ones by testing them on simulated 'gold standard' metagenomic datasets. We spent a long time on these simulations to make the datasets as realistic as possible.

Once we had selected the best tools, we decided on the workflow and how to apply them, such as how to combine them and how to filter out false positives. Most importantly, we also had to benchmark these candidate workflows and strategies and, finally, pick the one with the best performance. Interestingly, the tools combined in Plaspline performed much better together than any of them did individually. This is also why we are so confident that researchers will love it. Researchers may also appreciate that it is user-friendly: a single command produces all the results needed, which is very convenient.

Are there other tools like this available or is this the first of its kind?

As mentioned, numerous tools are available, but they rely on different strategies for detecting plasmid-specific genes. Often they focus on only one thing at a time, such as whether a gene is present or not. They don't tell you how many genes are present or what they look like. What is needed, therefore, is a comprehensive, step-by-step workflow covering the multiple pre-processing and analytical steps involved in plasmid analysis: not just identifying plasmids, but also more detailed downstream analysis. To our knowledge, this is the first time such a pipeline has been built. Importantly, the tools combined in Plaspline perform better together than each selected tool does individually.

How does Plaspline fit into the work being done by SIMBA?

Plaspline was developed as part of Work Package 1. The main objective of this part of SIMBA is to build on pre-existing knowledge accrued from the isolation and characterisation of microbiota associated with marine and land-based food production systems, to improve the quantity, quality and safety of the food we produce and consume in Europe. As part of this, SIMBA aims to provide validated, user-friendly analysis and decision-making tools for retrieving useful and predictive information from collected microbiome data – this is where Plaspline fits in.

Can anyone use it?

Yes, Plaspline is free to use. It is available on GitHub, where users will find instructions on how to use it. The current version is V1.4.

Before we go, have you any tips for people using Plaspline?

I would suggest following the tutorials on GitHub first. If you have any problems, feel free to contact us!


[Images via Pexels]