
2023-1-BG01-KA220-HED-000155777 – DigiOmica

Module 7 – Microbial gene transcripts in environmental samples

1. INTRODUCTION

Transcriptomics deals with RNA transcripts, that is, those parts of the genome that are transcribed in a particular cell, tissue, or organism under given conditions. It reveals gene expression patterns and their regulation and thus gives insight into the molecular mechanisms of cellular processes.

Environmental transcriptomics retrieves the transcriptomes (mRNA) of microbial assemblages directly from the environment in order to better understand the interactions among their members and between these communities and their surroundings. It links the genetic potential of microbes with their biogeochemical activity. It also goes further and opens research into traits or functions of interest that are provided by microbial consortia rather than by individual species.

Environmental transcriptomics does this without prior information on the kinds of genes expressed at the community level. The approach makes it possible to follow microbial community dynamics without culturing the community members; it requires only RNA extraction, sequencing, transcriptome assembly, and functional annotation. However, its use for molecular-level applications in environmental science is hampered by the technical difficulties of working with mRNA. Prokaryotic transcripts generally lack the stable poly(A) tails of eukaryotic mRNA, so mRNA isolation is not as straightforward as in eukaryotes. In addition, mRNAs have a very short half-life (about 30 sec), and their relative abundance within the total RNA pool of the microbial cell is very low, which results in poor detection signals.
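
To illustrate why a short half-life matters for sample handling, the following minimal sketch computes the fraction of an mRNA pool that survives a delay between sampling and RNA stabilization. It assumes simple first-order decay with no new transcription, which is a simplification and not a claim from the cited literature; the function name and the delays are illustrative only.

```python
def fraction_remaining(elapsed_s: float, half_life_s: float = 30.0) -> float:
    """Fraction of an mRNA pool left after elapsed_s seconds.

    Assumes first-order decay and no new synthesis (illustrative assumption).
    The default half-life is the ~30 s figure quoted in the text.
    """
    return 0.5 ** (elapsed_s / half_life_s)


# Hypothetical delays between sampling and RNA stabilization.
for delay_s in (30, 60, 120, 300):
    share = fraction_remaining(delay_s)
    print(f"{delay_s:>4} s delay: {share:.4f} of the original transcripts remain")
```

Under these assumptions a delay of ten half-lives leaves roughly one thousandth of the original pool, which is one reason why rapid preservation of the sample is usually recommended.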

Protocols have been developed to overcome these difficulties and to facilitate the analysis of partial environmental transcriptomes. They include direct extraction of RNA from an environmental sample (meta-transcriptomics), extraction that preserves the spatial relationships among the RNA reads (spatial transcriptomics), and extraction that preserves sub-population relationships (sorted transcriptomics). There are also promising studies on the retrieval of community-specific sequences of functional genes, which are essential for quantitative ecological surveys, on the generation of new hypotheses about known microbial processes, and on the assessment (with the aid of environmental genomics) of the genetic potential and activity patterns of natural microbial assemblages.

2. FINDINGS

2.1. Protocol for analysis of partial environmental transcriptomes

The principal protocol for the analysis of partial environmental transcriptomes comprises the following main steps (Fig. 7.1):

  • Total RNA extraction from an environmental sample;
  • Enrichment of mRNA by reducing the share of rRNA;
  • Synthesis of a cDNA template population by randomly primed reverse transcription;
  • Amplification of the templates by PCR to generate cDNA clone libraries;
  • Sequencing of the cDNA clones;
  • Transcript assembly and annotation, either by mapping against reference genomes or by de novo assembly;
  • Assignment of the assembled transcripts to species and construction of a taxonomic profile;
  • (optional) Translation and functional analysis to reveal the presence and functions of the resultant proteins.

Since environmental samples are very complex, the parameters of sample collection and further processing must be chosen carefully. Two considerations must be kept in mind: 1) the unknown number of species in the sample and 2) the unknown relative share of individual (and potentially novel) species. The essential precautions therefore include:

  • Choice of the sample size;
  • Choice of the processing method;
  • Optimization of the RNA extraction technique in terms of lysis techniques, lysis buffers, sample storage, DNase treatment, etc.;
  • RNA enrichment, either by lowering the rRNA content (e.g., by subtractive hybridization) or by mRNA isolation;
  • Choice of the appropriate sequencing method based on the length of the reads and the target molecule (cDNA or RNA): short-read cDNA, long-read cDNA, or long-read direct RNA;
  • Transcriptome assembly (reference-based or de novo), including the choice among the available reference databases;
  • Binning, i.e., grouping the assembled and annotated transcripts into sets that represent (although approximately) a species (taxon) of origin. A sequence similarity search identifies homologs among well-studied sequences, and functional annotation of the predicted proteins supports the grouping.
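
The binning step in the list above can be sketched in a few lines. The example below groups annotated transcripts by the taxon of their best homolog and reports each bin's share of the mapped reads. The transcript identifiers, taxon names, and read counts are hypothetical, and real binning tools use additional signals (for example sequence composition), so this is only a minimal illustration of the idea.

```python
from collections import defaultdict

# Hypothetical annotated transcripts: (transcript_id, best_hit_taxon, read_count).
# A taxon of None means that the similarity search found no homolog.
transcripts = [
    ("t1", "taxon_A", 120),
    ("t2", "taxon_A", 30),
    ("t3", "taxon_B", 50),
    ("t4", None, 40),
]


def bin_by_taxon(records):
    """Group transcripts into bins keyed by the taxon of their best hit."""
    bins = defaultdict(list)
    for transcript_id, taxon, read_count in records:
        bins[taxon or "unassigned"].append((transcript_id, read_count))
    return dict(bins)


def read_share(bins):
    """Share of all reads that falls into each bin (not a cell count)."""
    totals = {taxon: sum(count for _, count in members)
              for taxon, members in bins.items()}
    grand_total = sum(totals.values())
    return {taxon: total / grand_total for taxon, total in totals.items()}


bins = bin_by_taxon(transcripts)
shares = read_share(bins)
for taxon, share in sorted(shares.items()):
    print(f"{taxon}: {share:.3f}")
```

Note that a bin's share of reads reflects transcriptional activity as well as abundance, so it should not be read as the relative number of cells of that taxon.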

Figure 7.1. Analysis of partial environmental transcriptomes workflow (Source: Mante et al., 2024).


Learn: How to extract RNA effectively (video) and how to troubleshoot RNA extraction (video)

Learn: How to make cDNA libraries (video)

Learn: Illumina short-read cDNA sequencing (video)


 

3. ALTERNATIVES (DISCUSSION)

3.1. Overcoming the difficulties of transcriptional heterogeneity

The microbial consortia in environmental habitats form a complex and dynamic system, which makes the generation and interpretation of their transcriptomics data difficult. Bacteria respond to environmental changes through specific transcriptional programmes, and this results in transcriptional heterogeneity. The heterogeneity goes even deeper, since individual cells in genetically identical populations may have unique transcriptional profiles. This is the case, for instance, with individual bacteria that resist antibiotic treatment or with metabolically specialized cells that emerge under substrate starvation. The inherent transcriptional heterogeneity of environmental samples makes profiling the transcriptional states in microbial communities a laborious task. A rational way to overcome this obstacle is to develop sequencing methodologies that characterize thousands of cells in a complex system simultaneously.

Currently, several high-throughput methodologies are available, all based on single-cell RNA sequencing. Most of them barcode the cellular mRNA as early as the reverse transcription stage, often with the aid of mRNA capture beads. This early barcoding is what gives the sequencing process its single-cell resolution.
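
How early barcoding yields single-cell resolution can be shown with a minimal sketch: once every read carries a cell barcode, reads are grouped by barcode into a per-cell gene count table. The example additionally assumes that each molecule carries a unique molecular identifier (UMI) so that PCR duplicates can be collapsed; UMIs are common in such protocols but are an assumption here, and all barcodes, UMIs, and gene names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sequenced reads: (cell_barcode, umi, gene).
reads = [
    ("AAAC", "u1", "rpoB"),
    ("AAAC", "u1", "rpoB"),  # same barcode, UMI and gene: a PCR duplicate
    ("AAAC", "u2", "rpoB"),
    ("AAAC", "u3", "recA"),
    ("GGTT", "u1", "rpoB"),
]


def count_matrix(reads):
    """Build {cell_barcode: {gene: molecule_count}}, counting each
    (cell, UMI, gene) combination only once."""
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for cell, umi, gene in reads:
        key = (cell, umi, gene)
        if key in seen:
            continue  # duplicate read of an already counted molecule
        seen.add(key)
        counts[cell][gene] += 1
    return {cell: dict(genes) for cell, genes in counts.items()}


matrix = count_matrix(reads)
for cell, genes in matrix.items():
    print(cell, genes)
```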

MATQ-seq

MATQ-seq isolates single cells into separate wells of multi-well plates and performs individual indexing reactions to generate sequencing libraries. It is a highly sensitive quantitative method for total RNA sequencing. It overcomes the inherently low sensitivity and high technical ‘noise’ of most single-cell RNA sequencing assays, and it can therefore capture small variations in gene expression between the whole transcriptomes of single cells. The MATQ-seq procedure is depicted in Fig. 7.2.

Figure 7.2. The MATQ-seq pipeline (Source: Sheng and Zong, 2019).

par-SeqFISH

Parallel Sequential Fluorescence In Situ Hybridization (par-SeqFISH) is a transcriptome imaging approach that reveals gene expression in its spatial context at high resolution. The protocol was tested with the opportunistic pathogen Pseudomonas aeruginosa in planktonic and biofilm cultures under various conditions. It identified transcripts that correspond to diverse metabolic and virulence-related states in the planktonic culture and demonstrated the coexistence of different physiological states in the biofilm, depending on the spatial position of the cell (Fig. 7.3). These results emphasize the complex dynamics of microbial populations and the importance of studying them with high precision.

Figure 7.3. The par-SeqFISH method principle (Source: Dar et al., 2021).

PETRI-seq

PETRI-seq (prokaryotic expression profiling by tagging RNA in situ and sequencing) is a cost-effective, high-throughput protocol that uses combinatorial indexing to barcode tens of thousands of individual bacterial cells simultaneously. It has been validated with both Gram-negative and Gram-positive representatives and discriminates well between bacterial metabolic states and growth phases. The technique is suitable for assessing cellular dynamics in complex microbial communities.
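
The arithmetic behind combinatorial indexing can be sketched as follows. Cells pass through several rounds of barcoding, and the combination of barcodes received across the rounds identifies a cell. The sketch computes the number of possible combinations and the expected fraction of cells that share their combination with at least one other cell. The round count, the number of barcodes per round, and the assumption of uniform random assignment are illustrative choices, not parameters taken from the PETRI-seq protocol.

```python
def barcode_space(barcodes_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after all rounds."""
    return barcodes_per_round ** rounds


def expected_collision_fraction(n_cells: int, n_combinations: int) -> float:
    """Expected fraction of cells whose combination is also carried by at
    least one other cell, assuming uniform and independent assignment."""
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


# Illustrative values (assumptions): 3 rounds with 96 barcodes each.
space = barcode_space(96, 3)
collisions = expected_collision_fraction(10_000, space)
print(f"{space} combinations; about {collisions:.2%} of 10,000 cells collide")
```

The sketch shows the design trade-off: adding a round multiplies the barcode space, so many more cells can be processed at the same collision rate without any physical isolation of single cells.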

3.2. Applications of microbial gene transcripts in environmental samples

Quantitative ecological studies use advanced mathematical and statistical tools to address environmental problems. Environmental transcriptomics can contribute here because it can retrieve community-specific functional gene sequences. Discovering functional genes in natural environments normally requires primers designed from the sequence information available in the databases, so this approach works only for genes already known, mostly from cultured microorganisms. Environmental transcript libraries help solve this problem because they supply site-specific functional gene sequences from active cells without prior knowledge of their sequences. Furthermore, environmental transcriptomics provides sequences for the various genes that are actively expressed under given conditions. Among them are genes of biogeochemical interest, which can be exploited further for targeted biotechnological goals.

Environmental transcriptomics also has the potential to generate innovative hypotheses about microbial processes. Different types of communities can be surveyed for microbial gene expression with an environmental transcriptomics protocol without the need to specify particular organisms, phylogenetic groups, or metabolic patterns. Powerful automatic annotation tools can couple this multi-focused approach with the assessment of the genetic potential and activity patterns of natural microbial consortia.

Another promising application of environmental microbial gene transcripts is the creation of ecosystem-specific reference databases. Such small databases, which comprise the sequences of a particular ecosystem, have already been built, for example the MiDAS 2.0 database for the microbiota of wastewater treatment plants (WWTPs) and DictDb, the reference database for the dictyopteran gut. Further examples of ecosystem-specific databases include:

  • The human intestinal tract 16S taxonomic database (HITdb);
  • The human oral microbiome database (HOMD);
  • The freshwater-specific database (FreshTrain);
  • The honey bee gut microbiota database;
  • The rumen and intestinal methanogen database.

Although these reference databases help classify amplicons quickly and accurately, as a rule they contain a limited number of sequences. This hurdle is overcome by a workflow that classifies amplicons using two reference databases at once: a large universal database and a small ecosystem-specific database. The workflow, named TaxAss, is free and open source. The amplicon sequences are first compared with the ecosystem-specific database to find, for each sequence, the percent identity of its best match. Sequences at or above a chosen percent identity threshold are classified with the ecosystem-specific database, and the remaining sequences are classified with the universal database. A high rate of classification is achieved in this way. However, closely related sequences may fall on either side of the threshold and thus be assigned quite different taxonomies.
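
The two-database logic can be illustrated with a minimal sketch. It is not the TaxAss code; it only assumes that sequences whose best match in the ecosystem-specific database reaches a percent identity cutoff are classified with that database and that all other sequences are classified with the universal one. The cutoff value, sequence identifiers, and taxon names are hypothetical.

```python
# Hypothetical best matches against the ecosystem-specific database:
# sequence id -> (percent identity of the best match, taxonomy of that match).
ecosystem_hits = {
    "seq_a": (99.2, "ecosystem_lineage_1"),
    "seq_b": (98.1, "ecosystem_lineage_1"),
    "seq_c": (97.9, "ecosystem_lineage_1"),
}

# Hypothetical classifications of the same sequences in the universal database.
universal_taxonomy = {
    "seq_a": "universal_genus_X",
    "seq_b": "universal_genus_X",
    "seq_c": "universal_family_Y (genus unclassified)",
}


def classify_two_step(hits, universal, cutoff=98.0):
    """Return {seq_id: (database_used, taxonomy)} for every sequence."""
    result = {}
    for seq_id, (percent_identity, ecosystem_taxon) in hits.items():
        if percent_identity >= cutoff:
            result[seq_id] = ("ecosystem", ecosystem_taxon)
        else:
            result[seq_id] = ("universal",
                              universal.get(seq_id, "unclassified"))
    return result


assignments = classify_two_step(ecosystem_hits, universal_taxonomy)
for seq_id, (database, taxon) in assignments.items():
    print(seq_id, database, taxon)
```

In this toy data set, seq_b and seq_c differ by only 0.2 percentage points of identity but end up with names from different databases, which is the threshold effect described above.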

Current reference databases encompass millions of high-quality reference sequences. These databases are characterized by high sequence identity to, and broad coverage of, the actual diversity in a habitat. A task ahead is the improvement of taxonomic assignments for uncultured taxa. The software tool AutoTax creates ecosystem-specific taxonomies that cover the complete set of seven taxonomic ranks. It provides placeholder names for unclassified taxa by de novo clustering of the sequences at an identity threshold defined for each taxonomic rank. The de novo clustering is kept simple so that the placeholder names remain stable when the database is expanded with additional reference sequences.
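
Why a simple clustering scheme keeps placeholder names stable can be shown with a toy example. The sketch below is not the AutoTax implementation. It assumes a greedy, order-dependent clustering in which each sequence joins the first existing centroid that it matches at or above the threshold and otherwise founds a new cluster with the next free placeholder name. The identity measure, the threshold, the name prefix, and the sequences are all hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences
    (a toy stand-in for a real alignment-based identity)."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold, prefix):
    """Assign a placeholder name to each (seq_id, sequence) in input order."""
    centroids = []  # (placeholder_name, centroid_sequence)
    names = {}
    for seq_id, seq in sequences:
        for name, centroid in centroids:
            if identity(seq, centroid) >= threshold:
                names[seq_id] = name
                break
        else:  # no centroid was close enough: found a new cluster
            name = f"{prefix}_{len(centroids) + 1}"
            centroids.append((name, seq))
            names[seq_id] = name
    return names


references = [("s1", "AAAAAAAAAA"), ("s2", "AAAAAAAAAT"), ("s3", "CCCCCCCCCC")]
additions = [("s4", "CCCCCCCCCG"), ("s5", "GGGGGGGGGG")]

before = greedy_cluster(references, threshold=0.9, prefix="denovo_g")
after = greedy_cluster(references + additions, threshold=0.9, prefix="denovo_g")
print(before)
print(after)
```

Because new sequences are appended after the existing ones, the earlier sequences meet exactly the same centroids as before and keep their names; only the additions receive existing or new placeholder names.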


Explore the TaxAss and AutoTax tools


4. SOLUTIONS

4.1. Single-cell transcriptomics with combinatorial barcoding

One of the single-cell methods is Seq-Well, designed to combine cost-effectiveness, portability, and good scalability, qualities usually attributed to methods based on droplet cell capture and barcoding. In this method, the cells settle by gravity into the wells of a picowell array, and the transcripts of each cell are uniquely barcoded. The picowell array is then sealed with a semi-permeable membrane that allows easy exchange of fluids during cell lysis while limiting RNA leakage (Fig. 7.4). Originally developed for clinical applications, the method can be used for transcriptome analysis of any complex unicellular community, such as environmental microbial consortia.

Figure 7.4. The Seq-Well technique (Source: Gierahn et al., 2017).
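
When cells settle into the wells of an array by gravity, the number of cells per well is often approximated by a Poisson distribution. The sketch below uses that approximation, which is an assumption of this example and not a statement from the Seq-Well publication, to estimate how many wells stay empty, receive exactly one cell, or receive several cells. The cell and well numbers are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events at mean rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def well_occupancy(n_cells: int, n_wells: int) -> dict:
    """Expected fractions of empty, singly and multiply occupied wells,
    assuming cells land in wells independently and uniformly."""
    lam = n_cells / n_wells  # mean number of cells per well
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single,
            "multiple": 1.0 - empty - single}


# Hypothetical loading: 10,000 cells onto an array of 100,000 wells.
occupancy = well_occupancy(10_000, 100_000)
for state, fraction in occupancy.items():
    print(f"{state:>8}: {fraction:.4f}")
```

Loading far fewer cells than wells keeps the share of wells with several cells low, at the cost of leaving most wells empty.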

Recently, an upgraded version of the method, Seq-Well S3 (S3 stands for Second-Strand Synthesis), has been developed. It adds a second-strand synthesis step after reverse transcription to generate a second PCR priming site. As a result, cDNA that was reverse transcribed but did not undergo the template-switch reaction is also recovered. The capture of essential transcripts, e.g., those of transcription factors and signaling molecules, is thereby enhanced by up to 10-fold.


Follow the Seq-Well S3 step-by-step protocol here


4.2. Poly(A)-independent single-cell RNA-sequencing

Poly(A)-independent single-cell RNA sequencing is a protocol that captures growth-dependent gene expression patterns in individual bacteria across all RNA classes and genomic regions. It was tested with individual Salmonella and Pseudomonas bacteria, and the promising results can serve as a reference point for other bacterial species and for environmental microbial consortia.


Learn more about Poly(A)-independent single-cell RNA seq here


4.3. Probe-based bacterial single-cell RNA sequencing

Probe-based bacterial single-cell RNA sequencing (ProBac-seq) is designed to reveal the transcriptional heterogeneity of isogenic bacterial populations at the single-cell level. The protocol uses libraries of DNA probes and commercial microfluidic platforms to perform single-cell RNA sequencing. It allows the transcriptomes of thousands of individual bacteria to be sequenced in a single experiment, regardless of the Gram status of the bacteria.

Figure 7.5. The ProBac-seq technique principle (Source: McNulty et al., 2023).

The method was validated with E. coli and B. subtilis. It was also tested with Clostridium perfringens to study the heterogeneity of toxin expression in sub-populations exposed to different environmental conditions, and it proved useful for detecting metabolic perturbations associated with pathogenicity.


Learn more about Probe-based bacterial single-cell RNA seq here


4.4. In situ combinatorial indexing for bacterial single-cell RNA sequencing

Microbial split-pool ligation transcriptomics (microSPLiT) is a technique that is applicable to both Gram-negative and Gram-positive bacteria and can resolve different transcriptional states. It was validated with Bacillus subtilis cells sampled at different growth stages and revealed a range of expression patterns that correspond to the metabolic changes during the bacterial life cycle (Fig. 7.6).

Figure 7.6. The microSPLiT single-cell sequencing method principle (Source: Gaisser et al., 2024).

In this study, expression profiles corresponding to known but rare states (cellular competence and prophage induction) were identified. In addition, unexpected gene expression states were found, among them the heterogeneous activation of metabolic pathways in certain cell sub-populations. Thus, microSPLiT is a good choice for detecting phenotypically distinct subpopulations.


Learn more about microSPLiT RNA sequencing method here


5. REFERENCES

Blattman, S.B., Jiang, W., Oikonomou, P. et al. Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Nat Microbiol 5, 1192–1201 (2020). https://doi.org/10.1038/s41564-020-0729-6

Dar, D., Dar, N., Cai, L., Newman, D.K. Spatial transcriptomics of planktonic and sessile bacterial populations at single-cell resolution. Science 373(6556) (2021). https://doi.org/10.1126/science.abi4882

Gaisser, K.D., Skloss, S.N., Brettner, L.M. et al. High-throughput single-cell transcriptomics of bacteria using combinatorial barcoding. Nat Protoc (2024). https://doi.org/10.1038/s41596-024-01007-w

Gierahn, T., Wadsworth, M., Hughes, T. et al. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nat Methods 14, 395–398 (2017). https://doi.org/10.1038/nmeth.4179

Imdahl, F., Vafadarnejad, E., Homberger, C. et al. Single-cell RNA-sequencing reports growth-condition-specific global transcriptomes of individual bacteria. Nat Microbiol 5, 1202–1206 (2020). https://doi.org/10.1038/s41564-020-0774-1

McNulty, R., Sritharan, D., Pahng, S.H. et al. Probe-based bacterial single-cell RNA sequencing predicts toxin regulation. Nat Microbiol 8, 934–945 (2023). https://doi.org/10.1038/s41564-023-01348-4

Mante, J., Groover, K.E., Pullen, R.M. Environmental community transcriptomics: strategies and struggles. Briefings in Functional Genomics, 1–15 (2024). https://doi.org/10.1093/bfgp/elae033

Page, T. M. and Lawley J. W. (2022) Front. Mar. Sci., Sec. Marine Molecular Biology and Ecology, 9 – 2022 https://doi.org/10.3389/fmars.2022.757921

Poretsky R.S, N. Bano, A. Buchan, G. LeCleir, J. Kleikemper, M. Pickering, W.M. Pate, M.A. Moran, J.T. Hollibaugh. Analysis of microbial gene transcripts in environmental samples. Appl Environ Microbiol. 2005 71(7):4121-6. doi: 10.1128/AEM.71.7.4121-4126.2005.

Wang, B., Lin, A.E., Yuan, J. et al. Single-cell massively-parallel multiplexed microbial sequencing (M3-seq) identifies rare bacterial populations and profiles phage infection. Nat Microbiol 8, 1846–1862 (2023). https://doi.org/10.1038/s41564-023-01462-3