RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs
- Submitting institution
- University of Exeter
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 1791
- Type
- D - Journal article
- DOI
- 10.1093/bioinformatics/btx114
- Title of journal
- Bioinformatics
- Article number
- -
- First page
- 2089
- Volume
- 33
- Issue
- 14
- ISSN
- 1367-4803
- Open access status
- Compliant
- Month of publication
- February
- Year of publication
- 2017
- URL
-
-
- Supplementary information
-
-
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- Yes
- Number of additional authors
- 6
- Research group(s)
-
-
- Citation count
- 17
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- We present an efficient approach for clustering large RNA sequence datasets by shared secondary structure, which can serve as a proxy for biological function. The novelty addresses a limitation of existing approaches to clustering paralogous RNAs: they do not take into account the compensatory base-pair changes revealed by structure conservation in orthologous sequences.
This matters when analysing genomic regions to identify the expression and possible function of novel ncRNA candidates. Our system has been used in the comparative phylogenetic analysis of vertebrate and fungal genomes (10.1038/s41598-018-23900-7).
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -