RiboUORF | Help

Documents

Documents index

Overview of Ribo-uORF
Quickly start Ribo-uORF
'Browse' modules
'Search' module
'Jbrowse' module
Transcript browser
Web-based tools
'Statistics' module
'Download' module

# Overview of Ribo-uORF

The scheme of the Ribo-uORF workflow: Ribo-uORF provides comprehensive information on uORFs, generated from the large number of available Ribo-seq and QTI-seq datasets. Ribo-uORF also integrates related datasets from public databases, such as eQTL and RNA modification datasets. All results generated by Ribo-uORF are deposited in a MySQL database and displayed in the visual browser and web pages.

Overall composition of the datasets in the Ribo-uORF database

# Quickly start Ribo-uORF

Users can quickly search for a gene symbol or Ensembl ID of interest from the homepage, as shown in the screenshot below. Enter the gene symbol “DGCR8” in the Gene/Transcript input box. The results are shown as a list that includes the number of candidate uORFs and actively translated uORFs. From the list, we can see that DGCR8 has uORFs in human, mouse, zebrafish, and rat. In human, there are 25 DGCR8 uORF candidates, with 18 actively translated uORFs identified from 179 Ribo-seq samples.

As shown in the screenshot below, the supporting samples page lists all samples that support an actively translated uORF on a transcript, together with a link to the metagene page. Clicking the button in the ‘Metaview’ column opens the metagene plot, which displays the distribution of RPFs across the 5’ UTR, CDS, and 3’ UTR.

# 'Browse' modules

If you don't have a specific gene/transcript in mind, or are looking for particular Ribo-seq or QTI-seq samples, you can use the ‘Browse’ module, which displays the datasets of Ribo-uORF by gene/transcript ID, Ribo-seq sample, or QTI-seq sample.
When you click ‘Browse’ in the menu of the homepage, you will see the page shown in the screenshot below. The first subpage lists all genes/transcripts with actively translated uORFs, and the list also includes other publicly available datasets.

Browse by Ribo-seq samples: The screenshot below shows the list of Ribo-seq samples, including the SRA_ID, the description retrieved from the NCBI database, links to the analysis results, and the scientific name of the species.

Browse by QTI-seq samples: The screenshot below shows the list of QTI-seq samples, including the SRA_ID, the description retrieved from the NCBI database, links to the analysis results, and the scientific name of the species.

The screenshot below shows an example of the analysis result of a Ribo-seq sample (the QTI-seq analysis result is similar). On this subpage, users can download all results by clicking ‘download the result’. The ‘Result’ page typically contains the quality control results for the Ribo-seq data and the uORF identification results.

# 'Search' module

If you have a gene/transcript of interest, you can also search for it in the ‘Search’ module. The search returns the uORF information for that gene/transcript in all six species in Ribo-uORF.

# 'Jbrowse' module

‘Jbrowse’, a comprehensive genome browser, displays uORFs and their related information in genome coordinates.

# Transcript browser

mRNAbrowser shows tracks of uORFs and other related annotations, such as eQTL and RNA modification, in transcript coordinates. The wait time increases when more data is selected and for uORFs with more annotated information. A screenshot of mRNAbrowser is shown below.

UTR5viewer and uORFpeptide were developed to display the details of uORF sequences. As the screenshot shows, the UTR5viewer tracks display elements including candidate uORFs, actively translated uORFs, uTIS, and other publicly available information in 5’ UTR regions. uORFpeptide, which visualizes the potential peptides encoded by uORFs, is built on SeqViz (https://github.com/Lattice-Automation/seqviz), a sequence viewer that supports multiple input formats and display settings. In addition, you can go to the genomic browser by clicking the desktop icon.

# Web-based tools

In the current database, we identified uORFs from publicly available Ribo-seq datasets. Users who are interested in uORFs in Ribo-seq datasets not included in our database are therefore encouraged to use uORFscan, under the ‘Tools’ menu, to analyze their own uploaded Ribo-seq datasets. uORFscan provides users with three ways to input their data: (1) collapsed FASTA files, (2) web links to their collapsed FASTA files, and (3) FASTQ files. We recommend compressing the files to speed up the upload. A collapsed FASTA file can be prepared from raw FASTQ reads in the two steps shown in the example below.

1 Trim adapters and remove low-quality bases using Cutadapt + Trimmomatic, or trim_galore (example).

cutadapt -a CTGTAGGCACCATCA --error-rate=0.1 --overlap=3 --times=2 --output=sample.trimed.fastq.gz sample.fastq.gz

--error-rate: Maximum allowed error rate as value between 0 and 1 (no. of errors divided by length of matching region) (0.1 = 10%)
--overlap: Require an overlap of at least MINLENGTH bases between read and adapter for an adapter to be found.
--times: Remove up to COUNT adapters from each read, i.e. the number of rounds of adapter finding and removal.
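
To make the interaction between --error-rate and the match length concrete, the following minimal Python sketch computes how many mismatches would be tolerated for adapter matches of different lengths. It assumes the allowed number of errors is the match length multiplied by the error rate, rounded down; the function name `allowed_errors` is hypothetical and is not part of Cutadapt.

```python
import math

ADAPTER = "CTGTAGGCACCATCA"  # adapter from the example command above
ERROR_RATE = 0.1             # value passed to --error-rate
MIN_OVERLAP = 3              # value passed to --overlap


def allowed_errors(match_length: int, error_rate: float = ERROR_RATE) -> int:
    """Errors tolerated for a match of this length (hypothetical helper).

    Assumes the limit is match_length * error_rate, rounded down.
    """
    return math.floor(match_length * error_rate)


# Show the limit for the shortest accepted overlap up to the full adapter.
for length in (MIN_OVERLAP, 9, 10, len(ADAPTER)):
    print(f"match length {length:2d}: up to {allowed_errors(length)} error(s)")
```

With these settings, partial adapter matches shorter than 10 bases must be exact, while longer matches tolerate a single error.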

java -jar /path/trimmomatic-0.39.jar SE -phred33 sample.trimed.fastq.gz sample.trimed.cleaned.fastq.gz LEADING:30 TRAILING:30 MINLEN:25

-phred33: Quality score encoding (Phred+33), which can be inferred using FastQC
LEADING: Cut bases off the start of a read, if below a threshold quality
TRAILING: Cut bases off the end of a read, if below a threshold quality
MINLEN: Drop the read if it is below a specified length
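
The effect of LEADING, TRAILING, and MINLEN on a single read can be sketched in a few lines of Python. This is only an illustration of the logic described above, assuming Phred+33 encoded qualities and the thresholds from the example command; the function names are hypothetical and this is not Trimmomatic's implementation.

```python
from typing import List, Optional, Tuple


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_read(seq: str, qual: str, leading: int = 30, trailing: int = 30,
              minlen: int = 25) -> Optional[Tuple[str, str]]:
    """Apply LEADING, TRAILING and MINLEN in that order (illustrative only)."""
    scores = phred33_scores(qual)
    # LEADING: drop bases from the start while their quality is below threshold.
    start = 0
    while start < len(scores) and scores[start] < leading:
        start += 1
    # TRAILING: drop bases from the end while their quality is below threshold.
    end = len(scores)
    while end > start and scores[end - 1] < trailing:
        end -= 1
    # MINLEN: discard the read if what remains is too short.
    if end - start < minlen:
        return None
    return seq[start:end], qual[start:end]


# A 30-nt read with two low-quality bases ('#' = Q2) at each end.
read_seq = "ACGT" * 7 + "AC"
read_qual = "##" + "I" * 26 + "##"
print(trim_read(read_seq, read_qual))   # 26 nt remain, so the read is kept
print(trim_read("ACGT" * 5, "I" * 20))  # 20 nt < MINLEN, so the read is dropped
```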

trim_galore --phred33 -a CTGTAGGCACCATCA --length 25 --max_length 34 -e 0.1 -q 30 --stringency 3 -o outdir sample.fastq.gz

--phred33: Quality score encoding (Phred+33), which can be inferred using FastQC
-a: Adapter sequence to be trimmed
--length: Discard reads that became shorter than length INT because of either quality or adapter trimming. A value of '0' effectively disables this behaviour.
--max_length: Discard reads that are longer than length INT after trimming.
-e: same as --error-rate in cutadapt
-q: Phred score cut-off
--stringency: same as --overlap in cutadapt

2 Generate collapsed FASTA file.

perl fq2collapedFa_v1.2.pl -i sample.trimed.cleaned.fastq.gz -o sample.fa.gz

-i: input cleaned FASTQ sequences (sample_trimmed.fq.gz for trim_galore output)
-o: output collapsed FASTA file
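
For readers unfamiliar with the term, "collapsing" means merging identical reads into a single FASTA record that carries the read count. The Python sketch below illustrates the idea under stated assumptions: the header format `>seq<rank>_x<count>` and all function names are hypothetical, and the output of fq2collapedFa_v1.2.pl may use a different naming scheme, so use the provided Perl script for actual uploads.

```python
import gzip
import io
from collections import Counter
from typing import Iterable, Iterator, List, TextIO, Tuple


def read_fastq_sequences(handle: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of a FASTQ stream."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip()


def collapse(sequences: Iterable[str]) -> List[Tuple[str, int]]:
    """Merge identical reads; most abundant first, ties broken alphabetically."""
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_collapsed_fasta(collapsed: List[Tuple[str, int]], out: TextIO) -> None:
    """Write one record per unique read; the header format is an assumption."""
    for rank, (seq, count) in enumerate(collapsed, start=1):
        out.write(f">seq{rank}_x{count}\n{seq}\n")


def collapse_file(fastq_gz: str, fasta_gz: str) -> None:
    """Collapse a gzip-compressed FASTQ file into a gzip-compressed FASTA file."""
    with gzip.open(fastq_gz, "rt") as reads, gzip.open(fasta_gz, "wt") as out:
        write_collapsed_fasta(collapse(read_fastq_sequences(reads)), out)


# Small in-memory demonstration: three reads, two of them identical.
demo_fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n@r3\nACGT\n+\nIIII\n"
)
demo_out = io.StringIO()
write_collapsed_fasta(collapse(read_fastq_sequences(demo_fastq)), demo_out)
print(demo_out.getvalue(), end="")
```

Collapsing reduces file size considerably for Ribo-seq data, where many footprints are identical, which is presumably why collapsed FASTA is the preferred upload format.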

uORFscan: a tool for uORF identification from user-uploaded Ribo-seq datasets.
uORFscan provides several flexible parameters for users, which are shown in the screenshot below.

UTR5var: a tool for investigating the effect of variants in 5’ UTR regions.
Mounting evidence suggests that mutations within uORFs are linked to many diseases. However, a convenient and integrated tool for variant annotation in the 5’ UTR region is still lacking. Therefore, UTR5var, which is based on UTRannotator, was developed. It allows users to upload their variant files in VCF format to evaluate the effect of mutations on uORFs for human (hg38) and mouse (mm10).
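
As a simplified illustration of the kind of effect such annotation looks for, the Python sketch below checks whether a single-nucleotide variant creates or destroys an upstream ATG in a 5’ UTR sequence. It is not how UTR5var or UTRannotator is implemented: the function names, the toy sequence, and the use of 1-based positions relative to the 5’ UTR in transcript orientation are all assumptions made for this example.

```python
from typing import Dict, List, Set


def atg_positions(seq: str) -> Set[int]:
    """Return 1-based start positions of every ATG in the sequence."""
    seq = seq.upper()
    return {i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"}


def apply_snv(utr5: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base; pos is 1-based within the 5' UTR (assumption)."""
    if utr5[pos - 1].upper() != ref.upper():
        raise ValueError(f"reference base mismatch at position {pos}")
    return utr5[:pos - 1] + alt + utr5[pos:]


def uaug_changes(utr5: str, pos: int, ref: str, alt: str) -> Dict[str, List[int]]:
    """Report upstream ATGs gained or lost because of the variant."""
    before = atg_positions(utr5)
    after = atg_positions(apply_snv(utr5, pos, ref, alt))
    return {"gained": sorted(after - before), "lost": sorted(before - after)}


# Toy 5' UTR without any ATG; a C>T change at position 5 creates one.
toy_utr5 = "GGCACGGCC"
print(uaug_changes(toy_utr5, 5, "C", "T"))
```

A real annotation additionally has to map genomic VCF coordinates onto transcripts, handle strand and indels, and consider the reading frame and stop codons of each uORF, which is what the web tool is for.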

Retrieving results: tools for retrieving the analysis results of uORFscan and UTR5var by job IDs.

Retrieving the results generated by uORFscan.

Retrieving the results generated by UTR5var.

# 'Statistics' module

The statistics page displays a comprehensive atlas of uORFs from six species. This page will be updated periodically as more datasets are added to Ribo-uORF. All charts are interactive and downloadable.

# 'Download' module

The RiboUORF data files can be freely downloaded and used under the GNU Public License and the licenses of the primary data sources. If you make use of the data and web server presented here, please cite our RiboUORF paper (2022) in addition to the primary data sources.