TISdb includes 6991 TIS sites from 4961 human genes and 9973 TIS sites from 5668 mouse genes. The TISdb website provides a simple browser interface for querying high-confidence TIS sites and their associated open reading frames. The search output provides a user-friendly visualization of TIS information in the context of transcript isoforms.
The PsORF database is a web resource collecting small Open Reading Frames (sORFs) for 35 plant species. Here we identified small ORFs (300 nt or fewer in length) which might be translated from non-coding RNA regions, such as the 5’/3’ UTRs of mRNAs and the gene bodies of lncRNAs. These sORFs were classified by genome location into five groups: uORF, uoORF, dORF, doORF and sORF (pictured at right). The PsORF database aims to provide translational evidence for the small ORFs across different plant species, and information on the evolutionary conservation of those small ORFs.
sORF.org is a public repository for sORFs, which allows researchers to examine individual sORFs or to perform searches based on several criteria for further large-scale studies. Data from different sources, both experimental and in silico (based on various bioinformatics tools), are collected. sORF.org currently holds 4377422 sORFs across three different species (human, mouse and fruit fly), derived from multiple Ribo-seq experiments, and is expanding as more data become available.
uORFdb serves as a comprehensive literature database on eukaryotic uORF biology. uORFdb was manually curated from all uORF-related literature listed in the PubMed database. It categorizes individual publications by a variety of denominators including taxon, gene and type of study. Furthermore, the database can be filtered for multiple structural and functional uORF-related properties to allow convenient and targeted access to the complex field of eukaryotic uORF biology.
SmProt contains records of small proteins encoded by genes, especially those from UTRs and non-coding RNAs. The selected small proteins were identified from ribosome profiling data, literature, mass spectrometry (MS), etc., carried out in eight species: Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans and Escherichia coli. Moreover, SmProt annotates the collected small proteins with their sequences, genomic locations, tissues/cell lines, assessments reflecting coding potential, functions, variants, and related diseases that have been verified or predicted.
RPFdb (v2.0) is a database which currently hosts 2884 ribo-seq datasets from 293 studies, covering 29 different species. In line with the significant expansion of ribo-seq data available in the database, RPFdb v2.0 includes a refined analysis pipeline with multi-step quality control for improving the pre-processing and alignment of ribo-seq data, new functional modules providing information on actively translated ORFs, and more web features for better database usability.
TranslatomeDB is a comprehensive database which provides collection and integrated analysis of published and user-generated translatome sequencing data. It includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, at both the transcriptome and translatome levels.
OpenProt proposes a comprehensive annotation of predicted coding sequences on all transcripts. It also publishes results from large-scale searches of public data, providing evidence for the expression of novel protein products.
HRPDviewer, the Human Ribosome Profiling Data viewer, contains 610 published human ribo-seq datasets from Gene Expression Omnibus, aligns the ribo-seq data to the transcriptome and provides visualization of the ribo-seq data on the selected mRNA transcripts.
RiboVIEW is developed to perform robust quality control of ribosome profiling data (RiboQC), to efficiently visualize ribosome positions and to estimate ribosome speed (RiboMine) in an unbiased way. It contains an R pipeline to set up and undertake the analyses, which offers the user an HTML page to scan their own data regarding the following aspects: periodicity, ligation and digestion of footprints; reproducibility and batch effects of replicates; drug-related artifacts; unbiased codon enrichment including variability between mRNAs, for A, P and E sites; mining of some causal or confounding factors.
Trips-viz is a transcriptome browser designed to visualize ribosome profiling and RNA-seq data at the level of a single gene/transcript isoform, as opposed to the genome level. Trips-viz also lets users visualize data from a gene under different conditions, get meta-information at an individual dataset level such as read length distribution or triplet periodicity, and find differentially expressed or translated genes.
The GENCODE Ribo-seq ORF set is a community-led effort involving Ensembl/GENCODE, the HUGO Gene Nomenclature Committee (HGNC), UniProtKB, HUPO/HPP and PeptideAtlas to produce a standardized catalog of 7,264 human Ribo-seq ORFs; a path to bring protein-level evidence for Ribo-seq ORFs into reference annotation databases; and a roadmap to facilitate research in the global community.
The GWAS Catalog is a publicly available, manually curated resource of all published GWAS and association results, collaboratively produced and developed by the NHGRI and EMBL-EBI. It includes all eligible GWAS studies since the first published GWAS on age-related macular degeneration in 2005.
ClinVar is a database which provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar accessions submissions reporting human variation, interpretations of the relationship of that variation to human health and the evidence supporting each interpretation.
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
The Single Nucleotide Polymorphism database (dbSNP) is a public-domain archive for a broad collection of simple genetic polymorphisms. This collection of polymorphisms includes single-base nucleotide substitutions (also known as single nucleotide polymorphisms or SNPs), small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs), and retroposable element insertions and microsatellite repeat variations (also called short tandem repeats or STRs).
The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 53 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images.
GENCODE is a scientific project in genome research and part of the ENCODE (ENCyclopedia Of DNA Elements) scale-up project. The GENCODE consortium was initially formed as part of the pilot phase of the ENCODE project to identify and map all protein-coding genes within the ENCODE regions (approx. 1% of the human genome).
POSTAR3 is a comprehensive database for exploring POST-trAnscriptional Regulation based on high-throughput sequencing data from 7 species: human, mouse, zebrafish, fly, worm, Arabidopsis, and yeast. POSTAR3 is developed as the updated version of CLIPdb, POSTAR, and POSTAR2, and provides the largest collection of binding sites of RNA-binding proteins and their functional annotations.
uORFlight facilitates the exploration of uORF variation among different splicing models of Arabidopsis and rice genes. Most importantly, users can evaluate uORF frequency among different accessions at the population scale and find the causal single nucleotide polymorphism (SNP) or insertion/deletion (INDEL) which can be associated with phenotypic variation through database mining or simple experiments. Such information will help to form hypotheses about uORF function in plant development or adaptation to changing environments on the basis of the cognate mORF function. This database also curates plant uORF-relevant literature into distinct groups. It expands uORF annotation to more species of fungi (Botrytis cinerea, Saccharomyces cerevisiae), plant (Brassica napus, Glycine max, Gossypium raimondii, Medicago truncatula, Solanum lycopersicum, Solanum tuberosum, Triticum aestivum and Zea mays), metazoan (Caenorhabditis elegans and Drosophila melanogaster) and vertebrate (Homo sapiens, Mus musculus and Danio rerio).
PancanQTL is a user-friendly database that stores cis-eQTLs, trans-eQTLs, survival-associated eQTLs and GWAS-related eQTLs and enables searching, browsing and downloading. PancanQTL could help the research community understand the effects of inherited variants in tumorigenesis and development.
REDIportal collects more than 4.5 million A-to-I events detected in 55 body sites from thousands of RNA-seq experiments. REDIportal embeds the RADAR database and represents the first editing resource designed to answer functional questions, enabling the inspection and browsing of editing levels in a variety of human samples, tissues and body sites.
Met-DB v2.0 is the significantly improved second version of Met-DB, entirely redesigned to focus more on elucidating context-specific m(6)A functions. Met-DB v2.0 has a major increase in context-specific m(6)A peaks and single-base sites predicted from 185 samples for 7 species from 26 independent studies. Moreover, it is integrated with a new database of targets of m(6)A readers, erasers and writers, and expanded with more collections of functional data.
The REPIC (RNA EPItranscriptome Collection) database records about 10 million peaks called from publicly available m(6)A-seq and MeRIP-seq data using a unified pipeline. These data were collected from 672 samples of 49 studies, covering 61 cell lines or tissues in 11 organisms.
m(6)A-Atlas was developed for unraveling the m(6)A epitranscriptome. m(6)A-Atlas features a high-confidence collection of 442 162 reliable m(6)A sites identified from seven base-resolution technologies and the quantitative (rather than binary) epitranscriptome profiles estimated from 1363 high-throughput sequencing samples.
The Encyclopedia of DNA Elements (ENCODE) is a public research project which aims to identify functional elements in the human genome. ENCODE also supports further biomedical research by "generating community resources of genomics data, software, tools and methods for genomics data analysis, and products resulting from data analyses and interpretations". The current phase of ENCODE is adding depth to its resources by growing the number of cell types, data types and assays, and now includes support for examination of the mouse genome.
In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately 200,000 and 150,000 peaks for the human and mouse genomes, respectively, were identified and annotated to provide the precise location of known promoters as well as novel ones, and to quantify their activities.