1. Collecting SNPs in the 3'-UTR
SNPs located in the 3'-UTRs of all known genes in the UCSC genome annotation (mouse: mm7; human: hg18) were extracted from dbSNP build 126. The genomic locations of these SNPs were then mapped onto mRNAs.
2. Identifying and annotating PolymiRTS
For each SNP, we assessed whether its two alleles lead to different miRNA target sites. We considered only those 3'-UTR SNPs that affect the match to the seed region of the miRNA. Mature miRNA sequences were downloaded from miRBase [1]. We used the TargetScanS [2] criteria to predict miRNA sites: in addition to a perfect Watson-Crick match to the seed (nucleotides 2-7 of the miRNA), we required either a perfect match to the 8th nucleotide of the miRNA or an anchor adenosine immediately downstream of the 2-7 seed match in the target.
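The seed-match rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used to build the database: the function names (`find_seed_sites`, `sites_by_allele`) are hypothetical, sequences are assumed to be written 5' to 3', and the SNP is assumed to matter only when it falls within the 8-nucleotide footprint spanning the position opposite miRNA nucleotide 8 through the anchor adenosine.

```python
# Hypothetical helpers illustrating the seed-match rule; not the authors' code.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_seed_sites(utr, mirna):
    """Return (start, has_m8, has_a1) for each site in a 5'->3' UTR.

    A site needs a perfect Watson-Crick match to miRNA nucleotides 2-7
    plus either a match to nucleotide 8 or an adenosine immediately
    downstream of the seed match in the target.
    """
    utr, mirna = to_rna(utr), to_rna(mirna)
    seed_site = reverse_complement(mirna[1:7])  # target 6-mer pairing with nt 2-7
    m8_base = COMPLEMENT[mirna[7]]              # target base pairing with nt 8
    sites = []
    for start in range(len(utr) - 5):
        if utr[start:start + 6] != seed_site:
            continue
        # In the target, the base opposite miRNA nt 8 lies just upstream of the 6-mer.
        has_m8 = start >= 1 and utr[start - 1] == m8_base
        # The anchor adenosine lies just downstream of the 6-mer.
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 or has_a1:
            sites.append((start, has_m8, has_a1))
    return sites


def sites_by_allele(utr, snp_index, alleles, mirna):
    """Map each allele to the predicted sites whose footprint covers the SNP.

    Assumption: the footprint runs from the m8 position (start - 1)
    through the anchor position (start + 6); snp_index is 0-based.
    """
    result = {}
    for allele in alleles:
        seq = to_rna(utr[:snp_index] + allele + utr[snp_index + 1:])
        result[allele] = [
            site for site in find_seed_sites(seq, mirna)
            if site[0] - 1 <= snp_index <= site[0] + 6
        ]
    return result


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # used here only as an example miRNA
    print(sites_by_allele("AAAACUACCUCAGGG", 7, ("C", "G"), let7a))
```

A SNP would be flagged as a candidate PolymiRTS when the two alleles yield different site lists for at least one miRNA.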
We assigned each PolymiRTS to one of four classes: 'D' (an allele disrupts a conserved miRNA site), 'N' (a derived allele disrupts a nonconserved miRNA site), 'C' (a derived allele creates a new miRNA site) and 'O' (other cases, in which the ancestral allele cannot be determined unambiguously). PolymiRTS of class 'C' may cause abnormal gene repression, and PolymiRTS of class 'D' may cause loss of normal repression control. These two classes are the most likely to have functional impact. We used the pre-calculated 17-way Multiz alignments of vertebrate genomes to derive the annotations. For a miRNA site to be considered conserved, we required that it be present in at least two other vertebrate genomes in addition to the query genome. For mouse SNPs, ancestral alleles were determined from the mouse vs. rat (rn3) genome alignment. For human SNPs, ancestral alleles were determined from the human vs. chimpanzee (panTro1) genome alignment. We also flagged PolymiRTS with A/G alleles separately, because they are thought to be less deleterious owing to their ability to form G:U wobble base pairs with miRNAs.
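As a rough illustration of the four-class scheme, the decision logic can be written as a small function. The name `classify_polymirts` and its arguments are hypothetical, and the sketch makes two assumptions that the text does not spell out: every class other than 'O' requires a known ancestral allele, and a SNP whose two alleles both support (or both lack) the site is not a PolymiRTS and returns `None`.

```python
MIN_OTHER_GENOMES = 2  # site must be present in at least two other vertebrate genomes


def classify_polymirts(alleles, ancestral, alleles_with_site, n_other_genomes_with_site):
    """Assign a SNP/miRNA-site pair to class 'D', 'N', 'C' or 'O'.

    alleles: the two SNP alleles, e.g. ("A", "G")
    ancestral: ancestral allele inferred from the outgroup alignment, or None
    alleles_with_site: set of alleles for which the miRNA site is predicted
    n_other_genomes_with_site: number of other vertebrate genomes carrying the site
    """
    has_site = [allele in alleles_with_site for allele in alleles]
    if has_site[0] == has_site[1]:
        return None  # assumption: both alleles behave alike, so not a PolymiRTS
    if ancestral not in alleles:
        return "O"  # ancestral allele could not be determined unambiguously
    if ancestral not in alleles_with_site:
        return "C"  # the derived allele creates a new site
    conserved = n_other_genomes_with_site >= MIN_OTHER_GENOMES
    return "D" if conserved else "N"  # the derived allele disrupts the site


def is_ag_snp(alleles):
    """Flag A/G SNPs, which the text treats as a separate category."""
    return {a.upper() for a in alleles} == {"A", "G"}
```

For example, `classify_polymirts(("A", "G"), "A", {"A"}, 3)` falls in class 'D' under these assumptions, because the ancestral allele carries a site that is present in three other genomes and the derived allele loses it.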
3. Assessing PolymiRTS in cis-acting eQTLs
Genes that have both a cis-acting eQTL and a PolymiRTS are featured in the database.
First, gene expression levels in cerebellum, hippocampus, striatum, eye, whole brain and hematopoietic cells were assessed in recombinant inbred (BXD) strains derived from the two parental mouse strains C57BL/6J and DBA/2J. Gene expression levels were treated as quantitative traits and mapped onto genomic regions (eQTLs) using standard marker regression. A gene is said to have a suggestive (significant) cis-acting eQTL if the LOD peak location is within 10 Mb of the gene's physical location and the LOD is >2.8 (>4.3) [3].
Second, gene expression levels in lymphoblastoid cells of 194 human individuals from 14 CEPH families were downloaded from the GEO database, and the raw data were processed using the RMA protocol. Genotypes for 1628 autosomal SNP markers were downloaded from The SNP Consortium database. We used Merlin to remove genotype errors and perform family-based linkage analysis. A gene is said to have a cis-acting eQTL if the LOD peak location is within 10 Mb of the gene's physical location and the p-value is <0.05.
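The two cis-eQTL criteria above reduce to a distance check plus a threshold. The following sketch shows one way to encode them; the function names are hypothetical, the gene's physical location is assumed to be a single base-pair coordinate, and "within 10 Mb" is assumed to include the boundary.

```python
CIS_WINDOW_BP = 10_000_000   # 10 Mb window around the gene
LOD_SUGGESTIVE = 2.8         # mouse thresholds from the text
LOD_SIGNIFICANT = 4.3
P_VALUE_CUTOFF = 0.05        # human threshold from the text


def is_cis(gene_chrom, gene_pos, peak_chrom, peak_pos, window=CIS_WINDOW_BP):
    """True if the LOD peak is on the gene's chromosome and within the window."""
    return gene_chrom == peak_chrom and abs(gene_pos - peak_pos) <= window


def call_mouse_cis_eqtl(gene_chrom, gene_pos, peak_chrom, peak_pos, lod):
    """Return 'significant', 'suggestive', or None for a BXD expression trait."""
    if not is_cis(gene_chrom, gene_pos, peak_chrom, peak_pos):
        return None
    if lod > LOD_SIGNIFICANT:
        return "significant"
    if lod > LOD_SUGGESTIVE:
        return "suggestive"
    return None


def call_human_cis_eqtl(gene_chrom, gene_pos, peak_chrom, peak_pos, p_value):
    """Return True if the linkage peak qualifies as a cis-acting eQTL."""
    return is_cis(gene_chrom, gene_pos, peak_chrom, peak_pos) and p_value < P_VALUE_CUTOFF
```

Keeping the window and thresholds as named constants makes it easy to see how sensitive the resulting gene list is to each cutoff.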
4. Assessing PolymiRTS in pQTLs
We first mapped the QTLs (LOD > 2.8) for more than 800 published BXD phenotypes (physiological and behavioral traits). We then linked each QTL with the genes that are physically located in the QTL interval and have at least one PolymiRTS. These genes, together with genes carrying nonsynonymous SNPs, are candidate causal genes underlying the pQTL.
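The linking step can be illustrated with a simple interval lookup. This is a sketch under stated assumptions rather than the database's actual implementation: the `Gene` and `QTL` records and the function `genes_with_polymirts_in_qtl` are hypothetical, and a gene is treated as "located in" a QTL interval when its coordinates overlap the interval at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int        # base-pair coordinates on the chromosome
    end: int
    n_polymirts: int  # number of PolymiRTS annotated in the gene's 3'-UTR


@dataclass(frozen=True)
class QTL:
    trait: str
    chrom: str
    start: int
    end: int
    lod: float


def genes_with_polymirts_in_qtl(qtl, genes, min_lod=2.8):
    """List genes overlapping the QTL interval that carry at least one PolymiRTS."""
    if qtl.lod <= min_lod:
        return []  # QTL does not pass the LOD > 2.8 mapping threshold
    return [
        gene for gene in genes
        if gene.chrom == qtl.chrom
        and gene.start <= qtl.end
        and gene.end >= qtl.start
        and gene.n_polymirts >= 1
    ]
```

Genes carrying nonsynonymous SNPs could be added to the candidate list with an analogous filter on a separate annotation field.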