Comparative Study
Genome Res. 2008 Jul;18(7):1180-9.
doi: 10.1101/gr.076117.108. Epub 2008 Apr 14.

Transcription factor and microRNA motif discovery: the Amadeus platform and a compendium of metazoan target sets

Chaim Linhart et al. Genome Res. 2008 Jul.

Abstract

We present a threefold contribution to the computational task of motif discovery, a key component in the effort to delineate the regulatory map of a genome: (1) We constructed a comprehensive, large-scale, publicly available compendium of transcription factor and microRNA target gene sets derived from diverse high-throughput experiments in several metazoans. We used the compendium as a benchmark for motif discovery tools. (2) We developed Amadeus, a highly efficient, user-friendly software platform for genome-scale detection of novel motifs, applicable to a wide range of motif discovery tasks. Amadeus improves upon extant tools in terms of accuracy, running time, output information, and ease of use, and is the only program that attained a high success rate on the metazoan compendium. (3) We demonstrate that by searching for motifs based on their genome-wide localization or chromosomal distributions (without using a predefined target set), Amadeus uncovers diverse known phenomena, as well as novel regulatory motifs.

Figures

Figure 1.
The main components of the Amadeus computational pipeline. The input consists of one or more target gene sets and various parameters such as the score(s) for evaluating the motifs. Starting from all k-mers, the algorithm runs a series of refinement phases that eventually converge to a nonredundant list of high-scoring PWMs. These motifs, together with additional information and analyses, are displayed in the graphical output. For more details, see Methods.
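The caption describes the pipeline as starting from all k-mers and scoring them against the target set. As a minimal sketch of that first step only, the Python below enumerates the k-mers present in target promoters and ranks each one by a hypergeometric tail probability: how surprising is the number of target promoters that contain it, given how many promoters overall contain it? It assumes that the HG score is a hypergeometric test and that the background consists of the non-target promoters; the function names are hypothetical, reverse complements are ignored, and none of the PWM refinement phases are shown. It is an illustration, not Amadeus's implementation.

```python
from math import comb


def hg_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N promoters, K containing the k-mer, n targets)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers in a sequence (forward strand only, a simplifying assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_kmers(targets: list[str], background: list[str], k: int) -> list[tuple[float, str]]:
    """Rank k-mers seen in targets by HG enrichment among all promoters, most enriched first."""
    target_sets = [kmers(s, k) for s in targets]
    all_sets = target_sets + [kmers(s, k) for s in background]
    N, n = len(all_sets), len(target_sets)
    scored = []
    for kmer in set().union(*target_sets):
        K = sum(kmer in s for s in all_sets)        # promoters containing the k-mer
        hits = sum(kmer in s for s in target_sets)  # target promoters containing it
        scored.append((hg_tail(N, K, n, hits), kmer))
    return sorted(scored)


# Hypothetical toy promoters
targets = ["ACGTTT", "GACGTA", "TTACGT"]
background = ["TTTTTT", "GGGGGG", "CCCCAA"]
print(rank_kmers(targets, background, k=4)[0])  # (0.05, 'ACGT')
```

Here ACGT occurs in all three targets and in no background promoter, so its tail probability is 1/C(6,3) = 0.05. A real pipeline would then seed PWMs from the top k-mers and refine them.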
Figure 2.
Screenshot of Amadeus. The left panel controls the input parameters (organism, target set, promoter region, scores, etc.). Here, Amadeus was executed on the set of genes expressed in G2 and G2/M phases of the human cell cycle (Whitfield et al. 2002). The top-scoring motif shown in the output panel on the right is CHR (cell-cycle genes homology region), a cis-regulatory element that was experimentally found in promoters of several G2/M genes (Zhu et al. 2004), and is not represented in TRANSFAC; the second motif is CCAAT-box (NF-Y). For each motif discovered, the output also lists similar patterns from TRANSFAC, information on the localization of its occurrences, and additional statistics. In agreement with recent studies (Linhart et al. 2005; Tabach et al. 2005), the motif-pairs analysis in Amadeus reports the de novo found CHR and NF-Y motifs as a cis-regulatory module that is highly specific to the G2 and G2/M cell-cycle phases (Supplemental Fig. 1). A screenshot with additional graphical features is shown in Supplemental Figure 2.
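For each motif, the output reports where its occurrences fall. One common way to locate PWM occurrences is to convert the PWM to log-odds scores against a background and report every window on either strand whose score passes a threshold. The sketch below shows that technique as an illustration, not as Amadeus's own procedure. The uniform background, the pseudocount, the threshold, and the CCAAT-like example matrix (built by a hypothetical `consensus_pwm` helper) are all assumptions.

```python
import math

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_pwm(motif: str, p: float = 0.97) -> list[list[float]]:
    """Hypothetical helper: a probability PWM putting weight p on each consensus base."""
    q = (1 - p) / 3
    return [[p if base == c else q for base in ALPHABET] for c in motif]


def log_odds(pwm, background=(0.25, 0.25, 0.25, 0.25), pseudo=0.01):
    """Convert probability columns (ACGT order) to log2-odds scores with a pseudocount."""
    return [
        [math.log2((col[j] + pseudo) / (1 + 4 * pseudo) / background[j]) for j in range(4)]
        for col in pwm
    ]


def scan(seq: str, lo, threshold: float):
    """Return (forward-strand start, strand, score) for every window scoring >= threshold."""
    w = len(lo)
    rc = seq.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, s in (("+", seq), ("-", rc)):
        for i in range(len(s) - w + 1):
            window = s[i:i + w]
            if any(c not in ALPHABET for c in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(lo[k][ALPHABET.index(c)] for k, c in enumerate(window))
            if score >= threshold:
                # map a minus-strand window back to forward-strand coordinates
                start = i if strand == "+" else len(seq) - i - w
                hits.append((start, strand, score))
    return hits


hits = scan("GGCCAATGG", log_odds(consensus_pwm("CCAAT")), threshold=8.0)
print(hits)
```

With this threshold only exact consensus matches pass, so the example reports a single forward-strand hit starting at position 2. The input is assumed to be uppercase DNA.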
Figure 3.
The metazoan target-set compendium and benchmark results on it. (A) The compendium of metazoan TF/miRNA target sets collected from the literature. The “Source” column indicates the experimental procedure or database from which the target set was derived: gene expression microarrays (Ex), ChIP-chip (CC), ChIP-DSL (C-DSL), DamID (van Steensel et al. 2001), or Gene Ontology (GO) database (Ashburner et al. 2000). For additional information and references, see http://acgt.cs.tau.ac.il/amadeus. (B) Performance of motif finding tools on each target set. Each successful motif recovery is marked by a gray-shaded box according to the PWM divergence (darker shades of gray indicate higher similarity of the recovered motif to the one in the literature); the ∞ symbol marks long executions (>48 h) that were aborted. Here, Amadeus was run with the HG enrichment score. The success-rate patterns of the six motif finders are almost identical across different target sets of the same TF. For example, in all three E2F data sets, Amadeus, Weeder, and Trawler are the only tools that recovered the correct motif; in the two Myod sets, Amadeus and Weeder succeeded with PWM divergence cutoff 0.18, AlignACE succeeded with cutoff 0.24, and MEME and YMF failed with all cutoffs. This consistency, observed for all six TFs that are represented by more than one set in our compendium, is not a result of large overlaps between the target sets, as such overlaps were avoided in the construction of the compendium. Instead, it likely stems from properties inherent to the TFs, such as the extent and type of degeneracy of their binding sites (BSs).
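The caption judges recovery by a PWM divergence compared against cutoffs such as 0.18 and 0.24, but does not define the measure here. The sketch below shows one plausible way to compare two PWMs, offered as an assumption rather than the paper's definition: the Euclidean distance between matching probability columns, scaled to [0, 1], averaged over the shorter motif at its best ungapped offset inside the longer one, trying both orientations. Values from this hypothetical measure are not calibrated to the cutoffs in the caption.

```python
import math


def column_distance(p, q):
    """Euclidean distance between two ACGT probability columns, scaled to [0, 1]."""
    return math.dist(p, q) / math.sqrt(2)


def reverse_complement(pwm):
    """With columns in ACGT order, reversing a column swaps A<->T and C<->G."""
    return [list(reversed(col)) for col in reversed(pwm)]


def pwm_divergence(m1, m2):
    """Hypothetical divergence: best mean column distance over offsets and strands."""
    short, long_ = (m1, m2) if len(m1) <= len(m2) else (m2, m1)
    best = math.inf
    for cand in (short, reverse_complement(short)):
        for off in range(len(long_) - len(cand) + 1):
            d = sum(column_distance(c, long_[off + i]) for i, c in enumerate(cand)) / len(cand)
            best = min(best, d)
    return best


def one_hot(motif):
    """Illustrative helper: a PWM with probability 1 on each consensus base."""
    return [[1.0 if b == c else 0.0 for b in "ACGT"] for c in motif]


print(pwm_divergence(one_hot("CCAAT"), one_hot("GGCCAATGG")))  # 0.0: embedded exact match
print(pwm_divergence(one_hot("CCAAT"), one_hot("ATTGG")))      # 0.0: reverse complement
```

Trying the reverse complement matters because a motif finder may report a binding site on either strand.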
Figure 4.
Performance of six motif finding tools on our compendium of metazoan target sets. (A) Success rates for three PWM divergence cutoffs, indicated by different shades of gray. The light-gray boxes on top of the Amadeus bars show the improved success rates when using the binned enrichment score (instead of the HG score; see Methods). Success rates for other PWM similarity measures and cutoffs are shown in Supplemental Figure 4. (B) Running times in logarithmic scale for the TF target sets (AlignACE and MEME did not finish within 48 h on several sets). Trawler is a web-based tool, so we could not measure its running time. For full results, see Supplemental Table 1 and http://acgt.cs.tau.ac.il/amadeus. A detailed comparison of all tested tools is given in Supplemental Table 2.
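A success rate in panel A is the fraction of target sets on which a tool's recovered motif falls within a given divergence cutoff. That bookkeeping can be sketched as follows. The divergence values are made up for illustration, `None` stands for a set on which the tool reported no matching motif or was aborted, and the two cutoffs are the ones quoted in the Figure 3 caption.

```python
from typing import Optional


def success_rates(divergences: list[Optional[float]], cutoffs: list[float]) -> dict[float, float]:
    """Fraction of target sets whose best-matching recovered motif is within each cutoff.

    divergences holds one value per target set; None means no matching motif was
    recovered (for example, the run failed or was aborted at the time limit).
    """
    n = len(divergences)
    return {c: sum(d is not None and d <= c for d in divergences) / n for c in cutoffs}


# Hypothetical divergences for one tool on four target sets
rates = success_rates([0.05, 0.20, None, 0.30], cutoffs=[0.18, 0.24])
print(rates)  # {0.18: 0.25, 0.24: 0.5}
```

Counting aborted runs as failures, as done here, means a slow tool is penalized in the success rate as well as in the running-time panel.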
Figure 5.
Genome-wide chromosomal preference analysis of C. elegans promoters. (A) Screenshot of Amadeus output, showing the top-scoring motif found in the analysis. The motif is highly overrepresented on chromosome IV (P = 8 × 10⁻⁶³). (B) The motif reported by Ruby et al. (2006), found upstream of many 21U-RNAs, is nearly identical to the one identified de novo by Amadeus.
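A chromosomal preference like the one in panel A can be phrased as an enrichment question: are promoters that contain the motif overrepresented among the promoters of one chromosome? The sketch below answers it with a one-sided hypergeometric test per chromosome. This is an assumed formulation, not a description of how Amadeus computes its P-value, and the gene names and chromosome assignments in the usage lines are hypothetical.

```python
from math import comb


def hg_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def chromosome_preference(gene_chrom: dict[str, str], motif_genes: set[str]):
    """Return the chromosome with the smallest enrichment P-value, plus all P-values."""
    N = len(gene_chrom)
    n = sum(g in motif_genes for g in gene_chrom)  # promoters containing the motif
    pvals = {}
    for chrom in sorted(set(gene_chrom.values())):
        on_chrom = [g for g, c in gene_chrom.items() if c == chrom]
        k = sum(g in motif_genes for g in on_chrom)  # motif-containing promoters here
        pvals[chrom] = hg_tail(N, len(on_chrom), n, k)
    best = min(pvals, key=pvals.get)
    return best, pvals


# Hypothetical toy genome: five promoters on chromosome I, five on IV
genes = {f"g{i}": ("I" if i < 5 else "IV") for i in range(10)}
best, pvals = chromosome_preference(genes, motif_genes={"g5", "g6", "g7", "g8"})
print(best, pvals[best])
```

In the toy example all four motif-containing promoters lie on chromosome IV, giving P = C(5,4)/C(10,4) = 5/210 for IV. With genome-scale counts, the same test can reach extremely small P-values such as the one reported in panel A. When testing every chromosome, a multiple-testing correction would also be appropriate.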

References

1. Aerts S., Thijs G., Dabrowski M., Moreau Y., De Moor B. Comprehensive analysis of the base composition around the transcription start site in Metazoa. BMC Genomics. 2004;5:34. doi: 10.1186/1471-2164-5-34.
2. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene Ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29.
3. Bailey T.L., Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1994;2:28–36.
4. Birney E., Andrews T.D., Bevan P., Caccamo M., Chen Y., Clarke L., Coates G., Cuff J., Curwen V., Cutts T., et al. An overview of Ensembl. Genome Res. 2004;14:925–928.
5. Blais A., Tsikitis M., Acosta-Alvear D., Sharan R., Kluger Y., Dynlacht B.D. An initial blueprint for myogenic differentiation. Genes & Dev. 2005;19:553–569.