Genome Research

Multiple variable first exons: a mechanism for cell- and tissue-specific gene regulation.

PMID 14672974


A large family of neural protocadherin (Pcdh) proteins is encoded by three closely linked mammalian gene clusters (alpha, beta, and gamma). The Pcdh alpha and gamma clusters have a striking genomic organization: each "variable" exon is spliced to a common set of downstream "constant" exons within its cluster. Recent studies demonstrated that the cell-specific expression of each Pcdh gene is determined by a combination of variable-exon promoter activation and cis-splicing of the corresponding variable exon to the first constant exon. To determine whether there are other similarly organized gene clusters in mammalian genomes, we performed a genome-wide search and identified a large number of mammalian genes containing multiple variable first exons. Here we describe several clusters that contain about a dozen variable exons arrayed in tandem, including the UDP glucuronosyltransferase (UGT1), plectin, neuronal nitric oxide synthase (NOS1), and glucocorticoid receptor (GR) genes. In all these cases, multiple variable first exons are each spliced to a common set of downstream constant exons to generate diverse functional mRNAs. As an example, we analyzed the tissue-specific expression profile of the mouse UGT1 repertoire and found that multiple isoforms are expressed in a tissue-specific manner. Therefore, this variable and constant genomic organization provides a genetic mechanism for directing distinct cell- and tissue-specific patterns of gene expression.
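
The abstract does not describe how the genome-wide search was carried out, so the following is only a minimal sketch of the organization it describes, not the authors' method. It assumes transcript annotations are available as lists of exon coordinates in transcription order, and it flags genes in which several distinct first exons are joined to an identical set of downstream exons. The function name, the input layout, and the toy gene names and coordinates are all hypothetical.

```python
from collections import defaultdict


def find_variable_first_exon_genes(transcripts, min_variable=2):
    """Flag genes whose transcripts differ in the first exon only.

    transcripts: iterable of (gene_name, exons), where exons is a list of
    (start, end) tuples in transcription order (hypothetical input layout).
    Returns {gene_name: sorted list of distinct first exons} for genes in
    which at least `min_variable` distinct first exons are joined to the
    same downstream ("constant") exons.
    """
    # gene -> constant-exon tuple -> set of first exons seen with it
    by_gene = defaultdict(lambda: defaultdict(set))
    for gene, exons in transcripts:
        if len(exons) < 2:
            continue  # a single-exon transcript has no constant region
        first, constant = exons[0], tuple(exons[1:])
        by_gene[gene][constant].add(first)

    result = {}
    for gene, groups in by_gene.items():
        # Keep the constant region shared by the most distinct first exons.
        firsts = max(groups.values(), key=len)
        if len(firsts) >= min_variable:
            result[gene] = sorted(firsts)
    return result


# Toy annotations (made-up coordinates, not real genome data).
EXAMPLE_TRANSCRIPTS = [
    ("toy_cluster", [(100, 200), (5000, 5100), (5300, 5400)]),
    ("toy_cluster", [(1000, 1100), (5000, 5100), (5300, 5400)]),
    ("toy_cluster", [(2000, 2100), (5000, 5100), (5300, 5400)]),
    ("toy_single", [(10, 50), (300, 400)]),
]

if __name__ == "__main__":
    print(find_variable_first_exon_genes(EXAMPLE_TRANSCRIPTS))
```

Requiring the downstream exons to match exactly is a deliberate simplification; real annotations would also need to handle strand, alternative splicing within the constant region, and incomplete transcripts.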