Journal of virology

Sequence variability, gene structure, and expression of full-length human endogenous retrovirus H.

PMID 15858016


Recently, we identified and classified 926 human endogenous retrovirus H (HERV-H)-like proviruses in the human genome. In this paper, we used that information to reconstruct a putative ancestral HERV-H in silico. A calculated consensus sequence was nearly open in all genes. A few manual adjustments resulted in a putative 9-kb HERV-H provirus with open reading frames (ORFs) in gag, pro, pol, and env. The long terminal repeats (LTRs) differed by 1.1%, indicating proximity to an integration event. The gag ORF extended upstream of the normal myristylation start site. A long leader region (including a "pre-gag" ORF) was positioned like the N terminus of murine leukemia virus (MLV) "glyco-Gag," potentially encoding a proline- and serine-rich domain remotely similar to MLV pp12. Another ORF, starting inside the 5' LTR, had no obvious similarity to known protein domains. Unlike other gammaretroviruses described so far, the reconstructed Gag had two zinc finger motifs. Alternative splicing of sequences related to the HERV-H consensus was confirmed using dbEST data. env transcripts were most prevalent in colon tumors but were also found in normal testis. We found no evidence for full-length env transcripts in dbEST. HERV-H had a markedly skewed nucleotide composition, disfavoring guanine and favoring cytosine. We conclude that the HERV-H consensus shared a gene arrangement common to gammaretroviruses, with gag separated by a stop codon from pro-pol in the same reading frame, while env resides in another reading frame. Alternative splicing also occurred. The HERV-H consensus yielded new insights into gammaretroviral evolution and will be useful as a model in studies on expression and function.
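The abstract relies on three simple sequence computations: a consensus called from aligned proviruses, a percent difference between the two LTRs, and a nucleotide composition. The sketch below illustrates each on toy data. It is not the authors' pipeline: the function names, the plain majority rule, the gap handling, and the example sequences are all assumptions made for illustration, and the input is assumed to be pre-aligned.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Majority-rule consensus of pre-aligned, equal-length sequences.

    Columns whose most common symbol is a gap are dropped. Ties go to the
    symbol seen first in the column (an arbitrary choice for this sketch).
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    consensus = []
    for column in zip(*aligned):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


def p_distance(a, b, gap="-"):
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def base_composition(seq):
    """Relative frequency of A, C, G, and T, ignoring any other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {b: counts[b] / total for b in "ACGT"}


if __name__ == "__main__":
    # Toy alignment, not real HERV-H data.
    toy_alignment = ["ACCTCA", "ACCTCA", "ACGT-A"]
    consensus = majority_consensus(toy_alignment)
    print("consensus:", consensus)
    print("5'/3' LTR p-distance:", p_distance("ACCTCA", "ACGTCA"))
    print("composition:", base_composition(consensus))
```

An uncorrected p-distance is the simplest possible measure; a real LTR comparison would start from a proper pairwise alignment and might apply a substitution model, which this sketch does not attempt.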