We are studying the multigene family encoding the fucoxanthin-chlorophyll binding proteins (fcp genes), which constitute the major component of the photosystem II-associated light-harvesting complex in diatoms and brown algae. Here we describe the characteristics of fcp gene clusters in the genome of the diatom Phaeodactylum tricornutum. Sequence analysis of two genomic clones, PT5 and PT4, demonstrated the presence of four fcp genes (fcpA, fcpB, fcpC, fcpD) on the former and two fcp genes (fcpE, fcpF) on the latter. The proteins encoded by the six characterized fcp genes range in similarity from 86% to 99%. The genes within each cluster are separated by short intergenic sequences of 0.5 to 1.1 kb. None of these genes contains introns, and all appear to be transcribed with short 5' untranslated leader sequences; the transcription initiation sites were mapped 26 to 48 bases upstream of the ATG translation start site. Small conserved motifs are found in all of the genes just upstream of both the translation and the transcription start sites. Codon bias is similar in all of the fcp genes, with a predominance of pyrimidines in the third codon position of the four-codon families. The two most similar genes, fcpC and fcpD, might represent a recent gene duplication. Southern analyses using fcp cDNAs as hybridization probes suggest that the P. tricornutum genome may contain additional sequences that resemble the characterized fcp sequences.
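The codon bias noted above (pyrimidines predominating in the third position of four-codon families) can be illustrated with a minimal sketch. It is not the procedure used in this study. It assumes that "four-codon families" means the amino acids encoded by exactly four codons in the standard genetic code (Val, Pro, Thr, Ala, Gly). The function name, the prefix set, and the toy sequence are all hypothetical and are not derived from any fcp gene.

```python
from collections import Counter

# Assumption: "four-codon families" means amino acids encoded by exactly four
# codons in the standard genetic code (Val, Pro, Thr, Ala, Gly), identified
# here by the first two bases of the codon.
FOUR_CODON_PREFIXES = frozenset({"GT", "CC", "AC", "GC", "GG"})
PYRIMIDINES = frozenset("CT")


def third_position_pyrimidine_fraction(cds, prefixes=FOUR_CODON_PREFIXES):
    """Return the fraction of four-codon-family codons that end in C or T.

    `cds` is a coding sequence read in frame from its first base.
    Returns None if no codon from the chosen families is present.
    """
    seq = cds.upper().replace("U", "T")
    counts = Counter()
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        # Skip codons outside the chosen families or with an ambiguous third base.
        if codon[:2] in prefixes and codon[2] in "ACGT":
            counts[codon[2] in PYRIMIDINES] += 1
    total = counts[True] + counts[False]
    return counts[True] / total if total else None


# Hypothetical toy sequence, not taken from any fcp gene.
toy_cds = "ATGGCTGTCCCAACTGGCTAA"
print(third_position_pyrimidine_fraction(toy_cds))  # prints 0.8
```

In the toy sequence, five codons (GCT, GTC, CCA, ACT, GGC) belong to the chosen families and four of them end in C or T. Passing a different `prefixes` set, for example one that adds CT, TC, and CG, would also count the fourfold-degenerate codon boxes of Leu, Ser, and Arg.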