(1993). "A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. The Huntington's Disease Collaborative Research Group." Cell 72(6): 971-83.

The Huntington's disease (HD) gene has been mapped in 4p16.3 but has eluded identification. We have used haplotype analysis of linkage disequilibrium to spotlight a small segment of 4p16.3 as the likely location of the defect. A new gene, IT15, isolated using cloned trapped exons from the target area contains a polymorphic trinucleotide repeat that is expanded and unstable on HD chromosomes. A (CAG)n repeat longer than the normal range was observed on HD chromosomes from all 75 disease families examined, comprising a variety of ethnic backgrounds and 4p16.3 haplotypes. The (CAG)n repeat appears to be located within the coding sequence of a predicted approximately 348 kd protein that is widely expressed but unrelated to any known gene. Thus, the HD mutation involves an unstable DNA segment, similar to those described in fragile X syndrome, spino-bulbar muscular atrophy, and myotonic dystrophy, acting in the context of a novel 4p16.3 gene to produce a dominant phenotype.

Akiyama, K., T. Yoshii, et al. (2002). "Analysis of the polymorphic structure of the D7S808-short tandem repeat (STR) locus." Leg Med (Tokyo) 4(3): 178-81.

We analyzed the polymorphic structure of the short tandem repeat (STR) (AARG) locus D7S808 by DNA sequencing and examined the D7S808 allele distribution in a Japanese population. The sequence analysis confirmed that this locus consists of repeats of the tetranucleotides cttt and cctt, but that the number of repeats of the cctt motif does not vary with the allele, and that this STR polymorphism is due to variation in the number of cttt repeats alone. Although the results in this study suggest that the numbers of repeats range from 7 (allele 7) to 22 (allele 22), alleles 9, 10, 19, and 21 were not observed in the Japanese samples examined. Analysis of DNA samples from 355 unrelated individuals revealed the occurrence of 286 heterozygotes (observed heterozygosity 80.6%). Alleles 15, 14, 16, and 17 had high frequencies of 0.261, 0.192, 0.166, and 0.120, respectively and, together with allele 7 with a slightly high frequency of 0.059, showed a bimodal distribution. In addition, we prepared primers yielding shorter amplification products (232-292 bp) than those (435-480 bp) obtained with the originally reported primers. The newly designed primers can be used for polymerase chain reaction, making this locus extremely useful in forensic science practice.
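The observed heterozygosity quoted in this abstract follows directly from the two reported counts. A minimal sketch of the arithmetic (the function name is illustrative and not from the paper):

```python
def observed_heterozygosity(n_heterozygotes: int, n_individuals: int) -> float:
    """Fraction of genotyped individuals that carry two different alleles."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return n_heterozygotes / n_individuals


# Counts reported in the abstract: 286 heterozygotes among 355 individuals.
h_obs = observed_heterozygosity(286, 355)
print(f"{h_obs:.1%}")  # 80.6%
```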

Al-Shahrour, F., R. Diaz-Uriarte, et al. (2004). "FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes." Bioinformatics 20(4): 578-80.

SUMMARY: We present a simple but powerful procedure to extract Gene Ontology (GO) terms that are significantly over- or under-represented in sets of genes within the context of a genome-scale experiment (DNA microarray, proteomics, etc.). Said procedure has been implemented as a web application, FatiGO, allowing for easy and interactive querying. FatiGO, which takes the multiple-testing nature of statistical contrast into account, currently includes GO associations for diverse organisms (human, mouse, fly, worm and yeast) and the TrEMBL/Swissprot GOAnnotations@EBI correspondences from the European Bioinformatics Institute. AVAILABILITY: http://fatigo.bioinfo.cnio.es
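The abstract does not say which statistical test or multiple-testing correction FatiGO applies. As a rough illustration of the general idea only, the sketch below assumes a one-sided hypergeometric test for over-representation of a single GO term, followed by a Benjamini-Hochberg adjustment across terms. All function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    max_k = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, max_k + 1))
    return favourable / total


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: 1000 genes in the experiment, 50 annotated with the term,
# 20 genes in the selected group, 5 of which carry the term.
p_term = hypergeom_upper_tail(k=5, K=50, n=20, N=1000)
print(p_term)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

A real analysis would also test for under-representation, which the abstract mentions, and would compute one p-value per GO term before adjusting.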

Allen, A., D. A. Hutton, et al. (1998). "The MUC2 gene product: a human intestinal mucin." Int J Biochem Cell Biol 30(7): 797-801.

The MUC2 gene product is the first human secretory mucin protein core to be fully sequenced. Like the other eight human MUC genes identified to date, MUC2 is characterised by tandem and irregular repeat sequences rich in threonine and serine, the potential sites of attachment of the oligosaccharide chains. The MUC2 gene product is more than 5100 amino acids in its commonest allelic form and accounts for one fifth by weight of the mucin glycoprotein molecule (80% oligosaccharide side chains). The MUC2 product is polymerised end to end through disulphide bridges to form large secreted polymeric gel-forming mucins (Mr approximately 10^7). The primary function of the MUC2 gene product is to provide a protective barrier between the epithelial surfaces and the gut lumen. There is decreased expression of MUC2 in colonic cancer and defective polymerisation of secreted mucin in ulcerative colitis. Elucidation of the MUC2 and other mucin gene sequences has opened the way for a full structural characterisation and an improved understanding of the structure and function of these complex mucus gel secretions.

Altschul, S. F., T. L. Madden, et al. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res 25(17): 3389-402.

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
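The position-specific score matrix mentioned in this abstract can be illustrated with a deliberately simplified sketch. It assumes an ungapped alignment, a uniform background distribution, and a flat pseudocount. PSI-BLAST itself uses sequence weighting and more elaborate pseudocount and background estimates, so this is not the published method; the function names are illustrative.

```python
from math import log2

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Build log-odds scores per alignment column (uniform background assumed)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: log2((counts[aa] / total) / background) for aa in ALPHABET})
    return pssm


def score_segment(pssm: list[dict[str, float]], segment: str) -> float:
    """Sum the column scores for a segment of the same length as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Toy alignment, purely illustrative.
toy_pssm = build_pssm(["ACD", "ACE", "ACD"])
print(score_segment(toy_pssm, "ACD") > score_segment(toy_pssm, "WWW"))  # True
```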

Andrade, M. A., C. Perez-Iratxeta, et al. (2001). "Protein repeats: structures, functions, and evolution." J Struct Biol 134(2-3): 117-31.

Internal repetition within proteins has been a successful stratagem on multiple separate occasions throughout evolution. Such protein repeats possess regular secondary structures and form multirepeat assemblies in three dimensions of diverse sizes and functions. In general, however, internal repetition affords a protein enhanced evolutionary prospects due to an enlargement of its available binding surface area. Constraints on sequence conservation appear to be relatively lax, due to binding functions ensuing from multiple, rather than single, repeats. Considerable sequence divergence as well as the short lengths of sequence repeats mean that repeat detection can be a particularly arduous task. We also consider the conundrum of how multiple repeats, which show strong structural and functional interdependencies, ever evolved from a single repeat ancestor. In this review, we illustrate each of these points by referring to six prolific repeat types (repeats in beta-propellers and beta-trefoils and tetratricopeptide, ankyrin, armadillo/HEAT, and leucine-rich repeats) and in other less-prolific but nonetheless interesting repeats.

Andrade, M. A., C. P. Ponting, et al. (2000). "Homology-based method for identification of protein repeats using statistical significance estimates." J Mol Biol 298(3): 521-37.

Short protein repeats, frequently with a length between 20 and 40 residues, represent a significant fraction of known proteins. Many repeats appear to possess high amino acid substitution rates and thus recognition of repeat homologues is highly problematic. Even if the presence of a certain repeat family is known, the exact locations and the number of repetitive units often cannot be determined using current methods. We have devised an iterative algorithm based on optimal and sub-optimal score distributions from profile analysis that estimates the significance of all repeats that are detected in a single sequence. This procedure allows the identification of homologues at alignment scores lower than the highest optimal alignment score for non-homologous sequences. The method has been used to investigate the occurrence of eleven families of repeats in Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens accounting for 1055, 2205 and 2320 repeats, respectively. For these examples, the method is both more sensitive and more selective than conventional homology search procedures. The method allowed the detection in the SwissProt database of more than 2000 previously unrecognised repeats belonging to the 11 families. In addition, the method was used to merge several repeat families that previously were supposed to be distinct, indicating common phylogenetic origins for these families.

Apostol, B. L., A. Kazantsev, et al. (2003). "A cell-based assay for aggregation inhibitors as therapeutics of polyglutamine-repeat disease and validation in Drosophila." Proc Natl Acad Sci U S A.

The formation of polyglutamine-containing aggregates and inclusions is a hallmark of pathogenesis in Huntington's disease that can be recapitulated in model systems. Although the contribution of inclusions to pathogenesis is unclear, cell-based assays can be used to screen for chemical compounds that affect aggregation and may provide therapeutic benefit. We have developed inducible PC12 cell-culture models to screen for loss of visible aggregates. To test the validity of this approach, compounds that inhibit aggregation in the PC12 cell-based screen were tested in a Drosophila model of polyglutamine-repeat disease. The disruption of aggregation in PC12 cells strongly correlates with suppression of neuronal degeneration in Drosophila. Thus, the engineered PC12 cells coupled with the Drosophila model provide a rapid and effective method to screen and validate compounds.

Ashburner, M., C. A. Ball, et al. (2000). "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium." Nat Genet 25(1): 25-9.

Bejerano, G., D. Haussler, et al. (2004). "Into the heart of darkness: large-scale clustering of human non-coding DNA." Bioinformatics 20 Suppl 1: I40-I48.

MOTIVATION: It is currently believed that the human genome contains about twice as much non-coding functional regions as it does protein-coding genes, yet our understanding of these regions is very limited. RESULTS: We examine the intersection between syntenically conserved sequences in the human, mouse and rat genomes, and sequence similarities within the human genome itself, in search of families of non-protein-coding elements. For this purpose we develop a graph theoretic clustering algorithm, akin to the highly successful methods used in elucidating protein sequence family relationships. The algorithm is applied to a highly filtered set of about 700 000 human-rodent evolutionarily conserved regions, not resembling any known coding sequence, which encompasses 3.7% of the human genome. From these, we obtain roughly 12 000 non-singleton clusters, dense in significant sequence similarities. Further analysis of genomic location, evidence of transcription and RNA secondary structure reveals many clusters to be significantly homogeneous in one or more characteristics. This subset of the highly conserved non-protein-coding elements in the human genome thus contains rich family-like structures, which merit in-depth analysis. AVAILABILITY: Supplementary material to this work is available at http://www.soe.ucsc.edu/~jill/dark.html
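The abstract describes a graph theoretic clustering algorithm without giving its details. As a stand-in only, the sketch below groups regions into connected components of a similarity graph using union-find, and keeps the non-singleton clusters. The real method is likely more selective than plain connected components; the function name and toy input are hypothetical.

```python
def cluster_by_similarity(n_regions: int, similar_pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Return non-singleton connected components of the similarity graph."""
    parent = list(range(n_regions))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: dict[int, list[int]] = {}
    for region in range(n_regions):
        clusters.setdefault(find(region), []).append(region)
    return [members for members in clusters.values() if len(members) > 1]


# Toy input: six regions, three significant pairwise similarities.
toy_clusters = cluster_by_similarity(6, [(0, 1), (1, 2), (4, 5)])
print(toy_clusters)  # [[0, 1, 2], [4, 5]]
```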

Bejerano, G., M. Pheasant, et al. (2004). "Ultraconserved elements in the human genome." Science 304(5675): 1321-5.

There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.

Benson, G. (1999). "Tandem repeats finder: a program to analyze DNA sequences." Nucleic Acids Res 27(2): 573-80.

A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.
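To make the definition of a tandem repeat concrete, here is a naive sketch that finds only exact contiguous copies by brute force over candidate unit lengths. It is not the Tandem Repeats Finder algorithm, which handles approximate copies and uses statistical recognition criteria. The function names, the `max_period` limit and the `min_copies` threshold are assumptions chosen for illustration.

```python
def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repetition of a shorter unit."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_exact_tandem_repeats(seq: str, max_period: int = 6, min_copies: int = 3):
    """Return (start, unit, copies) for each run of exact tandem copies."""
    seq = seq.upper()
    hits = []
    for period in range(1, max_period + 1):
        i = 0
        while i + period <= len(seq):
            unit = seq[i:i + period]
            copies = 1
            # Count how many further copies of the unit follow immediately.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, unit, copies))
                i += copies * period  # skip past the run just reported
            else:
                i += 1
    return hits


print(find_exact_tandem_repeats("TTCAGCAGCAGCAGAA"))  # [(2, 'CAG', 4)]
```

The primitive-unit check keeps a (CAG)n run from being reported a second time as a repeat of the longer unit CAGCAG.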

Borstnik, B. and D. Pumpernik (2002). "Tandem repeats in protein coding regions of primate genes." Genome Res 12(6): 909-15.

Tandem repeats in GenBank primate nucleotide sequences annotated as protein coding regions are analyzed. It is found that only trinucleotide repeats show repeat enrichment well above the threshold of statistical significance. The statistics are improved by a simultaneous search for repeats on both the amino acid and nucleotide levels. The results of the analyses of natural sequences are interpreted by comparing them with the results of the computer simulation of the model dedicated to protein coding regions. According to the simulation results, a limited set of trinucleotides, that is, cgg, ccg, cag, and gaa repeats coding for polyalanine, polyglycine, polyproline, polyglutamine, and polylysine are prone to proliferation. It is also found that within the repeat regions slippage is more frequent by a factor of 10 than point mutations, whereas the ratio of silent versus recognizable point mutations is approximately the same as elsewhere in coding regions. The trinucleotide repeats cover slightly more than 0.3% of the protein coding regions of genes.

Bratt, O., A. Borg, et al. (1999). "CAG repeat length in the androgen receptor gene is related to age at diagnosis of prostate cancer and response to endocrine therapy, but not to prostate cancer risk." Br J Cancer 81(4): 672-6.

The length of the polymorphic CAG repeat in the N-terminal of the androgen receptor (AR) gene is inversely correlated with the transactivation function of the AR. Some studies have indicated that short CAG repeats are related to higher risk of prostate cancer. We performed a case-control study to investigate relations between CAG repeat length and prostate cancer risk, tumour grade, tumour stage, age at diagnosis and response to endocrine therapy. The study included 190 AR alleles from prostate cancer patients and 186 AR alleles from female control subjects. All were whites from southern Sweden. The frequency distribution of CAG repeat length was strikingly similar for cases and controls, and no significant correlation between CAG repeat length and prostate cancer risk was detected. However, for men with non-hereditary prostate cancer (n = 160), shorter CAG repeats correlated with younger age at diagnosis (P = 0.03). There were also trends toward associations between short CAG repeats and high grade (P = 0.07) and high stage (P = 0.07) disease. Furthermore, we found that patients with long CAG repeats responded better to endocrine therapy, even after adjusting for pretreatment level of prostate-specific antigen and tumour grade and stage (P = 0.05). We conclude that short CAG repeats in the AR gene correlate with young age at diagnosis of prostate cancer, but not with higher risk of the disease. Selection of patients with early onset prostate cancer in case-control studies could therefore lead to an over-estimation of the risk of prostate cancer for men with short CAG repeats. An association between long CAG repeats and good response to endocrine therapy was also found, but the mechanism and clinical relevance are unclear.

Bugert, P., M. M. Hoffmann, et al. (2003). "The variable number of tandem repeat polymorphism in the P-selectin glycoprotein ligand-1 gene is not associated with coronary heart disease." J Mol Med 81(8): 495-501.

Genes involved in inflammatory processes are candidates for predisposition to prothrombotic syndromes. The variable number of tandem repeat (VNTR) polymorphism in the P-selectin glycoprotein ligand (PSGL)-1 gene has been associated with ischemic cerebrovascular disease but not with coronary heart disease (CHD). We assessed the effect of the VNTR polymorphism on CHD in two independent case/control studies. In the first study 281 CHD patients and 397 healthy blood donors were genotyped for the VNTR alleles in PSGL-1. The prevalence of homozygous carriers of the PSGL-1 VNTR allele with 15 repeat units was significantly higher in the CHD patients (5.3% vs. 1.5%) than in controls, suggesting an effect of this marker in CHD. To validate the findings genotyping was performed in a second study including 2,578 CHD patients, 731 patients without CHD, and 1084 healthy blood donors. The larger case control study had a power of 99.9% to detect the initially observed difference but failed to confirm the putative role of PSGL-1 VNTR polymorphism in CHD. Frequencies of the PSGL-1 VNTR 15 repeats for homozygous carriers were 2.2% in healthy blood donors, 2.3% in patients without CHD and 2.7%, in CHD cases, respectively. These results demonstrate that the PSGL-1 VNTR polymorphism is not a genetic risk factor for CHD. Adequately powered studies are prerequisites to obtain reliable results about genotype-phenotype relationships of new candidate genes in complex diseases.

Cai, Q., Y. T. Gao, et al. (2003). "Association of Breast Cancer Risk with a GT Dinucleotide Repeat Polymorphism Upstream of the Estrogen Receptor-alpha Gene." Cancer Res 63(18): 5727-5730.

Recent studies suggest that genetic polymorphisms of the estrogen receptor-alpha (ER-alpha) gene may be associated with breast cancer risk. To evaluate the role of this gene in the risk of breast cancer, we genotyped a newly identified GT dinucleotide repeat [(GT)(n)] polymorphism located in the promoter region (6.6 kb upstream of the transcription start site) in 947 breast cancer cases and 993 age frequency-matched community controls from a population-based case-control study conducted among Chinese in urban Shanghai. Sixteen alleles were identified, the most common one having 16 GT repeats [(GT)(16)]. Compared with subjects homozygous for this allele, subjects carrying the (GT)(17) or (GT)(18) allele had a decreased risk of breast cancer. The odds ratios (ORs) were 0.81 [95% confidence interval (CI), 0.62-1.06] and 0.58 (95% CI, 0.36-0.94), respectively, for one and two copies of the (GT)(17) or (GT)(18) allele. The inverse association with carrying either of these alleles was stronger among women with >30 years of menstrual cycles (OR 0.66; 95% CI 0.51-0.85) than those with a shorter duration of menstrual cycles (OR 0.97; 95% CI 0.73-1.27), and the test for an interaction was statistically significant (P = 0.04). Among breast cancer patients, the presence of either the (GT)(17) or (GT)(18) allele was associated with a reduced expression of progesterone receptor. Results of this study indicate that the GT dinucleotide repeat polymorphism in ER-alpha gene promoter region may be a new biomarker for genetic susceptibility to breast cancer.

Callahan, J. L., K. J. Andrews, et al. (2003). "Mutations in Yeast Replication Proteins That Increase CAG/CTG Expansions Also Increase Repeat Fragility." Mol Cell Biol 23(21): 7849-60.

Expansion of trinucleotide repeats (TNRs) is the causative mutation in several human genetic diseases. Expanded TNR tracts are both unstable (changing in length) and fragile (displaying an increased propensity to break). We have investigated the relationship between fidelity of lagging-strand replication and both stability and fragility of TNRs. We devised a new yeast artificial chromosome (YAC)-based assay for chromosome breakage to analyze fragility of CAG/CTG tracts in mutants deficient for proteins involved in lagging-strand replication: Fen1/Rad27, an endo/exonuclease involved in Okazaki fragment maturation, the nuclease/helicase Dna2, RNase HI, DNA ligase, polymerase delta, and primase. We found that deletion of RAD27 caused a large increase in breakage of short and long CAG/CTG tracts, and defects in DNA ligase and primase increased breakage of long tracts. We also found a correlation between mutations that increase CAG/CTG tract breakage and those that increase repeat expansion. These results suggest that processes that generate strand breaks, such as faulty Okazaki fragment processing or DNA repair, are an important source of TNR expansions.

Calo, C. M., L. Varesi, et al. (2003). "A pentanucleotide repeat polymorphism (TTTTA) in the apolipoprotein (a) gene--its distribution and its association with the risk of cardiovascular disease." Coll Antropol 27(1): 105-15.

Apolipoprotein (a) is a component of lipoprotein (a). Several studies have shown an association between the risk of coronary heart disease and the size of apo(a) isoforms, although this issue is still controversial. Recent research has focused on the pentanucleotide (TTTTA) repeat, highlighting a statistical correlation between low Lp(a) levels and high repeat numbers. In the present paper we studied the distribution of the apo(a) pentanucleotide polymorphism among populations from Corsica, and we then compared it with other populations from Europe, Africa and Asia. The results underscored the usefulness of these markers in population genetics analysis. We later investigated the possible association of the apo(a) pentanucleotide polymorphism with serum lipid levels in two samples from Corsica (France): one comprising patients or individuals at high risk of future coronary heart disease, and the other a control sample. No significant differences between the two groups were found, but the analysis of variance showed a significant association between different genotypes and cholesterol and LDL serum levels.

Chen, L. S., F. Tassone, et al. (2003). "The (CGG)n repeat element within the 5'untranslated region of the FMR1 message provides both positive and negative cis effects on in vivo translation of a downstream reporter." Hum Mol Genet.

The human fragile X mental retardation 1 (FMR1) gene contains a polymorphic (CGG) trinucleotide repeat element in its 5' untranslated region. Expansion of the (CGG)n element beyond 200 repeats (full mutation range) generally leads to transcriptional silencing; consequent loss of the FMR1 protein (FMRP) results in fragile X syndrome, the most frequent form of inherited mental impairment. For carriers of smaller expansions (55 ≤ n ≤ 200; premutation range), FMRP levels are gradually reduced with increasing repeat number, despite elevated FMR1 mRNA levels, suggesting that translation is impeded within the premutation range. To examine in more detail the influence of the CGG repeat on translation, CMV immediate-early promoter constructs, containing the FMR1 5'UTR with various (CGG)n repeat lengths (0 ≤ n ≤ 99) and a downstream (luciferase) reporter, were transfected into two human cell lines, a neural cell-derived line (SK) and a fetal kidney cell-derived line (293). For both cell types, the CGG element exerts distinct effects on reporter expression, depending on the length of the repeat. For n ≥ 30, luciferase expression decreases with increasing repeat length, consistent with earlier observations of decreased FMRP expression in peripheral blood leucocytes over the same repeat range, despite a slight increase in mRNA level for the larger repeats. Surprisingly, for smaller alleles (0 ≤ n ≤ 30), reporter expression actually increases by nearly two-fold with increasing repeat length in the absence of any change in mRNA level. These results suggest that the CGG repeat element can exert both positive (n<30) and negative (n>30) effects on translation. Interestingly, optimal translation appears to occur near the modal repeat number within the general human population.

Clark, P. E., R. A. Irvine, et al. (2003). "The androgen receptor CAG repeat and prostate cancer risk." Methods Mol Med 81: 255-66.

Cleary, J. D. and C. E. Pearson (2003). "The contribution of CIS-elements to disease-associated repeat instability: clinical and experimental evidence." Cytogenet Genome Res 100(1-4): 25-55.

Alterations in the length (instability) of gene-specific microsatellites and minisatellites are associated with at least 35 human diseases. This review will discuss the various CIS-elements that contribute to repeat instability, primarily through examination of the most abundant disease-associated repetitive element, trinucleotide repeats. For the purpose of this review, we define CIS-elements to include the sequence of the repeat units, the length and purity of the repeat tracts, the sequences flanking the repeat, as well as the surrounding epigenetic environment, including DNA methylation and chromatin structure. Gender-, tissue-, developmental- and locus-specific CIS-elements in conjunction with TRANS-factors may facilitate instability through the processes of DNA replication, repair and/or recombination. Here we review the available human data that supports the involvement of CIS-elements in repeat instability with limited reference to model systems. In diverse tissues at different developmental times and at specific loci, repetitive elements display variable levels of instability, suggesting vastly different mechanisms may be responsible for repeat instability amongst the disease loci and between various tissues.

De Fonzo, V., E. Bersani, et al. (1998). "Are only repeated triplets guilty?" J Theor Biol 194(1): 125-42.

It is well known that in some places of the human genome one finds a variable number of tandem repeats of trinucleotides; it is now commonly acknowledged that in many cases an excessive expansion of such a number is the cause of nervous system diseases. Moreover there exist cases of genetic disorders linked with loci where a variable number of tandem repeats of sequences longer than three bases has been found. The abnormal number of these repeats in a few cases has been associated with the onset of the disease. Considering the above facts, we have performed an extensive study of published sequences of genes connected with various diseases. We have examined, inside or near those genes, all possible tandem repeats. The analysis has led to the detection of a large number of repeats of both triplets and longer sequences, many of which, as far as we know, had not been pointed out before. The results of our analysis lead us to put forward the hypothesis that in more cases than those established until now, a variable number of tandem repeats of generic sequences, not only of triplets, could be associated with disease onset. Finally we suggest directing experimental research toward all the possible tandem repeats and their possible correlation with the neurodegenerative disorders and with other kinds of syndromes.

De Luca, A., M. Rizzardi, et al. (2003). "Association of dopamine D4 receptor (DRD4) exon III repeat polymorphism with temperament in 3-year-old infants." Neurogenetics.

The long forms of the dopamine D4 receptor (DRD4) exon III repeat polymorphism (L-DRD4) have been linked in some studies to the adult personality trait of novelty seeking (NS), as well as to infant personality traits related to interest and activity. The current investigation extends the results of our previous longitudinal study on 1- to 5-month-old neonates assessed by the Early and Revised Infancy Temperament Questionnaire (EITQ/RITQ), in which we found a significant correlation between the DRD4 polymorphism and the adaptability trait at 1 month of age. In this study, we examined the relationship between children's behavior at 3 years of age, measured with the Toddler Temperament Scale (TTS), and DRD4 exon III repeat polymorphism. We found a significant association between the behavioral dimension of intensity of reaction and DRD4 genotypes. Current data failed to confirm the association with the adaptability trait. None of the extraversion and/or exploratory behavior measures was related to the L-DRD4 allele, as expected. In contrast, children with 4/7 genotypes showed worse response to new stimuli compared with 4/4 genotypes. This study corroborates only in part previous results on the link between the DRD4 gene and human temperament.

Denoeud, F. and G. Vergnaud (2004). "Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: a web-based resource." BMC Bioinformatics 5(1): 4.

BACKGROUND: Polymorphic tandem repeat typing is a new generic technology which has been proved to be very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila, Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. The development of an assay then requires the evaluation of tandem repeat polymorphism on well-selected sets of isolates. In the case of major human pathogens, such as S. aureus, more than one strain is being sequenced, so that tandem repeats most likely to be polymorphic can now be selected in silico based on genome sequence comparison. RESULTS: In addition to the previously described general Tandem Repeats Database, we have developed a tool to automatically identify tandem repeats of a different length in the genome sequence of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed in a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species, and the orthopox viruses, including the variola virus and three of its close neighbors. CONCLUSIONS: We are presenting an internet-based resource to help develop and perform tandem repeats based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats differing between different genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes. The "Bacterial Genotyping Page" is a service for strain identification at the subspecies level.

Dick, I. M., A. Devine, et al. (2004). "Association of an aromatase TTTA repeat polymorphism with circulating estrogen, bone structure and biochemistry in older women." Am J Physiol Endocrinol Metab.

Osteoporosis is a disease that is strongly genetically determined. Aromatase converts androgens to estradiol in postmenopausal women; therefore, polymorphisms of the gene for this enzyme may be associated with bone mass and fracture. We investigated the association of the TTTA microsatellite polymorphism in intron 4 of the aromatase (CYP19) gene with bone mineral density (BMD) and fracture in 1257 women aged 70 years and greater. The data obtained was stratified based on the presence or absence of a [TTTA]n of 7 (A2), determined from a preliminary analysis of hip dual energy X ray absorptiometry (DXA) BMD, which was present in 27% of the population. The presence of an A2 allele was associated with a higher free estradiol index (FEI) (0.52 +/- 0.49, p=0.049) compared to the absence of an A2 allele (0.47 +/- 0.45); higher BMD at all sites of the hip (3.4% total hip, 2.3% femoral neck, 3.6% intertrochanter, 4.1% trochanter) and the lumbar spine (12.7%); higher values for the calcaneal quantitative ultrasound (QUS) parameters BUA (1.3%), SOS (0.4%) and stiffness (3.7%) and higher peripheral quantitative computed tomography (pQCT) measures for total (3.4%), trabecular (3.3%) and cortical BMD (3.3%) and the derived stress strain index (SSI) parameters SSI polar (6.4%) and SSI x (6.8%) values. A lower deoxypyridinoline creatinine ratio (DpdCr) was observed in subjects with an A2 allele (30.3 +/- 10.4 vs 27.1 +/- 9.1, p=0.03). The A2 allele was associated with a lower prevalence of vertebral fracture in subjects who were osteoporotic (Odds Ratio 0.27, CI 0.09-0.79). Therefore, a common polymorphism of the aromatase gene, perhaps in linkage disequilibrium with a functionally significant CYP19 polymorphism, is associated with bone structure and bone turnover, either by local effects or by effects on circulating bioactive estrogen.

Djian, P., J. M. Hancock, et al. (1996). "Codon repeats in genes associated with human diseases: fewer repeats in the genes of nonhuman primates and nucleotide substitutions concentrated at the sites of reiteration." Proc Natl Acad Sci U S A 93(1): 417-21.

Five human diseases are due to an excessive number of CAG repeats in the coding regions of five different genes. We have analyzed the repeat regions in four of these genes from nonhuman primates, which are not known to suffer from the diseases. These primates have CAG repeats at the same sites as in human alleles, and there is similar polymorphism of repeat number, but this number is smaller than in the human genes. In some of the genes, the segment of poly(CAG) has expanded in nonhuman primates, but the process has advanced further in the human lineage than in other primate lineages, thereby predisposing to diseases of CAG reiteration. Adjacent to stretches of homogeneous present-day codon repeats, previously existing codons of the same kind have undergone nucleotide substitutions with high frequency. Where these lead to amino acid substitutions, the effect will be to reduce the length of the original homopolymeric stretch in the protein.

Dokholyan, N. V., S. V. Buldyrev, et al. (2000). "Distributions of dimeric tandem repeats in non-coding and coding DNA sequences." J Theor Biol 202(4): 273-82.

We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails, that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of a dimer.
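
The tally behind such length distributions can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function name `dimer_repeat_lengths` and the `min_copies` threshold are assumptions introduced here, and overlapping reading frames (for example AC versus CA inside the same run) are counted separately.

```python
import re
from collections import Counter
from itertools import product


def dimer_repeat_lengths(seq, min_copies=2):
    """Tally maximal runs of each of the 16 dimers in `seq`.

    Returns {dimer: Counter({copies: number_of_runs})}.
    `min_copies` is a hypothetical threshold chosen for this sketch.
    """
    seq = seq.upper()
    result = {}
    for first, second in product("ACGT", repeat=2):
        dimer = first + second
        # Greedy match of the dimer repeated at least `min_copies` times.
        pattern = re.compile("(?:%s){%d,}" % (dimer, min_copies))
        result[dimer] = Counter(
            len(match.group(0)) // 2 for match in pattern.finditer(seq)
        )
    return result
```

Plotting the logarithm of the run count against the number of copies gives a roughly straight line for an exponential distribution, whereas a power-law tail is roughly straight only when both axes are logarithmic.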

Ferreira, H., E. Costa, et al. (2003). "Pentanucleotide repeat (TTTTA)n polymorphism in the 5' control region of the apolipoprotein (a) gene and atherothrombotic serum lipoprotein (a) concentration, in a pediatric population." Haematologica 88(3): ELT07.

Fleming, K., D. K. Riser, et al. (2003). "Instability of the fragile X syndrome repeat in mice: the effect of age, diet and mutations in genes that affect DNA replication, recombination and repair proficiency." Cytogenet Genome Res 100(1-4): 140-6.

Repeat expansion diseases such as fragile X syndrome (FXS) result from increases in the size of a specific tandem repeat array. In addition to large expansions, small changes in repeat number and deletions are frequently seen in FXS pedigrees. No mouse model accurately recapitulates all aspects of this instability, particularly the occurrence of large expansions. This may be due to differences between mice and humans in cis- and/or trans-acting factors that affect repeat stability. The identification of such factors may help reveal the expansion mechanism and allow the development of suitable animal models for these disorders. We have examined the effect of age, dietary folate, and mutations in the Werner's syndrome helicase (WRN) and TRP53 genes on FXS repeat instability in mice. WRN facilitates replication of the FXS repeat and enhances Okazaki fragment processing, thereby reducing the incidence of processes that have been suggested to lead to expansion. p53 is a protein involved in DNA damage surveillance and repair. We find two types of repeat instability in these mice, small changes in repeat number that are seen at frequencies approaching 100%, and large deletions which occur at a frequency of about 10%. The frequency of these events was independent of WRN, p53, parental age, or folate levels. The large deletions occur at the same frequency in mice homozygous and heterozygous for the repeat suggesting that they are not the result of an interallelic recombination event. In addition, no evidence of large expansions was seen. Our data thus show that the absence of repeat expansions in mice is not due to a more efficient WRN protein or p53-mediated error correction mechanism, and suggest that these proteins, or the pathways in which they are active, may not be involved in expansion in humans either. Moreover, the fact that contractions occur in the absence of expansions suggests that these processes occur by different mechanisms.

Fondon, J. W., 3rd and H. R. Garner (2004). "Molecular origins of rapid and continuous morphological evolution." Proc Natl Acad Sci U S A 101(52): 18058-63.

Mutations in cis-regulatory sequences have been implicated as being the predominant source of variation in morphological evolution. We offer a hypothesis that gene-associated tandem repeat expansions and contractions are a major source of phenotypic variation in evolution. Here, we describe a comparative genomic study of repetitive elements in developmental genes of 92 breeds of dogs. We find evidence for selection for divergence at coding repeat loci in the form of both elevated purity and extensive length polymorphism among different breeds. Variations in the number of repeats in the coding regions of the Alx-4 (aristaless-like 4) and Runx-2 (runt-related transcription factor 2) genes were quantitatively associated with significant differences in limb and skull morphology. We identified similar repeat length variation in the coding repeats of Runx-2, Twist, and Dlx-2 in several other species. The high frequency and incremental effects of repeat length mutations provide molecular explanations for swift, yet topologically conservative morphological evolution.

Fondon, J. W., 3rd, G. M. Mele, et al. (1998). "Computerized polymorphic marker identification: experimental validation and a predicted human polymorphism catalog." Proc Natl Acad Sci U S A 95(13): 7514-9.

A computational system for the prediction of polymorphic loci directly and efficiently from human genomic sequence was developed and verified. A suite of programs, collectively called POMPOUS (polymorphic marker prediction of ubiquitous simple sequences) detects tandem repeats ranging from dinucleotides up to 250 mers, scores them according to predicted level of polymorphism, and designs appropriate flanking primers for PCR amplification. This approach was validated on an approximately 750-kilobase region of human chromosome 3p21.3, involved in lung and breast carcinoma homozygous deletions. Target DNA from 36 paired B lymphoblastoid and lung cancer lines was amplified and allelotyped for 33 loci predicted by POMPOUS to be variable in repeat size. We found that among those 36 predominately Caucasian individuals 22 of the 33 (67%) predicted loci were polymorphic with an average heterozygosity of 0.42. Allele loss in this region was found in 27/36 (75%) of the tumor lines using these markers. POMPOUS provides the genetic researcher with an additional tool for the rapid and efficient identification of polymorphic markers, and through a World Wide Web site, investigators can use POMPOUS to identify polymorphic markers for their research. A catalog of 13,261 potential polymorphic markers and associated primer sets has been created from the analysis of 141,779,504 base pairs of human genomic sequence in GenBank. This data is available on our Web site (pompous.swmed.edu) and will be updated periodically as GenBank is expanded and algorithm accuracy is improved.
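
The detection step described in this abstract (finding perfect tandem repeats in genomic sequence) can be approximated with a regular-expression back-reference. The sketch below is not POMPOUS itself: the function name `find_tandem_repeats`, the unit-length range, and the `min_copies` cutoff are assumptions made for illustration, and the polymorphism scoring and primer design steps are not attempted.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    All thresholds are hypothetical defaults chosen for this sketch.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # Capture one unit, then require it to recur immediately afterwards.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1)
        )
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter motif
            # (for example "CACA"); those belong to the shorter unit length.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)
```

Because matches are non-overlapping and imperfect repeats are ignored, a production tool would need a more tolerant search; the sketch only shows the core idea of locating a unit and counting its copies.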

Forgacs, E., J. D. Wren, et al. (2001). "Searching for microsatellite mutations in coding regions in lung, breast, ovarian and colorectal cancers." Oncogene 20(8): 1005-9.

RepX represents a new informatics approach to probe the UniGene database for potentially polymorphic repeat sequences in the open reading frame (ORF) of genes, 56% of which were found to be actually polymorphic. We now have performed mutational analysis of 17 such sites in genes not found to be polymorphic (<0.03 frequency) in a large panel of human cancer genomic DNAs derived from 31 lung, 21 breast, seven ovarian, 21 (13 microsatellite instability (MSI)+ and eight MSI-) colorectal cancer cell lines. In the lung, breast and ovarian tumor DNAs we found no mutations (<0.03-0.04 rate of tumor associated open reading frame mutations) in these sequences. By contrast, 18 MSI+ colorectal cancers (13 cancer cell lines and five primary tumors) with mismatch repair defects exhibited six mutations in three of the 17 genes (SREBP-2, TAN-1, GR6) (P<0.000003 compared to all other cancers tested). We conclude that coding region microsatellite alterations are rare in lung, breast, ovarian carcinomas and MSI (-) colorectal cancers, but are relatively frequent in MSI (+) colorectal cancers with mismatch repair deficits.

Fortune, M. T., J. L. Kennedy, et al. (2003). "Anticipation and CAG*CTG Repeat Expansion in Schizophrenia and Bipolar Affective Disorder." Curr Psychiatry Rep 5(2): 145-54.

The genetic contribution to the etiologies of schizophrenia and bipolar affective disorder (BPAD) has been considered for many decades, with twin, family, and adoption studies indicating consistently that the familial clustering of affected individuals is accounted for mainly by genetic factors. Despite the strong evidence for a genetic component, very little is understood about the underlying genetic and molecular mechanisms for schizophrenia and BPAD. In the early 1990s, after the discovery of "dynamic mutation" or "unstable DNA" as a molecular basis for the genetic anticipation observed in Huntington's disease, myotonic dystrophy, and many others, and the recently rediscovered, albeit still controversial, evidence for genetic anticipation in major psychoses, the genetic epidemiology of schizophrenia and BPAD was re-evaluated to demonstrate strong endorsement for the unstable DNA model. Many of the non-Mendelian genetic features of schizophrenia and BPAD could be explained by the behaviour of unstable DNA, and several molecular genetic approaches became available for testing the unstable DNA hypothesis. However, despite promising findings in the mid-1990s, no trinucleotide repeat expansion has yet been identified as a cause of idiopathic schizophrenia or BPAD.

Fu, Y. H., A. Pizzuti, et al. (1992). "An unstable triplet repeat in a gene related to myotonic muscular dystrophy." Science 255(5049): 1256-8.

Synthetic oligonucleotides containing GC-rich triplet sequences were used in a scanning strategy to identify unstable genetic sequences at the myotonic dystrophy (DM) locus. A highly polymorphic GCT repeat was identified and found to be unstable, with an increased number of repeats occurring in DM patients. In the case of severe congenital DM, the paternal triplet allele was inherited unaltered while the maternal, DM-associated allele was unstable. These studies suggest that the mutational mechanism leading to DM is triplet amplification, similar to that occurring in the fragile X syndrome. The triplet repeat sequence is within a gene (to be referred to as myotonin-protein kinase), which has a sequence similar to protein kinases.

Gebhardt, F., K. S. Zanker, et al. (1999). "Modulation of epidermal growth factor receptor gene transcription by a polymorphic dinucleotide repeat in intron 1." J Biol Chem 274(19): 13176-80.

The influence of a highly polymorphic CA dinucleotide repeat in the epidermal growth factor receptor (EGFR) gene on transcription was examined with a quantitative nuclear run-off method. We could demonstrate that transcription of the EGFR gene is inhibited by approximately 80% in alleles with 21 CA repeats. In experiments with polymerase chain reaction products that spanned a region of more than 4,000 base pairs and contained the promoter, two enhancers, and the polymorphic region in the first intron of the gene, we found that transcription activity declines with increasing numbers of CA dinucleotides. In vivo pre-mRNA expression data from cultured cell lines support these findings, although other regulation mechanisms can outweigh this effect. In addition, we showed that under our experimental conditions RNA elongation terminates at a site closely downstream of the simple sequence repeat and that there are two separate major transcription start sites. Our results provide new insights in individually different EGFR gene expression and the role of the CA repeat in transcription of this proto-oncogene.

Glatt, S. J., S. V. Faraone, et al. (2003). "CAG-Repeat length in exon 1 of KCNN3 does not influence risk for schizophrenia or bipolar disorder: A meta-analysis of association studies." Am J Med Genet 121B(1): 14-20.

Schizophrenia and bipolar disorder both show some evidence for genetic anticipation. In addition, significant expansion of anonymous CAG repeats throughout the genome has been detected in both of these disorders. The gene KCNN3, which codes for a small/intermediate conductance, calcium-regulated potassium channel, contains a highly polymorphic CAG-repeat array in exon 1. Initial evidence for association of both schizophrenia and bipolar disorder with increased CAG-repeat length of KCNN3 has not been consistently replicated. In the present study, we performed several meta-analyses to evaluate the pooled evidence for association with CAG-repeat length of KCNN3 derived from case-control and family-based studies of both disorders. Each group of studies was analyzed under two models, including a test for direct association with repeat length, and a test for association with dichotomized repeat-length groups. No evidence for a linear relationship between disease risk and repeat length was observed, as all pooled odds ratios approximated 1.0. Results of dichotomized allele-group analyses were more variable, especially for schizophrenia, where case-control studies found a significant association with longer repeats but family-based studies implicated shorter alleles. The results of these meta-analyses demonstrate that the risks for both schizophrenia and bipolar disorder are largely, if not entirely, independent of CAG-repeat length in exon 1 of KCNN3. This study cannot exclude the possibility that some aspect of this polymorphism, such as repeat-length disparity in heterozygotes, influences risk for these disorders. Further, it remains unknown if this polymorphism, or one in linkage disequilibrium with it, contributes to some distinct feature of the disorder, such as symptom severity or anticipation.

Greene, E., V. Handa, et al. (2003). "Transcription defects induced by repeat expansion: fragile X syndrome, FRAXE mental retardation, progressive myoclonus epilepsy type 1, and Friedreich ataxia." Cytogenet Genome Res 100(1-4): 65-76.

Fragile X mental retardation syndrome, FRAXE mental retardation, Progressive myoclonus epilepsy Type I, and Friedreich ataxia are members of a larger group of genetic disorders known as the Repeat Expansion Diseases. Unlike other members of this group, these four disorders all result from a primary defect in the initiation or elongation of transcription. In this review, we discuss current models for the relationship between the expanded repeat and the disease symptoms.

Harich, N., E. Esteban, et al. (2002). "Apolipoprotein molecular variation in Moroccan Berbers: pentanucleotide (TTTTA)n repeat in the LPA gene and APOE-C1-C2 gene cluster." Clin Genet 62(3): 240-4.

Apolipoprotein LPA, APOE, APOC1, and APOC2 genotype frequencies have been determined for the first time in a North African population. A sample of 140 Berber individuals from the Moroccan Moyen Atlas region has been analyzed. Allelic and haplotypic data have been used to compare our sample with other world populations and the results clearly differentiate Berbers from Europeans and Sub-Saharans, suggesting several distinctive features of Moroccan Berbers as the extreme high values of LPA PNR*11 pentanucleotide allele (10.5%) and the relatively high and low values of APOE*E4 (15.7%) and *E2 (4.5%) in comparison to other Mediterraneans. Another remarkable result is the frequency distribution of the two APOC2 alleles (70% vs 30%) in comparison with the European pattern (50% of each allele). The high values of APOE*E4 and LPA PNR*7 together with the intermediate linkage disequilibrium values between APOE and APOC1 alleles in comparison with Europeans and Africans suggest a certain degree of Sub-Saharan influence in the current Moroccan population.

Heger, A. and L. Holm (2000). "Rapid automatic detection and alignment of repeats in protein sequences." Proteins 41(2): 224-37.

Many large proteins have evolved by internal duplication and many internal sequence repeats correspond to functional and structural units. We have developed an automatic algorithm, RADAR, for segmenting a query sequence into repeats. The segmentation procedure has three steps: (i) repeat length is determined by the spacing between suboptimal self-alignment traces; (ii) repeat borders are optimized to yield a maximal integer number of repeats, and (iii) distant repeats are validated by iterative profile alignment. The method identifies short composition biased as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in the query sequence. No manual intervention and no prior assumptions on the number and length of repeats are required. Comparison to the Pfam-A database indicates good coverage, accurate alignments, and reasonable repeat borders. Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases. A number of these repeats had been described in the literature but most were novel. This illustrates how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information.

Heringa, J. and P. Argos (1993). "A method to recognize distant repeats in protein sequences." Proteins 17(4): 391-411.

An automated algorithm is presented that delineates protein sequence fragments which display similarity. The method incorporates a selection of a number of local nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach to elucidate the consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible and more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins where the repeating fragments have been derived from information additional to the protein sequences.

Holmer, S. R., C. Hengstenberg, et al. (2003). "Association of polymorphisms of the apolipoprotein(a) gene with lipoprotein(a) levels and myocardial infarction." Circulation 107(5): 696-701.

BACKGROUND: Serum lipoprotein(a) [Lp(a)] concentration is largely determined by variability at the apolipoprotein(a) gene locus. Most prominent effects relate to polymorphisms in the promoter (a pentanucleotide [PN] repeat) and coding regions (a kringle IV [K4] repeat), the latter of which also affects Lp(a) particle size. The impact of these polymorphisms on cardiovascular risk is poorly understood. METHODS AND RESULTS: We studied both polymorphisms and Lp(a) levels in 834 registry-based myocardial infarction (MI) patients (38% women) and 1548 population-based controls. Lp(a) concentrations were inversely related with the numbers of K4 and PN repeats. However, the effect of the PN polymorphism was restricted to subjects producing small Lp(a) particles (≤8 PN 66.1 mg/dL versus >8 PN 8.7 mg/dL; P<0.0001). The odds to present with MI were elevated in individuals producing small Lp(a) particles (≤22 K4 repeats; OR 1.47 for men and 1.69 for women; P<0.002) and in women with ≤8 PN repeats (OR 1.46, P=0.009). Interestingly, in women, the frequent haplotype with ≤8 PN and ≤22 K4 repeats, which is related to high levels of small Lp(a) particles, resulted in an elevated OR for MI (1.79; P=0.01) independently of Lp(a) serum concentration. CONCLUSIONS: The K4 and PN repeat polymorphisms largely explain the high variability of serum Lp(a) levels. A haplotype with ≤8 PN and ≤22 K4 repeats is characterized by high concentrations of small Lp(a) particles. Our observation that this haplotype was associated with MI independently of Lp(a) serum levels may suggest that Lp(a) particle size in addition to its concentration may modulate MI risk in women.

Hosack, D. A., G. Dennis, Jr., et al. (2003). "Identifying biological themes within lists of genes with EASE." Genome Biol 4(10): R70.

EASE is a customizable software application for rapid biological interpretation of gene lists that result from the analysis of microarray, proteomics, SAGE and other high-throughput genomic data. The biological themes returned by EASE recapitulate manually determined themes in previously published gene lists and are robust to varying methods of normalization, intensity calculation and statistical selection of genes. EASE is a powerful tool for rapidly converting the results of functional genomics studies from 'genes' to 'themes'.

Hsing, A. W., Y. T. Gao, et al. (2000). "Polymorphic CAG and GGN repeat lengths in the androgen receptor gene and prostate cancer risk: a population-based case-control study in China." Cancer Res 60(18): 5111-6.

The length of the polymorphic CAG trinucleotide repeat in the polyglutamine region of the androgen receptor (AR) gene is inversely correlated with the transactivation function of the AR. Because increased androgenic activity has been linked to prostate cancer and because an ethnic variation exists in the CAG repeat length, this polymorphism has been suggested to explain part of the substantial racial difference in prostate cancer risk. We conducted a population-based case-control study in China to investigate whether CAG and other polymorphisms of the AR gene are associated with clinically significant prostate cancer in this low-risk population. Genomic DNA from 190 prostate cancer patients and 304 healthy controls was used for direct sequencing to evaluate the relationship of CAG and GGN (polyglycine) repeat length in the AR gene. Relative to western men, our study subjects had a longer CAG repeat length, with a median of 23 and only 10% of the subjects having a CAG repeat length shorter than 20. Men with a CAG repeat length shorter than 23 (median length) had a 65% increased risk of prostate cancer (odds ratio, 1.65; 95% confidence interval, 1.14-2.39), compared with men with a CAG repeat length of 23 or longer. For the GGN tract (GGT3GGG1GGT2GGCn), based on the sequencing results from 481 samples, we are the first to show that although GGC regions in the polyglycine tract are highly variable, there are no mutations or polymorphisms in the GGT and GGG regions. More than 72% of the subjects had a GGN repeat length of 23, and those with a GGN repeat length shorter than 23 had a 12% increased risk of prostate cancer (95% confidence interval, 0.71-1.78), compared with those with ≥23 GGN repeats. 
Our study not only confirms that Chinese men do have a longer CAG repeat length than western men but also represents the first population-based study to show that even in a very low-risk population, a shorter CAG repeat length confers a higher risk of clinically significant prostate cancer. These results imply that CAG repeat length can potentially serve as a useful marker to identify a subset of individuals at higher risk of developing clinically significant prostate cancer. Larger studies are needed to evaluate the combined effect of CAG and GGN repeats. Because of the significance of AR in prostate cancer, investigation of factors that interact with the polyglutamine region of the AR gene to alter AR function and modulate prostate cancer risk is an important area for future research.
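
For readers unfamiliar with how figures such as "odds ratio, 1.65; 95% confidence interval, 1.14-2.39" arise, the following sketch computes an unadjusted odds ratio and a Woolf (log-scale) confidence interval from a 2x2 table. The abstract does not report the underlying cell counts and may have used an adjusted model, so the function name `odds_ratio_ci` and the counts in the usage example are purely hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (logit) confidence interval.

    z=1.96 corresponds to an approximate 95% interval.
    All four cell counts must be positive.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study:
# "exposed" would mean a CAG repeat length shorter than 23.
print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1.0, as reported for the CAG repeat, indicates a nominally significant association; an interval that spans 1.0, as reported for the GGN repeat, does not.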

Huang, Q. Y., H. Shen, et al. (2003). "Linkage and association of the CA repeat polymorphism of the IL6 gene, obesity-related phenotypes, and bone mineral density (BMD) in two independent Caucasian populations." J Hum Genet.

Genetic factors play an important role in osteoporosis and obesity, two serious public health problems in the world. We investigated the relationships between obesity-related phenotypes, bone mineral density (BMD) and the CA repeat polymorphism of the IL6 gene in two large independent samples using the quantitative transmission disequilibrium test (QTDT). The first sample consisted of 1,816 individuals from 79 multigenerational pedigrees. Each pedigree was identified through a proband with BMD Z-scores ≤ -1.28 at the hip or spine. The second sample was a randomly ascertained set of 636 individuals from 157 nuclear families. Ten alleles containing 9-18 CA repeats were identified in our Caucasian populations. For body mass index (BMI), fat mass and percentage fat mass (PFM), highly significant (P<0.01) or significant (P<0.05) results were found for linkage in our sample of nuclear families and for association in the multigenerational pedigrees. We also observed weak evidence for linkage (P=0.069) with spine BMD and for association with hip BMD in the sample of multigenerational pedigrees. Our results suggest that genetic variation in or near the IL6 locus may be involved in the etiology of obesity and osteoporosis.

Hughes, A. L., B. Packer, et al. (2003). "Widespread purifying selection at polymorphic sites in human protein-coding loci." Proc Natl Acad Sci U S A 100(26): 15754-7.

Estimation of gene diversity (heterozygosity) at 1442 single-nucleotide polymorphism (SNP) loci in an ethnically diverse sample of humans revealed consistently reduced gene diversities at SNP loci causing amino acid changes, particularly those causing amino acid changes predicted to be disruptive to protein structure. The reduction of gene diversity at these SNP loci, in comparison to SNPs in the same genes not affecting protein structure, is evidence that negative natural selection (purifying selection) has reduced the population frequencies of deleterious SNP alleles. This, in turn, suggests that slightly deleterious mutations are widespread in the human population and that estimation of gene diversity even in a sample of modest size can help guide the search for disease-associated genes.
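
Gene diversity at a locus is commonly estimated from allele counts as n/(n-1) * (1 - sum of squared allele frequencies), where n is the number of sampled gene copies (Nei's unbiased estimator). The abstract does not say which estimator was used, so the sketch below is an assumption for illustration, and the name `gene_diversity` is hypothetical.

```python
def gene_diversity(allele_counts):
    """Nei's unbiased gene diversity for one locus.

    `allele_counts` holds the number of sampled gene copies carrying
    each allele, e.g. [major_allele_count, minor_allele_count] for a SNP.
    """
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    # Expected homozygosity is the sum of squared allele frequencies.
    homozygosity = sum((count / n) ** 2 for count in allele_counts)
    # The n/(n-1) factor corrects the small-sample bias.
    return n / (n - 1) * (1.0 - homozygosity)
```

Under this estimator a biallelic SNP reaches its maximum diversity (just over 0.5) when both alleles are equally common, so a consistently lower value at amino-acid-changing SNPs reflects rarer minor alleles at those sites.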

Hui, J., G. Reither, et al. (2003). "Novel functional role of CA repeats and hnRNP L in RNA stability." RNA 9(8): 931-6.

CA dinucleotide repeat sequences are very common in the human genome. We have recently demonstrated that the polymorphic CA repeats in intron 13 of the human endothelial nitric oxide synthase (eNOS) gene function as an unusual, length-dependent splicing enhancer. The CA repeat enhancer requires for its activity specific binding of hnRNP L. Here we show that in the absence of bound hnRNP L, the pre-mRNA is cleaved directly upstream of the CA repeats. The addition of recombinant hnRNP L restores RNA stability. CA repeats are both necessary and sufficient for this specific cleavage in the 5' adjacent RNA sequence. We conclude that, in addition to its role as a splicing activator, hnRNP L can act in vitro as a sequence-specific RNA protection factor. Based on the wide abundance of CA repetitive sequences in the human genome, this may represent a novel, generally important role of this abundant hnRNP protein.

Hui, J., K. Stangl, et al. (2003). "HnRNP L stimulates splicing of the eNOS gene by binding to variable-length CA repeats." Nat Struct Biol 10(1): 33-7.

In the human genome, dinucleotide repeats are common sequence elements of unknown functional significance. Here we demonstrate that CA repeats in intron 13 of the human endothelial nitric oxide synthase (eNOS) gene function as an unusual intronic splicing enhancer, whose activity depends on the CA repeat number. We identify the 65 kDa heterogeneous nuclear ribonucleoprotein (hnRNP) L as the major factor that binds specifically and in a length-dependent manner to the CA-repeat enhancer. In addition, we show that hnRNP L functions as a specific activator of eNOS splicing, providing the first evidence for a role of hnRNP L in the regulation of mRNA splicing. We hypothesize that hnRNP L may be involved in the regulation of many other genes containing CA repeats or A/C-rich enhancers.

Ibanez, L., K. K. Ong, et al. (2003). "Androgen receptor gene CAG repeat polymorphism in the development of ovarian hyperandrogenism." J Clin Endocrinol Metab 88(7): 3333-8.

Ovarian hyperandrogenism, a key feature of polycystic ovary syndrome, is preceded by precocious pubarche (PP) (pubic hair < 8 yr) in some populations. We hypothesized that this earlier presentation may relate to increased androgen sensitivity, indicated by androgen receptor gene CAG repeat length. This polymorphism was genotyped in 181 Barcelona girls (age, 10.9 yr; range, 4-19 yr) who had presented with PP, and in 124 Barcelona control girls. PP girls had shorter mean CAG number than Barcelona controls (PP vs. controls: mean, range: 21.3, 7-31 repeats vs. 22.0, 15-32, P = 0.003) and greater proportion of short alleles 20 repeats or less (37.0% vs. 24.6%, P = 0.002). Among post-menarcheal PP girls (n = 69), shorter CAG number (biallelic mean ≤20) was associated with higher 17-hydroxy-progesterone levels post leuprolide (P = 0.009), indicative of ovarian hyperandrogenism, higher testosterone levels (P = 0.02), acne (P = 0.03) and hirsutism scores (P = 0.01), and more menstrual cycle irregularities (P = 0.04). In multiple regression, ovarian hyperandrogenism risk was related to both low birth weight (SD <-1.5: odds ratio = 17.0; 95% confidence interval: 4.2-69.2) and shorter mean CAG number (20 or less repeats: odds ratio = 7.3; 1.3-42.0). In summary, shorter androgen receptor gene CAG number, indicative of increased androgen sensitivity, increases risks for PP and subsequent ovarian hyperandrogenism. Shorter CAG repeat alleles in Barcelona compared with United Kingdom women could lead to higher prevalences of these conditions.

Irizarry, K., V. Kustanovich, et al. (2000). "Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences." Nat Genet 26(2): 233-6.

Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes. Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with coding regions of genes. We used Bayesian inference to weigh evidence for true polymorphism versus sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context-sensitivity and cDNA library origin. Three separate validations (comparison with 54 genes screened for SNPs independently, verification of HLA-A polymorphisms, and restriction fragment length polymorphism (RFLP) testing) verified 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding region polymorphism provides a public resource for mapping of disease genes (available at http://www.bioinformatics.ucla.edu/snp).

Itokawa, M., K. Yamada, et al. (2003). "Genetic analysis of a functional GRIN2A promoter (GT)n repeat in bipolar disorder pedigrees in humans." Neurosci Lett 345(1): 53-6.

Hypofunction of glutamatergic neurotransmission has been hypothesized to underlie the pathophysiology of bipolar affective disorder, as well as schizophrenia. We examined the role of the N-methyl-D-aspartate receptor 2A subunit (GRIN2A) gene on 16p13.3, a region thought to be linked to bipolar disorder, (1) because in a prior study we identified a functional and polymorphic (GT)n repeat in the 5' regulatory region of the gene, with longer alleles showing lower transcriptional activity and an overrepresentation in schizophrenia, and (2) because of the suggestion of a genetic overlap between affective disorder and schizophrenia. Family-based association tests detected a nominally significant preferential transmission of longer alleles in a panel of 96 multiplex bipolar pedigrees. These results support the hypothesis that a hypoglutamatergic state is involved in the pathogenesis of bipolar affective disorder.

Itokawa, M., K. Yamada, et al. (2003). "A microsatellite repeat in the promoter of the N-methyl-D-aspartate receptor 2A subunit (GRIN2A) gene suppresses transcriptional activity and correlates with chronic outcome in schizophrenia." Pharmacogenetics 13(5): 271-278.

Hypofunction of the N-methyl-d-aspartate (NMDA) receptor has been hypothesized to underlie the pathophysiology of schizophrenia, based on the observation that non-competitive antagonists of the NMDA receptor, such as phencyclidine, induce schizophrenia-like symptoms. Mice lacking the NR2A subunit of the NMDA receptor complex are known to display abnormal behaviour, similar to schizophrenic symptoms. The expression of NR2A starts at puberty, a period corresponding to the clinical onset of schizophrenia. This evidence suggests that the NR2A (GRIN2A) gene may play a role in the development of schizophrenia and disease phenotypes. In this study, we performed a genetic analysis of this gene in schizophrenia. Analysis of the GRIN2A gene detected four single nucleotide polymorphisms, and a variable (GT)(n) repeat in the promoter region of the gene. A case-control study (375 schizophrenics and 378 controls) demonstrated evidence of an association between the repeat polymorphism and the disease (P = 0.05, Mann-Whitney test), with longer alleles overly represented in patients. An in-vitro promoter assay revealed a length dependent inhibition of transcriptional activity by the (GT)(n) repeat, which was consistent with a receptor binding assay in postmortem brains. Significantly, the score of symptom severity in chronic patients correlated with repeat size (P = 0.01, Spearman's Rank test). These results illustrate a genotype-phenotype correlation in schizophrenia and suggest that the longer (GT)(n) stretch may act as a risk-conferring factor that worsens chronic outcome by reducing GRIN2A levels in the brain.

Jakupciak, J. P. and R. D. Wells (1999). "Genetic instabilities in (CTG.CAG) repeats occur by recombination." J Biol Chem 274(33): 23468-79.

The expansion of triplet repeat sequences (TRS) associated with hereditary neurological diseases is believed from prior studies to be due to DNA replication. This report demonstrates that the expansion of (CTG.CAG)(n) in vivo also occurs by homologous recombination as shown by biochemical and genetic studies. A two-plasmid recombination system was established in Escherichia coli with derivatives of pUC19 (harboring the ampicillin resistance gene) and pACYC184 (harboring the tetracycline resistance gene). The derivatives contained various triplet repeat inserts ((CTG.CAG), (CGG.CCG), (GAA.TTC), (GTC.GAC), and (GTG.CAC)) of different lengths, orientations, and extents of interruptions and a control non-repetitive sequence. The availability of the two drug resistance genes and of several unique restriction sites on the plasmids enabled rigorous genetic and biochemical analyses. The requirements for recombination at the TRS include repeat lengths >30, the presence of CTG.CAG on both plasmids, and recA and recBC. Sequence analyses on a number of DNA products isolated from individual colonies directly demonstrated the crossing-over and expansion of the homologous CTG.CAG regions. Furthermore, inversion products of the type [(CTG)(13)(CAG)(67)].[(CTG)(67)(CAG)(13)] were isolated as the apparent result of "illegitimate" recombination events on intrahelical pseudoknots. This work establishes the relationships between CTG.CAG sequences, multiple fold expansions, genetic recombination, formation of new recombinant DNA products, and the presence of both drug resistance genes. Thus, if these reactions occur in humans, unequal crossing-over or gene conversion may also contribute to the expansions responsible for anticipation associated with several hereditary neurological syndromes.

Jansen, G., P. Willems, et al. (1994). "Gonosomal mosaicism in myotonic dystrophy patients: involvement of mitotic events in (CTG)n repeat variation and selection against extreme expansion in sperm." Am J Hum Genet 54(4): 575-85.

Myotonic dystrophy (DM) is caused by abnormal expansion of a polymorphic (CTG)n repeat, located in the DM protein kinase gene. We determined the (CTG)n repeat lengths in a broad range of tissue DNAs from patients with mild, classical, or congenital manifestation of DM. Differences in the repeat length were seen in somatic tissues from single DM individuals and twins. Repeats appeared to expand to a similar extent in tissues originating from the same embryonal origin. In most male patients carrying intermediate- or small-sized expansions in blood, the repeat lengths covered a markedly wider range in sperm. In contrast, male patients with large allele expansions in blood (> 700 CTGs) had similar or smaller repeats in sperm, when detectable. Sperm alleles with > 1,000 CTGs were not seen. We conclude that DM patients can be considered gonosomal mosaics, i.e., combined somatic and germ-line tissue mosaics. Most remarkably, we observed multiple cases where the length distributions of intermediate- or small-sized alleles in fathers' sperm were significantly different from that in their offspring's blood. Our combined findings indicate that intergenerational length changes in the unstable CTG repeat are most likely to occur during early embryonic mitotic divisions in both somatic and germ-line tissue formation. Both the initial CTG length, the overall number of cell divisions involved in tissue formation, and perhaps a specific selection process in spermatogenesis may influence the dynamics of this process. A model explaining mitotic instability and sex-dependent segregation phenomena in DM manifestation is discussed.

Jeffreys, A. J., N. J. Royle, et al. (1988). "Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA." Nature 332(6161): 278-81.

Tandem-repetitive minisatellite regions in vertebrate DNA frequently show substantial allelic variation in the number of repeat units. This variation is thought to arise through processes such as unequal crossover or replication slippage. We show here that the spontaneous mutation rate to new length alleles at extremely variable human minisatellites is sufficiently high to be directly measurable in human pedigrees. The mutation rate at different loci increases with variability in accord with the neutral mutation/random drift hypothesis, and rises to 5% per gamete for the most unstable human minisatellite isolated. Mutations are sporadic, occur with similar frequencies in sperm and oocytes, and can involve the gain or loss of substantial numbers of repeat units, consistent with length changes arising primarily by unequal exchange at meiosis. Germline instability must therefore be taken into account when using hypervariable loci as genetic markers, particularly in pedigree analysis and parenthood testing.

Jordan, P., L. A. Snyder, et al. (2003). "Diversity in coding tandem repeats in related Neisseria spp." BMC Microbiol 3(1): 23.

BACKGROUND: Tandem repeats contained within coding regions can mediate phase variation when the repeated units change the reading frame of the coding sequence in a copy number dependent manner. Coding tandem repeats are those which do not alter the reading frame with copy number, and the changes in copy number of these repeats may then potentially alter the function or antigenicity of the protein encoded. Three complete neisserial genomes were analyzed and compared to identify coding tandem repeats where the number of copies of the repeat will have some structural consequence for the protein. This is the first study to address coding tandem repeats that may affect protein structures using comparative genomics, combined with a population survey to investigate which show interstrain variability. RESULTS: A total of 28 genes were identified. Of these, 22 contain coding tandem repeats that vary in copy number between the three sequenced strains, three strain specific genes were included for investigation on the basis of having >90% identity between repeated units, and three genes with repeated elements of >250 bp were included although no length variations were seen in the genomes. Amplification, and sequencing of repeats showing altered copy number, of these 28 coding tandem repeat containing regions, from a set of largely unrelated strains, revealed further repeat length variation in several cases. CONCLUSION: Eighteen genes were identified which have variation in repeat copy number between strains of the same species, twelve of which show greater diversity in repeat copy number than is present in the sequenced genomes. In some cases, this may reflect a mechanism for the generation of antigenic variation, as previously described in other species. However, some of the genes identified encode proteins with cytoplasmic functions, including sugar metabolism, DNA repair, and protein production, in which repeat length variation may have other functions. Coding tandem repeats appear to represent a largely unexplored mechanism of generating diversity in the Neisseria spp.

Kasami, M., H. Gobbi, et al. (2000). "Androgen receptor CAG repeat lengths in ductal carcinoma in situ of breast, longest in apocrine variety." Breast 9(1): 23-7.

CAG repeat number in the androgen receptor (AR) has been associated with decreased prostate cancer risk, and AR expression has been found in female breast cancer, often associated with apocrine differentiation. Because trinucleotide expansion can alter gene expression and protein function, we hypothesized that it might occur in breast neoplasms. We used a repeat expansion detection technique to determine CAG repeat lengths in DNA from breast biopsies. Three lesion types were microdissected: fibroadenoma (48 cases), ductal carcinoma in situ (DCIS, 24 cases), and invasive mammary carcinoma (18 cases). The maximum number of CAG repeats in either allele of each patient in these three groups was compared. Microsatellite repeat lengths in DCIS were longer than in fibroadenomas or invasive carcinomas (P = 0.017 comparing DCIS vs invasive carcinomas). Two cases of apocrine DCIS had very long repeat lengths, both exhibiting microsatellite lengths at the longest range of normal (32 and 33). Inherited differences in AR CAG length might influence the transition from DCIS to invasive breast cancer, perhaps by modulating function of AR in breast tissue. AR microsatellite polymorphisms could influence cellular differentiation in DCIS lesions, promoting formation of the apocrine subtype in the presence of longer CAG repeats.

Kato, I., J. Eastham, et al. (2003). "Genotype-phenotype analysis for the polymorphic CA repeat in the insulin-like growth factor-I (IGF-I) gene." Eur J Epidemiol 18(3): 203-9.

Polymorphic microsatellite dinucleotide (cytosine-adenine, CA) repeats of the insulin-like growth factor-I (IGF-I) gene may have implications in the development of certain types of cancer and osteoporosis. We studied correlations between IGF-I genotypes determined by direct sequencing and plasma IGF-I levels among 113 healthy individuals (60 men and 53 women), who were originally enrolled as controls for hospital-based case-control studies of breast and prostate cancer. Contrary to an earlier observation, there were no differences in plasma IGF-I levels between those with and without the allele of 19 repeats. With adjustment for other confounders, there were no trends in plasma IGF-I levels with increasing or decreasing number of CA repeats among all study subjects combined, all Whites or all Blacks, suggesting no overall functional significance of this polymorphism. Opposite trends observed in gender and racial subgroups, i.e., an inverse association between plasma IGF-I levels and the CA repeat length in white women and a positive association in black men, are likely to be chance findings.

Katti, M. V., R. Sami-Subbu, et al. (2000). "Amino acid repeat patterns in protein sequences: their diversity and structural-functional implications." Protein Sci 9(6): 1203-9.

All the protein sequences from SWISS-PROT database were analyzed for occurrence of single amino acid repeats, tandem oligo-peptide repeats, and periodically conserved amino acids. Single amino acid repeats of glutamine, serine, glutamic acid, glycine, and alanine seem to be tolerated to a considerable extent in many proteins. Tandem oligo-peptide repeats of different types with varying levels of conservation were detected in several proteins and found to be conspicuous, particularly in structural and cell surface proteins. It appears that repeated sequence patterns may be a mechanism that provides regular arrays of spatial and functional groups, useful for structural packing or for one to one interactions with target molecules. To facilitate further explorations, a database of Tandem Repeats in Protein Sequences (TRIPS) has been developed and is available at URL: http://www.ncl-india.org/trips.

Kawakami, K. and G. Watanabe (2003). "Identification and functional analysis of single nucleotide polymorphism in the tandem repeat sequence of thymidylate synthase gene." Cancer Res 63(18): 6004-7.

The variable number of tandem repeat (VNTR) of thymidylate synthase (TS) gene, mainly 2 repeat (2R) and 3 repeat (3R), is one of the genetic variations that can potentially predict the effectiveness of 5-fluorouracil-based chemotherapy. In this study we identified an additional single nucleotide polymorphism (SNP) in the VNTR of TS, followed by functional and clinical analysis of the SNP. Two hundred fifty-eight tumor samples were obtained from patients with primary colorectal adenocarcinoma. We observed three different patterns of electrophoresis by analysis of the VNTR with 2R/3R heterozygote. The sequencing results revealed a SNP, G/C polymorphism, within the 28-bp repeat component of TS VNTR. Each polymorphic allele was assigned as 2G, 2C, 3G, or 3C according to the combination of SNP and VNTR. Functional analysis showed that the plasmid construct with 3G sequence had three to four times greater efficiency of translation than other polymorphic sequences. 3R allele in colorectal cancer was subdivided into around half by the SNP, indicating its commonness among Japanese. TS genotypes of the patients with colorectal cancer were classified into high expression type (2R/3G, 3C/3G, and 3G/3G) and low expression type (2R/2R, 2R/3C, and 3C/3C). The patients who received oral fluoropyrimidines survived longer than the patients with no treatment in the group of low expression type. No benefit of oral fluoropyrimidines was observed in the group of high expression type. These results suggest that the double polymorphism in the TS tandem repeat sequence, the SNP and the VNTR, may provide a potential for more effective prediction of the clinical outcome of 5-fluorouracil-based chemotherapy.
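
The genotype grouping stated in this abstract can be written as a small lookup. This is a minimal sketch for illustration only, not the authors' method: the function name `ts_expression_type` and the "allele/allele" string encoding are assumptions, and only the six genotypes listed in the abstract are covered.

```python
# Genotype groups as listed in the abstract. Each genotype is stored as an
# unordered set of alleles, so "2R/3G" and "3G/2R" are treated the same and
# a homozygote such as "3G/3G" collapses to the single-element set {"3G"}.
HIGH_EXPRESSION = {
    frozenset(["2R", "3G"]),
    frozenset(["3C", "3G"]),
    frozenset(["3G"]),
}
LOW_EXPRESSION = {
    frozenset(["2R"]),
    frozenset(["2R", "3C"]),
    frozenset(["3C"]),
}


def ts_expression_type(genotype: str) -> str:
    """Return "high" or "low" for a TS genotype string such as "2R/3G".

    Hypothetical helper: raises ValueError for any genotype that the
    abstract does not assign to a group.
    """
    parts = [p.strip().upper() for p in genotype.split("/")]
    if len(parts) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    alleles = frozenset(parts)
    if alleles in HIGH_EXPRESSION:
        return "high"
    if alleles in LOW_EXPRESSION:
        return "low"
    raise ValueError(f"genotype {genotype!r} is not listed in the abstract")
```

Genotypes outside the six listed ones are rejected rather than guessed, since the abstract gives no rule for them.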

Kenny, D., C. Muckian, et al. (2002). "Platelet glycoprotein Ib alpha receptor polymorphisms and recurrent ischaemic events in acute coronary syndrome patients." J Thromb Thrombolysis 13(1): 13-9.

AIMS: To examine the relationship between polymorphisms in the platelet receptor glycoprotein (GP) Ib(alpha) and recurrent ischaemic events, and assess their impact on response to anti-platelet treatment. METHODS AND RESULTS: 1014 patients presenting with unstable coronary syndrome were recruited from the OPUS-TIMI 16 clinical trial of the platelet GPIIb/IIIa antagonist, orbofiban. The subjects were genotyped for two polymorphisms in the gene for GPIb(alpha). These were a T-5C polymorphism in the 5' untranslated Kozak region of the GPIb(alpha) gene, and the variable number of tandem repeats (VNTR) in the macroglycopeptide region. 165 patients had events (recurrent ischaemia, urgent revascularisation, myocardial infarction (MI), stroke and death). There was no effect of the number of -5C alleles on composite endpoint frequency among Caucasian subjects (test for trend, p = 0.47). However, MI risk increased with the number of -5C alleles carried, with MI occurring in 2.3% of patients with the -5T/-5T genotype, 5.0% of -5T/-5C, and 16.7% of -5C/-5C (p < 0.01). The effect of treatment on MI outcome was not significantly modified by genotype (test for interaction, p = 0.10). The overall risk of bleeding was not strongly influenced by either the -5C or the VNTR polymorphisms. CONCLUSION: In an unstable coronary syndrome population the T-5C polymorphism in GPIb(alpha) influences risk of subsequent MI.

Kohlgraf, K. G., A. J. Gawron, et al. (2003). "Contribution of the MUC1 tandem repeat and cytoplasmic tail to invasive and metastatic properties of a pancreatic cancer cell line." Cancer Res 63(16): 5011-20.

MUC1 is a polymorphic, highly glycosylated, type I transmembrane protein expressed by ductal epithelial cells of many organs including pancreas, breast, gastrointestinal tract, and airway. MUC1 is overexpressed and differentially glycosylated by adenocarcinomas that arise in these organs, and is believed to contribute to invasive and metastatic potential by contributing to cell surface adhesion properties [via the tandem repeat (TR) domain] and through morphogenetic signal transduction [via the cytoplasmic tail (CT)]. The large extracellular TR of MUC1 consists of a heavily glycosylated, 20 amino acid sequence that shows allelic variation with respect to number of repeats. This portion of MUC1 may directly mediate adhesive or antiadhesive interactions with other surface molecules on adjacent cells and through these interactions initiate signal transduction pathways that are transmitted through the CT. We investigated the contribution of the TR domain and the CT of MUC1 to the in vivo invasive and metastatic potential, and the gene expression profile of the human pancreatic tumor cell line S2-013. Results showed that S2-013 cells overexpressing full-length MUC1 displayed a less invasive and metastatic phenotype compared with control-transfected cells and cells expressing MUC1 lacking the TR domain or CT. Clonal populations were analyzed by cDNA array gene expression analysis, which showed differences in the gene expression profiles between the different cell lines. Among the genes differentially expressed were several that encode proteins believed to play a role in invasion and metastasis.

Kryukov, G. V., S. Schmidt, et al. (2005). "Small fitness effect of mutations in highly conserved non-coding regions." Hum Mol Genet 14(15): 2221-9.

Comparison of human and mouse genomes has revealed that many non-coding regions have levels of sequence conservation similar to protein-coding genes. These regions have attracted a lot of attention as potentially functional genomic sequences. However, little is known about the effect mutations in these conserved non-coding regions have on fitness and how many of them are present in the human genome as deleterious polymorphisms. To gain insight into the selective constraints imposed on conserved non-coding and protein-coding regions, we compared substitution rates in primate and rodent lineages and analyzed the density and allele frequencies of human polymorphism. Genomic regions conserved between primate and rodent groups show higher relative conservation within rodents than within primates. Thus, our analysis indicates a genome-wide relaxation of selective constraint in the primate lineage, which most likely resulted from a smaller effective population size. We found that this relaxation is much more profound in conserved non-coding regions than in protein-coding regions, and that mutations at a large proportion of sites in conserved non-coding regions are associated with very small fitness effect. Data on human polymorphism are also consistent with very weak selection in conserved non-coding regions. This staggering enrichment in sites at the borderline of neutrality can be explained by assuming an important role for synergistic epistasis in the evolution of non-coding regions. Our results suggest that most individual mutations in conserved non-coding regions are only slightly deleterious but are numerous and may have a significant cumulative impact on fitness.

Ladomery, M. and G. Dellaire (2002). "Multifunctional zinc finger proteins in development and disease." Ann Hum Genet 66(Pt 5-6): 331-42.

Post-transcriptional processes contribute significantly towards the generation of proteomic diversity. An increasing number of mutations have been described that affect genes encoding components of the post-transcriptional machinery. In particular, multifunctional proteins that link transcription with post-transcriptional processes have been implicated in several human diseases including cancer. A predominant feature of these proteins is the zinc finger, an ancient structural motif that mediates protein-protein interactions and is capable of interacting with both DNA and RNA. Zinc finger proteins are the most abundant class of proteins in the human proteome, yet the majority remain uncharacterised. Here we describe multifunctional zinc finger proteins linked to human development and disease. The examples discussed are WT1, ZNF74, EWS, TLS, TAFII68, YY1, CTCF and the GLI proteins. The study of these and other zinc finger proteins provides insights into the functional versatility of the zinc finger motif and suggests that both alternative splicing and sub-cellular compartmentalisation may modulate their multifunctionality.

Laity, J. H., B. M. Lee, et al. (2001). "Zinc finger proteins: new insights into structural and functional diversity." Curr Opin Struct Biol 11(1): 39-46.

Zinc finger proteins are among the most abundant proteins in eukaryotic genomes. Their functions are extraordinarily diverse and include DNA recognition, RNA packaging, transcriptional activation, regulation of apoptosis, protein folding and assembly, and lipid binding. Zinc finger structures are as diverse as their functions. Structures have recently been reported for many new zinc finger domains with novel topologies, providing important insights into structure/function relationships. In addition, new structural studies of proteins containing the classical Cys(2)His(2) zinc finger motif have led to novel insights into mechanisms of DNA binding and to a better understanding of their broader functions in transcriptional regulation.

Lalioti, M. D., S. E. Antonarakis, et al. (2003). "The epilepsy, the protease inhibitor and the dodecamer: progressive myoclonus epilepsy, cystatin b and a 12-mer repeat expansion." Cytogenet Genome Res 100(1-4): 213-23.

Progressive myoclonus epilepsy 1 (EPM1) or Unverricht-Lundborg disease is a human autosomal recessive neurodegenerative disorder caused by mutations in cystatin B (CSTB). The CSTB gene maps to human chromosome 21 and encodes an inhibitor of lysosomal cysteine proteases. Five point mutations have been found, two of which are seen in numerous unrelated patients. However, the main CSTB mutation in EPM1, even among patients of different ethnic origins, is an expansion of a dodecamer repeat (CCCCGCCCCGCG) in the 5' flanking area of CSTB. Most normal alleles contain either two or three repeats, while rarer normal alleles that are highly unstable contain between 12 and 17 repeats. Mutant expanded alleles have been reported to contain between 30 and 80 copies and are also highly unstable, particularly via parental transmission. There is no apparent correlation between mutant repeat length and disease phenotype. While the repeat expansion is outside the CSTB transcriptional unit, it results in a marked decrease in CSTB expression, at least in certain cell types in vitro. CSTB homozygous knockout mice show some parallels to the phenotype of human EPM1 including myoclonic seizures, development of ataxia and neuropathological changes associated with cell loss via apoptosis. Loss of CSTB function due to mutations is consistent with the observed neurodegenerative pathology and phenotype, but the functional link to the epileptic phenotype of EPM1 remains largely unknown.
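
The repeat-count ranges quoted in this abstract can be illustrated with a short script that counts uninterrupted tandem copies of the dodecamer and places the count in one of the reported ranges. This is a sketch under stated assumptions, not a genotyping method: the function names `max_tandem_copies` and `classify_cstb_allele` are hypothetical, only perfect uninterrupted copies are counted, and counts outside the ranges given in the abstract are left unclassified.

```python
import re

# Dodecamer repeat unit as given in the abstract.
DODECAMER = "CCCCGCCCCGCG"


def max_tandem_copies(sequence: str, unit: str = DODECAMER) -> int:
    """Return the largest number of back-to-back perfect copies of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def classify_cstb_allele(copies: int) -> str:
    """Map a copy number onto the ranges reported in the abstract."""
    if copies in (2, 3):
        return "common normal"
    if 12 <= copies <= 17:
        return "rare unstable normal"
    if 30 <= copies <= 80:
        return "expanded"
    # The abstract gives no category for other counts.
    return "unclassified"
```

For example, a toy sequence holding three consecutive copies of the unit gives `max_tandem_copies(...) == 3`, which `classify_cstb_allele` places in the common normal range.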

Lanar, D. E. and K. C. Kain (1994). "Expression-PCR (E-PCR): overview and applications." PCR Methods Appl 4(2): S92-6.

Lander, E. S., L. M. Linton, et al. (2001). "Initial sequencing and analysis of the human genome." Nature 409(6822): 860-921.

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Langdahl, B. L., L. Stenkjaer, et al. (2003). "A CAG Repeat Polymorphism in the Androgen Receptor Gene is Associated with Reduced Bone Mass and Increased Risk of Osteoporotic Fractures." Calcif Tissue Int.

Osteoporosis is a common disease with a strong genetic component. Hypogonadism results in low bone mass and it increases significantly the risk of osteoporosis in both sexes. The estrogen and androgen receptor genes are therefore strong candidates for mediating the genetic influence on bone mass and risk of osteoporosis. A CAG repeat in the first exon of the androgen receptor (AR) is associated with reduced transcriptional activity of the AR. We therefore examined whether this CAG repeat polymorphism is associated with changes in bone mass and risk of osteoporotic fractures in 284 osteoporotic patients with vertebral fractures and 327 normal individuals. The number of CAG repeats varied between 13 and 30 in men and between 7 and 34 in women. The short and long alleles comprised 19.2 +/- 2.5 and 19.0 +/- 2.3 repeats (ns) and 22.7 +/- 2.4 and 21.9 +/- 2.4 (P < 0.01) in women with vertebral fractures and normal women, respectively. This difference was also reflected in the average number of CAG repeats: 21.0 +/- 2.0 in osteoporotic women vs. 20.5 +/- 2.0 in normal women (P < 0.05). 54.8% of women with osteoporotic fractures vs. 45.9% of normal women had average number of CAG repeats of 21 and more (chi(2) = 3.11, P = 0.08). Logistic regression analyses revealed that both the average number of CAG repeats and the length of the long allele were significant predictors of osteoporotic fractures in women (P < 0.05 and P < 0.01, respectively). Men with vertebral fractures had 20.0 +/- 2.8 CAG repeats compared with 20.7 +/- 2.5 CAG repeats in normal men (ns). Linear regression analysis revealed that the length of the long allele was negatively correlated with BMD of the lumbar spine (P < 0.05) and femoral neck (P < 0.05) in women. In men, linear regression analyses demonstrated that BMD of the lumbar spine (P < 0.05), femoral neck (P < 0.05) and total hip (P < 0.05) was positively correlated with length of the CAG repeat polymorphism. In conclusion, we have demonstrated that the CAG repeat polymorphism in the first exon of the AR gene is associated with reduced bone mass and increased risk of osteoporotic fractures in women.

Laule, M., C. Meisel, et al. (2003). "Interaction of CA repeat polymorphism of the endothelial nitric oxide synthase and hyperhomocysteinemia in acute coronary syndromes: evidence of gender-specific differences." J Mol Med.

We have recently shown that high CA repeat copy numbers (≥34 repeats) in intron 13 of the endothelial nitric oxide synthase (eNOS) gene are associated with excess risk of coronary artery disease. Hyperhomocysteinemia interacts by several mechanisms with the NO system, thereby favoring endothelial dysfunction. Since hyperhomocysteinemia evidently promotes prothrombotic activation, we investigated a possible interaction among hyperhomocysteinemia, the eNOS CA repeat polymorphism, and acute coronary syndromes. The median value of homocysteine in our study population was 9.4 µmol/l. We accordingly determined the relative risk of acute coronary syndromes for homocysteine values higher than 9.4 µmol/l and 9.4 µmol/l or lower in the entire coronary artery disease group, and at different CA repeat cutoff values (34, 35, 36, 37, 38 CA repeats). For the entire coronary artery disease group (n=1000), homocysteine levels higher than 9.4 µmol/l were not significantly associated with acute coronary syndromes. Although the CA repeat copy numbers were not associated with acute coronary syndromes in the overall group, the relative risk among women with homocysteine higher than 9.4 µmol/l for developing acute coronary syndromes increased nonsignificantly from 0.98 at cutoff 34 CA repeats to 1.68 at 35 CA repeats and significantly to 4.89 at 36 CA repeats, 11.20 at 37 CA repeats, and 18.32 at 38 CA repeats. This effect modification was not observed in men. These data suggest gender-specific gene-environment interaction between the CA repeat eNOS polymorphism and homocysteine in acute coronary syndromes.

Lee, Y. J., D. M. Chang, et al. (2003). "Association of a 27-bp repeat polymorphism in intron 4 of endothelial constitutive nitric oxide synthase gene with serum uric acid levels in Chinese subjects with type 2 diabetes." Metabolism 52(11): 1448-53.

Nitric oxide (NO) was found to modulate uric acid production through its influence on xanthine oxidase activity, and a close circadian relationship of serum uric acid (SUA) and NO was reported. Studies also revealed that serum NO activity could be determined by endothelial constitutive nitric oxide synthase gene (ecNOS) polymorphism. This study was designed to investigate whether SUA could be influenced by a 27-bp repeat polymorphism in intron 4 of ecNOS gene. A total of 398 nondiabetic subjects and 800 patients with type 2 diabetes were studied. The ecNOS gene intron 4 polymorphism was determined by polymerase chain reaction (PCR). The mean SUA level of patients having type 2 diabetes was significantly lower than that of control subjects (6.1 +/- 1.8 mg/dL v 6.6 +/- 1.8 mg/dL, P<.001); and the mean SUA level of diabetic patients with ecNOS ab/aa genotypes was lower than that of patients with bb genotype (5.7 +/- 1.6 mg/dL v 6.2 +/- 1.8 mg/dL, P=.008). When subgrouped by gender, the SUA of female diabetic subjects was found to be significantly associated with ecNOS genotype. Using Pearson's correlation analysis and multiple linear regression analysis, ecNOS genotype was noticed to be an independent factor in contributing to SUA variability in female diabetic patients. Our results suggest that SUA levels may be associated with NO activity and can be genetically predetermined.

Lehmann, D. J., H. T. Butler, et al. (2003). "Association of the androgen receptor CAG repeat polymorphism with Alzheimer's disease in men." Neurosci Lett 340(2): 87-90.

We examined the CAG repeat polymorphism in exon 1 of the androgen receptor (AR) in an Oxford cohort of 150 cases (101 men) of definite or probable Alzheimer's disease (AD) and 190 elderly controls (140 men). We found that short alleles (≤20 CAG repeats) were associated with AD (adjusted odds ratio=2.5, 95% confidence intervals: 1.2-5.0) in men, but not in women. This association appeared stronger in early-onset AD (<65 years). We conclude that this AR polymorphism is of potential relevance to the risk of AD in men.

Lenzmeier, B. A. and C. H. Freudenreich (2003). "Trinucleotide repeat instability: a hairpin curve at the crossroads of replication, recombination, and repair." Cytogenet Genome Res 100(1-4): 7-24.

The trinucleotide repeats that expand to cause human disease form hairpin structures in vitro that are proposed to be the major source of their genetic instability in vivo. If a replication fork is a train speeding along a track of double-stranded DNA, the trinucleotide repeats are a hairpin curve in the track. Experiments have demonstrated that the train can become derailed at the hairpin curve, resulting in significant damage to the track. Repair of the track often results in contractions and expansions of track length. In this review we introduce the in vitro evidence for why CTG/CAG and CCG/CGG repeats are inherently unstable and discuss how experiments in model organisms have implicated the replication, recombination and repair machinery as contributors to trinucleotide repeat instability in vivo.

Li, Y. C., A. B. Korol, et al. (2002). "Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review." Mol Ecol 11(12): 2453-65.

Microsatellites, or tandem simple sequence repeats (SSR), are abundant across genomes and show high levels of polymorphism. SSR genetic and evolutionary mechanisms remain controversial. Here we attempt to summarize the available data related to SSR distribution in coding and noncoding regions of genomes and SSR functional importance. Numerous lines of evidence demonstrate that SSR genomic distribution is nonrandom. Random expansions or contractions appear to be selected against for at least part of SSR loci, presumably because of their effect on chromatin organization, regulation of gene activity, recombination, DNA replication, cell cycle, mismatch repair system, etc. This review also discusses the role of two putative mutational mechanisms, replication slippage and recombination, and their interaction in SSR variation.

Lichterfeld, M., H. D. Nischalke, et al. (2003). "The tandem-repeat polymorphism of the DC-SIGNR gene does not affect the susceptibility to HIV infection and the progression to AIDS." Clin Immunol 107(1): 55-59.

DC-SIGNR is a C-type lectin that functions as a transreceptor for HIV-1. The exon 4 of the DC-SIGNR gene comprises a variable number of 69-bp tandem repeats, encoding for parts of the extracellular protein domain. Here, we analyzed the relevance of this gene polymorphism for the interindividual transmission of HIV-1 and the progression to AIDS. A cross-sectional comparison between HIV-1-infected patients (n = 391) and healthy volunteers (n = 134) did not reveal significant differences with regard to the DC-SIGNR allele distribution. Moreover, DC-SIGNR allele frequencies were similar in slowly progressing HIV patients (n = 31) and patients who rapidly progressed to AIDS (n = 46). Additionally, in a cohort of 149 newly HIV-infected patients, no relationship was found between HIV set point viremia and DC-SIGNR genotypes. Thus, the DC-SIGNR tandem-repeat polymorphism in exon 4 does not have a significant impact on the host's susceptibility to HIV and the clinical progression to AIDS.

Lin, J. J., K. C. Yueh, et al. (2003). "The homozygote 10-copy genotype of variable number tandem repeat dopamine transporter gene may confer protection against Parkinson's disease for male, but not to female patients." J Neurol Sci 209(1-2): 87-92.

We investigated the role of variable number tandem repeat (VNTR) polymorphism of the dopamine transporter gene (DAT) in the pathogenesis of Parkinson's disease (PD) in Taiwanese. A case-control study was carried out to examine the association of the VNTR polymorphism within the DAT between 193 sporadic PD patients and 254 controls, matched by age and sex. Six alleles of VNTR polymorphism in the DAT, consisting of 6, 7, 8, 9, 10 and 11 copies of the 40-base-pair (bp) repeat sequence, were detected in the study. There were no differences of allele frequency (χ²=5.239, p=0.387) and genotype polymorphism of the DAT VNTR (χ²=11.873, p=0.157) in PD patients from the controls. Further analysis stratified by sex and age at onset did not show associations. However, PD patients carrying homozygote 10-copy genotype of the DAT VNTR polymorphism were 0.67 times fewer than controls (χ²=4.569, odds ratio (OR)=0.67, 95% confidence interval (CI)=0.45-0.97, p=0.033). The reduced risk of the homozygosity with PD genotype was only in male PD patients (χ²=2.923, OR=0.48, 95% CI=0.25-0.93, p=0.026), but not in female PD patients (χ²=0.002, OR=1.02, 95% CI=0.49-2.11, p=0.966). In conclusion, the results of our study show that homozygote 10-copy genotype of the VNTR polymorphism within the DAT may confer a protective factor for male PD patients, but not for female PD patients.

Lin, X. and T. Ashizawa (2003). "SCA10 and ATTCT repeat expansion: clinical features and molecular aspects." Cytogenet Genome Res 100(1-4): 184-8.

Liou, Y. J., S. J. Tsai, et al. (2003). "Association analysis for the CA repeat polymorphism of the neuronal nitric oxide synthase (NOS1) gene and schizophrenia." Schizophr Res 65(1): 57-9.

Litt, M. and J. A. Luty (1989). "A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene." Am J Hum Genet 44(3): 397-401.

The human genome contains approximately 50,000 copies of an interspersed repeat with the sequence (dT-dG)n, where n = approximately 10-60. In humans, (TG)n repeats have been found in several sequenced regions. Since minisatellite regions with larger repeat elements often display extensive length polymorphisms, we suspected that (TG)n repeats ("microsatellites") might also be polymorphic. Using the polymerase chain reaction to amplify a (TG)n microsatellite in the human cardiac actin gene, we detected 12 different allelic fragments in 37 unrelated individuals, 32 of whom were heterozygous. Codominant Mendelian inheritance of fragments was observed in three families with a total of 24 children. Because of the widespread distribution of (TG)n microsatellites, polymorphisms of this type may be generally abundant and present in regions where minisatellites are rare, making such microsatellite loci very useful for linkage studies in humans.

Lund, A., J. S. Tapanainen, et al. (2003). "Long CAG repeats in the AR gene are not associated with infertility in Finnish males." Acta Obstet Gynecol Scand 82(2): 162-166.

BACKGROUND: The modulatory domain of the human androgen receptor gene contains a polymorphic CAG repeat coding for a polyglutamine tract. The length of the polyglutamine tract is inversely correlated with transcriptional activity of the androgen receptor. As androgens are crucial to spermatogenesis, decreased transcriptional activity of the androgen receptor associated with a long polyglutamine tract could lead to failure in spermatogenesis. Accordingly, long CAG repeats within the normal range have been suggested to be more common in infertile males than in the control population. METHODS: To test this hypothesis, we examined the CAG repeat number of 192 Finnish males with moderate or severe spermatogenic failure and 149 control males. RESULTS: Our results did not support the hypothesis; the controls harbored slightly longer CAG repeats than the infertile males. CONCLUSION: At least in the present study population from Finland, long CAG repeats in the androgen receptor gene do not play a significant role in spermatogenic failure.

Luo, H. R., X. M. Lu, et al. (2002). "Length polymorphism of thymidylate synthase regulatory region in Chinese populations and evolution of the novel alleles." Biochem Genet 40(1-2): 41-51.

The tandemly repeated 28-bp sequence in the 5'-terminal regulatory region of human thymidylate synthase (TSER), which has been reported to be polymorphic in different populations, was surveyed in 668 Chinese from 9 Han groups, 8 ethnic populations, and 36 individuals representing a three-generation pedigree. Amplified fragments were separated by electrophoresis on 4% agarose gel. In addition to the reported double and triple repeats of the 28-bp sequence in TSER, we also detected a novel quintuple repeat in this region. The transient expression activity of TSER with the quintuple repeat is almost the same as that of the reported TSER with the triple repeat. All three alleles of the repeat type (2, 3, and 5) were further confirmed by sequencing. The frequencies of the TSER allele 2 and 3 were 18.82 and 81% in totally unrelated Chinese samples, respectively, while the frequency of allele 3 was variable in different Chinese populations with a range from 62 to 95%. On the basis of the sequences of the different alleles, the existence of the tandem repeats in each allele might be explained by slipped-strand mispairing during DNA replication.

Majed, Z., E. Bellenger, et al. (2005). "Identification of Variable-Number Tandem-Repeat Loci in Leptospira interrogans Sensu Stricto." J Clin Microbiol 43(2): 539-45.

Leptospira interrogans sensu stricto is responsible for the most frequent and severe cases of human leptospirosis. The epidemiology and clinical features of leptospirosis are usually associated with the serovars and serogroups of Leptospira. Because of the difficulties associated with serological identification of Leptospira strains, we evaluated a novel PCR-based method for typing L. interrogans serovars. Based upon the genome sequence of L. interrogans serovar Lai type strain 5660, 44 loci were analyzed by PCR for their variability in size due to the presence of variable-number tandem repeats (VNTR). Seven VNTR loci were found to be powerful markers for serovar identification, epidemiology, and phylogenetic studies of L. interrogans. This rapid and easy method should greatly contribute to a better knowledge of the epidemiology of Leptospira.

Margulies, E. H., M. Blanchette, et al. (2003). "Identification and characterization of multi-species conserved sequences." Genome Res 13(12): 2507-18.

Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these "Multi-species Conserved Sequences" (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (approximately 70%) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.

Matsugami, A., T. Okuizumi, et al. (2003). "Intramolecular higher-order packing of parallel quadruplexes comprising a G:G:G:G tetrad and a G(A):G(A):G(A):G heptad of GGA triplet repeat DNA." J Biol Chem.

GGA triplet repeats are widely dispersed throughout eukaryotic genomes, and are frequently located within biologically important regions such as gene regulatory regions and recombination hot spot sites. We determined the structure of d(GGA)4 (12-mer) under physiological conditions, and found the formation of an intramolecular parallel quadruplex for the first time. Later, a similar architecture to that of the intramolecular parallel quadruplex was found for a telomere DNA in the crystalline state. Here, we have determined the structure of d(GGA)8 (24-mer) under physiological conditions. Two intramolecular parallel quadruplexes comprising a G:G:G:G tetrad and a G(A):G(A):G(A):G heptad are formed in d(GGA)8. These quadruplexes are packed in a tail-to-tail manner. This is the first demonstration of the intramolecular higher-order packing of quadruplexes at atomic resolution. K+ ions but not Na+ ones are critically required for the formation of this unique structure. The elucidated structure suggests the mechanisms underlying the biological events related to the GGA triplet repeat. Furthermore, in the light of the structure, the mode of the higher-order packing of the telomere DNA is discussed.

Matsugami, A., K. Ouhashi, et al. (2001). "New quadruplex structure of GGA triplet repeat DNA--an intramolecular quadruplex composed of a G:G:G:G tetrad and G(A):G(A):G(A):G heptad, and its dimerization." Nucleic Acids Res Suppl(1): 271-2.

The structure of d(GGAGGAGGAGGA) containing four tandem repeats of a GGA triplet sequence has been determined under physiological K+ conditions by NMR. d(GGAGGAGGAGGA) folds into an intramolecular quadruplex composed of a G:G:G:G tetrad and a G(A):G(A):G(A):G heptad. Four G-G segments of d(GGAGGAGGAGGA) are aligned parallel to each other due to seven successive turns of the main chain at each of the GGA and GAGG segments. Two quadruplexes form a dimer stabilized through a stacking interaction between the heptads of the two quadruplexes. On the basis of these results, the biological implications of naturally occurring GGA triplet repeat DNA are discussed.

McCracken, J. T., S. L. Smalley, et al. (2000). "Evidence for linkage of a tandem duplication polymorphism upstream of the dopamine D4 receptor gene (DRD4) with attention deficit hyperactivity disorder (ADHD)." Mol Psychiatry 5(5): 531-6.

Attention deficit hyperactivity disorder (ADHD) is a common childhood-onset neurodevelopmental disorder. Evidence from twin, adoption, and family studies provides support for a genetic contribution to the etiology of ADHD. Several candidate gene studies have identified an association between a 7-repeat variant in exon 3 of the dopamine 4 receptor gene (DRD4) and ADHD. However, in spite of the positive reports finding association of the exon 3 VNTR with ADHD, several other polymorphisms within DRD4 have been identified that conceivably could contribute to risk for ADHD. Recently, another common polymorphism of the DRD4 gene has been described involving a 120-bp repeat element upstream of the 5' transcription initiation site. In this report, we describe results of analysis of the DRD4 120-bp repeat promoter polymorphism in a sample of 371 children with ADHD and their parents, using the transmission disequilibrium test (TDT). Results showed a significant preferential transmission of the 240-bp (long) allele with ADHD. Exploratory analyses of the Inattentive phenotypic subtype of ADHD strengthened the evidence for linkage. These data add further support for the role of DRD4 variants conferring increased risk for ADHD, and imply that additional studies of DRD4 and other related genes are needed.

McDonald, J. H. and M. Kreitman (1991). "Adaptive protein evolution at the Adh locus in Drosophila." Nature 351(6328): 652-4.

Proteins often differ in amino-acid sequence across species. This difference has evolved by the accumulation of neutral mutations by random drift, the fixation of adaptive mutations by selection, or a mixture of the two. Here we propose a simple statistical test of the neutral protein evolution hypothesis based on a comparison of the number of amino-acid replacement substitutions to synonymous substitutions in the coding region of a locus. If the observed substitutions are neutral, the ratio of replacement to synonymous fixed differences between species should be the same as the ratio of replacement to synonymous polymorphisms within species. DNA sequence data on the Adh locus (encoding alcohol dehydrogenase, EC 1.1.1.1) in three species in the Drosophila melanogaster species subgroup do not fit this expectation; instead, there are more fixed replacement differences between species than expected. We suggest that these excess replacement substitutions result from adaptive fixation of selectively advantageous mutations.
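The comparison described in this abstract can be sketched as a 2x2 contingency test: replacement versus synonymous changes, classified as fixed between species or polymorphic within species. The sketch below is a minimal illustration only. It uses Fisher's exact test as a stand-in for whatever statistic the authors applied, and the counts are illustrative values chosen for the example, not figures quoted from the abstract. All function names are hypothetical.

```python
from math import comb


def replacement_to_synonymous_ratios(fixed_repl, fixed_syn, poly_repl, poly_syn):
    """Return (fixed ratio, polymorphic ratio) of replacement to synonymous counts.

    Under neutrality the two ratios are expected to be about equal.
    """
    return fixed_repl / fixed_syn, poly_repl / poly_syn


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )


# Illustrative counts (assumed for this example, not taken from the abstract).
FIXED_REPL, FIXED_SYN = 7, 17   # differences fixed between species
POLY_REPL, POLY_SYN = 2, 42     # polymorphisms within species

fixed_ratio, poly_ratio = replacement_to_synonymous_ratios(
    FIXED_REPL, FIXED_SYN, POLY_REPL, POLY_SYN
)
p_value = fisher_exact_two_sided(FIXED_REPL, FIXED_SYN, POLY_REPL, POLY_SYN)

print(f"fixed ratio = {fixed_ratio:.3f}, polymorphic ratio = {poly_ratio:.3f}")
print(f"two-sided Fisher exact p = {p_value:.4f}")
```

An excess of fixed replacement differences shows up as a fixed ratio well above the polymorphic ratio together with a small p-value, which is the pattern the abstract interprets as adaptive fixation.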

Mill, J. S., A. Caspi, et al. (2002). "The dopamine D4 receptor and the hyperactivity phenotype: a developmental-epidemiological study." Mol Psychiatry 7(4): 383-91.

Attention-deficit hyperactivity disorder (ADHD) affects 2-6% of school-age children and is a precursor of behavioural problems in adolescence and adulthood. Underlying the categorical definition of ADHD are the quantitative traits of activity, impulsivity, and inattention which vary continuously in the population. Both ADHD and quantitative measures of hyperactivity are heritable, and influenced by multiple genes of small effect. Several studies have reported an association between clinically defined ADHD and the seven-repeat allele of a 48-bp tandem repeat polymorphism in the third exon of the dopamine D4 receptor gene (DRD4). We tested this association in a large, unselected birth cohort (n = 1037) using multiple measures of the hyperactivity phenotype taken at multiple assessment ages across 20 years. This longitudinal approach allowed us to ascertain whether or not DRD4 has a general effect on the diagnosed (n = 49) or continuously distributed hyperactivity phenotype, and related personality traits. We found no evidence to support this association.

Miller, J., A. D. McLachlan, et al. (1985). "Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes." Embo J 4(6): 1609-14.

The 7S particle of Xenopus laevis oocytes contains 5S RNA and a 40-K protein which is required for 5S RNA transcription in vitro. Proteolytic digestion of the protein in the particle yields periodic intermediates spaced at 3-K intervals and a limit digest containing 3-K fragments. The native particle is shown to contain 7-11 zinc atoms. These data suggest that the protein contains repetitive zinc-binding domains. Analysis of the amino acid sequence reveals nine tandem similar units, each consisting of approximately 30 residues and containing two invariant pairs of cysteines and histidines, the most common ligands for zinc. The linear arrangement of these repeated, independently folding domains, each centred on a zinc ion, comprises the major part of the protein. Such a structure explains how this small protein can bind to the long internal control region of the 5S RNA gene, and stay bound during the passage of an RNA polymerase molecule.

Mohmood, S., A. Sherwani, et al. (2003). "DNA trinucleotide repeat expansion in neuropsychiatric patients." Med Sci Monit 9(9): RA237-45.

Dynamic mutations in human genes result from unstable trinucleotide repeats which are expanded within the genome. These expansions of trinucleotide repeats have been shown to be the etiological factors in various neuropsychiatric diseases and other genetic disorders. This hypothesis is supported by various independent studies showing large expansion of trimeric repeats, such as CAG/CTG/CCG/CGG/AAG, in patient DNA samples. These repeats are also identified in other disease loci not clearly related to particular diseases, which indicates that such expansions are one of the general forms of evolution occurring throughout the human genome. The trinucleotide repeat expansions occur during meiosis and are generally irreversible. Accumulation of these repeats over generations eventually ends in a deficiency of replication. There is evidence that certain ethnic groups in the human population have predispositions for expanded repeats related to neuropsychiatric diseases. It is likely that racial/ethnic differences reflect variations, which suggests the possibility of an underlying complex biological process. The present review highlights the importance of repeat expansions in some neuropsychiatric diseases, such as spinal and bulbar muscular atrophy (SBMA), spinocerebellar ataxia (SCA), Huntington's disease (HD), schizophrenia, myotonic dystrophy (DM) and fragile-X syndrome.

Morelle, S., E. Carbonnelle, et al. (2003). "The REP2 Repeats of the Genome of Neisseria meningitidis Are Associated with Genes Coordinately Regulated during Bacterial Cell Interaction." J Bacteriol 185(8): 2618-27.

Interaction with host cells is essential in meningococcal pathogenesis, especially at the blood-brain barrier. This step is likely to involve a common regulatory pathway allowing coordinate regulation of genes necessary for the interaction with endothelial cells. The analysis of the genomic sequence of Neisseria meningitidis Z2491 revealed the presence of many repeats. One of these, designated REP2, contains a -24/-12 type promoter and a ribosome binding site 5 to 13 bp before an ATG. In addition, most of these REP2 sequences are located immediately upstream of an ORF. Among these REP2-associated genes are pilC1 and crgA, described as being involved in steps essential for the interaction of N. meningitidis with host cells. Furthermore, the REP2 sequences located upstream of pilC1 and crgA correspond to the previously identified promoters known to be induced during the initial localized adhesion of N. meningitidis with human cells. This characteristic led us to hypothesize that at least some of the REP2-associated genes were upregulated under the same circumstances as pilC1 and crgA. Quantitative PCR in real time demonstrated that the expression of 14 out of 16 REP2-associated genes was upregulated during the initial localized adhesion of N. meningitidis. Taken together, these data suggest that these repeats control a set of genes necessary for the efficient interaction of this pathogen with host cells. Subsequent mutational analysis was performed to address the role of these genes during meningococcus-cell interaction.

Mukherjee, B., H. Zhao, et al. (2003). "Microsatellite dinucleotide (T-G) repeat: a candidate DNA marker for breast metastasis." Cancer Detect Prev 27(1): 19-23.

A dinucleotide (T-G) repeat sequence was isolated by comparing DNA from metastatic lymph node and matched normal breast samples from a ductal mammary carcinoma patient using the representational difference analysis (RDA) method. Our present study used this metastasis associated DNA sequence (MADS) as a diagnostic probe to screen five patient samples by the slot blot method. A new approach to isolate single cells by microdissection, namely single cell microdissection (SCM), was developed to obtain a homogeneous population of tumor cells (approximately 1000) from matched primary tumors and corresponding positive lymph nodes of five patients. We isolated DNA from these homogeneous tumor cells and used it for the RDA and DNA slot blot experiments. The screening of patient samples showed loss of this MADS in the transition from primary to metastasis in four out of five cases (80%), suggesting its possible role in breast metastasis.

Murphy, P. M. (1993). "Molecular mimicry and the generation of host defense protein diversity." Cell 72(6): 823-6.

Myers, S. J., Y. Huang, et al. (2004). "Inhibition of glutamate receptor 2 translation by a polymorphic repeat sequence in the 5'-untranslated leaders." J Neurosci 24(14): 3489-99.

Previous studies have identified multiple transcription initiation sites for the glutamate receptor 2 (GluR2) gene, resulting in a heterogeneous population of GluR2 transcripts in vivo that differ in the length of their 5'-untranslated leaders (5'-UTR). We designed a series of monocistronic and dicistronic GluR2 cDNA constructs that model the natural in vivo transcripts and investigated their translation efficiencies in rabbit reticulocyte lysates, Xenopus oocytes, and primary cultured neurons. Transcripts containing long 5' leaders (429 and 481 bases) were translated poorly compared with those with shorter leaders (341 or fewer bases). None of the five initiation codons in the 5'-UTR or the leader length per se were responsible for translation regulation. Rather, control of translation was mediated by a sequence containing a 34-42 nucleotide imperfect GU repeat predicted to form secondary structure in vivo. This translation suppression domain is included in some but not all rat and human GluR2 transcripts in vivo, depending on the site of transcription initiation. Rat cortex GluR2 transcripts that lack the translation suppression sequence were preferentially associated with polyribosomes. Furthermore, the GU-repeat cluster was found to be polymorphic in humans, raising the possibility that expansion or contraction of the GU-repeat cluster in certain populations might modify the level of GluR2 protein expression in neurons.

Nagao, K., K. Fujii, et al. (2004). "Identification of a novel polymorphism involving a CGG repeat in the PTCH gene and a genome-wide screening of CGG-containing genes." J Hum Genet.

Mutations in the human homologue of the Drosophila patched gene (PTCH) are responsible for the hereditary disorder called nevoid basal cell carcinoma syndrome (NBCCS). PTCH has a CGG triplet repeat located 4 bp upstream of the first methionine codon. Here we report a novel polymorphism involving the number of the CGG-repeat. The major allele (86.3%) contained a repeat size of seven, whereas the minor allele contained eight. No significant difference in the distributions of genotypes was observed between normal and NBCCS individuals. However, when the repeat was inserted between a heterologous promoter and the luciferase gene, the longer repeats tended to induce higher luciferase activities, suggesting that the repeat length potentially affects the levels of gene expression. A genome-wide screening revealed that 68 and 146 genes contained a CGG/CCG repeat in the coding region and in the 5'-untranslated region (5'-UTR), respectively. None of the genes had this repeat in 3'-UTR. Interestingly, the number of genes with a CGG repeat in the 5'-UTR was significantly higher than that with a CCG repeat in the 5'-UTR. The localization of a CGG/CCG repeat in PTCH is quite unique in that only four other genes have been found in which the repeat is localized up to 4 bp upstream of the first methionine.

Nobukuni, Y., H. Mitsubuchi, et al. (1991). "Maple syrup urine disease. Complete defect of the E1 beta subunit of the branched chain alpha-ketoacid dehydrogenase complex due to a deletion of an 11-bp repeat sequence which encodes a mitochondrial targeting leader peptide in a family with the disease." J Clin Invest 87(5): 1862-6.

Branched chain alpha-ketoacid dehydrogenase (BCKDH) deficiency results in maple syrup urine disease (MSUD). We examined the molecular basis of familial cases of MSUD by analyzing the activity, subunit structure, mRNA sequence, and genome structure of the affected enzyme. The BCKDH activity in the proband with MSUD was approximately 6% of the normal control level. Immunoblot analysis revealed that the E1 beta subunit of BCKDH was absent and that the E1 alpha subunit of BCKDH was markedly reduced. We amplified the cDNAs of the E1 alpha subunit and the E1 beta subunit of the BCKDH complex obtained from cells of the patient, using the polymerase chain reaction method, then sequenced the amplified cDNAs. The deduced amino acid sequence for the E1 alpha subunit of the patient's cell was normal. An 11-bp deletion was identified in the region that encoded the mitochondrial targeting leader peptide in the E1 beta cDNA. This 11-bp sequence is found in the first exon of the BCKDH-E1 beta gene, as a direct tandem repeat. Amplification of genomic DNA revealed that the consanguineous parents were heterozygous for this mutant allele, and sister and brother of the patient with the disease were homozygous for this mutant allele. This 11-bp deletion mutation caused a change in the reading frame and the mature E1 beta protein was defective. These observations show the biological importance of the E1 beta subunit of BCKDH to maintain normal function of the enzyme activity. The absence of the E1 beta subunit results in instability of the E1 alpha subunit.

Nomura, F., S. Itoga, et al. (2003). "Transcriptional activity of the tandem repeat polymorphism in the 5'-flanking region of the human CYP2E1 gene." Alcohol Clin Exp Res 27(8 Suppl): 42S-46S.

BACKGROUND: There are remarkable interindividual differences in the expression of cytochrome P-4502E1 (CYP2E1), which in turn may alter susceptibility to alcohol-related diseases and various cancers. We recently characterized the tandem repeat polymorphism in the 5'-flanking region of the human CYP2E1 gene and found that subjects with the homozygous mutant-type (A4/A4) may be at higher risk of developing esophageal cancer. In this study, we determined how this tandem repeat polymorphism alters transcriptional activities of the human CYP2E1 gene by transfection studies. METHODS: The 5'-flanking region (-2,562 base pair to +9 base pair) of the CYP2E1 gene from three individuals of different genotypes (A2/A2, A2/A4, A4/A4) was amplified by polymerase chain reaction. The polymerase chain reaction products placed in front of a luciferase reporter gene were transfected into human hepatoblastoma cells, human esophageal cancer cells, and human uterus cancer cells. Transcriptional activities were determined by the dual-luciferase assay. When indicated, ethanol (50 mM) was included in the culture medium. CYP2E1 messenger RNA levels in peripheral lymphocytes were measured by the real-time reverse transcription-polymerase chain reaction using the LightCycler system. RESULTS: The construct including the tandem repeat region exhibited luciferase activities in both A2 and A4 type. It was of note that the activity produced by the A4 allele was significantly greater than that by A2 allele in HeLa cells (p < 0.001). CYP2E1 messenger RNA levels in peripheral blood lymphocytes were comparable between the two genotypes. CONCLUSION: Transcriptional activity of the mutant allele of the tandem repeat polymorphism in the 5'-flanking region of the CYP2E1 gene is greater than that of the wild type.

O'Dushlaine, C. T., R. J. Edwards, et al. (2005). "Tandem repeat copy-number variation in protein-coding regions of human genes." Genome Biol 6(8): R69.

BACKGROUND: Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles. RESULTS: Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms. CONCLUSION: Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
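The percentages quoted in this abstract follow from simple arithmetic, which the short sketch below reproduces. It uses only the numbers stated in the abstract (218 clusters out of 13,749 proteins, an estimated 6% overall frequency, and 13.9% of polymorphisms not a multiple of three nucleotides). The variable names are illustrative, and the frameshift figure is a plain product of the two stated percentages, offered as one plausible reading of how the "up to 1%" bound arises rather than as the authors' own calculation.

```python
# Values stated in the abstract.
polymorphic_clusters = 218
proteins_investigated = 13_749
estimated_polymorphic_fraction = 0.06   # "approximate mean frequency ... 6%"
non_multiple_of_three_fraction = 0.139  # "13.9% of the polymorphisms"

# Share of investigated proteins with a detected coding tandem repeat polymorphism.
detected_pct = 100 * polymorphic_clusters / proteins_investigated

# Assumed reading: frameshifting share = overall share x non-multiple-of-3 share.
frameshift_pct = 100 * estimated_polymorphic_fraction * non_multiple_of_three_fraction

print(f"detected polymorphic proteins: {detected_pct:.2f}%")
print(f"estimated frameshifting proteins: {frameshift_pct:.2f}%")
```

The first figure rounds to the 1.59% reported in the abstract, and the second stays below the stated upper bound of 1%.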

Oberle, I., F. Rousseau, et al. (1991). "Instability of a 550-base pair DNA segment and abnormal methylation in fragile X syndrome." Science 252(5010): 1097-102.

The fragile X syndrome, a common cause of inherited mental retardation, is characterized by an unusual mode of inheritance. Phenotypic expression has been linked to abnormal cytosine methylation of a single CpG island, at or very near the fragile site. Probes adjacent to this island detected very localized DNA rearrangements that constituted the fragile X mutations, and whose target was a 550-base pair GC-rich fragment. Normal transmitting males had a 150- to 400-base pair insertion that was inherited by their daughters either unchanged, or with small differences in size. Fragile X-positive individuals in the next generation had much larger fragments that differed among siblings and showed a generally heterogeneous pattern indicating somatic mutation. The mutated allele appeared unmethylated in normal transmitting males, methylated only on the inactive X chromosome in their daughters, and totally methylated in most fragile X males. However, some males had a mosaic pattern. Expression of the fragile X syndrome thus appears to result from a two-step mutation as well as a highly localized methylation. Carriers of the fragile X mutation can easily be detected regardless of sex or phenotypic expression, and rare apparent false negatives may result from genetic heterogeneity or misdiagnosis.

Parekh-Olmedo, H. and E. B. Kmiec (2003). "Targeted nucleotide exchange in the CAG repeat region of the human HD gene." Biochem Biophys Res Commun 310(2): 660-6.

Huntington's disease (HD) is marked by the expansion of a tract of repeated CAG codons in the HD-gene, IT15. Once expressed, the expanded poly Q region of the huntingtin protein (Htt), which is normally soluble, becomes insoluble, leading to the formation of intracellular inclusions and ultimately to neuronal degeneration. Interruption of the pure poly Q tract at the genetic level should undermine the transition from Htt solubility to Htt insolubility. Modified single-stranded oligonucleotides were used to direct the nucleotide exchange of an A residue to a T residue in the second codon of the HD-gene, resulting in the creation of a leucine residue among the poly Q tract. Consistent with results from other groups, we provide evidence that short synthetic DNA molecules can modify the HD-gene directly, preliminarily offering a potential therapeutic approach to Huntington's disease.

Parkhill, J., M. Sebaihia, et al. (2003). "Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica." Nat Genet 35: 32-40.

Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica are closely related Gram-negative beta-proteobacteria that colonize the respiratory tracts of mammals. B. pertussis is a strict human pathogen of recent evolutionary origin and is the primary etiologic agent of whooping cough. B. parapertussis can also cause whooping cough, and B. bronchiseptica causes chronic respiratory infections in a wide range of animals. We sequenced the genomes of B. bronchiseptica RB50 (5,338,400 bp; 5,007 predicted genes), B. parapertussis 12822 (4,773,551 bp; 4,404 genes) and B. pertussis Tohama I (4,086,186 bp; 3,816 genes). Our analysis indicates that B. parapertussis and B. pertussis are independent derivatives of B. bronchiseptica-like ancestors. During the evolution of these two host-restricted species there was large-scale gene loss and inactivation; host adaptation seems to be a consequence of loss, not gain, of function, and differences in virulence may be related to loss of regulatory or control functions.

Penkina, M. V., O. I. Karpova, et al. (1999). "[Distribution of tetranucleotide repeats in the family of golden hamster DNA sequences tightly associated with synaptonemal complex]." Mol Biol (Mosk) 33(4): 586-91.

Pineiro, E., L. Fernandez-Lopez, et al. (2003). "Mutagenic stress modulates the dynamics of CTG repeat instability associated with myotonic dystrophy type 1." Nucleic Acids Res 31(23): 6733-40.

The molecular basis of the myotonic dystrophy type 1 is the expansion of a CTG repeat at the DMPK locus. The expanded disease-associated repeats are unstable in both somatic and germ lines, with a high tendency towards expansion. The rate of expansion is directly related to the size of the pathogenic allele, increasing the size heterogeneity with age. It has also been suggested that additional factors, including as yet unidentified environmental factors, might affect the instability of the expanded CTG repeats to account for the observed CTG size dynamics over time. To investigate the effect of environmental factors in the CTG repeat instability, three lymphoblastoid cell lines were established from two myotonic dystrophy patients and one healthy individual, and parallel cultures were concurrently expanded in the presence or absence of the mutagenic chemical mitomycin C for a total of 12 population doublings. The new alleles arising along the passages were analysed by radioactive small pool PCR and sequencing gels. An expansion bias of the stepwise mutation was observed in a (CTG)124 allele of a cell line harbouring two modal alleles of 28 and 124 CTG repeats. Interestingly, this expansion bias was clearly enhanced in the presence of mitomycin C. The effect of mitomycin C was also evident in the normal size alleles in two cell lines with alleles of 13/13 and 12/69 repeats, where treated cultures showed new longer alleles. In conclusion, our results indicate that mitomycin C modulates the dynamics of myotonic dystrophy-associated CTG repeats in LBCLs, enhancing the expansion bias of long-pathogenic repeats and promoting the expansion of normal length repeats.

Resch, A., Y. Xing, et al. (2004). "Assessing the impact of alternative splicing on domain interactions in the human proteome." J Proteome Res 3(1): 76-83.

We have constructed a database of alternatively spliced protein forms (ASP), consisting of 13,384 protein isoform sequences of 4422 human genes (www.bioinformatics.ucla.edu/ASP). We identified fifty protein domain types that were selectively removed by alternative splicing at much higher frequencies than average (p-value < 0.01). These include many well-known protein-interaction domains (e.g., KRAB; ankyrin repeats; Kelch) including some that have been previously shown to be regulated functionally by alternative splicing (e.g., collagen domain). We present a number of novel examples (Kruppel transcription factors; Pbx2; Enc1) from the ASP database, illustrating how this pattern of alternative splicing changes the structure of a biological pathway, by redirecting protein interaction networks at key switch points. Our bioinformatics analysis indicates that a major impact of alternative splicing is removal of protein-protein interaction domains that mediate key linkages in protein interaction networks. ASP expands the available dataset of human alternatively spliced protein forms from 1989 human genes (SwissProt release 42) to 5413 (nonredundant set, ASP + SwissProt), a nearly 3-fold increase. ASP will enhance the existing pool of protein sequences that are searched by mass spectroscopy software during the identification of peptide fragments.

Rolfsmeier, M. L., M. J. Dixon, et al. (2001). "Cis-elements governing trinucleotide repeat instability in Saccharomyces cerevisiae." Genetics 157(4): 1569-79.

Trinucleotide repeat (TNR) instability in humans is governed by unique cis-elements. One element is a threshold, or minimal repeat length, conferring frequent mutations. Since thresholds have not been directly demonstrated in model systems, their molecular nature remains uncertain. Another element is sequence specificity. Unstable TNR sequences are almost always CNG, whose hairpin-forming ability is thought to promote instability by inhibiting DNA repair. To understand these cis-elements further, TNR expansions and contractions were monitored by yeast genetic assays. A threshold of approximately 15-17 repeats was observed for CTG expansions and contractions, indicating that thresholds function in organisms besides humans. Mutants lacking the flap endonuclease Rad27p showed little change in the expansion threshold, suggesting that this element is not altered by the presence or absence of flap processing. CNG or GNC sequences yielded frequent mutations, whereas A-T rich sequences were substantially more stable. This sequence analysis further supports a hairpin-mediated mechanism of TNR instability. Expansions and contractions occurred at comparable rates for CTG tract lengths between 15 and 25 repeats, indicating that expansions can comprise a significant fraction of mutations in yeast. These results indicate that several unique cis-elements of human TNR instability are functional in yeast.

Rousseau, K., C. Byrne, et al. (2004). "The complete genomic organization of the human MUC6 and MUC2 mucin genes." Genomics 83(5): 936-9.

The complete genomic organization of the two mucin genes MUC2 and MUC6 was obtained by comparison of new and published mRNA sequences with newly available human genomic sequence. The two genes are located 38.5 kb apart in a head-to-head orientation within a gene complex on chromosome 11p15.5. The N-terminal organization of MUC6 is highly similar to that of MUC2, containing the D1, D2, D', and D3 Von Willebrand factor domains followed by the large tandem repeat domains located in exons 31 and 30, respectively. MUC6 has a much smaller C-terminal domain (101 amino acids) encoded by 2 exons containing only the CK domain, compared with MUC2, which has a C-terminal domain of 859 amino acids containing the D4, C, D, and CK domains, encoded by 19 exons. The gene structures agreed partially but not completely with predictions from gene prediction programs.

Sandhu, D., H. Gao, et al. (2004). "Deletion of a Disease Resistance Nucleotide-Binding-Site Leucine-Rich-Repeat-like Sequence Is Associated With the Loss of the Phytophthora Resistance Gene Rps4 in Soybean." Genetics 168(4): 2157-67.

Resistance of soybean against the oomycete pathogen Phytophthora sojae is conferred by a series of Rps genes. We have characterized a disease resistance gene-like sequence NBSRps4/6 that was introgressed into soybean lines along with Rps4 or Rps6. High-resolution genetic mapping established that NBSRps4/6 cosegregates with Rps4. Two mutants, M1 and M2, showing rearrangements in the NBSRps4/6 region were identified from analyses of 82 F(1)'s and 201 selfed HARO4272 plants containing Rps4. Fingerprints of these mutants are identical to those of HARO4272 for 176 SSR markers representing the whole genome except the NBSRps4/6 region. Both mutants showed a gain of race specificities, distinct from the one encoded by Rps4. To investigate the possible mechanism of gain of Phytophthora resistance in M1, the novel race specificity was mapped. Surprisingly, the gene encoding this resistance mapped to the Rps3 region, indicating that this gene could be either allelic or linked to Rps3. Recombinant analyses have shown that deletion of NBSRps4/6 in M1 is associated with the loss of Rps4 function. The NBSRps4/6 sequence is highly transcribed in etiolated hypocotyls expressing the Phytophthora resistance. It is most likely that a copy of the NBSRps4/6 sequence is the Rps4 gene. Possible mechanisms of the deletion in the NBSRps4/6 region and introgression of two unlinked Rps genes into Harosoy are discussed.

Santini, S., J. L. Boore, et al. (2003). "Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters." Genome Res 13(6A): 1111-22.

Comparisons of DNA sequences among evolutionarily distantly related genomes permit identification of conserved functional regions in noncoding DNA. Hox genes are highly conserved in vertebrates, occur in clusters, and are uninterrupted by other genes. We aligned (PipMaker) the nucleotide sequences of the HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human, and mouse, which are separated by approximately 500 million years of evolution. In support of our approach, several identified putative regulatory elements known to regulate the expression of Hox genes were recovered. The majority of the newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac database). The regulatory intergenic regions located between the genes that are expressed most anteriorly in the embryo are longer and apparently more evolutionarily conserved than those at the other end of Hox clusters. Different presumed regulatory sequences are retained in either the Aalpha or Abeta duplicated Hox clusters in the fish lineages. This suggests that the conserved elements are involved in different gene regulatory networks and supports the duplication-deletion-complementation model of functional divergence of duplicated genes.

Sartori, M. T., G. Saggiorato, et al. (2003). "Influence of the Alu-repeat I/D polymorphism in t-PA gene intron 8 on the stimulated t-PA release after venous occlusion." Clin Appl Thromb Hemost 9(1): 63-9.

Tissue type plasminogen activator (t-PA) is released from endothelium in both a constitutive and regulated fashion. In healthy subjects, an association between net t-PA release rate and a few t-PA gene polymorphisms, including the Alu-repeat I/D polymorphism in intron 8, was described. The possible influence of the Alu-repeat polymorphism on t-PA release was evaluated after a venous occlusion test (VO) in 82 patients showing an impaired fibrinolytic capacity associated with different arterial disease or with previous venous thrombosis, and in 50 healthy controls. Euglobulin lysis time, t-PA antigen (t-PA:Ag) and activity, PAI-1 antigen and activity plasma levels were assayed before and 20 minutes after VO; the Alu-repeat I/D polymorphism was determined by PCR. Defective fibrinolysis was due to reduced t-PA release in 40 patients (t-PA group) and to PAI-1 excess in 42 patients (PAI group). No differences in either genotype distribution or allele frequencies were observed between patients and controls. The t-PA:Ag increase after VO (20/0-minute levels ratio adjusted for hematocrit) was considerably higher both in controls and in PAI group patients carrying the I allele than in the DD genotype carriers (II, ID, DD: 3.77+/-0.62, 3.43+/-0.44, 2.06+/-0.32 in controls, and 3.67+/-0.23, 2.80+/-0.50, 1.62+/-0.29 in PAI group, respectively). The difference was significant between the DD and both the ID and II genotypes in controls (p<0.05), and between the DD and II genotypes in the PAI group (p<0.05). A slight and nonsignificant trend of association between genotype and t-PA:Ag 20/0 ratio was seen in the t-PA group patients. In conclusion, these data suggest a possible genetic modulation of t-PA-regulated secretion.

Savouret, C., E. Brisson, et al. (2003). "CTG repeat instability and size variation timing in DNA repair-deficient mice." Embo J 22(9): 2264-2273.

Type 1 myotonic dystrophy is caused by the expansion of an unstable CTG repeat in the DMPK gene. We have investigated the molecular mechanisms underlying the CTG repeat instability by crossing transgenic mice carrying >300 unstable CTG repeats in their human chromatin environment with mice knockout for genes involved in various DNA repair pathways: Msh2 (mismatch repair), Rad52 and Rad54 (homologous recombination) and DNA-PKcs (non-homologous end-joining). Genes of the non-homologous end-joining and homologous recombination pathways did not seem to affect repeat instability. Only lack of Rad52 led to a slight decrease in expansion range. Unexpectedly, the absence of Msh2 did not result in stabilization of the CTG repeats in our model. Instead, it shifted the instability towards contractions rather than expansions, both in tissues and through generations. Furthermore, we carefully analyzed repeat transmissions with different Msh2 genotypes to determine the timing of intergenerational instability. We found that instability over generations depends not only on parental germinal instability, but also on a second event taking place after fertilization.

Schmidt, S., A. Papassotiropoulos, et al. (2003). "Investigation of a genetic variation of a variable number tandem repeat polymorphism of interleukin-6 gene in patients with multiple sclerosis." J Neurol 250(5): 607-11.

Interleukin-6 (IL-6) plays an important role in the regulation of the inflammatory response in multiple sclerosis (MS) and its animal model, experimental autoimmune encephalomyelitis (EAE). Previous reports indicated that the C allele of a variable number tandem repeat (VNTR) polymorphism located in the 3' flanking region of the IL-6 gene (IL-6) is associated with reduced activity of IL-6 in vivo. Since disease-modifying genes are likely to contribute to phenotypic differences in MS patients, we tested the hypothesis that the IL-6 C allele is associated with the clinical course of MS. The IL-6 C allele was equally distributed between 217 MS patients of German Caucasian origin and 111 age-matched healthy controls. Stratification of patients according to the course of disease revealed no significant difference of IL-6 C allele distribution between patients with primary progressive and those with either relapsing-remitting or secondary progressive MS, although the IL-6 C allele was more frequent in patients with RR-MS. Since the IL-6 C allele has been associated with a benign course in Sardinian MS patients, we further analysed an independent sample of 125 Sardinian MS patients, revealing that the IL-6 C allele was much more frequent than in German MS patients. Taken together, a disease-modifying effect of the IL-6 C allele could not be demonstrated in MS patients of German Caucasian descent.

Schuler, G. D. (1997). "Sequence mapping by electronic PCR." Genome Res 7(5): 541-50.

The highly specific and sensitive PCR provides the basis for sequence-tagged sites (STSs), unique landmarks that have been used widely in the construction of genetic and physical maps of the human genome. Electronic PCR (e-PCR) refers to the process of recovering these unique sites in DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing that they could plausibly prime the amplification of a PCR product of the correct molecular weight. A software tool was developed to provide an efficient implementation of this search strategy and allow the sort of en masse searching that is required for modern genome analysis. Some sample searches were performed to demonstrate a number of factors that can affect the likelihood of obtaining a match. Analysis of one large sequence database record revealed the presence of several microsatellite and gene-based markers and allowed the exact base-pair distances among them to be calculated. This example provides a demonstration of how e-PCR can be used to integrate the growing body of genomic sequence data with existing maps, reveal relationships among markers that existed previously on different maps, and correlate genetic distances with physical distances.
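The search strategy described in this abstract (primer matches in the correct order, orientation, and spacing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published e-PCR tool: the function names, the mismatch counting, and the `margin` parameter are all hypothetical, and only the forward strand of the template is scanned.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def find_sts(sequence, fwd, rev, expected_size, margin=50, max_mismatch=0):
    """Return (start, product_size) pairs where both primers could prime a product.

    The forward primer must match the template as written; the reverse primer
    must match as its reverse complement, downstream, at a distance giving a
    product within `margin` bp of `expected_size`.
    """
    sequence = sequence.upper()
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)
    hits = []
    for i in range(len(sequence) - len(fwd) + 1):
        if mismatches(sequence[i:i + len(fwd)], fwd) > max_mismatch:
            continue
        smallest = max(expected_size - margin, len(fwd) + len(rev_site))
        for size in range(smallest, expected_size + margin + 1):
            j = i + size - len(rev_site)      # start of the reverse-primer site
            if j + len(rev_site) > len(sequence):
                break                          # product would run off the template
            if mismatches(sequence[j:j + len(rev_site)], rev_site) <= max_mismatch:
                hits.append((i, size))
    return hits

# Toy template: forward primer, 20 bp spacer, reverse-primer site.
template = "TTTT" + "ACGTAC" + "A" * 20 + "TGATCC" + "TTTT"
print(find_sts(template, "ACGTAC", "GGATCA", expected_size=32, margin=5))
```

Raising `max_mismatch` loosens the "closely match" criterion, and widening `margin` tolerates more variation in product size; both increase the chance of spurious hits.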

Schuler, G. D. (1998). "Electronic PCR: bridging the gap between genome mapping and genome sequencing." Trends Biotechnol 16(11): 456-9.

A crucial event in the history of the Human Genome Project was the decision to use sequence-tagged sites (STSs) as common landmarks for genomic mapping. Following several years of constructing STS-based maps of ever-increasing detail, the emphasis has recently shifted towards large-scale genomic sequencing. A computational procedure called 'electronic PCR' allows STS landmarks to be revealed as data emerge from the sequencing pipeline, thereby bridging the gap between mapping and sequencing activities.

Schuler, G. D., M. S. Boguski, et al. (1996). "A gene map of the human genome." Science 274(5287): 540-6.

The human genome is thought to harbor 50,000 to 100,000 genes, of which about half have been sampled to date in the form of expressed sequence tags. An international consortium was organized to develop and map gene-based sequence tagged site markers on a set of two radiation hybrid panels and a yeast artificial chromosome library. More than 16,000 human genes have been mapped relative to a framework map that contains about 1000 polymorphic genetic markers. The gene map unifies the existing genetic and physical maps with the nucleotide and protein sequence databases in a fashion that should speed the discovery of genes underlying inherited human disease. The integrated resource is available through a site on the World Wide Web at http://www.ncbi.nlm.nih.gov/SCIENCE96/.

Shields, D. C., D. L. Harmon, et al. (1996). "Evolution of hemopoietic ligands and their receptors. Influence of positive selection on correlated replacements throughout ligand and receptor proteins." J Immunol 156(3): 1062-70.

The rates of amino acid replacement in cytokines and their receptors are high and vary considerably. To determine whether this reflects the action of positive selection, rates of nonsynonymous DNA substitution were examined and found to exceed the synonymous substitution rate in certain exons of rodent IL-3, granulocyte-macrophage stimulating factor, and IL-4. To determine the extent to which positive selection could account for correlations between the amino acid replacement rates of hemopoietins and their receptors, rates were examined in various domains: the correlation with ligand rate was not confined to the ligand-binding domain of the receptor, but extended into the transmembrane and cytoplasmic domains and even to leader peptide domains of both ligand and receptor. As the majority of these replacements are unlikely to be strongly advantageous, different levels of both positive and purifying selection contribute to the extensive variation in hemopoietin/receptor evolutionary rates. Changes in a few residues critical for ligand-receptor interaction may be followed by changes of lesser selective importance in both molecules: replacements of growth hormone residues that form hydrogen or salt bridges with the receptor occur in lineages in which there are many concurrent replacements. A ligand/receptor rate correlation is not found between the seven-transmembrane receptors and their ligands, whose mature forms are often short and completely conserved. This study predicts that a minority of concurrent evolutionary changes in hemopoietins and their receptors reflect directly compensatory changes.

Shimizu, M., R. Fujita, et al. (2001). "Chromatin structure of yeast minichromosomes containing triplet repeat sequences associated with human hereditary neurological diseases." Nucleic Acids Res Suppl(1): 71-2.

Expansion of triplet repeat sequences such as (CTG)n, (CGG)n, and (GAA)n causes human genetic diseases. Since DNA is packaged into arrays of nucleosomes in eukaryotic cells, chromatin may be involved in the mechanism of triplet repeat diseases. To elucidate this issue, we have examined effects of triplet repeat sequences on the chromatin organization in vivo using well defined yeast minichromosomes. We show here that (CGG)12 disrupts an array of positioned nucleosomes, whereas (CTG)12 promotes the nucleosome formation. Thus, triplet repeat sequences can affect the chromatin organization in vivo, which may contribute to the triplet repeat expansion or alterations in the expression of genes associated with triplet repeat diseases.

Skrabanek, L. and F. Campagne (2001). "TissueInfo: high-throughput identification of tissue expression profiles and specificity." Nucleic Acids Res 29(21): E102-2.

We describe TissueInfo, a knowledge-based method for the high-throughput identification of tissue expression profiles and tissue specificity. TissueInfo defines a set of tissue information calculations that can be computed for large numbers of genes, expressed sequence tags (ESTs) or proteins. Tissue information records that result from the TissueInfo calculations are used to generate tables suitable for data mining and for the selection of genes according to a given expression profile or specificity. When benchmarked against a test set of 116 proteins and literature information, TissueInfo was found to be accurate for 69% of identified tissue specificities and for 80% of expression profiles. The accuracy of the identifications can be increased if query sequences for which little information is available from dbEST are ignored. Thus, with 80% coverage, TissueInfo achieves an accuracy of 76% for specificity and 89% for expression. For the same set of proteins, the curated tissue specificity offered in SWISS-PROT was accurate in 78% of cases. TissueInfo can be useful for the selection of clones for custom microarrays, selection of training sets for ab initio identification of tissue information, gene discovery and genome-wide predictions. Further information about the program can be found at http://icb.mssm.edu/tissueinfo.

Smigielski, E. M., K. Sirotkin, et al. (2000). "dbSNP: a database of single nucleotide polymorphisms." Nucleic Acids Res 28(1): 352-5.

In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at website: http://www.ncbi.nlm.nih.gov/SNP. Submitted SNPs can also be downloaded via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/

Spurdle, A. B., G. S. Dite, et al. (1999). "Androgen receptor exon 1 CAG repeat length and breast cancer in women before age forty years." J Natl Cancer Inst 91(11): 961-6.

BACKGROUND: We conducted a population-based, case-control-family study to determine whether androgen receptor (AR) exon 1 polymorphic CAG repeat length (CAGn) was a risk factor for early-onset breast cancer in the Australian population. METHODS: Case subjects under 40 years of age at diagnosis of a first primary breast cancer and age-matched control subjects were interviewed to assess family history and other risk factors. AR CAGn length was determined for 368 case subjects and 284 control subjects. Distributions in the two groups were compared by linear and logistic regression, allowing adjustment for measured risk factors. All statistical tests were two-tailed. RESULTS: When analyzed as either a continuous or a dichotomous variable, there was no association between CAGn length and breast cancer risk, before or after adjustment for risk factors. Mean (95% confidence interval [CI]) CAGn lengths were 22.0 (21.8-22.2) for case subjects and 22.0 (21.7-22.3) for control subjects (P =.9). The frequency (95% CI) of alleles with 22 or more CAGn repeats was 0.531 (0.494-0.568) for case subjects and 0.507 (0.465-0.549) for control subjects (P =.4). After adjustment, the average effect on log OR (odds ratio) per allele was 0.16 (95% CI = -0.03 to 0.40; P =.2), and the effect of any allele was equivalent to an OR of 1.40 (95% CI = 0.94-2.09; P =.1). Stratification by family history also failed to reveal any association. Similar results were obtained when alleles were defined by other cutoff points. CONCLUSION: We found no evidence for an association between AR exon 1 CAGn length and breast cancer risk in women under the age of 40, despite having 80% power to detect modest effects.

Spurdle, A. B., P. M. Webb, et al. (2000). "Androgen receptor exon 1 CAG repeat length and risk of ovarian cancer." Int J Cancer 87(5): 637-43.

Epidemiological studies indicate that ovarian cancer is an endocrine-related tumour. We conducted a case-control comparison to assess the androgen receptor (AR) exon 1 polymorphic CAG repeat length (CAG(n)) as a risk factor for epithelial ovarian cancer. AR CAG(n) was determined for 319 case subjects with ovarian adenocarcinoma and 853 unaffected control subjects (comprising 300 unrelated adult female monozygotic twins, and 553 adult females sampled randomly from the population using the electoral rolls). The CAG(n) distributions of case subjects and control subjects were compared as a continuum, and by dichotomising alleles according to different CAG(n) cut-points. Logistic regression was used to calculate age-adjusted odds ratio (OR) estimates. Analyzed as a continuous variable, there was no difference between case subjects and control subjects for the smaller, larger or average allele sizes of the CAG(n) genotype, before or after adjusting for age. The mean (95% CI) for the average CAG(n) was 22.0 (21.8-22.2) for case subjects and 22.0 (21.9-22.1) for control subjects (p>.9). Analysis of CAG(n) as a dichotomous variable showed no difference between case subjects and control subjects for the median cutpoint (>/= 22), or for another cut-point previously reported to act as a modifier of breast cancer risk (>/= 29). Our data provide no evidence for an association between ovarian cancer risk and the genotype defined by the AR exon 1 CAG(n) polymorphism, although we cannot exclude small effects, or threshold effects in a small subgroup.

Stadlbacher, S., E. M. Dauber, et al. (2003). "The tetranucleotide repeat polymorphism C2_4_4: population data and linkage disequilibria with HLA class I." Immunobiology 207(2): 137-40.

The tetranucleotide repeat locus C2_4_4 situated in the HLA class I region (6p21.3) and the HLA-ABC specificities were investigated in an Austrian population sample of 240 unrelated Caucasoid individuals. The analysis of the linkage disequilibrium between C2_4_4 and HLA class I showed several significant values, especially when factors coded for by so-called "superhaplotypes" were considered; such linkage disequilibria are of importance for the practical use of HLA coded short tandem repeats.

Steffan, J. S. and L. M. Thompson (2003). "Targeting aggregation in the development of therapeutics for the treatment of Huntington's disease and other polyglutamine repeat diseases." Expert Opin Ther Targets 7(2): 201-13.

Huntington's disease (HD) is one of a number of familial polyglutamine (polyQ) repeat diseases. These neurodegenerative disorders are caused by expression of otherwise unrelated proteins that contain an expansion of a polyQ tract, rendering them toxic to specific subsets of vulnerable neurons. These expanded repeats have an inherent propensity to aggregate; insoluble neuronal nuclear and cytoplasmic polyQ aggregates or inclusions are hallmarks of the disorders [1,2]. In HD, inclusions in diseased brains often precede onset of symptoms, and have been proposed to be involved in pathogenicity [3-5]. Various strategies to block the process of aggregation have been developed in an effort to create drugs that decrease neurotoxicity. A discussion of the effect of antibodies, caspase inhibitors, chemical inhibitors, heat-shock proteins, suppressor peptides and transglutaminase inhibitors upon aggregation and disease is presented.

Stevanin, G., H. Fujigasaki, et al. (2003). "Huntington's disease-like phenotype due to trinucleotide repeat expansions in the TBP and JPH3 genes." Brain.

We report a group of 252 patients with a Huntington's disease-like (HDL) phenotype, including 60 with typical Huntington's disease, who had tested negative for pathological expansions in the IT15 gene, the major mutation in Huntington's disease. They were screened for repeat expansions in two other genes involved in HDL phenotypes: those encoding the junctophilin-3 (JPH3/HDL2) and prion (PRNP/HDL1) proteins. In addition, because of the clinical overlap between patients with HDL disease and autosomal dominant cerebellar ataxia or dentatorubral and pallidoluysian atrophy (DRPLA), we investigated trinucleotide repeat expansions in genes encoding the TATA-binding protein (TBP/SCA17) and atrophin-1 (DRPLA). Two patients carried 43 and 50 uninterrupted CTG repeats in the JPH3 gene. Two other patients had 44 and 46 CAA/CAG repeats in the TBP gene. Patients with expansions in the TBP or JPH3 genes had HDL phenotypes indistinguishable from Huntington's disease. Taking into account patients with typical Huntington's disease, their frequencies were evaluated as 3% each in our series of typical HDL patients. Interestingly, incomplete penetrance of the 46 CAA/CAG repeat in the TBP gene was observed in a 59-year-old transmitting, but healthy, parent. Furthermore, we report a new configuration of the expanded TBP allele, with 11 repeats on the first polymorphic stretch of CAGs. Expansions in the DRPLA gene and insertions in the PRNP gene were not found in our group of patients. Further genetic heterogeneity of the HDL phenotype therefore exists.

Sun, L., Z. Li, et al. (2003). "Pentanucleotide TTTTA Repeat Polymorphism of Apolipoprotein(a) Gene and Plasma Lipoprotein(a) Are Associated With Ischemic and Hemorrhagic Stroke in Chinese. A Multicenter Case-Control Study in China." Stroke.

BACKGROUND AND PURPOSE: It is still inconclusive whether high plasma lipoprotein(a) [Lp(a)] level is a risk factor for stroke. Small sample size and different ethnic groups and methodologies might be contributors to the conflicts in study results. The purpose of the present study was to investigate the association between plasma Lp(a) levels, pentanucleotide TTTTA repeat (PNTR) polymorphism of the apolipoprotein(a) [apo(a)] gene, and Chinese stroke in a case-control study. METHODS: We recruited 1825 cases with stroke (44.3% cerebral atherothrombosis, 28.3% lacunar infarction, and 27.3% intracerebral hemorrhage) and 1817 controls from 7 centers in China. Lp(a) concentrations were quantified by enzyme-linked immunosorbent assay. The PNTR polymorphism of the apo(a) gene was determined by polymerase chain reaction-polyacrylamide gel electrophoresis. Conditional multivariate logistic regression analysis was used to identify independent risk factors for stroke and its subtypes. RESULTS: Lp(a) levels were significantly higher in cases than in controls (median, 28.5 versus 23.1 mg/dL; P<0.001), leading to a 1.97-fold (95% CI, 1.64 to 2.37) increase in risk for overall stroke, 2.0-fold (95% CI, 1.59 to 2.52) increase for atherothrombotic type, 2.05-fold increase (95% CI, 1.59 to 2.63) for lacunar type, and 1.64-fold increase (95% CI, 1.21 to 2.21) for hemorrhagic type. The number of PNTR negatively correlated with Lp(a) levels. Low-number repeats (sum of both alleles <16) of apo(a) PNTR were associated with both atherothrombotic stroke (odds ratio, 1.41; 95% CI, 1.04 to 1.91) and hemorrhagic stroke (odds ratio, 1.62; 95% CI, 1.09 to 2.37). CONCLUSIONS: Our results indicate for the first time that low numbers of apo(a) PNTR and plasma Lp(a) levels are independently associated with both ischemic and hemorrhagic stroke in Chinese.

Sutherland, G. R. and R. I. Richards (1995). "Simple tandem DNA repeats and human genetic disease." Proc Natl Acad Sci U S A 92(9): 3636-41.

The human genome contains many repeated DNA sequences that vary in complexity of repeating unit from a single nucleotide to a whole gene. The repeat sequences can be widely dispersed or in simple tandem arrays. Arrays of up to 5 or 6 nt are known as simple tandem repeats, and these are widely dispersed and highly polymorphic. Members of one group of the simple tandem repeats, the trinucleotide repeats, can undergo an increase in copy number by a process of dynamic mutation. Dynamic mutations of the CCG trinucleotide give rise to one group of fragile sites on human chromosomes, the rare folate-sensitive group. One member of this group, the fragile X (FRAXA) is responsible for the most common familial form of mental retardation. Another member of the group FRAXE is responsible for a rarer mild form of mental retardation. Similar mutations of AGC repeats give rise to a number of neurological disorders. The expanded repeats are unstable between generations and somatically. The intergenerational instability gives rise to unusual patterns of inheritance--particularly anticipation, the increasing severity and/or earlier age of onset of the disorder in successive generations. Dynamic mutations have been found only in the human species, and possible reasons for this are considered. The mechanism of dynamic mutation is discussed, and a number of observations of simple tandem repeat mutation that could assist in understanding this phenomenon are commented on.

Swanson, J. M., G. A. Sunohara, et al. (1998). "Association of the dopamine receptor D4 (DRD4) gene with a refined phenotype of attention deficit hyperactivity disorder (ADHD): a family-based approach." Mol Psychiatry 3(1): 38-41.

Previously in this journal, we reported an association of the dopamine D4 receptor gene (DRD4) and attention deficit hyperactivity disorder (ADHD). In a population-association (case-control) study of 39 children with a refined phenotype of ADHD and 39 ethnically matched controls, we observed an increased percentage of the 7 repeat allele (29% vs 12%) and the 7+ genotype (49% vs 21%) in the ADHD group compared to the control group. In a replication and an extension of our initial study, we recruited another sample of ADHD subjects and found percentages of the 7 repeat allele (28%) and the 7+ genotype (48%) consistent with our previous findings. We used a family-based approach to evaluate a predicted association of DRD4 and ADHD based on a test of allele transmission focused on the 7 repeat allele. We identified 52 families based on the diagnosis of the refined phenotype of ADHD in the proband and the availability of DNA from both biological parents as well as the proband. Haplotype relative risk (HRR) analysis was performed to test our a priori hypothesis and produced significant results (chi-square = 4.65, P < 0.035). This provides additional evidence that the DRD4 gene is associated with a refined phenotype of ADHD.

Sylvestre, P., E. Couture-Tosi, et al. (2003). "Polymorphism in the collagen-like region of the Bacillus anthracis BclA protein leads to variation in exosporium filament length." J Bacteriol 185: 1555-63.

We recently identified a Bacillus anthracis glycoprotein which is a structural constituent of the exosporium filaments (P. Sylvestre, E. Couture-Tosi, and M. Mock, Mol. Microbiol. 45:169-178, 2002). This Bacillus collagen-like protein (BclA) contains an internal collagen-like region (CLR) of GXX repeats which includes a large proportion of GPT triplets. Here, we report that the polymorphic marker Ceb-Bams13, for which there are nine alleles (P. Le Fleche et al., BMC Microbiol. 1:2, 2001), maps within the open reading frame encoding BclA. The bclA gene in 11 B. anthracis strains representative of seven Ceb-Bams13 alleles was sequenced and compared to the Ames bclA gene sequence. The amino- and carboxy-terminal sequences surrounding the CLR are conserved. The CLR itself is highly polymorphic: it contains between 17 and 91 GXX repeats and one to eight copies of the 21-amino-acid sequence (GPT)(5)GDTGTT, named the BclA repeat. The length of the filament on the spore surface differed between the strains. We exchanged the bclA gene between strains with different CLRs and examined the spore surfaces by electron microscopy analysis. The length of the BclA CLR is responsible for the variation in filament length.

Takara, M., T. Kouki, et al. (2003). "CTLA-4 AT-repeat polymorphism reduces the inhibitory function of CTLA-4 in Graves' disease." Thyroid 13(12): 1083-9.

Graves' disease (GD) is thought to be an autoimmune disease with a strong genetic component. Candidate genes include human leukocyte antigen (HLA) class II genes and CTLA-4. The CTLA-4 gene has a variable length AT-repeat polymorphism in the 3'-untranslated region. We previously found that the AT-repeat of 104 bp or longer was associated with GD. In this study, we categorized patients with GD and normal controls (NC) by genotyping the CTLA-4 AT-repeat and investigated the function of CTLA-4. Peripheral blood mononuclear cells (PBMC) and DNA were prepared from adult Caucasians (NC = 34, GD = 37). Genotypes of the AT-repeat polymorphism were divided into three groups according to their alleles. We related the CTLA-4 polymorphism in each genotype to augmentation of T-cell proliferation induced by a soluble anti-CTLA-4 antibody during incubation with irradiated Epstein-Barr virus (EBV)-transformed B cells. Proliferation of T cells from subjects with the 86/86 bp (shorter) allele was less than T cells from patients with longer alleles. The length of the AT-repeat allele correlated inversely with augmentation of proliferation after CTLA-4 blockade in subjects with GD. The CTLA-4 AT-repeat polymorphism affects the inhibitory function of CTLA-4. The long AT-repeat allele is associated with reduced control of T-cell proliferation and thus contributes to the pathogenesis of GD.

Tatemichi, M., T. Sawa, et al. (2005). "Increased risk of intestinal type of gastric adenocarcinoma in Japanese women associated with long forms of CCTTT pentanucleotide repeat in the inducible nitric oxide synthase promoter." Cancer Lett 217(2): 197-202.

Tandem repeat number polymorphism of a CCTTT pentanucleotide in the promoter region of the inducible nitric oxide synthase gene (iNOS) and a polymorphism of the interleukin-1beta (IL-1B) promoter at position -31 were analyzed in DNA samples from 181 Japanese control subjects and 158 gastric cancer patients, including 96 intestinal type and 62 diffuse type. An association between the intestinal type of gastric adenocarcinoma and higher promoter activity of the iNOS gene was found in women, especially those having higher promoter activity of the IL-1B gene and without a history of smoking. Our results imply that chronic inflammation caused by excess nitric oxide generated by iNOS contributes to Helicobacter pylori-induced gastric cancer.

Tomita, N., R. Fujita, et al. (2002). "Effects of triplet repeat sequences on nucleosome positioning and gene expression in yeast minichromosomes." Nucleic Acids Res Suppl(2): 231-2.

Triplet repeat sequences that cause human hereditary diseases can form a variety of DNA conformations. Since DNA structures act as determinants of chromatin structure, chromatin may be involved in mechanisms of these diseases. To address this issue, we examined effects of triplet repeat sequences on chromatin structure and gene expression in Saccharomyces cerevisiae. We show here that (1) (CTG)12 promotes nucleosome formation, (2) (CGG)12 disrupts an array of positioned nucleosomes, and (3) (GAA)12 has little effect on nucleosome formation. Also, we show that insertion of (CGG)12 increases gene expression of a UAS-less promoter about 10-fold, while (CTG)12 and (GAA)12 have no effect. Thus, expansion of triplet repeat sequences may cause improper expression of disease related genes, through their effects on chromatin structure.

Toth, G., Z. Gaspari, et al. (2000). "Microsatellites in different eukaryotic genomes: survey and analysis." Genome Res 10(7): 967-81.

We examined the abundance of microsatellites with repeated unit lengths of 1-6 base pairs in several eukaryotic taxonomic groups: primates, rodents, other mammals, nonmammalian vertebrates, arthropods, Caenorhabditis elegans, plants, yeast, and other fungi. Distribution of simple sequence repeats was compared between exons, introns, and intergenic regions. Tri- and hexanucleotide repeats prevail in protein-coding exons of all taxa, whereas the dependence of repeat abundance on the length of the repeated unit shows a very different pattern as well as taxon-specific variation in intergenic regions and introns. Although it is known that coding and noncoding regions differ significantly in their microsatellite distribution, in addition we could demonstrate characteristic differences between intergenic regions and introns. We observed striking relative abundance of (CCG)(n)*(CGG)(n) trinucleotide repeats in intergenic regions of all vertebrates, in contrast to the almost complete lack of this motif from introns. Taxon-specific variation could also be detected in the frequency distributions of simple sequence motifs. Our results suggest that strand-slippage theories alone are insufficient to explain microsatellite distribution in the genome as a whole. Other possible factors contributing to the observed divergence are discussed.
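
As an illustration of what a survey like this counts, the sketch below scans a DNA string for perfect tandem repeats with unit lengths of 1-6 bp. It is a minimal example and not the method used by Toth et al.: the function name `find_microsatellites`, the `min_copies` threshold of 3, and the restriction to perfect (uninterrupted) repeats are all assumptions made here for illustration.

```python
import re

def find_microsatellites(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Hypothetical helper: unit length 1..max_unit, at least min_copies copies.
    The lazy quantifier makes the regex prefer the shortest repeating unit.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# A (CAG)4 tract embedded in flanking sequence
print(find_microsatellites("ATCAGCAGCAGCAGTT"))
```

Counting such hits separately for exons, introns, and intergenic regions would require annotation coordinates, which this sketch does not model.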

Twells, R. C., C. A. Mein, et al. (2003). "Haplotype Structure, LD Blocks, and Uneven Recombination Within the LRP5 Gene." Genome Res 13(5): 845-55.

Patterns of linkage disequilibrium (LD) in the human genome are beginning to be characterized, with a paucity of haplotype diversity in "LD blocks," interspersed by apparent "hot spots" of recombination. Previously, we cloned and physically characterized the low-density lipoprotein-receptor-related protein 5 (LRP5) gene. Here, we have extensively analysed both LRP5 and its flanking three genes, spanning 269 kb, for single nucleotide polymorphisms (SNPs), and we present a comprehensive SNP map comprising 95 polymorphisms. Analysis revealed high levels of recombination across LRP5, including a hot-spot region from intron 1 to intron 7 of LRP5, where there are 109 recombinants/Mb (4882 meioses), in contrast to flanking regions of 14.6 recombinants/Mb. This region of high recombination could be delineated into three to four hot spots, one within a 601-bp interval. For LRP5, three haplotype blocks were identified, flanked by the hot spots. Each LD block comprised over 80% common haplotypes, concurring with a previous study of 14 genes that showed that common haplotypes account for at least 80% of all haplotypes. The identification of hot spots in between these LD blocks provides additional evidence that LD blocks are separated by areas of higher recombination.

Ujike, H., M. Harano, et al. (2003). "Nine- or fewer repeat alleles in VNTR polymorphism of the dopamine transporter gene is a strong risk factor for prolonged methamphetamine psychosis." Pharmacogenomics J 3(4): 242-7.

Susceptibility to drug dependence and drug-induced psychoses is influenced not only by the pharmacological effects of the drug but also by the genetic factors of the individual. To clarify the latter, we investigated the association between methamphetamine (METH) dependence/psychosis and the hDAT1 gene (SLC6A3) encoding the dopamine transporter, which is the primary site of METH activity in the brain. Four exonic polymorphisms of the hDAT1 gene, 242C/T (exon 2), 1342A/G (exon 9), 2319G/A (3'UTR), and VNTR (3'UTR) were examined. Although there was no significant difference in genotypic and allelic distribution of the four polymorphisms between all METH dependence/psychosis patients (N=124) and controls (N=160), the patients with METH psychosis lasting for 1 month or more after discontinuance of METH consumption showed a significant excess of nine- or fewer repeat alleles of the VNTR in 3'UTR of the hDAT1 gene (P=0.0054, OR=4.24, 95% CI=2.46-7.31). The present study demonstrated that the presence of nine- or fewer repeat alleles of hDAT1 is a strong risk factor for a worse prognosis of METH psychosis. The Pharmacogenomics Journal (2003) 3, 242-247. doi:10.1038/sj.tpj.6500189

Valenti, K., E. Aveynier, et al. (1999). "Contribution of apolipoprotein(a) size, pentanucleotide TTTTA repeat and C/T(+93) polymorphisms of the apo(a) gene to regulation of lipoprotein(a) plasma levels in a population of young European Caucasians." Atherosclerosis 147(1): 17-24.

Several studies indicate that the inter-individual variation in plasma concentrations of lipoprotein(a) (Lp(a)) is mainly under genetic control. To define the effect of three DNA polymorphisms on apolipoprotein(a) (apo(a)) expression, we have determined plasma Lp(a) concentrations, apo(a) isoform size, KpnI allele size, the TTTTA pentanucleotide repeat number in the 5' control region of the apo(a) gene and the +93 C/T polymorphism in a European Caucasian population. The simultaneous determination of the kringle 4 (K4) number by genotyping and by phenotyping revealed that the size distribution of non-expressed apo(a) alleles was markedly skewed towards alleles with greater than 25 K4 repeats. This is consistent with the inverse relationship frequently described between the kringle 4 number and the plasma Lp(a) level. Apportioning the Lp(a) concentration from the surface of the peaks on apo(a) phenotyping blots, we have observed that the Lp(a) plasma concentration associated with alleles having more than 25 K4 units does not exceed 400 mg/l, whereas the range of Lp(a) concentrations associated with smaller alleles was broad, from 0 to more than 1000 mg/l. It can thus be concluded that the number of K4 repeats is the main determinant of Lp(a) concentration when this number is more than 25, whereas other polymorphisms may be involved in the alleles with fewer than 26 K4. Analyses of the TTTTA repeat number and of the +93 C/T polymorphism were performed in subjects with KpnI alleles of the same length: low Lp(a) concentrations were shown to be preferentially associated with the presence of apo(a) alleles with more than eight pentanucleotide repeats while no association was revealed between Lp(a) plasma levels and the C/T polymorphism. These results demonstrate that the (TTTTA)(n) polymorphism affects the Lp(a) expression independently of apo(a) size polymorphism.

van Belkum, A., S. Scherer, et al. (1998). "Short-sequence DNA repeats in prokaryotic genomes." Microbiol Mol Biol Rev 62(2): 275-93.

Short-sequence DNA repeat (SSR) loci can be identified in all eukaryotic and many prokaryotic genomes. These loci harbor short or long stretches of repeated nucleotide sequence motifs. DNA sequence motifs in a single locus can be identical and/or heterogeneous. SSRs are encountered in many different branches of the prokaryote kingdom. They are found in genes encoding products as diverse as microbial surface components recognizing adhesive matrix molecules and specific bacterial virulence factors such as lipopolysaccharide-modifying enzymes or adhesins. SSRs enable genetic and consequently phenotypic flexibility. SSRs function at various levels of gene expression regulation. Variations in the number of repeat units per locus or changes in the nature of the individual repeat sequences may result from recombination processes or polymerase inadequacy such as slipped-strand mispairing (SSM), either alone or in combination with DNA repair deficiencies. These rather complex phenomena can occur with relative ease, with SSM approaching a frequency of 10(-4) per bacterial cell division and allowing high-frequency genetic switching. Bacteria use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. SSR-mediated variation has important implications for bacterial pathogenesis and evolutionary fitness. Molecular analysis of changes in SSRs allows epidemiological studies on the spread of pathogenic bacteria. The occurrence, evolution and function of SSRs, and the molecular methods used to analyze them are discussed in the context of responsiveness to environmental factors, bacterial pathogenicity, epidemiology, and the availability of full-genome sequences for increasing numbers of microorganisms, especially those that are medically relevant.

Veeraraghavan, J., M. Rossi, et al. (2003). "Analysis of DNA replication intermediates suggests mechanisms of repeat sequence expansion." J Biol Chem.

We previously developed a system to investigate the mechanism of repeat sequence expansion during eukaryotic Okazaki fragment processing. Upstream and downstream primers were annealed to a complementary template to overlap across a CAG repeat region. Annealing by the competing primers led to structural intermediates that ligated to expand the repeat segment. When an equal number of repeats overlapped on the upstream and downstream primers, a 2-fold expansion was expected, but no expansion occurred. We show here that such substrates do not expand irrespective of their repeat length. To reveal mechanism, we tested different hairpin loop intermediates expected to form and facilitate ligation. Substrates configured to form large loops in either the upstream or downstream primer alone allowed expansion. Large or small fixed position single loops allowed expansion when located at least six nucleotides up- or downstream of the nick. Fixed loops in both primers, simulating a double loop intermediate, allowed expansion as long as each loop was 9 nucleotides from the nick. Thus neither the double-loop configuration required to form with equal length overlaps nor the large single loop configuration are fundamental structural impediments to expansion. We propose a model for the expansion mechanism based on the relative stabilities of single loop, double loop, hairpin and flap intermediates that is consistent with the observed expansion efficiency of equal and unequal overlap substrates. The model suggests that the equilibrium concentration of double loop intermediates is so vanishingly small that they are not likely contributors to sequence expansion.

Verkerk, A. J., M. Pieretti, et al. (1991). "Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome." Cell 65(5): 905-14.

Fragile X syndrome is the most frequent form of inherited mental retardation and is associated with a fragile site at Xq27.3. We identified human YAC clones that span fragile X site-induced translocation breakpoints coincident with the fragile X site. A gene (FMR-1) was identified within a four cosmid contig of YAC DNA that expresses a 4.8 kb message in human brain. Within a 7.4 kb EcoRI genomic fragment, containing FMR-1 exonic sequences distal to a CpG island previously shown to be hypermethylated in fragile X patients, is a fragile X site-induced breakpoint cluster region that exhibits length variation in fragile X chromosomes. This fragment contains a lengthy CGG repeat that is 250 bp distal of the CpG island and maps within a FMR-1 exon. Localization of the brain-expressed FMR-1 gene to this EcoRI fragment suggests the involvement of this gene in the phenotypic expression of the fragile X syndrome.

Versteeg, R., B. D. Van Schaik, et al. (2003). "The Human Transcriptome Map Reveals Extremes in Gene Density, Intron Length, GC Content, and Repeat Pattern for Domains of Highly and Weakly Expressed Genes." Genome Res.

The chromosomal gene expression profiles established by the Human Transcriptome Map (HTM) revealed a clustering of highly expressed genes in about 30 domains, called ridges. To physically characterize ridges, we constructed a new HTM based on the draft human genome sequence (HTMseq). Expression of 25,003 genes can be analyzed online in a multitude of tissues (http://bioinfo.amc.uva.nl/HTMseq). Ridges are found to be very gene-dense domains with a high GC content, a high SINE repeat density, and a low LINE repeat density. Genes in ridges have significantly shorter introns than genes outside of ridges. The HTMseq also identifies a significant clustering of weakly expressed genes in domains with fully opposite characteristics (antiridges). Both types of domains are open to tissue-specific expression regulation, but the maximal expression levels in ridges are considerably higher than in antiridges. Ridges are therefore an integral part of a higher order structure in the genome related to transcriptional regulation.

Waterston, R. H., K. Lindblad-Toh, et al. (2002). "Initial sequencing and comparative analysis of the mouse genome." Nature 420(6915): 520-62.

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

Weber, J. L. and P. E. May (1989). "Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction." Am J Hum Genet 44: 388-396.

Wren, J. D., E. Forgacs, et al. (2000). "Repeat polymorphisms within gene regions: phenotypic and evolutionary implications." Am J Hum Genet 67(2): 345-56.

We have developed an algorithm that predicted 11,265 potentially polymorphic tandem repeats within transcribed sequences. We estimate that 22% (2,207/9,717) of the annotated clusters within UniGene contain at least one potentially polymorphic locus. Our predictions were tested by allelotyping a panel of approximately 30 individuals for 5% of these regions, confirming polymorphism for more than half the loci tested. Our study indicates that tandem-repeat polymorphisms in genes are more common than is generally believed. Approximately 8% of these loci are within coding sequences and, if polymorphic, would result in frameshifts. Our catalogue of putative polymorphic repeats within transcribed sequences comprises a large set of potentially phenotypic or disease-causing loci. In addition, from the anomalous character of the repetitive sequences within unannotated clusters, we also conclude that the UniGene cluster count substantially overestimates the number of genes in the human genome. We hypothesize that polymorphisms in repeated sequences occur with some baseline distribution, on the basis of repeat homogeneity, size, and sequence composition, and that deviations from that distribution are indicative of the nature of selection pressure at that locus. We find evidence of selective maintenance of the ability of some genes to respond very rapidly, perhaps even on intragenerational timescales, to fluctuating selective pressures.

Yamada, N., M. Yamaya, et al. (2000). "Microsatellite polymorphism in the heme oxygenase-1 gene promoter is associated with susceptibility to emphysema." Am J Hum Genet 66(1): 187-95.

Cigarette smoke, containing reactive oxygen species, is the most important risk factor for chronic pulmonary emphysema (CPE). Heme oxygenase-1 (HO-1) plays a protective role as an antioxidant in the lung. A (GT)n dinucleotide repeat in the 5'-flanking region of human HO-1 gene shows length polymorphism and could modulate the level of gene transcription. To investigate the correlation between the length of the (GT)n repeat and susceptibility to the development of CPE, we screened the frequencies of alleles with varying numbers of (GT)n repeats in the HO-1 gene in 101 smokers with CPE and in 100 smokers without CPE. Polymorphisms of the (GT)n repeat were grouped into three classes: class S alleles (<25 repeats), class M alleles (25-29 repeats), and class L alleles (>/=30 repeats). The proportion of allele frequencies in class L, as well as the proportion of genotypic frequencies in the group with class L alleles (L/L, L/M, and L/S), was significantly higher in the smokers with CPE than in smokers without CPE. Moreover, we analyzed the promoter activities of the HO-1 gene carrying different (GT)n repeats (n=16, 20, 29, and 38), by transient-transfection assay in cultured cell lines. H2O2 exposure up-regulated the transcriptional activity of the HO-1 promoter/luciferase fusion genes with (GT)16 or (GT)20 but did not do so with (GT)29 or (GT)38. These findings suggest that the large size of a (GT)n repeat in the HO-1 gene promoter may reduce HO-1 inducibility by reactive oxygen species in cigarette smoke, thereby resulting in the development of CPE.
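
The allele classes defined in this abstract (S: <25 repeats, M: 25-29, L: 30 or more) amount to a simple threshold rule, sketched below. The function names `classify_gt_allele` and `carries_class_l` are hypothetical and introduced only for illustration; the thresholds are taken from the abstract, and nothing here reproduces the authors' statistical analysis.

```python
def classify_gt_allele(n_repeats):
    """Map a (GT)n repeat count to the S/M/L classes given in the abstract."""
    if n_repeats < 25:
        return "S"
    if n_repeats <= 29:
        return "M"
    return "L"

def carries_class_l(allele_a, allele_b):
    """True for genotypes with at least one class L allele (L/L, L/M, L/S)."""
    return "L" in (classify_gt_allele(allele_a), classify_gt_allele(allele_b))

# The four promoter constructs tested in the transfection assay
for n in (16, 20, 29, 38):
    print(n, classify_gt_allele(n))
```

Under this rule the (GT)29 construct falls in class M, although the abstract reports that it behaved like (GT)38 in the H2O2 induction assay.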

Yang, S. W., D. H. Kim, et al. (2003). "Expression of the telomeric repeat binding factor gene NgTRF1 is closely coordinated with the cell division program in tobacco BY-2 suspension culture cells." J Biol Chem.

Telomeres are vital for preserving chromosome integrity during the cell division. Several genes encoding potential telomere-binding proteins have recently been identified in higher plants, but nothing is known about their function or regulation during the cell division. In this study, we have isolated and characterized a cDNA clone, pNgTRF1, encoding a putative double-stranded telomeric repeat binding factor of N. glutinosa, a diploid tobacco plant. The predicted protein sequence of NgTRF1 (Mr=75 kDa) contains a single Myb-like domain with significant homology to a corresponding motif in human TRF1/Pin2 and TRF2. Gel retardation assays revealed that bacterially expressed full-length NgTRF1 was able to form a specific complex only with probes containing three or more contiguous telomeric TTTAGGG repeats. The Myb-like domain of NgTRF1 is essential, but not sufficient, to bind the telomeric repeat sequence. The glutamine-rich extreme C-terminal region, which does not exist in animal proteins, was additionally required to form a specific telomere-protein complex. The dissociation constant (Kd) of the Myb motif plus the glutamine-rich domain of NgTRF1 to the two-telomeric repeat sequence was evaluated to be 4.5 ± 0.2 x 10(-9) M, which is comparable to that of the Myb domain of human TRF1. Expression analysis showed that NgTRF1 gene activity was inversely correlated with the cell division capacity of tobacco root cells and during the 9-d culture period of BY-2 suspension cells, while telomerase activity was positively correlated with cell division. In synchronized BY-2 cells, NgTRF1 was selectively expressed in G1-phase, whereas telomerase activity peaked in S-phase. These findings suggest that telomerase activity and NgTRF1 expression are differentially regulated in an opposing fashion during the growth and cell division in tobacco plants. The possible physiological functions of NgTRF1 in tobacco cells are also discussed.

Yeramian, E. and H. Buc (1999). "Tandem repeats in complete bacterial genome sequences: sequence and structural analyses for comparative studies." Res Microbiol 150(9-10): 745-54.

A series of complete bacterial genome sequences have recently become available and powerful methods have been developed for the identification of tandem repeats on a very large scale. It is thus possible to derive extensive comparative descriptions of such repeats at the level of complete genomes, as illustrated here for three different bacterial genomes: Escherichia coli, Haemophilus influenzae, and Mycobacterium tuberculosis. Such sequence analyses can be usefully complemented by structural characterisations of the repeats.

Yoshida, M., T. Tamura, et al. (2003). "Analysis of numbers of repeated units in R2 region among varicella-zoster virus strains." J Dermatol Sci 31(2): 129-33.

BACKGROUND: A variable region, R2, on the varicella-zoster virus (VZV) genome contains a repeated 42-bp unit. OBJECTIVE: The purpose of this study is the derivation of significance from tandem reiteration structure in the R2 region. METHODS: Fifty-two specimens were collected from 52 patients with herpes zoster in Osaka and Tokyo, Japan. After treatment of the specimens to release viral DNA, the samples were amplified directly by polymerase chain reaction. In addition, 14 samples were collected from 7 of these zoster patients after valaciclovir or aciclovir therapy. RESULTS: Analyses of the 52 specimens revealed that the number of repeats ranged from 4 to 13. Interestingly, the numbers of repeats among various VZV strains showed a normal distribution pattern, so that 6-9 repeats were found to be predominant in both Osaka (85%) and Tokyo (72%). The pre- and post-treatment strains taken from the same individuals showed the same numbers of repeats (7-9 in 6 cases and 11 in one). CONCLUSION: Our results suggest that the 6-9 repetitions of the 42-bp unit, with presumed stability, may offer these virus strains an advantage in virulence to human skin.

Yu, M. W., Y. C. Yang, et al. (2002). "Androgen receptor exon 1 CAG repeat length and risk of hepatocellular carcinoma in women." Hepatology 36(1): 156-63.

The androgen receptor (AR) gene is localized on chromosome X, and shorter CAG repeats in exon 1 of the AR gene were recently suggested to increase hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) risk among men. To examine whether the relationship between the AR-CAG repeats and HCC was also evident among women, we conducted a case-control study in Taiwan. The number of AR-CAG repeats was determined for 238 women with HCC and 354 unrelated control subjects (comprising 188 first-degree and 166 nonbiological relatives) selected from female relatives of patients with HCC. Women harboring 2 AR alleles with more than 23 CAG repeats had an increased risk of HCC (age-adjusted odds ratio [OR], 1.82; 95% CI, 1.06-3.14), compared with women with only short alleles or a single long allele. The association between harboring 2 AR alleles containing longer CAG repeats and HCC was more striking among HBV carriers (age-adjusted OR for more than 22 repeats, 2.23; 95% CI, 1.14-4.34) and particularly prominent among HBV carriers under age 53 years (age-adjusted OR, 3.16; 95% CI, 1.13-8.82). When CAG repeats were analyzed as a continuous variable, the increase in HCC risk associated with each incremental repeat in the shorter of 2 alleles in a given genotype was statistically significant among women with a first-degree relative with HCC (age-adjusted OR, 1.18; 95% CI, 1.01-1.37). No such relationship was detected among women without the family history. In conclusion, our observations suggest that the AR-CAG alleles may contribute to HCC predisposition among women through a mechanism different from that for men.

Yue, C. M., M. X. Bi, et al. (2004). "Short tandem repeat polymorphism in a novel esophageal cancer-related gene (ECRG2) implicates susceptibility to esophageal cancer in Chinese population." Int J Cancer 108(2): 232-6.

We have previously cloned and identified a novel esophageal cancer related gene 2 (ECRG2; GenBank Accession Number AF268198), which is down-regulated in esophageal squamous cell carcinoma (ESCC) and involved in the induction of apoptosis in esophageal cancer cell lines. In the present study, we have found a short tandem repeat (STR) polymorphism in the noncoding region of exon 4 of the ECRG2 gene by using PCR-denaturing high-performance liquid chromatography (DHPLC). Three STR genotypes, TCA(3)/TCA(3), TCA(3)/TCA(4) and TCA(4)/TCA(4), were revealed and confirmed by DNA sequencing analysis. A total of 661 subjects, including 288 patients with ESCC and 373 normal controls, were analyzed to investigate the impact of this ECRG2 STR polymorphism on risk of ESCC in case-control studies. Genotypes were determined in 231 controls and 162 cases from Beijing, which is a low-risk area of ESCC, and in 142 controls and 126 cases from Linxian, a well-known high-risk area of ESCC. In both the Beijing and Linxian populations, subjects who carried the TCA(3)/TCA(3) genotype were at an increased risk of ESCC compared to those carrying the TCA(4)/TCA(4) genotype, with the adjusted odds ratios (ORs) being 2.05 [95% confidence interval (CI), 1.02-4.06] for the subjects from Beijing and 4.40 (95% CI, 1.93-10.01) for the subjects from Linxian. Furthermore, comparison of the genotype distributions among other cancer sites suggests that the risk associated with the ECRG2 STR polymorphism might be specific to the esophagus. These findings indicate for the first time that the ECRG2 STR is a genetic susceptibility factor for ESCC and that the TCA(3)/TCA(3) genotype might play a role in the development of this cancer.

Zhang, B., D. Schmoyer, et al. (2004). "GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies." BMC Bioinformatics 5(1): 16.

BACKGROUND: Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. RESULTS: We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at http://genereg.ornl.gov/gotm/. CONCLUSION: GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets.
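
The abstract above does not name the statistical test used to flag important Gene Ontology categories. As an illustration only, the sketch below assumes a one-sided hypergeometric test, which is a common choice for this kind of over-representation analysis. The function name and all numbers are hypothetical and do not describe GOTM's actual implementation.

```python
from math import comb


def enrichment_pvalue(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    Hypothetical helper for illustration only.
    population: number of genes in the reference set
    annotated:  reference genes annotated to the GO category
    sample:     number of genes in the input gene set
    hits:       input genes annotated to the GO category
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    total = comb(population, sample)
    upper = min(sample, annotated)
    # Sum the upper tail; comb() returns 0 for impossible draws.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Made-up numbers: 40 of 1000 reference genes carry the annotation,
# and 6 of the 50 input genes do.
print(enrichment_pvalue(1000, 40, 50, 6))
```

A small p-value indicates that the category appears in the input gene set more often than expected by chance. In practice, many categories are tested at once, so some multiple-testing correction would normally be applied.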

Zitzmann, M., M. Depenbusch, et al. (2003). "Prostate volume and growth in testosterone-substituted hypogonadal men are dependent on the CAG repeat polymorphism of the androgen receptor gene: a longitudinal pharmacogenetic study." J Clin Endocrinol Metab 88(5): 2049-54.

Testosterone (T) substitution in hypogonadal men results in growth of the prostate gland. T effects are mediated via the androgen receptor (AR). The length of the (CAG)n polymorphism of the AR gene is negatively associated with transcriptional activity and might account for variations in prostate growth during substitution therapy. In 131 hypogonadal men aged 18-69 yr, we assessed prostate volume longitudinally by transrectal ultrasonography and determined AR (CAG)n, sex hormone levels, and anthropometric measures. Sixty-nine men with primary and 62 with secondary hypogonadism began substitution therapy with im injections of T enanthate (n = 81), transdermal T preparations (n = 19), sc injections of human chorionic gonadotropin (n = 17), or oral T undecanoate (n = 14) for 2.4 +/- 0.8 yr. Average prostate size increased from 15.8 +/- 6.1 ml to 23.0 +/- 6.8 ml. ANOVA including covariates revealed initial prostate size to be dependent on age (P < 0.001) and baseline T levels (P = 0.01) but not on number of (CAG)n (ranging from 13-30; mean, 21.4 +/- 3.5). Prostate growth per year and absolute prostate size under substituted T levels (6.1 +/- 3.3 to 21.6 +/- 10.3 nmol/liter) were strongly dependent on (CAG)n, with lower treatment effects in longer repeats (both P < 0.001). Other significant predictors were initial prostate size (negative for growth rate and positive for absolute size) and age (positive for both growth rate and absolute size). The odds ratio for men with (CAG)n less than 20, compared with those with (CAG)n of 20 or more to develop a prostate size of at least 30 ml under T substitution, was 8.7 (95% confidence interval, 3.1-24.3; P < 0.001). This observation was strongly age dependent with a more pronounced odds ratio in men older than 40 yr. This first pharmacogenetic study on androgen substitution in hypogonadal men demonstrates a marked influence of the AR gene (CAG)n polymorphism on prostate growth.

Zitzmann, M., J. Gromoll, et al. (2003). "The CAG repeat polymorphism in the androgen receptor gene modulates body fat mass and serum concentrations of leptin and insulin in men." Diabetologia 46(1): 31-9.

AIMS/HYPOTHESIS: The relationship of androgens to the metabolic syndrome has not been resolved. The polymorphic number of CAG repeats within the androgen receptor gene is inversely associated with the transcriptional activity of target genes. This polymorphism might thus influence testosterone effects on body fat content and serum concentrations of leptin and insulin. The direct and indirect role of androgens within the metabolic syndrome should become clearer if this genetically determined effector is taken into account. METHODS: The hypothesis was investigated in a cross-sectional study involving 106 healthy 20-50 year old males. RESULTS: Multiple regression models showed a positive independent correlation of the CAG repeat number with body fat content, leptin and insulin (partial r=0.39, 0.36 and 0.28, p<0.001, p<0.001 and p=0.006, respectively). Factor analysis yielded a five-dimensional model: two dimensions were influenced by the androgen receptor polymorphism, namely "body composition" which consisted of leptin, body fat mass, insulin, the number of CAG repeats (positive loadings) and physical activity (negative loading), and "lipid profile" which comprised low density lipoprotein cholesterol, cigarette smoking, triglycerides (positive loadings) as well as high density lipoprotein cholesterol and number of CAG repeats (negative loadings). CONCLUSIONS/INTERPRETATION: A low number of CAG repeats was independently associated with protective parameters (low body fat mass and plasma insulin) as well as with adverse parameters (low high density lipoprotein cholesterol concentrations). This suggests that the pivotal role of this polymorphism in modulating androgen effects on cardiovascular risk factors is of a complex nature and implies that its clinical impact, similar to that of androgens, is dependent on exogenous cofactors.