Source: Wikipedia, the free encyclopedia, 2014/06/05 21:59:33 (JST)
A gene family is composed of several genes that share similar features. A gene cluster is part of a gene family: a group of two or more genes within an organism's DNA that encode similar polypeptides, or proteins, which collectively share a generalized function and are located within a few thousand base pairs of each other. The size of gene clusters can vary significantly, from a few genes to several hundred genes.[1] Portions of the DNA sequence of each gene within a gene cluster are identical; however, the protein produced by each gene is distinct from the proteins produced by the other genes in the cluster. Genes in a gene cluster may be found near one another on the same chromosome or on different but homologous chromosomes. Because the DNA sequences are homologous, finding the same gene cluster on the same chromosome in two species suggests a close evolutionary relationship between them. A gene cluster may therefore be used to assess the evolutionary relationships among organisms. An example of a gene cluster is the Hox gene cluster, which is made up of eight genes and is part of the homeobox gene family.
Currently, five models exist which aim to explain the formation and persistence of gene clusters.
The duplication and divergence model postulates that gene clusters were formed as a result of gene duplication and divergence.[2] Few cases of duplication and divergence have been observed in prokaryotic gene clusters. Gene clusters are less abundant in eukaryotic organisms, but the few that exist are thought to have arisen through gene duplication and divergence. These include the Hox gene cluster, the human β-globin gene cluster, and the four clustered human growth hormone (hGH)/chorionic somatomammotropin genes.[3]
Conserved gene clusters, such as Hox and the human β-globin gene cluster, may be formed as a result of gene duplication and divergence. A gene is duplicated during cell division, so that its descendants have two end-to-end copies of the gene where there had been one, initially coding for the same protein or otherwise having the same function. In the course of subsequent evolution the copies diverge, so that their products have different but related functions, while the genes remain adjacent on the chromosome.[4] Ohno theorized that the origin of new genes during evolution depended on gene duplication. If only a single copy of a gene existed in the genome of a species, the protein encoded by that gene would be essential to the species' survival. Because there was only a single copy, the gene could not tolerate mutations that might give rise to new genes; gene duplication, however, allows mutations to accumulate in the duplicated copy, which can ultimately give rise to new genes over the course of evolution.[5] Mutations in the duplicated copy were tolerated because the original copy retained the genetic information for the essential gene's function. Species that have gene clusters are thought to have a selective evolutionary advantage because natural selection must keep the genes together.[1][6] Over a short span of time, the new genetic information in the duplicated copy would not provide a practical advantage; over a long evolutionary period, however, the duplicated copy may undergo additional, drastic mutations, so that its protein serves a different role from that of the original essential gene.[5] Over this long evolutionary period, the two similar genes diverge until the protein of each gene has a unique function. Hox gene clusters of various sizes are found in several phyla.
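The duplication-and-divergence process described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not a model from the cited sources: the gene is a short DNA string, the original copy is held fixed (standing in for selection on the essential function), only the duplicate accumulates random point substitutions, and divergence is measured as the fraction of matching positions. The function names are hypothetical.

```python
import random

BASES = "ACGT"

def point_mutate(seq, rng):
    """Return a copy of seq with one random position changed to a different base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]

def identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def duplicate_and_diverge(ancestor, generations, seed=0):
    """Duplicate a gene, then let only the redundant copy accumulate mutations.

    The original copy is never mutated, mimicking selection that preserves
    the essential function; the duplicate is assumed to be free of that
    constraint. Returns both copies and the identity after each step.
    """
    rng = random.Random(seed)
    original, duplicate = ancestor, ancestor  # identical copies right after duplication
    history = [identity(original, duplicate)]
    for _ in range(generations):
        duplicate = point_mutate(duplicate, rng)
        history.append(identity(original, duplicate))
    return original, duplicate, history
```

Running it on, say, a 200-base gene for 50 steps shows identity starting at 1.0 and drifting downward while the original copy stays unchanged.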
When gene duplication produces a gene cluster, one or multiple genes may be duplicated at once. In the case of the Hox genes, a shared ancestral ProtoHox cluster was duplicated, giving rise to the Hox cluster and the ParaHox cluster, an evolutionary sister complex of the Hox cluster.[7] The exact number of genes in the duplicated ProtoHox cluster is unknown; however, different models suggest that it originally contained four, three, or two genes.[8]
When a gene cluster is duplicated, some genes may be lost. Gene loss depends on the number of genes originally in the cluster. In the four-gene model, the ProtoHox cluster contained four genes and gave rise to two twin clusters: the Hox cluster and the ParaHox cluster.[7] As its name indicates, the two-gene model derives the Hox and ParaHox clusters from a ProtoHox cluster that contained only two genes. The three-gene model was originally proposed in conjunction with the four-gene model;[8] however, rather than deriving the Hox and ParaHox clusters from a cluster of three genes, it holds that they arose through single-gene tandem duplication, which produces identical genes adjacent to one another on the same chromosome.[7] This tandem duplication was independent of the duplication of the ancestral ProtoHox cluster.
Gene duplication may occur via cis-duplication or trans-duplication. Cis-duplication, or intrachromosomal duplication, is the duplication of genes within the same chromosome, whereas trans-duplication, or interchromosomal duplication, involves duplicating genes across neighboring but separate chromosomes.[7] The Hox and ParaHox clusters were formed by intrachromosomal duplication, although their formation was initially thought to be interchromosomal.[8]
The Fisher Model was proposed in 1930 by Ronald Fisher. Under the Fisher Model, gene clusters result from two alleles that work well with one another; in other words, gene clusters may exhibit co-adaptation.[3] The closer together the genes carrying the co-adapted alleles are located, the less often recombination separates them: the likelihood of recombination between two genes during meiosis decreases as the genes are located closer to each other.[2] Genes that are located close to one another on the same chromosome are said to be linked genes. Because recombination between linked genes is rare, selection for specific genotypes increases the frequency of linked gene combinations. For example, suppose alleles "A" and "a" occupy one locus and alleles "B" and "b" occupy another. Natural selection would favor the genotypes "AB" and "ab" because these combinations reflect co-adaptation, whereas "Ab" and "aB" would not be favored because they are recombinants.[3] This model contends that gene clusters will be favored if their existence reduces harmful recombination events.[2]
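The link between physical distance and recombination in the Fisher Model can be made concrete with two standard population-genetics formulas. The sketch below is an illustration rather than part of Fisher's original argument: it uses Haldane's map function, r = (1 - e^(-2d))/2, which assumes crossovers occur independently along the chromosome, to convert a map distance d (in Morgans) into a recombination fraction r, and then computes the gametes produced by an AB/ab double heterozygote. The function names are hypothetical.

```python
import math

def haldane_recombination_fraction(distance_morgans):
    """Haldane's map function: r = (1 - exp(-2d)) / 2.

    Assumes no crossover interference; r rises from 0 toward 0.5 as the
    map distance d between two loci grows.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))

def gamete_frequencies(r):
    """Gametes from an AB/ab double heterozygote with recombination fraction r."""
    parental = (1.0 - r) / 2.0   # co-adapted combinations transmitted intact
    recombinant = r / 2.0        # Ab and aB break up the co-adapted pairing
    return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}
```

Tightly linked loci (small d) give a small r, so the co-adapted combinations "AB" and "ab" are transmitted intact far more often than the recombinants "Ab" and "aB".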
For selection to occur under this model, two conditions must be met: (1) substantial genetic variation must exist at the chromosomal positions (loci) of the genes, and (2) recombination must occur frequently. The first condition ensures that genotypes with co-adapted alleles (i.e. "AB" and "ab") arise, while the second ensures that these genotypes are regularly distributed. The Fisher Model can apply only to eukaryotic gene clusters, because meiosis and sexual reproduction provide frequent recombination; however, few eukaryotic gene clusters are composed of functionally related genes. Bacteria reproduce asexually and thus exhibit a low frequency of recombination.[3] The prokaryotic genome does not exhibit co-adapted alleles and does not meet the requirements for selection proposed by this model.[2] Because of these constraints, the Fisher Model was considered unlikely and was later dismissed as an explanation for gene cluster formation.[2][3]
Under the Coregulation Model, genes are organized into clusters as a benefit to the organism. This model is based on the existence of operons. An operon is a gene cluster containing functionally related genes that are controlled by a single promoter and a single operator; an example is the lac operon. Operons exhibit coordinated gene expression: all genes within the cluster are either expressed simultaneously from the one promoter or repressed simultaneously. Regulation of the genes in the cluster is controlled by the operator.[3] Coordinated gene expression was considered to be the most common mechanism driving the formation of gene clusters.[1]
The typical eukaryotic gene was thought to be randomly distributed in the genome and expressed independently of its neighbors; however, evidence shows that eukaryotic genes are regulated not only individually, by promoter sequences and transcription factors, but also by their location within the genome. As a result, genes that share similar expression levels are nonrandomly distributed within genomes.[9] Such co-expressed genes tend to be found in gene clusters in which the individual genes generally share a similar function, such as a metabolic pathway, and are regulated by a single promoter. Such a cluster is composed of genes specific to a particular metabolic pathway and can span a large portion of the genome.[1]
The Coregulation Model is considered unlikely, because coregulation, and thus coordinated gene expression, cannot drive the formation of gene clusters.[3] For an operon to form under this model, there must be strong selection for a functionally related gene to be placed next to another functionally related gene during chromosomal rearrangement. Such placement is rare, because genes are rearranged by inversions, duplications, and deletions, each of which has limitations. Inversions are rare events. Duplications do occur, but they are unstable. Deletions, unlike duplications, are stable and permanent, but they can eliminate critical components of DNA that make the gene highly favorable during natural selection. Moreover, the complicated intermediate steps of operon formation under this model would have to occur before the chromosomal rearrangement.[2][3]
The Molarity Model considers the constraints of cell size. Transcribing and translating genes together is beneficial to the cell,[10] since clustering genes generates a high local concentration of their cytoplasmic protein products. Spatial segregation of protein products has been observed in bacteria; however, the Molarity Model does not consider co-transcription or the distribution of genes within an operon.[2]
The Selfish Operon Model contends that the formation of clustered genes benefits the individual genes rather than the host. As its name suggests, this model is specific to operon formation. In bacterial genomes, genetic information is primarily passed on through vertical transfer, commonly known as reproduction: during bacterial cell division, genetic information passes from the parent to the daughter cell. Although reproduction is the most common route, genetic information may also be passed on by horizontal transfer, which, independently of reproduction, moves genetic information from one organism to another. Horizontal gene transfer is considered an isolated event in eukaryotic organisms, but it occurs frequently in prokaryotes.[3]
Clustered genes may be passed on through either reproduction or horizontal gene transfer, whereas unclustered genes may only be passed on through reproduction. Under this model, functionally related genes can be transferred horizontally together only if they are clustered, and individual genes are not favored during natural selection. Functionally related genes must therefore be clustered in order to be selected for and to form an operon. Once horizontal transfer occurs, an operon will be formed.[2]
Repeated genes can occur in two major patterns: gene clusters and tandem repeats, formerly called tandemly arrayed genes. Although similar, gene clusters and tandemly arrayed genes can be distinguished from one another.
The genes of a gene cluster are found close to one another when observed on the same chromosome. Although gene clusters are dispersed randomly, their genes are normally within, at most, a few thousand bases of each other. The distance between the genes of a cluster can vary, and the DNA between the repeated genes is non-conserved.[11] Portions of the DNA sequence are identical among the genes of a gene cluster.[5] Gene conversion is the only mechanism by which gene clusters may become homogenized. Although the size of a gene cluster may vary, it rarely comprises more than 50 genes, so clusters are stable in number. Gene clusters change only over long evolutionary time periods, and these changes do not result in increased genetic complexity.[11]
Tandem arrays are groups of genes with the same or similar function that are repeated consecutively, without intervening genes, and all in the same orientation.[11] Unlike gene clusters, tandemly arrayed genes consist of consecutive, identical repeats, separated only by a nontranscribed spacer region.[12] While the genes in a gene cluster encode similar proteins, tandemly arrayed genes encode identical proteins or functional RNAs. Unequal recombination changes the number of repeats by placing duplicated genes next to the original gene. Unlike gene clusters, tandemly arrayed genes change rapidly in response to the needs of the environment, causing an increase in genetic complexity.[12]
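How unequal recombination changes the number of repeats can be sketched with lists of repeat units. This is a simplified illustration with a hypothetical function name: two homologous tandem arrays pair out of register, so the crossover falls after different repeat positions on each chromosome, and the exchange yields one array with extra repeats and one with fewer.

```python
def unequal_crossover(chrom1, chrom2, break1, break2):
    """Exchange the segments after break1 on chrom1 and after break2 on chrom2.

    chrom1 and chrom2 are lists of repeat units. When the arrays pair out of
    register, the breakpoints differ (break1 != break2), so one product gains
    repeats and the other loses the same number; the total is conserved.
    """
    product1 = chrom1[:break1] + chrom2[break2:]
    product2 = chrom2[:break2] + chrom1[break1:]
    return product1, product2
```

For example, with two five-repeat arrays and breakpoints after repeat 3 on one and repeat 1 on the other, the products carry seven and three repeats; when the breakpoints are in register, both arrays are unchanged.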
Gene conversion allows tandemly arrayed genes to become homogenized, or identical.[12] Gene conversion may be allelic or ectopic. Allelic gene conversion occurs when one allele of a gene is converted to the other allele as a result of mismatched base pairing during homologous recombination in meiosis.[13] Ectopic gene conversion occurs when one homologous DNA sequence is replaced by another. Ectopic gene conversion is the driving force for the concerted evolution of gene families.[14]
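The homogenizing effect of ectopic gene conversion on a tandem array can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a published model: each repeat is represented by a variant label, every conversion event copies a randomly chosen donor repeat over a different, randomly chosen recipient, and no new mutations occur. Under these assumptions the number of distinct variants can only stay the same or fall, so the array eventually becomes uniform. The function name is hypothetical.

```python
import random

def homogenize_by_gene_conversion(repeats, seed=0, max_events=100_000):
    """Toy model of ectopic gene conversion in a tandem array.

    Each event overwrites a random recipient repeat with the sequence of a
    different, random donor repeat. Returns the final array and the number
    of conversion events performed before every repeat was identical
    (or max_events was reached).
    """
    rng = random.Random(seed)
    array = list(repeats)
    events = 0
    while len(set(array)) > 1 and events < max_events:
        donor, recipient = rng.sample(range(len(array)), 2)  # two distinct positions
        array[recipient] = array[donor]
        events += 1
    return array, events
```

The variant that eventually takes over is one of those present at the start, which mirrors how concerted evolution makes repeats within a species resemble each other.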
Tandemly arrayed genes are essential for maintaining large gene families, such as the ribosomal RNA genes. In the eukaryotic genome, ribosomal RNA is encoded by tandemly arrayed genes. Tandem repeats of the rRNA genes are needed to produce enough rRNA transcript, because a single rRNA gene may not be able to supply a sufficient amount of RNA. For example, human embryonic cells contain 5-10 million ribosomes and double in number within 24 hours. To provide a substantial number of ribosomes, multiple RNA polymerases must consecutively transcribe multiple rRNA genes.[12]
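Using the figures above, a rough calculation shows why a single rRNA gene would not suffice. Assuming each dividing cell must build a complete new set of 5-10 million ribosomes within the 24-hour doubling time (86,400 s), and that production is spread evenly over that period, the required assembly rate is:

```latex
\frac{5 \times 10^{6}}{86\,400\ \mathrm{s}} \approx 58\ \text{ribosomes per second},
\qquad
\frac{1 \times 10^{7}}{86\,400\ \mathrm{s}} \approx 116\ \text{ribosomes per second}.
```

Since every ribosome contains its own rRNA molecules, rRNA must be produced at a comparable rate, which is easier to sustain when many tandemly repeated rRNA genes are transcribed.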
Some gene clusters resemble operons, in which all genes are controlled by a single promoter and operator and are transcribed simultaneously. In bacterial operons, the genes are transcribed as a single polycistronic messenger RNA. Operon-like gene clusters are primarily, but not exclusively, formed by horizontal gene transfer in prokaryotes. This type of gene cluster has been observed in the bacterium Escherichia coli.[15] The lac operon of Escherichia coli is the most well-studied operon-like gene cluster.[16]
The lac operon is required for the metabolism of lactose in Escherichia coli as well as several other bacteria. It is composed of three genes: lacZ, lacY, and lacA. Each gene encodes an enzyme that plays a role in lactose metabolism: lacZ encodes β-galactosidase, while lacY and lacA encode lactose permease and thiogalactoside transacetylase, respectively. A single polycistronic mRNA is transcribed from the operon, and this one mRNA is translated into three separate polypeptide chains, one for each gene of the lac operon.[17]
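The coordinated on/off behaviour of the lac operon can be sketched as a small object model. This is a deliberately simplified illustration with hypothetical class and method names: the repressor is assumed to block the operator unless lactose is present, while catabolite (glucose) repression and leaky basal expression are ignored. When the operator is free, one polycistronic mRNA covering lacZ, lacY, and lacA is produced, and each of its coding regions yields a separate polypeptide.

```python
from dataclasses import dataclass

LAC_GENES = ("lacZ", "lacY", "lacA")
PRODUCTS = {
    "lacZ": "beta-galactosidase",
    "lacY": "lactose permease",
    "lacA": "thiogalactoside transacetylase",
}

@dataclass
class LacOperon:
    """Simplified lac operon: one promoter, one operator, three structural genes."""
    lactose_present: bool = False

    def operator_free(self) -> bool:
        # Assumption: the lactose-derived inducer releases the repressor;
        # without it, the repressor occupies the operator.
        return self.lactose_present

    def transcribe(self):
        """Return the polycistronic mRNA (its coding regions), or None if repressed."""
        return LAC_GENES if self.operator_free() else None

    def translate(self, mrna):
        """Each coding region of the single mRNA yields its own polypeptide."""
        return [PRODUCTS[gene] for gene in mrna] if mrna else []
```

All three genes are switched on or off together, which is the coordinated expression that defines an operon.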
Although operon-like gene clusters are more common in prokaryotes, they have been observed in the nematode Caenorhabditis elegans[1] as well as the tunicate Ciona intestinalis.[15] The operon-like clusters of these eukaryotes are thought to share the most characteristics with true operons.[1] Eukaryotic operons were first discovered in 1993 in the nematode Caenorhabditis elegans. These operons were found to produce polycistronic pre-mRNAs, which are processed into separate monocistronic mature mRNAs, each encoding a single protein. Primitive chordates have also exhibited these types of gene clusters.[18]
Gene clusters have also been observed in eukaryotic organisms such as yeast, fungi, insects, vertebrates, and plants. Yeast contains several well-known gene clusters, such as the DAL and GAL clusters.[1] Filamentous fungal gene clusters play a key role in the biosynthesis of primary or secondary metabolites.[15] Metabolic pathway gene clusters differ greatly in structure from operon-like gene clusters.[1] In general, eukaryotic gene clusters differ greatly from prokaryotic gene clusters. While prokaryotic gene clusters are thought to form through horizontal gene transfer, this mechanism is highly unlikely in eukaryotes. Although fungal gene clusters have occasionally been observed to arise through horizontal gene transfer, the genes of eukaryotic gene clusters are transcribed as independent, or monocistronic, messenger RNAs.[15]
Although insects and plants are eukaryotes, some of these organisms have gene clusters that resemble bacterial operons in that they produce polycistronic pre-mRNAs that result in multiple polypeptides.[18]
Studying gene expression is critical to understanding gene function and networks of genes, and it aids in the study of diseases and their treatment. The essential first step in analyzing gene expression is detecting the presence of gene clusters.[19] Bioinformatic tools and techniques can help identify gene clusters in organisms. A search of a genome (or a section of a genome) for gene clusters can be based on sequence similarity or on functional similarity.[20] Data from gene expression experiments are typically presented as a matrix in which the rows correspond to genes and the columns correspond to conditions or time points. Each entry gives the expression level of one gene under one condition or at one time point.[19]
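The gene-by-condition expression matrix described above, and a common way of comparing two genes' rows, can be sketched in plain Python. The gene names and values are hypothetical, and Pearson correlation is only one of several similarity measures used for expression data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Rows are genes; columns are conditions or time points (hypothetical values).
expression = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [0.9, 2.1, 3.8, 8.2],
    "geneC": [8.0, 4.0, 2.0, 1.0],
}

def correlation_matrix(matrix):
    """Pairwise Pearson correlations between all gene rows of the matrix."""
    genes = list(matrix)
    return {(g, h): pearson(matrix[g], matrix[h]) for g in genes for h in genes}
```

Here geneA and geneB rise together and are strongly positively correlated, while geneC falls as geneA rises and is negatively correlated with it.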
High-density microarrays allow researchers to measure the transcription levels of specific genes across various developmental stages and conditions. Understanding the expression of particular genes during developmental stages allows a more thorough study of diseases and of their response to treatment options. Currently, two types of microarrays exist that produce gene expression data on a large scale. In each type, hybridizations between probes and targets are performed on DNA chips.[19]
Detecting gene clusters presents an algorithmic problem. A clustering problem consists of a set of elements (here, genes) and, for each element, a characteristic vector (here, the gene's expression pattern). A problem-dependent similarity measure is computed between pairs of vectors. The elements are then partitioned into subsets, or clusters, that satisfy two criteria: homogeneity, meaning that genes within a cluster are highly similar to one another, and separation, meaning that genes in different clusters have low similarity to one another. Homogeneity and separation share an inverse relationship: the greater the homogeneity of the clusters, the poorer their separation, and vice versa. A variety of bioinformatics programs exist for the analysis of gene clustering problems.[19]
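Homogeneity and separation can be quantified in several ways. The sketch below uses one simple choice, which is an assumption rather than a definition taken from the cited source: homogeneity is the average Pearson correlation over pairs of genes in the same cluster, and separation is assessed by the average correlation over pairs in different clusters (lower means better separated). The function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def homogeneity_and_separation(profiles, labels):
    """Average within-cluster and between-cluster pairwise similarity.

    profiles: list of expression vectors; labels: cluster index per profile.
    A higher first value means more homogeneous clusters; a lower second
    value means better-separated clusters.
    """
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        s = pearson(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(s)
    homogeneity = sum(within) / len(within) if within else float("nan")
    between_similarity = sum(between) / len(between) if between else float("nan")
    return homogeneity, between_similarity
```

Comparing two candidate partitions of the same genes with this function shows which one groups similar profiles together while keeping dissimilar ones apart.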
Agglomerative hierarchical clustering is the oldest and most popular algorithm used for detecting gene clusters. Its result is typically represented as a dendrogram. In this bottom-up approach, each element starts in its own cluster, and the two most similar clusters are repeatedly merged until a single cluster encompasses all elements. Eisen et al. developed the software program Cluster based on this algorithm; its viewing program is TreeView.[19]
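A minimal bottom-up (agglomerative) hierarchical clustering can be sketched as follows. It is an illustration, not the implementation used in Cluster: it assumes average linkage (the distance between two clusters is the mean distance between their members) and a user-supplied distance function, and it returns the sequence of merges that a dendrogram would display. The function name is hypothetical, and its naive search over all cluster pairs is only practical for small inputs.

```python
from itertools import combinations

def agglomerative(items, distance):
    """Bottom-up average-linkage clustering.

    Starts with every item in its own cluster and repeatedly merges the two
    clusters with the smallest average pairwise distance until one cluster
    remains. Returns a list of (members_a, members_b, merge_distance).
    """
    clusters = [frozenset([i]) for i in range(len(items))]
    merges = []

    def avg_dist(a, b):
        return sum(distance(items[i], items[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        d = avg_dist(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b), d))
    return merges
```

For one-dimensional points such as 0.0, 0.1, 5.0, 5.2, and 10.0 with absolute difference as the distance, the two closest pairs merge first, and the last merge joins all five points.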
Self-organizing maps (SOMs) require the number of clusters to be specified in advance. A two-dimensional grid of nodes is defined, with each node representing one cluster and associated with one reference vector. The reference vectors are moved in response to the input vectors, which pull them toward dense areas of the input vector space. Tamayo et al. developed GeneCluster, a software program based on the SOM algorithm.[19]
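The SOM procedure described above can be sketched as follows. This is a minimal illustration, not the GeneCluster implementation: it assumes a rectangular grid fixed in advance, reference vectors initialised from randomly chosen inputs, a Gaussian neighbourhood on the grid, and a learning rate and radius that shrink linearly. Each input vector pulls its best-matching node, and to a lesser degree that node's grid neighbours, toward itself. Function names and parameter defaults are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols self-organizing map; returns {node: reference vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {node: list(rng.choice(data)) for node in nodes}
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # learning rate decays
            radius = max(radius0 * (1.0 - frac), 0.5)   # neighbourhood shrinks
            bmu = min(nodes, key=lambda n: sq_dist(weights[n], x))
            for n in nodes:
                grid_d2 = (n[0] - bmu[0]) ** 2 + (n[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                w = weights[n]
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])      # pull toward the input
            step += 1
    return weights

def assign(data, weights):
    """Map each input vector to its best-matching node (its cluster)."""
    return [min(weights, key=lambda n: sq_dist(weights[n], x)) for x in data]
```

With two well-separated groups of profiles and a 1 x 2 grid, each node ends up representing one group.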
Cluster Identification via Connectivity Kernels (CLICK) assumes that pairwise similarity values between mates (elements belonging to the same cluster) are normally distributed. CLICK takes a graph-theoretic approach: a weighted similarity graph is generated from the input data. Before partitioning a subgraph, the algorithm assesses whether it is a kernel, the basis for a cluster. If it is a kernel, it is not partitioned further; otherwise it is partitioned, and the process yields a list of kernels and a set of singleton vertices. The similarity between each singleton vertex's fingerprint and the fingerprint of each cluster is then calculated. The two kernels that share the highest similarity are merged, but only if their similarity exceeds a predetermined threshold. Upon visualization, each cluster is displayed with its expression pattern and standard error bars.[19]
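CLICK's normality assumption is what lets it turn raw similarity values into edge weights. A sketch of one way to do this, assuming that similarities between mates and between non-mates each follow their own normal distribution, is to weight an edge by the log-likelihood ratio that its two endpoints are mates rather than non-mates. The function names, parameter names, and values below are hypothetical; in practice the distribution parameters would have to be estimated from the data.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def click_edge_weight(similarity, p_mates, mu_mates, sd_mates, mu_non, sd_non):
    """Log-likelihood ratio that two elements are mates, given their similarity.

    p_mates is the prior probability that a random pair are mates. Positive
    weights favour 'mates', negative weights favour 'non-mates'.
    """
    log_mates = math.log(p_mates) + normal_logpdf(similarity, mu_mates, sd_mates)
    log_non = math.log(1.0 - p_mates) + normal_logpdf(similarity, mu_non, sd_non)
    return log_mates - log_non
```

Pairs whose similarity lies near the mates' mean receive large positive weights, so a minimum-weight cut of the graph tends to separate elements that are unlikely to be mates.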
Most clustering software groups genes whose expression patterns are positively correlated; however, genes that are anti-correlated may still be functionally related and thus belong to the same gene cluster. Genes in a cellular pathway are expected to be positively correlated, whereas genes that repress other genes in the same pathway are expected to be anti-correlated, or negatively correlated. The diametrical clustering algorithm was developed specifically to cluster negatively correlated genes. Diametrical clustering repeatedly re-partitions the genes while recalculating the dominant singular vector of each cluster. Each dominant singular vector serves as the model of a diametric cluster.[21]
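The core loop of diametrical clustering can be sketched in plain Python. This is a simplified illustration under stated assumptions: each expression profile is mean-centred and scaled to unit length; each gene is assigned to the cluster whose representative c maximises (x·c)², so strongly correlated and strongly anti-correlated genes score equally; and each representative is recomputed as the dominant eigenvector of the cluster's scatter matrix (equivalently, its dominant singular vector) by power iteration. Initialising the representatives from the first k profiles is an additional simplifying assumption, and the function names are hypothetical.

```python
import math

def center_and_normalize(v):
    """Subtract the mean and scale to unit length (assumes a non-constant profile)."""
    mean = sum(v) / len(v)
    c = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in c))
    return [x / norm for x in c]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dominant_vector(vectors, iterations=200):
    """Leading eigenvector of sum(x x^T) over the vectors, by power iteration."""
    c = list(vectors[0])
    for _ in range(iterations):
        nxt = [0.0] * len(c)
        for x in vectors:
            proj = dot(x, c)
            for k, xk in enumerate(x):
                nxt[k] += proj * xk
        norm = math.sqrt(sum(v * v for v in nxt))
        c = [v / norm for v in nxt]
    return c

def diametrical_clustering(profiles, k, max_iter=100):
    """Return a cluster label per profile; anti-correlated genes can share a label."""
    xs = [center_and_normalize(p) for p in profiles]
    reps = [list(x) for x in xs[:k]]  # assumption: first k profiles seed the clusters
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(x, reps[j]) ** 2) for x in xs]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                reps[j] = dominant_vector(members)
    return labels
```

A gene profile and its mirror image (for example, rising versus falling over the same time points) end up with the same label, which a purely correlation-based method would split apart.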
Expressed sequence tags (ESTs) are used to discover genes and analyze gene expression; however, the encoded genes are typically not identified from ESTs directly. EST datasets are large and redundant, consisting of partial transcript sequences that often exhibit chimaerism and vector or adaptor contamination. TIGR Gene Indices clustering tools (TGICL) is a software system designed specifically for fast, efficient clustering of large EST datasets.[22]
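The basic idea of grouping overlapping partial transcripts can be sketched with shared k-mers and a union-find structure. This is a simplified stand-in, not how TGICL works internally: two ESTs are linked if they share at least a minimum number of length-k substrings, reverse complements and sequencing errors are ignored, and linked ESTs are grouped transitively, so two non-overlapping reads from the same transcript can still join through a third read that overlaps both. The function, class, and parameter names are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def cluster_ests(ests, k=8, min_shared=3):
    """Group ESTs that share at least min_shared k-mers, transitively."""
    uf = UnionFind(len(ests))
    sets = [kmers(s, k) for s in ests]
    for i in range(len(ests)):
        for j in range(i + 1, len(ests)):
            if len(sets[i] & sets[j]) >= min_shared:
                uf.union(i, j)
    groups = {}
    for i in range(len(ests)):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())
```

The all-pairs comparison is quadratic in the number of reads; tools built for real EST collections rely on much faster similarity searches.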
Common genomic analysis tools like BLAST can be used to find similar sequences throughout the genome. A program called DAVID (Database for Annotation, Visualization, and Integrated Discovery) can be applied to find functionally similar genes across the genome, once a gene of interest has been identified.[20][23]
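BLAST finds similar sequences using fast heuristics that approximate local alignment. For illustration, the exact local alignment score it approximates can be computed with the Smith-Waterman dynamic program sketched below. The scoring values (match +2, mismatch -1, linear gap -2) are arbitrary assumptions, not BLAST's defaults, and the exhaustive table makes this practical only for short sequences.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0 so an alignment can start anywhere (local).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

For example, "TTACGTTT" and "GGACGTGG" share the local match "ACGT", which scores 8 under these parameters.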
Bioinformatics and computational biology have revealed insights about gene clustering. Researchers using genome-wide association studies (GWAS) to analyze metabolic pathways have discovered that genes within a pathway are located closer together in the genome than unrelated, randomly dispersed genes. The computational analysis of these metabolic pathways in eukaryotes showed that gene clusters are relatively common.[24][25]
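Whether the genes of a pathway lie closer together than randomly chosen genes can be tested with a simple permutation test. The sketch below is an illustrative assumption, not the method of the cited studies: it treats all genes as lying on a single chromosome with known positions, uses the mean pairwise distance between pathway genes as the test statistic, and compares it with random gene sets of the same size. Gene names, positions, and function names are hypothetical.

```python
import random
from itertools import combinations

def mean_pairwise_distance(positions):
    """Average absolute distance over all pairs of positions."""
    pairs = list(combinations(positions, 2))
    return sum(abs(p - q) for p, q in pairs) / len(pairs)

def clustering_p_value(genome_positions, pathway_genes, n_permutations=10_000, seed=0):
    """Permutation test: are pathway genes closer together than random gene sets?

    genome_positions maps gene name -> position on one chromosome. Returns
    (p_value, observed_statistic); the p-value is the fraction of random
    same-size gene sets at least as tightly packed as the pathway.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([genome_positions[g] for g in pathway_genes])
    all_positions = list(genome_positions.values())
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(all_positions, len(pathway_genes))
        if mean_pairwise_distance(sample) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1), observed
```

A small p-value indicates that the pathway's genes are more tightly packed than chance would predict; real analyses would also have to handle multiple chromosomes and variable gene density.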
Another important observation was gleaned from these computational studies. Not all of the genes in a cluster are part of the same metabolic pathway. In other words, these clusters are not exclusive. Rather, they include genes of related function that are located closer to each other than randomly dispersed genes,[1] likely derived from gene duplications and divergence over time.
Many genetic diseases are complex in their etiology and are caused by a combination of multiple genetic abnormalities. Research involving gene clusters may help discern the basis of a particular genetic disease, because genes that are grouped together by function or by proximity may be implicated in the cause of the same disease. Identifying and studying gene clusters can therefore have an impact on future medical applications. For example, gene clusters may play a role in tumor development. A particular region of the human genome, 3p21.3, is especially susceptible to deletion events, and such deletions are commonly found in lung and breast cancers.[26] The region contains a gene cluster consisting of RASSF1 and at least nine other genes. RASSF1 is a known tumor suppressor gene and is typically associated with the pathogenesis of cancers.[27][28] However, research has shown that other genes in this cluster may also be involved. This region experiences frequent methylation and acetylation, and studies have shown that abnormalities in other genes of the cluster, caused by epigenetic modifications, occur in cancer patients. Research on gene clusters could reveal other genes, pathways, and interactions involved in genetic diseases and provide targets for drug development.