WordNet

the ordering of genes in a haploid set of chromosomes of a particular organism; the full DNA sequence of an organism; "the human genome contains approximately three billion chemical base pairs"
the act of adding notes (syn.) annotating

PrepTutorEJDIC

annotation, note

Wikipedia preview

Source (authority): Wikipedia, the free encyclopedia, "2015/03/16 17:24:26" (JST)

wiki en

The Genome sequence when printed fills a huge book of close print

Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features.^[1] The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences.

The Human Genome Project was a landmark genome project that is already having a major impact on research across the life sciences, with potential for spurring numerous medical and commercial developments.^[2]

Genome assembly

Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 1000 nucleotides or bases at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented as AGCT.) A genome assembly algorithm works by taking all the pieces and aligning them to one another, and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged, and the process continues.
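The overlap-and-merge idea described above can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and uses a simple greedy strategy (always merge the pair with the longest overlap); the function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical choices for this illustration, not the method of any particular assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of sequences until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["AGCTTAGG", "TAGGCATC", "CATCGGA"]
print(greedy_assemble(reads))  # ['AGCTTAGGCATCGGA']
```

The all-against-all comparison is quadratic in the number of reads, which is acceptable for a toy example but not for millions of reads.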

Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and some occur in thousands of different locations, especially in the large genomes of plants and animals.
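To see why repeats cause trouble, it can help to list the substrings that occur at more than one position: a read that falls entirely inside such a substring cannot be placed unambiguously. The following is a small sketch under the assumption of a single, already known sequence; `repeated_kmers` and its parameter `k` are hypothetical names used only for this illustration.

```python
from collections import defaultdict


def repeated_kmers(sequence: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


print(repeated_kmers("ACGTTTACGTAAACGT", 4))  # {'ACGT': [0, 6, 12]}
```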

The resulting (draft) genome sequence is produced by combining the sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".
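A rough sketch of the scaffolding step follows. It assumes the linking information has already been reduced to ordered pairs of contigs with an estimated gap size, and it pads each gap with the letter N, a common convention for unknown bases. The function `build_scaffold`, the contig names, and the `(left, right, gap)` link format are all hypothetical.

```python
def build_scaffold(contigs: dict[str, str], links: list[tuple[str, str, int]]) -> str:
    """Chain contigs by following (left, right, gap) links, padding gaps with N."""
    next_of = {left: (right, gap) for left, right, gap in links}
    has_predecessor = {right for _, right, _ in links}
    # The scaffold starts at the one contig that nothing links into.
    starts = [name for name in next_of if name not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("expected a single linear chain of links")
    name = starts[0]
    parts = [contigs[name]]
    while name in next_of:
        name, gap = next_of[name]
        parts.append("N" * gap)  # unknown bases between neighbouring contigs
        parts.append(contigs[name])
    return "".join(parts)


contigs = {"c1": "ACGT", "c2": "GGCC", "c3": "TTAA"}
links = [("c2", "c3", 2), ("c1", "c2", 3)]
print(build_scaffold(contigs, links))  # ACGTNNNGGCCNNTTAA
```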

Assembly software

Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. One example of such an assembler is the Short Oligonucleotide Analysis Package (SOAP), developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.^[3]^[4]^[5]

Genome annotation

Genome annotation is the process of attaching biological information to sequences.^[6] It consists of three main steps:

identifying portions of the genome that do not code for proteins
identifying elements on the genome, a process called gene prediction, and
attaching biological information to these elements.

Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (a.k.a. curation) which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.

The basic level of annotation is using BLAST for finding similarities, and then annotating genomes based on that.^[1] However, nowadays more and more additional information is added to the annotation platform. The additional information allows manual annotators to deconvolute discrepancies between genes that are given the same annotation. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases (e.g. Ensembl) rely on both curated data sources as well as a range of different software tools in their automated genome annotation pipeline.^[7]
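The principle of similarity-based annotation (find the most similar known sequence, then transfer its annotation) can be sketched as follows. This is not BLAST: as a stand-in it scores similarity by the fraction of shared 3-letter substrings (Jaccard index), which is an assumption made purely to keep the example self-contained. The function names, the threshold of 0.5, and the reference labels are hypothetical.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All distinct length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def annotate_by_similarity(query: str, reference: dict[str, str],
                           threshold: float = 0.5, k: int = 3):
    """Return (label, score) of the best reference hit, or (None, score) below threshold."""
    best_label, best_score = None, 0.0
    for label, ref_seq in reference.items():
        score = jaccard(query, ref_seq, k)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score


# Hypothetical reference set: annotation label -> protein sequence.
reference = {"kinase-like": "MKTAYIAKQR", "transporter-like": "GGSLLVPPWW"}
label, score = annotate_by_similarity("MKTAYIAKQL", reference)
print(label, round(score, 3))
```

A real pipeline would use alignment scores and statistical significance rather than a fixed k-mer threshold, and a curator would still review conflicting hits.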

Structural annotation consists of the identification of genomic elements.

ORFs and their localisation
gene structure
coding regions
location of regulatory motifs
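As an illustration of the first item, locating ORFs, the sketch below scans the three forward reading frames for a start codon (ATG) followed in frame by a stop codon (TAA, TAG or TGA). It is deliberately simplified: it ignores the reverse strand, introns, and alternative start codons, and the name `find_orfs` and the `min_codons` cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each forward-strand ORF.

    `start` is the index of the A in ATG; `end` is exclusive and includes
    the stop codon. `min_codons` counts codons before the stop codon.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 2, 14)]
```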

Functional annotation consists of attaching biological information to genomic elements.

biochemical function
biological function
involved regulation and interactions
expression

These steps may involve both biological experiments and in silico analysis. Proteogenomics based approaches utilize information from expressed proteins, often derived from mass spectrometry, to improve genomics annotations.^[8]

A variety of software tools have been developed to permit scientists to view and share genome annotations.

Genome annotation remains a major challenge for scientists investigating the human genome, now that the genome sequences of more than a thousand human individuals and several model organisms are largely complete.^[9]^[10] Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism.^[1] Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".^[11]

Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means. Here is an alphabetical listing of on-going projects relevant to genome annotation:

Encyclopedia of DNA elements (ENCODE)
Entrez Gene
Ensembl
GENCODE
Gene Ontology Consortium
GeneRIF
RefSeq
Uniprot
Vertebrate and Genome Annotation Project (Vega)

At Wikipedia, genome annotation has started to become automated under the auspices of the Gene Wiki portal which operates a bot that harvests gene data from research databases and creates gene stubs on that basis.^[12]

When is a genome project finished?

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to more accurately describe the status of such genome projects. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to only determining a DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.

Historical and technological perspectives

Historically, when sequencing eukaryotic genomes (such as the worm Caenorhabditis elegans) it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with the prior knowledge of approximately where that piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (though this approach has caveats when compared to the traditional approach).

Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide which new genomes to sequence, the emphasis has been on species that are of high importance as model organisms, have relevance to human health (e.g. pathogenic bacteria or vectors of disease such as mosquitoes), or have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.

Example genome projects

L1 Dominette 01449, the Hereford who serves as the subject of the Bovine Genome Project

Many organisms have genome projects that have either been completed or will be completed shortly, including:

Humans, Homo sapiens; see Human genome project
Palaeo-Eskimo,^[4] an ancient-human
Neanderthal, "Homo neanderthalensis" (partial); see Neanderthal Genome Project
Common Chimpanzee Pan troglodytes; see Chimpanzee Genome Project
Domestic Cow^[13]^[14]
Bovine Genome
Honey Bee Genome Sequencing Consortium
Horse genome^[15]
Human microbiome project
International Grape Genome Program
International HapMap Project
Tomato 150+ genome resequencing project
100K Genome Project
Genomics England

References

^ ^a ^b ^c Pevsner, Jonathan (2009). Bioinformatics and functional genomics (2nd ed.). Hoboken, N.J: Wiley-Blackwell. ISBN 9780470085851.
^ "Potential Benefits of Human Genome Project Research". Department of Energy, Human Genome Project Information. 2009-10-09. Retrieved 2010-06-18.
^ Li, Ruiqiang; Hongmei Zhu, Jue Ruan, Wubin Qian, Xiaodong Fang, Zhongbin Shi, Yingrui Li, Shengting Li, Gao Shan, Karsten Kristiansen, Songgang Li, Huanming Yang, Jian Wang, Jun Wang (February 2010). "De novo assembly of human genomes with massively parallel short read sequencing". Genome Research 20 (2): 265–272. doi:10.1101/gr.097261.109. ISSN 1549-5469. PMC 2813482. PMID 20019144.
^ ^a ^b Rasmussen, Morten; Yingrui Li, Stinus Lindgreen, Jakob Skou Pedersen, Anders Albrechtsen, Ida Moltke, Mait Metspalu, Ene Metspalu, Toomas Kivisild, Ramneek Gupta, Marcelo Bertalan, Kasper Nielsen, M Thomas P Gilbert, Yong Wang, Maanasa Raghavan, Paula F Campos, Hanne Munkholm Kamp, Andrew S Wilson, Andrew Gledhill, Silvana Tridico, Michael Bunce, Eline D Lorenzen, Jonas Binladen, Xiaosen Guo, Jing Zhao, Xiuqing Zhang, Hao Zhang, Zhuo Li, Minfeng Chen, Ludovic Orlando, Karsten Kristiansen, Mads Bak, Niels Tommerup, Christian Bendixen, Tracey L Pierre, Bjarne Grønnow, Morten Meldgaard, Claus Andreasen, Sardana A Fedorova, Ludmila P Osipova, Thomas F G Higham, Christopher Bronk Ramsey, Thomas V O Hansen, Finn C Nielsen, Michael H Crawford, Søren Brunak, Thomas Sicheritz-Pontén, Richard Villems, Rasmus Nielsen, Anders Krogh, Jun Wang, Eske Willerslev (2010-02-11). "Ancient human genome sequence of an extinct Palaeo-Eskimo". Nature 463 (7282): 757–762. doi:10.1038/nature08835. ISSN 1476-4687. PMC 3951495. PMID 20148029.
^ Wang, Jun; Wei Wang, Ruiqiang Li, Yingrui Li, Geng Tian, Laurie Goodman, Wei Fan, Junqing Zhang, Jun Li, Juanbin Zhang, Yiran Guo, Binxiao Feng, Heng Li, Yao Lu, Xiaodong Fang, Huiqing Liang, Zhenglin Du, Dong Li, Yiqing Zhao, Yujie Hu, Zhenzhen Yang, Hancheng Zheng, Ines Hellmann, Michael Inouye, John Pool, Xin Yi, Jing Zhao, Jinjie Duan, Yan Zhou, Junjie Qin, Lijia Ma, Guoqing Li, Zhentao Yang, Guojie Zhang, Bin Yang, Chang Yu, Fang Liang, Wenjie Li, Shaochuan Li, Dawei Li, Peixiang Ni, Jue Ruan, Qibin Li, Hongmei Zhu, Dongyuan Liu, Zhike Lu, Ning Li, Guangwu Guo, Jianguo Zhang, Jia Ye, Lin Fang, Qin Hao, Quan Chen, Yu Liang, Yeyang Su, A. san, Cuo Ping, Shuang Yang, Fang Chen, Li Li, Ke Zhou, Hongkun Zheng, Yuanyuan Ren, Ling Yang, Yang Gao, Guohua Yang, Zhuo Li, Xiaoli Feng, Karsten Kristiansen, Gane Ka-Shu Wong, Rasmus Nielsen, Richard Durbin, Lars Bolund, Xiuqing Zhang, Songgang Li, Huanming Yang, Jian Wang (2008-11-06). "The diploid genome sequence of an Asian individual". Nature 456 (7218): 60–65. doi:10.1038/nature07484. ISSN 0028-0836. PMC 2716080. PMID 18987735. Retrieved 2012-12-22.
^ Stein, L. (2001). "Genome annotation: from sequence to biology". Nature Reviews Genetics 2 (7): 493–503. doi:10.1038/35080529. PMID 11433356.
^ "Ensembl's genome annotation pipeline online documentation".
^ Gupta, Nitin; Stephen Tanner, Navdeep Jaitly, Joshua N Adkins, Mary Lipton, Robert Edwards, Margaret Romine, Andrei Osterman, Vineet Bafna, Richard D Smith, Pavel A Pevzner (September 2007). "Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation". Genome Research 17 (9): 1362–1377. doi:10.1101/gr.6427907. ISSN 1088-9051. PMC 1950905. PMID 17690205.
^ ENCODE Project Consortium (2011). Becker PB, ed. "A User's Guide to the Encyclopedia of DNA Elements (ENCODE)". PLoS Biology 9 (4): e1001046. doi:10.1371/journal.pbio.1001046. PMC 3079585. PMID 21526222.
^ McVean, G. A.; Abecasis, D. M.; Auton, R. M.; Brooks, G. A. R.; Depristo, D. R.; Durbin, A.; Handsaker, A. G.; Kang, P.; Marth, E. E.; McVean, P.; Gabriel, S. B.; Gibbs, R. A.; Green, E. D.; Hurles, M. E.; Knoppers, B. M.; Korbel, J. O.; Lander, E. S.; Lee, C.; Lehrach, H.; Mardis, E. R.; Marth, G. T.; McVean, G. A.; Nickerson, D. A.; Schmidt, J. P.; Sherry, S. T.; Wang, J.; Wilson, R. K.; Gibbs (Principal Investigator), R. A.; Dinh, H.; Kovar, C. (2012). "An integrated map of genetic variation from 1,092 human genomes". Nature 491 (7422): 56–65. doi:10.1038/nature11632. PMC 3498066. PMID 23128226.
^ Dunham, I.; Bernstein, A.; Birney, S. F.; Dunham, P. J.; Green, C. A.; Gunter, F.; Snyder, C. B.; Frietze, S.; Harrow, J.; Kaul, R.; Khatun, J.; Lajoie, B. R.; Landt, S. G.; Lee, B. K.; Pauli, F.; Rosenbloom, K. R.; Sabo, P.; Safi, A.; Sanyal, A.; Shoresh, N.; Simon, J. M.; Song, L.; Trinklein, N. D.; Altshuler, R. C.; Birney, E.; Brown, J. B.; Cheng, C.; Djebali, S.; Dong, X.; Dunham, I. (2012). "An integrated encyclopedia of DNA elements in the human genome". Nature 489 (7414): 57–74. doi:10.1038/nature11247. PMC 3439153. PMID 22955616.
^ Huss, Jon W.; Orozco, C; Goodale, J; Wu, C; Batalov, S; Vickers, TJ; Valafar, F; Su, AI (2008). "A Gene Wiki for Community Annotation of Gene Function". PLoS Biology 6 (7): e175. doi:10.1371/journal.pbio.0060175. PMC 2443188. PMID 18613750.
^ Yates, Diana (2009-04-23). "What makes a cow a cow? Genome sequence sheds light on ruminant evolution" (Press Release). EurekAlert!. Retrieved 2012-12-22.
^ Elsik, C. G.; Elsik, R. L.; Tellam, K. C.; Worley, R. A.; Gibbs, D. M.; Muzny, G. M.; Weinstock, D. L.; Adelson, E. E.; Eichler, L.; Elnitski, R.; Guigó, D. L.; Hamernik, S. M.; Kappes, H. A.; Lewin, D. J.; Lynn, F. W.; Nicholas, A.; Reymond, M.; Rijnkels, L. C.; Skow, E. M.; Zdobnov, L.; Schook, J.; Womack, T.; Alioto, S. E.; Antonarakis, A.; Astashyn, C. E.; Chapple, H. -C.; Chen, J.; Chrast, F.; Câmara, O.; Ermolaeva, C. N. (2009). "The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution". Science 324 (5926): 522–528. doi:10.1126/science.1169588. PMC 2943200. PMID 19390049.
^ http://www.genome.gov/20519480

External links

The Wikibook Next Generation Sequencing (NGS) has a page on the topic of: De_novo_assembly

GOLD:Genomes OnLine Database
Genome Project Database
The Protein Naming Utility
SUPERFAMILY
The sea urchin genome database
NRCPB.

UpToDate Contents

To read the full text you will need to subscribe.

1. Genomic disorders: An overview
2. Tools for genetics and genomics: Model systems
3. Tools for genetics and genomics: Cytogenetics and molecular genetics
4. Personalized medicine
5. Microcephaly: A clinical genetics approach

English Journal

Transcriptome analysis of Capsicum annuum varieties Mandarin and Blackcluster: Assembly, annotation and molecular marker discovery.

Ahn YK, Tripathi S, Kim JH, Cho YI, Lee HE, Kim DS, Woo JG, Cho MC. Source: Vegetable Research Division, National Institute of Horticultural & Herbal Science, Rural Development Administration, Suwon 440-706, Republic of Korea. Electronic address: aykyun@korea.kr.
Gene. 2014 Jan 10;533(2):494-9. doi: 10.1016/j.gene.2013.09.095. Epub 2013 Oct 11.
Next generation sequencing technologies have proven to be a rapid and cost-effective means to assemble and characterize gene content and identify molecular markers in various organisms. Pepper (Capsicum annuum L., Solanaceae) is a major staple vegetable crop, which is economically important and has
PMID 24125952

Genome-wide analysis of the heat stress response in Zebu (Sahiwal) cattle.

Mehla K, Magotra A, Choudhary J, Singh AK, Mohanty AK, Upadhyay RC, Srinivasan S, Gupta P, Choudhary N, Antony B, Khan F. Source: Dairy Cattle Physiology Division, National Dairy Research Institute, Karnal 132001 (Haryana), India. Electronic address: mehla.kusum@gmail.com.
Gene. 2014 Jan 10;533(2):500-7. doi: 10.1016/j.gene.2013.09.051. Epub 2013 Sep 27.
Environmental-induced hyperthermia compromises animal production with drastic economic consequences to global animal agriculture and jeopardizes animal welfare. Heat stress is a major stressor that occurs as a result of an imbalance between heat production within the body and its dissipation and it
PMID 24080481

Insights into significant pathways and gene interaction networks underlying breast cancer cell line MCF-7 treated with 17β-Estradiol (E2).

Huan J, Wang L, Xing L, Qin X, Feng L, Pan X, Zhu L. Source: Department of General Surgery, The Eighth People's Hospital of Shanghai, Shanghai 200235, China. Electronic address: huanjl2013@126.com.
Gene. 2014 Jan 1;533(1):346-55. doi: 10.1016/j.gene.2013.08.027. Epub 2013 Aug 23.
OBJECTIVE: Estrogens are known to regulate the proliferation of breast cancer cells and to alter their cytoarchitectural and phenotypic properties, but the gene networks and pathways by which estrogenic hormones regulate these events are only partially understood. METHODS: We used global gene express
PMID 23978611

Japanese Journal

Exploring the Potential Role of Rosmarinic Acid in Neuronal Differentiation of Human Amnion Epithelial Cells by Microarray Gene Expression Profiling

大河内信弘,礒田博子,Farhana Ferdousi,Kazunori Sasaki,Yoshiaki Uchida,Nobuhiro OHKOHCHI,Yun-Wen Zheng,Hiroko ISODA
Frontiers in Neuroscience (13), 779, 2019-07
… Gene set enrichment analysis, and gene annotation and pathway analysis were conducted using online data mining tools GSEA and DAVID. … Findings from our genome-wide analysis could provide a foundation for further in-depth investigation. …
NAID 120006768224

A chromosome-scale genome assembly and dense genetic map for Xenopus tropicalis

Therese Mitros,Jessica B. Lyons,Adam M. Session,Jerry Jenkins,Shengquiang Shu,Taejoon Kwon,Maura Lane,Connie Ng,Timothy C. Grammer,Mustafa K. Khokha,Jane Grimwood,Jeremy Schmutz,Richard M. Harland,Daniel S. Rokhsar
Developmental Biology 452(1), 8-20, 2019-04-10
… tropicalis genome, improving the previously published draft genome assembly through the use of new assembly algorithms, additional sequence data, and the addition of a dense genetic map. … The improved genome enables the mapping of specific traits (e.g., the sex locus or Mendelian mutants) and the characterization of chromosome-scale synteny with other tetrapods. …
NAID 120006766017

Draft genome of the brown alga, Nemacystus decipiens, Onna-1 strain: Fusion of genes involved in the sulfated fucan biosynthesis pathway

Koki Nishitsuji,Asuka Arimoto,Yoshimi Higa,Munekazu Mekaru,Mayumi Kawamitsu,Noriyuki Satoh,Eiichi Shoguchi
Scientific Reports 9(1), 4607, 2019-03-14
… To facilitate brown algal studies, we decoded the ~154 Mbp draft genome of N. … The genome is estimated to contain 15,156 protein-coding genes, ~78% of which are substantiated by corresponding mRNAs. … Gene ontology annotation showed more than half of these are classified as molecular function, enzymatic activity, and/or biological process. …
NAID 120006652935

"genome"

Production and characterization of Acidothermus cellulolyticus endoglucanase in Pichia pastoris.

Lindenmuth BE, McDonald KA.
Protein expression and purification. 2011 Jun;77(2):153-8. Epub 2011 Jan 22.
PMID 21262363

Differential activities of the Drosophila JAK/STAT pathway ligands Upd, Upd2 and Upd3.

Wright VM, Vogt KL, Smythe E, Zeidler MP.
Cellular signalling. 2011 May;23(5):920-7. Epub 2011 Jan 22.
PMID 21262354

Adaptive haemoglobin gene control in Daphnia pulex at different oxygen and temperature conditions.

Gerke P, Börding C, Zeis B, Paul RJ.
Comparative biochemistry and physiology. Part A, Molecular & integrative physiology. 2011 May;159(1):56-65. Epub 2011 Jan 31.
PMID 21281731