Genomics
Chapter 9 in Modern Genetic Analysis, 2nd Edition
What is Genomics?
- Study of complete set of genes
- Global vs local
- Genome Projects (Table)
- Structural vs. Functional genomics
Steps in Whole Genome Mapping:
- High resolution genetic mapping
- Physical mapping
- DNA sequencing
Applications of the Complete DNA Sequence
- Functional Genomics
- Bioinformatics
High Resolution Genetic Mapping
- used to place molecularly defined differences on linkage maps or cytogenetic maps
- provides molecular landmarks for building higher resolution physical and sequencing maps
- builds on low-resolution genetic maps
DNA polymorphisms: molecularly defined differences between individuals
Mapping Techniques Used to Determine Position of DNA Marker on a Chromosome:
1) Meiotic Recombination Maps
- based on analyzing recombinant frequency in dihybrid and multihybrid crosses
- done more easily in Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster etc. due to ease of controlled experimental crosses
- not as easy for humans since crosses are harder to obtain and progeny sizes are too small
- measurements of loci with known phenotypic effect showed that the intervals between genes contain vast amounts of DNA
- to fill in these gaps, various kinds of polymorphic DNA markers need to be exploited
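As a rough illustration of how recombinant frequency becomes a map distance, the sketch below converts progeny counts from a testcross into centimorgans. The function names and the counts are hypothetical, and the 1% recombinants = 1 cM conversion is only a reasonable approximation for closely linked markers.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of scored progeny that are recombinant (0..1)."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny scored")
    return recombinant / total


def map_distance_cm(parental, recombinant):
    """Approximate map distance in centimorgans (1% recombinants = 1 cM)."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny scored")
    return 100.0 * recombinant / total


# Hypothetical testcross: 880 parental-type and 120 recombinant progeny
print(recombination_frequency(880, 120))
print(map_distance_cm(880, 120))
```

With small progeny sizes, as in human families, the estimate becomes very noisy, which is one reason the notes describe human recombination mapping as harder.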
Examples of polymorphic DNA markers (RFLPs, SSLPs, and RAPDs):
a) RFLPs
- Restriction Fragment Length Polymorphism
- Restriction enzyme recognition sites present in some strains but not in others (presence/absence)
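The presence/absence idea can be sketched with a toy digest. The sequences below are made up, the `digest` helper is hypothetical, and cutting exactly at the start of the recognition site is a simplification of where a real enzyme cuts.

```python
def digest(seq, site):
    """Return fragment lengths after cutting seq at every occurrence of site.

    Simplification: the cut is placed at the first base of the site.
    """
    lengths = []
    start = 0
    pos = seq.find(site, 1)  # start at 1 so a site at position 0 gives no empty fragment
    while pos != -1:
        lengths.append(pos - start)
        start = pos
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths


ECORI_SITE = "GAATTC"

# Made-up alleles: strain B carries a point mutation (A -> G) inside the site
strain_a = "ACGTACGTAC" + "GAATTC" + "TTGGCCAATT"
strain_b = "ACGTACGTAC" + "GAGTTC" + "TTGGCCAATT"

print(digest(strain_a, ECORI_SITE))  # two fragments: site present
print(digest(strain_b, ECORI_SITE))  # one fragment: site absent
```

On a gel or Southern blot, the two strains would show different band patterns for this region, which is what makes the site usable as a marker.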
b) SSLPs
- Simple Sequence Length Polymorphisms
- Advantages over RFLPs include larger number of alleles as well as higher levels of heterozygosity
Types of SSLPs:
i) Minisatellite markers
- based on VNTRs: Variable Number of Tandem Repeats
- VNTRs are 1 - 5 kb sequences of repeating units that are 15 - 100 nucleotides long
- When the genome is cut by restriction enzymes that have no recognition sites within the VNTRs, Southern blot analysis reveals a large number of different sized fragments bound by the VNTR probe.
- Due to high variability in the number of tandem repeats from person to person, the set of fragments revealed is highly individualistic.
- Often called DNA fingerprints
ii) Microsatellite markers
- based on variable number of dinucleotides repeated in tandem
- CA with complementary GT most common
- Use PCR with primers for the DNA flanking the marker. First digest the DNA with a restriction enzyme such as AluI. Clone the fragments into a sequencing vector and then identify those containing the CA/GT repeats with a CA/GT probe. Sequence the inserts of these clones and create PCR primer pairs. These primers are designed to recognize single copy DNA sequences flanking the marker. Use these primers with genomic DNA in PCR amplification. Gel electrophoresis can be used to determine size differences.
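To make the size difference concrete, here is a minimal sketch that counts CA repeat units and computes the length of the PCR product between two flanking single-copy sequences. The flank sequences, allele sizes, and function names are invented for illustration, and both primer sites are given on the same strand for simplicity.

```python
import re


def longest_ca_run(seq):
    """Number of CA units in the longest tandem (CA)n run in seq."""
    runs = re.findall(r"(?:CA)+", seq)
    return max((len(run) // 2 for run in runs), default=0)


def pcr_product_length(template, left_flank, right_flank):
    """Length from the start of left_flank to the end of right_flank, or None."""
    start = template.find(left_flank)
    if start == -1:
        return None
    end = template.find(right_flank, start + len(left_flank))
    if end == -1:
        return None
    return end + len(right_flank) - start


# Hypothetical single-copy flanks and two alleles differing in repeat number
LEFT = "GGATCCTAGG"
RIGHT = "TTGACCGGTA"
allele_12 = LEFT + "CA" * 12 + RIGHT
allele_15 = LEFT + "CA" * 15 + RIGHT

for allele in (allele_12, allele_15):
    print(longest_ca_run(allele), pcr_product_length(allele, LEFT, RIGHT))
```

Each extra CA unit adds 2 bp to the product, so alleles separate as bands of different size on a high-resolution gel.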
c) RAPDs
- Randomly Amplified Polymorphic DNA
- based on random PCR amplification
- A single PCR primer is designed at random and used to amplify different regions of the genome.
Also SNPs should be mentioned.
- SNPs are Single Nucleotide Polymorphisms
- many due to neutral variation, such as third codon position
- There is about one SNP difference every 1000 base pairs of human DNA. Since the human genome is about 3 billion base pairs long, there are about 3 million differences between any two of our genomes. A very rich source of variation!
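The arithmetic behind the 3 million figure, written out as a tiny sketch (the function name is hypothetical; the two input numbers are the round figures quoted above):

```python
def expected_snp_differences(genome_size_bp, bp_per_snp):
    """Expected number of single-nucleotide differences between two genomes."""
    return genome_size_bp // bp_per_snp


# Round figures from the notes: ~3 billion bp, ~1 SNP per 1000 bp
print(expected_snp_differences(3_000_000_000, 1_000))
```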
2) Cytogenetic Maps
- produced by relating locations of DNA markers to cytogenetic landmarks such as chromosome bands and puffs
Ways to do this:
a) In Situ Hybridization Mapping
- if a cloned DNA sequence is available for area of interest, label it as a probe and use it to hybridize to chromosomes in situ
- individual chromosomes are recognizable through morphology differences such as size, banding pattern, centromere location
- map the probe sequence to approximate position on chromosome
- labels used for probes: fluorescent, as in FISH - Fluorescent In Situ Hybridization
b) Rearrangement Breakpoint Mapping
- based on using DNA breakpoints as molecular landmarks
- when cloned DNA spanning a breakpoint has been identified, breakpoints are easily detected on Southern blots as two bands of hybridization, while in normal chromosomes you would only see one band
c) Radiation Hybrid Mapping
- does not require marker heterozygosity
- irradiate human cells with X-rays to fragment the chromosomes
- insert these fragments into rodent cells
- fragmented human chromosomes fuse to chromosomes of rodent cells
- create a series of clones each containing a different random assortment of fragments of human chromosomes
- isolate and denature DNA from each cell line
- introduce a labeled human DNA probe to identify positions of human DNA homologous to probe
- analyze the data from many probe hybridizations to determine co-retention of DNA markers
- co-retention of different human markers (closely linked markers tend to be retained together) allows high-resolution mapping of the chromosomal loci of the DNA markers
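The co-retention analysis can be sketched as follows. The hybrid panel is invented, and the simple "both / either" score is only one possible measure, chosen here for clarity rather than taken from the textbook.

```python
from itertools import combinations


def co_retention(panel, m1, m2):
    """Among hybrid lines retaining m1 or m2, the fraction retaining both."""
    both = sum(1 for line in panel if m1 in line and m2 in line)
    either = sum(1 for line in panel if m1 in line or m2 in line)
    return both / either if either else 0.0


# Hypothetical panel: each set lists the human markers detected in one hybrid line
panel = [
    {"A", "B"},
    {"A", "B", "C"},
    {"C"},
    {"A", "B"},
    {"B", "C"},
    {"A"},
]

for m1, m2 in combinations("ABC", 2):
    print(m1, m2, round(co_retention(panel, m1, m2), 3))
```

In this made-up panel A and B are co-retained most often and A and C least often, which is consistent with the marker order A - B - C.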
Physical Mapping
- Physical mapping is an intermediate step in sequencing the entire genome (genetic map > physical map > sequence map)
- A complete physical map of the genome includes:
- maps for each chromosome in the haploid chromosome set
- for each chromosome, continuous overlapping cloned genomic DNA segments extending from one telomere of the chromosome to the other
Vector - plasmid or phage chromosome used to carry cloned DNA segment (or insert) Chapter 8, MGA
Main types used: YAC (Yeast Artificial Chromosomes) or Cosmids
BAC (Bacterial Artificial Chromosomes)
PAC (Phage P1-based Artificial Chromosomes)
Contig - set of ordered overlapping clones that constitute a chromosomal region or a genome
Techniques for Identifying Clone Overlaps:
1) Ordering by Clone Fingerprints
- genomic insert (clone) carried by vector has a unique sequence that can be used to generate a DNA fingerprint
- multiple restriction enzyme digestion generates a set of bands, unique in number and position, representing the fingerprint for a particular clone
- patterns of bands from multiple clones are read by computer and aligned to determine the degree of overlap between inserted DNA segments
- the proportion of bands shared between two clones indicates whether there is true overlap (usually 20-25%)
- important technique in developing physical maps for C. elegans, mouse and human genomes
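A minimal sketch of the band-sharing comparison is shown below. The band sizes are invented, exact size matching is a simplification (real gel data need a size tolerance), and the 0.25 default threshold is an assumption loosely based on the 20-25% figure above.

```python
def shared_band_fraction(bands_a, bands_b):
    """Shared bands as a fraction of the smaller fingerprint."""
    a, b = set(bands_a), set(bands_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def likely_overlap(bands_a, bands_b, threshold=0.25):
    """Call an overlap when the shared fraction reaches the (assumed) threshold."""
    return shared_band_fraction(bands_a, bands_b) >= threshold


# Hypothetical fingerprints: restriction fragment sizes in bp
clone1 = [1200, 950, 800, 640, 500, 310, 220, 150]
clone2 = [1800, 1400, 800, 640, 500, 410, 275, 90]
clone3 = [2000, 1750, 1300, 720, 450, 333, 210, 150]

print(shared_band_fraction(clone1, clone2), likely_overlap(clone1, clone2))
print(shared_band_fraction(clone1, clone3), likely_overlap(clone1, clone3))
```

A single shared band, as between clone1 and clone3 here, can easily arise by chance, which is why a minimum shared proportion is required before calling an overlap.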
2) Ordering by STSs
- Sequence-Tagged Sites are short unique sequences that can be amplified using defined PCR primers
- derive from sequenced regions of the genome so can be used as landmarks for clone classification in creating physical map
- clones that share STSs must overlap; the more STSs they share, the more they overlap
- resulting physical map is an STS content map
- combination of fingerprinting and STS content mapping has resulted in complete and near-complete physical maps for many organisms, such as C. elegans
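STS content mapping can be sketched as a small ordering problem. The clone and STS names below are hypothetical, and the greedy walk is a simplification that assumes the clones form a single linear contig with no false positives.

```python
from itertools import combinations


def shared_sts(clones):
    """Map each overlapping clone pair to the number of STSs they share."""
    names = sorted(clones)
    return {
        (a, b): len(clones[a] & clones[b])
        for a, b in combinations(names, 2)
        if clones[a] & clones[b]
    }


def order_contig(clones):
    """Greedy walk from an end clone to the neighbor sharing the most STSs."""
    neighbors = {name: {} for name in clones}
    for (a, b), n in shared_sts(clones).items():
        neighbors[a][b] = n
        neighbors[b][a] = n
    # An end clone overlaps exactly one other clone
    ends = [c for c in sorted(clones) if len(neighbors[c]) == 1]
    current = ends[0] if ends else sorted(clones)[0]
    order = [current]
    while True:
        candidates = {c: n for c, n in neighbors[current].items() if c not in order}
        if not candidates:
            break
        current = max(sorted(candidates), key=candidates.get)
        order.append(current)
    return order


# Hypothetical STS content of four clones (PCR-positive STSs per clone)
clones = {
    "c3": {"sts4", "sts5"},
    "c1": {"sts1", "sts2"},
    "c4": {"sts5", "sts6"},
    "c2": {"sts2", "sts3", "sts4"},
}

print(shared_sts(clones))
print(order_contig(clones))
```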
Simplifying Physical Mapping by Subdividing the Genome:
- many biological and technical challenges in creating physical maps that are true reflections of the genome
- in human genome, there are megabase-size regions that are duplicate copies on two different chromosomes (biological challenge)
- some regions of the genome do not clone efficiently in standard vectors, leaving gaps in the physical map (technical challenge)
- subdividing the genome into smaller working entities can circumvent some biological and technical challenges, as the number of clones required to complete the physical map in a given subregion is much smaller
1) Chromosome Specific Libraries
- separate actual DNA molecules of the genome into those contained within specific chromosomes
- library serves as source of clones for fingerprint or STS content physical mapping
- individual chromosomes are identified
Techniques:
a) Pulsed-Field Gel Electrophoresis (PFGE) Chapter 2, MGA
- modification of standard gel electrophoresis that adjusts conditions to permit separation of large DNA molecules
- can isolate individual chromosomes if they are small (e.g. yeast chromosome)
- for large chromosomes, can isolate chromosome fragments by "cutting" with "rare cutter" enzymes (e.g. NotI, whose 8-bp recognition site is expected about once every 65,000 bp in random sequence, since 4^8 = 65,536)
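Why an 8-bp cutter is "rare" can be checked with a one-line calculation. This is a back-of-the-envelope sketch that assumes all four bases are equally frequent and independent, which real genomes are not; the function name is hypothetical.

```python
def expected_cut_spacing(site_length):
    """Expected bp between sites of a given length in random, unbiased sequence."""
    return 4 ** site_length


print(expected_cut_spacing(6))  # a 6-bp site such as EcoRI's GAATTC
print(expected_cut_spacing(8))  # an 8-bp site such as NotI's GCGGCCGC
```

Going from a 6-bp to an 8-bp recognition site makes the expected fragments 16 times larger, which is the size range PFGE is designed to resolve.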
b) Fluorescence-Activated Chromosome Sorting (FACS) or Flow Sorting
- cells disrupted to liberate whole metaphase chromosomes into liquid suspension
- chromosomes stained with 2 dyes, one binding to AT-rich regions, the other to GC-rich regions
- each chromosome has a unique ratio of AT-rich to GC-rich regions, used to distinguish between different chromosomes
2) Ordering by FISH (Fluorescent In Situ Hybridization)
- technique used to confirm physical map order of cloned DNAs
- many clones hybridize in situ to the same landmark regions (using human chromosome banding techniques); thus, FISH puts clones into one of a number of cytogenetic regions within a given chromosome
- clones then evaluated by fingerprint analysis or STS content mapping to produce physical map
- provides independent way to corroborate the physical maps produced from fingerprint or STS content mapping
- Cytogenetic map of BAC and PAC clones localized by FISH mapping in human genome
DNA Sequencing
- Four bases include A, C, T, and G
- Human genome equals 3 x 10^9 base pairs and includes an X and Y chromosome as well as 22 autosomes
- All current sequencing techniques are clone based
- First make a clone or subclone library and then sequence all or part of inserts of individual clones in the library. From these sequences form a consensus sequence
There are Two Ways to Assemble a Consensus Sequence:
1) Ordered Clone Sequencing
- produce physical map of genome
- ordered subset of minimally overlapping clones selected for sequencing
- consensus sequence for each clone
- assemble in order on physical map
2) Whole Genome Shotgun Sequencing
- obtain sequence reads from randomly selected clones from whole genome library
- no information on where clones map in genome
- overlapping (homologous) sequence between reads allows assembly of sequences into consensus sequence over whole genome
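The idea of assembling reads purely from their sequence overlaps can be sketched with a toy greedy assembler. This is an illustrative simplification, not how production assemblers work: it assumes error-free reads from one strand, uses made-up reads, and the `min_len` cutoff is arbitrary.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: remaining sequences are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled in random order from a 20-bp "genome"
reads = ["TTAGCCTAGGA", "ATGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(reads))
```

Because only overlaps are used, identical repeats elsewhere in a genome would produce ambiguous merges, which is the weakness of this approach in repeat-rich genomes.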
Sequencing Strategies in Bacteria:
- bacterial DNA is single copy and only a few megabase pairs in size
- due to simple system, whole genome shotgun assembly can be applied
- gaps in consensus sequence can be filled in by primer walking
- Primer Walking - use of a primer based on a sequenced area of a genome to sequence into a flanking unsequenced area
- shotgun sequencing does not work as well in eukaryotic systems, since eukaryotic genomes are not composed entirely of single copy DNA and may contain repetitive genome sequences
Repeated Genome Sequences:
- repeated genome sequences are identical sequence strings present many times in the genome
- problematic in eukaryotic systems
- two classes included are tandem repeat arrays and mobile genetic elements
1) Tandem Repeat Arrays
- tandem repeats are sequences in multiple copies adjacent to one another, variable in size and number of repetitions (Recall: VNTR-minisatellites, microsatellites).
a) Tandemly repeated genes Figure 9-22
b) Non-coding tandem repeats - telomeres and heterochromatin
2) Mobile Genetic Elements: Dispersed Repeats (Summary Table)
- dispersed in genome and move to new locations via transposition
a) Transposons
b) Retrotransposons
c) LINE (long interspersed elements)
d) SINE (short interspersed elements)
Tackling Genomes with Repetitive Sequences
1) Assembling a Sequence from Ordered Clones
- straightforward assembly of many of the dispersed repeats since they are present only once in the individual clone
- Minimum Tiling Path is a subset of clones with clear but minimal overlap (i.e. minimum # of clones that represent entire genome)
- relies on physical map to order and orient the clone sequences
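Choosing a minimum tiling path from a physical map is an interval-covering problem. The sketch below uses a standard greedy cover; the clone names and map coordinates are hypothetical, and clones are assumed to be already placed on the map as (name, start, end).

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: always extend with the clone reaching furthest right.

    clones is a list of (name, start, end) tuples in map coordinates.
    """
    path = []
    covered = region_start
    while covered < region_end:
        # Clones that start within the covered part and extend beyond it
        candidates = [c for c in clones if c[1] <= covered < c[2]]
        if not candidates:
            raise ValueError(f"gap in physical map at position {covered}")
        best = max(candidates, key=lambda c: c[2])
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clone positions (kb) on a 300-kb region
clones = [
    ("A", 0, 100),
    ("B", 40, 120),
    ("C", 90, 200),
    ("D", 150, 260),
    ("E", 190, 300),
]

print(minimum_tiling_path(clones, 0, 300))
```

Clones B and D are skipped because their regions are already covered by neighbors with less redundancy, so only three of the five clones would need to be sequenced.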
2) Whole Genome Shotgun Assembly
- connects the single-copy sequences on either side of the repetitive element but ignores the sequence of the repetitive element itself.
- sequenced clones aligned by their homologous sequence overlaps into contigs (but in no particular order)
- paired-end sequences (sequences corresponding to either end of cloned insert) are used to span gaps between contigs and place them in correct genomic order and orientation
- scaffolds - ordered set of contigs in which there are unsequenced gaps connected by paired-end sequence reads
For a visual comparison of these methods see: Figure 9-29
Functional Genomics
- functional genomics includes study of expression and interaction of gene products on a global level, that is, using genomic approaches to study some aspect of all gene products simultaneously
- how molecules cooperate and interact to effect all the processes and phenotypes that make up a biological system
- genome refers to "gene" plus "ome", or the global data set for "all genes"
- various other 'ome's are being worked on: transcriptome, proteome, interactome and phenome
- transcriptome - sequence and expression patterns of all transcripts (where, when, how much)
- proteome - sequence and expression patterns of all proteins (where, when, how much)
- interactome - complete set of physical interactions between: all proteins and all DNA segments; all proteins and all RNA segments; and among all proteins
- phenome - description of complete set of phenotypes produced by inactivation of gene function for each gene in the genome
Studying the Transcriptome and Interactome Using DNA Chips:
DNA chips: chips the size of a microscope cover slip which contain samples of DNA laid out in series
- automation and miniaturization of assay methods
- contain samples of DNA laid out as a series of microscopic spots bound to a glass "chip"
- can contain all genes of complex genome
- can assay all gene products in a single experiment
- method alternative to mutational analysis; rather than amassing mutations to disrupt a particular process, chip technology detects the specific mRNAs expressed in that process
- can also be used to detect protein-DNA interactions
Constructing DNA Chips:
- microscopic droplets of DNA added to slide via a robotic machine (thousands of samples can be applied to one chip)
- DNA dried and treated to bind to glass
1) One protocol for detecting which genes are active at a particular stage of development in a cell:
- array of known cDNAs from different genes is applied to chip
- chip exposed to fluorescently labelled probe, such as RNA extracted from a particular cell at a particular stage of development
- binding of probe molecules to homologous DNA spots monitored automatically by laser beam-illuminated microscope
- detect spots on chip where probe binds to determine which genes are active at the particular stage of interest
2) Another protocol, building oligonucleotides on the chip for detection of active genes:
- array of oligonucleotides is chemically synthesized on chip, one nucleotide at a time
- chip covered with protecting groups that prevent DNA deposition
- mask placed on chip containing holes where sites of deposition are to occur
- shine a laser beam on holes where synthesis will begin; this will knock off protecting groups
- bathe chip in first nucleotide to be added (carrying its own protecting group to avoid adding dimers)
- sequential rounds of laser illumination, appropriate masks and bathing in nucleotides allow for construction of oligonucleotides
- once this is done, these chips are ready to bind to fluorescent probes isolated at some developmental stage of interest
- chip is analyzed similarly to method above
Studying the Interactome with the Yeast Two-Hybrid System:
- investigates interactions between proteins Figure 9-40
- uses the yeast GAL4 transcription activator
- GAL4 has two domains: a DNA binding domain which binds to site of transcriptional activation and an activation domain which is responsible for activating transcription, but cannot do so without the DNA binding domain
- gene for GAL4 is divided between two plasmids
- gene for protein of interest is spliced next to DNA binding domain of GAL4 = bait
- the other protein gene is spliced next to the rest of the GAL4 gene (the activation domain) on other plasmid = target
- both plasmids introduced into cell or cells of organism together and observed for activation via a reporter gene (gene for an easily detected protein); the reporter is activated only if bait and target proteins interact, bringing the two GAL4 domains together
Bioinformatics
- deciphering meaning from the raw 4-letter DNA sequence by using computational analysis to predict mRNA and polypeptide sequences.
Problems with deciphering information content of DNA:
- Do not know all of the specific DNA sequences that encode the thousands of docking sites for DNA- or RNA-binding regulatory proteins.
- A given DNA sequence can encode different things depending on its location within the DNA, i.e. if located in a coding region, the sequence would code for amino acids; if located in a non-coding region, the sequence could act as a binding site for a regulatory protein.
- Two or more different sequences can serve the same function.
Using Bioinformatics to Determine an Organism's Proteome
- proteome (complete set of polypeptides encoded by a genome)
Bioinformatics uses several independent sets of information to do this:
- cDNA sequences (complementary DNAs are DNA copies of mRNAs). cDNAs are aligned with genomic DNA to determine the position of introns and exons.
- Docking site sequences marking the start and end points for the events in information transfer (transcription, pre-mRNA splicing, translation).
- Sequences of related polypeptides. A common statistical tool for aligning proteins is BLAST (Basic Local Alignment Search Tool)
- Codon bias - species-specific usage preferences for some codons over others encoding the same amino acid. Presence of the preferred codons in a predicted mRNA sequence supports the accuracy of the prediction.
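How codon bias can support a prediction is sketched below by scoring a candidate reading frame on its fraction of preferred codons. The preferred-codon set and the sequence are invented for illustration; a real analysis would use a codon usage table measured for the species.

```python
def codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def preferred_fraction(cds, preferred):
    """Fraction of codons in cds that belong to the preferred set."""
    cs = codons(cds)
    if not cs:
        return 0.0
    return sum(1 for c in cs if c in preferred) / len(cs)


# Hypothetical preferred codons for Leu, Ala and Lys in some species
PREFERRED = {"CTG", "GCC", "AAG"}

candidate = "CTGGCCAAGCTGGCC"
print(preferred_fraction(candidate, PREFERRED))      # predicted reading frame
print(preferred_fraction(candidate[1:], PREFERRED))  # same DNA, shifted by one base
```

A predicted frame rich in preferred codons is more plausible than a shifted frame that is not, though this is supporting evidence to be combined with the other sources listed above rather than proof.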
Predictions of mRNA and polypeptide structure from genomic DNA sequence depend on an integration of information from cDNA sequence, docking site predictions, polypeptide similarities, and codon bias. Summary Figure
Links
Modern Genetic Analysis
Institute for Genomic Research - TIGR
Sanger Institute - Genome Projects
Questions?
Contact Corinne or Jennifer