Focus Moves from Transcriptional Mapping to Gene-Function Studies
By: Aaron Hall
The Sixth International Workshop on the Identification of
Transcribed Sequences was held October 2-5, 1996, in Edinburgh,
Scotland. The meeting featured 46 speakers and 20 posters on
topics including the generation of regional and chromosomal
transcriptional maps, functional analysis of gene expression,
techniques for isolating and analyzing genes, use of model
organisms, and informatics. The workshop was supported by the
Cancer Research Campaign, European Commission, HUGO Europe, Lothian
and Edinburgh Enterprise, Wellcome Trust, and DOE. Selected
presentations are summarized below.
Bioinformatics, Computational Biology
A number of speakers addressed bioinformatics and computational
biology needs for transcriptional analysis. Characterization of
potential regulatory elements in genomic DNA remains a difficult
task.
Laurent Duret (Geneva University Hospital) described the use of
large-scale comparative analysis of metazoan noncoding sequences to
identify such elements. His study has found hundreds of long,
highly conserved regions (HCRs) in noncoding parts of genes. HCRs
retain at least 70% identity in sequences of 50 to 200 bases in DNA
of species that diverged 300 million to 550 million years ago. Some
of these sequences may play roles in gene regulation and mRNA
localization. A database with more than 300 such sequences is
available.
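As a rough illustration of the HCR criterion quoted above (at least 70% identity over 50 to 200 bases), the following Python sketch slides a fixed window along two already-aligned sequences and merges the windows that pass the threshold. The function names, the "-" gap character, and the sliding-window strategy are assumptions made for illustration; the report does not describe how Duret's analysis was actually implemented.

```python
def percent_identity(a, b):
    """Percent of aligned columns in which both sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    # A gap ("-") aligned to a gap is not counted as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(a, b, window=50, min_identity=70.0):
    """Return merged (start, end) spans whose windows meet the identity threshold.

    `window` and `min_identity` default to the lower bounds quoted in the text
    (50 bases, 70% identity); both are adjustable assumptions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    spans = []
    for start in range(0, len(a) - window + 1):
        end = start + window
        if percent_identity(a[start:end], b[start:end]) >= min_identity:
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```

A call such as conserved_windows(seq_a, seq_b) returns half-open coordinate pairs in alignment space; spans longer than 200 columns would need to be inspected separately if the upper bound in the text is to be enforced.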
James W. Fickett (SmithKline Beecham) reported progress in
recognizing transcriptional regulatory regions from their context
within a DNA sequence. In particular, he has devised a system that
can discriminate among myotubulin-specific regulatory regions,
other regulatory regions, and nonregulatory regions. This is an
important step toward being able to infer possible functions of a
newly discovered gene from its DNA sequence.
Thomas Werner (GMBH Institut für Säugetiergenetik) presented an
approach to identifying transcriptional control regions. Using two
types of retroviral control regions (LTRs) as models, he showed
that this robust technique found all known LTRs in the Primate
division of GenBank (95 Mb) and identified five previously unknown
LTRs. The false-positive rate was reported to be quite low.
Richard Mural (Oak Ridge National Laboratory) commented on the
challenge of automated annotation of DNA sequences. As the analysis
of genomes moves into large-scale sequencing, identification and
annotation of biologically relevant features in the sequence become
increasingly complex and important. Annotation must be updated
continually, particularly in light of the rapid rate of new data
acquisition. Ideas were discussed for new systems to provide a
user-defined view of a DNA sequence as well as data-mining tools
for complex querying of multiple data resources.
Comparative Genomics
Among mammals, the mouse is clearly the model organism of choice
for "surrogate" human genetics, and information resources for mouse
genetics and developmental biology are critical. Martin Ringwald
(Jackson Laboratory) reported work on the Gene Expression Database.
This database not only contains information on the expression of
various mouse genes but also is being linked to a mouse-anatomy
database that will allow the user to follow gene expression through
the course of development.
The accumulating data from a number of other model organisms are
providing new insights into genome structure and function. With
nearly half of its 100-Mb genome sequenced, the nematode
Caenorhabditis elegans is becoming increasingly important for gene
discovery. Steven Jones (Sanger Centre) presented some results of
gene prediction in the C. elegans genomic sequencing project. The
project has identified about 9700 proteins, 46% of which clearly
are related to proteins already in public sequence databases.
Because of its small genome size and the compact nature of its
genes, the puffer fish Fugu rubripes is another important model
organism. Greg Elgar (HGMP Resource Centre) described the Fugu
Landmark Mapping Project, which aims to sample-sequence 1000 Fugu
cosmids to provide resources for a number of different
applications, including gene identification. Some physical linkage
data also are expected to come out of this project because of the
likelihood of finding more than one Fugu gene per cosmid clone.
Nearly 200 Fugu cosmids have been scanned.
Learning the patterns of gene expression is a necessary first step
to understanding gene function and interaction. C. elegans is
particularly well suited to studying gene-expression patterns
because the animal develops rapidly and the fates of all its cells
have been mapped. Donna Albertson (Lawrence Berkeley National
Laboratory) reviewed a preliminary study in which the expression
pattern of nearly 200 C. elegans genes was examined using FISH on
whole animals. Petra Ross-Macdonald (Yale University) described an
approach for yeast that determines when a gene is expressed during
the yeast life cycle, subcellular localization of the gene product,
and the phenotypic effect of disrupting the gene. This technique is
helping to determine functions of large numbers of yeast genes that
have been identified by sequencing but have no relatives with known
function in current databases.
One hope of comparative genomics is to use information from
well-characterized model systems to provide candidates for genes
implicated in human diseases. One such application was presented by
Giuseppe Borsani (Telethon Institute of Genetics and Medicine), who
found 66 human ESTs with significant homology to known Drosophila
genes. All these genes, which are well characterized in Drosophila,
are candidates for genes involved in human pathology. For example,
an EST that was homologous to a gene causing retinal degeneration
in the fruit fly was mapped to a human genome region near genes for
three different types of human retinopathology.
Gene Identification and Mapping
A number of speakers presented data that begin to elucidate genome
organization and function. Stephen Scherer (University of Toronto)
described progress in gene identification on human chromosome 7q.
Around 2500 genes are expected to be found on the long arm of
chromosome 7. Three strategies for isolating and mapping these
genes were discussed: (1) initial assignment of all known
chromosome 7 genes and ESTs from the public domain to the map, (2)
genomic DNA sequencing of selected chromosomal regions to identify
genes, and (3) direct cDNA selection on chromosome-specific
cosmids. The current 7q map contains over 1600 DNA markers,
including 170 known genes, 200 ESTs, and more than 500 selected
cDNA fragments.
Mammalian genomes are a mosaic of regions (isochores) of varying
base composition. Katheleen Gardiner (Eleanor Roosevelt Institute)
showed data on the isochore structure of human chromosome 21 and
the nature of the boundaries between different isochores. Sequences
at a number of these boundaries are homologous (>80% identity)
to the pseudo-autosomal boundary of the sex chromosomes' short arms
(as described for chromosome 6 isochore boundaries, Fukagawa et
al.). One interesting feature of these sequences is that some
appear to be transcribed.
CpG islands are short (1-kb) regions of genomic DNA with a high GC
content and reduced methylation of C residues. About 60% of genes
have these islands at their 5' ends, making CpG islands useful
markers for transcriptional units. Sally Cross (Edinburgh
University) discussed the construction of whole-genome CpG island
libraries from human, mouse, and chicken. These libraries should be
a valuable resource for isolating the 5' ends of a large number of
genes, regardless of their level of expression.
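The sequence-based part of the CpG island description above (high GC content) can be sketched in a few lines of Python. The thresholds used here (length of at least 200 bases, GC fraction above 0.5, observed/expected CpG ratio above 0.6) are commonly used screening values and are an assumption, not figures from the workshop; the function names are also illustrative. Methylation status cannot be inferred from sequence alone, so this covers only half of the definition.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from base composition."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")          # "CG" cannot overlap itself
    expected = c * g / len(seq)
    return observed / expected


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC, and CpG-ratio thresholds to one sequence."""
    return (
        len(seq) >= min_len
        and gc_fraction(seq) > min_gc
        and cpg_obs_exp(seq) > min_ratio
    )
```

In practice such a test would be applied to windows along a genomic sequence rather than to a whole clone, and the thresholds tuned to the organism.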
Complexities of deducing mRNA structure from genomic sequences were
described by Sherman Weissman (Yale University). Comparing
full-length cDNAs to genomic sequences reveals a number of
limitations in current methods using genomic sequence to predict
the structure of mRNAs and proteins. One problem involves large
introns that contain other transcribed sequences. Weissman also
described a gene, B144, which has a 700-base mRNA that exists in at
least 30 alternatively spliced forms.
Quantitative PCR is becoming an important technique for studying
gene expression. Michael McClelland (Sidney Kimmel Cancer Center)
addressed a broad range of issues connected to the effective use of
quantitative PCR, particularly as it applies to differential
display. These issues include relative quantitation by
low-stringency PCR, the Cot effect, and problems of target vs
standard titration. The Cot effect is particularly interesting
because it demonstrates that low-abundance and high-abundance
products accumulate at different rates. Very abundant products are
formed more slowly than expected because product reannealing
competes with priming. The need to control these various parameters
was stressed in this presentation.
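The Cot effect described above can be illustrated with a toy model in Python. The assumption here, which is not taken from the presentation, is that per-cycle amplification efficiency falls as 1 / (1 + product / k), where k is a hypothetical constant standing in for the product level at which reannealing competes equally with priming. The sketch shows only the qualitative behavior: an abundant template slows down earlier, so the final ratio between two products understates their starting ratio.

```python
def simulate_pcr(initial, cycles, k=1e9):
    """Toy PCR model: efficiency drops as product reannealing outcompetes priming.

    `initial` is the starting copy number and `k` is a hypothetical
    half-efficiency constant; neither value comes from the source.
    """
    product = float(initial)
    for _ in range(cycles):
        efficiency = 1.0 / (1.0 + product / k)   # close to 1 while product << k
        product += product * efficiency           # new strands made this cycle
    return product


if __name__ == "__main__":
    low = simulate_pcr(1e2, 30)
    high = simulate_pcr(1e5, 30)
    # Starting ratio is 1000; the printed ratio is much smaller after 30 cycles.
    print(f"final ratio high/low: {high / low:.1f}")
```

Stopping the reaction while both products are still far below k keeps the ratio close to its starting value, which is one reason cycle number has to be controlled in quantitative work.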
J.G. Sutcliffe (Scripps Research Institute) reported a form of
differential display called TOGA (Total Gene Expression Analysis).
TOGA uniquely identifies nearly every mRNA from an organism,
including mRNAs not previously described, and does not require that
the mRNA has been characterized previously. This automated
PCR-based technique can detect messages of <0.001% prevalence,
thus providing a powerful means for comparing mRNA expression
profiles.
Wai-Choi Leung (Tulane University School of Medicine) described
architectural elements of mRNA molecules. Energy maps can be
constructed that describe the location, size, and energy density of
closed regions of mRNA molecules. Closed regions reflect the
secondary structure of mRNA that may be related to a number of
processes, including RNA translocation, nuclear export,
transcription termination, and translational control.
cDNA Libraries
M. Bento Soares (Columbia University) discussed strategies for
constructing cDNA libraries for both gene discovery and
characterization. To clone genes represented by low-abundance
transcripts, subtractive hybridization strategies are being
developed to eliminate pools of sequenced cDNAs. In addition,
techniques are being optimized to produce libraries enriched for
full-length cDNAs. These libraries will be very useful for
increasing gene representation and therefore the utility of
dbEST.
Bernhard Korn (German Cancer Research Center) reported progress in
constructing and gridding a full-length cDNA library from human
fetal brain. His institution's current library has 120,000 clones
with an average insert size of 1.8 kb. Some problems inherent in
making such libraries were discussed.
Y Chromosome
The Y chromosome presents a number of unique problems to both
genetic mapping and gene identification. Yun-Fai Chris Lau
(University of California, San Francisco) presented two approaches
to identifying Y chromosome-specific genes by en masse terminal
exon trapping. Analysis of these methods showed that such an
approach is feasible and that >50% of exon clones were derived
either from known Y genes or potential functional sequences.
Article Source: http://www.articledashboard.com