Chicken EST sequencing results
This website presents the sequences, annotation, gene discovery and SNP
detection results from an EST sequencing study of four chicken cDNA
libraries, constructed from brain and testis of White Leghorn and red
junglefowl. This EST project is part of the Wallenberg Consortium North
(WCN) funded project "Functional analysis of multifactorial traits
and vertebrate development in the chicken".
Papers
The EST sequencing, annotation and gene discovery are described in:
Savolainen P., Fitzsimmons C.J., Arvestad L., Andersson L., Lundeberg J.
ESTs from brain and testis of White Leghorn and red junglefowl: annotation, bioinformatic classification of unknown transcripts and analysis of expression levels. Cytogenet Genome Res, 2005, 111(1), 79-87
The SNP discovery is described in:
Fitzsimmons C.J., Savolainen P., Amini B.,
Hjälm G., Lundeberg J., Andersson L.
Detection of sequence polymorphisms in red junglefowl and White Leghorn ESTs. Anim Genet, 2004, 35(5), 391-396
(Supplemental material is available in .doc and PDF format.)
Sequences
The 21,285 ESTs generated in this project are available from
GenBank under accession numbers CN216802-CN238086, and are also
available in FASTA format (13 MB).
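The accession range is compact notation for a run of consecutive GenBank accessions. As a quick sanity check, assuming every number in the range was assigned to this project (which the range notation implies), the short Python sketch below expands CN216802-CN238086 and confirms that it holds exactly 21,285 accessions, one per EST. The helper names are illustrative and are not part of any NCBI tool.

```python
import re

# Accession range quoted on this page; both ends share the "CN" prefix.
FIRST, LAST = "CN216802", "CN238086"


def split_accession(acc: str) -> tuple[str, int, int]:
    """Split an accession like 'CN216802' into (prefix, number, digit width)."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", acc)
    if m is None:
        raise ValueError(f"unexpected accession format: {acc!r}")
    return m.group(1), int(m.group(2)), len(m.group(2))


def expand_range(first: str, last: str) -> list[str]:
    """List every accession from first to last, inclusive."""
    p1, n1, width = split_accession(first)
    p2, n2, _ = split_accession(last)
    if p1 != p2 or n2 < n1:
        raise ValueError("range must share a prefix and be ascending")
    # Zero-pad each number back to the original digit width.
    return [f"{p1}{n:0{width}d}" for n in range(n1, n2 + 1)]


accessions = expand_range(FIRST, LAST)
print(len(accessions))  # 21285, matching the EST count above
print(accessions[0], accessions[-1])  # CN216802 CN238086
```

The count is simply 238086 - 216802 + 1, so the range and the stated number of ESTs agree.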
Clustering and sequence assembly resulted in 12,549 distinct putative
transcripts. For easy access, we have collected them in a public
FASTA file.
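For readers who want to work with the transcript file programmatically, here is a minimal FASTA reader in plain Python. It assumes only the standard FASTA layout (a `>` header line followed by one or more sequence lines) and takes the first whitespace-delimited word of each header as the identifier; the exact header format of this particular file is an assumption, and the example records below are made up.

```python
import io
from typing import Iterator, Optional, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text.

    The identifier is the first whitespace-delimited word after '>';
    sequence lines are concatenated with surrounding whitespace stripped.
    """
    ident: Optional[str] = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            chunks = []
        else:
            if ident is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


# Made-up records for illustration; real identifiers in the file will differ.
example = io.StringIO(
    ">contig_A hypothetical example\nACGTACGT\nGGCC\n"
    ">contig_B\nTTGA\n"
)
records = dict(read_fasta(example))
print(records)  # {'contig_A': 'ACGTACGTGGCC', 'contig_B': 'TTGA'}
```

To load the downloaded file instead, open it and pass the handle, e.g. `with open(path) as fh: transcripts = dict(read_fasta(fh))`, where `path` is wherever you saved it. For larger pipelines, Biopython's `SeqIO.parse(handle, "fasta")` does the same job.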
Annotation
There are three different files summarising the annotation of the
transcripts. They are made available as Excel files below.
The clone names (first column) are hyperlinks: clicking one opens
a page in your default web browser that lists the hits for that
clone. That page in turn links to databanks such as NCBI and
SwissProt with detailed information about each hit.
For better accessibility, you can also browse the latter file as a
webpage.
If you already know the identifier of a transcript whose annotations
you want to view (for example, from the sequence file above), please
use the following form:
Gene discovery
During annotation, a large number of transcripts were found to have
no match in the gene and protein databases, or in any other
database, including dbEST. These transcripts were examined for
coding sequence, whose presence would indicate that they
represent novel genes.
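One simple, widely used signal of coding potential is a long open reading frame (ORF). The sketch below scans all six reading frames of a transcript for the longest complete ATG-to-stop ORF. It illustrates the general idea only: it is not ESTScan, not Estzmate, and not the procedure used in the paper, and the 300-nt cutoff in `has_long_orf` is a hypothetical value chosen for the example. Because ESTs are often truncated, a real analysis would also need to consider reading frames that run off either end of the sequence, which this sketch ignores.

```python
# Standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Complement table for DNA; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Hypothetical cutoff for this example only: 300 nt (100 codons).
MIN_ORF_NT = 300


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> tuple[int, int, str]:
    """Find the longest complete ATG...stop ORF on either strand.

    Returns (length in nt including the stop codon, 0-based start on the
    scanned strand, strand '+' or '-'), or (0, -1, '') if none exists.
    """
    seq = seq.upper()
    best = (0, -1, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG in the current open frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length > best[0]:
                        best = (length, start, strand)
                    start = None  # look for the next ATG after this stop
    return best


def has_long_orf(seq: str, min_nt: int = MIN_ORF_NT) -> bool:
    """True if the longest complete ORF reaches the (hypothetical) cutoff."""
    return longest_orf(seq)[0] >= min_nt


print(longest_orf("CCATGAAATTTGGGTAACC"))  # (15, 2, '+')
print(has_long_orf("CCATGAAATTTGGGTAACC"))  # False: 15 nt < 300 nt
```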
The 5,443 transcripts that we classified as containing coding sequence (Table 5 in our paper) are available here.
The 166 transcripts that had no significant similarity to known genes or proteins, but are likely to be coding, are also available as a FASTA file.
| Data set | Size | ESTScan: coding transcripts | ESTScan: adjusted estimate | Estzmate: transcripts with long ORF | Estzmate: adjusted estimate | ESTScan+Estzmate: coding transcripts | ESTScan+Estzmate: adjusted estimate |
|---|---|---|---|---|---|---|---|
| (i) | 12,549 | 8,204 | | 6,061 | | 5,443 | |
| (ii) | 4,129 | 1,263 | 705 | 756 | 363 | 389 | 400 |
| (iii) | 1,649 | 548 | 352 | 291 | 123 | 166 | 180 |
| (iv) | 8,420 | 6,941 | | 5,305 | | 5,054 | |
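This page does not define data sets (i) to (iv), but the unadjusted counts in the table fit together in a way that is easy to check: data sets (ii) and (iv) sum to data set (i) in size and in every unadjusted column. This suggests that (ii) and (iv) split the 12,549 transcripts into two disjoint groups. That reading is an inference from the numbers alone; Table 5 of the paper is the authoritative description. The arithmetic, in Python:

```python
# Unadjusted counts copied from the table:
# (size, ESTScan coding, Estzmate long ORF, ESTScan+Estzmate coding).
counts = {
    "i": (12_549, 8_204, 6_061, 5_443),
    "ii": (4_129, 1_263, 756, 389),
    "iii": (1_649, 548, 291, 166),
    "iv": (8_420, 6_941, 5_305, 5_054),
}

# Column-wise sum of data sets (ii) and (iv).
combined = tuple(a + b for a, b in zip(counts["ii"], counts["iv"]))
print(combined)  # (12549, 8204, 6061, 5443)
print(combined == counts["i"])  # True
```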
Microarray annotation
Using the above clones, we have manufactured a microarray
containing close to 14,000 clones, including transcripts from
other EST projects. Annotation and chromosome localization of
the spotted clones are given here.
Lars Arvestad
Last modified: Wed Nov 3 13:17:44 CET 2004