Chicken EST sequencing results

This website presents sequences, annotation, gene discovery and SNP detection from an EST sequencing study of four chicken cDNA libraries, constructed from brain and testis of White Leghorn and red junglefowl. This EST project is part of a Wallenberg Consortium North (WCN) funded project: "Functional analysis of multifactorial traits and vertebrate development in the chicken".

Papers

The EST sequencing, annotation and gene discovery are described in:
Savolainen P., Fitzsimmons C.J., Arvestad L., Andersson L., Lundeberg J. ESTs from brain and testis of White Leghorn and red junglefowl: annotation, bioinformatic classification of unknown transcripts and analysis of expression levels. Cytogenet Genome Res, 2005, 111(1), 79-87
The SNP discovery is described in:
Fitzsimmons C.J., Savolainen P., Amini B., Hjälm G., Lundeberg J., Andersson L. Detection of sequence polymorphisms in red junglefowl and White Leghorn ESTs. Anim Genet, 2004, 35(5), 391-396 (Supplementary material is available in .doc and PDF format.)

Sequences

The 21,285 ESTs generated in this project are available at GenBank, under accession numbers CN216802-CN238086, and are also available in FASTA format (13 MB). Clustering and sequence assembly resulted in 12,549 distinct putative transcripts, which we have collected for easy access in a public FASTA file.
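For readers who want to work with the downloaded sequence files programmatically, the following is a minimal Python sketch of reading FASTA records into memory and of checking that an inclusive accession range has the expected size. It is an illustration only and not part of the project's own pipeline; the function names and the file name mentioned in the comments are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} dictionary.

    The identifier is taken to be the first whitespace-delimited token
    after '>' on each header line (an assumption about the file layout).
    """
    parts: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            parts[current] = []
        elif current is not None:
            parts[current].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in parts.items()}


def accession_range_size(first: str, last: str, prefix: str = "CN") -> int:
    """Number of accessions in an inclusive range such as CN216802..CN238086."""
    return int(last[len(prefix):]) - int(first[len(prefix):]) + 1


if __name__ == "__main__":
    # With a real download one would pass an open file instead, e.g.
    # open("chicken_ests.fasta"), where the file name is hypothetical.
    demo = [">est1 example header", "ACGT", "TTGA", ">est2", "GGCC"]
    records = read_fasta(demo)
    print(len(records), "records;", "est1 has", len(records["est1"]), "bases")
    print(accession_range_size("CN216802", "CN238086"), "accessions in range")
```

The range check gives 21,285, which agrees with the number of ESTs stated above.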

Annotation

There are three different files summarising the annotation of the transcripts. They are made available as Excel files below. The clone names in the first column are hyperlinks: clicking one opens your default web browser on a page listing the hits for that clone. That page has further links into databases such as NCBI and SwissProt, with detailed information about the hits. For better accessibility, you can also browse the latter file as a webpage.

If you already know the identifier of a transcript whose annotations you want to view, for example from the sequence file above, please use the following form:

Enter id of EST or contig:  

Gene discovery

In the annotation steps, a large number of transcripts were found to have no match in the gene/protein databases, or even in any database at all, including dbEST. These transcripts were studied for the presence of coding sequence, which would indicate that they represent novel genes. The 5,443 transcripts that we identified as containing genes (Table 5 in our paper) are available here.

The 166 transcripts that had no significant similarity to known genes or proteins, but are likely to be coding genes, are also available as a FASTA file.

                    ESTScan                  Estzmate                    ESTScan+Estzmate
Data set  Size      Coding       Adjusted    Transcripts     Adjusted    Coding       Adjusted
                    transcripts  estimate    with long ORF   estimate    transcripts  estimate
(i)       12,549    8,204        -           6,061           -           5,443        -
(ii)      4,129     1,263        705         756             363         389          400
(iii)     1,649     548          352         291             123         166          180
(iv)      8,420     6,941        -           5,305           -           5,054        -
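To give a rough idea of what a "long ORF" test on a transcript can look like, here is a minimal Python sketch that scans all six reading frames for the longest stretch running from an ATG start codon to an in-frame stop codon. This is a simplified illustration, not the method used by ESTScan or Estzmate: the 300-nucleotide threshold and all function names are assumptions made for this example, and open reading frames that run off the end of a partial EST without a stop codon are ignored.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_forward(seq: str) -> int:
    """Longest ATG..stop ORF (in nucleotides, stop included) in the 3 forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in _STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def longest_orf(seq: str) -> int:
    """Longest complete ORF over all six reading frames."""
    return max(longest_orf_forward(seq),
               longest_orf_forward(reverse_complement(seq)))


def has_long_orf(seq: str, min_nt: int = 300) -> bool:
    """True if the sequence has a complete ORF of at least min_nt nucleotides.

    The default threshold is an arbitrary value chosen for illustration.
    """
    return longest_orf(seq) >= min_nt


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGCC"  # contains ATG AAA TTT TAG in frame 2
    print(longest_orf(toy), has_long_orf(toy), has_long_orf(toy, min_nt=12))
```

A test of this kind only flags candidates; distinguishing real coding sequence from chance ORFs needs a statistical model of coding potential, such as the ones implemented by the tools named in the table.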

Microarray annotation

Using the above clones, we have manufactured a microarray containing close to 14,000 clones. This includes transcripts from other EST projects as well. Annotation and chromosome localization of the spotted clones are given here.
Lars Arvestad
Last modified: Wed Nov 3 13:17:44 CET 2004