crassostreome

things just got real
Zhang, G; Fang, X; Guo, X; Li, L; Luo, R; Xu, F; Yang, P; Zhang, L; Wang, X; Qi, H; Zhu, Y; Yang, L; Huang, Z (2012) Genomic data from the Pacific oyster (//Crassostrea gigas//). GigaScience. http://dx.doi.org/10.5524/100030
 * =[[image:genefish/Pacific oyster » GigaDB-1.png width="82" height="112" align="left" link="@http://gigadb.org/pacific_oyster/"]]TJGR=

**Other Resources:** [|OysterDB] (Blast, Downloads) | Related manuscript: [|doi:10.1038/nature11413]


//Description//: Genomic scaffolds longer than 1 Mbp
 * **oyster.v9_M**
 * Sequence || [|FASTA] || 60 Sequences; 75 MB ||
 * BSMAP || [|SAM] || BS MBD ||
 * RNAseq || [|SAM] || PROPS: DH ||
 * RNAseq || [|SAM] || PROPS: BBC ||
 * RefMap || [|SAM] || MBD Library ||
 * Annotation || [|BED] || BS MBD CG features (no scores) ||
 * Annotation || [|BED] || BS MBD CG features (with scores) ||
 * Annotation || [|BED] || BS MBD CG features (0-20%) ||
 * Annotation || [|BED] || BS MBD CG features (20-70%) ||
 * Annotation || [|BED] || BS MBD CG features (70-100%) ||
 * Annotation || [|GFF] || Tandem Repeats [Phobos] (imperfect search) ||
 * Annotation || [|BED] || Tandem Repeats (interval and ID only) ||
 * Annotation || [|BED] || Complement to CDS features ||
 * Annotation || [|GFF] || CG motifs ||
 * Annotation || [|GFF] || CG motifs not overlapping BS MBD CG features (~250k) ||
 * Annotation || [|GFF] || CG motifs with methylation status (~500k) ||
 * Annotation || [|BED] || CG motifs (~500k) (interval and ID only) ||
 * Annotation || [|BED] || 50bp region (up and down) flanking CG motifs (~500k) ||
 * Sequence || [|FASTA] || 50bp region (up and down) flanking CG motifs (~500k) ||
 * Annotation || [|GFF] || CG motifs (of 500k) that overlap with Tandem Repeats ||
 * Annotation || [|Blast Table] || Transposable Elements: RepBase inv Blast ||
 * Annotation || [|BED] || Transposable Elements: evalue 1E-10 ||
 * Annotation || [|BED] || Transposable Elements: evalue 1E-10; single directionality ||
 * Data || [|Tab Text] || Interval Join: CG motifs (~500k) and TEs (RepBase) ||
 * Data || [|Tab Text] || Interval Join: CG motifs (~500k) and Repeats (Phobos) ||
 * Data || [|Tab Text] || Interval Join: CG motifs (~500k) and exons (CDS) ||
 * Data || [|Tab Text] || Interval Join: CG motifs (~500k) and closest non-overlapping exons ||
 * Data || [|Tab Text] || Interval Join: CG motifs (~500k) not overlapping mRNA (extragenic) ||
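The 50bp flanking-region files above extend each CG motif on both sides. A minimal sketch of deriving such windows, assuming 0-based half-open BED coordinates, a single window running from 50 bp upstream to 50 bp downstream of the motif, and clipping at scaffold ends; the scaffold length in the example is hypothetical. `bedtools slop -b 50` with a genome file performs comparable interval arithmetic.

```python
from typing import Dict, List, Tuple

# (scaffold, start, end, name) in 0-based, half-open BED coordinates
Interval = Tuple[str, int, int, str]


def flank(intervals: List[Interval], sizes: Dict[str, int], width: int = 50) -> List[Interval]:
    """Extend each interval by `width` bp on both sides, clipped to the scaffold ends."""
    out: List[Interval] = []
    for chrom, start, end, name in intervals:
        new_start = max(0, start - width)          # cannot run past the scaffold start
        new_end = min(sizes[chrom], end + width)   # cannot run past the scaffold end
        out.append((chrom, new_start, new_end, name))
    return out


if __name__ == "__main__":
    sizes = {"scaffold1": 120}  # hypothetical scaffold length
    motifs = [("scaffold1", 10, 12, "CG_1"), ("scaffold1", 60, 62, "CG_2")]
    for rec in flank(motifs, sizes):
        print(*rec, sep="\t")
```

Extracting the flanking sequences themselves (the FASTA file) would then be a matter of slicing each scaffold string with these coordinates.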

//Description//: Longest genomic scaffolds (1670; 14%) that together cover over 90% of the genome.
 * **oyster.v9_90**
 * Sequence || [|FASTA] || 1670 Sequences; 488 MB ||
 * Annotation || [|Tab Text] || Tiling Array Design v1 ||
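One way to pick the longest scaffolds that together cover 90% of an assembly is to sort by length and accumulate until the running total reaches the target. A minimal sketch, assuming "genome" means the total length of the input FASTA (e.g. the 11969-sequence oyster.v9 set); the exact procedure behind the 1670-sequence file is not documented on this page.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def longest_covering(lengths: Dict[str, int], fraction: float = 0.9) -> List[str]:
    """Return IDs, longest first, until their summed length reaches `fraction` of the total."""
    target = fraction * sum(lengths.values())
    chosen: List[str] = []
    running = 0
    for name, size in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        if running >= target:
            break
        chosen.append(name)
        running += size
    return chosen


if __name__ == "__main__":
    demo = [">s1", "ACGTACGTAC", ">s2", "ACGTA", ">s3", "AC"]  # toy records
    print(longest_covering(fasta_lengths(demo)))
```

For a real assembly, pass an open file handle to `fasta_lengths` instead of the toy list.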

//Description//: Proteins
 * **oyster.v9_ proteome**
 * Sequence || [|FASTA] || [.gz] 28027 protein sequences (via gigadb.org) ||
 * Sequence || [|FASTA] || 28027 protein sequences (local) ||
 * Annotation || [|Blast Table] || Blastp Swiss-Prot evalue: 10 ||
 * Annotation || [|Blast Table] || Blastp Swiss-Prot evalue: 1E-05 ||
 * Annotation || [|Blast Table] || Blastp Estrogen Biosynthetic Process; 1E-20, 4 max_targets, ([|details]) ||
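The two Swiss-Prot tables differ in e-value cutoff. Assuming both searches used otherwise identical settings and the standard 12-column tabular output (`-outfmt 6`, e-value in column 11), the stricter table can be approximated by filtering the looser one, as in this sketch. Hit limits such as `max_target_seqs` can make the two runs differ slightly, so treat the result as an approximation.

```python
import csv
from typing import Iterable, List

EVALUE_COL = 10  # 0-based index of the e-value in standard -outfmt 6 output


def filter_by_evalue(lines: Iterable[str], max_evalue: float = 1e-5) -> List[List[str]]:
    """Keep tabular BLAST hits whose e-value is at or below `max_evalue`."""
    kept: List[List[str]] = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("#"):  # skip blank and comment lines
            continue
        if float(fields[EVALUE_COL]) <= max_evalue:
            kept.append(fields)
    return kept
```

Usage: `filter_by_evalue(open(path))` returns the surviving rows as lists of column strings.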

//Description//: Coding sequence
 * **oyster.v9_ genes**
 * Sequence || [|FASTA] || [.gz] 28027 gene (CDS only) sequences (via gigadb.org) ||
 * Sequence || [|FASTA] || 28027 gene (CDS only) sequences (local) ||
 * Annotation || [|Blast table] || Blastx Swiss-Prot evalue: 1E-05 ||
 * Data || [|Tab Text] || Number of exons per gene ||
 * Annotation || [|Tab Text] || Corresponding SPID and evalues ||
 * Annotation || [|Tab Text] || Corresponding SPID, evalues, and descriptions ||
 * Annotation || [|Tab Text] || Corresponding SPID, evalues, and GO# (using [|recent GO file]) ||
 * Annotation || [|Tab Text] || Corresponding SPID, evalues, and GO and GOslim ||
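An exon-count table like the one above can be approximated from a GFF3 gene model file by counting CDS segments per parent transcript. A sketch, assuming GFF3 `key=value` attributes with a `Parent` tag on each CDS line (the attribute names in the oyster.v9 GFF are an assumption); counting CDS segments misses exons that are entirely UTR.

```python
from collections import Counter
from typing import Iterable


def exons_per_gene(gff_lines: Iterable[str], feature: str = "CDS") -> Counter:
    """Count `feature` lines per Parent ID in a GFF3 file (CDS segments as an exon proxy)."""
    counts: Counter = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        # Column 9 holds key=value pairs separated by semicolons (GFF3 convention).
        attrs = dict(kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        counts[attrs.get("Parent", "unknown")] += 1
    return counts
```

Older GFF2/GTF files use `key "value"` attributes and would need a different parser.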

//Description//: Zhang, G; Fang, X; Guo, X; Li, L; Luo, R; Xu, F; Yang, P; Zhang, L; Wang, X; Qi, H; Zhu, Y; Yang, L; Huang, Z (2012) Genomic data from the Pacific oyster (//Crassostrea gigas//). GigaScience. http://dx.doi.org/10.5524/100030
 * **oyster.v9**
 * Sequence || [|FASTA] || [.gz] 11969 Sequences (via gigadb.org) ||
 * Sequence || [|FASTA] || [.gz] 11969 Sequences (local) ||
 * Sequence || [|FASTA] || 11969 Sequences (local) 560MB ||
 * Annotation || [|GFF] || [.gz] gene features (via gigadb.org) ||
 * Annotation || [|GFF] || gene features (CDS and mRNA) ||
 * Annotation || [|GFF] || gene features (CDS only) ||
 * Annotation || [|BED] || gene features (CDS only) (interval and ID) ||
 * Annotation || [|GFF] || gene features (mRNA only) ||
 * Annotation || [|BED] || gene features (mRNA only) ||
 * Annotation || [|GFF] || promoter region (1000bp 5' of mRNA) ||
 * Annotation || [|GFF] || mRNA GOslim = cell adhesion or signal transduction or cell-cell signaling ||
 * Annotation || [|GFF] || mRNA GOslim = DNA metabolism or RNA metabolism or protein metabolism ||
 * Annotation || [|GFF] || DESeq (p<0.05) gill vs. male gonad RNA-seq from Zhang et al. ||
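The promoter file takes the 1000 bp immediately 5' of each mRNA. A strand-aware sketch, assuming 1-based inclusive GFF coordinates and that minus-strand promoters lie past the mRNA end coordinate; windows are clipped at scaffold boundaries, and `None` is returned when no upstream sequence exists. Whether the original file was built strand-aware is not stated here.

```python
from typing import Optional, Tuple


def promoter(start: int, end: int, strand: str, length: int = 1000,
             scaffold_len: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the `length` bp immediately 5' of a feature.

    Coordinates are 1-based and inclusive, as in GFF. Returns None when the
    feature sits at the scaffold edge and no upstream sequence exists.
    """
    if strand == "-":
        # On the minus strand, 5' is past the feature's end coordinate.
        p_start, p_end = end + 1, end + length
        if scaffold_len is not None:
            p_end = min(p_end, scaffold_len)
    else:
        # Plus (or unstranded) features: 5' is before the start coordinate.
        p_start, p_end = max(1, start - length), start - 1
    return (p_start, p_end) if p_end >= p_start else None
```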

towards getting a gigas assembly (alpha)
 * =[[image:https://img.skitch.com/20120227-buriq24ytfwyrjsh4ut9qw4nhb.png align="left"]]TGAGA (alpha)=

//Description//: Independent assembly of 19 fosmids. Sequences were analyzed using CD-HIT-EST to obtain a non-redundant dataset. Combines assemblies v0.2.0 and v0.3.1.
 * **cgigas_alpha_v0.4.0**
 * Sequence || [|FASTA] ||  || 230,270 Sequences ||

//Description//: Independent assembly of 10 fosmids. Sequences were analyzed using CD-HIT-EST to obtain a non-redundant dataset, and sequences longer than 20 kbp were retained. The 60 //C. gigas// BAC clones from NCBI were reduced to 53 clusters. Also includes 12 selected sequences with known genomic structure.
 * **cgigas_alpha_v0.3.2**
 * Sequence || [|FASTA] || 272 Sequences ||
 * Annotation || [|GFF v1 (gff)] || blastn Sigenae8 ||
 * Annotation || [|BED v1 (bed)] || blastn Sigenae8 ||
 * Visual || [|Blast Directionality (htm)] || based on BED v1 ||
 * Annotation || [|GFF v2 (gff)] || blastn Sigenae8 (includes gene description) ||
 * Annotation || [|GFF v3 (gff)] || blastn Sigenae8 GOslim = cell adhesion or signal transduction or cell-cell signaling ||
 * Annotation || [|GFF v3 (bed)] || blastn Sigenae8 GOslim = cell adhesion or signal transduction or cell-cell signaling ||
 * Annotation || [|GFF v4 (gff)] || blastn Sigenae8 GOslim = DNA metabolism or RNA metabolism or protein metabolism ||
 * Annotation || [|GFF v4 (bed)] || blastn Sigenae8 GOslim = DNA metabolism or RNA metabolism or protein metabolism ||
 * Annotation || [[file:genefish/MicroSatellite_GFF_Final1.txt|MicroSatellite (GFF3)]] || MicroSatellite Regions (HP) ||
 * Annotation || [[file:familyGFF.txt|Repeat_Regions(GFF3)]] || Regions that are repeated 10 or more times (HP) ||
 * Annotation || [[file:nestsRemoved3.txt|Repeat_Regions v1 (GFF3)]] || Regions that are repeated 10 or more times, derivative regions omitted (HP) ||
 * Annotation || [[file:finalNest.txt|Repeat_Regions v2 (GFF3)]] || Fixed orientation and score issues (HP) ||
 * Annotation || [[file:finalNest.names.txt|Repeat_Regions v3 (GFF3)]] || Identified some retrotransposon regions from Repbase (HP) ||
 * Annotation || [[file:repeat_regions_beta.bed|Repeat_Regions(BED)]] || Regions that are repeated 10 or more times (HP) ||
 * Annotation || [[file:cleanNest.bed|Repeat_Regions v1 (BED)]] || Regions that are repeated 10 or more times, derivative regions omitted (HP) ||
 * Annotation || [|GFF v5 (gff)] || Tandem repeats (Geneious:Phobos) ||
 * Annotation || [|GFF v6 (gff)] || Transcription Factors (Geneious) ||
 * Alignment || [|RefMap MBD_meth (bed)] || gill tissue ||
 * Alignment || [|RefMap MBD_unmeth (bed)] || gill tissue ||
 * Alignment || [|RNAseq BB3 (bed)] || gill tissue ||
 * Alignment || [|RNAseq DH3 (bed)] || gill tissue ||
 * Alignment || [|RefMap larvae_SRP002286 (bed)] || larvae ||
 * Alignment || [|RefMap larvae_SRP004696 (bed)] || Two 454 SRA files - larvae ||
 * Motif || [|CG motifs (gff)] ||  ||
 * BSMAP || [|BSMAP_MBD_all (gff)] || Bisulfite-seq, 10x coverage; all ||
 * BSMAP || [|BSMAP_MBD_0 (gff)] || Bisulfite-seq, 10x coverage; 0% methylated ||
 * BSMAP || [|BSMAP_MBD_1_9 (gff)] || Bisulfite-seq, 10x coverage; 1-9% methylated ||
 * BSMAP || [|BSMAP_MBD_10_49 (gff)] || Bisulfite-seq, 10x coverage; 10-49% methylated ||
 * BSMAP || [|BSMAP_MBD_50_89 (gff)] || Bisulfite-seq, 10x coverage; 50-89% methylated ||
 * BSMAP || [|BSMAP_MBD_90_100 (gff)] || Bisulfite-seq, 10x coverage; 90-100% methylated ||
 * Visual || [|Track Visualization BS (Galaxy)] || Tracks include: exon annotation, RNA-seq, select GO terms, CpG methylation ||
 * Visual || [|Track Visualization 454 (Galaxy)] || Tracks include: RefMap larvae_SRP004696 (bed) ||
 * Annotation || [|bigwig 1] ||  ||
 * Annotation || [|bigwig 2] ||  ||
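The BSMAP files split CpGs with 10x coverage into methylation bins. A sketch of the binning, assuming "10x" means at least 10 reads at the CpG, percent methylation is methylated reads divided by total reads times 100, and bin edges are half-open (e.g. 9.5% falls in 1-9, 10% in 10-49); how the original files handled boundary values is not stated. The labels mirror the file names above.

```python
from typing import Optional


def methylation_bin(methylated: int, total: int, min_cov: int = 10) -> Optional[str]:
    """Return the bin label for one CpG, or None if coverage is below `min_cov`."""
    if total < min_cov:
        return None
    if methylated == 0:
        return "0"
    pct = 100.0 * methylated / total
    if pct < 10:
        return "1_9"
    if pct < 50:
        return "10_49"
    if pct < 90:
        return "50_89"
    return "90_100"
```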

//Description//: Independent assembly of 10 fosmids. Sequences were analyzed using CD-HIT-EST to obtain a non-redundant dataset.
 * **cgigas_alpha_v0.3.1**
 * Sequence || [|FASTA] || 500MB || 203,216 Sequences ||

//Description//: Independent assembly of 10 fosmids. Consensus sequences longer than 20,000 bp, along with publicly available BAC sequences, were analyzed using CD-HIT-EST to obtain a non-redundant dataset. Also includes 12 selected sequences with known genomic structure.
 * **cgigas_alpha_v0.3.0**
 * Sequence || [|FASTA] || 12.3MB || 272 Sequences ||
 * Annotation || [|GFF v1 (gff)] ||  || blastn Sigenae8 ||
 * Annotation || [|BED v1 (bed)] ||  || blastn Sigenae8 ||
 * Alignment || [|MBD-meth reads (sam)] ||  ||   ||
 * Alignment || [|MBD-unmeth reads (sam)] ||  ||   ||
 * Alignment || [|RNAseq BB3 (sam)] ||  ||   ||
 * Alignment || [|RNAseq DH3 (sam)] ||  ||   ||

//Description//: Independent assembly of nine fosmids. Consensus sequences were analyzed using CD-HIT-EST to obtain a non-redundant dataset.
 * **cgigas_alpha_v0.2.0**
 * Sequence || [|FASTA] || 615MB || 205,903 Sequences ||
 * Annotation ||  ||   ||   ||

//Description//: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT-EST to obtain a non-redundant dataset. Remaining sequences longer than 20,000 bp are included. Also includes 12 selected sequences with known genomic structure.
 * **cgigas_alpha_v0.1.3**
 * Sequence || [|FASTA] || 6MB || 262 Sequences ||
 * Annotation ||  ||   ||   ||
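The v0.1.x descriptions combine a coverage/length prefilter, CD-HIT-EST clustering, and (for v0.1.3 and v0.1.0) a final 20,000 bp length cutoff. A sketch of the two filtering steps, assuming per-contig mean coverage is available from the assembler output; the `Contig` record is hypothetical, and the CD-HIT-EST run itself (identity threshold not stated on this page) happens between the two functions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Contig:
    name: str
    length: int           # bp
    mean_coverage: float  # average read depth (hypothetical field from assembler output)


def prefilter(contigs: Iterable[Contig], min_cov: float = 20.0, min_len: int = 5000) -> List[Contig]:
    """Keep contigs with coverage above `min_cov` and length above `min_len` (both strict)."""
    return [c for c in contigs if c.mean_coverage > min_cov and c.length > min_len]


def keep_long(contigs: Iterable[Contig], min_len: int = 20000) -> List[Contig]:
    """After CD-HIT-EST clustering, keep only sequences longer than `min_len`."""
    return [c for c in contigs if c.length > min_len]
```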

//Description//: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT-EST to obtain a non-redundant dataset. Also includes 12 selected sequences with known genomic structure.
 * **cgigas_alpha_v0.1.2**
 * Sequence || [|FASTA] || 215MB || 28,538 Sequences ||
 * Annotation || [|BLASTn Sigenae v8 (tsv)] || 800MB || *rev ||
 * Annotation || [|GFF v1 (gff)] || 9MB ||  ||

//Description//: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT-EST to obtain a non-redundant dataset.
 * **cgigas_alpha_v0.1.1**
 * Sequence || [|FASTA] || 215MB || 28,526 Sequences ||
 * Annotation ||  ||   ||   ||

//Description//: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT-EST to obtain a non-redundant dataset. Remaining sequences longer than 20,000 bp are included.
 * **cgigas_alpha_v0.1.0**
 * Sequence || [|FASTA] || 5.8MB || 250 Sequences ||
 * Annotation || [|BLASTn Sigenae v8 (csv)] ||  || top hit ||
 * Annotation || [|SW Sigenae v8 (tsv)] ||  || top hit ||
 * Annotation || [|GFF v0 (gff)] ||  || 12,582 total hits ||
 * Annotation || [|GFF v1 (gff)] ||  || 3891 hits, gene names ||
 * Annotation || [|GFF v2 (gff)] ||  || 1105 hits, used rev blast ||
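Several v0.1.0 files keep only the top hit per contig. A sketch of selecting one best hit per query from a 12-column tabular BLAST table, ranking by lowest e-value and then highest bit score; this ranking rule is an assumption, and the `.csv` file above may need `delimiter=","`.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def _rank(fields: List[str]) -> Tuple[float, float]:
    # Lower e-value (column 11) wins; ties go to the higher bit score (column 12).
    return float(fields[10]), -float(fields[11])


def top_hits(lines: Iterable[str], delimiter: str = "\t") -> Dict[str, List[str]]:
    """Return one best hit per query ID from a 12-column BLAST tabular table."""
    best: Dict[str, List[str]] = {}
    for fields in csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("#"):
            continue
        query = fields[0]
        if query not in best or _rank(fields) < _rank(best[query]):
            best[query] = fields
    return best
```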


 * **cgigas_PREalpha_v0.2.0**
 * Sequence || [|FASTA] ||  || 12 sequences ||
 * Annotation || [|GFF v1 (gff)] ||  ||   ||
 * Annotation || [|GFF v2 (gff)] ||  || Exons only ||
 * Alignment || [|MBD-BS reads (sam)] ||  ||   ||


 * //BLAST options:// blastn or megablast; select the database from the menu

=Related Resources=
Various other genomic resources for the oyster.

[|Public Sigenae Contig Browser: Oyster]
The Oyster EST contig browser aims to produce and maintain an automatic annotation of oyster EST libraries. This database, __ [|GigasDatabase] __, was initiated within the framework of the __ [|AquaFirst] __ European project; it now gathers EST sequences produced by a __ [|Marine Genomics Europe] __ project (GOCE-CT-2004-505403) and a __ [|Genoscope project] __. __ [|GigasDatabase] __ is regularly updated in the context of the ANR project "Gametogenes" (ANR-08-GENM-041).

GigasDatabase Assemblies
Version 8 [|cgigas_all_contigs_v8.fa] [|cgigas_all_contigs_v8_BESTHIT.tsv] [|cgigas_all_contigs_v8_ontology.tsv]

Archived GigasDatabase Assemblies
Version 6 [|Sigenae_v6_assembly.fa] [|Sigenae_v6_SPhits.xls]

NCBI
[|Crassostrea gigas Entrez Records] [|Crassostrea virginica Entrez Records]

NCBI: SRA
[|Crassostrea gigas] [|FASTA export of quality-trimmed reads] (export date 02/10/11)

Misc
[|Cgigas_BAConly.fa] (export date 01/31/11) [|Cgigas_genomic_NCBI.fa] (export date 01/31/11)

**Roberts Lab Submissions**
[|Crassostrea Nucleotide Database entries] [|Crassostrea gigas EST Database entries]
