Pacific oyster » TJGR

things just got real
Zhang, G; Fang, X; Guo, X; Li, L; Luo, R; Xu, F; Yang, P; Zhang, L; Wang, X; Qi, H; Zhu, Y; Yang, L; Huang, Z (2012) Genomic data from the Pacific oyster (Crassostrea gigas). GigaScience. http://dx.doi.org/10.5524/100030

Other Resources: OysterDB (Blast, Downloads) | Related manuscript: doi:10.1038/nature11413

.......................................................................................................................


    oyster.v9_M
    Sequence
    FASTA
    60 Sequences; 75 MB
    BSMAP
    SAM
    BS MBD
    RNAseq
    SAM
    PROPS: DH
    RNAseq
    SAM
    PROPS: BBC
    RefMap
    SAM
    MBD Library
    Annotation
    BED
    BS MBD CG features (no scores)
    Annotation
    BED
    BS MBD CG features (with scores)
    Annotation
    BED
    BS MBD CG features (0-20%)
    Annotation
    BED
    BS MBD CG features (20-70%)
    Annotation
    BED
    BS MBD CG features (70-100%)
    Annotation
    GFF
    Tandem Repeats [Phobos] (imperfect search)
    Annotation
    BED
    Tandem Repeats (interval and ID only)
    Annotation
    BED
    Complement to CDS features
    Annotation
    GFF
    CG motifs
    Annotation
    GFF
    CG motifs not overlapping BS MBD CG (~250k)
    Annotation
    GFF
    CG motifs with methylation status (~500k)
    Annotation
    BED
    CG motifs(~500k) (interval and ID only)
    Annotation
    BED
    50bp region (up and down) flanking CG motifs(~500k)
    Sequence
    FASTA
    50bp region (up and down) flanking CG motifs(~500k)
    Annotation
    GFF
    CG motifs (of 500k) that overlap with Tandem Repeats
    Annotation
    Blast Table
    Transposable Elements: RepBase inv Blast
    Annotation
    BED
    Transposable Elements: evalue 1E-10
    Annotation
    BED
    Transposable Elements: evalue 1E-10; single directionality
    Data
    Tab Text
    Interval Join: CG motifs(~500k) and TEs (RepBase)
    Data
    Tab Text
    Interval Join: CG motifs(~500k) and Repeats (Phobos)
    Data
    Tab Text
    Interval Join: CG motifs(~500k) and exons (CDS)
    Data
    Tab Text
    Interval Join: CG motifs(~500k) and closest non overlapping exons
    Data
    Tab Text
    Interval Join: CG motifs(~500k) not overlapping w/ mRNA (extragenic)
    Description: Genomic scaffolds longer than 1 Mbp


    oyster.v9_90
    Sequence
    FASTA
    1670 Sequences; 488 MB
    Annotation
    Tab Text
    Tiling Array Design v1
    Description: Longest genomic scaffolds (1670; 14%) that cover over 90% of the genome.
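The selection described for oyster.v9_90 (the longest scaffolds that together cover over 90% of the genome) can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure actually used to build the subset: the function name `longest_covering`, the dictionary input, and the toy lengths are all hypothetical.

```python
def longest_covering(lengths, fraction=0.9):
    """Return the longest scaffolds whose combined length first reaches
    `fraction` of the total assembly length, plus the fraction covered.

    `lengths` maps scaffold ID -> length in bp (hypothetical input format).
    """
    total = sum(lengths.values())
    if total == 0:
        return [], 0.0
    target = fraction * total
    selected = []
    covered = 0
    # Walk scaffolds from longest to shortest and stop once the target is met.
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target:
            break
        selected.append(name)
        covered += length
    return selected, covered / total


# Toy example with made-up scaffold lengths (not oyster.v9 data).
toy = {"s1": 500, "s2": 300, "s3": 150, "s4": 30, "s5": 20}
names, covered_fraction = longest_covering(toy)
```

In practice the lengths would come from the FASTA file or an index of it; reading that file is left out here to keep the sketch self-contained.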


    oyster.v9_proteome
    Sequence
    FASTA
    [.gz] 28027 protein sequences (via gigadb.org)
    Sequence
    FASTA
    28027 protein sequences (local)
    Annotation
    Blast Table
    Blastp Swiss-Prot evalue: 10
    Annotation
    Blast Table
    Blastp Swiss-Prot evalue: 1E-05
    Annotation
    Blast Table
    Blastp Estrogen Biosynthetic Process; 1E-20, 4 max_targets, (details)
    Description: Proteins

    oyster.v9_genes
    Sequence
    FASTA
    [.gz] 28027 gene (CDS only) sequences (via gigadb.org)
    Sequence
    FASTA
    28027 gene (CDS only) sequences (local)
    Annotation
    Blast table
    Blastx Swiss-Prot evalue: 1E-05
    Data
    Tab Text
    Number of exons per gene
    Annotation
    Tab Text
    Corresponding SPID and evalues
    Annotation
    Tab Text
    Corresponding SPID, evalues, and descriptions
    Annotation
    Tab Text
    Corresponding SPID, evalues, and GO# (using recent GO file)
    Annotation
    Tab Text
    Corresponding SPID, evalues, and GO and GOslim
    Description: Coding sequence



    oyster.v9
    Sequence
    FASTA
    [.gz] 11969 Sequences (via gigadb.org)
    Sequence
    FASTA
    [.gz] 11969 Sequences (local)
    Sequence
    FASTA
    11969 Sequences (local) 560MB
    Annotation
    GFF
    [.gz] gene features (via gigadb.org)
    Annotation
    GFF
    gene features (CDS and mRNA)
    Annotation
    GFF
    gene features (CDS only)
    Annotation
    BED
    gene features (CDS only) (interval and ID)
    Annotation
    GFF
    gene features (mRNA only)
    Annotation
    BED
    gene features (mRNA only)
    Annotation
    GFF
    promoter region (1000bp 5' of mRNA)
    Annotation
    GFF
    mRNA GOslim = cell adhesion or signal transduction or cell-cell signaling
    Annotation
    GFF
    mRNA GOslim = DNA metabolism or RNA metabolism or protein metabolism
    Annotation
    GFF
    DESeq (p<0.05) Gill v Male Gonad RNA-seq from Zhang et al
    Description: Zhang, G; Fang, X; Guo, X; Li, L; Luo, R; Xu, F; Yang, P; Zhang, L; Wang, X; Qi, H; Zhu, Y; Yang, L; Huang, Z (2012) Genomic data from the Pacific oyster (Crassostrea gigas). GigaScience. http://dx.doi.org/10.5524/100030
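For the "promoter region (1000bp 5' of mRNA)" track listed for oyster.v9, one way to derive such intervals from mRNA features is sketched below. How the original track was built is not stated; this sketch assumes strand-aware handling and 1-based inclusive GFF coordinates, and the function name `promoter_interval` and its parameters are hypothetical.

```python
def promoter_interval(start, end, strand, upstream=1000, seq_length=None):
    """Return (start, end) of the region `upstream` bp 5' of a feature.

    Coordinates are 1-based and inclusive, as in GFF. Returns None when no
    upstream sequence exists (the feature sits at the scaffold edge).
    """
    if strand == "+":
        # 5' is to the left; clip at the first base of the scaffold.
        p_start, p_end = max(1, start - upstream), start - 1
    elif strand == "-":
        # 5' is to the right; clip at the scaffold end if its length is known.
        p_start, p_end = end + 1, end + upstream
        if seq_length is not None:
            p_end = min(p_end, seq_length)
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    if p_start > p_end:
        return None
    return p_start, p_end


# An mRNA at 5000..8000 on each strand (made-up coordinates).
plus_promoter = promoter_interval(5000, 8000, "+")
minus_promoter = promoter_interval(5000, 8000, "-")
```

Clipping matters near scaffold ends: without it, a feature close to position 1 would yield a zero or negative start that most GFF consumers reject.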




    TGAGA (alpha)

    towards getting a gigas assembly (alpha)
    ................................................................................................................................................




    cgigas_alpha_v0.4.0
    Sequence
    FASTA

    230,270 Sequences
    Description: Independent assembly of 19 fosmids. Sequences were analyzed using CD-HIT EST to obtain non-redundant dataset.
    Combination of assemblies v0.2.0 and v0.3.1.

    cgigas_alpha_v0.3.2
    Sequence
    FASTA
    272 Sequences
    Annotation
    GFF v1 (gff)
    blastn Sigenae8
    Annotation
    BED v1 (bed)
    blastn Sigenae8
    Visual
    Blast Directionality (htm)
    based on BED v1
    Annotation
    GFF v2 (gff)
    blastn Sigenae8 (includes gene description)
    Annotation
    GFF v3 (gff)
    blastn Sigenae8 GOslim= cell adhesion or signal transduction or cell-cell signaling
    Annotation
    GFF v3 (bed)
    blastn Sigenae8 GOslim= cell adhesion or signal transduction or cell-cell signaling
    Annotation
    GFF v4 (gff)
    blastn Sigenae8 GOslim= DNA metabolism or RNA metabolism or protein metabolism
    Annotation
    GFF v4 (bed)
    blastn Sigenae8 GOslim= DNA metabolism or RNA metabolism or protein metabolism
    Annotation
    MicroSatellite (GFF3)
    MicroSatellite Regions (HP)
    Annotation
    Repeat_Regions(GFF3)
    Regions that are repeated 10 or more times(HP)
    Annotation
    Repeat_Regions v1 (GFF3)
    Regions that are repeated 10 or more times, derivative regions omitted (HP)
    Annotation
    Repeat_Regions v2 (GFF3)
    Fixed orientation and score issues (HP)
    Annotation
    Repeat_Regions v3 (GFF3)
    Identified some Retrotransposon regions from Repbase (HP)
    Annotation
    Repeat_Regions(BED)
    Regions that are repeated 10 or more times(HP)
    Annotation
    Repeat_Regions v1 (BED)
    Regions that are repeated 10 or more times, derivative regions omitted(HP)
    Annotation
    GFF v5 (gff)
    Tandem repeats (Geneious:Phobos)
    Annotation
    GFF v6 (gff)
    Transcription Factors (Geneious)
    Alignment
    RefMap MBD_meth (bed)
    gill tissue
    Alignment
    RefMap MBD_unmeth (bed)
    gill tissue
    Alignment
    RNAseq BB3 (bed)
    gill tissue
    Alignment
    RNAseq DH3 (bed)
    gill tissue
    Alignment
    RefMap larvae_SRP002286 (bed)
    larvae
    Alignment
    RefMap larvae_SRP004696 (bed)
    Two 454 SRA files - larvae
    Motif
    CG motifs (gff)

    BSMAP
    BSMAP_MBD_all (gff)
    Bisulfite-seq 10x cov ; all
    BSMAP
    BSMAP_MBD_0 (gff)
    Bisulfite-seq 10x cov ; 0% methylated
    BSMAP
    BSMAP_MBD_1_9 (gff)
    Bisulfite-seq 10x cov ; 1-9% methylated
    BSMAP
    BSMAP_MBD_10_49 (gff)
    Bisulfite-seq 10x cov ; 10-49% methylated
    BSMAP
    BSMAP_MBD_50_89 (gff)
    Bisulfite-seq 10x cov ; 50-89% methylated
    BSMAP
    BSMAP_MBD_90_100 (gff)
    Bisulfite-seq 10x cov ; 90-100% methylated
    Visual
    Track Visualization BS (Galaxy)
    Tracks include: exon annotation, RNA-seq, select GO terms, CpG methylation
    Visual
    Track Visualization 454 (Galaxy)
    Tracks include: RefMap larvae_SRP004696 (bed)
    Annotation
    bigwig 1

    Annotation
    bigwig 2

    Description: Independent assembly of 10 fosmids. Sequences were analyzed using CD-HIT EST to obtain non-redundant dataset. Sequences greater than 20k bp retained. Cgigas BAC clones from NCBI (60) reduced to 53 clusters. Plus 12 select sequences with known genomic structure.
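The BSMAP_MBD tracks listed for cgigas_alpha_v0.3.2 split CpG sites with 10x coverage into methylation bins (0%, 1-9%, 10-49%, 50-89%, 90-100%). A minimal sketch of that kind of binning follows. The function name `methylation_bin`, the read-count inputs, and the treatment of fractional percentages (anything above 0 and below 10 goes to the 1-9 bin, and so on) are assumptions; the source does not say how boundary values were handled.

```python
MIN_COVERAGE = 10  # "10x cov" in the track descriptions


def methylation_bin(methylated, total, min_coverage=MIN_COVERAGE):
    """Assign a CpG site to a track bin, or None if coverage is too low.

    `methylated` and `total` are read counts at the site (assumed inputs).
    """
    if methylated < 0 or methylated > total:
        raise ValueError("methylated count must be between 0 and total")
    if total < min_coverage:
        return None
    percent = 100.0 * methylated / total
    if percent == 0:
        return "BSMAP_MBD_0"
    if percent < 10:
        return "BSMAP_MBD_1_9"
    if percent < 50:
        return "BSMAP_MBD_10_49"
    if percent < 90:
        return "BSMAP_MBD_50_89"
    return "BSMAP_MBD_90_100"


# Made-up read counts for illustration.
example_bins = [methylation_bin(m, t) for m, t in [(0, 12), (1, 20), (5, 10), (9, 10), (3, 5)]]
```

Sites below the coverage threshold return None so that they can be dropped before any track is written.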



    cgigas_alpha_v0.3.1
    Sequence
    FASTA
    500MB
    203,216 Sequences
    Description: Independent assembly of 10 fosmids. Sequences were analyzed using CD-HIT EST to obtain non-redundant dataset.


    cgigas_alpha_v0.3.0
    Sequence
    FASTA
    12.3MB
    272 Sequences
    Annotation
    GFF v1 (gff)

    blastn Sigenae8
    Annotation
    BED v1 (bed)

    blastn Sigenae8
    Alignment
    MBD-meth reads (sam)


    Alignment
    MBD-unmeth reads (sam)


    Alignment
    RNAseq BB3 (sam)


    Alignment
    RNAseq DH3 (sam)


    Description: Independent assembly of 10 fosmids. Consensus sequences >20,000bp along with publicly available BAC sequences were analyzed using CD-HIT EST to obtain non-redundant dataset. Plus 12 select sequences with known genomic structure.


    cgigas_alpha_v0.2.0
    Sequence
    FASTA
    615MB
    205,903 Sequences
    Annotation







    Description: Independent assembly of nine fosmids. Consensus sequences were analyzed using CD-HIT EST to obtain non-redundant dataset.


    cgigas_alpha_v0.1.3
    Sequence
    FASTA
    6MB
    262 Sequences
    Annotation







    Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset. Remaining sequences greater than 20,000 bp are included. Plus 12 select sequences with known genomic structure.


    cgigas_alpha_v0.1.2
    Sequence
    FASTA
    215MB
    28,538 Sequences
    Annotation
    BLASTn Sigenae v8 (tsv)
    800MB
    *rev
    Annotation
    GFF v1 (gff)
    9MB

    Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset. Plus 12 select sequences with known genomic structure.


    cgigas_alpha_v0.1.1
    Sequence
    FASTA
    215MB
    28,526 Sequences
    Annotation







    Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset.


    cgigas_alpha_v0.1.0
    Sequence
    FASTA
    5.8MB
    250 Sequences
    Annotation
    BLASTn Sigenae v8 (csv)

    top hit
    Annotation
    SW Sigenae v8 (tsv)

    top hit
    Annotation
    GFF v0 (gff)

    12,582 total hits
    Annotation
    GFF v1 (gff)

    3891 hits, gene names
    Annotation
    GFF v2 (gff)

    1105 hits, used rev blast
    Description: Independent assembly of nine fosmids. Consensus sequences with greater than 20x average coverage and longer than 5000 bp were analyzed using CD-HIT EST to obtain non-redundant dataset. Remaining sequences greater than 20,000 bp are included.


    cgigas_PREalpha_v0.2.0
    Sequence
    FASTA

    12 sequences
    Annotation
    GFF v1 (gff)


    Annotation
    GFF v2 (gff)

    Exons only
    Alignment
    MBD-BS reads (sam)




    *BLAST option: blastn, megablast; select database from menu


    Related Resources

    various other genomic resources for the oyster.






    GigasBase

    Public Sigenae Contig Browser: Oyster

    The Oyster EST contig browser aims to produce and maintain an automatic annotation of oyster EST libraries. The GigasDatabase database was initiated within the framework of the AquaFirst European project; it now gathers EST sequences produced by a Marine Genomics Europe project (GOCE-CT-2004-505403) and a Genoscope project. GigasDatabase is regularly updated in the context of the ANR project "Gametogenes" (ANR-08-GENM-041).


    GigasDatabase Assemblies

    Version 8
    cgigas_all_contigs_v8.fa
    cgigas_all_contigs_v8_BESTHIT.tsv
    cgigas_all_contigs_v8_ontology.tsv


    Archived GigasDatabase Assemblies

    Version 6
    Sigenae_v6_assembly.fa
    Sigenae_v6_SPhits.xls




    NCBI

    Crassostrea gigas Entrez Records
    Crassostrea virginica Entrez Records

    NCBI: SRA

    Crassostrea gigas
    FASTA export of quality trimmed reads (export date 02/10/11)

    Misc

    Cgigas_BAConly.fa (export date 01/31/11)
    Cgigas_genomic_NCBI.fa (export date 01/31/11)

    Roberts Lab Submissions

    Crassostrea Nucleotide Database entries
    Crassostrea gigas EST Database entries







