Methylation Patterns in the Pacific Oyster Genome

Feature tracks used in characterization of DNA methylation in the Pacific oyster genome [version oyster.v9_90 (fasta)]. This version of the genome represents the longest genomic scaffolds (1670; 14%), which cover over 90% of the genome. Derived from the genome build available at Zhang, G; Fang, X; Guo, X; Li, L; Luo, R; Xu, F; Yang, P; Zhang, L; Wang, X; Qi, H; Zhu, Y; Yang, L; Huang, Z (2012) Genomic data from the Pacific oyster (Crassostrea gigas). GigaScience. http://dx.doi.org/10.5524/100030
 * Summary**

http://eagle.fish.washington.edu/Mollusk/174gm_analysis/Bedtools_Intersect/oyster.v9_90_allCGs
Preview:
code
scaffold22   fuzznuc    misc_feature    69    70    2.000    +    .    Sequence "scaffold22.1" ; note "*pat pattern1"
scaffold22   fuzznuc    misc_feature    73    74    2.000    +    .    Sequence "scaffold22.2" ; note "*pat pattern1"
scaffold22   fuzznuc    misc_feature    93    94    2.000    +    .    Sequence "scaffold22.3" ; note "*pat pattern1"
scaffold22   fuzznuc    misc_feature    156    157    2.000    +    .    Sequence "scaffold22.4" ; note "*pat pattern1"
scaffold22   fuzznuc    misc_feature    191    192    2.000    +    .    Sequence "scaffold22.5" ; note "*pat pattern1"
scaffold22   fuzznuc    misc_feature    240    241    2.000    +    .    Sequence "scaffold22.6" ; note "*pat pattern1"
code
 * [Track] oyster.v9_90 all CGs**

Description: fuzznuc on oyster.v9_90 fasta file.

http://eagle.fish.washington.edu/Mollusk/174gm_analysis/MethylatedCG_BED.bed
Preview:
code
scaffold1   263    263    CG    0.300    +
scaffold1   267    267    CG    0.100    +
scaffold1   9470    9470    CG    0.188    +
scaffold1   18706    18706    CG    0.071    +
scaffold1   20215    20215    CG    0.077    +
code
Description: BSMAP used to map paired-end (PE) bisulfite Illumina reads from the sperm sample.
Columns: c1 = scaffold, c2 = start, c3 = end, c4 = motif, c5 = methylation level as a proportion (0 to 1), c6 = strand
code
./bsmap -a /Volumes/web/whale/ce_bs/filtered_174gm_A_NoIndex_L006_R1.fastq.gz -b /Volumes/web/whale/ce_bs/filtered_174gm_A_NoIndex_L006_R2.fastq.gz -d /Volumes/web/whale/ce_bs/oyster.v9_90.fa -o /Volumes/web/Mollusk/BSMAPoutput_174gm_v9_90.sam -p 8
code
code
python methratio.py -d /Volumes/web/whale/ce_bs/oyster.v9_90.fa -o /Volumes/web/Mollusk/methratiopython_174gm_v9_90.txt -s /Users/Shared/Apps/bsmap-2.73/samtools -z -u /Volumes/web/Mollusk/174gm_analysis/BSMAPoutput_174gm_v9_90.sa
code
Total number of aligned reads: total 145949462 valid mappings, 123681367 covered cytosines, average coverage: 11.86 fold
 * [Track] Methylated CpGs**

http://eagle.fish.washington.edu/Mollusk/174gm_analysis/NoMethCG_BED.bed
Preview
code
scaffold1   64    64    CG    0.000    +
scaffold1   128    128    CG    0.000    +
scaffold1   10530    10530    CG    0.000    +
scaffold1   10569    10569    CG    0.000    +
scaffold1   11745    11745    CG    0.000    +
code
Description:
 * [Track] Unmethylated CpGs**

[Track] Methylated CpGs []
Preview
code
scaffold1   9470    9470    CG    0.162    +
scaffold1   16825    16825    CG    0.067    +
scaffold1   18706    18706    CG    0.077    +
scaffold1   20215    20215    CG    0.071    +
scaffold1   20756    20756    CG    0.600    +
code
Description: BSMAP used to map paired-end (PE) bisulfite Illumina reads from the sperm sample.
code
./bsmap -a /Volumes/web/whale/ce_bs/filtered_174gm_A_NoIndex_L006_R1.fastq.gz -b /Volumes/web/whale/ce_bs/filtered_174gm_A_NoIndex_L006_R2.fastq.gz -d /Volumes/web/whale/ce_bs/oyster.v9_90.fa -o /Volumes/web/whale/ce_bs/BSMAP_output_PE_v9_90.sam -p 8
code
Total number of aligned reads: pairs: 85147571 (50%) single a: 16704916 (9.7%) single b: 15703005 (9.2%)
code
python methratio.py -d /Volumes/web/whale/ce_bs/oyster.v9_90.fa -u -p -q -z -o /Volumes/web/whale/ce_bs/OUT_methratio_gonadPE_v9_90_B.txt -s /Users/Shared/Apps/bsmap-2.73/samtools /Volumes/web/whale/ce_bs/BSMAP_output_PE_v9_90.sam
code
Output was trimmed (in Galaxy) to keep data only for CGs on the positive strand with 10x coverage. Specifically, the methratio output was run through this workflow in Galaxy: []

[Track] Unmethylated CpGs http://eagle.fish.washington.edu/cnidarian/TJGR_GonadPE_BS_v9_90_CG_10x_NOmethbed.txt
Preview
code
scaffold1   13612    13612    CG    0.000    +
scaffold1   13822    13822    CG    0.000    +
scaffold1   13936    13936    CG    0.000    +
scaffold1   13967    13967    CG    0.000    +
scaffold1   14032    14032    CG    0.000    +
code
Description: Second product of Galaxy Workflow above.
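The Galaxy workflow itself is not preserved on this page, so the following is only a rough Python sketch of the filtering it describes (CG context only, positive strand, at least 10x coverage, then a split into methylated and unmethylated sites). The methratio.py column order used here (chr, pos, strand, 5-base context, ratio, eff_CT_count) is an assumption, as is treating any ratio above zero as methylated; `split_cg_sites` is a hypothetical helper name.

```python
import csv

# ASSUMPTION: methratio.py columns are chr, pos, strand, context, ratio,
# eff_CT_count, ... with a 5-base context centred on the cytosine.
def split_cg_sites(lines, min_cov=10):
    """Return (methylated, unmethylated) BED-like rows for plus-strand CG sites."""
    methylated, unmethylated = [], []
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields or fields[0] == "chr":  # skip blank lines and the header
            continue
        chrom, pos, strand, context, ratio, eff_ct = fields[:6]
        is_cg = len(context) == 5 and context[2:4] == "CG"
        if strand != "+" or not is_cg or float(eff_ct) < min_cov:
            continue
        row = (chrom, int(pos), int(pos), "CG", float(ratio), strand)
        # ASSUMPTION: a site counts as methylated when its ratio is above zero
        (methylated if float(ratio) > 0 else unmethylated).append(row)
    return methylated, unmethylated
```

The two returned lists correspond to the six-column layout shown in the Methylated and Unmethylated CpG previews (scaffold, start, end, motif, ratio, strand).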

[] Preview
code
scaffold43964   14454    14480
scaffold43964   49922    49938
scaffold43964   52400    52420
scaffold43964   58274    58294
scaffold43964   128548    128573
code
Description: Based on above track. Intervals where the maximum distance between mCpG is 10bp, and the minimum # of mCpG is 4. Includes 9974 features.
 * [Track] Methylation Clusters 10-4**

[] Preview
code
scaffold43964   3034    3111
scaffold43964   14417    14480
scaffold43964   25418    25493
scaffold43964   39824    39905
scaffold43964   49292    49349
code
Description: Based on above track. Intervals where the maximum distance between mCpG is 50bp, and the minimum # of mCpG is 5. Includes 40890 features.
 * [Track] Methylation Clusters 50-5**

[] Preview
code
scaffold43964   3034    3111
scaffold43964   14221    14291
scaffold43964   14417    14689
scaffold43964   19154    19187
scaffold43964   24969    25108
code
Description: Based on above track. Intervals where the maximum distance between mCpG is 100bp, and the minimum # of mCpG is 4. Includes 79161 features. Corresponding fasta: []
 * [Track] Methylation Clusters 100-4**
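The three cluster tracks share one rule: consecutive mCpGs are grouped while the gap between neighbours stays within a maximum distance, and a group is kept only if it contains a minimum number of mCpGs. The tool actually used is not named on this page, so the following Python sketch is only one way to express that rule. `methylation_clusters` is a hypothetical name, and treating the distance limit as inclusive is an assumption.

```python
def methylation_clusters(positions, max_gap, min_sites):
    """Group mCpG positions from one scaffold into (start, end, count) clusters.

    ASSUMPTION: two neighbouring sites belong to the same cluster when the
    distance between them is at most `max_gap` (inclusive).
    """
    clusters = []
    run = []
    for pos in sorted(positions):
        if run and pos - run[-1] > max_gap:
            # Gap too large: close the current run and keep it if big enough
            if len(run) >= min_sites:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_sites:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


# Made-up positions, clustered with the "10-4" settings
print(methylation_clusters([100, 105, 112, 120, 200, 205], max_gap=10, min_sites=4))
```

Under this sketch, the 10-4, 50-5 and 100-4 tracks would correspond to `(max_gap, min_sites)` of `(10, 4)`, `(50, 5)` and `(100, 4)`, applied per scaffold.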

http://eagle.fish.washington.edu/cnidarian/rm_020713/oysterv9_90.fa.out.gff
Preview
code
scaffold1   RepeatMasker    similarity    9873    9897     0.0    +    .    Target "Motif:AT_rich" 1 25
scaffold1   RepeatMasker    similarity    12513    12553     0.0    +    .    Target "Motif:(GA)n" 1 41
scaffold1   RepeatMasker    similarity    16199    16242    18.2    +    .    Target "Motif:AT_rich" 1 44
scaffold1   RepeatMasker    similarity    16261    16334    21.6    +    .    Target "Motif:AT_rich" 1 74
scaffold1   RepeatMasker    similarity    16494    16522     3.5    +    .    Target "Motif:AT_rich" 1 29
code
Description: RepeatMasker with Repbase; summary table @ []
 * [Track] Repeats**

http://eagle.fish.washington.edu/cnidarian/TJGR_TE_oysterv9_90.gff
code
scaffold999   TRF    Tandem_Repeat    166754    166792    69    +    .    .
scaffold1   TRF    Tandem_Repeat    12513    12553    82    +    .    .
scaffold1259   WUBlastX    MuDR1x_AP    15516    15635    50    -    .    DNA
scaffold1327   WUBlastX    Zator-3_AAe    333539    334297    105    -    .    DNA
scaffold1627   WUBlastX    Zator-3_AAe    151603    151785    32    +    .    DNA
code
Description: RepeatProteinMask
 * [Track] Transposable Elements**

http://aquacul4.fish.washington.edu/~steven/armina/oyster.v9.glean.final.rename.CDS.gff
Preview
code
scaffold980   GLEAN    CDS    134604    134778    .    +    2    Parent=CGI_10019211;
scaffold980   GLEAN    CDS    141499    141593    .    +    1    Parent=CGI_10019211;
scaffold980   GLEAN    CDS    142711    142811    .    +    2    Parent=CGI_10019211;
scaffold980   GLEAN    CDS    143780    143896    .    +    0    Parent=CGI_10019211;
scaffold980   GLEAN    CDS    144887    145029    .    +    0    Parent=CGI_10019211;
code
 * [Track] CDS **

http://eagle.fish.washington.edu/Mollusk/174gm_analysis/oysterv9_90_Introns.bed
Preview
code
scaffold22   8845    13192
scaffold22   13237    14157
scaffold22   14229    15108
scaffold22   15180    15773
scaffold22   19018    19239
code
Description
 * [Track] Introns**

http://eagle.fish.washington.edu/cnidarian/oysterv9_90_Intron_50pbWindows.bed
Preview
code
scaffold22   8845    8895
scaffold22   8895    8945
scaffold22   8945    8995
scaffold22   8995    9045
scaffold22   9045    9095
code
 * [Track] Introns divided into 50bp windows**
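For illustration, the windowing seen in the preview (an intron starting at 8845 split into 8845-8895, 8895-8945, and so on) can be sketched in Python as below. The tool used to build the track is not stated here. `fixed_windows` is a hypothetical helper, and truncating the final window at the interval end is an assumption.

```python
def fixed_windows(chrom, start, end, size=50):
    """Split a BED-style half-open interval into consecutive `size`-bp windows.

    ASSUMPTION: the last window is shortened so it never runs past `end`.
    """
    return [(chrom, s, min(s + size, end)) for s in range(start, end, size)]


# First intron from the Introns preview: scaffold22 8845 13192
for window in fixed_windows("scaffold22", 8845, 13192)[:3]:
    print(*window, sep="\t")
```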

http://aquacul4.fish.washington.edu/~steven/armina/oyster.v9.glean.final.rename.mRNA.gff
Preview
code
scaffold6   GLEAN    mRNA    684420    688461    0.811719    +    .    ID=CGI_10022332;
scaffold6   GLEAN    mRNA    694464    700813    0.235103    +    .    ID=CGI_10022333;
scaffold6   GLEAN    mRNA    701995    741494    0.270237    +    .    ID=CGI_10022334;
scaffold1710   GLEAN    mRNA    22769    26100    0.999946    +    .    ID=CGI_10022335;
scaffold1710   GLEAN    mRNA    66509    80594    0.877603    +    .    ID=CGI_10022336;
code
 * [Track] mRNA**

http://eagle.fish.washington.edu/cnidarian/TJGR_genes_v9_promoter_5p1000.gff
Preview
code
scaffold40150   GLEAN    promoter    53687    54687    0.999676    -    .    ID=CGI_10003906;
scaffold40150   GLEAN    promoter    61510    62510    0.998077    -    .    ID=CGI_10003907;
scaffold40150   GLEAN    promoter    82433    83433    1    -    .    ID=CGI_10003910;
scaffold1177   GLEAN    promoter    70856    71856    0.889891    -    .    ID=CGI_10003913;
scaffold40178   GLEAN    promoter    50250    51250    0.999219    -    .    ID=CGI_10003915;
code
 * [Track] Promoter Region:**

[]
code
scaffold22   0    50
scaffold22   50    100
scaffold22   100    150
scaffold22   150    200
scaffold22   200    250
code
Description: Complement of CDS interval in 50bp windows
 * [Track] NonCDS 50bp windows**

[]
code
scaffold1   100112    100162    121
scaffold1   100162    100212    118
scaffold1   100212    100262    106
scaffold100   80833    80883    4279
scaffold100   82089    82139    555
code
Description: bedtools coverageBed using the TopHat BAM output (-a) and the NonCDS 50bp window BED (-b), with the split option.
 * [Track] +100x Mgo Expression - NonCDS 50bp windows**

[] Preview
code
scaffold1   21144    21194    20
scaffold1   23024    23074    27
scaffold1   23074    23124    30
scaffold1   23124    23174    26
scaffold1   23174    23224    23
code
Description: bedtools coverageBed using the TopHat BAM output (-a) and the NonCDS 50bp window BED (-b), with the split option.
 * [Track] +20x Mgo Expression - NonCDS 50bp windows**
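The "+100x" and "+20x" tracks appear to be the same per-window coverage table cut at two thresholds. The Python sketch below filters such a table, assuming that the fourth column is the read count per 50bp window and that "+100x" means a count of at least 100. Neither point is stated explicitly on this page, and `filter_windows` is a hypothetical name.

```python
def filter_windows(lines, min_reads):
    """Keep (scaffold, start, end, count) rows whose count is >= min_reads.

    ASSUMPTION: each line has four whitespace-separated fields and the
    fourth is the read count reported for that window.
    """
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        chrom, start, end, count = parts[0], int(parts[1]), int(parts[2]), int(parts[3])
        if count >= min_reads:
            kept.append((chrom, start, end, count))
    return kept


rows = ["scaffold1 100112 100162 121", "scaffold1 21144 21194 20"]
print(filter_windows(rows, 100))  # only the 121-read window passes
```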

http://eagle.fish.washington.edu/cnidarian/TJGR_MgoSNP_vcf_to_gff.gff
Preview
code
scaffold1   SAMTools    SNP    18600    18600    33.8    .    .    REF=C;ALT=T;FILTER=.;INFO=DP%3D2%3BVDB%3D0.0160%3BAF1%3D1%3BAC1%3D2%3BDP4%3D0%2C0%2C2%2C0%3BMQ%3D50%3BFQ%3D-33;FORMAT=GT:PL:GQ;SAMPLE=1/1:65%2C6%2C0:10
scaffold1   SAMTools    SNP    18913    18913    4.77    .    .    REF=A;ALT=C;FILTER=.;INFO=DP%3D1%3BAF1%3D1%3BAC1%3D2%3BDP4%3D0%2C0%2C1%2C0%3BMQ%3D50%3BFQ%3D-30;FORMAT=GT:PL:GQ;SAMPLE=0/1:33%2C3%2C0:3
scaffold1   SAMTools    SNP    21342    21342    117    .    .    REF=T;ALT=A;FILTER=.;INFO=DP%3D31%3BVDB%3D0.0445%3BAF1%3D1%3BAC1%3D2%3BDP4%3D0%2C0%2C23%2C5%3BMQ%3D50%3BFQ%3D-111;FORMAT=GT:PL:GQ;SAMPLE=1/1:150%2C84%2C0:99
scaffold1   SAMTools    SNP    21381    21381    222    .    .    REF=G;ALT=A;FILTER=.;INFO=DP%3D37%3BVDB%3D0.0394%3BAF1%3D1%3BAC1%3D2%3BDP4%3D0%2C0%2C32%2C5%3BMQ%3D50%3BFQ%3D-138;FORMAT=GT:PL:GQ;SAMPLE=1/1:255%2C111%2C0:99
scaffold1   SAMTools    SNP    23620    23620    165    .    .    REF=A;ALT=T;FILTER=.;INFO=DP%3D16%3BVDB%3D0.0440%3BAF1%3D0.5%3BAC1%3D1%3BDP4%3D2%2C6%2C0%2C8%3BMQ%3D50%3BFQ%3D168%3BPV4%3D0.47%2C0.42%2C1%2C1;FORMAT=GT:PL:GQ;SAMPLE=0/1:195%2C0%2C254:99
code
Description: SNPs identified in the BAM file from the Mgo TopHat alignment
 * [Track] SNPs Mgo RNA-seq Tophat**


Other files http://eagle.fish.washington.edu/cnidarian/TJGR_Gene_GO_GOslim.txt
 * GO Analyses:**

[] http://eagle.fish.washington.edu/cnidarian/MG_alldata1059.txt
code
.   CG in exon    CG intron    CG total    %CG in MBD exon    %CG in MBD intron    %CG in MBD total    .    Dgl   Fgo    Gil    Amu    Hem    Lpa    Mgo    overall abundance    abundCV    .    cell adhesion   cell cycle and proliferation    cell organization and biogenesis    cell-cell signaling    death    developmental processes    DNA metabolism    other biological processes    other metabolic processes    protein metabolism    RNA metabolism    signal transduction    stress response    transport    .    cell adhesion   cell cycle and proliferation    cell organization and biogenesis    cell-cell signaling    death    developmental processes    DNA metabolism    other biological processes    other metabolic processes    protein metabolism    RNA metabolism    signal transduction    stress response    transport
CGI_10026228   16    11    27    0.025316456    0    0.022222222    CGI_10026228    0    0    0    0    0    0    0.071608608    0.010229801    264.5751311    CGI_10026228                                                        1    CGI_10026228    N    N    N    N    N    N    N    N    N    N    N    N    N    Y
CGI_10026611    22    0    22    0    0.035294118    0.03    CGI_10026611    0    0    0    0.099655867    0    0    0    0.014236552    264.5751311    CGI_10026611                                1                            CGI_10026611    N    N    N    N    N    N    N    Y    N    N    N    N    N    N
CGI_10027943    42    11    53    0    0.126213592    0.087837838    CGI_10027943    0    0    0    0.0751982    0    0    0.048001375    0.017599939    176.5122486    CGI_10027943                            1    1                            CGI_10027943    N    N    N    N    N    N    Y    Y    N    N    N    N    N    N
 * MBD-bisulfite seq data (gill tissue):**
 * data used in multivariate class*

code
Methods: DNA methylation data
Analyses are based on the results of high-resolution methylation analysis of genomic DNA from pooled oyster gill tissue (n=8). Briefly, genomic DNA was isolated and methylation enrichment was performed using the MethylMiner Kit (Invitrogen) following the manufacturer’s instructions. A bisulfite-treated DNA library of the methylation-enriched fraction was prepared for Illumina sequencing at the University of Washington high-throughput sequencing facility (Seattle, WA). High-throughput reads were mapped back to a subset of the oyster genome that included scaffolds longer than 1 million bp (Zhang et al., 2012). Mapping of the bisulfite-treated reads was performed using BSMAP software (version 2.73). Cytosines in a CG dinucleotide context with greater than 5x coverage in the MBD library were considered to be methylated if at least one of the reads remained unconverted by the bisulfite treatment. One thousand fifty-five oyster genes were evaluated for further analysis of methylation and other gene attributes. Genes were selected if at least 1 CG dinucleotide had 5x coverage in the MBD library and were further limited to genes that were expressed in at least 1 of 6 oyster tissues based on the dataset of Zhang et al. (2012). The proportion of methylation for a given gene was calculated by dividing the number of methylated cytosines by the total number of CG dinucleotides in the sequence. The proportions of methylation for exonic and intronic regions were also calculated per gene.
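The calling rule and the per-gene proportion described in the methods can be written out as a small Python sketch. The function names and the `(coverage, unconverted_reads)` input layout are hypothetical and chosen only for illustration; the original analysis scripts are not shown on this page.

```python
def is_methylated(coverage, unconverted_reads, min_cov=5):
    """A CG cytosine counts as methylated when coverage is greater than
    `min_cov` and at least one read stayed unconverted after bisulfite."""
    return coverage > min_cov and unconverted_reads >= 1


def gene_methylation(cg_sites, total_cg):
    """Proportion of methylation for one gene (or its exons or introns).

    cg_sites: iterable of (coverage, unconverted_reads) per covered CG
              (hypothetical input layout).
    total_cg: total number of CG dinucleotides in the sequence.
    """
    methylated = sum(1 for cov, unconv in cg_sites if is_methylated(cov, unconv))
    return methylated / total_cg if total_cg else 0.0


# Two of these four covered CGs pass the rule; the sequence has 10 CGs in total
print(gene_methylation([(6, 1), (10, 0), (5, 3), (8, 2)], total_cg=10))
```

The same function can be applied separately to the exonic and intronic CGs of a gene to get the per-region proportions.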

[] Preview
code
"","id","chr","start","end","strand","pvalue","qvalue","meth.diff"
"1","scaffold1.105280","scaffold1",105280,105280,"+",1,0.72307975717804,8.33333333333333
"2","scaffold1.105289","scaffold1",105289,105289,"+",1,0.72307975717804,7.14285714285714
"3","scaffold1.154709","scaffold1",154709,154709,"+",0.0019663626474772,0.00796012720063931,-46.1538461538462
"4","scaffold1.154924","scaffold1",154924,154924,"+",1,0.72307975717804,5.95238095238095
code
Description: Intervals are all CGs with 10x coverage that were analyzed in both gill and sperm. Both tissues were mapped to oyster_v9_90. The methylKit analysis gives the p-value, q-value and % difference in methylation. The sperm is considered the 'control' in this analysis, so positive values in the meth.diff column indicate higher methylation in the sperm.
 * [Track] MethylKit analysis results - sperm methylation as Compared to gill by individual CG**

[] Preview:
code
id    chr    start    end    strand    pvalue    qvalue    meth.diff
1    scaffold1.105201.105300    scaffold1    105201    105300    *    0.544685352    0.57355062    7.317073171
2    scaffold1.130301.130400    scaffold1    130301    130400    *    4.99E-08    3.13E-07    -100
3    scaffold1.154701.154800    scaffold1    154701    154800    *    0.00013902    0.000525279    -44.69026549
4    scaffold1.154901.155000    scaffold1    154901    155000    *    0.647937411    0.631058554    9.523809524
5    scaffold1.155601.155700    scaffold1    155601    155700    *    2.39E-175    6.58E-172    -91.27868169
code
Description: Intervals are all 100bp tiles that were analyzed in both gill and sperm. Both tissues were mapped to oyster_v9_90. The methylKit analysis gives the p-value, q-value and % difference in methylation. The sperm is considered the 'control' in this analysis, so positive values in the meth.diff column indicate higher methylation in the sperm.
 * [Track] MethylKit analysis results - sperm methylation as Compared to gill by 100bp tile**
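The tile ids in the preview (for example scaffold1.105201.105300) suggest non-overlapping 1-based tiles that start at 1, 101, 201, and so on. Under that assumption, which is inferred from the preview rows rather than taken from the methylKit settings used, a CG position maps to its tile as in this Python sketch. `tile_for_position` is a hypothetical helper.

```python
def tile_for_position(pos, tile_size=100):
    """Return the 1-based inclusive (start, end) of the tile containing `pos`.

    ASSUMPTION: tiles are non-overlapping and the first tile starts at 1.
    """
    start = ((pos - 1) // tile_size) * tile_size + 1
    return start, start + tile_size - 1


# CGs from the per-CG preview and the tiles they would fall into
for cg in (105280, 154709, 154924):
    print(cg, tile_for_position(cg))
```

With this mapping, the per-CG rows at 105280 and 105289 both fall in the 105201-105300 tile listed in the tile preview.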

http://eagle.fish.washington.edu/cnidarian/TJGR_Gil_cov_CDS_stats_cv.txt
Preview
code
CGI_10011974 3.676751918 12 0.222693761 0.306395993 0.471904398 0 1.47826087 154.0178099
CGI_10014715 12.11476909 17 1.205523862 0.712633476 1.097963507 0 2.951219512 154.0712784
CGI_10021734 0.050157776 7 0.000121899 0.007165397 0.011040788 0 0.03125 154.0848123
CGI_10015964 0.028011204 5 7.45E-05 0.005602241 0.008633633 0 0.019607843 154.1103512
CGI_10004322 65.58826056 6 283.8299465 10.93137676 16.84725338 0 41.6056338 154.1183124
code
Description: Columns: ID, sum, CDScount, var, avg, stdev, min, max, cv
 * Summary Statistics for Gill RNAseq coverage on CDS, grouped by gene**
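The preview rows are consistent with avg = sum / CDScount, var = stdev squared, and cv = 100 × stdev / avg (for CGI_10011974: 100 × 0.471904398 / 0.306395993 ≈ 154.02). A minimal Python sketch of those per-gene statistics follows. `cds_coverage_stats` is a hypothetical name, and the choice of sample (n-1) variance is an assumption because the page does not say which estimator was used.

```python
import statistics


def cds_coverage_stats(values):
    """Summary statistics for the per-CDS coverage values of one gene."""
    avg = statistics.fmean(values)
    # ASSUMPTION: sample (n-1) variance; the original estimator is not stated
    var = statistics.variance(values) if len(values) > 1 else 0.0
    stdev = var ** 0.5
    return {
        "sum": sum(values),
        "CDScount": len(values),
        "var": var,
        "avg": avg,
        "stdev": stdev,
        "min": min(values),
        "max": max(values),
        # Coefficient of variation as a percentage of the mean
        "cv": 100 * stdev / avg if avg else float("nan"),
    }


print(cds_coverage_stats([1.0, 2.0, 3.0]))
```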