DNA methylation is important for gene regulation. The ability to accurately identify 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) gives us greater insight into potential gene regulatory mechanisms. Bisulfite sequencing (BS) is traditionally used to detect methylated Cs; however, BS has drawbacks. DNA is commonly damaged and degraded by the chemical bisulfite reaction, resulting in libraries that show GC bias and are enriched for methylated regions. To overcome these limitations, we developed an enzymatic approach to methylation detection, NEBNext™ Enzymatic Methyl-Seq (EM-Seq™), that minimizes DNA damage, resulting in longer fragments and minimal GC bias.
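To make the readout concrete, here is a minimal Python sketch of how conversion-based methods report methylation. It rests on simplifying assumptions: a single top-strand read, perfect alignment without indels, complete conversion, and no C/T SNPs. In both bisulfite and EM-Seq, unmodified C is deaminated and sequenced as T, while 5mC and 5hmC are protected and still read as C, so this readout alone does not separate 5mC from 5hmC. The functions `convert` and `call_methylation` are hypothetical illustrations, not part of any tool named in this abstract.

```python
def convert(reference: str, modified_positions: set[int]) -> str:
    """Simulate the top-strand readout after bisulfite or EM-Seq conversion.

    Unmodified C is deaminated (to U, sequenced as T); a C at a position in
    modified_positions (5mC or 5hmC) is protected and still reads as C.
    """
    return "".join(
        "T" if base == "C" and i not in modified_positions else base
        for i, base in enumerate(reference)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """Compare an aligned converted read with the reference at every reference C.

    Returns {position: True if C was retained (methylated), False if read as T}.
    Any other base at a reference C (sequencing error, SNP) is skipped.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls


ref = "ACGTCGACCG"
read = convert(ref, modified_positions={1, 8})
print(read)                         # ACGTTGATCG
print(call_methylation(ref, read))  # {1: True, 4: False, 7: False, 8: True}
```

Because a converted unmethylated C and a genuine C-to-T variant look identical on this strand, real callers also rely on the opposite strand and on known variants; the sketch ignores that.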
NA12878 Illumina libraries were prepared using bisulfite and EM-Seq methods. Libraries generated from DNA inputs ranging from 10 ng to 200 ng were sequenced on Illumina’s NovaSeq 6000. Reads were adapter-trimmed (trimadap) and aligned to GRCh38 using BWAMeth. Aggregate metrics such as GC bias and insert size distribution (Picard) were assessed before evaluating the methylation status of individual Cs (MethylDackel). MethylKit was used for correlation analysis. Compared with bisulfite-converted libraries, EM-Seq libraries have longer inserts, lower duplication rates, a higher percentage of mapped reads, and less GC bias. Global methylation levels are similar between EM-Seq and whole-genome bisulfite sequencing (WGBS) libraries, indicating that overall detection of methylated Cs is similar. However, CpG correlation plots showed higher correlation coefficients for EM-Seq, indicating that EM-Seq libraries are more consistent than WGBS across replicates and input amounts. GC bias and dinucleotide distribution analyses showed that EM-Seq has more even dinucleotide representation, whereas WGBS libraries over-represent AT-rich dinucleotides. EM-Seq’s more even coverage allows a higher percentage of CpGs to be assessed, leading to more consistent evaluation of methylation across key genomic features (TSS, CpG islands, etc.). EM-Seq is more robust than WGBS, works over a wide range of DNA input amounts, has superior sequencing metrics, and detects more CpGs.
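The global-methylation and replicate-correlation comparisons can be sketched as follows. This is a minimal Python illustration, not the MethylKit or MethylDackel implementation. It assumes per-CpG methylated and unmethylated read counts are already available (for example, exported from a per-C caller such as MethylDackel). It computes global methylation as pooled methylated calls over all calls and uses Pearson correlation on CpGs that pass a hypothetical minimum depth of 10 in both libraries. The abstract does not state which coefficient or coverage filter was actually used, and all function names and toy counts here are illustrative.

```python
import math

# Hypothetical per-CpG counts: (chrom, position) -> (methylated reads, unmethylated reads).
Counts = dict[tuple[str, int], tuple[int, int]]


def global_methylation(counts: Counts) -> float:
    """Pooled global CpG methylation: methylated calls divided by all calls."""
    meth = sum(m for m, _ in counts.values())
    total = sum(m + u for m, u in counts.values())
    return meth / total


def beta_values(counts: Counts, min_depth: int) -> dict[tuple[str, int], float]:
    """Per-CpG methylation fraction, keeping only CpGs covered by >= min_depth reads."""
    return {
        site: m / (m + u)
        for site, (m, u) in counts.items()
        if m + u >= min_depth
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (undefined if either input has zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(a: Counts, b: Counts, min_depth: int = 10) -> float:
    """Correlate per-CpG methylation over CpGs passing min_depth in both libraries."""
    beta_a = beta_values(a, min_depth)
    beta_b = beta_values(b, min_depth)
    shared = sorted(beta_a.keys() & beta_b.keys())
    return pearson([beta_a[s] for s in shared], [beta_b[s] for s in shared])


# Toy replicates; chr1:400 is too shallow in both and is excluded from the correlation.
rep1 = {("chr1", 100): (9, 1), ("chr1", 200): (1, 9), ("chr1", 300): (5, 5), ("chr1", 400): (2, 0)}
rep2 = {("chr1", 100): (18, 2), ("chr1", 200): (2, 18), ("chr1", 300): (7, 5), ("chr1", 400): (0, 3)}

print(global_methylation(rep1))                   # 0.53125
print(round(replicate_correlation(rep1, rep2), 3))  # 0.993
```

Requiring the depth threshold in both libraries is what links coverage to the correlation: CpGs that a library covers poorly drop out of the shared set, so more even coverage leaves more CpGs to compare.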