# EpiAnnotator Usage

#### 2018-04-03

This document presents an example usage of EpiAnnotator based on data from a scientific paper.

## Example Study & Data

Here we will use two lists of differentially methylated probes (DMPs) obtained from the supplementary material of a study on hepatocellular carcinoma (HCC) [1]. Methylation data were measured using the Illumina HumanMethylation450 BeadChip assay. Briefly, HCC samples were compared to adjacent non-tumoral tissues. In total, 66 pairs were tested using a two-sample t-test, and the obtained P-values were adjusted for multiple testing using the Bonferroni correction. Following this step, 130512 probes were identified as DMPs. The authors applied an additional cutoff of 20% methylation difference, and thus defined:

• a probe to be significantly hypermethylated if the difference between the means of HCC and normal samples is at least 0.2, and the adjusted P-value is no more than 0.05;
• a probe to be significantly hypomethylated if the difference between the means of HCC and normal samples is at most -0.2, and the adjusted P-value is no more than 0.05.

Applying the criteria defined above resulted in the generation of two lists: one consisting of 3921 hypermethylated probes, and another one of 696 hypomethylated probes.
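The two criteria above can be sketched in a few lines of Python. This is only an illustration of the thresholds described in the text, not the code used by the study authors; the function name `classify_probe` and the example values are hypothetical.

```python
from typing import Optional

DELTA_CUTOFF = 0.2  # 20% methylation difference, as described above
ALPHA = 0.05        # threshold on the Bonferroni-adjusted P-value


def classify_probe(mean_hcc: float, mean_normal: float, adj_p: float) -> Optional[str]:
    """Return 'hyper', 'hypo', or None (not a selected DMP) for one probe."""
    if adj_p > ALPHA:
        return None
    delta = mean_hcc - mean_normal  # difference between HCC and normal means
    if delta >= DELTA_CUTOFF:
        return "hyper"
    if delta <= -DELTA_CUTOFF:
        return "hypo"
    return None


# Hypothetical values for illustration only (not taken from the study).
examples = {
    "probe_a": (0.75, 0.25, 0.001),
    "probe_b": (0.25, 0.75, 0.001),
    "probe_c": (0.75, 0.25, 0.2),
}
for name, (hcc, normal, p) in examples.items():
    print(name, classify_probe(hcc, normal, p))
```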

## Formatting Data for Enrichment Analysis with EpiAnnotator

EpiAnnotator expects the user to submit two probe lists: a selected set of probes and a background set of probes. Each list needs to be stored in a text (.txt) file, with one probe ID per line, as follows:

cg17113117
cg04551318
cg05000860
cg00830435
cg16678111
cg06966242
cg15088574
cg11204987
cg05750824
cg04510459

This is the format supported by EpiAnnotator for an enrichment analysis. It is highly recommended to compress these files using gzip (or another archiving utility) to a .txt.gz format, in order to speed up the upload and computation.

Warning: The maximum file size supported for the upload on our server is 30MB. If your text file is above this limit, compressing it could overcome the size issue.
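For users who prefer scripting, a file in this format could be produced roughly as follows. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and interpreting 30MB as 30 * 1024 * 1024 bytes is an assumption.

```python
import gzip
import os
import tempfile

# Assumption: "30MB" is interpreted here as 30 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 30 * 1024 * 1024


def write_probe_list(probe_ids, path):
    """Write one probe ID per line to a gzip-compressed text file.

    Returns the size of the compressed file in bytes.
    """
    with gzip.open(path, "wt", encoding="ascii", newline="\n") as handle:
        for probe_id in probe_ids:
            handle.write(probe_id + "\n")
    return os.path.getsize(path)


def read_probe_list(path):
    """Read the probe IDs back, skipping empty lines."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        return [line.strip() for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "selected.txt.gz")
    size = write_probe_list(["cg17113117", "cg04551318", "cg05000860"], target)
    print("within upload limit:", size <= MAX_UPLOAD_BYTES)
```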

In the workflow described below, we are going to use the hypermethylated DMPs as a selected set. In this case, the background set should contain all probes that were tested for differential methylation but were not found to be hypermethylated. Importantly, probes in the selected set should not appear in the background.
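The relationship between the two sets amounts to a set difference. The sketch below assumes that the selected probes are a subset of all tested probes; the function name `build_background` is hypothetical.

```python
def build_background(tested_probes, selected_probes):
    """Return all tested probes that are not in the selected set.

    Assumption: every selected probe was among the tested probes.
    """
    tested = set(tested_probes)
    selected = set(selected_probes)
    unknown = selected - tested
    if unknown:
        raise ValueError(f"{len(unknown)} selected probes were never tested")
    background = tested - selected
    # By construction, the two sets do not share any probe.
    assert background.isdisjoint(selected)
    return sorted(background)


tested = ["cg17113117", "cg04551318", "cg05000860", "cg00830435"]
selected = ["cg04551318", "cg00830435"]
print(build_background(tested, selected))
```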

Generating such files from the supplementary material of Shen et al. is not a complex task, but it may require some knowledge in working with spreadsheet applications or programming. For users who wish to follow this tutorial without investing time in creating the lists themselves, the following table gives links to the generated files:

| Set | File |
| --- | --- |
| Hypermethylated | Hypermethylated_DMPs.txt.gz |
| Background for hypermethylated | Background_for_Hyper.txt.gz |

We also provide the corresponding selected and background sets based on hypomethylated DMPs (these are not used in the example):

| Set | File |
| --- | --- |
| Hypomethylated | Hypomethylated_DMPs.txt.gz |
| Background for hypomethylated | Background_for_Hypo.txt.gz |

The following sections outline the different steps of an enrichment analysis for methylation data with EpiAnnotator.

## Start a New Enrichment Analysis

1. Connect to the EpiAnnotator public server. This should take you to the Analysis section of the EpiAnnotator website. Make sure the selected option is New analysis on tracks in a databank.

2. In the left panel, under Databank, select the EpiAnnotator databank you want to use for your enrichment analysis. Here, we will use the databank EpiAnnotator (hg19).

3. Click the Start button. This will take you to the second tab, Annotation, which displays a list of all annotations contained in the databank. You can select multiple annotations. If the annotations you are looking for are not in the selected databank, you can add your own custom annotations beforehand (for details about custom annotation upload, see the section Add Custom Track). In our example, we will select, from the Roadmap repository, the 18-state chromatin state annotations for the HepG2 tumoral cell line (18-state ChromHMM HepG2) and for the liver (18-state ChromHMM Liver).

4. Once all the annotations of interest are selected, click on Service to open the third tab. It will prompt you for the parameters of your analysis.

5. Under Analysis title, type the name you want to give to your analysis.

6. Under Distance cut-off, set a non-negative whole number. The distance cutoff is the largest gap allowed between a probe in the selected or background set and a genomic region of a given annotation for the two to be considered associated. A cutoff of 0 indicates that a probe and a genomic region need to overlap to be considered associated.

7. Under Analysis type, select the type of analysis to be performed. In our case, the methylation data come from an Illumina HumanMethylation450 BeadChip platform, so we will select Enrichment (Infinium 450k).

8. This should display two new fields: one to load your selected set of probes and the other to load your background set of probes. Click on Browse for each field, and upload the file that contains the corresponding probe list.

9. Once both lists are loaded in EpiAnnotator, click on the Run analysis button. In a few seconds, EpiAnnotator will compute all overlaps between the selected and background sets of probes and the genomic regions contained in the selected annotations. You can follow the process by keeping an eye on the progress bar that appears at the bottom right corner of the page.

10. When the enrichment analysis is complete, click the Download analysis result button to save your results. Be aware that your results are also temporarily saved on the EpiAnnotator server under an anonymous ID displayed to the right of the download button. The identifier is constructed by appending a four-letter random sequence to the date and time of the completion of the analysis, for example: 2017-10-20-09-03-35-SNBI. It can be convenient to write down this ID, in order to easily access and share your results.
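To make the distance cutoff of step 6 more concrete, the sketch below shows one way the association rule could be expressed. It is not EpiAnnotator's implementation: it assumes that a probe is represented by a single genomic position, that regions are closed intervals on the same chromosome, and that the gap is measured to the nearest region boundary. All names are hypothetical.

```python
def gap(probe_pos: int, region_start: int, region_end: int) -> int:
    """Distance from a probe position to a closed interval (0 if inside it)."""
    if probe_pos < region_start:
        return region_start - probe_pos
    if probe_pos > region_end:
        return probe_pos - region_end
    return 0


def is_associated(probe_pos: int, region_start: int, region_end: int, cutoff: int = 0) -> bool:
    """Apply the distance cutoff: associated if the gap does not exceed it."""
    if not isinstance(cutoff, int) or cutoff < 0:
        raise ValueError("cutoff must be a non-negative whole number")
    return gap(probe_pos, region_start, region_end) <= cutoff


# With a cutoff of 0, only overlapping probes are associated.
print(is_associated(150, 100, 200))      # probe inside the region
print(is_associated(210, 100, 200))      # probe outside, cutoff 0
print(is_associated(210, 100, 200, 10))  # probe outside, within cutoff
```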

## Visualizing the Results

1. Once your results file is saved, click the Visualize results button to continue to the visualization interface.

2. Click on Labels: here you can replace the default labels of the annotations you selected for your enrichment with the text you would like to see displayed on your results figures. Click the “Apply” button to submit the new labels.

3. Click on Visualization: You will see various parameters appearing:

• Plot type: there are currently three plots available to display your results: "set overlaps", "fold enrichment", and "p-value and fold enrichment". The last one is a summary figure of the enrichment analysis results: it is a dot plot showing the significance and fold change for all chromatin states in all annotations. We will focus on this representation for our example. The section Results Interpretation below provides guidance on interpreting this figure.
• Image width and height: you can change the dimensions (in inches) of your figure before saving it. For our example, we choose a width of 4 and a height of 6 to save the p-value and fold enrichment figures.
• Export image: click this button to export your figure as a PDF file.

The procedure for loading the results of a previous analysis is described below.

1. In the first field, select the Load results from file option. This will display a new field.

2. Click on Browse and set the directory to where you saved your results file.

3. Click the Start button.

### Specify a Results ID

In case you did not save your results, but only your results ID, select Specify a results ID in the first field, and then paste the identifier in the field under Analysis results ID. For example, you can access this tutorial’s results by pasting: 2017-10-20-09-03-35-SNBI
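If you keep results IDs in your own notes or scripts, a quick format check can catch copy-and-paste errors before you submit an ID. The sketch below is based only on the ID format described in this tutorial (completion date and time followed by four letters); the function name `parse_results_id` is hypothetical, and accepting lowercase letters is an assumption.

```python
import re
from datetime import datetime

# Date and time (YYYY-MM-DD-HH-MM-SS) followed by a four-letter suffix.
RESULTS_ID = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})-([A-Za-z]{4})$")


def parse_results_id(results_id: str):
    """Split a results ID into its completion time and four-letter suffix."""
    match = RESULTS_ID.match(results_id.strip())
    if match is None:
        raise ValueError(f"not a valid results ID: {results_id!r}")
    completed = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
    return completed, match.group(2)


completed, suffix = parse_results_id("2017-10-20-09-03-35-SNBI")
print(completed.isoformat(), suffix)
```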

The subsequent steps are the same as those described at the beginning of this section. If you cannot access the tutorial’s results, you can download them here: Download Example’s Results.

## Results Interpretation

In our example, we used the list of hypermethylated DMPs as the selected set of probes. We ran an enrichment analysis for chromatin states annotations - 18-state ChromHMM HepG2 and 18-state ChromHMM Liver from Roadmap.

Note that HepG2 is a human cancer cell line, so its chromatin states might not reflect the genomic architecture of HCC tumors. In our tutorial, results related to the chromatin states of healthy liver tissue are probably easier to interpret. We use the following general rules when interpreting results from EpiAnnotator:

• The significance flag is a critical parameter to consider. If a result is significant, the contours of the dot on the summary plot are black. Otherwise, contours are displayed in grey.
• The fold change defines the degree of enrichment or depletion. It is encoded by color in a dot: red for enrichment and blue for depletion.
• The P-value is a measure of confidence in the obtained results. Small P-values indicate high confidence and are represented by large dots. EpiAnnotator calculates P-values after multiple testing correction using the Benjamini-Hochberg procedure.
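For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, here is a minimal sketch of the standard textbook version. It illustrates the correction itself and is not EpiAnnotator's own code; the input P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    # Indices of the P-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for four tested chromatin states.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.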

Looking at the results figure, we can see that some dots stand out for the chromatin states Quiescent (18), Bivalent Enhancers (15), Bivalent Transcription Start Sites (14), Weak Enhancers (11) and Flanking Transcription Start Sites Downstream (4).

• Quiescent chromatin state: The strong and significant depletion of the quiescent state in the healthy liver tissue confirms that the aberrant gain of methylation during tumor progression is unlikely to target regions that are already highly methylated in the normal state.
• Bivalent Enhancers and Transcription Start Sites: The influence of bivalent enhancers is largely unknown, though they are linked to the stem cell phenotype (Bernstein et al., Cell 2006) [2]. It is known that tumoral cells share some characteristics of stem cells (such as the potential immortality of stem cell lines and tumoral cell lines). The enrichment is especially strong in the liver chromatin states annotation and less pronounced in the HepG2 chromatin states annotation.
• Weak Enhancers: Notice that for the same chromatin state, there is a depletion in HepG2 and an enrichment in healthy liver tissue. A deeper comparison of the overlaps between these annotations and the hypermethylated sites in HCC might reveal underlying tumor-specific processes.
• Flanking Transcription Start Sites Downstream: There is an enrichment in healthy liver tissue for this chromatin state, suggesting functional relevance too.

As a conclusion of this interpretation, we can claim that EpiAnnotator is a convenient tool to quickly obtain relevant results for genomic and epigenomic analysis, and to provide additional information for hypothesis generation and testing.

## Add Custom Track

If you would like to go further in this tutorial and experiment with other enrichment analyses, you can add your own custom annotations to EpiAnnotator:

1. On the first tab, Analysis, select the Add track to a databank option in the first field.
2. In the second field, select the databank whose annotations you want to work with. It is important to double-check the genome assembly you used to generate your annotation (currently supported: hg38, hg19, mm10 and mm9).
3. Click the Start button. This will take you to the Track tab and display several fields, which can be filled in with your own information as follows:

• Repository is the name of the repository your annotation comes from (e.g. UCSC, ENCODE, Roadmap Epigenomics, …). This can also be a name you give to a series of annotations stemming from your work.
• Annotation is the name of your annotation.
• Class can be of different types: genes, transcripts, chromatin states, repeats class… it can also be left blank.
• Subclass is an additional field you can use to identify a subgroup of genomic regions you isolated in your annotation. This field is used to specify a cell type, in case an annotation is specific to one.
• Tissue should be the name of the tissue of origin for your annotation. It can be left blank if your annotation is not tissue-specific. Within the EpiAnnotator repositories, for example, tissue is not specified for annotations based on cell lines.
• Cell line should be the name of the cell line used to generate your annotation, in case an annotation is specific to one.
• Disease should be specified for annotations related to abnormal phenotypes.
• Sex can be “female”, “male” or left blank if the annotation is not sex-dependent.
• Version should be the release number, the date of the annotation creation, or the version of the tool used to generate it. This field is important for reproducibility; for example, it allows specifying a unique annotation from an online public repository.
• Additional is optional. It should contain additional information as a list of name=value pairs, separated by semicolons (;). If you are adding a chromatin states annotation, the field needs to contain the following pair:

States=1 Name_State1:2 Name_State2:3 Name_State3

In this case only 3 states are declared, but you must list as many states as are defined in the annotation.
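Before uploading, it can help to check that the Additional field is well formed. The sketch below parses the format described above under two assumptions: state entries are separated by colons, and each entry consists of a number, a space, and a name. The function names and the extra `Source=example` pair are hypothetical.

```python
def parse_additional(additional: str) -> dict:
    """Split 'name=value;name=value' into a dictionary of strings."""
    pairs = {}
    for item in additional.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        pairs[name.strip()] = value.strip()
    return pairs


def parse_states(states_value: str) -> dict:
    """Turn '1 Name_State1:2 Name_State2' into {1: 'Name_State1', ...}."""
    states = {}
    for entry in states_value.split(":"):
        number, _, label = entry.strip().partition(" ")
        states[int(number)] = label.strip()
    return states


field = "States=1 Name_State1:2 Name_State2:3 Name_State3;Source=example"
parsed = parse_additional(field)
print(parse_states(parsed["States"]))
```

Note that this simple parser would not handle state names that themselves contain a colon or a semicolon.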

## References

1. Shen J et al. Exploring Genome-Wide DNA Methylation Profiles Altered in Hepatocellular Carcinoma Using Infinium HumanMethylation 450 BeadChips. Epigenetics 8(1):34-43, 2013.
2. Bernstein BE et al. A Bivalent Chromatin Structure Marks Key Developmental Genes in Embryonic Stem Cells. Cell 125(2):315-26, 2006.