Main responsibilities in this PhD project will be sample handling, laboratory techniques (DNA isolation and sequencing), bioinformatics and interpretation of results. Further assignments will be scientific writing and presenting data at scientific meetings. The laboratory work will be performed at Ahus, and sequence data processing will take place at CRN once the data have been produced.
The NGS-based sequencing method to be used for genotyping HPV types has already been developed by the three-party collaboration, and its analytical parameters are currently being validated. The Illumina MiSeq (www.illumina.com), a desktop sequencer available at the HPV Reference Laboratory, can produce up to 50 million high-quality sequences per run. Well-established HPV primer sets, PGMY09/11 and MGP, are used for genotyping HPV types. These primers amplify a conserved region of the L1 gene in the HPV genome, allowing the detection of all relevant HPV infections and their genetic variants (Gravitt et al., 2000; Söderlund-Strand et al., 2009). With this method, the genetic diversity of all high-risk HPV types can be studied. The primer sets have been coupled to a barcoding system that allows up to 1500 samples to be sequenced at a time, making the analysis highly cost-effective.
The other NGS method, also developed by the three collaborators, is used for whole genome sequencing (WGS) of HPV type 16. This method allows investigation of genetic diversity across the whole HPV genome and determination of integration sites in the human genome. HPV DNA is frequently integrated into the host genome in cancers, which results in lifetime persistence of certain viral genes in the cell (zur Hausen, 2002). The method is so far limited to HPV type 16 but can easily be extended to cover other high-risk HPV types.
Two fundamentally different analysis pipelines that produce HPV type and variant information from the sequence data have been established at CRN. Preliminary analyses show that our HPV genotyping method offers high analytical sensitivity and type specificity.
Study design
Study 1
Study 1 is a prospective follow-up study of intra-patient HPV genetic variation. The material used in a previous publication (Tropé et al., 2012) will be analyzed in this study to compare twin samples collected at two time points. The collection of these samples in the HPV biobank at Ahus has previously received support from HSØ (Ref. no 2769112). Since both LBC and FFPE samples are collected at the second time point, HPV variability in these will also be compared.
Study 2
Study 2 is a case-control study of HPV genetic variation in HPV-positive samples from lesions of different severity and from cancers. A large collection of samples, comprising both liquid-based cytology (LBC) and formalin-fixed paraffin-embedded (FFPE) tissue, with approval from the Regional Committee for Medical Research Ethics (REK), is available to the project (see Ethical considerations). A random selection from each category will be analyzed.
Statistical power
The power calculation, which determines the required sample size, rests on the following assumptions. The threshold for calling a SNP is set to a minimum of 10 sequences, and the coefficient of variation is estimated at cv = 1. The type I error α for this genome-wide screen is set to 0.001 to adjust for multiple testing, and the number of samples needed to achieve 80% power was calculated accordingly. In Study 1, 250 twin samples taken at two time points will suffice given that there is a 1.5-fold change in SNP number. Similarly, in Study 2, 370 high-grade lesions vs. 400 normal and 370 low-grade lesions will suffice given that the fold change in the number of SNPs is >1.5.
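The proposal does not state which formula was used for the power calculation. As an illustration only, the sketch below applies one commonly used normal approximation for sample size with count data, n = 2 (z_{1-α/2} + z_{power})² (1/depth + cv²) / (ln fold-change)², using the assumptions listed above (minimum of 10 sequences, cv = 1, α = 0.001, 80% power, 1.5-fold change). The function name, the choice of a two-sided test and the formula itself are assumptions of this sketch, not the authors' documented method, so it is not expected to reproduce the quoted sample sizes exactly.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, fold_change, alpha=0.001, power=0.80):
    """Approximate samples per group for detecting a fold change in counts.

    Hypothetical helper: uses a normal approximation for count data,
    assuming a two-sided test. Not taken from the proposal itself.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance_term = 1.0 / depth + cv ** 2          # sampling noise + biological variation
    effect = math.log(fold_change) ** 2            # squared log fold change
    n = 2 * (z_alpha + z_power) ** 2 * variance_term / effect
    return math.ceil(n)


# Inputs as stated in the Statistical power section
n_required = samples_per_group(depth=10, cv=1.0, fold_change=1.5)
print(f"Approximate samples per group: {n_required}")
```

Under this approximation, a larger assumed fold change or a less stringent α lowers the required sample size, which is why the fold-change threshold and the multiple-testing adjustment drive the study sizes.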