Original Research and Commentary
Has the HPV vaccine approval ushered in an era of over-prevention?

Catherine Riva 1 and Jean-Pierre Spinosa, MD 2

1 Investigative journalist; Re-Check co-founder, Winterthur, Switzerland (catherine.riva@re-check.ch)
2 Gynecologic and breast surgical oncologist, Lausanne, Switzerland (spinosa@deckpoint.ch)

Abstract
Gardasil (Merck's quadrivalent HPV vaccine) is the first vaccine in history to have been granted the FDA's accelerated approval and fast track. Using unpublished documents and data, we investigated the impact of US regulators' choices on the quality of available evidence regarding the vaccine's efficacy in preventing high-grade cervical lesions, which are precursors of cervical cancer. We found that, as early as 2001, the accelerated approval and fast track procedures prompted FDA advisory committees to make methodological choices such that only weak claims could be made regarding the vaccine's efficacy and to approve a product whose benefit-to-harm ratio cannot be appropriately assessed. By giving more weight to the HPV vaccine's hypothetical promises rather than to compliance with best methodological principles, regulatory authorities' decisions turned out to be more favorable to commercial interests than to public health thereby allowing HPV vaccine manufacturers to escape the usual burden of proof while generating huge profits. Published and unpublished results of pre-marketing trials strongly suggest that introduction of the vaccine will not lead to the expected reduction in the incidence of high-grade cervical lesions, let alone of cervical cancer.
The available HPV vaccines do not target all high-risk HPV strains. Consequently, screening must be maintained. The marketing of Gardasil has thus inaugurated a new form of medical overuse in the field of prevention: the introduction of a low-value primary prevention measure (vaccination) whose effectiveness can never be completely assessed since the secondary prevention measure (screening) cannot be removed. Meanwhile, health authorities promote the product and society bears the costs of vaccination campaigns and health risks. This is a concerning outcome. Such over-prevention creates a societal and individual burden of unnecessary medical expansion that undermines science.

Background
Gardasil, Merck's quadrivalent HPV (human papillomavirus) vaccine, has been touted as a significant breakthrough in cervical cancer prevention and women's health, offering the potential to reduce the incidence and mortality from this disease by at least two-thirds. 1,2 It is the first vaccine for which the U.S. Food and Drug Administration (FDA) has granted accelerated approval and fast track. Gardasil targets two high-risk HPV strains (HPV 16 and HPV 18) that are found in approximately 70% of cervical cancer cases. 3,4,5 According to the International Agency for Research on Cancer, at least 11 other high-risk HPV types can cause cervical cancer. 6 Pre-marketing randomized controlled trials (RCTs) have provided solid evidence that Gardasil is very effective in preventing high-grade cervical lesions (CIN, precursors of cervical cancer) linked to HPV 16 and HPV 18 in women not previously infected. 7 But we still lack strong evidence from RCTs that HPV vaccination leads to an expected reduction in the overall incidence of high-grade CIN and of cervical cancer, although this is the most relevant issue in terms of public health.
In the last decade, observational studies around the world attempted to clarify this question, but their results are conflicting: some studies show a reduction of the high-grade CIN incidence, 8,9,10 others do not. 11,12,13 Several of these studies' authors have disclosed conflicts of interest with the companies that market Gardasil (Merck, MSD, CSL and Sanofi Pasteur MSD). Their results should be considered carefully since causality cannot be assessed from observational studies, and CIN incidence can be influenced by many factors other than vaccination, including socioeconomic status, hormonal contraceptive, tobacco smoking or sexual habits, 6,14 as well as vaginal microbiota composition, 15 screening intensity, test performance and diagnostic assessment. 16
This unsatisfactory situation raises questions as to why better data are not available, especially from pre-marketing RCTs. This predicament has led us to examine the regulatory circumstances under which RCTs were designed and Gardasil's evaluation criteria were selected. To do so, we have reviewed files available on the FDA website and documents obtained through a Freedom of Information Act (FOIA) request (i.e., meeting minutes, background documents, emails and statistical data analysis plans) to investigate how FDA advisory committees selected assessment criteria, and what impact this has had on the quality of evidence available today regarding Gardasil's efficacy.
FDA advisory bodies initiated the clinical evaluation process of Gardasil in November 2001, when the Vaccines and Related Biological Products Advisory Committee (VRBPAC) of the Center for Biologics Evaluation and Research (CBER) met to set the guidelines for Gardasil's premarketing trials and granted an accelerated approval. 17 In 2002, CBER granted a "fast track" designation to Merck's Gardasil development program. 18 Gardasil is the first vaccine in history to have received such approval and designation. Later, CBER and VRBPAC evaluated the data submitted by the manufacturer during the Phase III trials (FUTURE studies). 19 Our examination of FDA documents shows that significant shortcomings occurred during the FDA approval process, affecting the following prior steps: (i) decisions on accelerated approval and fast track; (ii) choice of surrogate endpoint; (iii) choice of primary and secondary endpoints; (iv) statistical data analysis plans; and (v) availability of trial results.

Methods
To retrace decisions made prior to and during Gardasil's approval (1997 to 2008), we searched FDA archives for regulations, briefing and background documents, PowerPoint presentations, VRBPAC meetings' minutes, statistical data analysis plans, CBER clinical and statistical reviews of the biologics license application (BLA) for the quadrivalent HPV vaccine, and approval letters. We submitted a FOIA request for access to FDA documents and correspondence between the FDA and Merck. Since CBER mentioned in a list of appendices for a clinical study report (CSR) that, in 2005, changes were made to the planned efficacy analysis prior to unblinding, 20 […] 22 and Statistical Principles for Clinical Trials (ICH E9). 23 We compared published and unpublished efficacy results that we were able to find in CBER background documents as well as clinical and statistical reviews.

Results and Analysis

(i) Decisions on accelerated approval and fast track
The FDA states that accelerated approval was instituted to allow "drugs for serious conditions that filled an unmet medical need to be approved based on a surrogate endpoint," which enables the FDA to "approve these drugs faster." 24 As for fast track, it is "a process designed to facilitate the development, and expedite the review of drugs to treat serious conditions and fill an unmet medical need." 25 In the fast track context, the FDA defines an unmet medical need as follows:
"Filling an unmet medical need is defined as providing a therapy where none exists or providing a therapy which may be potentially superior to existing therapy. Any drug being developed to treat or prevent a disease with no current therapy obviously is directed at an unmet need. If there are existing therapies, a fast track drug must show some advantage over available treatment, such as: showing superior effectiveness, avoiding serious side effects of an available treatment, improving the diagnosis of a serious disease where early diagnosis results in an improved outcome; decreasing a clinically significant toxicity of an accepted treatment." 26
In the case of Gardasil, the "unmet medical need" criterion was not met. In the early 2000s, cervical cancer was certainly a serious disease in the U.S.A.; however, it was already preventable via cervical screening. 27 Since its introduction in North America, screening with Pap smear tests has been correlated with a significant (over 70%) decline in the incidence of cervical cancer and its associated mortality. 28 There was no increase in the number of cases at the time, and the full potential of screening was far from being completely exploited. 29 Furthermore, vaccination has never been compared to screening in pre-marketing trials. Cervical cancer screening can cause harm by leading to false-positive and false-negative findings, as well as overdiagnosis and overtreatment. 30 Since Gardasil does not prevent all HPV infections involved in the development of cervical cancer, screening must be maintained, and Gardasil therefore does not solve the problems associated with screening.
In November 2001, VRBPAC members met with FDA and CDC experts to discuss some key assessment criteria for Gardasil's pre-marketing trials and "the use of the accelerated approval regulations". 17 However, the participants did not deliberate on whether or not Gardasil fit the accelerated approval criteria; rather, they discussed whether Gardasil could be approved based on a surrogate endpoint (see next section) and granted accelerated approval. One year later, in 2002, CBER granted a "fast track" designation to Merck's Gardasil development program. 31 To the best of our knowledge, there is no other document available that describes CBER's decision to grant Gardasil a fast track designation in 2002.

(ii) Choice of surrogate endpoint
From a public health perspective, the goal of an HPV vaccination program is to reduce the overall incidence of cervical cancer and its associated mortality. But cervical cancer takes decades to develop, which represents a major obstacle to conducting trials. Cervical cancer evolves through a series of precursor lesions (cervical intraepithelial neoplasia or CIN, see Figure 1), graded 1 to 3, 32 which make it preventable 33 when high-grade CIN (i.e., CIN 2 and 3) are detected and treated.
In the U.S.A., where Pap test screening has been performed for decades, women diagnosed with CIN2 are offered treatment that may include cryotherapy, laser therapy, loop electrosurgical procedure (LEEP) or cone biopsy to remove or destroy the abnormal tissue at the surface of the cervix. 34 Therefore, should an HPV vaccine clinical trial participant develop a high-grade cervical lesion, it would be unacceptable to let the precursor evolve into cancer without intervention. As specified in the ICH E6 guidelines, a clinical trial investigator is supposed to "ensure that adequate medical care is provided to a subject for any adverse events, including clinically significant laboratory values, related to the trial." An HPV vaccine clinical trial must thus define which precursor lesions to use as a surrogate endpoint (instead of cervical cancer) in order to provide evidence of efficacy without denying ethical treatment to participants who develop high-grade CIN.
During their November 2001 meeting, VRBPAC members discussed possible surrogate endpoints for Gardasil's premarketing trials with FDA and CDC experts. 17 The composite surrogate endpoint CIN 2/3 entered the discussion at the very beginning of the meeting, when an FDA expert (Karen Goldenthal) asked the attendees to "discuss and identify the most appropriate endpoints for traditional approval of HPV vaccine intended to prevent cervical cancer." She cited the following potential endpoints: 17 […]
None of the attendees insisted on choosing CIN3+ as the best surrogate endpoint. However, as we discuss below, there is evidence that CIN3+ is a better predictor of cervical cancer than CIN2.

(iii) Choice of primary and secondary endpoints
The choice of a primary endpoint is crucial to the quality of a clinical trial. According to the ICH E9 guidelines, the primary variable should be the variable capable of providing the most clinically relevant and convincing evidence directly related to the trial's primary objective. 23 McLeod and colleagues also argue that the primary endpoint should be "meaningful to the clinicians, patients and policymakers that are the end-users of evidence (generated by the trial)." 35 Pre-marketing trials submitted to the FDA for licensing purposes may also include secondary endpoints (or variables) that should be pre-defined in the protocol. Their importance and role in the interpretation of trial results should be explained.
Prior to the November 2001 VRBPAC meeting, participants received a briefing document which stated: 36
"Limiting the primary endpoint to HPV types represented in the vaccine will likely result in a higher vaccine efficacy estimate than if the endpoint reflected disease caused by all HPV types (…) However, prevention of all cervical cancer associated with HPV is the ultimate goal of an HPV vaccine. Therefore, it will also be important in HPV vaccines trials to conduct pre-specified secondary analyses to assess efficacy of the vaccine for the chosen endpoints (e.g. all CIN 2/3), regardless of the HPV type implicated. Such secondary analyses have the potential to address questions that can be important to the overall risk-benefit assessment, such as 'replacement' disease caused by non-vaccine types or other infectious diseases."

[Emphasis Added]
The VRBPAC meeting's participants ultimately concurred with the use of CIN2/3, AIS, or cervical cancer (i.e., CIN2/3 or worse) by histology, with virology to determine the associated HPV type, as the primary endpoint. They followed the briefing document, reasoning that if the vaccine prevents CIN2/3 associated with HPV 16 and HPV 18, it can be expected to prevent roughly 50-70% of all CIN2/3, regardless of HPV type. They assumed that assessing the efficacy of the vaccine for CIN2/3 regardless of HPV type in a secondary analysis would be sufficient to confirm the findings. Furthermore, they agreed with a participant (Thomas Fleming) who stated that accelerated approval would allow only assessment of "type specific outcomes" (i.e., effect on the incidence of CIN2/3 related to HPV 16/18), whereas a standard procedure could allow a "validation of global benefit" (i.e., effect on the overall incidence of CIN2/3 regardless of the associated HPV). Thus, the overall incidence of CIN2/3 (irrespective of HPV type) was chosen as a secondary endpoint. The only objections came from the consumer representative (Barbara Loe Fisher) and from an invited attendee (Martin Myers, from the National Vaccine Program Office). Both worried about the unintended effects on women already infected with HPVs 16 and 18.

(iv) Statistical data analysis plans (DAP)
VRBPAC's decision on the trials' endpoints was translated by Merck into a statistical data analysis plan (DAP). To the best of our knowledge, the first version of the DAP was released in July 2003. 37,38 In this document, the lower incidence of CIN2/3 and invasive cancers associated with HPVs 16 and 18 was set as the primary endpoint for premarketing trials. The 2003 DAP also stipulated that Merck would deliver, in an exploratory analysis, the results obtained in the intention to treat (ITT) and per protocol (PP) populations for all CIN 2/3, regardless of the HPV type involved.
The DAP was amended in August 2005; this new version no longer mentioned that the sponsor would provide the measure of prevention of all high-grade lesions regardless of the HPV type in the PP population. Instead, the amended DAP introduced an exploratory analysis of a new subgroup called the "Restricted Modified Intention To Treat 2" (RMITT-2) population, which would include: 39
"all subjects who are seronegative and PCR negative at enrollment to all vaccine HPV types, who are PCR negative at enrollment for the non-vaccine HPV types for which PCR assays are available AND who have normal Pap test result at enrollment."
The amended DAP argued that this "supporting" analysis of the secondary endpoint would be conducted "to assess the impact of the vaccine on this endpoint from a population benefit point of view" and was "intended to provide a 'real world' estimate of the impact of the vaccine with regard of CIN 2/3 or worse among baseline HPV-naïve women." In the same document, Merck stated: "Assay data for the non-vaccine HPV types will not be available at the interim analysis to allow for the type-specific estimation of the secondary endpoints." The RMITT-2 subgroup was defined after randomization and after data collection (a posteriori), unlike the per protocol (PP) population, which was pre-specified before the start of the Phase III trials (study 013 and study 015), that is, before randomization. Hence, this new RMITT-2 analysis should be considered a post-hoc or post-randomization subgroup analysis. However, the provisional RMITT-2 analysis results were misrepresented by Merck's head of the clinical program for Gardasil (Eliav Barr) during the May 2006 VRBPAC meeting, where the results of the interim analysis were presented and at the end of which approval was voted: 19
"to try and get as many cases as possible, we pre-specified that we would do this in the HPV naive MITT population. Statistical criteria for success was, this was a pre-specified exploratory evaluation and these are the results." [Emphasis Added]
Eliav Barr argued that since the assays were not yet available: 19
"the best that we can do at this stage is a population that includes women who are predominantly HPV naive, but still have CIN 2/3 and infection at baseline that was not picked up on the Pap test."

(v) Effect of amended DAP on available results
The results of the overall Gardasil efficacy analysis (prevention of all CIN2/3 lesions irrespective of HPV type) in the ITT population have been published, but not the results of the overall vaccine efficacy analysis (prevention of all CIN2/3 lesions related to any high-risk HPV type) in the PP population. The latter are unknown. The only available results that can give us an idea of the overall Gardasil efficacy in the PP population are those of an exploratory analysis that CBER explicitly asked Merck to provide at the May 2006 VRBPAC meeting. This "additional efficacy analyses" was conducted in a "Subgroup of subjects meeting the PP population for all four vaccine-relevant HPV types" (see Table 3 and Table 5). Tables 1-3 summarize the ITT and the PP results, as well as the results for the "Subgroup of subjects meeting the PP population for all four vaccine-relevant HPV types" (FUTURE studies combined) available to date.
Tables 1-3 show that results for the primary endpoint diverged from results for the secondary endpoint. While the vaccine showed a 44% efficacy in the ITT population in preventing CIN2/3 related to HPVs 16 and 18 and a near 100% efficacy in the PP population, results for all high-risk HPV-associated CIN2/3 were dramatically below the expected rates. Given the 100% efficacy in the PP population in preventing CIN2/3 related to HPVs 16 and 18, Gardasil should have prevented around 50-70% of CIN2/3. 40,18 In the ITT analysis (4 studies combined), results show only 18%. 7 In the "Subgroup of subjects meeting the 'per protocol' population for all four vaccine-relevant HPV types" (3 studies combined), this percentage was even lower (16.9%) and did not reach statistical significance. 31 Results of study 015, the largest pre-marketing trial, show the same trend (see Table 5). Indeed, while the vaccine showed a 100% efficacy in the PP population in preventing CIN2/3 related to HPVs 16 and 18, the efficacy in preventing CIN2/3 related to any HPV was dramatically lower (14.4%) and not statistically significant in the "Subgroup of subjects meeting the 'per protocol' population for all four vaccine-relevant HPV types".
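The gap between expected and reported overall efficacy can be made explicit with a back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the trial documents. It assumes a simple proportional model in which the vaccine has no effect, positive or negative, on lesions caused by non-vaccine HPV types (no cross-protection, no viral replacement). The function name `expected_overall_efficacy` is hypothetical.

```python
def expected_overall_efficacy(type_specific_efficacy: float,
                              share_due_to_vaccine_types: float) -> float:
    """Expected reduction in all CIN2/3 under a simple proportional model.

    Assumption (not from the trial documents): the vaccine has no effect
    on lesions caused by non-vaccine HPV types.
    """
    for value in (type_specific_efficacy, share_due_to_vaccine_types):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be fractions between 0 and 1")
    return type_specific_efficacy * share_due_to_vaccine_types


# Figures quoted in the text: 100% efficacy against HPV 16/18-related CIN2/3
# in the PP population, with HPV 16/18 expected to account for 50-70% of CIN2/3.
expected_low = expected_overall_efficacy(1.0, 0.50)
expected_high = expected_overall_efficacy(1.0, 0.70)

# Overall efficacy estimates reported in the text, expressed as fractions.
observed = {
    "ITT, 4 studies combined": 0.18,
    "PP-type subgroup, 3 studies combined": 0.169,
    "PP-type subgroup, study 015": 0.144,
}

print(f"expected range: {expected_low:.0%} to {expected_high:.0%}")
for label, value in observed.items():
    shortfall_points = (expected_low - value) * 100
    print(f"{label}: observed {value:.1%}, "
          f"{shortfall_points:.1f} percentage points below the lower bound")
```

Under these assumptions, every reported overall estimate falls more than 30 percentage points short of the lower expected bound of 50%.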
Furthermore, the PP population was clearly not defined the same way as the "Subgroup of subjects meeting the PP population for all four vaccine-relevant HPV types". Whereas the VRBPAC background document provided a precise definition of PP population ("Received all three vaccinations; Seronegative at day 1 and PCR-negative at day 1 and at month 7 to the appropriate HPV types; Did not deviate from the protocol; Clinical endpoints were counted beginning one month after the third dose (month 7)"), it just briefly mentioned that the "subgroup of subjects meeting the PP population for all four vaccine-relevant HPV types" was a "subgroup of subjects that did not have prior exposure to vaccine-relevant HPV and had normal baseline Pap tests". Looking at study 015, as shown in Tables 4-5, the number of participants in the subgroup labeled "per protocol population" for primary analysis (n Gardasil 5301/n Placebo 5258) did not match the number of participants in the subgroup labeled "Subgroup of subjects meeting the 'per protocol' population for all four vaccine-relevant HPV types" for secondary analysis (n Gardasil 3899/n Placebo 3703). This difference makes the results difficult to compare, since the "Subgroup of subjects meeting the PP population for all four vaccine-relevant HPV types" included nearly 3000 fewer participants than the PP population. In the available documents, we could not find any justification for the difference. This issue was not discussed during the May 2006 meeting.
But the results mentioned above were not the only ones that did not match the expected rates. Regarding the RMITT-2 exploratory analysis, Eliav Barr had to admit: 19
"(…) At this stage in our clinical trial, 55 percent of the CIN 2/3 lesions were 16 and 18 related. So, our expected efficacy would be at least 55 percent. But what we saw, as we expected, was that efficacy was a bit lower, 38 percent, slightly higher for the individual components. And this is because we couldn't exclude all of that baseline HPV infection, all the baseline disease caused by nonvaccine types."
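The two comparisons made in this passage come down to simple arithmetic, sketched below using only the figures quoted in the text. The variable names are illustrative.

```python
# Study 015 arm sizes quoted in the text.
pp_population = {"Gardasil": 5301, "Placebo": 5258}      # primary analysis
pp_type_subgroup = {"Gardasil": 3899, "Placebo": 3703}   # secondary analysis

# Participants in the PP population who are absent from the subgroup, per arm.
difference_by_arm = {
    arm: pp_population[arm] - pp_type_subgroup[arm] for arm in pp_population
}
total_difference = sum(difference_by_arm.values())
share_missing = total_difference / sum(pp_population.values())

print(difference_by_arm)        # {'Gardasil': 1402, 'Placebo': 1555}
print(total_difference)         # 2957
print(f"{share_missing:.1%}")   # 28.0%

# RMITT-2 interim figures quoted at the May 2006 meeting, in percent.
expected_efficacy_pct = 55   # share of CIN 2/3 lesions related to HPV 16/18
observed_efficacy_pct = 38
print(expected_efficacy_pct - observed_efficacy_pct)   # 17 (percentage points)
```

The subgroup used for the secondary analysis is thus about 28% smaller than the PP population used for the primary analysis, and the difference is not the same in the two arms.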

Discussion
Our examination shows that there were significant shortcomings in the FDA advisory bodies' evaluation of the Gardasil Phase III trials, including: (i) questionable decisions on accelerated approval and fast track; (ii) the choice of the composite surrogate endpoint CIN 2/3; (iii) a choice of primary and secondary endpoints that did not allow the vaccine's overall efficacy to be assessed; (iv) changes to the statistical data analysis plan that were misrepresented to the VRBPAC; and (v) relevant trial results that remain unpublished (selective reporting).
(i) The accelerated approval and fast track decisions had problematic consequences. The question of whether to grant accelerated approval prompted VRBPAC members in 2001 to forgo the most relevant endpoint as the primary endpoint. In addition, the fast track designation made Merck's vaccine eligible for priority review and rolling review, "which means that a drug company can submit completed sections of its Biologic License Application (BLA) or New Drug Application (NDA) for review by FDA, rather than waiting until every section of the NDA is completed before the entire application can be reviewed." 41
(ii) Rees and colleagues have analyzed the negative consequences of the FDA's decision to choose the composite surrogate endpoint CIN2/3. 42 They argue that since "CIN2 is often misclassified due to its diagnosis having lower reproducibility and validity," it cannot be assumed that vaccine efficacy against CIN2/3 will translate into efficacy against CIN3+. They also contend that "CIN3 is generally agreed to be the best marker for risk of cervical cancer" and "composite endpoints in intervention studies involving CIN2 are sub-optimal" since CIN2 lesions "may not be good predictors of progression." 42 Furthermore, in a recent systematic review and meta-analysis, Tainio and colleagues show that in the case of CIN2, "active surveillance rather than immediate intervention appear justified, especially among young women," considering that "most CIN2 lesions, particularly in young women (<30 years), regress spontaneously." 43 In addition, treatment can cause harms such as recurrence of high-grade CIN, premature delivery, major and minor bleeding and infection, pelvic inflammatory disease, and damage to other organs. 44 Hence, choosing CIN3+ as a surrogate endpoint would have been ethically defensible.
By choosing the composite surrogate endpoint CIN2/3, the 2001 VRBPAC meeting's attendees failed to consider the warnings provided by ICH E9 guidelines "about the […]" In the case of CIN2/3 as a surrogate endpoint, the two final criteria mentioned above were not met. Moreover, Gardasil's pre-marketing trials may have overestimated the vaccine's efficacy. Rees and colleagues noted that CINs may have been overdiagnosed because cervical cytology was assessed at 6- and 12-month intervals rather than the normal 36-month screening interval. 42 It is unclear from the documents we have consulted when the decision regarding investigation intervals was made. The 2001 VRBPAC meeting transcripts suggest that ethical and feasibility considerations may have led to the choice of shorter investigation intervals. One VRBPAC member, Dixie E. Snider, pointed out that a shorter investigation interval would have allowed them to identify numerous cervical lesions that regressed spontaneously, whereas a longer interval would have given more specificity. But he argued that longer investigation intervals might have missed the high-grade lesions that would rapidly progress to cancer and, above all, would raise "this whole issue of compliance in clinical trials. The longer you wait, the more you signal that this is not all that important and women start dropping out and they don't come in for that two-year visit." 17
(iii) Regarding the choice of primary and secondary endpoints, VRBPAC and CBER experts should have considered that Gardasil only targets HPV 16 and 18 infections and cannot be assumed to decrease the incidence of global high-grade cervical lesions (related to any HPV type). Indeed, other high-risk strains may replace HPVs held in check by the vaccination (viral replacement). 45,46 In other words, it is only if a global reduction of high-grade lesions related to all high-risk HPV types can be demonstrated (rather than simply those related to HPV16 and HPV18), that it may be possible to conclude that HPV vaccination will have the desired long-term clinical benefit of reducing the overall incidence of cervical cancers and associated mortality. The viral replacement hypothesis was only briefly mentioned during the November 2001 VRBPAC meeting. When asked about it, the FDA expert (Karen Goldenthal) responded: "I didn't see evidence from the literature that removing, let's say, type 16 would be more likely to cause persistence of other types." A CDC expert (William Reeves) agreed: "I think there's not going to be a rush of other types to replace it." The same happened during the May 2006 meeting. When Monica Farley, meeting chair, asked the question "at least hypothetically or the concern would be, replacement if we eliminate 16 and 18, will it be replaced" and if there were "differences in which types of HPV they were infected with," CBER medical officer Nancy Miller stated: "I'm not aware of differences. I know that we -they did not test for the non-vaccine HPV types, so I don't believe -I don't have that information about which other types they might have been infected with." Choosing all high-grade lesions irrespective of HPV type (ideally all CIN3+ irrespective of HPV type) as a primary endpoint would have helped avoid the pitfall of viral replacement by providing a clinically-relevant measure of a possible overall public health benefit. It would also have provided information about the vaccine's efficacy on the incidence of global high-grade lesions. Consequently, the November 2001 VRBPAC meeting participants could have chosen a secondary endpoint that might demonstrate a biologically-plausible mechanism of action and ensure that any observed decrease in the overall number of high-grade lesions had occurred as a result of the vaccine.
The latter could have been ensured by determining the lesions' HPV type that occurred in the vaccine and control arms. By deciding that the overall CIN2/3 incidence (irrespective of HPV type) would become a secondary endpoint, meeting participants decreased the validity of the clinically-relevant measure of an overall public health benefit. Secondary endpoint results only allow weak claims and there is still no consensus about the way in which such results should be interpreted, especially when they are divergent from the primary endpoint results, as was the case for Gardasil. 47,48,49,50 In retrospect, VRBPAC's and CBER's decision to limit the vaccine's primary efficacy endpoint to HPVs 16 and 18 CIN2/3 appears very problematic from scientific and public health points of view, although very advantageous for Merck, as its pre-marketing trials were required to show only partial efficacy, and not overall efficacy, in preventing CIN2/3.
(iv) It is difficult to understand why CBER accepted the amended DAP. First, because post-hoc subgroup analyses have more limitations than pre-specified analyses, the new analysis would have provided less reliable results, even though the sponsor planned to adjust for multiplicity. 51 As recalled by Desai and colleagues, post-randomization subgroup analyses are not recommended, as the potential for biased treatment estimates is high. 52 Desai et al. identified the following pitfalls: (a) increased type I error rates resulting from testing multiple hypotheses; (b) increased type II error rates caused by testing hypotheses for which the study was not designed; (c) incorrect application of statistical tools for assessing heterogeneity across subgroups; (d) testing data-driven (as opposed to pre-specified) hypotheses; (e) performing subgroup analyses when overall findings are negative; (f) considering hypotheses not justified by biology; and (g) selective reporting.
Second, due to the lack of assay data for HPV types not targeted by the vaccine, the new RMITT-2 analysis could not be conducted properly, given that Merck would not have been able to provide the purported "real world" estimate at the planned interim analysis.
In light of the fast track context, the reliability of the planned interim analysis was a crucial issue because if its results were judged sufficient, it could lead not only to approval, but also to the early termination of the trials. Early termination would make long-term assessment of the vaccine efficacy on all CIN2/3 irrespective of HPV type impossible. This is precisely what happened when the blind was broken: vaccination was offered to all participants in the Fall of 2005. 53
Finally, the amended DAP did not mention the treatment group interaction test that was described in the 2003 DAP. Interaction tests allow for the examination of the strength of evidence for treatment differences between subgroups and are considered the most useful approach for evaluating subgroup analyses. 54 CBER did not adhere to the ICH E3 and ICH E9 guidelines, both of which stress the importance of paying attention to differences between the planned and actual analyses (ICH E9, p. 32; ICH E3, p. 17).
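The first pitfall listed by Desai and colleagues under point (iv), inflated type I error rates from testing multiple hypotheses, can be illustrated numerically. The sketch below is a generic illustration, not a reconstruction of Merck's DAP. It assumes independent tests, and the significance level of 0.05 and the numbers of tests are hypothetical values chosen for the example. The Bonferroni threshold is shown as one common multiplicity adjustment, not necessarily the one planned by the sponsor.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05  # hypothetical nominal significance level

for n_tests in (1, 3, 5, 10):  # hypothetical numbers of subgroup analyses
    fwer = familywise_error_rate(ALPHA, n_tests)
    per_test = bonferroni_threshold(ALPHA, n_tests)
    print(f"{n_tests:>2} tests: family-wise error rate {fwer:.3f}, "
          f"Bonferroni per-test threshold {per_test:.4f}")
```

With ten unadjusted tests at the 0.05 level, the chance of at least one spurious "significant" finding is about 40%, which is why exploratory subgroup results carry less weight than a pre-specified primary analysis.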
(v) During the May 2006 approval meeting, CBER and VRBPAC members failed to address the contrast between, on one hand, the ITT results and the results for the "Subgroup of subjects meeting the 'per protocol' population for all four vaccine-relevant HPV types" that were far below expectations and, on the other hand, the interim RMITT-2 results that better matched expectations. This should have raised questions, especially considering the last-minute DAP amendment. The PP results remain unknown and the results for the "Subgroup of subjects meeting the 'per protocol' population for all four vaccine-relevant HPV types" (as noted in Table 3 and Table 5) have never been published and suggest limited global efficacy.
Furthermore, CBER seems to have applied a double standard regarding exploratory analyses. On this matter, the ICH E9 guidelines point out that: "When exploratory, these [subgroup] analyses should be interpreted cautiously; any conclusion of treatment efficacy (or lack thereof) or safety based solely on exploratory subgroup analyses are unlikely to be accepted."
CBER applied this rule strictly when it came to assessing the analysis results of the "Subgroup of subjects meeting the PP population for all four vaccine-relevant HPV types", diligently underlining the limitation of subgroup analyses: "In the subgroup of subjects that did not have prior exposure to vaccine-relevant HPV and had normal baseline Pap tests, there appeared to be a modest efficacy of approximately 20% against CIN 2/3 or worse due to any type HPV. We again note the important limitations of a subgroup analysis where imbalances in baseline demographics could account for differences in the subgroup efficacy determinations. The degree to which cases of CIN 2/3 or worse due to HPV types not associated with Gardasil™ might offset its efficacy against vaccine-relevant HPV types has not been fully elucidated in these studies.
[…] The applicant proposed a plan to identify the HPV types other than 6, 11, 16, or 18 from the studies' clinical specimens."
The same rigor was applied when analyzing results of exploratory subgroup analyses for study 013, suggesting that subjects who were seropositive and PCR-positive for the vaccine-relevant HPV types had a greater number of CIN 2/3 or worse: "This demonstrated a limitation of the evaluation of small subgroups, where subgroups might have imbalances in baseline demographic characteristics. In this case, it appeared that subjects in this subgroup of study 013 who received Gardasil™ might have had enhanced risk factors for development of CIN 2/3 or worse compared to placebo recipients."
We searched the Gardasil regulatory documents for such rigorous assessment of the limitations of the RMITT-2 results but could not find any. As the results came from a post-hoc analysis, such an assessment would have been more pressing.
Certainly, FDA advisory bodies did not rely solely on this exploratory post-hoc subgroup analysis and/or the numbers forecasted by Merck during the May 2006 VRBPAC meeting ("we expect at the end of phase III to have a complete estimate of the efficacy of the vaccine, probably close to the 55 percent that we anticipate or maybe even greater"), but rather on the nearly 100% efficacy Gardasil demonstrated on the primary endpoint (HPV 16- and 18-related CIN2/3) in uninfected women. Still, the exploratory post-hoc subgroup analysis played a major role in making VRBPAC members and CBER confident that Gardasil would significantly reduce the global incidence of high-grade cervical lesions and therefore reduce the incidence of cervical cancer. To explain the disappointing ITT and "Subgroup of subjects meeting the 'per protocol' population for all four vaccine-relevant HPV types" results regarding Gardasil's efficacy against CIN2/3 irrespective of high-risk HPV type, CBER considered only the "imbalances in baseline demographics" hypothesis, and did not provide evidence to support it.
Had FDA advisory bodies complied with ICH guidelines, they might have appraised the RMITT-2 results in a much more critical manner and assessed the possibility that the disappointing ITT and "Subgroup of subjects meeting the 'per protocol' population for all four vaccine-relevant HPV types" results might indicate that viral replacement was occurring. This assessment would have seriously challenged the benefit that might be expected from the vaccine and therefore its benefit-to-harm ratio, which consequently should have suspended the approval. 55

On top of this, between 2006 and 2008, CBER continued to rely on RMITT-2 results. Unpublished documents show that during this period, i.e. long after the data were seen, several RMITT-2 analyses were performed with slightly different inclusion and exclusion criteria, as summarized in Table 6 and Table 7. Such practices should be considered data dredging, which is known to be deployed when investigators are eager to produce more favorable results than those of pre-specified analyses. As pointed out by the University of Oxford Centre for Evidence-Based Medicine's Catalogue of Bias, data-dredging can include: 56 assessing models with multiple combinations of variables and selectively reporting the 'best' model (i.e., 'fishing'); making decisions about whether to collect new data on the basis of interim results; making post-hoc decisions about which statistical analyses to conduct; and generating a hypothesis to explain results which have already been obtained but presenting it as if it were a hypothesis one had prior to collecting the data (i.e., HARKing, 'hypothesizing after the results are known'). In general, these procedures are acceptable when transparently reported; however, when authors neglect to accurately report how the results were in fact generated, they are rightfully classified as data-dredging.
To the best of our knowledge, how the results of the RMITT-2 analyses were generated was never transparently reported, either by the FDA or by the journal in which they were published.

Conclusion
In sum, as early as 2001, FDA advisory bodies reviewing Gardasil's pre-marketing trials made methodological choices that would inevitably prevent them from making strong claims regarding the vaccine's efficacy. In 2006, the FDA approved the vaccine despite the substantial weaknesses of the interim results. 57 Two years later, in 2008, when Merck provided the results for the RMITT-2 subgroup (published in 2010 by Munoz et al. 58 ), CBER realized that the tests failed to demonstrate the expected overall efficacy 59 and that the RMITT-2 results did not match the numbers forecasted by Merck's head of the clinical program for Gardasil during the May 2006 VRBPAC meeting. Notwithstanding this realization by CBER, the FDA did not suspend Gardasil's approval. The results of the RMITT-2 analysis were included in Gardasil's package insert 60 without any mention of the fact that it was a post-hoc subgroup analysis, the results of which should at least be considered with caution. Ultimately, because of selective outcome reporting and multiple post-hoc subgroup analyses, and contrary to the claims made by pre-marketing trials (FUTURE) steering committee members, 58 it is not possible to derive any meaningful conclusions about the overall HPV vaccine efficacy from the available RCT data. Results of published and unpublished phase III trials strongly suggest that introducing HPV vaccination will not lead to the expected reduction in the overall incidence of high-grade cervical lesions, let alone in the number of cervical cancer cases.
These FDA decisions greatly influenced other regulators and public health bodies. In 2007, EMA/EMEA and Swissmedic approved Gardasil on the same basis, and Cervarix and Gardasil 9 were marketed according to the same benchmarks. The FDA decision also prompted public health bodies to consider results from a post-hoc subgroup analysis as a sufficient basis to introduce Gardasil into routine adolescent vaccination schedules. Consequently, public health authorities, the medical community, and the public have been deprived of the possibility of gaining unbiased insights into the efficacy of Gardasil in preventing high-grade cervical lesions.
Furthermore, the HPV vaccine's benefit-to-harm ratio cannot be appropriately assessed. First, cervical cancer screening must be maintained because available vaccines do not target all high-risk HPV strains. Second, the introduction of HPV vaccination has prompted health authorities and medical societies to revise screening guidelines and has led to the use of HPV testing (which is about to replace Pap smear tests), 61 despite serious issues with false positives and overdiagnosis. 62 Thus, the continuation of screening coupled with guideline changes affects the reliability of post-marketing studies and makes it difficult, if not impossible, to properly address the vaccine impact and viral replacement issues that were raised by pre-marketing trials and other publications. 63,64,65

Moreover, even though they are publicly funded, vaccination programs do not solve the problem of low-income women being more at risk of cervical cancer. Observational studies indicate that socioeconomic status remains a major factor in nonadherence to cervical cancer screening as well as in nonadherence to HPV vaccination. 66,67,68

By giving more weight to the HPV vaccine's hypothetical promises than to compliance with best methodological principles, regulatory authorities have made decisions that turned out to be more favorable to commercial interests than to public health. 69 As pointed out by health economist Alain Enthoven, "increasing medical inputs will at some point become counterproductive and produce more harm than good." 70 The HPV vaccine produces precisely such a pattern by promoting an intervention that has no proven benefits and is potentially harmful and unnecessarily costly. The approval process described above has allowed HPV vaccine manufacturers to escape the burden of proof while generating huge profits: Gardasil sales grew from $1.7 billion in 2014 to $3.7 billion in 2019. 71

The marketing of Gardasil has thus inaugurated a new form of medical overuse in the field of prevention: the introduction of a low-value primary prevention measure (vaccination) whose effectiveness can never be completely assessed since the secondary prevention measure (screening) cannot be removed. Meanwhile, health authorities promote the product and society bears the costs of vaccination campaigns and health risks. This is a concerning outcome. Similar to the concepts of overdiagnosis and medicalization, such over-prevention is a societal and individual burden of unnecessary medical expansion that undermines science. 72,73

In view of the many regulatory oversights in the approval of Gardasil, an independent, non-industry funded reassessment of the clinical study reports of all available HPV vaccines (including anonymized individual patient data) is urgently required.

Limitations
Our analysis has several limitations. First, our approach cannot be compared to prior research since, to the best of our knowledge, we are the first to attempt to review Gardasil's approval by analyzing FDA bodies' decisions and choices against ICH guidelines. Future research is needed to confirm, refine or disprove our approach and our findings.
Second, we requested documents from the FDA based on our initial research on the Gardasil approval, which is detailed in our investigative book. 74 We cannot rule out the possibility that our selection criteria were too restrictive or that the FDA holds other documents whose contents could contradict our analysis and interpretation. Furthermore, the meeting minutes we reviewed reveal only part of the discussions that FDA bodies' experts held regarding Gardasil's approval and do not allow us to reconstruct all the details of the approval process. As a result, the facts we report might be incomplete.