Introduction

Globally, testicular cancer is the most common type of cancer among men aged 14-44.1 One in every 270 American men will develop testicular cancer, and its incidence has been on the rise since 1992.2–4 The highest rates of testicular cancer occur in the United States, with up to 10,000 new cases identified each year.5 While treatment is highly successful, with an estimated 97% 5-year survival rate,3 the majority of cases occurring in young men require chemotherapy or surgery, which can impact their fertility and quality of life.6 Therefore, studies involving novel treatments for testicular cancer that preserve fertility in young men must be high-quality, reproducible, and portrayed in an unbiased manner.

Physicians often rely on the abstracts of research articles to guide their treatments; thus, a physician’s reading of an abstract directly shapes that physician’s interpretation of treatment efficacy.7 Because abstracts may be used to guide medical therapy, authors must state their findings clearly and objectively, both in full-text articles and in their corresponding abstracts. Consequently, a growing body of research is being conducted on spin in medical literature. “Spin” is defined as “a specific way of reporting, intentional or not, to highlight that the beneficial effect of the experimental treatment, in terms of efficacy or safety, is greater than that shown by the results.”8 One study, published in the Journal of Clinical Oncology, examined how clinicians interpret results in abstracts in the field of cancer.7 The researchers selected a sample of randomized controlled trials (RCTs) with statistically nonsignificant primary outcomes and spin in the abstract, then created two versions of each abstract: the original with spin and a rewritten abstract without spin. Blinded to the study’s hypothesis, 300 clinicians were randomly and evenly assigned to one of these two groups. This study found that for abstracts containing spin, clinicians rated the treatment as more beneficial and were more interested in reading the full-text article.7 Despite the nonsignificant outcome, the presence of spin within the abstract directly affected the clinicians’ rating of a study and their interpretation of its results. Proper interpretation and further evaluation of a study can be hindered if the full text is inaccessible. Other studies have found at least a 30% prevalence of spin in systematic reviews in the fields of emergency medicine, ophthalmology (especially cataract therapies), addiction medicine (including cannabis use disorder and alcohol use disorder), and physiotherapy (particularly low back pain).8–13

Systematic reviews provide a complete source of evidence-based medicine, as they offer clinicians a comprehensive synthesis of the available findings on a specific treatment.14 Since researchers collate and analyze many randomized controlled trials for systematic reviews and meta-analyses, these reports are often considered the gold standard for evidence-based medicine and are used to create clinical practice guidelines (CPGs).15 Since a change in standards for CPGs in 2011, no practice recommendation has appeared in a CPG without being supported by a systematic review.16 Spin within such fundamental research articles can lead to misinterpretation and poses the risk of producing suboptimal patient outcomes if these misinterpretations are acted upon. Accordingly, this study aims to identify the presence of spin in the abstracts of systematic reviews on testicular cancer screening, treatment, and quality of life.

Methods

Overview

In accordance with the U.S. Federal Policy for the Protection of Human Subjects, oversight by an institutional review board was not necessary as this study did not directly include human subjects. To ensure the reproducibility and transparency of our study, our protocol (https://osf.io/zd8xt/), spin and AMSTAR-2 extraction forms, data analysis scripts, and other study resources were uploaded to the Open Science Framework.17 Additionally, an independent research team re-analyzed our data and analysis transcripts in a masked fashion. The methods described within our study are also described in similar concurrent studies that evaluated systematic reviews for spin and assessed method qualities in their respective fields.9–12 We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)18,19 and Murad and Wang’s18,19 guidelines for meta-epidemiological studies to draft this manuscript.

Search Strategy

Search strategies for the MEDLINE (Ovid) and Embase (Ovid) databases were created to locate systematic reviews and meta-analyses focused on the screening for, treatment of, and quality of life after testicular cancer (Figure 1).

Figure 1. Search queries.
Ovid MEDLINE:

  1. exp Testicular Neoplasms/
  2. ((testi* or testes) adj2 (cancer* or neoplasm* or tumo$r*)).mp.
  3. 1 or 2
  4. exp "Systematic Review"/
  5. exp Meta-Analysis/
  6. ("systematic review" or "meta-analysis" or (systematic* adj1 review*)).ti,ab.
  7. 4 or 5 or 6
  8. 3 and 7
Ovid Embase:

  1. exp testis tumor/
  2. ((testi* or testes or tunica) adj2 (cancer* or neoplasm* or tumo$r*)).mp.
  3. 1 or 2
  4. exp "systematic review"/
  5. exp meta analysis/
  6. ("systematic review" or "meta-analysis" or (systematic* adj1 review*)).ti,ab.
  7. 4 or 5 or 6
  8. 3 and 7
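
The numbered lines in each search combine earlier result sets with Boolean operators: line 3 is the union of lines 1 and 2, line 7 is the union of lines 4 to 6, and line 8 is their intersection. The sketch below illustrates only that combination logic with Python sets. The record IDs are hypothetical placeholders, not actual search results, and the proximity and truncation operators in lines 2 and 6 are not modelled.

```python
# Hypothetical record IDs standing in for the hits of each numbered search line.
line1 = {101, 102, 103}   # subject heading for testicular neoplasms
line2 = {103, 104, 105}   # free-text proximity search
line4 = {102, 201}        # subject heading for systematic reviews
line5 = {104, 202}        # subject heading for meta-analyses
line6 = {105, 203}        # title/abstract keywords

line3 = line1 | line2            # "1 or 2": any testicular cancer record
line7 = line4 | line5 | line6    # "4 or 5 or 6": any review or meta-analysis record
line8 = line3 & line7            # "3 and 7": records satisfying both concepts

print(sorted(line8))
```

Only records that appear in both the topic set and the study-design set survive to the final line, which is why line 8 is the set passed on to screening.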

A medical librarian designed the search strategies and performed these searches on June 2, 2020. The retrieved records were uploaded to Rayyan, a systematic review screening platform, where duplicates were removed.20 Following identical procedures, two investigators independently determined eligibility by screening titles and abstracts. When discrepancies emerged, the investigators resolved them by consensus.

Eligibility Criteria

Predetermined eligibility criteria served as the basis for study selection, described as follows: (1) systematic review with or without a meta-analysis; (2) related to the treatment of, screening for, or quality of life after testicular cancer; (3) subjects included only human biological males; and (4) written in English. These selection criteria were decided according to the definition of systematic reviews and meta-analysis derived from PRISMA.21

Training

Before initial title and abstract screening, two investigators completed online training, offered by Johns Hopkins and taught by Li and Dickersin on Coursera, on systematic reviews and meta-analyses.22 Investigators also participated in training, held online and in person over two consecutive days, on the definition and interpretation of the nine most severe types of spin in systematic review abstracts, detailed by Yavchitz and colleagues.8 In addition, both investigators were trained on using A Measurement Tool to Assess Systematic Reviews (AMSTAR-2) to assess methodological quality.23 A detailed description and outline of the training regimen is included in our protocol.

Data Extraction

Independently, two reviewers extracted data in a masked fashion using a validated Google form. The reviewers evaluated the studies meeting inclusion criteria for the presence of the nine most severe types of spin in their abstracts. The definitions of spin (detailed in Table 1) originated in a study by Yavchitz and colleagues.8 Next, the reviewers rated the methodological quality of each publication for 16 distinct and specific criteria using AMSTAR-2.23

Table 1. Spin types and frequencies (%) in abstracts (n=50).

| Nine most severe types of spin | No. (%) of abstracts containing spin |
| --- | --- |
| 1) Conclusion contains recommendations for clinical practice not supported by the findings. | 3 (6) |
| 2) Title claims or suggests a beneficial effect of the experimental intervention not supported by the findings. | 0 (0) |
| 3) Selective reporting of or overemphasis on efficacy outcomes or analysis favoring the beneficial effect of the experimental intervention. | 9 (18) |
| 4) Conclusion claims safety based on non-statistically significant results with a wide confidence interval. | 2 (4)* |
| 5) Conclusion claims the beneficial effect of the experimental treatment despite high risk of bias in primary studies. | 3 (6) |
| 6) Selective reporting of or overemphasis on harm outcomes or analysis favoring the safety of the experimental intervention. | 2 (4) |
| 7) Conclusion extrapolates the review’s findings to a different intervention (i.e., claiming efficacy of one specific intervention although the review covers a class of several interventions). | 0 (0) |
| 8) Conclusion extrapolates the review’s findings from a surrogate marker or a specific outcome to the global improvement of the disease. | 1 (2) |
| 9) Conclusion claims the beneficial effect of the experimental treatment despite reporting bias. | 3 (6) |

* 43 abstract conclusions did not mention safety

In previous studies, AMSTAR-2 inter-rater reliability was established to be moderate to high. Its construct validity was also high when compared with the original AMSTAR (r = 0.91) and the Risk of Bias in Systematic Reviews instrument (r = 0.84).24 The quality of each included systematic review was rated as high, moderate, low, or critically low quality.
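
For readers unfamiliar with how the four overall ratings arise, the following is a minimal sketch of the rating scheme suggested in the published AMSTAR-2 guidance, in which items 2, 4, 7, 9, 11, 13, and 15 are treated as critical domains. It is an illustration rather than the extraction form used in this study. The function name `overall_rating` is hypothetical, and counting only a "no" response as a flaw or weakness is an assumption, since the guidance leaves the handling of "partial yes" to the appraiser.

```python
# Items the AMSTAR-2 guidance designates as critical domains.
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}


def overall_rating(responses):
    """Map item-level responses to an overall confidence rating.

    `responses` maps item number (1-16) to "yes", "partial yes", "no", or "n/a".
    Assumption: only "no" counts as a flaw (critical item) or weakness (other item).
    """
    flawed = {item for item, answer in responses.items() if answer == "no"}
    critical_flaws = len(flawed & CRITICAL_ITEMS)
    noncritical_weaknesses = len(flawed - CRITICAL_ITEMS)

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"  # no weaknesses, or a single non-critical weakness


# Hypothetical review: no protocol statement (item 2), no list of excluded
# studies (item 7), and no funding sources for included studies (item 10).
example = {item: "yes" for item in range(1, 17)}
example.update({2: "no", 7: "no", 10: "no"})
print(overall_rating(example))  # critically low
```

Under this scheme, two "no" responses on critical items are enough to place a review in the lowest category, regardless of how it performs on the remaining items.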

Using previous literature as guidance,25–27 the following specific study characteristics were also extracted: (1) type of intervention (surgery, pharmacologic, non-pharmacologic, combination, and other); (2) the date the review was received by the journal; (3) sources of funding for each review (industry, private, public, hospital, combination of funding not including industry, combination of funding including industry, other, none, and not mentioned); (4) whether the review adhered to PRISMA19 or PRISMA for abstracts28; (5) whether the journal requires PRISMA adherence; and (6) the journal’s 5-year impact factor. After completion of data extraction, both reviewers were unmasked, and any discrepancies in spin and AMSTAR-2 ratings were resolved through discussion. If agreement could not be reached, a third author acted as a mediator to reach a consensus on the spin and AMSTAR-2 ratings.
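
The unmasking step amounts to comparing two independent extraction records and listing the fields on which they differ. The sketch below shows one way such a comparison could be done. The function `find_discrepancies`, the review identifiers, and the field names are all hypothetical and are not taken from the study's extraction form.

```python
def find_discrepancies(reviewer_a, reviewer_b):
    """Return {review_id: {field: (value_a, value_b)}} for every field that differs."""
    discrepancies = {}
    for review_id in sorted(reviewer_a.keys() & reviewer_b.keys()):
        a, b = reviewer_a[review_id], reviewer_b[review_id]
        differing = {
            field: (a.get(field), b.get(field))
            for field in sorted(a.keys() | b.keys())
            if a.get(field) != b.get(field)
        }
        if differing:
            discrepancies[review_id] = differing
    return discrepancies


# Hypothetical extractions from two masked reviewers.
reviewer_a = {
    "review_01": {"spin_type_3": True, "amstar2": "low"},
    "review_02": {"spin_type_3": False, "amstar2": "critically low"},
}
reviewer_b = {
    "review_01": {"spin_type_3": True, "amstar2": "critically low"},
    "review_02": {"spin_type_3": False, "amstar2": "critically low"},
}

to_discuss = find_discrepancies(reviewer_a, reviewer_b)
print(to_discuss)
```

Each entry in the result is an item for discussion between the two reviewers, with unresolved items passed to the third author.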

Statistical Analysis

Descriptive statistics characterized the frequency of spin and its subtypes, and the results were reported as frequency counts and percentages. Stata 16.1 (StataCorp LLC, College Station, TX) was used for all analyses, as recorded in our protocol. An a priori power calculation suggested that 185 articles would be needed to identify associations between spin and study characteristics based on results from a previous study26; however, only 50 systematic reviews were available for analysis after screening. Given this limited number, we explored the associations between spin and study characteristics using chi-square tests, or Fisher’s exact test when 40% of the cell counts were less than 5.
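
The choice between the two tests can be illustrated on the 2 × 2 counts reported in Table 2. The sketch below is not the Stata analysis script. It is a standard-library Python approximation that assumes an uncorrected Pearson chi-square test with one degree of freedom, and it assumes the 40% rule applies to expected (not observed) cell counts, with "at least 40%" as the cutoff. The function names are hypothetical.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_of_small_cells(table, threshold=5):
    """Fraction of cells whose expected count falls below `threshold`."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def chi_square_2x2(table):
    """Uncorrected Pearson chi-square statistic and p-value for a 2 x 2 table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: No / Yes; columns: abstract contains no spin / abstract contains spin (Table 2).
prisma_mentioned = [[26, 9], [8, 7]]
journal_recommends = [[16, 8], [18, 8]]

for name, table in [("PRISMA mentioned", prisma_mentioned),
                    ("Journal recommends PRISMA", journal_recommends)]:
    use_fisher = share_of_small_cells(table) >= 0.40  # assumed reading of the 40% rule
    stat, p = chi_square_2x2(table)
    print(f"{name}: chi2 = {stat:.2f}, p = {p:.2f}, Fisher preferred: {use_fisher}")
```

Under these assumptions the two p-values round to 0.15 and 0.85, in line with the chi-square entries in Table 2. The characteristics with more than two categories would need an exact test for larger tables, which is not sketched here.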

Results

Our database search returned 900 articles; of them, 266 duplicates were removed, with an additional 554 articles excluded during the initial title and abstract screening phase (see Figure 2). During full-text analysis, another 30 articles were excluded. In total, 50 systematic reviews and meta-analyses from 40 unique journals were included for data extraction. The full details of exclusion can be found in Figure 2. Thirty-three (of 50, 66%) of our included articles were exclusively systematic reviews, while 17 (of 50, 34%) performed a meta-analysis within their systematic review. Each article title and its corresponding journal, as well as year of publication, can be found in Supplemental Table 1.
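
The selection counts can be checked with a short tally. The variable names below are illustrative. The numbers are those reported in the paragraph above.

```python
records_identified = 900
duplicates_removed = 266
excluded_title_abstract = 554
excluded_full_text = 30

screened = records_identified - duplicates_removed        # records left after de-duplication
full_text_assessed = screened - excluded_title_abstract   # records read in full
included = full_text_assessed - excluded_full_text        # reviews entering data extraction

systematic_review_only = 33
with_meta_analysis = 17

print(screened, full_text_assessed, included)
print(100 * systematic_review_only / included, 100 * with_meta_analysis / included)
```

The tally gives 634 records screened, 80 assessed in full text, and 50 included, of which 66% were systematic reviews only and 34% included a meta-analysis.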

Figure 2. Flow diagram of study selection.

The most common intervention type was mixed (e.g., surgical and pharmacologic, 23/50, 46%), followed by non-pharmacological (10/50, 20%), pharmacological (8/50, 16%), surgery (6/50, 12%), and educational (3/50, 6%). Thirty-five systematic reviews did not state that they adhered to PRISMA guidelines (35/50, 70%); however, 26 journals recommended PRISMA adherence in their submission guidelines (26/50, 52%). Of the funded studies, public funding was the most common source (6/50, 12%). No systematic reviews received industry funding, and 12 systematic reviews did not receive any funding (12/50, 24%). Over half of the studies did not mention a funding source (28/50, 56%). The average 5-year journal impact factor was 6.258 (SD: 6.4, range: 0.818 to 28.349). We found no significant association between any of the included study characteristics and the presence of spin. Table 2 presents all study characteristics and the presence of spin within each of the characteristic categories.

Table 2. General characteristics of systematic reviews and meta-analyses (n=50).

| Characteristics | Total, No. (%) | Abstract contains no spin, No. (%) | Abstract contains spin, No. (%) | P-value |
| --- | --- | --- | --- | --- |
| Intervention type | | | | 0.13a |
| Mixed | 23 (46) | 19 (38) | 4 (8) | |
| Non-pharmacologic | 10 (20) | 7 (14) | 3 (6) | |
| Pharmacologic | 8 (16) | 3 (6) | 5 (10) | |
| Surgery | 6 (12) | 3 (6) | 3 (6) | |
| Education | 3 (6) | 2 (4) | 1 (2) | |
| Study mentions adherence to PRISMA | | | | 0.15b |
| No | 35 (70) | 26 (52) | 9 (18) | |
| Yes | 15 (30) | 8 (16) | 7 (14) | |
| Publishing journal recommends adherence to PRISMA | | | | 0.85b |
| No | 24 (48) | 16 (32) | 8 (16) | |
| Yes | 26 (52) | 18 (36) | 8 (16) | |
| Funding source | | | | 0.56a |
| Not funded | 12 (24) | 6 (12) | 6 (12) | |
| Private | 3 (6) | 2 (4) | 1 (2) | |
| Combination of funding including industry | 0 (0) | 0 (0) | 0 (0) | |
| Combination of funding not including industry | 1 (2) | 1 (2) | 0 (0) | |
| Public | 6 (12) | 4 (8) | 2 (4) | |
| Not mentioned | 28 (56) | 21 (42) | 7 (14) | |
| Industry | 0 (0) | 0 (0) | 0 (0) | |
| AMSTAR-2 rating | | | | 1.0a |
| High | 0 (0) | 0 (0) | 0 (0) | |
| Moderate | 3 (6) | 2 (4) | 1 (2) | |
| Low | 8 (16) | 6 (12) | 2 (4) | |
| Critically low | 39 (78) | 26 (52) | 13 (26) | |

aFisher’s exact test
bPearson’s chi-squared test

Spin in Abstracts

Of the 50 systematic reviews and meta-analyses included in our investigation, 16 (32.0%) contained spin in the abstract. In total, 23 instances of spin were identified in our sample because several abstracts contained more than one type of spin. Spin Type 3 (“selective reporting of or overemphasis on efficacy outcomes or analysis favoring the beneficial effect of the experimental intervention”)8 was the most prevalent spin type and occurred in nine abstracts (out of 50, 18%). However, because only seven abstract conclusions mentioned safety measures, Spin Type 4 had the highest occurrence frequency relative to the abstracts in which it could occur (2/7, 28.6%). The most severe type of spin, Type 1 (“conclusion contains recommendations for clinical practice not supported by the findings”),8 was present in 6% of the abstracts (3/50). No abstracts contained Spin Types 2 or 7 (see Table 1). There was no significant association between the presence of spin and the following study characteristics: the study’s intervention type, whether a systematic review mentioned PRISMA adherence, whether the journal recommends adherence to PRISMA, or the systematic review’s funding source. The logistic regression analysis found no relationship between spin and a journal’s 5-year impact factor (OR: 1.01, 95% CI: 0.91-1.11; Table 2).
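
The relationship between the per-type counts in Table 1 and the summary figures above can be reproduced with a short tally. The variable names are illustrative. The counts are those reported in Table 1 and its footnote.

```python
# Number of abstracts containing each of the nine spin types (Table 1).
spin_counts = {1: 3, 2: 0, 3: 9, 4: 2, 5: 3, 6: 2, 7: 0, 8: 1, 9: 3}

n_abstracts = 50
abstracts_with_spin = 16
conclusions_without_safety = 43  # Table 1 footnote

total_instances = sum(spin_counts.values())
most_common_type = max(spin_counts, key=spin_counts.get)
pct_by_type = {t: 100 * c / n_abstracts for t, c in spin_counts.items()}

# Spin Type 4 can only occur where the conclusion mentions safety.
eligible_for_type4 = n_abstracts - conclusions_without_safety
type4_rate = 100 * spin_counts[4] / eligible_for_type4

# Instances beyond one per abstract, i.e. additional spin types within the same abstract.
extra_instances = total_instances - abstracts_with_spin

print(total_instances, most_common_type, round(type4_rate, 1), extra_instances)
```

The 23 instances across 16 abstracts imply seven additional spin types spread over abstracts that already contained one, and restricting the denominator for Spin Type 4 to the seven eligible conclusions is what raises its rate from 4% to 28.6%.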

Methodological Quality Evaluation Using AMSTAR-2 Rating

Based on the AMSTAR-2 ratings of the 50 included studies, three were rated as moderate quality (6%), eight were rated as low quality (16%), and 39 were rated as critically low quality (78%). No studies were appraised as high quality. There was no significant association between methodological quality and the presence of spin in the abstract. All AMSTAR-2 data are presented in Table 3.

Table 3. AMSTAR-2 Items and Frequency of Responses (N = 50).

| AMSTAR-2 Item | Yes, No. (%) | No, No. (%) | Partial Yes, No. (%) |
| --- | --- | --- | --- |
| 1) Did the research questions and inclusion criteria for the review include the elements of PICO? | 49 (98) | 1 (2) | 0 (0) |
| 2) Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol? | 4 (8) | 36 (72) | 10 (20) |
| 3) Did the review authors explain their selection of the study designs for inclusion in the review? | 18 (36) | 32 (64) | 0 (0) |
| 4) Did the review authors use a comprehensive literature search strategy? | 1 (2) | 19 (38) | 30 (60) |
| 5) Did the review authors perform study selection in duplicate? | 22 (44) | 28 (56) | 0 (0) |
| 6) Did the review authors perform data extraction in duplicate? | 11 (22) | 39 (78) | 0 (0) |
| 7) Did the review authors provide a list of excluded studies and justify the exclusions? | 1 (2) | 30 (60) | 19 (38) |
| 8) Did the review authors describe the included studies in adequate detail? | 12 (24) | 6 (12) | 32 (64) |
| 9) Did the review authors use a satisfactory technique for assessing the risk of bias (RoB) in individual studies that were included in the review? | 3 (6) | 15 (30) | 2 (4) |
| 10) Did the review authors report on the sources of funding for the studies included in the review? | 4 (8) | 46 (92) | 0 (0) |
| 11) If meta-analysis was performed, did the review authors use appropriate methods for statistical combination of results? | 2 (4)* | 1 (2)* | 0 (0) |
| 12) If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis? | 0 (0)* | 3 (6)* | 0 (0) |
| 13) Did the review authors account for RoB in primary studies when interpreting/discussing the results of the review? | 12 (24) | 38 (76) | 0 (0) |
| 14) Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review? | 17 (34) | 33 (66) | 0 (0) |
| 15) If they performed quantitative synthesis did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review? | 0 (0)* | 3 (6)* | 0 (0) |
| 16) Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review? | 35 (70) | 15 (30) | 0 (0) |

* 33 articles did not perform a meta-analysis.

Discussion

Our results indicated that spin was present in approximately one-third of testicular cancer systematic reviews’ or meta-analyses’ abstracts (16/50, 32%). Our findings are similar to two recent studies from our research team on spin in systematic review abstracts pertaining to erectile dysfunction and acne vulgaris. Reddy et al. focused on erectile dysfunction and reported that 31.4% (32/102) of systematic reviews’ or meta-analyses’ abstracts contained spin.29 A study conducted by Ottwell et al. on systematic reviews and meta-analyses relating to acne vulgaris reported that 31% (11/36) of abstracts contained spin.26

In both our study and studies by Reddy et al. and Ottwell et al., the third most severe type of spin — “selective reporting of or overemphasis on efficacy outcomes or analysis favoring the beneficial effect of the experimental intervention” — was the most prevalent form of spin.8,26,29 Spin Type 3 also appeared in a systematic review evaluating cisplatin as a treatment for testicular germ cell tumors (TGCT).30 The article’s objective was to provide evidence for including cisplatin in the World Health Organization (WHO) Essential Medication List (EML). In the abstract, the authors acknowledge some safety outcomes, while failing to fully report on the “severe gastrointestinal toxicity” mentioned in the results section.30 This example of spin shows how Type 3 can alter the perception of an intervention’s usefulness and emphasize its efficacy while minimizing the consideration of known adverse effects.

Another article in our sample highlights Spin Type 4 — “conclusion claims safety based on non-statistically significant results with a wide confidence interval.”8 This type of spin can be found in a systematic review analyzing the evidence for the use of testis-sparing surgery (TSS).31 The authors evaluated retrospective outcome studies and case reports for data on “operative technique, indications, complications, and oncologic and functional outcome.”31 In the abstract, they concluded that “TSS can be safely adopted for the treatment of carefully selected cases of tumours of different histology.”31 However, they did not statistically analyze safety, thereby committing Spin Type 4.

AMSTAR-2 was created as a systematic review appraisal tool applicable to both randomized and nonrandomized healthcare interventions.23 The 16-item appraisal instrument provides an overall confidence rating of high, moderate, low, or critically low.23 Our study showed that the majority of systematic reviews and meta-analyses addressing the treatment of, screening for, or quality of life after testicular cancer were rated as critically low (78%). In addition, no systematic reviews were rated as high. Poor reporting quality of systematic reviews affects the extent to which physicians should extrapolate the reviews’ findings. If these reviews were to be incorporated into clinical practice guidelines, such as the American Urological Association’s guideline for the diagnosis and treatment of early-stage testicular cancer, patient treatments would be based on reviews with low to critically low reporting quality.

While the implications and prevalence of spin in testicular cancer systematic reviews have not been previously studied, the presence of spin has been shown to alter clinicians’ interpretations of results.7 The presence of spin in abstracts has been shown to result in better study ratings and increased interest in studies among clinicians despite statistically nonsignificant results, therefore creating the potential to alter clinical practice.7 Additionally, one study examined the impact of positive-outcome bias in peer-reviewing RCTs using two manuscripts with identical methods differing only in reported outcomes, with one positive-outcome version and one no-difference version.32 They found that reviewers were more likely to recommend the positive-outcome version for publication compared to the no-difference version, detect more errors in the no-difference version, and award higher methods scores to the positive-outcome version.32 The evidence of positive-outcome manuscripts being favored for publication, combined with the presence of spin disproportionately highlighting beneficial effects in some abstracts, shows the need for improved reporting.

Our study contributes knowledge and awareness of spin to the field of testicular cancer research and suggests that improved reporting would help eliminate the negative implications spin can have on clinical practice. Our results indicate that, even with many guidelines available, reporting quality of testicular cancer systematic reviews remains low. Fortunately, some steps can be taken by both authors and journals to enhance the reporting quality of systematic reviews and meta-analyses. We recommend an update to PRISMA-A to address spin. In addition, authors should compare their manuscript against AMSTAR-2 before submission to improve its quality before peer-review. Finally, journals should recommend that authors and peer reviewers participate in structured training on reporting guidelines and spin in abstracts.

Strengths and Limitations

The strengths of our review include the following: all data extractors underwent multiple forms of training; screening and all data extraction were performed in a masked, duplicate fashion as currently recommended by the Cochrane Collaboration33; we published our protocol and training materials on Open Science Framework to improve the duplicability of our study; and we developed our protocol a priori and maintained strict adherence by documenting any modifications to or variations from this protocol carefully in a protocol update. Additionally, in an effort to increase the reproducibility of our results, all analyses were reproduced by an independent group of statisticians. Despite this study’s strengths, it also has limitations. Classification of spin in abstracts is subjective, and others may disagree with our classifications. We sought to mitigate this subjectivity through structured training and comprehension assessments intended to make our evaluation of spin more consistent. Similarly, the AMSTAR-2 classification was also subjective, and comprehensive training was used to improve the concordance of the investigators’ interpretations. Furthermore, we limited our review to only the nine most severe forms of spin as defined by Yavchitz and colleagues.8 Not including all of the forms of spin could underreport spin in the included abstracts. Although our search included the two largest bibliographic databases, MEDLINE and Embase, it was limited by the exclusion of inaccessible and non-English articles. Finally, this review is a cross-sectional analysis, and the results should be interpreted as such.

Conclusion

Our findings illustrate the need for advancements to improve the reporting quality of systematic reviews related to testicular cancer interventions. We found that nearly one-third of abstracts of testicular cancer systematic reviews and meta-analyses contain at least one type of spin and that over three-fourths of the AMSTAR-2 ratings were critically low. Efforts should be made to strengthen the review process and improve the quality of information conveyed in these reviews. A number of strategies could reduce spin in reporting. As journals ultimately publish these studies, they hold the authority to institute reporting requirements for authors. These requirements could include checking for spin in the abstracts of systematic reviews prior to submission. Peer reviewers could likewise be instructed to evaluate the abstracts for the potential of spin. We are not aware of any training initiatives for authors or peer reviewers that directly address spin, yet these opportunities should be considered. Finally, the PRISMA guidelines are the current gold standard for reporting systematic reviews and, as authors continue to adapt and improve PRISMA guidelines, they should consider how to address spin in abstracts.


Disclosure statement

One author has received grants from the National Institute of Justice not related to the current work.

Another author has received grants from the National Institute on Drug Abuse, the National Institute on Alcohol Abuse and Alcoholism, the US Office of Research Integrity, Oklahoma Center for Advancement of Science and Technology, and internal grants from Oklahoma State University Center for Health Sciences not related to the present work.

All remaining authors of this study declare no conflicts of interest.