Application of Machine Learning Algorithms for Risk Stratification and Efficacy Evaluation in Cervical Cancer Screening Among the ASCUS/LSIL Population: Evidence from the Korean HPV Cohort Study

Article information

J Korean Cancer Assoc. 2024; crt.2024.465
Publication date (electronic) : 2024 September 6
doi : https://doi.org/10.4143/crt.2024.465
1Department of Obstetrics and Gynecology, Incheon St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
2Department of Obstetrics and Gynecology, Yeouido St. Mary’s Hospital, The Catholic University of Korea, Seoul, Korea
3Department of Statistics, Columbia University, New York, NY, USA
4Division of Clinical Research, Center for Emerging Virus Research, National Institute of Infectious Diseases, Korea National Institute of Health, Cheongju, Korea
5Department of Obstetrics and Gynecology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
6Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul, Korea
Correspondence: Soo Young Hur, Department of Obstetrics and Gynecology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea 222, Banpo-daero, Seocho-gu, Seoul 06591, Korea Tel: 82-2-2258-2721 Fax: 82-2-595-1549 E-mail: hursy@catholic.ac.kr
Co-correspondence: Youn Jin Choi, Department of Obstetrics and Gynecology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Korea Tel: 82-2-2258-2721 Fax: 82-2-595-1549 E-mail: yunno@catholic.ac.kr
*Youn Jin Choi and Soo Young Hur contributed equally to this work.
Received 2024 May 15; Accepted 2024 September 5.

Abstract

Purpose

We assessed human papillomavirus (HPV) genotype-based risk stratification and the efficacy of cytology testing for cervical cancer screening in patients with atypical squamous cells of undetermined significance (ASCUS)/low-grade squamous intraepithelial lesion (LSIL).

Materials and Methods

Between 2010 and 2021, we monitored 1,273 HPV-positive women with ASCUS/LSIL every 6 months for up to 60 months. HPV infections were categorized as persistent (HPV positivity consistently observed post-enrollment), negative (HPV negativity consistently observed post-enrollment), or non-persistent (neither consistently positive nor negative). HPV genotypes were grouped into high-risk (Hr) groups 1 (types 16, 18, 31, 33, 45, 52, and 58) and 2 (types 35, 39, 51, 56, 59, 66, and 68) and a low-risk group. Hr1 was subdivided into types (a) 16 and 18; (b) 31, 33, and 45; and (c) 52 and 58. Cox regression and machine learning (ML) algorithms were used to analyze progression rates.

Results

Among the 1,273 participants, 17.6% of those with persistent HPV infection experienced disease progression, compared with none in the HPV-negative group (p < 0.001). Cox analysis revealed the highest hazard ratios (HRs) for Hr1-a (11.6, p < 0.001), followed by Hr1-b (9.26, p < 0.001) and Hr1-c (7.21, p < 0.001). HRs peaked at 12-24 months, and Hr1-a remained significant at 24-36 months (HR, 10.7; p=0.034). ML analysis identified the final cytology change pattern as the most important factor, and 14-15 months after the first examination as the optimal time for detecting progression.

Conclusion

In ASCUS/LSIL cases, follow-up strategies should be based on HPV risk type. Annual follow-up was the most effective monitoring interval for detecting progression or regression.

Introduction

Cervical cancer is the fourth most common cancer in terms of incidence and the fourth deadliest cancer in women, with an estimated 660,000 new cases and 350,000 deaths worldwide in 2022 [1]. In South Korea, the age-standardized incidence rate of cervical cancer was 3.7 per 100,000 persons in 2020 [2]. For the early detection of uterine cervical abnormalities, the Korean National Health Insurance Service provides biennial cytology tests for individuals aged 20 years and older and has offered human papillomavirus (HPV) vaccination for girls aged 12 years and older since 2016 [3]. This screening and prevention program has likely contributed to the decrease in cervical cancer incidence, from 8.6 per 100,000 in 1999 to 3.7 per 100,000 in 2020 [2]. The incidence of cervical cancer in Korea is also notably lower than that in the United States (7.7 per 100,000 in 2015-2020) [4]. However, the number of precancerous lesions in Korea increased from 17,651 in 2018 to 20,910 in 2021 [5]. Therefore, the cervical cancer screening program in South Korea remains critical and may need to be revised to reduce the frequency of precancerous cervical lesions.

In the HPV-positive population with atypical squamous cells of undetermined significance (ASCUS), the immediate risk of cervical intraepithelial neoplasm grade 3 or worse (CIN3+) was 4.2% globally [6]. Repeat HPV testing or co-testing at 1 year is recommended for patients with minor screening abnormalities indicating HPV infection with a low risk of underlying CIN3+ (e.g., HPV-positive, low-grade cytological abnormalities after a documented negative screening HPV test or co-test) [6]. By comparison, the 2-year cumulative incidence of CIN3+ in the HPV-positive population with ASCUS in Japan was 17.5% [7], and a previous Korean HPV cohort study conducted from 2012 to 2017 reported a cumulative CIN2+ incidence of 7.1% [8]. The progression rate is higher in Korea and other Asian countries than in the United States, and follow-up cytology with HPV testing is routinely performed every 6 months in accordance with the recommendations of the Korean Society of Gynecologic Oncology [9].

The latest 2024 American Society for Colposcopy and Cervical Pathology (ASCCP) guidelines have introduced more detailed classifications of high-risk HPV types, covering 14 to 20 genotypes [10], whereas previous guidelines categorized HPV types merely as HPV 16 and 18 versus others [6,11]. These guidelines reflect an emerging consensus, supported by earlier studies, on the importance of distinguishing between high-risk HPV genotypes. For example, research has indicated that HPV 58 could pose a cancer risk comparable to that of HPV 16 [12], suggesting that the cancer risk of HPV types other than 16 and 18 should be re-evaluated.

Machine learning (ML) was first introduced in 1956 and is widely used to support accurate analysis of clinical findings and treatment decision-making [13]. In particular, ML algorithms can repeat the same analysis across more than 100 different variables and can thus help identify more specific results, especially in the analysis of medical findings or clinical practice [14]. In addition, ML can be more effective than conventional survival analysis, which handles only low-dimensional data and has difficulty identifying non-linear associations and complex relationships between covariates and survival time [15]. Therefore, we applied ML algorithms to a Korean HPV cohort to evaluate risk stratification by HPV type, with the aim of determining more suitable guidelines for ASCUS/low-grade squamous intraepithelial lesions (LSIL), and additionally to evaluate the efficacy of HPV testing as part of a complete screening test.

Materials and Methods

1. Design of the Korea HPV cohort study

The Korea HPV Cohort Study, which received funding from the Korea Disease Control and Prevention Agency, took place between April 2010 and September 2021. This multicenter study, carried out in the obstetrics departments of eight general hospitals in Korea, aimed to identify the risk factors associated with the progression of cervical disease, up to the high-grade squamous intraepithelial lesion (HSIL) stage, in HPV-infected adult Korean women. Eligible participants were Korean women aged 20-60 years who tested positive for HPV DNA, irrespective of genotype, and had a diagnosis of ASCUS or LSIL through cytology testing. Prior to enrollment, all participants provided written informed consent. Approval was obtained from the Institutional Review Boards of all eight hospitals involved in the study. Throughout the study period, the enrolled patients underwent HPV DNA testing and cytology every 6 months, and data were recorded using an electronic case report form on each occasion [16].

2. Eligibility criteria and definitions of the HPV infection pattern and cytology change

From April 2010 to September 2021, during the last enrollment period of the Korean HPV Cohort Study, we included only women who were diagnosed with ASCUS or LSIL, had HPV infection confirmed by an external test, and agreed to participate in the cohort study. Participants underwent cytological and HPV DNA testing, with collection of biological samples, every 6 months after enrollment. Only those who were followed up at least twice after the initial examination were included in the analysis. Disease "progression" was defined as biopsy-confirmed CIN2+. Participants with confirmed progression were withdrawn from cohort follow-up and offered treatment. We also excluded women who did not have sufficient biopsy results or did not meet the initial cytology inclusion criteria (Fig. 1).

Fig. 1.

Study design. ASCUS, atypical squamous cells of undetermined significance; CIN, cervical intraepithelial neoplasm; HPV, human papillomavirus; LSIL, low-grade squamous intraepithelial lesion.

To evaluate risk factors for biopsy-confirmed progression, HPV infection patterns were first classified into three groups: (1) HPV-persistent, (2) HPV-negative, and (3) HPV–non-persistent. HPV-persistent infection was defined as a positive HPV test in two or more successive evaluations [12,16]. The HPV-negative group included those whose HPV infection regressed within 6 months of enrollment and who remained HPV-negative thereafter. The HPV–non-persistent group included individuals who fit neither the HPV-persistent nor the HPV-negative category (Fig. 2) [16].
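The three-way classification above can be expressed as a small rule-based function. This is a minimal sketch, not the study's code: it assumes one boolean HPV result per visit (enrollment first, then one per 6-month follow-up), assumes that "two or more successive evaluations" may include the enrollment test, and ignores type-specific persistence. The function name `classify_infection` is hypothetical.

```python
from typing import Sequence


def classify_infection(hpv_results: Sequence[bool]) -> str:
    """Classify one participant's HPV infection pattern (hypothetical helper).

    hpv_results: HPV test results in visit order, starting with the enrollment
    visit (positive for every participant in this cohort), then one entry per
    6-month follow-up visit.
    """
    if len(hpv_results) < 3:
        # The analysis required at least two follow-ups after the initial examination
        raise ValueError("need the enrollment visit plus at least two follow-ups")
    follow_up = hpv_results[1:]
    # Negative: HPV-negative from the first 6-month visit onward
    if not any(follow_up):
        return "negative"
    # Persistent (assumed reading): positive at two successive evaluations
    if any(a and b for a, b in zip(hpv_results, hpv_results[1:])):
        return "persistent"
    # Everything else, e.g. intermittent positivity or later re-detection
    return "non-persistent"
```

Checking "negative" first is safe because a participant who is negative at every follow-up can never have two successive positive results.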

Fig. 2.

Definition of the types of human papillomavirus (HPV) infection. Modified from Park et al. J Gynecol Oncol. 2019;30:e50, with permission of Korean Society of Gynecologic Oncology [8].

Second, HPV genotypes were grouped as follows. Fourteen HPV genotypes (HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68) are considered pathogenic or "high-risk" (Hr) for the development of cervical cancer [17,18]. In a large retrospective cross-sectional worldwide study, the most common HPV types were 16, 18, 31, 33, 35, 45, 52, and 58, with a combined worldwide relative contribution of 8,196 of 8,977 cases (91%; 95% confidence interval, 90 to 92) [19]. In addition, the most common HPV types in HSIL in Korea are 16, 58, 18, and 52 [12]. On this basis, we classified HPV types into the following categories: Hr1 (16, 18, 31, 33, 45, 52, and 58), Hr2 (35, 39, 51, 56, 59, 66, and 68), and low risk (any type not included in either of the preceding two groups). Furthermore, to analyze type-specific progression risk, Hr1 was subdivided into types (a) 16 and 18; (b) 31, 33, and 45; and (c) 52 and 58 (Fig. 3).
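The genotype grouping above maps directly onto set lookups. The text does not say how a participant with multiple infections was assigned to one group, so this sketch assumes a hypothetical precedence in which the highest-risk subgroup detected wins (Hr1-a, then Hr1-b, Hr1-c, Hr2, and finally low risk). The names `risk_group` and `PRECEDENCE` are illustrative only.

```python
# Genotype sets as defined in the text
HR1_A = {16, 18}
HR1_B = {31, 33, 45}
HR1_C = {52, 58}
HR2 = {35, 39, 51, 56, 59, 66, 68}

# Assumed precedence for multiple infections: highest-risk subgroup first
PRECEDENCE = [("Hr1-a", HR1_A), ("Hr1-b", HR1_B), ("Hr1-c", HR1_C), ("Hr2", HR2)]


def risk_group(genotypes):
    """Return the risk subgroup for the HPV genotypes detected in one participant."""
    detected = set(genotypes)
    if not detected:
        raise ValueError("no genotype detected")
    for label, members in PRECEDENCE:
        if detected & members:  # any overlap with this subgroup
            return label
    return "Lr"  # any type outside Hr1 and Hr2
```

For example, a co-infection with HPV 58 and HPV 16 would be labeled Hr1-a under this assumed rule.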

Fig. 3.

Human papillomavirus subgroups as genotypes. Hr, high-risk; Lr, low-risk.

Lastly, the cohort was grouped by the pattern of cytological change: (1) regression, no cytological abnormality; (2) persistent, no change in the cytology result (ASCUS/LSIL); and (3) progression, a change to HSIL, atypical squamous cells cannot exclude high-grade squamous intraepithelial lesion (ASC-H), or malignancy in the cytology result.
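Because every participant entered with ASCUS or LSIL, the cytology change pattern can be read off a single follow-up result. This sketch assumes the pattern is judged from the last available cytology result, as the abstract's "final cytology change pattern" suggests; the label "NILM" (negative for intraepithelial lesion or malignancy) for a normal result and the function name `cytology_pattern` are assumptions, not taken from the study.

```python
# Cytology categories grouped as in the text; "NILM" is an assumed label for normal cytology
REGRESSION = {"NILM"}
PERSISTENT = {"ASCUS", "LSIL"}
PROGRESSION = {"HSIL", "ASC-H", "malignancy"}


def cytology_pattern(final_result: str) -> str:
    """Map a participant's last available cytology result to a change pattern."""
    if final_result in PROGRESSION:
        return "progression"
    if final_result in PERSISTENT:
        return "persistent"  # still within the ASCUS/LSIL entry category
    if final_result in REGRESSION:
        return "regression"
    raise ValueError(f"unrecognized cytology result: {final_result!r}")
```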

3. Statistical analysis

Based on HPV infection patterns, participants were divided into three groups: persistent infection, HPV-negative, and non-persistent infection. Age, body mass index (BMI), disease progression, observation duration, initial cytology, cytology pattern, number of HPV infections, presence of multiple HPV infections, history of sexually transmitted disease (STD) infection, prophylactic HPV vaccination, pregnancy, smoking, age at first coitus, and number of sexual partners were compared among the three groups using ANOVA or the Kruskal-Wallis test for continuous variables and the chi-squared test for categorical variables. To evaluate the detection efficacy for CIN2+ lesions, univariable Cox analysis was conducted for each HPV group within specific periods. Multivariable Cox analysis was performed for the entire follow-up period and repeated separately for the HPV-persistent and HPV–non-persistent infection groups. Because no progression occurred in the HPV-negative group, it could not serve as the reference group, so the HPV–non-persistent group was used as the reference in the survival analysis. The independent variables included categorical variables (HPV infected number group, type of HPV infection, multiple HPV infections, cytology pattern, and STD infection history) and continuous variables (age and BMI). Stata 17.0 (StataCorp) and R 4.3.1 (R Core Team) were used for statistical analysis, and a p-value less than 0.05 was considered statistically significant for all variables.
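The Cox models described above were fitted in Stata and R. As an illustration only, the sketch below fits a one-covariate Cox proportional hazards model from scratch by Newton-Raphson on the partial likelihood (Breslow handling of tied event times), so the hazard ratio is `exp(beta)`. The helper `restrict_to_interval` is hypothetical: it assumes each period analysis kept participants still event-free at the start of a window, censored them at its end, and measured time from the window start, which the text does not specify.

```python
import math


def cox_log_hr(times, events, x, max_iter=50, tol=1e-9):
    """Univariable Cox model via Newton-Raphson (Breslow ties); returns the log-HR.

    times: follow-up in months; events: 1 = CIN2+ progression, 0 = censored;
    x: covariate, e.g. 1 = Hr1-a and 0 = reference group.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue
            # Risk set: everyone still under observation at this event time
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            score += x_i - s1 / s0              # gradient of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2    # observed information (negative Hessian)
        if info == 0.0:
            raise ValueError("zero information: no covariate variation or no finite maximum")
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("no convergence; check for a group with zero events")


def restrict_to_interval(times, events, x, start, end):
    """Keep participants still event-free and observed at `start`, censor at `end`,
    and measure time from `start` (e.g. start=12, end=24 for the 12-24-month window)."""
    out_t, out_e, out_x = [], [], []
    for t, e, x_i in zip(times, events, x):
        if t <= start:
            continue  # progressed or left follow-up before the window opened
        out_t.append(min(t, end) - start)
        out_e.append(1 if e and t <= end else 0)
        out_x.append(x_i)
    return out_t, out_e, out_x
```

If one comparison group has no events at all, as with the HPV-negative group, the partial likelihood has no finite maximum, so the iterations cannot converge and this sketch raises an error instead of returning an estimate. This is the numerical counterpart of why that group could not serve as the reference.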

For ML analysis, gradient boosting, random survival forest, and random forest models were used to identify important factors related to the disease progression rate. Initial experiments were conducted using AutoML, a common approach for automatically selecting, training, and tuning models, and hyperparameters were tuned using grid search. To obtain performance metrics for each model, five separate training sessions were conducted, each using 5-fold cross-validation. We extracted variable importance and created heatmaps to analyze how each variable influenced the results across model types.
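The "five separate training sessions, each using 5-fold cross-validation" can be read as 5×5 repeated k-fold cross-validation. The authors used H2O AutoML, whose internals are not reproduced here; this stdlib-only sketch shows only the resampling scheme and assumes stratification by progression status (not stated in the text) so that each fold keeps roughly the overall CIN2+ rate. In scikit-learn, `RepeatedStratifiedKFold(n_splits=5, n_repeats=5)` provides equivalent splits. The function name is hypothetical.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=2024):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV.

    Each repeat reshuffles every class and deals its members round-robin across
    the folds, so each test fold keeps roughly the overall progression rate.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for r in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for cls in sorted(by_class):
            idx = by_class[cls][:]
            rng.shuffle(idx)
            for k, i in enumerate(idx):
                folds[(k + offset) % n_splits].append(i)
            offset += len(idx)  # continue dealing where the previous class stopped
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in range(len(labels)) if i not in test_set]
            yield r, f, train, sorted(test)
```

A model would be fitted on each `train` split (with its own grid search) and scored on the matching `test` split, and the 25 resulting AUCs averaged per model type.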

To determine the period with the highest progression rate, we created 1,000 models each for random forest and gradient boosting using training data generated by bootstrapping (resampling to estimate the sampling distribution of a statistic without knowing its true distribution). The area under the curve (AUC) for each month was then calculated for each data set, followed by the monthly increase in the AUC (average AUC of the following month − average AUC of the previous month). Bootstrapping was also used to test whether the increase in the AUC for any period was significantly greater than for other periods: an increase was considered statistically significantly greater than the increases in other periods if it exceeded them in an average of more than 95% of bootstrap comparisons (i.e., a significance level of 5%). All ML procedures were performed in Python using the H2O.ai API.
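The monthly-AUC procedure above can be sketched in plain Python. Several assumptions are labeled here because the text does not define them: "AUC for each month" is taken as a cumulative/dynamic AUC (cases progressed by that month, controls still progression-free after it), one fixed risk score per participant is resampled instead of retraining 1,000 models, and the 0.95 criterion is read as the share of replicates in which the chosen interval's increase beats every other interval's. All function names are hypothetical.

```python
import math
import random


def auc(case_scores, control_scores):
    """Mann-Whitney AUC: probability that a case outranks a control (ties count 1/2)."""
    if not case_scores or not control_scores:
        return math.nan
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(case_scores) * len(control_scores))


def auc_at_month(times, events, scores, month):
    """Cumulative/dynamic AUC at `month`: cases progressed by then; controls are
    still under follow-up and progression-free after it (others are dropped)."""
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= month]
    controls = [s for t, s in zip(times, scores) if t > month]
    return auc(cases, controls)


def bootstrap_monthly_increase(times, events, scores, months, n_boot=1000, seed=0):
    """Per bootstrap resample, AUC(months[k+1]) - AUC(months[k]) for every k."""
    rng = random.Random(seed)
    n = len(times)
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants with replacement
        t = [times[i] for i in idx]
        e = [events[i] for i in idx]
        s = [scores[i] for i in idx]
        aucs = [auc_at_month(t, e, s, m) for m in months]
        replicates.append([b - a for a, b in zip(aucs, aucs[1:])])
    return replicates


def largest_increase(replicates, months):
    """Interval with the largest mean AUC increase, plus the share of replicates in
    which its increase beats every other interval's (compare with 0.95)."""
    valid = [r for r in replicates if not any(math.isnan(v) for v in r)]
    if not valid:
        raise ValueError("no replicate had cases and controls at every month")
    k = len(months) - 1
    means = [sum(r[j] for r in valid) / len(valid) for j in range(k)]
    best = max(range(k), key=lambda j: means[j])
    wins = sum(all(r[best] > r[j] for j in range(k) if j != best) for r in valid)
    return (months[best], months[best + 1]), wins / len(valid)
```

Replicates in which some month has no cases or no controls yield NaN increases and are dropped before the comparison.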

Results

Of the 1,273 participants, 98 (7.7%) had disease progression (CIN2+). There were 266 patients in the HPV-persistent group, 49 in the HPV-negative group, and 958 in the HPV–non-persistent group. Disease progression (CIN2+) occurred in 17.6% (47 patients) of the HPV-persistent group and in 5.3% (51 patients) of the HPV–non-persistent group (Table 1). The disease progression rate for persistent Hr1 infection was 48.9%, significantly higher than that for non-persistent Hr1 infection (41.0%) and for the HPV-negative group, in which no progression occurred (p < 0.001) (Table 1). The persistent infection group also had the highest rates of cytological progression (p < 0.001) and of STD infection history (p=0.002). Multiple HPV infection was significantly more common in the persistent infection group than in the HPV–non-persistent or HPV-negative groups (p < 0.001). BMI was significantly lower in the HPV-negative group than in the HPV-persistent and HPV–non-persistent groups (p=0.004). However, initial cytology (Table 1), history of prophylactic HPV vaccination, age at first coitus, and smoking history did not differ significantly by HPV infection pattern (S1 Table).
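As an arithmetic check on the group comparison, the progression counts reported above (47 of 266, 0 of 49, and 51 of 958) can be arranged as a 3×2 table and tested with Pearson's chi-squared test. With 2 degrees of freedom the chi-squared survival function is exactly exp(-x/2), so no special functions are needed. This is only a sketch; the exact test the authors applied to this comparison is not stated. Note that the expected count in the HPV-negative progression cell is about 3.8, below the conventional threshold of 5, where an exact test such as Fisher's is the usual alternative.

```python
import math


def chi2_test_3x2(table):
    """Pearson chi-squared test for a 3x2 table; df = (3-1)*(2-1) = 2.

    For 2 degrees of freedom the chi-squared survival function is exp(-x/2).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


# Rows: HPV-persistent, HPV-negative, HPV-non-persistent; columns: CIN2+ yes, no
counts = [[47, 266 - 47], [0, 49], [51, 958 - 51]]
stat, p = chi2_test_3x2(counts)
print(f"chi2 = {stat:.1f}, p = {p:.1e}")
```

The statistic comes out at roughly 49, and the corresponding p-value is consistent with the reported p < 0.001.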

Table 1.

Basic characteristics of all participants

To evaluate HPV risk factors, univariable Cox analysis of progression was conducted in 1-year intervals. In the interval analysis, the 12-24-month period showed the highest hazard ratios (HRs): 32.6 for Hr1-a, 16.0 for Hr1-b, and 8.85 for Hr1-c (all p < 0.05). Notably, Hr1-a also had the highest HR in the 24-36-month period (10.7, p=0.034) (Table 2). Multivariable Cox analysis was conducted for progression over 60 months (Table 3). The HPV infected number group and cytological pattern showed significant HRs for progression. In particular, Hr1-a and Hr1-b had similar HRs (3.20 and 3.53, respectively; both p < 0.05), and cytological progression was a strong risk factor for disease progression. In subgroup analysis of the HPV-persistent and HPV–non-persistent groups, Hr1-a was the only subgroup with a significant HR in the HPV-persistent group (2.38, p=0.006) (Fig. 4A), whereas all Hr1 subgroups were significant in the HPV–non-persistent group, with the highest HR in the Hr1-c group (Fig. 4B).

Table 2.

Univariable Cox analysis by period and HPV infection group

Table 3.

Multivariable Cox analysis at 60 months follow-up

Fig. 4.

Multivariable Cox analysis by human papillomavirus (HPV) infection type. Hazard ratios and 95% confidence intervals were adjusted for age at diagnosis, body mass index (BMI), multiple HPV infection, and sexually transmitted disease infection history. (A) HPV-persistent infection. (B) HPV–non-persistent infection. Hr1, HPV 16, 18, 31, 33, 45, 52, 58; Hr2, HPV 35, 39, 51, 56, 59, 66, 68; Lr, low-risk HPV.

Random survival forest, gradient boosting, and random forest models were used to evaluate important factors for progression and the effective follow-up interval, for comparison with the multivariable Cox analysis results. The gradient boosting model performed best, with an AUC of 0.942, and the random survival forest model also performed strongly, with an AUC of 0.915. In contrast, the random forest model achieved an AUC of only 0.5, equivalent to chance. Consequently, the gradient boosting and random survival forest models were chosen for this analysis.

In terms of importance for predicting disease progression to CIN2+ with the ML algorithm, the cytology pattern was the most important factor, followed in order by HPV infection type (persistent or not) and HPV number group (e.g., Hr1, Hr2) (Fig. 5A). In addition, the monthly increase in AUC obtained through bootstrapping was highest from 14 to 15 months in the gradient boosting model, showing an AUC of 0.97 (Fig. 5B), and from 11 to 12 months in the random forest model, showing an AUC of 0.95 (S2 Fig.).

Fig. 5.

Analysis of gradient boosting model. (A) Importance of prediction for progression. (B) Average increase in the area under the curve per time interval. HPV, human papillomavirus.

Discussion

In Korea, cervical cancer screening has been conducted using cytology since 1999, and women aged 20 years and older are currently recommended to undergo screening every 2 years. However, few studies have evaluated the efficacy of cervical cancer screening through HPV risk stratification or cytology, either in the ASCUS/LSIL population or in the general population in Korea. The present study therefore adds value by identifying risk strata based on HPV type, determining the follow-up duration for each HPV type, and re-evaluating all results with ML algorithms to confirm the importance of cytology. Although all results were based on a population with low-grade abnormal cytology, the findings can still guide changes in cervical cancer screening.

According to the ASCUS-LSIL Triage Study (ALTS), the total 2-year cumulative incidence of CIN2+ was 15.4% (CIN2, 6.7%; CIN3, 8.8%) [20], and only 5.3% of the overall CIN3 population was high-risk HPV-negative [21]. In this study, in which progression was defined as biopsy-confirmed CIN2+, the total progression rate was 7.7% (mean survival time, 1.95 years), lower than that in ALTS even though all participants were HPV-positive at enrollment.

In this study, the progression group had a higher proportion of high-risk HPV-persistent infection (70%) than the non-progression group (11%), and no progression was seen in the HPV-negative group. As in the present study, many studies have defined HPV infection as persistent if HPV is detected at two consecutive follow-up visits 4-6 months apart [22]. Reported clearance times, however, vary: in one previous study, HPV 16 took particularly long to clear (mean duration, 18.3 months) compared with other HPV types [23], whereas another found similar clearance periods for high-risk and low-risk HPV types [24]. Nevertheless, persistent infection with high-risk HPV is most frequently the major contributing factor to cervical cancer [25], which is why we stratified the population based on HPV progression risk.

In this study, the Hr1 group was more likely than the Hr2 group to show disease progression during 60 months of follow-up. Within Hr1, Hr1-a and Hr1-b had the highest HRs up to 24 months, and Hr1-a remained significant at 24-36 months (Table 2). In the multivariable Cox analysis at 60 months, Hr1-b (HPV 31, 33, and 45) carried a risk similar to that of Hr1-a (Table 3). However, only Hr1-a showed a significant HR in the HPV-persistent infection group, whereas all Hr1 subgroups were significant in the HPV–non-persistent infection group. Hr2 did not reach significance in any group. Globally, after HPV 16 and 18, the type most commonly associated with disease progression was HPV 45, and HPV 16 or 18 was related to almost 90% of cervical cancer progression cases [19]. However, 36-month follow-up data from the Korea HPV Cohort Study showed that HPV 16 and HPV 58 have similarly high HRs [12]. These findings are in accordance with the new ASCCP guidelines, which redefine the carcinogenic strains of HPV as HPV 16, HPV 18/45, and HPV 16-related types (33, 31, 52, 58, and 35), and our results are in line with these global findings [10].

In this study, we used ML algorithms to further analyze factors related to disease progression. The most effective ML tools for predicting disease progression were the gradient boosting and random survival forest models, both of which achieved AUCs greater than 0.9. In the analysis of variable importance, the most important factor was progression of the cytology pattern (34% in the gradient boosting model), followed by HPV infection type (3.2%) and HPV number group (2.2%), and the random survival forest model showed the same order of importance. These ML results differ from previous reports that simple cytology is less sensitive than HPV testing for diagnosing disease progression [26]. However, our multivariable and ML analyses indicate that both HPV testing and cytology are important for detecting progression in the ASCUS/LSIL population.

In the univariable analysis for each 12-month period, the highest HRs for all HPV types occurred in the 12-24-month interval. In the ML analysis of time-dependent AUC for disease progression, the largest mean monthly increase in AUC occurred at 14-15 months in the gradient boosting model and at 11-12 months in the random survival forest model. In another study, the median time to progression to CIN2+ from ASCUS/LSIL was 1.95 years [27]. This is consistent with the American Cancer Society guideline, which recommends annual cytology with HPV testing for 2 years for individuals with low-risk abnormal cytology and HPV infection [28]. Based on these studies and the present results, we believe that the effective follow-up interval is 1 year, which is longer than the 6-month interval conventionally recommended by the Korean Society of Gynecologic Oncology [9]. Notably, our ML results were based on bootstrapping to assess the stability of the estimates and on ML methods that can capture relationships missed by conventional survival analysis, which supports the reliability of the results. This is a strength of our study.

There are several limitations to the present work. First, we could not demonstrate a relationship between prophylactic HPV vaccination and the progression rate. In Korea, prophylactic HPV vaccination has been available since 2007, but the national vaccination program for adolescents, targeting individuals aged 12-15 years, started only in 2016. Consequently, the impact of vaccination was most likely not captured in this study. Other research using Korea HPV Cohort Study data has reported lower rates of HPV 16 and 18 infection among women who received the vaccine [27]; this discrepancy may reflect differences in the definition of progression (biopsy only) and our strict inclusion criteria, under which only participants with at least two follow-up examinations were included. Second, multiple HPV infection has been reported as a risk factor for disease progression in several studies [29,30], but it did not reach statistical significance in our multivariable analysis even though almost 1,300 participants were included. Third, only HPV-positive individuals with ASCUS/LSIL were enrolled, and women with normal cytology or a negative HPV test at baseline were not included. The study population therefore does not reflect the characteristics of the general population, and the Cox analysis was conducted only on the HPV-persistent and HPV–non-persistent groups, excluding the HPV-negative group. Finally, each follow-up period was analyzed only with univariable analysis because of the smaller number of participants in each period, although this approach appeared sufficient to verify the effective follow-up duration for each HPV type. Despite these limitations, the present study proposes an HPV risk stratification and follow-up strategy based on Korean data, provides a direction for future large-scale research, and can be considered to have sufficient clinical significance.

Annual follow-up is essential for monitoring the progression and regression of HPV infection in Korean patients with ASCUS/LSIL. The key predictors of disease progression are persistent infection with specific high-risk HPV types (especially HPV 16, 18, 31, 33, 45, 52, and 58) and cytological progression. Although progression is typically detected within 2 years, individuals with HPV 16 or 18 should be followed up more closely, regardless of signs of progression, because of their high risk.

Electronic Supplementary Material

Supplementary materials are available at Cancer Research and Treatment website (https://www.e-crt.org).

Notes

Ethical Statement

This study was approved by the relevant Institutional Review Board (XC23ZIDI0039) and adhered to the principles of the Declaration of Helsinki. A waiver of the requirement for informed consent was obtained.

Author Contributions

Conceived and designed the analysis: Hur SY, Choi YJ.

Collected the data: Song H, Lee HY, Seong J.

Contributed data or analysis tools: Oh SA, Seong J.

Performed the analysis: Song H, Oh SA.

Wrote the paper: Song H, Choi YJ.

Review and Interpretation: Hur SY.

Conflict of Interest

Conflict of interest relevant to this article was not reported.

Funding

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. NRF2021R1A2C2007425).

References

1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229–63.
2. Kang MJ, Jung KW, Bang SH, Choi SH, Park EH, Yun EH, et al. Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2020. Cancer Res Treat 2023;55:385–99.
3. Korea Centers for Disease Control and Prevention. Guidelines for the National Immunization Program [Internet]. Korea Centers for Disease Control and Prevention; c2009. [cited 2023 Dec 31]. Available from: https://health.kdca.go.kr/healthinfo/biz/health/gnrlzHealthInfo/gnrlzHealthInfo/gnrlzHealthInfoView.do.
4. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin 2023;73:17–48.
6. Perkins RB, Guido RS, Castle PE, Chelmow D, Einstein MH, Garcia F, et al. 2019 ASCCP risk-based management consensus guidelines for abnormal cervical cancer screening tests and cancer precursors. J Low Genit Tract Dis 2020;24:102–31.
7. Aoki ES, Saika K, Kiguchi K, Morisada T, Aoki D. Validation of HPV triage in cytology-based cervical cancer screening for ASC-US cases using Japanese data. J Gynecol Oncol 2023;34:e14.
8. Park Y, Kim TJ, Hwang CS, Cho CH, Jeong DH, Seong SJ, et al. Risk of cervical dysplasia among human papillomavirus-infected women in Korea: a multicenter prospective study. J Gynecol Oncol 2019;30:e50.
9. Practice guidelines for the early detection of cervical cancer [Internet]. Korean Society of Gynecologic Oncology; 2021. [cited 2024 Jan 11]. Available from: https://cdn.medsoft.co.kr/201/date/j_02.pdf.
10. Cervical cancer screening. IARC handbooks of cancer prevention. Vol. 18. IARC Press; 2022.
11. Ronco G, Dillner J, Elfstrom KM, Tunesi S, Snijders PJ, Arbyn M, et al. Efficacy of HPV-based screening for prevention of invasive cervical cancer: follow-up of four European randomised controlled trials. Lancet 2014;383:524–32.
12. Seong J, Ryou S, Lee J, Yoo M, Hur S, Choi BS, et al. Enhanced disease progression due to persistent HPV-16/58 infections in Korean women: a systematic review and the Korea HPV cohort study. Virol J 2021;18:188.
13. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol 2020;9:14.
14. Haug CJ, Drazen JM. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med 2023;388:1201–8.
15. Deepa P, Gunavathi C. A systematic review on machine learning and deep learning techniques in cancer survival prediction. Prog Biophys Mol Biol 2022;174:62–71.
16. Lee WC, Lee SY, Koo YJ, Kim TJ, Hur SY, Hong SR, et al. Establishment of a Korea HPV cohort study. J Gynecol Oncol 2013;24:59–65.
17. Kjaer SK, van den Brule AJ, Paull G, Svare EI, Sherman ME, Thomsen BL, et al. Type specific persistence of high risk human papillomavirus (HPV) as indicator of high grade cervical squamous intraepithelial lesions in young women: population based prospective follow up study. BMJ 2002;325:572.
18. Schiffman M, Castle PE, Jeronimo J, Rodriguez AC, Wacholder S. Human papillomavirus and cervical cancer. Lancet 2007;370:890–907.
19. de Sanjose S, Quint WG, Alemany L, Geraets DT, Klaustermeier JE, Lloveras B, et al. Human papillomavirus genotype attribution in invasive cervical cancer: a retrospective cross-sectional worldwide study. Lancet Oncol 2010;11:1048–56.
20. ASCUS-LSIL Triage Study Group. Results of a randomized trial on the management of cytology interpretations of atypical squamous cells of undetermined significance. Am J Obstet Gynecol 2003;188:1383–92.
21. Castle PE, Cox JT, Jeronimo J, Solomon D, Wheeler CM, Gravitt PE, et al. An analysis of high-risk human papillomavirus DNA-negative cervical precancers in the ASCUS-LSIL Triage Study (ALTS). Obstet Gynecol 2008;111:847–56.
22. Baseman JG, Koutsky LA. The epidemiology of human papillomavirus infections. J Clin Virol 2005;32 Suppl 1:S16–24.
23. Richardson H, Kelsall G, Tellier P, Voyer H, Abrahamowicz M, Ferenczy A, et al. The natural history of type-specific human papillomavirus infections in female university students. Cancer Epidemiol Biomarkers Prev 2003;12:485–90.
24. Franco EL, Villa LL, Sobrinho JP, Prado JM, Rousseau MC, Desy M, et al. Epidemiology of acquisition and clearance of cervical human papillomavirus infection in women from a high-risk area for cervical cancer. J Infect Dis 1999;180:1415–23.
25. Koshiol J, Lindsay L, Pimenta JM, Poole C, Jenkins D, Smith JS. Persistent human papillomavirus infection and cervical neoplasia: a systematic review and meta-analysis. Am J Epidemiol 2008;168:123–37.
26. Ogilvie GS, van Niekerk D, Krajden M, Smith LW, Cook D, Gondara L, et al. Effect of screening with primary cervical HPV testing vs cytology testing on high-grade cervical intraepithelial neoplasia at 48 months: the HPV FOCAL randomized clinical trial. JAMA 2018;320:43–52.
27. Seong J, Ryou S, Yoo M, Lee J, Kim K, Jee Y, et al. Status of HPV vaccination among HPV-infected women aged 20-60 years with abnormal cervical cytology in South Korea: a multicenter, retrospective study. J Gynecol Oncol 2020;31:e4.
28. Fontham ET, Wolf AM, Church TR, Etzioni R, Flowers CR, Herzig A, et al. Cervical cancer screening for individuals at average risk: 2020 guideline update from the American Cancer Society. CA Cancer J Clin 2020;70:321–46.
29. Moscicki AB, Shiboski S, Hills NK, Powell KJ, Jay N, Hanson EN, et al. Regression of low-grade squamous intra-epithelial lesions in young women. Lancet 2004;364:1678–83.
30. Guan P, Howell-Jones R, Li N, Bruni L, de Sanjose S, Franceschi S, et al. Human papillomavirus types in 115,789 HPV-positive women: a meta-analysis from cervical infection to cancer. Int J Cancer 2012;131:2349–59.


Fig. 1.

Study design. ASCUS, atypical squamous cells of undetermined significance; CIN, cervical intraepithelial neoplasia; HPV, human papillomavirus; LSIL, low-grade squamous intraepithelial lesion.

Fig. 2.

Definition of the types of human papillomavirus (HPV) infection. Modified from Park et al. J Gynecol Oncol. 2019;30:e50, with permission of the Korean Society of Gynecologic Oncology [8].

Fig. 3.

Human papillomavirus subgroups by genotype. Hr, high-risk; Lr, low-risk.
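The subgroup definitions given in the figure and table legends (Hr1-a: HPV 16/18; Hr1-b: 31/33/45; Hr1-c: 52/58; Hr2: 35/39/51/56/59/66/68; Lr: low-risk types) can be expressed as a simple lookup. The sketch below is illustrative only. The rule for women infected with several genotypes is not stated in this section, so it assumes, hypothetically, that each woman is assigned to the highest-risk group among her detected types, and that any detected type outside Hr1 and Hr2 counts as low risk. The function names are hypothetical.

```python
# Genotype subgroups as listed in the figure and table legends.
HR1_A = {16, 18}
HR1_B = {31, 33, 45}
HR1_C = {52, 58}
HR2 = {35, 39, 51, 56, 59, 66, 68}

# HYPOTHETICAL rule: with several genotypes, the highest-risk group wins
# (Hr1-a > Hr1-b > Hr1-c > Hr2 > Lr). The article's own rule is not stated here.
_HIERARCHY = [("Hr1-a", HR1_A), ("Hr1-b", HR1_B), ("Hr1-c", HR1_C), ("Hr2", HR2)]


def genotype_group(genotypes):
    """Return the risk subgroup for the detected HPV genotypes, or None if none."""
    detected = set(genotypes)
    if not detected:
        return None  # HPV-negative
    for label, members in _HIERARCHY:
        if detected & members:
            return label
    # ASSUMPTION: any detected type outside Hr1/Hr2 is counted as low risk.
    return "Lr"


def is_multiple_infection(genotypes):
    """True when more than one distinct genotype was detected."""
    return len(set(genotypes)) > 1
```

For example, `genotype_group({6, 52})` returns `"Hr1-c"`, and an empty set (an HPV-negative sample) returns `None`.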

Fig. 4.

Multivariable Cox analysis by human papillomavirus (HPV) infection type. Hazard ratios and 95% confidence intervals were adjusted for age at diagnosis, body mass index (BMI), multiple HPV infection, and history of sexually transmitted disease. (A) Persistent HPV infection. (B) Non-persistent HPV infection. Hr1, HPV 16, 18, 31, 33, 45, 52, 58; Hr2, HPV 35, 39, 51, 56, 59, 66, 68; Lr, low-risk HPV.

Fig. 5.

Analysis of the gradient boosting model. (A) Feature importance for predicting progression. (B) Average increase in the area under the curve per time interval. HPV, human papillomavirus.
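Panel B summarizes discrimination across follow-up intervals. The exact definition of the "average increase in the area under the curve" is not given in this section, so the sketch below only shows one common building block: a cumulative/dynamic time-dependent AUC at a fixed horizon, computed as the probability that a woman who progressed by the horizon has a higher predicted risk score than one still event-free beyond it. It is a simplification that drops women censored before the horizon instead of applying inverse-probability-of-censoring weights; the function name and toy data are hypothetical.

```python
def cumulative_dynamic_auc(times, events, scores, horizon):
    """Time-dependent (cumulative/dynamic) AUC at a single horizon.

    Cases: progressed (event == 1) at or before `horizon`.
    Controls: observed beyond `horizon` (event-free at the horizon).
    SIMPLIFICATION: subjects censored before `horizon` are dropped rather
    than handled with inverse-probability-of-censoring weights.
    Higher scores are assumed to mean higher predicted risk.
    """
    cases = [s for t, e, s in zip(times, events, scores) if e == 1 and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))
```

With hypothetical data `times=[6, 10, 18, 30, 40, 50]` (months), `events=[1, 1, 1, 0, 1, 0]` and `scores=[0.9, 0.7, 0.8, 0.2, 0.4, 0.1]`, the AUC is 0.875 at 12 months and 1.0 at 24 months. Evaluating it at successive horizons gives a per-interval discrimination profile of the kind plotted in panel B.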

Table 1.

Basic characteristics of all participants

Characteristic Persistent infection (n=266) HPV-negative group (n=49) Non-persistent infection (n=958) p-value
Age (yr) 43±10.9 42±8.61 39±9.71 < 0.001
 < 30 18 (6.8) 6 (12.2) 173 (18.0)
 30-39 46 (17.3) 14 (28.6) 295 (30.8)
 40-49 62 (23.3) 19 (38.8) 294 (30.7)
 50-59 58 (21.8) 10 (20.4) 158 (16.5)
 60-65 19 (7.1) 0 7 (0.7)
 Unknown 63 (23.7) 0 31 (3.3)
BMI (kg/m²) 21.6±2.61 20.8±2.69 21.3±2.88 0.004
Disease progression
 CIN 2 20 (7.5) 0 22 (2.3) < 0.001
 CIN 3 23 (8.6) 0 23 (2.4)
 Invasive cancer 4 (1.5) 0 6 (0.6)
Observation duration
 Less than 12 mo 50 (18.8) 18 (36.7) 308 (32.2) 0.097
 12 to less than 24 mo 100 (37.6) 14 (28.6) 153 (16.0)
 24 to less than 36 mo 39 (14.7) 6 (12.2) 115 (12.0)
 More than 36 mo 77 (28.9) 11 (22.5) 382 (39.8)
Initial cytology
 ASC-US 156 (58.6) 27 (55.1) 554 (57.8) 0.897
 LSIL 110 (41.4) 22 (44.9) 404 (42.2)
Cytology pattern
 Regression 73 (27.4) 32 (65.3) 501 (52.3) < 0.001
 Persistent 132 (49.6) 16 (32.7) 389 (40.6)
 Progression 61 (23.0) 1 (2.0) 68 (7.1)
HPV genotype group
 Hr1 130 (48.9) 0 393 (41.0) < 0.001
 Hr2 64 (24.1) 0 325 (34.0)
 Lr 72 (27.0) 0 240 (25.0)
Multiple HPV infection
 Yes 112 (42.1) 0 272 (28.4) < 0.001
 No 154 (57.9) 49 (100) 686 (71.6)
History of STD
 Yes 23 (8.6) 0 59 (6.2) 0.002
 No 243 (91.4) 49 (100) 899 (93.8)

Values are presented as mean±SD or number (%). ASC-US, atypical squamous cells of undetermined significance; BMI, body mass index; CIN, cervical intraepithelial neoplasia; HPV, human papillomavirus; Hr1, high-risk 1 (human papillomavirus 16, 18, 31, 33, 45, 52, 58); Hr2, high-risk 2 (human papillomavirus 35, 39, 51, 56, 59, 66, 68); Lr, low-risk human papillomavirus; LSIL, low-grade squamous intraepithelial lesion; SD, standard deviation; STD, sexually transmitted disease.
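Table 1 does not state which test produced its p-values. Assuming a Pearson chi-square test for the categorical rows (an assumption, not confirmed by the table), the cytology-pattern comparison can be reproduced from the counts alone. The sketch uses only the standard library, relying on the exact closed-form chi-square tail probability that exists for even degrees of freedom (a 3 × 3 table has 4). The function names are hypothetical.

```python
import math

# Cytology-pattern counts from Table 1.
# Rows: regression, persistent, progression.
# Columns: persistent infection, HPV-negative, non-persistent infection.
CYTOLOGY_PATTERN = [[73, 32, 501], [132, 16, 389], [61, 1, 68]]


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x); exact closed form for even dof only."""
    if dof % 2:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

On the cytology-pattern counts this gives a statistic of roughly 90 on 4 degrees of freedom and a p-value far below 0.001, consistent with the reported < 0.001.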

Table 2.

Univariable Cox analysis by period and HPV infection group

Period HRa) 95% CI p-value
Initial-12 mo
 Hr1-a 9.58 2.77-33.1 < 0.001
 Hr1-b 10.5 2.84-38.7 < 0.001
 Hr1-c 8.00 2.37-27.0 < 0.001
 Hr2 1.94 0.502-7.51 0.336
12-24 mo
 Hr1-a 32.6 4.26-249 < 0.001
 Hr1-b 16.0 1.79-142 0.013
 Hr1-c 8.85 1.07-73.5 0.044
 Hr2 6.48 0.798-52.7 0.080
24-36 mo
 Hr1-a 10.7 1.19-95.6 0.034
 Hr1-b 7.94 0.720-87.5 0.091
 Hr1-c 4.54 0.472-43.7 0.190
 Hr2 1.90 0.172-20.9 0.600
Initial-60 mo
 Hr1-a 11.6 4.78-27.9 < 0.001
 Hr1-b 9.26 3.52-24.4 < 0.001
 Hr1-c 7.21 2.99-17.4 < 0.001
 Hr2 2.34 0.90-6.09 0.081

CI, confidence interval; HPV, human papillomavirus; HR, hazard ratio; Hr1-a, human papillomavirus 16 and 18; Hr1-b, human papillomavirus 31, 33, and 45; Hr1-c, human papillomavirus 52 and 58; Hr2, high-risk 2 (human papillomavirus 35, 39, 51, 56, 59, 66, and 68).

a) Of the 1,273 participants, the 49 patients in the HPV-negative group were excluded from the Cox regression. Because there were no progression cases in the HPV-negative group, it could not be used as the reference group, so the HPV non-persistent group was used as the reference in the survival analysis.

Table 3.

Univariable and multivariable Cox analyses at 60 months of follow-up

Characteristic No.a) Univariable: crude HR, 95% CI, p-value Multivariable: adjusted HRb), 95% CI, p-value
HPV genotype group
 Hr1-a 171 11.6 4.78-27.9 < 0.001 3.20 1.28-7.97 0.012
 Hr1-b 89 9.26 3.52-24.4 < 0.001 3.53 1.12-9.92 0.016
 Hr1-c 244 7.21 2.99-17.4 < 0.001 2.77 1.12-6.87 0.027
 Hr2 371 2.34 0.90-6.09 0.081 1.85 0.71-4.88 0.209
 Lr 301 1.00 Ref 1.00 Ref
Cytology pattern
 Progression 120 565 78.5-4,070 < 0.001 515 70.1-3,790 < 0.001
 Persistent 491 35.3 4.74-263 0.001 34.8 4.63-262 0.001
 Regression 565 1.00 Ref 1.00 Ref
HPV infection type
 Persistent 260 3.38 2.27-5.02 < 0.001 0.86 0.53-1.37 0.527
 Non-persistent 916 1.00 Ref 1.00 Ref

BMI, body mass index; CI, confidence interval; HPV, human papillomavirus; HR, hazard ratio; Hr1-a, high-risk human papillomavirus 16 or 18; Hr1-b, high-risk human papillomavirus 31, 33, or 45; Hr1-c, high-risk human papillomavirus 52 or 58; Hr2, high-risk 2 (human papillomavirus 35, 39, 51, 56, 59, 66, or 68); Lr, low-risk human papillomavirus.

a) Of the 1,273 participants, the 49 patients in the HPV-negative group and 48 patients with missing age at diagnosis or BMI (age: 27; BMI: 13; both: 8) were excluded from the multivariable Cox regression, leaving 1,176 patients for analysis. Because there were no progression cases in the HPV-negative group, it could not be used as the reference group, so the HPV non-persistent group was used as the reference in the survival analysis.

b) Hazard ratios and 95% confidence intervals were adjusted for age at diagnosis, BMI, multiple HPV infection, and history of sexually transmitted disease.
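The reported hazard ratios, confidence intervals, and p-values can be cross-checked against each other. Assuming each 95% interval was computed on the log scale as exp(log HR ± 1.96 × SE), which is the usual Wald construction but is not stated here, the standard error and a two-sided p-value can be recovered from the table alone. A minimal sketch (the function name is hypothetical):

```python
import math


def wald_from_ci(hr, ci_low, ci_high, z_crit=1.96):
    """Recover the log-scale SE, the z statistic and a two-sided p-value from a
    hazard ratio and its 95% CI, ASSUMING the interval is exp(log(HR) +/- z_crit * SE)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p
```

For the adjusted Hr1-a estimate (HR 3.20, 95% CI 1.28-7.97), this gives z of about 2.49 and p of about 0.013, close to the reported 0.012; for Hr1-c (2.77, 1.12-6.87) it gives about 0.028 against the reported 0.027.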