¹Faculty of Medicine, Zagazig University, Zagazig, Egypt
²Faculty of Pharmacy, British University in Egypt (BUE), El Shorouk, Egypt
³Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
⁴Department of Surgery, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
Copyright © 2022 by the Korean Cancer Association
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Ethical Statement
This study was approved by the institutional review board of Gangnam Severance Hospital, Yonsei University College of Medicine (Seoul, Republic of Korea) (Approval number: 3-2020-0018). The need to obtain informed consent was waived for this retrospective study.
Author Contributions
Conceived and designed the analysis: Osman MH, Kang J.
Collected the data: Mohamed RH, Sarhan HM, Park EJ, Baik SH, Lee KY, Kang J.
Contributed data or analysis tools: Mohamed RH, Lee KY, Kang J.
Performed the analysis: Osman MH, Kang J.
Wrote the paper: Osman MH, Kang J.
Conflict of Interest
No conflict of interest relevant to this article was reported.
Characteristics of included patients
Characteristic | SEER dataset (n=364,316) | Korean dataset (n=1,572) | p-value |
---|---|---|---|
Years of diagnosis | 2004–2015 | 2003–2012 | |
Last follow-up date | 2016 | 2019 | |
Age (yr) | 67.0±13.7 | 61.6±11.9 | < 0.001 |
Sex | | | |
Male | 188,549 (51.8) | 965 (61.4) | < 0.001 |
Female | 175,767 (48.2) | 607 (38.6) | |
Tumor location | | | |
Colon | 264,288 (72.5) | 939 (59.7) | < 0.001 |
Rectum | 100,028 (27.5) | 618 (39.3) | |
Missing | | 15 (1.0) | |
Histology | | | |
Adenocarcinoma | 326,628 (89.7) | 1,383 (88.0) | 0.308 |
Other histology | 37,688 (10.3) | 146 (9.3) | |
Missing | | 43 (2.7) | |
Stage | | | |
Stage I | 90,647 (24.9) | 318 (20.2) | < 0.001 |
Stage II | 96,337 (26.4) | 443 (28.2) | |
Stage III | 100,478 (27.6) | 535 (34.0) | |
Stage IV | 44,161 (12.1) | 180 (11.5) | |
Missing | 32,693 (9.0) | 96 (6.1) | |
Grade a) | | | |
Grade I | 34,608 (9.5) | 237 (15.1) | < 0.001 |
Grade II | 230,595 (63.3) | 1,101 (70.0) | |
Grade III | 56,079 (15.4) | 45 (2.9) | |
Grade IV | 8,150 (2.2) | 73 (4.6) | |
Missing | 34,884 (9.6) | 116 (7.4) | |
Tumor size (mm) | 44.9±34.4 | 43.8±23.4 | < 0.001 |
No. of examined LNs | 15.0±11.0 | 21.0±17.9 | < 0.001 |
< 12 | 127,245 (35.4) | 397 (25.4) | < 0.001 |
≥ 12 | 226,543 (63.1) | 1,167 (74.5) | |
Unknown | 5,218 (1.5) | 2 (0.1) | |
No. of positive LNs | 1.6±3.6 | 1.8±4.0 | < 0.001 |
CEA b) | | | |
Low | 109,429 (30.0) | 987 (62.8) | < 0.001 |
High | 80,015 (22.0) | 507 (32.3) | |
Missing | 174,872 (48.0) | 78 (5.0) | |
Radiation | | | |
Yes | 43,087 (11.8) | 280 (17.8) | < 0.001 |
No/Unknown | 321,229 (88.2) | 1,292 (82.2) | |
Chemotherapy | | | |
Yes | 124,894 (34.3) | 983 (62.5) | < 0.001 |
No/Unknown | 239,422 (65.7) | 589 (37.5) | |
Values are presented as mean±SD or number (%). CEA, carcinoembryonic antigen; LN, lymph node; SD, standard deviation; SEER, Surveillance, Epidemiology, and End Results.
a) Histologic grade: G1, well differentiated; G2, moderately differentiated; G3, poorly differentiated; G4, undifferentiated.
b) High: CEA ≥ 5, low: CEA < 5.
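The p-values in the patient characteristics table compare the SEER and Korean cohorts, but the source does not say which tests produced them. As a minimal sketch, assuming a Pearson chi-square test without continuity correction for a categorical variable, the sex row can be checked as follows. The function name `chi_square_2x2` is hypothetical and only the standard library is used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Sex counts from the table: rows = male/female, columns = SEER/Korean.
stat, p_value = chi_square_2x2(188_549, 965, 175_767, 607)
print(f"chi-square = {stat:.1f}, p < 0.001: {p_value < 0.001}")
```

Under this assumed test, the difference in sex distribution falls well below the 0.001 threshold, in line with the reported "< 0.001".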
Comparison of AUROC and accuracy of the light gradient boosting algorithm with Bayesian optimization on the SEER and Korean datasets
Survival periods | SEER internal validation (18-fold CV): accuracy (average±SD) | SEER internal validation (18-fold CV): AUC (average±SD) | Korean external validation: accuracy | Korean external validation: AUC |
---|---|---|---|---|
1 | 76.33±2.89 | 83.26±1.46 | 80.08 | 82.55 |
2 | 75.63±1.89 | 82.45±1.12 | 78.16 | 83.62 |
3 | 75.37±1.79 | 81.98±1.17 | 77.69 | 81.02 |
4 | 74.87±1.42 | 81.83±1.22 | 76.41 | 80.52 |
5 | 74.45±1.56 | 81.71±1.36 | 75.20 | 80.46 |
6 | 74.08±1.28 | 81.59±1.17 | 74.57 | 78.75 |
8 | 73.99±1.59 | 81.91±1.42 | 73.67 | 78.76 |
10 | 74.26±1.41 | 82.82±1.20 | 74.21 | 77.72 |
AUC, area under the curve; AUROC, area under the receiver operating characteristic curve; CV, cross-validation; SD, standard deviation; SEER, Surveillance, Epidemiology, and End Results.
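The internal validation columns report each metric as an average±SD over cross-validation folds. The sketch below illustrates only that summary step, under stated assumptions: the data are synthetic, the scores stand in for held-out model predictions (no model is fitted here), and the sample SD is used because the source does not say which SD was reported. The helper names `auroc` and `k_fold_indices` are hypothetical.

```python
import random
import statistics

def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Synthetic stand-in data: positives score higher on average than negatives.
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(900)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

# One AUROC per held-out fold, expressed on the table's 0-100 scale.
fold_aucs = []
for fold in k_fold_indices(len(labels), k=18):
    fold_labels = [labels[i] for i in fold]
    fold_scores = [scores[i] for i in fold]
    fold_aucs.append(100 * auroc(fold_labels, fold_scores))

print(f"AUC = {statistics.mean(fold_aucs):.2f}±{statistics.stdev(fold_aucs):.2f}")
```

In a real cross-validation run, the scores for each fold would come from a model trained on the remaining folds.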
Comparison of the area under the receiver operating characteristic curve between the light gradient boosting algorithm with Bayesian optimization and AJCC staging in the Korean validation dataset
Survival periods | AJCC stage | LGB algorithm | p-value |
---|---|---|---|
1 | 75.13 | 82.55 | 0.002 |
2 | 76.66 | 83.62 | < 0.001 |
3 | 75.14 | 81.02 | < 0.001 |
4 | 75.15 | 80.52 | 0.001 |
5 | 73.67 | 80.46 | < 0.001 |
6 | 72.46 | 78.75 | 0.001 |
8 | 71.54 | 78.76 | 0.002 |
10 | 70.28 | 77.72 | 0.017 |
AJCC, American Joint Committee on Cancer; LGB algorithm, light gradient boosting algorithm.
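The source does not name the test behind the p-values comparing the two AUROCs. One common option for two predictors scored on the same patients is a paired bootstrap, sketched here on synthetic data. This is an assumption for illustration, not the authors' method, and the names `auroc` and `bootstrap_auc_difference` are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=200, seed=0):
    """Paired bootstrap: resample patients, recompute both AUROCs on each resample."""
    rng = random.Random(seed)
    n = len(labels)
    observed = auroc(labels, scores_b) - auroc(labels, scores_a)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # AUROC is undefined unless both classes are present
            continue
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auroc(y, b) - auroc(y, a))
    # Two-sided p-value from the share of resampled differences on each side of zero.
    below = sum(d <= 0 for d in diffs) / n_boot
    above = sum(d >= 0 for d in diffs) / n_boot
    return observed, min(1.0, 2 * min(below, above))

# Synthetic stand-in data: scores_a is a weaker predictor, scores_b a stronger one.
rng = random.Random(7)
labels = [rng.randint(0, 1) for _ in range(200)]
scores_a = [rng.gauss(0.5 * y, 1.0) for y in labels]
scores_b = [rng.gauss(2.0 * y, 1.0) for y in labels]

observed, p_value = bootstrap_auc_difference(labels, scores_a, scores_b)
print(f"AUROC difference = {observed:.3f}, bootstrap p = {p_value:.3f}")
```

Resampling patients rather than scores keeps the two predictors paired, which matters because both AUROCs in the table were computed on the same validation cohort.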