Data Resource Profile: The Cancer Public Library Database in South Korea

Article information

Cancer Res Treat. 2024;56(4):1014-1026
Publication date (electronic) : 2024 April 30
doi : https://doi.org/10.4143/crt.2024.207
1National Cancer Control Institute, National Cancer Center, Goyang, Korea
2Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Korea
3Center for Breast Cancer, Research Institute and Hospital, National Cancer Center, Goyang, Korea
4Division of Data Promotion, Korea Health Information Service, Seoul, Korea
Correspondence: Kui Son Choi, Graduate School of Cancer Science and Policy, National Cancer Center, 323 Ilsan-ro, Ilsandong-gu, Goyang 10408, Korea Tel: 82-31-920-2912 Fax: 82-31-920-2189 E-mail: kschoi@ncc.re.kr
Received 2024 February 26; Accepted 2024 April 29.

Abstract

This paper provides a comprehensive overview of the Cancer Public Library Database (CPLD), established under the Korean Clinical Data Utilization for Research Excellence project (K-CURE). The CPLD links data from four major population-based public sources: the Korea National Cancer Incidence Database in the Korea Central Cancer Registry, cause-of-death data in Statistics Korea, the National Health Information Database in the National Health Insurance Service, and the National Health Insurance Research Database in the Health Insurance Review & Assessment Service. These databases are linked using an encrypted resident registration number. The CPLD, established in 2022 and updated annually, comprises 1,983,499 men and women newly diagnosed with cancer between 2012 and 2019. It contains data on cancer registration and death, demographics, medical claims, general health checkups, and national cancer screening. The most common cancers among men in the CPLD were stomach (16.1%), lung (14.0%), colorectal (13.3%), prostate (9.6%), and liver (9.3%) cancers. The most common cancers among women were thyroid (20.4%), breast (16.6%), colorectal (9.0%), stomach (7.8%), and lung (6.2%) cancers. Among them, 571,285 died between 2012 and 2020 owing to cancer (89.2%) or other causes (10.8%). Upon approval, the CPLD is accessible to researchers through the K-CURE portal. The CPLD is a unique resource for diverse cancer research to investigate medical use before a cancer diagnosis, during initial diagnosis and treatment, and long-term follow-up. This offers expanded insight into healthcare delivery across the cancer continuum, from screening to end-of-life care.

Introduction

Advancements in information technology have raised the value of big data on cancer and, with it, the demand for such data in cancer research [1]. However, the health and medical information being accumulated includes a large amount of sensitive individual health information, which has led to many privacy protection restrictions on the use of healthcare big data. The Personal Information Protection Act was revised in 2020 to promote data usage. Under the revised act, pseudonymized data that cannot identify individuals can be used for statistics, scientific research, and public records without individual consent. Furthermore, an amendment to the Cancer Control Act was implemented in 2021 to reinforce cancer data collection and sharing, with tasks delegated to the National Cancer Data Center (NCDC). The National Cancer Center was designated in the same year as the NCDC.

The Korean Ministry of Health and Welfare initiated the Korean Clinical Data Utilization Network for Research Excellence (K-CURE) project in 2022 based on the Personal Information Protection and Cancer Control Acts. This project aims to establish an ecosystem for combining and utilizing clinical and public cancer data. The Cancer Public Library Database (CPLD), established under the K-CURE project, combines data from four major population-based public sources: the Korea National Cancer Incidence Database (KNCI DB) in the Korea Central Cancer Registry (KCCR), cause-of-death data in Statistics Korea, National Health Information Database (NHID) in the National Health Insurance Service (NHIS), and National Health Insurance Research Database (NHIRD) in the Health Insurance Review & Assessment Service (HIRA).

This study aimed to offer a comprehensive profile of CPLD data, highlighting its representation of the entire patient population with cancer in Korea. We presented descriptive statistics detailing the number of patients included in the CPLD, their demographics, medical usage, and mortality. Furthermore, this study emphasized the potential value of the CPLD in cancer research by presenting its available data.

Materials and Methods

1. Data sources

The CPLD resulted from the collaborative efforts of the KCCR, NHIS, HIRA, and Statistics Korea. The NCDC requested the KNCI DB from the KCCR, cause-of-death data from Statistics Korea, NHID from the NHIS, and NHIRD from the HIRA to establish the CPLD. The KNCI DB is a nationwide and hospital-based cancer registration database that regularly collects information on newly diagnosed cancer (incident) cases among Korean residents [2]. The KCCR has reported nationwide statistics since 1999; our previous study provides detailed information on the KCCR and KNCI DB [3]. Completeness is an important data quality indicator, and the 2020 KNCI DB was estimated to be 98.3% complete using the method proposed by Ajiki et al. [4]. The KNCI DB contains data on demographics (such as age, sex, and residence), diagnosis date, cancer type (based on the International Classification of Diseases, 10th revision), Surveillance, Epidemiology, and End Results (SEER) summary stage, morphology, and treatment methods for patients with cancer.

The mortality data collected by the KNCI DB were primarily derived from the cause-of-death data collected by Statistics Korea. The cause-of-death data are obtained from the death certificates of Koreans who had resided in Korea. Causes of death were classified using the disease classification recommended by the World Health Organization [5] and the 7th Korean Standard Classification of Diseases and Causes of Death [6]. We collected the cause and date of death information from the cause-of-death database.

The NHIS in Korea is a single insurer that provides health insurance coverage for all citizens living in Korea, managing their eligibility, collecting insurance contributions, and providing health insurance benefits. The HIRA evaluates medical service fees, healthcare quality, and medical service adequacy. Under this universal health coverage system, the NHID and NHIRD contain healthcare information such as treatments, pharmaceuticals, procedures, and diagnoses for approximately 50 million beneficiaries [7,8]. We collected sociodemographic data of the NHIS beneficiaries and medical aid recipients, alongside information on their general health checkups and national cancer screening examinations from the NHID. Medical utilization data were gathered from the NHIRD, which comprised the following files: (1) general information; (2) healthcare services, including inpatient prescriptions; (3) disease diagnosis; (4) outpatient prescriptions; and (5) drug master table.

2. Data linkage

The individuals included in the KNCI DB are linked to their NHID and NHIRD enrollment data, as well as cause-of-death data, using an algorithm based on their resident registration number. Each database entry is linked through a join key—an encrypted value derived from the resident registration number using a secure hash algorithm and salt value. Individual join keys and serial numbers generated by each institution are collected by the Korea Health Information Service (a trusted third-party organization) to protect personal information. The Korea Health Information Service uses this information to create a linkage table, which the NCDC utilizes to combine data from each institution along with the created linkage table. The NCDC deletes the linkage table once the combination process is complete. Consequently, personal identifiable information used to link the database is excluded from the CPLD. Instead, each individual receives a unique, non-identifiable number to enable tracking across data files and times. Therefore, database users are not allowed to link additional data resources at an individual level. Each institution must conduct an additional process to generate a linkage table for additional data linkages, performed only by specialized institutions authorized under the Personal Information Protection Act.
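The linkage flow described above can be illustrated with a minimal sketch. The source states only that a secure hash algorithm and a salt value are used; the choice of SHA-256, the way the salt is combined with the number, the institution labels, the serial-number formats, and the `cpld_id` field are all assumptions made for illustration, and the resident registration number shown in the usage note is fictitious.

```python
import hashlib


def make_join_key(rrn: str, salt: bytes) -> str:
    """Derive a join key from a resident registration number.

    Assumption: SHA-256 over salt + normalized number. The actual
    algorithm and salt handling are not specified in the source.
    """
    normalized = rrn.replace("-", "").strip()
    return hashlib.sha256(salt + normalized.encode("utf-8")).hexdigest()


def build_linkage_table(submissions):
    """Trusted-third-party step: match serial numbers on the join key.

    submissions maps an institution label to {join_key: serial_number}.
    The registry (labelled "KCCR" here) defines who is included.
    The join key itself is left out of the resulting rows.
    """
    registry = submissions["KCCR"]
    table = []
    for new_id, (key, serial) in enumerate(sorted(registry.items()), start=1):
        # Hypothetical non-identifiable number used to track a person.
        row = {"cpld_id": f"P{new_id:07d}", "KCCR": serial}
        for institution, mapping in submissions.items():
            if institution != "KCCR" and key in mapping:
                row[institution] = mapping[key]
        table.append(row)
    return table
```

For example, two institutions that hash the same fictitious number `900101-1234567` with the same salt obtain the same join key, so their serial numbers land in one row of the linkage table, while the join key itself never appears in that row.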

Finally, the CPLD incorporated 1,983,499 individuals diagnosed with cancer between 2012 and 2019, aged 0-100 years or older. The CPLD includes information on deaths between 2012 and 2020, health insurance eligibility, general health checkups, national cancer screening, and medical claims between 2012 and 2021 (Fig. 1).

Fig. 1.

The concept of the Cancer Public Library Database.

3. Data access

The CPLD can be accessed through the K-CURE portal (https://k-cure.mohw.go.kr). Researchers are required to submit a study proposal with ethical approval from their Institutional Review Board. The submission must be approved by the NCDC review committee before data access is granted. In principle, only the minimum data needed to address the research question are provided. Provider identifiers, sensitive disease names (such as mental diseases and sexually transmitted diseases), and related medical information are removed or replaced in the CPLD to protect privacy. Approval from the NCDC review committee is required for all restricted-variable requests.

4. Data included in the CPLD

Because a patient can have several cancer cases and many associated claims, the CPLD comprises multiple linkable files keyed by the unique serial number assigned to each included patient with cancer. Table 1 presents the various file types. Twenty-four cancer types in the CPLD are classified based on the International Classification of Diseases (10th revision) codes. Ages are grouped in 5-year intervals between 20 and 79 years, while those under 20 and those 80 or older are grouped separately (0-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, and ≥ 80 years old). However, the NCDC review committee can approve special requests for a 1-year interval. The regions are grouped into 17 municipal units, including Seoul, Busan, Daegu, Incheon, Gwangju, Daejeon, Ulsan, Sejong, Jeju-do, Gyeonggi-do, Gangwon-do, Chungcheongbuk-do, Chungcheongnam-do, Jeollabuk-do, Jeollanam-do, Gyeongsangbuk-do, and Gyeongsangnam-do. Health insurance premiums are categorized into 10 deciles. The causes of death were grouped into 24 cancer types and major classifications following the Korean Standard Classification of Diseases ver. 7, based on the International Classification of Diseases, 10th revision. The NHIS claims files from the NHIRD include unique patient identifiers, sex, service date(s), diagnosis codes, procedure codes, amount charged, and amount reimbursed. The general health checkups and cancer screening files contain general health status (including height, weight, results of blood tests, and disease history), health behavior (including smoking, alcohol consumption, and physical activity), and screening results for five cancer types.
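The age grouping described above can be expressed as a small function. This is only an illustrative sketch: the function name and the exact label strings are assumptions, and the CPLD itself may encode the groups differently.

```python
def age_group(age: int) -> str:
    """Map an age in years to the grouping described for the CPLD.

    Under 20 and 80 or older are single groups; ages 20 to 79 fall
    into 5-year intervals. Label strings are illustrative.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 20:
        return "0-19"
    if age >= 80:
        return "≥ 80"
    lower = (age // 5) * 5  # start of the 5-year interval
    return f"{lower}-{lower + 4}"
```

Under these assumptions, a patient aged 24 falls in the 20-24 group and a patient aged 79 in the 75-79 group.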

List of variables in the Cancer Public Library Database (summary)

5. Statistical analysis

This study presented descriptive statistics on the clinical and sociodemographic characteristics and healthcare utilization of patients with cancer included in the CPLD. We presented the number of patients with cancer based on cancer sites and the prevalence of the top five sites over the years. Additionally, we presented the number of deaths between 2012 and 2020 and their main causes. Furthermore, to demonstrate the extent and type of NHIS-covered services used by patients with cancer, we calculated the annual average claims per patient in the year before diagnosis (12 months before the month of cancer diagnosis), the year after diagnosis (12 months after the month of cancer diagnosis), and 12 months before death (11 months before the month of death). Descriptive analyses were performed using SAS ver. 9.4 (SAS Institute Inc., Cary, NC).
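The three observation windows can be sketched in Python as follows. This is not the SAS code used in the study. It assumes that the month of diagnosis is excluded from both diagnosis windows, that the death window covers the month of death plus the 11 preceding months, and that the denominator is all patients (or all deceased patients for the death window); none of these details is confirmed by the text, and all names are hypothetical.

```python
from collections import Counter


def month_index(year: int, month: int) -> int:
    """Number months consecutively so windows reduce to integer ranges."""
    return year * 12 + (month - 1)


def classify_claim(claim_ym, diagnosis_ym, death_ym=None):
    """Return the set of windows a claim month falls into (may overlap)."""
    c = month_index(*claim_ym)
    dx = month_index(*diagnosis_ym)
    periods = set()
    if dx - 12 <= c <= dx - 1:  # assumed: diagnosis month excluded
        periods.add("year_before_diagnosis")
    if dx + 1 <= c <= dx + 12:  # assumed: diagnosis month excluded
        periods.add("year_after_diagnosis")
    if death_ym is not None:
        d = month_index(*death_ym)
        if d - 11 <= c <= d:  # month of death plus the 11 months before it
            periods.add("year_before_death")
    return periods


def mean_claims_per_patient(patients, claims):
    """patients: {id: (diagnosis_ym, death_ym or None)}; claims: [(id, claim_ym)]."""
    counts = Counter()
    for pid, claim_ym in claims:
        diagnosis_ym, death_ym = patients[pid]
        for period in classify_claim(claim_ym, diagnosis_ym, death_ym):
            counts[period] += 1
    n_all = len(patients)
    n_dead = sum(1 for _, death in patients.values() if death is not None)
    return {
        "year_before_diagnosis": counts["year_before_diagnosis"] / n_all,
        "year_after_diagnosis": counts["year_after_diagnosis"] / n_all,
        "year_before_death": counts["year_before_death"] / n_dead if n_dead else 0.0,
    }
```

Working on a month index rather than on calendar dates mirrors the month-based definitions in the text; a claim can count toward more than one window when the year after diagnosis overlaps the last year of life.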

Results

Table 2 presents the number of patients with cancer based on their sociodemographic characteristics and diagnosis year. Of the 1,983,488 patients, the largest age group was 60-69 years (23%), followed by the 70-79 and 50-59 age groups. At cancer diagnosis, the 8-10 decile group of health insurance premiums was the most common. The distribution of patients with cancer based on the SEER summary stage was as follows: 40.9% had localized cancer, 27.1% belonged to the regional group, 16.1% belonged to the distant group, and 15.8% were categorized as unknown.
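As a worked check, the stage percentages above can be reproduced from the row totals reported for the SEER summary stage in Table 2. The snippet below is only an arithmetic illustration; the variable names are arbitrary.

```python
# Row totals for SEER summary stage, taken from Table 2.
stage_totals = {
    "Localized": 811_615,
    "Regional": 537_663,
    "Distant": 320_012,
    "Unknown": 314_198,
}

# The four stage totals add up to the Table 2 sample size.
total = sum(stage_totals.values())

# Percentage share of each stage, rounded to one decimal place.
stage_shares = {
    stage: round(100 * count / total, 1)
    for stage, count in stage_totals.items()
}
```

The four totals sum to 1,983,488, and the resulting shares are 40.9%, 27.1%, 16.1%, and 15.8%, matching the figures in the text.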

Number of cancer cases included in the Cancer Public Library Database from 2012 to 2019 (n=1,983,488)

Fig. 2 shows the top five cancers by sex from 2012 to 2019. Among the 996,209 men, stomach, lung, colorectal, prostate, and liver cancer were the top five cancers, accounting for 16.1%, 14.0%, 13.3%, 9.6%, and 9.3% of all cancer cases diagnosed, respectively. The proportion of lung and prostate cancers in men steadily increased from 2012 to 2019, while stomach, colorectal, and liver cancer decreased. The most common cancers in women were thyroid (20.4%), breast (16.6%), colorectal (9.0%), stomach (7.8%), and lung (6.2%) cancers. The proportion of breast and lung cancers in women steadily increased from 2012 to 2019, while stomach cancer steadily decreased. Thyroid cancer accounted for about 30% of cancer cases among women in 2012, but its share gradually decreased thereafter, to about 16.9% of all cancer cases in 2019. The number of incident cancer cases from 2012 to 2019 by cancer type in men and women is available in S1 Table.

Fig. 2.

Distribution of cancer cases included in the Cancer Public Library Database by cancer sites from 2012 to 2019. (A) Men. (B) Women.

Among these patients with cancer, 571,285 died between 2012 and 2020, with 89.2% of the deaths attributed to cancer and 10.8% to other causes (Table 3). Lung cancer caused the highest number of deaths in both sexes, with 91,437 deaths in men and 29,707 in women. Liver (14.4%), stomach (9.6%), colorectal (8.3%), and pancreatic (6.1%) cancers had the highest number of deaths among the men after lung cancer. Colorectal (11.3%), pancreatic (9.5%), stomach (9.1%), and liver (8.9%) cancers caused the most deaths among the women.

Number of deaths among patients with cancer included in the Cancer Public Library Database from 2012 to 2020

Table 4 presents the medical service utilization patterns during the 1 year before and after cancer diagnosis, as well as during the 1 year before death. Regarding medical services, 93% of the patients with cancer had outpatient claims and 43% had inpatient hospitalization claims during the 1 year before cancer diagnosis. Almost all patients with cancer (92%) had at least one outpatient claim, and the majority (89%) had at least one inpatient claim during the year after diagnosis. Furthermore, of the 571,285 patients who died between 2012 and 2020, 98% had outpatient and inpatient hospitalization claims in the 1 year before death. The average number of outpatient visits and inpatient hospitalizations per patient was higher during the 1 year after diagnosis than during the 1 year before. The frequency of inpatient hospitalization claims increased from 1.9 to 4.5. Medical care use increased during the last year of life, with an average of 38.7 outpatient visits and 7.8 inpatient hospitalizations. Furthermore, 41% and 34% of patients used dental and oriental medicine services, respectively, in the 1 year before cancer diagnosis. However, fewer patients with cancer used dental and oriental medicine services in the 1 year after diagnosis and in the 1 year before death.

Medical use of patients with cancer included in the Cancer Public Library Database by time point

Discussion

The CPLD has several strengths. The CPLD encompasses 96.7% of all cancer incidence cases, as published in the annual report of cancer statistics of KCCR [9], ensuring a comprehensive representation of the population. This is advantageous because previous studies using NHIS claims data faced challenges in accurately defining patients with cancer using disease and procedure codes, which led to the underestimation or overestimation of cancer incidence or prevalence [7,10,11]. Consequently, the CPLD is a valuable resource for overcoming the limitations of defining cancer diagnoses in research.

The key features include patient demographics (including age and sex), detailed clinical cancer characteristics (including diagnosis date, site, histology, and summary stage), extensive healthcare service utilization, and cost information. These features facilitate the identification and comparison of cancer treatments and outcomes among the included populations. Moreover, the longitudinal nature of the CPLD, covering before and after cancer diagnosis periods, facilitates the calculation of time-dependent measures such as comorbidity indices, a comprehensive analysis of various treatments (including surgery, radiation, chemotherapy, immunotherapy, and other treatments), and outcomes (including time to subsequent events or death). Additionally, these longitudinal data offer valuable insights into the long-term outcomes of cancer survivors.

The CPLD is similar to the SEER-Medicare database in the United States, which combines SEER cancer registry data with Medicare enrollment and claims data [12]. The SEER-Medicare database offers advantages, including a substantial number of cancer cases, detailed tumor characteristics, population-based data sources, longitudinal Medicare data, an extensive range of covered services, and biennial linkage updates [12]. Additionally, the SEER-Medicare linkage encompasses non-cancer control groups and incorporates ancillary linkage data sources, such as the Medicare Health Outcome survey and the Medicare Consumer Assessment of Healthcare Providers and Systems survey. However, findings from the SEER-Medicare analyses may not be generalizable to younger populations owing to its focus on linking with Medicare data, primarily including individuals aged 65 years and older [12].

The CPLD has some limitations. First, a time lag of 2-3 years exists between the generation of individual data and their availability for research. The CPLD released in 2023 included patients with cancer through 2019, cases of death through 2020, and claims through 2021. This time lag is primarily driven by the KNCI DB, which requires this period to ensure the completeness of cancer registration [3]. Therefore, researchers should account for this lag when designing studies using the CPLD.

Second, claims data from the NHID and NHIRD do not encompass all health-related information. For example, clinically observed information, which may be present in medical records, is excluded from the CPLD. Furthermore, services such as cosmetic surgical procedures or over-the-counter drugs not covered by the NHIS are absent in the CPLD because claims data are generated to reimburse healthcare services covered by the NHIS. The CPLD includes medical procedure codes to indicate that specific tests are conducted; however, the CPLD lacks information on test results (such as imaging test results, biomarker data, and laboratory values). Additionally, certain health conditions, such as mental illness, suicide, sexually transmitted diseases, and miscarriage, are not available because of privacy concerns. Therefore, researchers should consider these constraints when selecting study topics.

Third, researchers should understand the structure and characteristics of the CPLD. Claims data in the CPLD comprise diverse file types, each with one-to-many linkage relationships. Furthermore, the CPLD contains left- and right-truncated data. Therefore, caution should be exercised when interpreting trends in cancer incidence, prevalence, and mortality rates. Additionally, specialized knowledge of NHIS billing and coding is essential for properly manipulating and interpreting data.
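One practical consequence of the one-to-many relationships is that claims are usually collapsed to one record per patient before being combined with registry data; otherwise, each patient row is repeated once per claim. The sketch below illustrates this with plain Python structures. The field names (`patient_id`, `charged`) are hypothetical and do not correspond to actual CPLD variable names.

```python
from collections import defaultdict


def summarize_claims(claims):
    """Collapse a one-to-many claims file to one summary per patient."""
    summary = defaultdict(lambda: {"n_claims": 0, "total_charged": 0})
    for claim in claims:
        entry = summary[claim["patient_id"]]
        entry["n_claims"] += 1
        entry["total_charged"] += claim["charged"]
    return dict(summary)


def attach_summaries(registry, claim_summary):
    """Left-join summaries onto the registry, keeping one row per patient."""
    rows = []
    for person in registry:
        # Patients without any claim keep a row with zero counts.
        stats = claim_summary.get(
            person["patient_id"], {"n_claims": 0, "total_charged": 0}
        )
        rows.append({**person, **stats})
    return rows
```

Aggregating first keeps the patient as the unit of analysis, so counts and percentages computed afterward are not inflated by patients with many claims.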

Finally, information related to diagnoses and diseases, excluding cancer, may not accurately reflect disease occurrence and prevalence because it primarily comes from the claims data used for reimbursement [8]. Moreover, administrative claims data alone do not provide insight into the decision-making process for cancer care and other patient-reported outcomes. These limitations are not exclusive to the CPLD, but are common in databases relying on claims data, which are primarily gathered for administrative rather than research purposes.

In conclusion, the CPLD provides a unique resource for a wide range of cancer research, enabling the investigation of medical usage patterns before a cancer diagnosis, during the period of initial diagnosis and treatment, and during long-term follow-up. This facilitates expanded insights into healthcare delivery across the cancer continuum, from screening to end-of-life care. Partners from the NCDC, Statistics Korea, KCCR, NHIS, and HIRA ensure the continual enhancement and maintenance of the CPLD. The CPLD plans to add data on newly diagnosed cancer patients and update data on existing cancer patients annually. Furthermore, there are plans to expand the range of public agency data based on researchers’ needs, including the coronavirus disease 2019 DB of the Korea Disease Control and Prevention Agency. Finally, with continuous cooperation and efforts, the CPLD can contribute to the development of future insights into cancer research in South Korea.

Electronic Supplementary Material

Supplementary materials are available at Cancer Research and Treatment website (https://www.e-crt.org).

Notes

Author Contributions

Conceived and designed the analysis: Choi KS, Chae H, Choi DW, Ryu KS.

Collected the data: Im JS, Choi KS, Choi DW, Ryu KS, Kong HJ, Cha HS, Kim HJ, Chae H, Jeon YS, Kim H, Jung J.

Contributed data or analysis tools: Choi DW, Guk MY, Kim HR.

Performed the analysis: Choi DW, Guk MY, Kim HR.

Wrote the paper: Choi DW, Choi KS.

Interpretation and review: Choi DW, Choi KS.

Review and comment: Im JS, Choi KS, Choi DW, Guk MY, Kim HR, Ryu KS, Kong HJ, Cha HS, Kim HJ, Chae H, Jeon YS, Kim H, Jung J.

Conflicts of Interest

Conflict of interest relevant to this article was not reported.

Acknowledgements

Special thanks to the Korean Ministry of Health and Welfare, Statistics Korea, the Korea Central Cancer Registry, the National Health Insurance Service, the Health Insurance Review & Assessment Service, and the Korea Health Information Service for their support and contributions to the K-CURE project. This work was supported by the Health Promotion Fund of the Ministry of Health & Welfare (No. 22A2400-1) and a research grant (No. 2310520-2, No. 2310690-1) from the National Cancer Center, Republic of Korea.

References

1. Batko K, Slezak A. The use of big data analytics in healthcare. J Big Data 2022;9:3.
2. Shin HR, Won YJ, Jung KW, Kong HJ, Yim SH, Lee JK, et al. Nationwide cancer incidence in Korea, 1999~2001: first result using the national cancer incidence database. Cancer Res Treat 2005;37:325–31.
3. Kang MJ, Jung KW, Bang SH, Choi SH, Park EH, Yun EH, et al. Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2020. Cancer Res Treat 2023;55:385–99.
4. Ajiki W, Tsukuma H, Oshima A. Index for evaluating completeness of registration in population-based cancer registries and estimation of registration rate at the Osaka Cancer Registry between 1966 and 1992 using this index. Nihon Koshu Eisei Zasshi 1998;45:1011–7.
5. World Health Organization. International statistical classification of diseases and related health problems 10th rev. 5th ed. Geneva: World Health Organization; 2016.
6. Korea Informative Classification of Diseases. Korean standard classification of diseases and causes of death (KCD-7). Daejeon: Statistics Korea; 2016.
7. Kim JA, Yoon S, Kim LY, Kim DS. Towards actualizing the value potential of Korea Health Insurance Review and Assessment (HIRA) data as a resource for health research: strengths, limitations, applications, and strategies for optimal use of HIRA data. J Korean Med Sci 2017;32:718–28.
8. Seong SC, Kim YY, Khang YH, Heon Park J, Kang HJ, Lee H, et al. Data resource profile: the National Health Information Database of the National Health Insurance Service in South Korea. Int J Epidemiol 2017;46:799–800.
9. Korea Central Cancer Registry, National Cancer Center. Annual report of cancer statistics in Korea in 2019. Sejong: Ministry of Health and Welfare; 2021.
10. Baldi I, Vicari P, Di Cuonzo D, Zanetti R, Pagano E, Rosato R, et al. A high positive predictive value algorithm using hospital administrative data identified incident cancer cases. J Clin Epidemiol 2008;61:373–9.
11. Seo HJ, Oh IH, Yoon SJ. A comparison of the cancer incidence rates between the national cancer registry and insurance claims data in Korea. Asian Pac J Cancer Prev 2012;13:6163–8.
12. Enewold L, Parsons H, Zhao L, Bott D, Rivera DR, Barrett MJ, et al. Updated overview of the SEER-medicare data: enhanced content and applications. J Natl Cancer Inst Monogr 2020;2020:3–13.

Table 1.

List of variables in the Cancer Public Library Database (summary)

Provider | Database | Year | Variable
KCCR | KNCIDB | 2012-2019 | Age at diagnosis, sex, cancer type (ICD-10), diagnosis date, SEER summary stage, ICD-O-3, morphology code, and treatment method
Statistics Korea | Death certificate | 2012-2020 | Cause and date of death
HIRA | General information | 2012-2021 | Age, sex, insurance type, review date, provider ID, indicators for inpatients/outpatients, and indicators for types of providers; operation related to primary diagnosis; dates of treatment and dispensation; primary diagnosis, secondary diagnosis, surgery, and area of provider’s practice; number of days undergoing care, first visit to a physician, and dates of encounter, admission, and discharge; number of days of supply for prescriptions, prescription quantity, and special codes for different out-of-pocket costs
HIRA | Healthcare services | 2012-2021 | Procedures, inpatient prescriptions, diagnostic tests, and treatments; operation, injection, and examination; unit price, quantity per day, and days of supply
HIRA | Diagnosis | 2012-2021 | All disease diagnoses (KCD-6, KCD-7)
HIRA | Outpatient prescription | 2012-2021 | Quantity per time, quantity per day, days of supply, drug code, unit price, amount, and prescription date
HIRA | Drug master | 2012-2021 | Drug code, date of starting (and terminating) coverage, unit, manufacturer, coverage, and unit price
NHIS | Eligibility for health insurance | 2012-2021 | Age, province of residence, eligibility type for health insurance, percentile group of income level, type of registered disability, grade of registered disability, and year
NHIS | General health examination | 2012-2021 | Examination year; height, weight, waist circumference, blood pressure, fasting blood glucose, total cholesterol, triglyceride, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, hemoglobin, urine protein, serum creatinine, AST (SGOT), ALT (SGPT), and gamma-GTP; personal history (stroke, cardiac disease, hypertension, diabetes, dyslipidemia, tuberculosis, and other diseases [including cancer]); family history (hypertension, stroke, cardiac disease, diabetes, and other disease [including cancer]); smoking status, number of cigarettes daily for current smokers, frequency of alcohol consumption and physical activity (intensive, moderate exercise, and walking)
NHIS | Cancer screening examination | 2012-2021 | Common questions: family health and treatment history; Stomach: upper gastrointestinal series (UGIS) or endoscopy (biopsy); Liver: abdominal ultrasonography+serum alpha-fetoprotein test (combined); Colorectum: (1st test) fecal occult blood test (FOBT), (2nd test) colonoscopy (biopsy) or double contrast barium enema; Breast: mammography; Cervix uteri: Pap smear

ALT, alanine aminotransferase; AST, aspartate aminotransferase; Gamma-GTP, gamma-glutamyl transpeptidase; HIRA, Health Insurance Review and Assessment; ICD-10, International Classification of Diseases 10th revision; ICD-O-3, International Classification of Diseases for Oncology, 3rd edition; KCCR, Korea Central Cancer Registry; KCD-6, Korean Standard Classification of Diseases version 6, based on the International Classification of Diseases 10th revision; KCD-7, Korean Standard Classification of Diseases version 7, based on the International Classification of Diseases 10th revision; KNCIDB, Korea National Cancer Incidence Data Bases; NHIS, National Health Insurance Services; SEER, Surveillance, Epidemiology, and End Results; SGOT, serum glutamic oxaloacetic transaminase; SGPT, serum glutamic pyruvate transaminase.

Table 2.

Number of cancer cases included in the Cancer Public Library Database from 2012 to 2019 (n=1,983,488)

Variable Year at diagnosis
2012 2013 2014 2015 2016 2017 2018 2019 Total
Men Women Men Women Men Women Men Women Men Women Men Women Men Women Men Women
Age at diagnosis of cancer (yr)
 0-19 518 535 439 504 405 489 369 405 333 396 251 279 167 209 103 133 5,535
 20-29 1,179 3,532 1,207 3,508 1,173 3,270 1,091 3,090 1,244 3,754 1,192 3,848 1,337 4,061 1,341 4,246 39,073
 30-39 4,236 13,825 4,444 13,559 3,821 12,054 3,564 11,256 3,657 11,995 3,844 12,122 4,093 12,406 4,154 12,542 131,572
 40-49 11,159 25,910 11,067 26,019 10,272 23,632 9,836 22,441 10,050 24,018 9,794 24,224 9,694 24,804 9,832 24,948 277,700
 50-59 25,797 29,583 25,754 29,518 25,318 26,916 24,653 26,021 25,829 27,840 25,438 27,812 25,465 28,613 25,406 29,714 429,677
 60-69 31,944 20,756 31,564 19,990 32,034 19,153 33,047 19,877 36,413 22,260 37,096 22,940 39,145 24,102 40,742 26,211 457,274
 70-79 31,218 18,622 33,052 19,577 33,337 19,471 33,820 19,378 34,858 20,496 36,139 20,660 38,561 21,143 40,126 22,324 442,782
 ≥ 80 9,021 8,742 9,781 9,485 10,783 10,304 11,755 11,321 13,400 13,016 14,713 13,511 16,291 14,561 17,843 15,348 199,875
Region
 Capital city 22,419 25,001 22,427 24,515 21,967 22,844 21,982 22,311 23,125 24,276 23,885 24,419 24,821 25,383 25,665 26,204 381,244
 Metropolitan city 28,175 32,181 28,780 32,665 28,843 30,416 28,946 30,355 31,511 32,283 31,702 32,346 33,599 33,587 34,372 34,624 504,385
 Others 64,375 64,208 66,008 64,887 66,241 61,923 67,117 61,044 71,075 67,160 72,813 68,585 76,254 70,888 79,451 74,597 1,096,626
 Missing 103 115 93 93 92 106 90 79 73 56 67 46 79 41 59 41 1,233
Decile groups of health insurance premium
 0 5,276 5,697 5,245 5,602 5,272 5,372 5,258 5,311 6,086 5,997 6,260 6,043 6,796 6,138 6,909 6,230 93,492
 1-3 21,444 25,898 22,160 26,335 23,281 25,931 22,875 25,493 24,633 28,047 25,041 28,765 26,607 29,970 27,587 31,505 415,572
 4-7 34,456 37,245 35,796 37,984 35,835 35,840 37,054 36,065 39,193 39,086 40,431 39,560 42,536 41,225 43,599 43,320 619,225
 8-10 50,269 48,674 50,548 48,303 50,880 46,080 51,026 44,924 53,999 48,587 54,891 48,989 56,810 50,499 59,448 52,265 816,192
 Missing 3,627 3,991 3,559 3,936 1,875 2,066 1,922 1,996 1,873 2,058 1,844 2,039 2,004 2,067 2,004 2,146 39,007
Type of health insurance
 Self-employed insured 35,667 38,180 35,403 37,072 35,068 34,043 35,100 33,018 36,628 35,375 37,571 35,048 39,240 35,934 41,791 38,007 583,145
 Employed insured 74,035 77,525 76,568 79,394 76,711 75,773 77,687 75,384 82,998 82,348 84,570 84,259 88,643 87,789 90,793 91,194 1,305,671
 Medical-aid beneficiary 5,276 5,697 5,245 5,602 5,272 5,372 5,258 5,311 6,086 5,997 6,260 6,043 6,796 6,138 6,909 6,230 93,492
 Missing 94 103 92 92 92 101 90 76 72 55 66 46 74 38 54 35 1,180
SEER summary stage
 Localized 48,554 51,099 48,123 49,617 47,928 46,853 48,163 45,611 51,336 49,398 52,500 50,153 54,603 53,512 57,674 56,491 811,615
 Regional 31,130 37,318 32,308 37,742 30,987 33,341 30,525 31,447 32,345 33,775 33,464 33,814 34,993 33,838 35,726 34,910 537,663
 Distant 21,671 14,285 22,207 14,607 22,899 14,804 22,946 15,127 24,589 16,479 25,474 16,934 26,468 17,250 26,598 17,674 320,012
 Unknown 13,717 18,803 14,670 20,194 15,329 20,291 16,501 21,604 17,514 24,123 17,029 24,495 18,689 25,299 19,549 26,391 314,198

SEER, Surveillance, Epidemiology, and End Results.
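Because the table rows above are flattened into a single run of numbers, it can help to map each value back to its year and sex before using it. The following is a minimal sketch, assuming each row holds a label, 16 counts (Men and Women for each diagnosis year from 2012 to 2019, in that order), and a row total. The function name `parse_row` and the column order are illustrative assumptions, not part of the CPLD documentation.

```python
# One flattened row, copied from the SEER summary stage block above.
ROW = ("Localized 48,554 51,099 48,123 49,617 47,928 46,853 48,163 45,611 "
       "51,336 49,398 52,500 50,153 54,603 53,512 57,674 56,491 811,615")

YEARS = range(2012, 2020)   # assumption: diagnosis years 2012-2019
SEXES = ("Men", "Women")    # assumption: column order within each year


def parse_row(row):
    """Split a flattened row into (label, {(year, sex): count}, total)."""
    tokens = row.split()
    n_counts = len(YEARS) * len(SEXES)
    # The last n_counts + 1 tokens are numbers; anything before is the label.
    numbers = [int(t.replace(",", "")) for t in tokens[-(n_counts + 1):]]
    label = " ".join(tokens[:-(n_counts + 1)])
    keys = [(year, sex) for year in YEARS for sex in SEXES]
    counts = dict(zip(keys, numbers[:-1]))
    return label, counts, numbers[-1]


label, counts, total = parse_row(ROW)
# Consistency check: the 16 cell counts should add up to the row total.
assert sum(counts.values()) == total
print(label, counts[(2019, "Women")], total)
```

The same check can be run on any other row to catch values that were dropped or shifted during extraction.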

Table 3.

Number of deaths among patients with cancer included in the Cancer Public Library Database from 2012 to 2020

Variable Year of death
2012 2013 2014 2015 2016 2017 2018 2019 2020 Total
Men Women Men Women Men Women Men Women Men Women Men Women Men Women Men Women Men Women
Cause of death
 Cancer-specific death
  Lip, oral cavity, and pharynx 198 48 444 138 598 180 687 200 775 219 821 216 748 247 900 248 726 208 7,601
  Esophagus 384 46 886 96 1,142 103 1,202 105 1,247 124 1,192 116 1,226 112 1,358 116 983 105 10,543
  Stomach 1,648 869 3,453 1,893 4,174 2,258 4,380 2,375 4,552 2,494 4,499 2,459 4,568 2,348 4,502 2,389 3,201 1,715 53,777
  Colon and rectum 966 822 2,204 1,834 2,961 2,351 3,492 2,717 3,773 3,014 4,277 3,187 4,378 3,350 4,580 3,409 3,686 2,572 53,573
  Liver 2,958 970 5,093 1,802 6,210 2,076 6,528 2,277 6,627 2,471 6,779 2,289 6,832 2,430 6,881 2,449 4,679 1,522 70,873
  Gallbladder 657 718 1,267 1,336 1,601 1,591 1,823 1,778 2,008 1,894 2,189 2,089 2,336 2,131 2,418 2,209 1,836 1,452 31,333
  Pancreas 1,184 1,001 2,100 1,750 2,440 2,053 2,712 2,293 2,713 2,509 2,800 2,615 3,034 2,677 3,285 2,787 2,125 1,900 41,978
  Larynx 67 4 184 13 220 16 220 18 239 14 271 20 216 18 252 19 211 5 2,007
  Lung 4,323 1,320 8,776 2,657 10,413 3,214 10,960 3,643 11,822 3,780 11,966 3,944 12,102 3,943 12,568 4,186 8,507 3,020 121,144
  Breast NA 124 NA 407 NA 740 NA 1,093 NA 1,333 NA 1,506 NA 1,661 NA 1,922 NA 1,859 10,645
  Cervix uteri NA 118 NA 355 NA 573 NA 688 NA 669 NA 685 NA 691 NA 760 NA 621 5,160
  Corpus uteri NA 38 NA 112 NA 198 NA 240 NA 264 NA 276 NA 297 NA 340 NA 297 2,062
  Ovary NA 156 NA 367 NA 517 NA 656 NA 851 NA 914 NA 1,009 NA 1,051 NA 1,002 6,523
  Prostate 156 NA 464 NA 717 NA 928 NA 1,116 NA 1,261 NA 1,494 NA 1,600 NA 1,592 NA 9,328
  Testis 3 NA 11 NA 7 NA 9 NA 11 NA 10 NA 14 NA 13 NA 8 NA 86
  Kidney 169 64 316 157 408 182 470 188 521 220 529 225 548 257 567 244 456 152 5,673
  Bladder 136 60 413 150 570 202 635 225 765 261 863 260 857 267 955 320 866 212 8,017
  Brain and CNS 113 96 311 247 451 315 471 397 533 433 594 408 583 467 621 503 504 416 7,463
  Thyroid 31 58 51 97 47 120 55 110 60 125 65 133 69 153 83 150 61 95 1,563
  Hodgkin lymphoma 14 2 22 13 24 20 25 9 29 13 36 9 32 16 28 23 18 14 347
  Non-Hodgkin lymphoma 327 217 606 419 715 499 807 592 895 627 939 732 1,061 764 1,076 765 747 515 12,303
  Multiple myeloma 139 93 207 181 304 245 334 308 415 363 422 347 438 368 456 369 352 344 5,685
  Leukemia 338 274 604 430 700 550 807 564 838 685 898 654 937 637 990 677 574 425 11,582
  Other cancers 690 462 1,389 1,006 1,741 1,285 1,930 1,495 2,104 1,666 2,320 1,788 2,486 1,879 2,611 1,926 1,922 1,421 30,121
 Non-cancer death by major classifications of KCD
  Certain infectious and parasitic diseases (A00-B99) 63 29 99 42 147 67 161 79 232 137 311 170 369 172 399 254 429 261 3,421
  Endocrine, nutritional and metabolic diseases (E00-E90) 30 22 79 34 117 49 190 80 197 127 231 140 288 169 284 177 352 198 2,764
  Mental and behavioral disorders (F00-F99) 3 4 9 12 16 8 20 25 38 27 55 35 58 44 69 53 79 52 607
  Diseases of the nervous system (G00-G99) 4 4 25 16 35 29 73 55 86 70 122 90 191 123 238 178 296 216 1,851
  Diseases of the circulatory system (I00-I99) 163 90 368 208 566 382 830 493 1,133 670 1,434 934 1,656 1,059 1,941 1,197 1,974 1,264 16,362
  Diseases of the respiratory system (J00-J99) 91 33 208 64 370 106 606 204 871 268 1,219 422 1,608 560 1,897 647 1,870 622 11,666
  Diseases of the digestive system (K00-K93) 108 29 188 62 258 103 334 122 413 160 461 234 604 243 668 290 641 285 5,203
  Diseases of the genitourinary system (N00-N99) 28 14 53 26 108 53 133 94 182 125 235 138 296 219 364 245 398 281 2,992
  Others 250 109 524 267 736 374 1,071 451 1,337 569 1,475 709 1,869 868 2,111 1,025 2,205 1,082 17,032
Age at death (yr)
 0-19 24 17 54 35 68 51 61 63 76 57 64 38 52 49 47 26 31 16 829
 20-29 31 23 111 76 139 90 157 149 142 139 150 160 153 140 172 164 134 157 2,287
 30-39 164 96 359 369 490 488 542 613 532 659 546 692 537 688 545 732 416 674 9,142
 40-49 786 347 1,648 918 2,109 1,282 2,338 1,555 2,391 1,744 2,284 1,775 2,258 1,936 2,318 1,990 1,828 1,964 31,471
 50-59 2,282 646 4,912 1,753 6,339 2,528 6,949 2,986 7,421 3,380 7,647 3,459 7,625 3,662 7,977 3,884 6,288 3,513 83,251
 60-69 3,435 1,095 7,160 2,648 8,902 3,178 10,024 3,823 10,987 4,280 12,045 4,495 12,495 4,751 13,351 5,153 10,924 4,443 123,189
 70-79 5,674 2,763 11,085 5,497 13,563 6,743 14,709 7,477 15,788 7,992 16,365 8,333 17,623 8,457 18,196 8,915 14,394 6,971 190,545
 ≥ 80 2,845 2,907 5,025 4,895 6,186 6,099 7,113 6,908 8,195 7,931 9,173 8,792 10,155 9,496 11,109 10,064 7,283 6,395 130,571
Region
 Capital city 2,444 1,191 5,016 2,675 6,233 3,296 7,113 3,985 7,593 4,335 7,867 4,546 8,536 4,809 8,838 4,982 6,813 4,014 94,286
 Metropolitan city 3,669 1,891 7,382 3,994 9,126 5,007 10,099 5,768 11,086 6,348 11,811 6,792 12,468 7,236 13,106 7,678 10,245 5,959 139,665
 Others 9,126 4,811 17,937 9,516 22,413 12,146 24,643 13,813 26,825 15,487 28,559 16,396 29,823 17,102 31,688 18,220 24,165 14,129 336,799
 Missing 2 1 19 6 24 10 38 8 28 12 37 10 71 32 83 48 75 31 535
Decile groups of health insurance premium
 0 1,212 1,011 2,560 1,930 3,126 2,209 3,622 2,495 4,261 2,968 4,649 3,234 4,999 3,269 5,338 3,494 4,264 2,640 57,281
 1-3 3,177 1,624 5,697 3,070 7,297 3,983 7,832 4,554 8,347 5,045 8,824 5,185 9,410 5,642 9,845 6,007 7,759 4,805 108,103
 4-7 4,590 2,009 9,329 4,298 11,483 5,579 12,601 6,623 13,635 7,115 14,290 7,524 14,913 8,062 15,731 8,493 11,511 6,531 164,317
 8-10 5,798 2,977 11,743 6,354 15,157 8,325 16,936 9,446 18,397 10,566 19,603 11,283 20,542 11,696 21,748 12,339 16,839 9,650 229,399
 Missing 464 273 1,025 539 733 363 902 456 892 488 908 518 1,034 510 1,053 595 925 507 12,185
Type of health insurance
 Self-employed insured 7,956 3,974 15,651 7,955 19,429 10,105 20,634 11,701 21,814 12,445 23,347 13,309 22,708 13,847 23,930 14,974 16,235 11,851 271,865
 Employed insured 9,057 4,423 18,195 9,395 22,750 12,094 25,586 13,937 27,903 15,589 29,327 16,411 30,724 17,219 32,042 17,993 24,186 13,836 340,667
 Medical-aid beneficiary 1,212 1,011 2,560 1,930 3,126 2,209 3,622 2,495 4,261 2,968 4,649 3,234 4,999 3,269 5,338 3,494 4,264 2,640 57,281
 Missing 18 6 24 10 38 8 27 11 37 10 42 14 55 21 47 15 383
SEER summary stage
 Localized 2,125 1,043 4,728 2,377 6,755 3,397 8,218 4,226 9,197 4,791 10,168 5,146 11,169 5,884 12,419 6,308 11,809 6,126 115,886
 Regional 3,366 1,431 7,635 3,608 10,049 4,987 11,543 5,910 12,408 6,496 13,111 7,102 13,641 7,391 14,169 7,779 12,163 6,686 149,475
 Distant 7,545 3,978 14,045 7,687 16,259 9,108 16,908 9,995 18,087 10,898 18,828 11,506 19,562 11,768 20,328 12,392 12,401 8,160 229,455
 Unknown 2,205 1,442 3,946 2,519 4,733 2,967 5,224 3,443 5,840 3,997 6,167 3,990 6,526 4,136 6,799 4,449 4,925 3,161 76,469

CNS, central nervous system; KCD, Korean Standard Classification of Diseases; NA, not available; SEER, Surveillance, Epidemiology, and End Results.
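The row totals in Table 3 can be used to check the split between cancer and non-cancer deaths reported in the abstract (89.2% and 10.8% of 571,285 deaths). The sketch below simply sums the Total column for each block; the variable names are illustrative.

```python
# Row totals (last column) for the cancer-specific death rows of Table 3,
# in the order the rows appear.
cancer_totals = [
    7601, 10543, 53777, 53573, 70873, 31333, 41978, 2007, 121144,
    10645, 5160, 2062, 6523, 9328, 86, 5673, 8017, 7463,
    1563, 347, 12303, 5685, 11582, 30121,
]
# Row totals for the non-cancer death rows (KCD major classifications).
non_cancer_totals = [3421, 2764, 607, 1851, 16362, 11666, 5203, 2992, 17032]

cancer = sum(cancer_totals)
non_cancer = sum(non_cancer_totals)
all_deaths = cancer + non_cancer

cancer_pct = round(100 * cancer / all_deaths, 1)
non_cancer_pct = round(100 * non_cancer / all_deaths, 1)
print(cancer, non_cancer, all_deaths)   # 509387 61898 571285
print(cancer_pct, non_cancer_pct)       # 89.2 10.8
```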

Table 4.

Medical use of patients with cancer included in the Cancer Public Library Database by time point

Type of claims and services, by time point: 1 year before cancer diagnosis; within 1 year of cancer diagnosis; 1 year before death d)
Columns within each time point: Total No. of patients a), Total No. of claims b), Average No. of claims per patient c)
Medical services
 Outpatient 1,845,904 46,481,287 25.2 1,824,157 70,198,749 38.5 560,041 21,664,803 38.7
 Inpatient 852,295 1,596,444 1.9 1,763,102 8,016,990 4.5 559,191 4,343,363 7.8
Dental services
 Outpatient 803,473 3,259,099 4.1 771,435 2,983,091 3.9 183,430 724,717 4.0
 Inpatient 2,461 3,024 1.2 6,656 9,713 1.5 3,365 4,799 1.4
Oriental medicine
 Outpatient 670,054 7,136,870 10.7 547,670 5,661,969 10.3 177,854 1,904,964 10.7
 Inpatient 20,398 76,992 3.8 88,507 323,089 3.7 61,744 232,643 3.8
Pharmacy 1,786,782 29,493,936 16.5 1,765,080 29,481,265 16.7 546,666 10,585,318 19.4
Others 12,112 218,463 18.0 10,764 200,492 18.6 6,418 120,393 18.8
a) Number of patients with cancer for whom at least one claim for the service was submitted to the National Health Insurance Service during the period.
b) Number of claims during the period.
c) Annual average number of claims per patient.
d) Among patients with cancer who died between 2012 and 2020.
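As a worked illustration of footnote c), the sketch below assumes that the average number of claims per patient is the total number of claims (b) divided by the total number of patients (a) for the same one-year window, rounded to one decimal place. The figures are taken from the "1 year before cancer diagnosis" columns of Table 4; the function name `claims_per_patient` is hypothetical.

```python
# (patients, claims) for the year before cancer diagnosis, from Table 4.
ROWS = {
    "Medical services, outpatient": (1_845_904, 46_481_287),
    "Medical services, inpatient": (852_295, 1_596_444),
    "Pharmacy": (1_786_782, 29_493_936),
}


def claims_per_patient(patients, claims):
    """Average claims per patient, assumed to be claims / patients."""
    return round(claims / patients, 1)


averages = {name: claims_per_patient(p, c) for name, (p, c) in ROWS.items()}
for name, value in averages.items():
    print(f"{name}: {value}")
```

Under this assumption the computed values reproduce the tabulated averages of 25.2, 1.9, and 16.5 for these three rows.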