Abstract
Introduction: A number of classification systems (TIRADS) have been developed to estimate the likelihood of malignancy in thyroid nodules, but their reproducibility is yet to be assessed. We evaluated the interobserver variability and diagnostic performance of the TIRADS in Kwak’s modification (Kw-TIRADS) and European TIRADS (EU-TIRADS). Methods: Two independent specialists, blinded to the morphology of the nodules, evaluated ultrasound images of 153 thyroid nodules identified in 149 patients, scoring them with each system in a separate session. Results: The interobserver agreement (Cohen’s κ) was 0.621 and 0.567 for Kw-TIRADS and EU-TIRADS, respectively, rated as substantial and moderate. There were strong correlations between Kw-TIRADS and EU-TIRADS for the two observers with Spearman’s coefficients of 0.731 (p = 0.00025) and 0.661 (p = 0.0012), respectively. Sensitivity of Kw-TIRADS for the diagnosis of thyroid cancer was 92.31–95% and that of EU-TIRADS was 89.74–92.31%, with specificity of about 60% for both TIRADS. Conclusion: Despite the wide variability in the description of single ultrasonographic features, both Kw-TIRADS and EU-TIRADS may be useful diagnostic tools in clinical practice.
Introduction
Ultrasonography (US) is the best tool for the visualization and assessment of thyroid lesions. Thyroid nodules are the most common among them, particularly in iodine-deficient areas [1, 2]. To make US results more reliable and less operator-dependent, a number of classification systems have been proposed, starting with the Thyroid Imaging Reporting and Data System (TIRADS) created by Horvath et al. [3] with later modifications by other groups and authors. By the time the European Thyroid Association Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules were introduced as a European version of TIRADS (EU-TIRADS) in 2017 [4], a TIRADS modification by Kwak et al. [5] (Kw-TIRADS) had been used for several years in the Russian Federation. However, EU-TIRADS seemed more convenient and was therefore welcomed by many specialists in ultrasound medicine. This raised the question of whether the relatively new EU-TIRADS could be used as an alternative for the examination of thyroid nodules. Since information about the sustainability and reproducibility of any TIRADS is still lacking, we decided to evaluate and compare the reproducibility, interobserver agreement, and diagnostic performance of both TIRADS classifications.
Materials and Methods
The cases for this study were selected consecutively from the patients with any thyroid nodules seen between September 2016 and December 2017 at the Endocrinology Department of a university hospital. The images in B-mode and Doppler mode were recorded and stored at first visualization, before fine-needle aspiration biopsy (FNAB), using different ultrasound machines: Aixplorer (France), Voluson E8 (GE, USA), Hitachi (Japan), or Aloka 5500 (Japan).
Two highly qualified specialists with >10 years of experience in thyroid US and biopsy and >5 years of working with TIRADS took part in the current study. They independently and in a blinded manner retrospectively analyzed the stored images of 153 thyroid nodules with established morphological structure from 149 patients.
Stratification of Thyroid Nodules
At our institution, we used a modified version of Kwak’s TIRADS. Since Kwak et al. [5] did not describe a TIRADS 2 category in the original publication, for practical purposes we adopted this category from the first classification by Horvath et al. [3]. In the current study, the two observers independently scored the images, first according to the modified classification of Kw-TIRADS (Table 1). A few months later (in order to minimize the influence of their first decision using Kw-TIRADS), the same thyroid images were reevaluated by the same specialists according to EU-TIRADS [4] (Table 1). Indications for FNAB were also evaluated according to both stratification systems as stated in the original publications [4, 5].
Table 1. Types of thyroid nodules stratified according to Kwak’s (Kw-TIRADS) and European (EU-TIRADS) TIRADS in the study
Inclusion criteria were the presence of a thyroid nodule >5 mm and FNAB of this nodule performed or surgery planned at the time of ultrasound examination and finally performed within the study period. Exclusion criteria were absent cytology by FNAB or histology by surgery of the thyroid nodule, indeterminate cytology by FNAB, and suspicious or malignant cytology by FNAB without thyroid surgery within the study period. The Bethesda System for Reporting Thyroid Cytopathology (2009) [6] was used to report thyroid cytopathology.
Statistical Analysis
Statistical analysis was performed using STATISTICA software (version 8.0, 2007; StatSoft Inc., USA). The interobserver agreement between the two observers was calculated using Cohen’s κ coefficient. Weighted κ statistics were used for comparative purposes. A κ value of 0 corresponds to no agreement beyond chance and a κ value of 1.0 to complete agreement. Following the guidelines laid out by Landis and Koch [7], a κ value between 0 and 0.20 indicates slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial or good agreement, and 0.81–1.00 excellent agreement. Correlations were assessed by Spearman’s correlation coefficient.
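The κ statistics described above can be illustrated with a short Python sketch. It is not the STATISTICA procedure used in the study: the 4 × 4 agreement matrix is hypothetical (invented counts that merely sum to 153), linear disagreement weights are an assumption because the weighting scheme is not specified here, and the verbal labels follow the Landis and Koch bands, including their 0.21–0.40 "fair" band.

```python
def cohen_kappa(matrix, weighted=False):
    """Cohen's kappa for a square agreement matrix.

    Rows are observer 1's categories, columns are observer 2's.
    With weighted=True, linear disagreement weights |i - j| / (k - 1) are
    used, so disagreements between adjacent categories are penalised less.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant cells.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * matrix[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n ** 2
    return 1.0 - observed / expected


def landis_koch(kappa):
    """Verbal label for kappa (Landis and Koch bands, top band named as in the text)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "excellent")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "excellent"


# Hypothetical counts for TIRADS 2, 3, 4, 5 (NOT the study data).
ratings = [
    [20, 5, 0, 0],
    [4, 40, 8, 1],
    [0, 9, 35, 6],
    [0, 1, 5, 19],
]

for flag in (False, True):
    value = cohen_kappa(ratings, weighted=flag)
    name = "weighted" if flag else "unweighted"
    print(f"{name} kappa = {value:.3f} ({landis_koch(value)})")
```

Because most disagreements in such a matrix fall between neighbouring categories, the weighted coefficient comes out higher than the unweighted one, which is the pattern expected when observers rarely differ by more than one TIRADS category.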
Sensitivity, specificity, positive (PPV) and negative predictive values (NPV), and positive likelihood ratio were calculated using TIRADS 4 and 5 categories for malignant classification of thyroid nodules and are reported with 95% CIs. The χ2 test was applied to compare the proportion of thyroid cancer between TIRADS categories and TIRADS-guided indications for FNAB. All tests were two-sided, and we used a significance level of α = 5%.
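As a minimal sketch of how these measures follow from a 2 × 2 table (TIRADS 4–5 counted as test-positive), the Python snippet below uses illustrative counts. The true-positive and false-negative counts are chosen so that sensitivity equals 36/39, while the false-positive and true-negative counts are hypothetical. The Wilson score interval is only one common way to obtain a 95% CI; the CI method actually used is not stated here.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_plus": sensitivity / (1 - specificity),
    }


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Illustrative counts only: 39 malignant and 114 benign nodules,
# with a hypothetical split of the benign nodules (46 FP, 68 TN).
tp, fn, fp, tn = 36, 3, 46, 68

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

low, high = wilson_ci(tp, tp + fn)
print(f"sensitivity 95% CI: {low:.3f} to {high:.3f}")
```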
Results
Stored images of B-mode and Doppler ultrasound of 153 thyroid nodules with established morphological structure from 149 patients were available for analysis. Among these nodules, cancer was histologically verified in 39 cases (36 papillary, 2 follicular, and 1 medullary cancer), and follicular adenoma was proven in 26 nodules. The remaining 78 cases of benign colloid nodules and 10 cases of autoimmune thyroiditis were verified only cytologically (these patients were not operated on because they had no indications for surgery). Some examples of the analyzed nodules, which both observers classified in the same TIRADS categories, are presented in Figure 1.
Figure 1. a Example of a spongiform 12-mm nodule in the right thyroid gland classified as TIRADS 2 (consistent with benign lesion). FNAB cytology revealed a benign colloid nodule. b Example of a solid isoechoic nodule in the left thyroid gland classified as TIRADS 3 (probably benign lesion). FNAB cytology revealed a benign colloid nodule. c Example of a solid mildly hypoechoic nodule in the right thyroid gland classified as TIRADS 4 (suspicious lesion). FNAB cytology was consistent with follicular neoplasm. Histology revealed a benign follicular thyroid adenoma. d Example of a solid hypoechoic nodule, with irregular margins, taller-than-wide in shape, with microcalcification in the thyroid isthmus, classified as TIRADS 5 (probably malignant). FNAB cytology was typical for papillary cancer. Histology proved a papillary thyroid carcinoma.
Citation: European Thyroid Journal 10, 2; 10.1159/000508959
Reproducibility of Kw-TIRADS and EU-TIRADS
The distribution of the analyzed thyroid nodules by both variants of TIRADS classification varied between the two observers. Details and the association of each observer’s scoring with the number of malignancies are shown in Table 2. The correlations between Kw-TIRADS and EU-TIRADS in thyroid nodule distribution, expressed as Spearman’s coefficients (rs), were 0.731 (p = 0.00025) for observer 1 and 0.661 (p = 0.0012) for observer 2. There was also a strong correlation between the two observers in thyroid nodule stratification: rs = 0.836 (p = 0.000008) for Kw-TIRADS and rs = 0.739 (p = 0.00048) for EU-TIRADS.
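For readers unfamiliar with Spearman’s coefficient on ordinal categories, the following Python sketch shows one standard way to compute rs when many values are tied, as is inevitable with only four TIRADS categories: tied values receive the mean of their ranks, and rs is the Pearson correlation of those ranks. The two score lists are hypothetical and serve only to exercise the function; p-values are not computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical category scores for ten nodules (NOT the study data).
kw_scores = [2, 3, 3, 4, 4, 4, 5, 5, 3, 2]
eu_scores = [2, 3, 3, 4, 3, 4, 5, 4, 3, 2]

print(f"rs = {spearman_rs(kw_scores, eu_scores):.3f}")
```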
Table 2. Distribution of thyroid nodules by two independent observers and number of malignancies, in n (%)*, within categories of Kw-TIRADS and EU-TIRADS
The interobserver agreement (Cohen’s κ coefficient) between the two observers was slightly higher for Kw-TIRADS (0.621) than for EU-TIRADS (0.567), which corresponds to “good” and “moderate” strength of agreement, respectively. Weighted κ, which gives partial credit when the two observers choose adjacent categories, showed “good” agreement for both systems, with values of 0.674 for Kw-TIRADS and 0.627 for EU-TIRADS (Table 3).
Table 3. Interobserver agreement (Cohen’s κ coefficient) between two independent observers working with Kw-TIRADS and EU-TIRADS
Diagnostic Performance of Kw-TIRADS and EU-TIRADS
Within Kw-TIRADS, 100% of the patients with TIRADS 2 had benign lesions, and 85.7–100% of all patients with TIRADS 5 had thyroid cancer. With EU-TIRADS, 88–93% of all patients with TIRADS 2 had benign lesions, and 68–99% of all patients with TIRADS 5 had malignancies (Table 2).
The distribution of malignant lesions between different TIRADS categories was compared using the χ2 test, separately for Kw-TIRADS and EU-TIRADS and for both observers. Since the TIRADS 2 category contained no cancer cases and the χ2 test requires non-zero cell values, we combined TIRADS 2 and 3 categories as “benign” and TIRADS 4 and 5 categories as “suspicious”. The χ2 test showed that the number of malignant nodules differed significantly between the “benign” and “suspicious” categories of both TIRADS systems and for both observers (p < 0.05), which also indicates good reproducibility of Kw-TIRADS and EU-TIRADS (Table 2).
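The pooled comparison described above amounts to a χ2 test on a 2 × 2 table (pooled category by malignancy status). A minimal Python sketch follows; it uses the plain Pearson statistic without continuity correction, which is an assumption because the exact variant used is not stated, and the counts are hypothetical. For one degree of freedom the upper-tail p-value can be written with the complementary error function.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row_tot = (a + b, c + d)
    col_tot = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic, erfc(sqrt(statistic / 2))


# Hypothetical counts (NOT the study data):
#                 malignant  not malignant
# "benign"   2+3      3           68
# "suspicious" 4+5   36           46
statistic, p_value = chi2_2x2(3, 68, 36, 46)
print(f"chi2 = {statistic:.2f}, p = {p_value:.2e}")
```

Pooling adjacent categories before testing keeps every expected cell count well above zero, which is what makes the χ2 approximation usable in this setting.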
The sensitivity of the TIRADS 4 and 5 categories for the diagnosis of thyroid cancer was high for both observers: 95% and 92.31% for Kw-TIRADS and 92.31% and 89.74% for EU-TIRADS, respectively. However, the specificity of both TIRADS systems did not exceed 60% for either specialist. The NPV was 94.4–97.0% for Kw-TIRADS and 94–95% for EU-TIRADS using TIRADS categories 4 and 5 for the diagnosis of malignancy (Table 4).
Table 4. Diagnostic values of Kw-TIRADS and EU-TIRADS for the detection of malignant thyroid nodules
The χ2 test showed that both tested systems select thyroid nodules for biopsy concordantly and in the same direction (online suppl. Table 1; see online Supplementary Materials). The EU-TIRADS-based criteria for FNAB demonstrated higher sensitivity (90.48–100%) but lower specificity (34.23–46.9%) than the Kw-TIRADS criteria, which showed moderate sensitivity (76.19–79.49%) and higher specificity (56.76–57.89%). The rate of unnecessary FNAB was 53.51–64.04% with EU-TIRADS and 42.11% with Kw-TIRADS (online suppl. Table 2).
Discussion
Ultrasound examination traditionally is perceived as a highly subjective method of medical visualization, where judgment depends on the operator’s ability to describe specific features. On the other hand, thyroid US is the best tool for the visualization of extremely common thyroid nodules and their selection for biopsy. In the last decade, several stratification systems for thyroid nodules (TIRADS) have been proposed to minimize the subjectivity in describing their sonographic characteristics; however, there is still some concern about their reproducibility.
This study evaluated the concordance between observers who used a modified version of the earlier TIRADS by Kwak et al. [5] and the later European version of TIRADS [4] for the diagnosis of thyroid nodules. Kw-TIRADS stratifies thyroid nodules according to the number of suspicious ultrasound features, so it can be classified as a “quantitative” TIRADS system. EU-TIRADS, on the other hand, is based on the “presence” or “absence” of any signs of malignancy in the nodules, so in principle it is a “qualitative” system. Therefore, there may be differences in an operator’s perception. In general, our results showed “substantial” and “moderate” agreement between the specialists, with a Cohen’s κ coefficient of 0.621 for Kw-TIRADS and 0.567 for EU-TIRADS, respectively. A recent study by Grani et al. [8] also showed substantial interobserver agreement for different TIRADS versions with Krippendorff alpha statistics varying between 0.47 and 0.61. In another study by Chandramohan et al. [9], the overall agreement between observers was substantial for assigning Kw-TIRADS categories with a weighted κ coefficient of 0.721 (p < 0.001).
In addition, both the modified Kw-TIRADS and EU-TIRADS analyzed in our study showed the same trend of increasing risk of malignancy with increasing category, irrespective of the operator. This is consistent with the results of previous studies using these systems [10, 11]. The percentage of thyroid cancer in different TIRADS categories in the present study is also in concordance with the results of the original works [4, 5]. Specifically, in our study the proportion of thyroid cancer varied from 0 to 7.7% in Kw-TIRADS 2 and 3 categories and from 35.7 to 100% in Kw-TIRADS 4 and 5 categories, while in the original publication by Kwak et al. [5], the corresponding frequencies of malignancies were 0–1.7% and 3.3–87.5%, respectively. Similarly, in EU-TIRADS, malignant nodules comprised 2.5–7% of EU-TIRADS 2 and 3 categories and 20–99% of EU-TIRADS 4 and 5 categories, as compared to 0–4% and 6–87% in the original publication by Russ et al. [4].
Concerning the diagnostic performance of TIRADS, we understand that, while sensitivity and specificity of a test are usually not influenced by the prevalence of disease in the population, PPV and NPV are [12]. The relatively small sample of thyroid nodules in our research may influence NPV and PPV, since it may not reflect the true prevalence of thyroid cancer. In line with epidemiological data [13–15], of the 153 thyroid nodules in our study, 114 (74.51%) were benign and 39 (25.49%) were malignant, with papillary carcinoma comprising 92.3% of all malignancies (36 of 39). The comparable prevalence of the disease in the general population and in our sample made it possible to estimate PPV and NPV of the TIRADS systems. The PPV and NPV of both TIRADS for the detection of thyroid malignancies and as indications for FNAB were comparable between our two observers. This further corresponds with the results of good reproducibility and interobserver agreement.
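The dependence of predictive values on prevalence can be made concrete with Bayes’ theorem. The Python sketch below holds sensitivity and specificity fixed at round, assumed values (0.92 and 0.60, chosen only because they are close to the figures discussed in this paper) and varies the prevalence; the prevalence grid itself is arbitrary apart from 0.2549, the malignancy rate in this sample.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for given test characteristics and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed round values for illustration, not the exact study estimates.
SENSITIVITY = 0.92
SPECIFICITY = 0.60

for prevalence in (0.05, 0.10, 0.2549, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

As prevalence rises, PPV increases and NPV falls even though sensitivity and specificity are unchanged, which is why predictive values estimated in a referral sample should be transferred to other populations with caution.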
Like a previous meta-analysis [16], the present study demonstrated that the sensitivity of both Kw-TIRADS and EU-TIRADS for the diagnosis of thyroid malignancies reached about 89–95%, while the specificity of both diagnostic systems was only moderate (<60%), which is lower than in other published works [11, 17]. However, the positive predictive ratio of about 60% indicated a good diagnostic accuracy of both TIRADS systems. In our study, the malignancy rate in the nodules classified as TIRADS 3 (>7%) was significantly higher than the ideal range (<5%) recommended by Kwak et al. or Russ et al. [4, 5]. This was probably due to a difference in the radiologists’ experience, intra-observer variability, and interpretation of stored images instead of real-time ones, which could lead to misdiagnosis and improper management of some patients.
The main purpose of any thyroid nodule stratification is to select nodules for further morphological examination and to reduce the number of biopsies of benign thyroid lesions. In our study, the diagnostic performance of the FNAB criteria differed between EU-TIRADS and Kw-TIRADS. EU-TIRADS demonstrated higher sensitivity (90.48–100%) with lower specificity (34.23–46.9%) compared to Kw-TIRADS (sensitivity 76.19–79.49%; specificity 56.76–57.89%). A practical parameter, the rate of unnecessary FNAB, was 53.51–64.04% with EU-TIRADS and 42.11% with Kw-TIRADS. Therefore, EU-TIRADS behaved like a “screening test,” where better sensitivity results in a higher rate of unnecessary diagnostic procedures (biopsies), while in Kw-TIRADS, sensitivity and specificity values were comparable and produced more balanced indications for FNAB. Similar trends were described in studies using other TIRADS classifications. For example, in a study by Ha et al. [18], the rate of unnecessary FNAB reached 56.9% using the Korean Thyroid Association/Korean Society of Thyroid Radiology guidelines, but this was associated with the highest sensitivity among seven tested TIRADS systems (94.5%).
High NPVs for FNAB criteria of both tested TIRADS (from 97.83 to 100%) indicate that if these criteria are used in people with a strong suspicion of disease (suspicious nodules), then the absence of FNAB indication according to EU-TIRADS or Kw-TIRADS can reliably rule out a disease with a low chance of missing an aggressive thyroid malignancy.
Finally, the limitations of our study should also be addressed. First, in a retrospective assessment of still images, it was not possible to consider the influence of scan-related variables, such as probe inclination, US equipment used, operating conditions, and settings of real-time visualization. Second, the radiologists who participated in this study had more experience with Kw-TIRADS than with EU-TIRADS. That could result in a difference in some parameters, although both TIRADS showed good reproducibility and consistency in our study. Last, not all thyroid nodules underwent surgery in this study. Some final diagnoses were based on cytology, which may have caused some false negative results and overestimated sensitivity of TIRADS systems. Moreover, our study was done in a highly specialized clinic, so the patients presented with relatively high-risk thyroid nodules, and the frequency of malignancy may be higher than expected in a general population of patients with thyroid nodules, although the real rate of thyroid cancer in the study seemed to be consistent with the literature data. Selection bias was therefore unavoidable. Since sensitivity and specificity also depend on the expected frequency of the investigated outcome, our results need to be confirmed in a longer prospective study using a wider population.
Conclusions
Both the modified Kw-TIRADS and EU-TIRADS showed good and comparable reproducibility and sustainability in the evaluation of malignant thyroid nodules. The overall agreement between observers for assigning TIRADS category was good or substantial. Despite the wide variability in the description of single ultrasonographic features, these TIRADS can be used as a reliable method of thyroid nodule assessment in clinical practice.
Statement of Ethics
The study protocol was approved by the local institute’s Committee on Human Research (IRB number: 03-19). The subjects did not give any special informed consent for the study, since no additional information or biological material was obtained besides that required in routine clinical practice.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This project did not receive any additional financial support from funds or any third party.
Author Contribution
Y.P.S.: conception and design, organization of the study, data processing, manuscript drafting, and writing the final version. V.V.F.: critical revision of the work and final version of the manuscript. E.P.F.: data capturing and processing, manuscript drafting and revision. M.K.: statistical analysis, manuscript drafting and revision.
References
1. Moon JH, Hyun MK, Lee JY, Shim JI, Kim TH, Choi HS, et al. Prevalence of thyroid nodules and their associated clinical parameters: a large-scale, multicenter-based health checkup study. Korean J Intern Med. 2018 Jul;33(4):753–62.
2. Song J, Zou SR, Guo CY, Zang JJ, Zhu ZN, Mi M, et al. Prevalence of Thyroid Nodules and Its Relationship with Iodine Status in Shanghai: a Population-based Study. Biomed Environ Sci. 2016 Jun;29(6):398–407.
3. Horvath E, Majlis S, Rossi R, Franco C, Niedmann JP, Castro A, et al. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J Clin Endocrinol Metab. 2009 May;94(5):1748–51.
4. Russ G, Bonnema SJ, Erdogan MF, Durante C, Ngu R, Leenhardt L. European Thyroid Association Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules in Adults: the EU-TIRADS. Eur Thyroid J. 2017 Sep;6(5):225–37.
5. Kwak JY, Jung I, Baek JH, Baek SM, Choi N, Choi YJ, et al.; Korean Society of Thyroid Radiology (KSThR); Korean Society of Radiology. Image reporting and characterization system for ultrasound features of thyroid nodules: multicentric Korean retrospective study. Korean J Radiol. 2013 Jan-Feb;14(1):110–7.
6. Cibas ES, Ali SZ. The Bethesda System for Reporting Thyroid Cytopathology. Thyroid. 2009 Nov;19(11):1159–65.
7. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159–74.
8. Grani G, Lamartina L, Cantisani V, Maranghi M, Lucia P, Durante C. Interobserver agreement of various thyroid imaging reporting and data systems. Endocr Connect. 2018 Jan;7(1):1–7.
9. Chandramohan A, Khurana A, Pushpa BT, Manipadam MT, Naik D, Thomas N, et al. Is TIRADS a practical and accurate system for use in daily clinical practice? Indian J Radiol Imaging. 2016 Jan-Mar;26(1):145–52.
10. Srinivas MN, Amogh VN, Gautam MS, Prathyusha IS, Vikram NR, Retnam MK, et al. A Prospective Study to Evaluate the Reliability of Thyroid Imaging Reporting and Data System in Differentiation between Benign and Malignant Thyroid Lesions. J Clin Imaging Sci. 2016 Feb;6:5.
11. Skowrońska A, Milczarek-Banach J, Wiechno W, Chudziński W, Żach M, Mazurkiewicz M, et al. Accuracy of the European Thyroid Imaging Reporting and Data System (EU-TIRADS) in the valuation of thyroid nodule malignancy in reference to the post-surgery histological results. Pol J Radiol. 2018 Dec;83:e579–86.
12. Ranganathan P, Aggarwal R. Common pitfalls in statistical analysis: understanding the properties of diagnostic tests - Part 1. Perspect Clin Res. 2018 Jan-Mar;9(1):40–3.
13. Fagin JA, Wells SA Jr. Biologic and clinical perspectives of thyroid cancer. N Engl J Med. 2016 Sep;375(11):1054–67.
14. Kondrat’eva TT, Pavlovskaja AI, Vrublevskaja AI. Morfologicheskaja diagnostika uzlovyh obrazovanij shhitovidnoj zhelezy [Morphological diagnosis of nodular thyroid formations]. Prakticheskaja onkologija [Practical oncology]. 2007;8(1):9–16. [in Russian]
15. Bershtejn LM. Rak shhitovidnoj zhelezy: epidemiologija, endokrinologija, faktory i mehanizmy kancerogeneza [Thyroid cancer: epidemiology, endocrinology, factors and mechanisms of carcinogenesis]. Prakticheskaja onkologija [Practical oncology]. 2007;8(1):1–8. [in Russian]
16. Wei X, Li Y, Zhang S, Gao M. Thyroid imaging reporting and data system (TI-RADS) in the diagnostic value of thyroid nodules: a systematic review. Tumour Biol. 2014 Jul;35(7):6769–76.
17. Zhang YZ, Xu T, Cui D, Li X, Yao Q, Gong HY, et al. Value of TIRADS, BSRTC and FNA-BRAF V600E mutation analysis in differentiating high-risk thyroid nodules. Sci Rep. 2015 Nov;5(1):16927.
18. Ha EJ, Na DG, Baek JH, Sung JY, Kim JH, Kang SY. US Fine-Needle Aspiration Biopsy for Thyroid Malignancy: Diagnostic Performance of Seven Society Guidelines Applied to 2000 Thyroid Nodules. Radiology. 2018 Jun;287(3):893–900.