Background: Thyroid nodule image reporting and data systems (TIRADS) provide the indications for fine-needle aspiration (FNA) based on a combination of nodule sonographic features and size. We compared the TIRADS-based recommendations for FNA with those based on the personal expertise of qualified ultrasound (US) investigators in the diagnosis of thyroid malignancy.
Methods: Seven highly experienced US investigators from 4 countries evaluated, online, the US video recordings of 123 histologically verified thyroid nodules. Technical resources provided the operators with a diagnostic approach close to real-world practice. Altogether, 4,305 TIRADS scores were computed. The combined diagnostic potential of TIRADS (TIRSYS) and the personal recommendations of the investigators (PERS) were compared against 3 possible goals: to recognize all malignant lesions (allCA), nonpapillary plus non-pT1 papillary cancers (nPnT1PCA), or stage II-IV cancers (st2-4CA).
Results: For allCA and nPnT1PCA, TIRSYS had lower sensitivity than PERS (69.8 vs. 87.2% and 83.5 vs. 92.6%, respectively; p < 0.01), while for st2-4CA the sensitivities were similar (99.1 vs. 98.6% for TIRSYS vs. PERS, respectively). TIRSYS had a higher specificity than PERS for all 3 cancer groups (p < 0.001). PERS recommended FNA in a similar proportion of lesions smaller or larger than 1 cm (76.9 vs. 82.7%; ns).
Conclusions: Recommendations for FNA based on the investigators’ US expertise demonstrated a better sensitivity for thyroid cancer in the 2 best prognostic groups, while TIRADS methodology showed superior specificity over the full prognostic range of cancers. Thus, personal experience provided more accurate diagnoses of malignancy, missing a lower number of small thyroid cancers, but the TIRADS approach resulted in a similar accuracy for the diagnosis of potentially aggressive lesions while sparing a relevant number of FNAs.
Until the goal of the US evaluation is clearly stated, that is, whether to diagnose all thyroid cancers or only clinically relevant ones, it cannot be determined whether one diagnostic approach is superior to the other for recommending FNA.
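The sensitivity and specificity comparisons above can be illustrated with a minimal sketch, assuming that a recommendation for FNA counts as a positive test and that histology is the reference standard. The function name and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(is_malignant, fna_recommended):
    """Return (sensitivity, specificity) for paired boolean sequences.

    is_malignant:     reference standard (True = malignant on histology)
    fna_recommended:  test result (True = FNA recommended)
    """
    tp = fn = tn = fp = 0
    for truth, recommended in zip(is_malignant, fna_recommended):
        if truth and recommended:
            tp += 1          # cancer, FNA recommended
        elif truth:
            fn += 1          # cancer, FNA not recommended (missed)
        elif recommended:
            fp += 1          # benign, FNA recommended (avoidable FNA)
        else:
            tn += 1          # benign, FNA spared
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 malignant and 5 benign nodules.
truth = [True, True, True, True, False, False, False, False, False]
recommendation = [True, True, True, False, True, False, False, False, False]
sens, spec = sensitivity_specificity(truth, recommendation)
```

Changing the goal of the evaluation (for example, counting only stage II-IV cancers as positive in the reference standard) changes `is_malignant` and therefore both figures, which is why the choice of goal matters for the comparison.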
Thyroid nodule ultrasound characteristics are used as an indication for fine-needle aspiration cytology, usually as the basis for Thyroid Imaging Reporting and Data System (TIRADS) score calculation. Few studies on interobserver variation are available, all of which are based on analysis of preselected still ultrasound images and often lack surgical confirmation.
After the blinded online evaluation of video recordings of the ultrasound examinations of 47 consecutive malignant and 76 consecutive benign thyroid lesions, 7 experts from 7 thyroid centers answered 17 TIRADS-related questions. Surgical histology was the reference standard. Interobserver variations of each ultrasound characteristic were compared using Gwet’s AC1 inter-rater coefficients; higher values mean better concordance, the maximum being 1.0.
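For readers unfamiliar with the coefficient, the following is a minimal sketch of the standard multi-rater form of Gwet’s AC1 for nominal categories. It is an illustration under stated assumptions, not the authors’ actual computation; the function name and the toy ratings are hypothetical, and every nodule is assumed to be rated by at least two raters.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """Gwet's AC1 for nominal ratings.

    ratings:    one list per subject, holding the label given by each rater
    categories: all possible labels (at least two)
    """
    n = len(ratings)
    q = len(categories)
    agreement_sum = 0.0
    prevalence = {c: 0.0 for c in categories}  # mean share of raters per category

    for subject in ratings:
        r = len(subject)
        counts = Counter(subject)
        # Share of rater pairs that agree on this subject.
        agreement_sum += sum(k * (k - 1) for k in counts.values()) / (r * (r - 1))
        for label, k in counts.items():
            prevalence[label] += k / r / n

    observed = agreement_sum / n
    # Chance agreement as defined for AC1.
    chance = sum(p * (1 - p) for p in prevalence.values()) / (q - 1)
    return (observed - chance) / (1 - chance)


# Hypothetical example: 2 raters, 2 nodules, one agreement and one disagreement.
example = [["yes", "yes"], ["yes", "no"]]
ac1 = gwet_ac1(example, ["yes", "no"])
```

Perfect agreement among all raters on every nodule gives the maximum value of 1.0.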
On a scale from 0.0 to 1.0, the Gwet’s AC1 values were 0.34, 0.53, 0.72, and 0.79 for the four most important features in decision-making, i.e. irregular margins, microcalcifications, echogenicity, and extrathyroidal extension, respectively. The concordance in the discrimination between mildly/moderately and very hypoechogenic nodules was 0.17. The smaller the nodule, the better the agreement on echogenicity, and the larger the nodule, the better the agreement on the presence of microcalcifications. Extrathyroidal extension was correctly identified in just 45.8% of the cases.
Examination of video recordings, closely simulating the real-world situation, revealed substantial interobserver variation in the interpretation of each of the four most important ultrasound characteristics. In view of their importance for the management of thyroid nodules, unambiguous and widely accepted definitions of each nodule characteristic are warranted, although it remains to be investigated whether such definitions would diminish observer variation.