Objectives: BI-RADS 3 is an established assessment category in which the probability of malignancy is 2% or less. However, monitoring adherence to imaging criteria can be challenging, and there are few established benchmarks for auditing BI-RADS 3 assignments. In this study, we explore parameters that could serve as useful tools for quality control and clinical practice management.
Materials and Methods: This retrospective study covered a 4-year period (January 2014-December 2017) and included all women aged 40 years or older who were recalled from a screening exam and initially assigned the BI-RADS 3 (probably benign) category after diagnostic workup. Quantitative quality control metrics were based on a 2-year follow-up period after BI-RADS 3 assignment.
Results: Among 135,765 screening exams, 13,453 were recalled, and 1,037 BI-RADS 3 cases met the inclusion criteria. The follow-up rate at 24 months was 86.6%. The upgrade rate was 7.4% (77/1,037) [CI: 5.9–9.2%] and the PPV3 was 33.8% (26/77) [CI: 23.4–45.5%]. The cancer yield was 2.51% (26/1,037) [CI: 1.64–3.65%] and did not differ significantly (p=0.243) from the 2% probability of malignancy. The proportions of initial BI-RADS 3 assignments per screening exam and per recall from screening were 0.76% (1,037/135,765) [CI: 0.72–0.81%] and 7.7% (1,037/13,453) [CI: 7.26–8.17%], respectively.
Conclusion: Regular audit of BI-RADS 3 metrics has the potential to provide additional insights for clinical practice management. Data from varied clinical settings, with input from an expert committee, could help establish benchmarks for these metrics.
Keywords: Radiology; Mammography; Breast Cancer; Screening; BI-RADS Criteria; BIRADS 3
Abbreviations: IRB: Institutional Review Board; MRNs: Medical Record Numbers; RIS: Radiology Information System; DBT: Digital Breast Tomosynthesis
Screening mammography is a vital element of breast cancer detection that has helped to reduce disease mortality [1-4]. With the current screening strategy, the yearly cancer detection rate in the US is approximately five per 1000 screens, and fewer than 2% of screens prove suspicious and require biopsy [5-7]. In an effort to improve specificity, decrease cost, and reduce harm, the American College of Radiology (ACR) established the Breast Imaging Reporting and Data System (BI-RADS) category 3 (probably benign) designation, to be used for short-term surveillance instead of immediate biopsy [8-10]. The morphological criteria for BI-RADS 3 include a solitary circumscribed mass with a solid ultrasound (US) correlate, a focal asymmetry without a US correlate, and grouped round calcifications [8,9,11]. The BI-RADS 3 designation is typically made after an initial diagnostic workup and should not be assigned on a screening mammogram. Assignment of BI-RADS 3 activates a short-term follow-up protocol (at 6, 12, and 24 months) that has been shown to reduce false-positive findings at biopsy while retaining high sensitivity for early-stage breast cancer.
The BI-RADS 3 designation is meant to indicate that a finding has a 2% or lower risk of malignancy, and a recent retrospective report of 45,202 BI-RADS 3 cases from the National Mammography Database suggests that this expectation is concordant with reality. However, institution-level evidence still suggests that, in practice, 0.9–7.9% of BI-RADS 3 lesions are upgraded to BI-RADS 4 and sent for biopsy [9,13-15]. Additionally, because the BI-RADS 3 designation allows some flexibility, there is appreciable interobserver variability within each modality [16-18]. As a result, monitoring adherence to imaging criteria can be challenging, and there are relatively few established benchmarks for auditing BI-RADS 3 assignment. Herein, we share BI-RADS 3 audit results from our own institution over a four-year period and propose discrete auditing criteria that may help to establish performance benchmarks. We propose the following metrics for BI-RADS 3 assignment and surveillance, which may serve as useful benchmarks:
(i) Percentage of initial BI-RADS 3 to total screens
(ii) Percentage of initial BI-RADS 3 to screen-recalled cases (BI-RADS 0)
(iii) BI-RADS 3 upgrade rates within 24 months
(iv) Positive predictive value (PPV3) of lesions biopsied within 24 months
(v) Distribution of imaging morphology assigned a BI-RADS 3 category
(vi) Cancer yield.
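The ratio metrics in this list are simple proportions. One compact way to write them is sketched below, in notation introduced here for illustration only: $N_{\text{screen}}$ screening exams, $N_0$ screen recalls (BI-RADS 0), $N_3$ initial BI-RADS 3 assignments, $N_{\text{bx}}$ lesions upgraded and biopsied within 24 months, $N_{\text{ca}}$ malignancies among them, and $N_{3,m}$ assignments with morphology $m$. The last line assumes, as in this series, that every cancer counted toward the yield was found at biopsy of an upgraded lesion, so the cancer yield factors into upgrade rate times PPV3.

```latex
\begin{aligned}
\text{(i) BI-RADS 3 per screen} &= N_3 / N_{\text{screen}} \\
\text{(ii) BI-RADS 3 per recall} &= N_3 / N_0 \\
\text{(iii) upgrade rate} &= N_{\text{bx}} / N_3 \\
\text{(iv) PPV3} &= N_{\text{ca}} / N_{\text{bx}} \\
\text{(v) morphology share} &= N_{3,m} / N_3 \quad \text{for each morphology } m \\
\text{(vi) cancer yield} &= \frac{N_{\text{ca}}}{N_3}
  = \frac{N_{\text{bx}}}{N_3} \cdot \frac{N_{\text{ca}}}{N_{\text{bx}}}
\end{aligned}
```

With the counts reported in the Results ($N_3 = 1{,}037$, $N_{\text{bx}} = 77$, $N_{\text{ca}} = 26$), the factorization gives $0.0743 \times 0.3377 \approx 0.0251$, the same 2.51% yield obtained directly as 26/1,037.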
Materials and Methods
Our institution is a large tertiary academic medical center (accredited by the NAPBC and designated a Breast Imaging Center of Excellence by the ACR) in the northeastern United States, with an effective catchment area of nearly 1 million individuals. This retrospective study was approved by the Institutional Review Board (IRB) and is compliant with the Health Insurance Portability and Accountability Act. The annual number of screening mammograms and the specific numbers of BI-RADS 0 and BI-RADS 3 cases were obtained from the Radiology Information System (RIS). All relevant BI-RADS 3 Medical Record Numbers (MRNs) were identified with the assistance of the institution's translational science core. All cases were reviewed in the electronic medical records at our institution, and all data were extracted and compiled in REDCap by study personnel. To standardize data extraction and minimize inter-observer variability, a sample of ten records was first reviewed collaboratively by all study personnel, establishing a common approach to extracting and compiling data from the radiologists' reports. The data from the remaining charts were then extracted independently by four study personnel.
The study included all women aged 40 years or older who were recalled (BI-RADS 0) from screening and assigned BI-RADS 3 at a follow-up diagnostic evaluation at our institution from January 2014 through December 2017. Inclusion criteria were: assignment of BI-RADS 0 on the initial screening exam; assignment of BI-RADS 3 at a diagnostic follow-up exam performed within 90 days of the screening exam; and at least one follow-up visit in the subsequent 24-month period. Exclusion criteria were: age under 40 years at the date of the initial screening exam; a BI-RADS 3 assessment following diagnostic evaluation of a symptomatic patient; a diagnostic evaluation performed more than 90 days after the screening mammogram; or no evaluation during the 2-year follow-up period. The study was limited to mammographic and ultrasound evaluations. All digital mammograms were performed at our multiple clinical sites on Hologic (Bedford, MA) Selenia® or Selenia® Dimensions™ units. Both full-field digital mammography (2D) and Digital Breast Tomosynthesis (DBT) were used for screening examinations, and there were no clearly defined criteria determining which patients were offered a 2D mammogram and which were offered a DBT study.
All breast ultrasounds were performed on a Philips (Bothell, WA) iU-22 unit by a dedicated breast sonographer; when necessary, the radiologist also scanned the patient personally. At our institution, BI-RADS 3 cases are evaluated at 6 months (ipsilateral breast), 12 months (bilateral), and 24 months (bilateral), with supplemental ultrasound performed at each time point as indicated. The data abstracted from the chart included the patient's age at the time of BI-RADS 3 designation and whether the preceding BI-RADS 0 mammogram was her baseline examination. We also recorded whether the BI-RADS 3 designation was made on diagnostic mammography, ultrasound, or both. The radiologist who assigned the BI-RADS 3 designation, the breast density category (A-D), the quadrant-based location, and the morphology of the BI-RADS 3 finding on mammography and ultrasound were recorded. The presence of follow-up imaging at 6, 12, and 24 months was recorded and used to calculate the follow-up rate. If a patient was deemed lost to follow-up at 24 months, the last known finding was recorded. If a biopsy was performed, the time (in months) from BI-RADS 3 assignment, the imaging modality used for guidance, and the histopathologic findings of the biopsy specimen were captured.
All quantitative measures in this study are reported as proportions (percentages), and Clopper-Pearson exact 95% confidence intervals were computed for each. One-sample tests of proportions were used to determine whether the quantitative metrics differed from values reported in the literature. All tests were two-tailed, and p<0.05 was considered statistically significant. All analyses were conducted using SAS version 9.4 (SAS Institute, Inc., Cary, NC).
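The Clopper-Pearson interval named in this paragraph inverts the binomial distribution: the lower bound is the proportion at which observing x or more events has probability alpha/2, and the upper bound is the proportion at which observing x or fewer events has probability alpha/2. The study's analyses were run in SAS; what follows is only an illustrative, dependency-free Python sketch of the same interval, and the helper names (`binom_cdf`, `clopper_pearson`) are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing terms in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def _solve_p(k: int, n: int, target: float, iters: int = 100) -> float:
    """Find p with binom_cdf(k, n, p) == target by bisection (the CDF decreases in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid  # CDF still above target: the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided confidence interval for the proportion x/n."""
    # Lower bound: P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_p(x - 1, n, 1 - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_p(x, n, alpha / 2)
    return lower, upper


lo, hi = clopper_pearson(26, 1037)  # cancer yield: 26 cancers among 1,037 BI-RADS 3 cases
print(f"26/1037: {lo:.2%} to {hi:.2%}")
```

The interval for 26/1,037 comes out at roughly 1.6% to 3.7%, consistent with the cancer-yield interval given in the Abstract. Bisection on the binomial CDF keeps the sketch within the standard library; with SciPy available, the same bounds are usually obtained from beta quantiles, `scipy.stats.beta.ppf(alpha / 2, x, n - x + 1)` and `scipy.stats.beta.ppf(1 - alpha / 2, x + 1, n - x)`.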
Results
A total of 135,765 screening exams were performed during the four-year period, of which 13,453 were recalled (Figure 1). A total of 1,360 women were assigned BI-RADS 3, of whom 1,037 met the study eligibility criteria. Twenty-four radiologists assigned the BI-RADS 3 category during the study period. Eight of the 24 were fellowship-trained in breast imaging; each of these eight assigned 50 or more BI-RADS 3 studies, and together they accounted for 93% (n=969) of all included BI-RADS 3 cases. The mean age at initial BI-RADS 3 assignment was 56.6 ± 11.1 years (range, 40–94 years) (Table 1). For 165 (15.9%) women, the BI-RADS 0 mammogram that preceded the BI-RADS 3 assignment was their first mammogram. Regarding breast density, nearly half of the women (49.6%, n=514) had category B breasts, followed by 37.1% (n=385) in category C, 8.29% (n=86) in category A, and 4.82% (n=50) in category D.
BI-RADS 3 Features: Morphology, Laterality and Location
Nearly all BI-RADS 3 cases (95.9%, n=994) were assigned on mammography/DBT alone or on mammography/DBT with ultrasound; the remainder (3.95%, n=41) were assigned on ultrasound alone (Table 2). By imaging morphology, the 1,037 cases comprised asymmetry/architectural distortion (n=512, 49%), grouped calcifications (n=398, 38%), and non-calcified circumscribed masses (n=90, 9%). The remaining 37 cases (4%) were assigned at the discretion of the radiologist, and the electronic records did not document the classic descriptors for a BI-RADS 3 assessment. Laterality was relatively evenly distributed: 49.8% (n=516) of cases were in the left breast, 44.6% (n=462) in the right breast, and 5.69% (n=59) were bilateral. The upper outer quadrant had the greatest number of lesions in both the right (n=232, 38.0%) and left (n=195, 35.3%) breasts, followed by the subareolar/central region in the right (n=140, 22.9%) and left (n=115, 20.8%) breasts.
Follow-up of BI-RADS 3 Lesions
The follow-up rate was 97.1% (1,007/1,037) at 6 months and decreased progressively to 95.8% (979/1,022) at 12 months and 86.6% (876/1,011) at 24 months (Table 2); denominators are adjusted for lesions downgraded because of benign biopsy pathology at a prior follow-up. Of the 1,037 BI-RADS 3 patients, 77 (7.4%) underwent biopsy: 23/77 (30%) at 6 months, 40/77 (52%) at 12 months, and 14/77 (18%) at 18-24 months. The majority of biopsies (n=47, 61%) were performed under ultrasound guidance, and the remainder (n=30, 39%) under stereotactic mammographic guidance.
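The follow-up rates and biopsy timing in this paragraph are simple ratios, and their values depend on the chosen denominator. The sketch below, using only the counts reported in this section, contrasts the adjusted denominators with a full-cohort denominator of 1,037. The variable names are illustrative, and the sketch takes the adjusted denominators as reported rather than modeling how they were derived.

```python
# Follow-up counts reported in the Results: visit -> (patients imaged, adjusted denominator).
# Adjusted denominators are taken as reported; their derivation is not modeled here.
FOLLOW_UP = {"6 months": (1_007, 1_037), "12 months": (979, 1_022), "24 months": (876, 1_011)}
COHORT = 1_037  # all included BI-RADS 3 patients

# Biopsies of upgraded lesions by visit (77 in total).
BIOPSIES = {"6 months": 23, "12 months": 40, "18-24 months": 14}

# Follow-up rate with the reported (adjusted) denominators.
adjusted = {visit: imaged / denom for visit, (imaged, denom) in FOLLOW_UP.items()}
# Follow-up rate if every visit is divided by the full cohort instead.
full_cohort = {visit: imaged / COHORT for visit, (imaged, _) in FOLLOW_UP.items()}

# Share of all biopsies performed at each visit.
total_biopsies = sum(BIOPSIES.values())
biopsy_share = {visit: count / total_biopsies for visit, count in BIOPSIES.items()}

for visit in FOLLOW_UP:
    print(f"{visit}: adjusted {adjusted[visit]:.1%}, full cohort {full_cohort[visit]:.1%}")
for visit, share in biopsy_share.items():
    print(f"biopsies at {visit}: {share:.0%}")
```

The full-cohort rates come out lower (about 94.4% at 12 months and 84.5% at 24 months), so an audit report would ideally state which denominator it uses.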
The quantitative benchmarks suggested for routine clinical practice management are summarized in Table 3. Initial BI-RADS 3 assignments accounted for 0.76% (1,037/135,765) of total screens and 7.7% (1,037/13,453) of screen-recalled cases (BI-RADS 0). Within the 24-month follow-up period, the BI-RADS 3 upgrade rate was 7.4% (77/1,037). Among the 77 lesions biopsied within 24 months of BI-RADS 3 assignment, there were 26 malignancies, giving a positive predictive value (PPV3) of 33.8% (26/77). Of the 26 cancers, 62% (n=16) were biopsied under ultrasound guidance and 38% (n=10) under stereotactic mammography. The cancer yield within the 24-month follow-up period was 2.51% (26/1,037). Of these 26 cancers, 30.8% (8/26) were detected at 6 months, 57.7% (15/26) at 12 months, and 11.5% (3/26) at 18-24 months. The most frequent cancer type was ductal carcinoma in situ (DCIS), accounting for 46% (12/26) of cases, followed by invasive ductal carcinoma (IDC) at 42% (n=11) and invasive lobular carcinoma (ILC) at 12% (n=3).
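The Table 3 benchmarks can be recomputed from five counts. The sketch below shows one hypothetical way an audit script might bundle them; the class and field names are illustrative assumptions, not part of any RIS or reporting system. It also checks that the cancer yield equals upgrade rate times PPV3, which holds here because every cancer counted was found at biopsy of an upgraded lesion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiRads3Counts:
    """Counts for one audit period (hypothetical field names)."""
    screens: int     # screening exams performed
    recalls: int     # screens recalled as BI-RADS 0
    initial_b3: int  # initial BI-RADS 3 assignments after diagnostic workup
    biopsied: int    # BI-RADS 3 lesions upgraded and biopsied within 24 months
    cancers: int     # malignancies among the biopsied lesions

    def metrics(self) -> dict[str, float]:
        """Return the audit proportions as fractions (multiply by 100 for percent)."""
        return {
            "b3_per_screen": self.initial_b3 / self.screens,
            "b3_per_recall": self.initial_b3 / self.recalls,
            "upgrade_rate": self.biopsied / self.initial_b3,
            "ppv3": self.cancers / self.biopsied,
            "cancer_yield": self.cancers / self.initial_b3,
        }


counts = BiRads3Counts(screens=135_765, recalls=13_453, initial_b3=1_037, biopsied=77, cancers=26)
m = counts.metrics()
# Cancer yield factors into upgrade rate x PPV3 when all cancers come from biopsied upgrades.
assert abs(m["cancer_yield"] - m["upgrade_rate"] * m["ppv3"]) < 1e-12
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

Keeping the raw counts alongside the derived proportions makes it straightforward to attach confidence intervals or compare audit periods later.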
Discussion
The BI-RADS 3 category was introduced in the BI-RADS atlas to reduce the harms of screening by decreasing the number of false-positive biopsies and the cost of health care, while maintaining sensitivity for early detection of breast cancer. Although the BI-RADS atlas specifies the probability of cancer in this subset as 2% or less, no routine audit has been established in recent years across the various clinical practice settings [17,21]. We therefore conducted a retrospective review of our own data as a quality assurance project to better guide clinical practice management. Among 1,037 BI-RADS 3 cases assigned over 4 years after an inconclusive (BI-RADS 0) screening mammogram, the cancer yield during the 2-year surveillance period was 2.5% (n=26). This observed cancer yield was not statistically different (p=0.243) from the 2% probability of malignancy described in the BI-RADS atlas. Our cancer yield did not differ significantly from the 1.86% reported by Berg, et al. (p=0.123) but was significantly higher than the 1.47% reported by Michaels, et al. (p=0.006), the 1.02% reported by Lehman, et al. (p<0.001), and the 0.8% reported by Baum, et al. (p<0.001).
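The Methods do not state which one-sample test of proportions was used. One common choice, assumed here, is the normal-approximation z-test with the standard error computed under the null proportion; with that choice, 26/1,037 tested against 2% gives p of about 0.243, matching the value reported in this paragraph. The function name is illustrative, and the sketch treats each literature value as a fixed constant rather than an estimate with its own sampling error.

```python
import math


def one_sample_proportion_test(x: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided one-sample z-test of H0: p = p0, using the standard error under H0."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null proportion
    z = (p_hat - p0) / se0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Cancer yield (26/1,037) against the BI-RADS atlas threshold and two literature values.
comparisons = [
    ("BI-RADS atlas 2%", 26, 1037, 0.02),
    ("Berg, et al. 1.86%", 26, 1037, 0.0186),
    ("Michaels, et al. 1.47%", 26, 1037, 0.0147),
]
for label, x, n, p0 in comparisons:
    z, p = one_sample_proportion_test(x, n, p0)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

Because it relies on a normal approximation, this test becomes less reliable when the expected count n * p0 is small; an exact binomial test is the usual alternative in that case.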
Among the 26 cancers detected within the 2-year follow-up period, 8 (30.8%) were detected within the first 6 months, which supports the value of short-term (6-month) follow-up. This proportion was significantly lower than in Berg, et al., where 58% of cancers were identified at 6 months (p=0.005). During the first 12 months of follow-up, 23/26 (88.5%) cancers were detected, which is comparable to the 73% reported by Chung, et al. (p=0.076). In keeping with multiple prior studies [11,12,21], DCIS was the most common cancer type in our series (12/26, 46%), followed by invasive ductal carcinoma (11/26, 42%) and invasive lobular carcinoma (3/26, 12%). All invasive cancers were early stage. During the 2-year surveillance, 77/1,037 (7.4%) cases were upgraded to BI-RADS 4 or 5 and biopsied. This rate was higher than the 5.9% reported by Michaels, et al. (p=0.037) and the 0.88% reported by Vizcaino, et al. (p<0.001). The PPV3 in our series was 26/77 (34%), which is higher than the 16.6% in Berg, et al. (p<0.001) and comparable to the 25% in Michaels, et al. (p=0.076). In our study, the proportion of BI-RADS 3 assignments to recalls (BI-RADS 0) was 10.1% (1,360/13,453) among all women and 7.7% (1,037/13,453) among study-eligible women. In our PubMed literature search, we could not identify any publication reporting this metric, and we suggest including it in routine audits for clinical practice management.
To establish benchmarks across different practice settings, recent data need to be shared from varied clinical settings (academic and private, dedicated and non-dedicated breast imaging practices). The indices described above could serve as useful benchmarks for a practice's quality assurance. Age, ethnicity, lack of transportation, education, and cost of care all create disparities and barriers that contribute to poor follow-up, and poor compliance with follow-up directly affects the cancer yield in BI-RADS 3 cases. While the literature [12,21,23,24] describes loss to follow-up as a major concern, follow-up rates in our series were good: 97.1% at 6 months, 95.8% at 12 months, and 86.6% at 24 months. In Michaels, et al., compliance with follow-up declined progressively from 83% at 6 months to 54% at 24 months, and in Baum, et al., only 71% of the studied cohort complied with follow-up. The current edition of the BI-RADS atlas clearly discourages assigning BI-RADS 3 from a screening examination without a complete diagnostic workup; however, prior literature did not make this clear distinction. The BI-RADS atlas outlines the morphologic criteria for assigning BI-RADS 3 on mammography, ultrasound, and MRI, but it also notes that the radiologist's experience and discretion may determine the assignment.
In our study, the morphologies contributing to a BI-RADS 3 assignment were asymmetry/focal asymmetry/architectural distortion in 49% (512/1,037), grouped calcifications in 38% (398/1,037), and a non-calcified circumscribed mass on mammography, ultrasound, or both in 9% (90/1,037); the remaining 4% (37/1,037) of assignments were made at the discretion of the interpreting radiologist without one of these descriptors in the report. In most studies [13,14,15,21], calcifications accounted for more than 50% of BI-RADS 3 assignments, except in Varas, et al., where they accounted for only 19%. Institutional policies, reader variability, and access to care may contribute to these differences. Radiologists' experience and fellowship training may also influence interpretation. Dedicated fellowship-trained breast imagers and general radiologists performing breast imaging are known to differ in their evaluation and assessment of breast lesions [17,18], and prior studies report varying cancer yields depending on whether dedicated breast imagers or general radiologists interpret breast exams [2,18,21]. At our facility, the majority of BI-RADS 3 cases (93%) were assigned by dedicated fellowship-trained breast imagers. Patient age has also recently been reported as a source of variability, with cancer yield exceeding 2% in women older than 60 years.
Since the introduction of DBT, studies have reported better visualization of architectural distortion, some of which lacks an ultrasound correlate. In the early stages of DBT adoption in clinical practice, no DBT-guided biopsy device was available and, consequently, there was no consensus among radiologists on how to manage these lesions. Radiologists also vary in their use of lesion descriptors, which could contribute to variability in assigning the BI-RADS 3 category, and Ambinder, et al. reported a decreasing incidence of BI-RADS 3 after DBT implementation. All of these factors contribute to inter-reader and inter-facility variability and have resulted in wide variation across practices in BI-RADS 3 assignments as a percentage of total screens. We believe that larger data sets from across the country may help define benchmarks, prompting practices to review their policies when their metrics vary substantially from them.
Our study had several limitations. It was retrospective, and only mammographic and ultrasound features were considered. Before mid-2016, when we acquired the capability to perform tomosynthesis-guided biopsies, architectural distortion without an ultrasound correlate was assigned BI-RADS 3 at our institution. On review of our records, architectural distortion and asymmetry, though distinct morphologies, were sometimes used interchangeably in reports; hence, we merged the two categories for analysis rather than attempt to distinguish them. We did not specifically account for downgrades to BI-RADS 1 or 2 during follow-up; these are likely a very small proportion, since most of our breast imagers continue to follow cases assigned BI-RADS 3 for the entire 24-month surveillance period.
Conclusion
Auditing BI-RADS 3 metrics has the potential to provide additional insights for clinical practice management. Many of the criteria described in this paper (cancer yield, BI-RADS 3 as a percentage of screens and as a percentage of BI-RADS 0, distribution of the morphology of BI-RADS 3 assignments, upgrade rates, and positive biopsy rates) may serve a useful role in monitoring clinical practice and in establishing the optimal range for appropriate use of the BI-RADS 3 category. Larger data sets from varied clinical settings, with input from an expert committee, could help establish benchmarks for these metrics.
• Audit of BI-RADS 3 metrics can provide additional insights for clinical practice management.
• BI-RADS 3 metrics to monitor could include cancer yield, BI-RADS 3 as a percentage of screens, BI-RADS 3 as a percentage of BI-RADS 0, distribution of the morphology of BI-RADS 3 assignments, upgrade rates, and positive biopsy rates; these may serve a useful role in quality evaluation and in establishing the optimal range for appropriate use of the BI-RADS 3 category.
• Larger data sets from varied clinical settings, with input from an expert committee, could help establish benchmarks for these metrics.
Acknowledgments
The authors thank Yurima Guilarte-Walker and the Data Lake team as well as Pratik Patel from the REDCap team for their assistance with data collection and data management. This work was supported in part by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) grants R01CA195512 and R01CA199044. The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NCI or the NIH.
References
- Niell BL, Freer PE, Weinfurtner RJ, Arleo EK, Drukteinis JS (2017) Screening for Breast Cancer. Radiol Clin North Am 55(6): 1145-1162.
- Nelson HD, Fu R, Cantor A, Pappas M, Daeges M, et al. (2016) Effectiveness of Breast Cancer Screening: Systematic Review and Meta-analysis to Update the 2009 U.S. Preventive Services Task Force Recommendation. Ann Intern Med 164(4): 244-255.
- Løberg M, Lousdal ML, Bretthauer M, Kalager M (2015) Benefits and harms of mammography screening. Breast Cancer Res 17(1): 63.
- DeSantis CE, Ma J, Gaudet MM, Newman LA, Miller KD, et al. (2019) Breast cancer statistics, 2019. CA Cancer J Clin 69(6): 438-451.
- Lehman CD, Arao RF, Sprague BL, Lee JM, Buist DS, et al. (2017) National Performance Benchmarks for Modern Screening Digital Mammography: Update from the Breast Cancer Surveillance Consortium. Radiology 283(1): 49-58.
- Rosenberg RD, Yankaskas BC, Abraham LA, Sickles EA, Lehman CD, et al. (2006) Performance benchmarks for screening mammography. Radiology 241(1): 55-66.
- Lee CS, Bhargavan Chatfield M, Burnside ES, Nagy P, Sickles EA (2016) The National Mammography Database: Preliminary Data. AJR Am J Roentgenol 206(4): 883-890.
- D'Orsi CJ, et al. (2013) ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System. Reston, VA: American College of Radiology.
- Sickles EA (1991) Periodic mammographic follow-up of probably benign lesions: results in 3,184 consecutive cases. Radiology 179(2): 463-468.
- Liberman L, Menell JH (2002) Breast imaging reporting and data system (BI-RADS). Radiol Clin North Am 40(3): 409-430.
- Sickles EA (1986) Breast calcifications: mammographic evaluation. Radiology 160(2): 289-293.
- Berg WA, Berg JM, Sickles EA, Burnside ES, Zuley ML, et al. (2020) Cancer Yield and Patterns of Follow-up for BI-RADS Category 3 after Screening Mammography Recall in the National Mammography Database. Radiology 296(1): 32-41.
- Helvie MA, Pennes DR, Rebner M, Adler DD (1991) Mammographic follow-up of low-suspicion lesions: compliance rate and diagnostic yield. Radiology 178(1): 155-158.
- Varas X, Leborgne F, Leborgne JH (1992) Nonpalpable, probably benign lesions: role of follow-up mammography. Radiology 184(2): 409-414.
- Vizcaíno I, Gadea L, Andreo L, Salas D, Ruiz Perales F, et al. (2001) Short-term follow-up results in 795 nonpalpable probably benign lesions detected at screening mammography. Radiology 219(2): 475-483.
- Grimm LJ, Anderson AL, Baker JA, Johnson KS, Walsh R, et al. (2015) Interobserver Variability Between Breast Imagers Using the Fifth Edition of the BI-RADS MRI Lexicon. AJR Am J Roentgenol 204(5): 1120-1124.
- Michaels AY, Chung CSW, Frost EP, Birdwell RL, Giess CS (2017) Interobserver variability in upgraded and non-upgraded BI-RADS 3 lesions. Clin Radiol 72(8): 694.
- Ambinder EB, Mullen LA, Falomo E, Myers K, Hung J, et al. (2019) Variability in Individual Radiologist BI-RADS 3 Usage at a Large Academic Center: What's the Cause and What Should We Do About It? Acad Radiol 26(7): 915-922.
- Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, et al. (2009) Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 42(2): 377-381.
- Vedantham S, Karellas A, Vijayaraghavan GR, Kopans DB (2015) Digital Breast Tomosynthesis: State of the Art. Radiology 277(3): 663-684.
- Michaels AY, Chung CSW, Birdwell RL, Frost EP, Giess CS (2017) Imaging and histopathological features of BI-RADS 3 lesions upgraded during imaging surveillance. Breast J 23(1): 10-16.
- Lehman CD, Arao RF, Sprague BL, Lee JM, Buist DS, et al. (2017) National Performance Benchmarks for Modern Screening Digital Mammography: Update from the Breast Cancer Surveillance Consortium. Radiology 283(1): 49-58.
- Baum JK, Hanna LG, Acharyya S, Conant EF, Bassett LW, et al. (2011) Use of BI-RADS 3-Probably Benign category in the American College of Radiology Imaging Network Digital Mammographic Imaging Screening Trial. Radiology 260(1): 61-67.
- Chung CSW, Giess CS, Gombos EC, Frost EP, Yeh ED, et al. (2014) Patient compliance and diagnostic yield of 18-month follow-up of surveillance in probably benign mammographic lesions. AJR Am J Roentgenol 202(4): 922-927.
- Lee CS, Berg JM, Berg WA (2021) Cancer Yield Exceeds 2% for BI-RADS 3 Probably Benign Findings in Women Older Than 60 Years in the National Mammography Database. Radiology 299(3): 550-558.
- Vijayaraghavan GR, Newburg A, Vedantham S (2019) Positive Predictive Value of Tomosynthesis-guided Biopsies of Architectural Distortions Seen on Digital Breast Tomosynthesis and without an Ultrasound Correlate. J Clin Imaging Sci 9: 53.