Automated Segmentation of the Sigmoid Sinus using a Multi-Atlas Approach

Purpose: To develop an accurate, automated multi-atlas segmentation algorithm for creating three-dimensional sigmoid sinus models from clinical computed tomography (CT) volumes for use in temporal bone mastoidectomy surgical simulation software. Methods: Clinical CT and micro-CT scans of 38 cadaveric temporal bones were used to develop and validate the algorithm. A single-atlas and multi-atlas segmentation were compared for accuracy using three different label fusion methods: majority voting, STAPLE, and joint label fusion. The automated segmentation algorithm was evaluated by comparing to ground truth manual segmentations through a combination of visual inspection and Dice, Hausdorff distance, and average Hausdorff distance metrics. Results: The best results were obtained for multi-atlas segmentation using joint label fusion for which a mean Dice value of 0.77 was found across all samples when compared to the manual segmentations. The mean Hausdorff distance was 10.39 mm, and the mean average Hausdorff distance was 0.30 mm, corresponding to less than two voxels. Visual inspection revealed accurate and high-resolution segmentations. Conclusion:


Introduction
Mastoidectomy is a complex surgical procedure that requires extensive knowledge of the anatomy of the temporal bone and is often required in cochlear implantation surgery. Because of this complexity, mastoidectomy is a difficult procedure for trainees to master. Traditional training methods for surgical residents utilize cadavers, which can be expensive and difficult to access. To provide more consistent and accessible training for surgery involving the temporal bone, surgical simulators have been developed that provide haptic (touch) feedback and three-dimensional (3D) visualization [1][2][3][4][5][6][7][8][9][10]. Surgical simulation is becoming a widely accepted tool in Otolaryngology since it offers the ability to model difficult and varied cases and allows trainees to practice on patient-specific models.
Simulators such as CardinalSim [10] are able to import patient images and can be used for both training and pre-operative planning. To maximize the variety and relevance of the anatomical cases available in the simulator, many clinical scans need to first be segmented (i.e., the anatomy needs to be delineated) for use.
A major drawback of virtual reality (VR) simulators is the need for manual image segmentation, which can take hours per scan and is infeasible in a clinical setting. Automated segmentation methods are preferred because they can rapidly produce a variety of 3D digital models of the temporal bone. One challenge associated with the development of simulators in Otolaryngology is the complex anatomy of the temporal bone. For example, in mastoidectomy one of the vital anatomical structures is the sigmoid sinus. The sigmoid sinus is a venous sinus that travels down an S-shaped groove in the temporal bone. During the initial portion of the procedure the sigmoid sinus represents the posterior boundary of bone removal [11]. It is critical for surgeons and trainees to be able to identify the sigmoid sinus to avoid catastrophic vascular complications during surgery.
Currently, creating 3D models of the sigmoid sinus requires manual delineation of structural boundaries by an individual with expertise in both the anatomy and the software tools used to segment medical image volumes. In addition, the sigmoid sinus is a highly variable structure in both shape and relative position to other structures in the temporal bone, and previous work has focused on evaluating its variability through statistical shape analysis [12].
Due to this variability, it is very time-consuming for an expert to manually perform segmentation (often taking up to 45 minutes).
Therefore, an automated algorithm is required to produce sigmoid sinus segmentations accurately and quickly with minimal expert intervention. However, due to the vast anatomic variability and low contrast at the medial wall, purely intensity-based methods such as thresholding are inconsistent, making development of an automated segmentation method of the sigmoid sinus uniquely challenging.
Several approaches to automated segmentation of anatomical structures have been described in the literature. One of the simplest approaches is thresholding. Thresholding is a fast and effective method for delineating structures that have high contrast with the surrounding objects, but as noted above it is generally ineffective for anatomical structures such as the sigmoid sinus, which has intensity values similar to those of its surrounding areas on its medial side. Atlas-based methods are far more promising for capturing variability in difficult-to-delineate anatomical structures due to their ability to capture the anatomical information and their relative robustness to poor contrast. One effective application of atlas-based segmentation to various structures of the temporal bone, excluding the sigmoid sinus, has been presented by Powell et al. [13]. The present work describes the development and evaluation of a multi-atlas-based segmentation algorithm that compares a variety of label fusion methods on clinical CT scans of cadaveric temporal bones with the goal of accurately segmenting the sigmoid sinus. The multi-atlas method presented herein has been applied previously to segment medical images in a variety of fields [14][15][16] and has been adapted in this work to allow for the usage of highly detailed micro-CT (µCT) atlases to segment low-resolution and poorly delineated clinical CT scans. This method was able to capture the high variability of the anatomy of the sigmoid sinus.

Ground Truth Segmentations and Atlas Creation
The sigmoid sinus was manually segmented from the µCT images by an expert anatomist (KV) using a combination of semi-automated and manual tools in 3D Slicer, an open-source software for medical image processing and visualization [17].
Consensus interpretation of the segmentations was achieved by an experienced surgeon (SKA), the anatomist (KV), and the lead author (DA). Details of the segmentations and anatomic analysis have been previously described in [12]. Twelve segmentations were used to define atlases to drive the segmentation algorithm, and the remainder formed the ground truth and were used to evaluate the segmentation algorithm. Two sets of atlases were defined from the 12 segmentations: 6 atlases for left temporal bones and 6 for right temporal bones. This number of atlases managed to capture the variability of the sigmoid sinus while producing results comparable to using 36 atlases. The use of µCT images resulted in higher resolution and detail for both the ground truth and the algorithm-generated segmentations. As the algorithm is applied to clinical-CT volumes, the µCT segmentations were registered using a combination of rigid and affine techniques to their corresponding clinical-CT volumes and then reviewed by the research team to be used as a high-resolution ground truth.

Rigid and Affine Registration:
Prior to non-rigid registration, a rigid step and an affine step were used to approximately align each atlas to the target clinical-CT volume. This two-step approach was taken to improve accuracy. By applying the rigid registration first, the time required for the affine transformation is reduced, and the volumes are approximately aligned before non-rigid registration. This two-step registration approach was performed using the NiftyReg implementation of a symmetric (source to target and target to source, simultaneously) block-matching registration, applied in three pyramidal levels from coarse to fine, doubling the resolution on each step up to the original 0.154 mm isotropic voxel size [18].
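The rigid-then-affine sequence described above can be sketched as a pair of command lines for NiftyReg's `reg_aladin` tool. This is a minimal illustration and not the authors' actual invocation: the file names and the `out_prefix` argument are hypothetical, and the flags shown (`-ref`, `-flo`, `-rigOnly`, `-inaff`, `-aff`, `-res`, `-ln`, `-lp`) are taken from the `reg_aladin` command-line help as assumed here, so they should be checked against the installed version. The helper `pyramid_voxel_sizes` simply lists the voxel size at each of the three pyramid levels when the finest level is the original 0.154 mm grid.

```python
from typing import List


def build_affine_commands(
    atlas: str, target: str, out_prefix: str, levels: int = 3
) -> List[List[str]]:
    """Return reg_aladin command lines for a rigid step, then an affine step.

    All file names are hypothetical placeholders. The flags are assumed from
    the reg_aladin help text; verify them against the installed NiftyReg.
    """
    rigid_mat = f"{out_prefix}_rigid.txt"
    affine_mat = f"{out_prefix}_affine.txt"
    # Shared arguments: target (reference), atlas (floating), pyramid levels.
    common = ["reg_aladin", "-ref", target, "-flo", atlas,
              "-ln", str(levels), "-lp", str(levels)]
    # Step 1: rigid only, saving the estimated transformation matrix.
    rigid = common + ["-rigOnly", "-aff", rigid_mat,
                      "-res", f"{out_prefix}_rigid.nii.gz"]
    # Step 2: affine, initialized from the rigid result rather than identity.
    affine = common + ["-inaff", rigid_mat, "-aff", affine_mat,
                       "-res", f"{out_prefix}_affine.nii.gz"]
    return [rigid, affine]


def pyramid_voxel_sizes(finest_mm: float, levels: int = 3) -> List[float]:
    """Voxel size at each pyramid level, coarse to fine (halving each step)."""
    return [finest_mm * 2 ** (levels - 1 - i) for i in range(levels)]


if __name__ == "__main__":
    for cmd in build_affine_commands("atlas_uct.nii.gz", "clinical_ct.nii.gz", "out"):
        print(" ".join(cmd))
    print(pyramid_voxel_sizes(0.154))
```

Each returned list could be passed to `subprocess.run` on a machine where NiftyReg is installed; building the commands separately keeps the sketch runnable without the tool.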

Non-Rigid Registration:
Non-rigid registration for the µCT atlases was also accomplished using the NiftyReg implementation of a B-spline pyramidal approach, with three progressively finer control point grids at spacings of 12 mm × 12 mm × 12 mm, 6 mm × 6 mm × 6 mm, and 3 mm × 3 mm × 3 mm, respectively [19]. The loss function used by NiftyReg for the non-rigid registration was a combination of normalized mutual information (NMI) and bending energy (BE), which was optimized using a conjugate gradient scheme. An example of the process of volume registration from rigid to the final non-rigid B-spline registration is given in Figure 2.
The label fusion methods compared were majority voting, STAPLE [20], and joint label fusion [14]. The registrations were then compared to determine the differences between the methods as they related to the sigmoid sinus.
[...] been applied in previous segmentation applications [13]. Since the sigmoid sinus is one connected blood vessel, the approach used for island removal was to discard all but the largest connected component of the segmentation. This resulted in a clean single label with no noise that was ready for use in a surgical simulator.
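Two of the steps above, majority voting and island removal, are simple enough to sketch directly. The following is a minimal illustration under stated assumptions, not the implementation used in this work: the propagated atlas labels are assumed to be binary arrays stacked along the first axis, a strict majority is assumed for the vote, and 6-connectivity is assumed for the connected components. The function names are hypothetical. In practice a library routine such as `scipy.ndimage.label` would normally replace the explicit flood fill, which is written out here only to keep the sketch dependent on NumPy alone.

```python
from collections import deque

import numpy as np


def majority_vote(labels: np.ndarray) -> np.ndarray:
    """Fuse binary label maps of shape (n_atlases, z, y, x) by strict majority."""
    votes = labels.astype(bool).sum(axis=0)
    return votes * 2 > labels.shape[0]


def largest_component(mask: np.ndarray) -> np.ndarray:
    """Keep only the largest 6-connected foreground component of a 3D mask."""
    mask = mask.astype(bool)
    seen = np.zeros_like(mask)
    best = []
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for seed in zip(*np.nonzero(mask)):
        if seen[seed]:
            continue
        # Breadth-first flood fill from an unvisited foreground voxel.
        seen[seed] = True
        queue, component = deque([seed]), [seed]
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not seen[n]:
                    seen[n] = True
                    queue.append(n)
                    component.append(n)
        if len(component) > len(best):
            best = component
    out = np.zeros_like(mask)
    if best:
        out[tuple(np.array(best).T)] = True
    return out


if __name__ == "__main__":
    atlases = np.zeros((3, 1, 4, 6), dtype=np.uint8)
    atlases[0, 0, 0, 0:3] = 1   # main structure, atlas 0
    atlases[1, 0, 0, 0:3] = 1   # main structure, atlas 1
    atlases[1, 0, 3, 5] = 1     # small island, atlas 1
    atlases[2, 0, 3, 5] = 1     # small island, atlas 2
    fused = majority_vote(atlases)
    print(int(fused.sum()), int(largest_component(fused).sum()))
```

Majority voting treats every atlas equally, whereas STAPLE and joint label fusion weight the atlases, so this sketch covers only the simplest of the three fusion methods.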

Evaluation and Metrics
The automated segmentations generated by the multi-atlas-based method were evaluated against the ground truth manual segmentations completed by the anatomist using a variety of metrics. The segmentation algorithm was applied to clinical-CT volumes, but the assessment used the ground truth µCT segmentations registered to their corresponding clinical-CT volumes, since the µCT label maps have higher resolution and the boundaries are more visible in µCT. The first metric used was the Dice coefficient, which quantifies the overlap between the automated and manual segmentations. The second metric used was the Hausdorff distance, which measures the maximum distance from one segmentation to another. The Hausdorff distance is extremely sensitive to noise, such that algorithms which segment a smaller or larger portion of the structure than the ground truth will return larger values even while segmenting the correct areas.
Island removal completed in post-processing is reasonably effective at negating this sensitivity to noise as it removes the unconnected components of the segmentation. The final metric used was the average Hausdorff distance (AHD), which takes the mean, rather than the maximum, of the closest-point distances between the two segmentations. The AHD metric is less sensitive to outliers than the Hausdorff distance and provides an understanding of the magnitude of the distance between the segmentations that cannot be seen in the Dice coefficient. Using the Dice coefficient, Hausdorff distance, and AHD in conjunction with visual comparison provided an overall understanding of the differences in shape, size, outliers, and distance between the two compared segmentations (ground truth and algorithmic) while providing values that could easily be compared to previous segmentation projects in the literature [21].
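The three metrics can be illustrated with a short NumPy sketch. It is written under several assumptions that are not taken from this work: distances are computed between all foreground voxels rather than surface voxels only, the AHD is taken as the mean of the two directed average distances (other definitions exist), and the brute-force pairwise distance matrix is practical only for small masks. The function names are hypothetical, and the default spacing is the 0.154 mm isotropic voxel size mentioned above.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def _closest_distances(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """For each point in p, the Euclidean distance to its closest point in q."""
    diff = p[:, None, :] - q[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def hausdorff_metrics(a: np.ndarray, b: np.ndarray, spacing_mm: float = 0.154):
    """Return (Hausdorff distance, average Hausdorff distance) in mm.

    Assumes both masks are non-empty and the voxel grid is isotropic.
    """
    pa = np.argwhere(a) * spacing_mm
    pb = np.argwhere(b) * spacing_mm
    d_ab = _closest_distances(pa, pb)
    d_ba = _closest_distances(pb, pa)
    hd = max(d_ab.max(), d_ba.max())        # worst-case closest-point distance
    ahd = (d_ab.mean() + d_ba.mean()) / 2   # mean closest-point distance
    return float(hd), float(ahd)


if __name__ == "__main__":
    auto = np.zeros((1, 1, 5), dtype=bool)
    truth = np.zeros((1, 1, 5), dtype=bool)
    auto[0, 0, 0:3] = True
    truth[0, 0, 1:5] = True
    print(dice(auto, truth), hausdorff_metrics(auto, truth, spacing_mm=1.0))
```

In this toy case a single extra voxel at the far end of `truth` sets the Hausdorff distance, while the AHD stays small, which mirrors the difference in outlier sensitivity described above.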

Results
Results of the majority voting, STAPLE, and joint label fusion metrics for segmentations created using just a single atlas (only one atlas is registered to the target image) as well as multiple atlases are shown in Table 1.

Discussion
While the single-atlas approach to automated segmentation can occasionally provide metric results comparable to individual multi-atlas segmentations, the average performance of the single-atlas approach across the dataset of temporal bone images is much lower and the results are less consistent. This is likely due to the difficulty of capturing the high degree of variability of the sigmoid sinus anatomy in one example. This performance increase from single-atlas to multi-atlas segmentation has been seen in previous works that segmented other anatomical structures, such as the brain from MRI scans [16,22]. The current methods presented for multi- [...]
The seemingly large mean Hausdorff distance may be attributed to the difference in the amount of the extremities segmented by the algorithm from the ground truth segmentations, since there were no large portions segmented outside the sigmoid sinus in the visual inspection. One drawback to joint label fusion is that the method requires much more computation time and storage space for registered images than STAPLE or majority voting, which performed almost as well as joint label fusion in the current paper. STAPLE and majority voting are therefore attractive options for segmenting the sigmoid sinus when time and storage space are at a premium.
Additionally, while the present study focused on the structure of the sigmoid sinus, other structures of the temporal bone can be segmented using methods such as thresholding. Thresholding was not a viable option here due to the low contrast of the sigmoid sinus on the medial side.
By using µCT atlases, detailed, high-resolution models were created that had comparable metric scores to other temporal bone structures segmented by previous groups using different methods.
Other atlas-based approaches for segmenting the structures of the temporal bone differ from the one presented herein, as they do not use multiple atlases for one segmentation, do not use label fusion methods to better capture variability, do not use µCT atlases, and do not target the sigmoid sinus [13,23]. Despite the accurate results produced by the present approach, it is important to note that since these are automatically generated models there is risk of error.
If used clinically, automatically generated segmentations should be reviewed and revised as needed by an expert. Even in cases where the automated segmentation requires revision, automation significantly reduces the time and labor associated with manual segmentation.