Xiao Tianle and Carol Anne Hargreaves*
Received: July 10, 2023; Published: July 24, 2023
*Corresponding author: Carol Anne Hargreaves, Department of Statistics and Data Science, National University of Singapore, Singapore 117546
The application of artificial intelligence (AI), including Deep Learning (DL) algorithms, is an emerging field in the healthcare industry. Rheumatoid arthritis (RA) is a disease that is at the forefront of digital healthcare. The traditional manual approach to assessing RA from X-ray images is time-consuming, tedious, and highly subjective. Multiple research papers have demonstrated promising results for the classification of rheumatoid arthritis using the fingers and toes, but little research has shown progress in classifying rheumatoid arthritis using wrist X-ray images. The objective of this paper is to develop an algorithm for automatically detecting the wrist joints of rheumatoid arthritis patients using X-ray images and computer vision technology. In this study, >95% accuracy is obtained for wrist joint detection. This paper serves as a proof of concept for wrist joint identification. Our initial experiments in wrist joint detection have shown significant improvements in this area of research, bringing automatic wrist joint detection closer than ever before to use in a clinical support system. Our automated algorithm also provides the foundation for the classification of RA using wrist joint X-ray images.
Keywords: Rheumatoid Arthritis; Image Detection; X-Ray; YOLOv7; Computer Vision Technology
Abbreviations: CLAHE: Contrast Limited Adaptive Histogram Equalization; AI: Artificial Intelligence; DL: Deep Learning; RA: Rheumatoid Arthritis; IoU: Intersection Over Union
There is no known method to automatically detect wrist joints using X-Ray images for the classification of rheumatoid arthritis patients.
What is Already Known
Multiple research papers have demonstrated promising results for the classification of rheumatoid arthritis using finger and toe X-ray images, but little research has shown progress in wrist joint identification.
Our Contributions to Wrist-Joint X-Ray Image Detection
The objective of this study was to construct a deep learning algorithm
that provides automatic detection of the wrist joints using
the state-of-the-art computer vision technology, YOLOv7. In this preliminary
study, >95% accuracy was obtained for wrist joint detection.
This paper serves as a proof of concept for wrist joint identification.
It also provides the foundation for the classification of wrist joints for
rheumatoid arthritis.
Rheumatoid arthritis (RA) is a chronic autoimmune
disease that primarily affects the joints. The global RA prevalence
estimate was 0.46% in 2015, and this number is expected to
increase. Its characteristic symptoms include inflammation in the synovial
membrane (the lining of the joint), which can cause pain, stiffness,
and swelling in the affected area. Over time, RA can lead to damage
to the joints characterized by joint space narrowing, erosions in subchondral
bone, and joint deformity, which may eventually result
in irreversible abnormality and loss of function. Patients at an early
stage of RA often develop disease-related or therapy-induced osteopenia
and joint space narrowing caused by the dissolution of cartilage
tissue. These morphological alterations can be detected by CAD
techniques before further deterioration. Early detection of RA is
crucial so that doctors can intervene in the deterioration process
with drug treatments. Several attempts have been made to develop
software and algorithms to automate this process. Computerized
measurement provides quantitative data on joint space width (JSW) that can be more reproducible
than data obtained with traditional scoring systems [4-6].
Automated detection has been shown to be more sensitive to changes
than Sharp/van der Heijde (SvdH) scores. Jin et al. illustrated that eye diseases could be
related to RA development in patients above the age of 50.
Early attempts at automation involved machine learning algorithms. O’Neil et al. designed a regression model to predict whether a patient is at high risk of developing RA. Deep learning is a type of AI that has proven powerful in the healthcare industry, for example in analysing medical images and text from electronic health records (EHRs). Deep learning-based models are used for a broad range of tasks in rheumatology, such as testing for antinuclear antibodies, interpreting synovial ultrasounds, and predicting diagnoses from an EHR. A study found that building accurate models to forecast complex disease outcomes using electronic health record data is possible. More recently, convolutional neural networks (CNNs) have become the dominant method for medical image classification [16,17] and medical image segmentation. An efficient CNN architecture (GRNN) achieves high accuracy for hand X-ray classification. An efficient CNN model, ResNet-Dwise50, was designed for the overall scoring of RA in hand X-rays by introducing a depthwise separable convolution block and an inverted residual block. Another common approach uses two steps to detect finger joint destruction [21,22]. A custom model using SIFT and a CNN to extract features outperformed traditional ML classifiers. YOLOv4 and VGG16 were combined for the assessment of RA and osteoarthritis, achieving 90.7% accuracy. Additional efforts constructed a customized CNN model with batch normalization, ReLU, and a pre-trained VGG16 model but failed to improve the accuracy (67.5%). The main contributions of this project, in which Rheumatoid Arthritis classification was performed on radiographic images of the wrist area using novel computer vision algorithms, are as follows.
• A dataset containing 367 pairs of hands was obtained from the CLEAR repository and the TETRAD study. The X-ray images provide an accurate depiction of patients at various stages of the disease in real-life scenarios.
• To understand the distribution of joint space narrowing and bone erosion scores at all wrist joints/bones, data exploration was performed with the given score tables.
• To resolve the issue of imbalance in various classes of scores, 3 joints and 3 bones were selected for further study. The selected joints and bones have a more balanced distribution of scores and are typical targets of inflammation.
• In order to improve the interpretability of the given images, several image pre-processing methods were applied in sequence: cropping, normalization, resizing, padding, and contrast enhancement.
• Manual labelling was conducted to provide the ground truth for the deep learning models. The joints and bones were labelled separately in free, open-source software. Overall, 6 classes of bounding boxes were drawn on each wrist image.
• YOLOv7 was chosen as the object detection algorithm and the training results were compared. Metrics including mean average precision (mAP), precision, recall, and F1-score were used to tune the parameters of the model. Different sizes of bounding boxes were also tested, but their results showed negligible difference.
• With the newly published YOLOv7, modern computer vision has proven capable of detecting wrist joints and bones with very high accuracy. In this ongoing research project, the object detection result provides a solid foundation for future scoring models.
The radiographic image dataset used in this project contains 367 pairs of X-ray images of the left and right hands of patients, including the wrist area. High-resolution radiographic images of both hands were presented in sequence as JPEG images. The side of the hand shown in each picture was indicated in the file name as either LH (Left Hand) or RH (Right Hand). Two separate CSV files were provided with overall erosion scores, individual erosion scores for each joint, overall narrowing scores, and individual narrowing scores for each joint. The scores were obtained using the Sharp/van der Heijde method by certified and recognized physicians.
Due to limited computational resources, we decided to focus only on joints and bones whose scores are comparatively evenly distributed. We selected ‘mna’, ‘capnlun’, ‘radcar’, ‘mul’, ‘nav’, and ‘lunate’. We first grouped the scores into two categories: scores of 0 or 1, and all remaining scores. The assumption was that samples with a score of 1 have only minor symptoms, and that patients at the initial stages of RA do not require much in-house treatment. For ease of representation, we labelled scores of 0 and 1 as ‘healthy’ and the rest as ‘unhealthy’. The distribution is roughly 2:1 for joints, which is a reasonable range. The distribution for bones is significantly more unequal. We also tried isolating zeros from the rest and derived the graphs in Figures 1 & 2. The distribution of this categorization is more likely to give satisfactory results.
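The two categorizations described above can be illustrated with a minimal sketch. The function name, the `healthy_max` parameter, and the example scores are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def binarize_scores(scores, healthy_max=1):
    """Map each score to 'healthy' (score <= healthy_max) or 'unhealthy'."""
    return ["healthy" if s <= healthy_max else "unhealthy" for s in scores]

# Hypothetical narrowing scores for one joint (illustration only).
example_scores = [0, 0, 1, 2, 0, 3, 1, 4, 0, 2]

# First scheme: scores 0 and 1 count as healthy.
labels_01 = binarize_scores(example_scores, healthy_max=1)
# Second scheme: only a score of 0 counts as healthy.
labels_0 = binarize_scores(example_scores, healthy_max=0)

print(Counter(labels_01))
print(Counter(labels_0))
```

Counting the resulting labels for each joint or bone gives the class ratio for either scheme, which is the quantity compared in the text.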
The raw X-ray images from the dataset were first cropped to focus only on the wrist area, which is the region of interest. For the convenience of the following steps, we removed the redundant parts of images by a fixed proportion. After conducting trials and manual observations, we decided to remove the upper 3/7 of all images as shown in Figure 3. Our aim was to eliminate most of the fingers and palms while still ensuring that all the wrist bones remained clearly visible.
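As an illustration, the fixed-proportion crop could be sketched as follows. This assumes the image is a NumPy array with the fingers at the top; the function name and the synthetic image are hypothetical.

```python
import numpy as np

def crop_upper_fraction(image, numerator=3, denominator=7):
    """Drop the top numerator/denominator of the rows and keep the rest."""
    height = image.shape[0]
    first_kept_row = (height * numerator) // denominator
    return image[first_kept_row:]

# Hypothetical 700 x 400 grayscale array standing in for a hand radiograph.
image = np.zeros((700, 400), dtype=np.uint8)
cropped = crop_upper_fraction(image)
print(cropped.shape)  # (400, 400)
```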
Normalization can speed up the convergence rate during training by helping to find the optimal solution more quickly. This is because normalization reduces the range of the input data, which can help the optimizer take larger steps towards the minimum. After normalization, the input features have similar scales and distributions (0-1), which can improve the performance of the model during training.
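A minimal sketch of scaling pixel intensities to the 0-1 range is given below. It assumes 8-bit grayscale input and division by 255; the text does not state which normalization formula was used, so this choice is an assumption.

```python
import numpy as np

def normalize_to_unit_range(image):
    """Scale 8-bit pixel intensities from [0, 255] to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0

# Tiny hypothetical image covering the full 8-bit range.
image = np.array([[0, 128], [64, 255]], dtype=np.uint8)
normalized = normalize_to_unit_range(image)
print(normalized.min(), normalized.max())
```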
Resizing and Padding
The cropped images were resized to 1400 × 760 pixels (height × width). First, we obtained the minimum height and width across all images, and then multiplied these pixel counts by a fixed factor to derive the round target dimensions of 1400 and 760. We enlarged the images proportionally to make the labelling process easier. Uniform height and width also ensure that the joint detection model can identify areas of interest within a smaller range, thus increasing the success rate. To maintain the original aspect ratio of each image, black pixels were added where necessary. For each image, we compared its height ratio (1400 / height) and width ratio (760 / width), and rescaled the image by the smaller ratio so that the original aspect ratio was kept. We then calculated the padding required on each side to reach 1400 and 760, so that the objects remain centered after resizing. Black pixels were used to match the grayscale background color.
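The ratio comparison and padding calculation can be sketched as below. The function name and the example image size are hypothetical, and splitting the padding evenly between opposite sides is an assumption made to keep the content centered.

```python
def letterbox_geometry(height, width, target_height=1400, target_width=760):
    """Return the resized size and per-side padding that fit an image into the target."""
    # Use the smaller ratio so the rescaled image fits in both dimensions.
    scale = min(target_height / height, target_width / width)
    new_height = min(target_height, round(height * scale))
    new_width = min(target_width, round(width * scale))
    pad_vertical = target_height - new_height
    pad_horizontal = target_width - new_width
    top = pad_vertical // 2
    left = pad_horizontal // 2
    return {
        "new_size": (new_height, new_width),
        "top": top,
        "bottom": pad_vertical - top,
        "left": left,
        "right": pad_horizontal - left,
    }

# Hypothetical cropped image of 1000 x 600 pixels (height x width).
geometry = letterbox_geometry(1000, 600)
print(geometry)
```

In this example the width ratio is the smaller one, so the image is scaled to the full width of 760 and the black padding is added only above and below. The returned values could then be passed to an image library, for example OpenCV's `cv2.resize` and `cv2.copyMakeBorder`, to produce the padded image.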
Contrast Limited Adaptive Histogram Equalization (CLAHE) is a proven technique for enhancing the contrast of an image. Traditional histogram equalization spreads the intensity values of an image across the full intensity range; CLAHE extends it by equalizing small tiles of the image separately and clipping each tile's histogram to limit the amplification of noise. We applied CLAHE to all the images with a clip limit of 2.0 and a grid size of (8,8). This procedure greatly improved the contrast and made the bones more visible. It also helped alleviate the problem of some pictures being overexposed or underexposed. The visual features are more likely to be captured by the model, thus easing the learning process of joint identification. The clip limit also kept noise from being over-amplified. As shown in Figure 3, the image is much clearer after the contrast is increased.
Joint Segmentation (YOLO)
Two separate models were trained for joints and bones, respectively. In the training process, we first downloaded the weights file from the official YOLOv7 GitHub repository, which was pre-trained on the Common Objects in Context (COCO) dataset. These weights were the starting point for our training. The joints and bones models were trained independently of each other. Since the number of classes is the same for each model (Capnlun, Mna, Radcar for joints, and mul, nav, lunate for bones), the same set of parameters could be applied: epochs = 300, batch-size = 2, img-size = 1400, 760, device = ‘0’, workers = 1. The final epoch number was determined after multiple failed attempts at 80, 100, and 200 epochs. The batch size and number of workers were kept low because higher values exceeded the maximum memory of the GPU. However, when training one model for each joint, 200 epochs were sufficient to generate bounding boxes independently. Since the three classes of labels were exported together, we had to extract only the label lines of the relevant class when training one model for one joint. Additionally, YOLOv7 expects class labels to start from 0. Therefore, when training an isolated model to detect Mna joints, we first extracted the label lines that began with class label 1 (Mna is the second class) and then converted the class label from 1 to 0.
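The label extraction for a single-joint model can be sketched as follows. It assumes the usual YOLO text label format, one box per line as `class x_center y_center width height`; the function name and the example coordinates are hypothetical.

```python
def extract_single_class(label_lines, source_class, target_class=0):
    """Keep only the lines for source_class and rewrite their class index."""
    kept = []
    for line in label_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if int(fields[0]) == source_class:
            kept.append(" ".join([str(target_class)] + fields[1:]))
    return kept

# Hypothetical label file contents: one box per class (0=Capnlun, 1=Mna, 2=Radcar).
labels = [
    "0 0.52 0.40 0.10 0.08",
    "1 0.47 0.35 0.12 0.09",
    "2 0.50 0.61 0.30 0.10",
]
mna_only = extract_single_class(labels, source_class=1)
print(mna_only)  # ['0 0.47 0.35 0.12 0.09']
```

Only the leading class index is rewritten; the box coordinates are left untouched.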
Joints Detection Model Testing Result
The confusion matrix in Figure 4 below shows the accuracy of the YOLOv7 wrist detection results, with each box indicating a 100% accuracy for the joint location. For the test Intersection over Union (IoU) threshold, we used the default value of 0.65. As long as the predicted joint area has an IoU of at least 0.65 with the manually labelled area, it is classified as detected. All joints have an accuracy of >=95%, with Capnlun being the lowest, which could be due to differences in the extent of overlap. The F1-score reaches its highest value at a confidence threshold slightly over 0.6 and then drops. To further improve the detection accuracy of a specific joint, we could train separate models for each joint and tune the parameters independently.
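For reference, the IoU criterion and the F1-score can be written out in a short sketch. The box format (x_min, y_min, x_max, y_max), the function names, and the example boxes are hypothetical and are not taken from the experiments.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0

# Hypothetical labelled and predicted boxes, offset by 10 pixels in each direction.
labelled = (100, 100, 200, 200)
predicted = (110, 110, 210, 210)
iou = box_iou(labelled, predicted)
print(round(iou, 3), iou >= 0.65)  # 0.681 True
```

Here the two 100 × 100 boxes share a 90 × 90 region, giving an IoU of 8100 / 11900, which is just above the 0.65 threshold.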
This study provides proof of the validity of applying YOLOv7 to detect wrist joints. To our knowledge, no algorithms have been able to automatically classify RA in the wrist area thus far. While classifying RA in fingers and toes has been proven to be reliable, progress in extending this concept to the wrists has been slow. This is primarily due to the overlapping of carpal bones, which makes wrist joints significantly harder to distinguish. Our study results provide solid evidence that the most advanced computer vision technology can accurately detect wrist joints, which makes an automatic classification algorithm possible. A quick wrist joint detection algorithm will speed up the process of wrist joint scoring and the classification of RA. Even if the final diagnosis must be authorized by a clinician, automated algorithmic wrist detection is valuable in the digital transformation of health care and can help clinicians to understand the severity of the RA and prioritise patients with more serious symptoms. A scientific, systematic approach to wrist joint detection and classification also reduces subjectivity, as even experienced clinicians can have conflicting opinions. Prior research has also indicated that clinicians can be negligent in monitoring RA progression. A categorical score often fails to reflect minor developments in symptoms, which can be effectively captured using automated algorithms.
Unfortunately, due to time constraints, we were only able to complete the wrist joint detection. The advancements in object detection algorithms have given us the opportunity to explore a new frontier in detecting wrist joints for the classification of rheumatoid arthritis. Initial experiments in wrist joint detection have shown significant improvements, bringing automatic detection closer than ever before. We will continue to build on existing experimental results and develop algorithms that can predict scores in targeted areas. Our first goal is to train the model to differentiate healthy joints from unhealthy ones. We define healthy joints as those with a score of 0, and unhealthy ones as those with a score greater than 0. As shown in the data exploration steps, the distribution of healthy vs. unhealthy joints is fairly equal and less prone to underfitting. Once we achieve optimal results, we will build on this foundation and begin training models to make precise predictions on individual scores. One major difficulty we foresee is the imbalance in the distribution of scores. Among all the joints, approximately 80% are healthy, and in some cases, the proportion of healthy joints can be as high as 85% (cmc3, cmc4, cmc5, radius, ulna). The lack of unhealthy joint examples makes it difficult to produce accurate predictions. One proposal is to use techniques such as oversampling (randomly increasing samples in minority classes), under-sampling (randomly decreasing the number of samples in majority classes during training), and ordinal class encoding (making classes with lower number labels subsets of higher order classes).
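Two of the proposed techniques, ordinal class encoding and random oversampling, can be illustrated with a minimal sketch. The function names, the number of score classes, and the example data are hypothetical; this is not an implementation used in the study.

```python
import random

def ordinal_encode(score, num_classes):
    """Encode score k as num_classes - 1 binary targets: target j answers 'score > j?'."""
    return [1 if score > j else 0 for j in range(num_classes - 1)]

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    largest = max(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(largest - len(group))]
        balanced.extend((sample, label) for sample in group + extra)
    return balanced

# Assuming five score classes (0 to 4), a score of 2 is encoded as follows.
print(ordinal_encode(2, num_classes=5))  # [1, 1, 0, 0]

# Hypothetical imbalanced set: 8 healthy (label 0) and 2 unhealthy (label 1) samples.
balanced = random_oversample(list(range(10)), [0] * 8 + [1] * 2)
print(len(balanced))  # 16
```

With the ordinal encoding, the targets of a lower score are always a subset of those of a higher score, so a prediction that is off by one score is penalized less than one that is off by several.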
The authors declare that they have no known competing financial interests or personal relationships that could have influenced this paper.
The datasets described and used in the paper were contributed by the University of Alabama at Birmingham. These were obtained through the RA2-DREAM Challenge.