Aron Hernandez Trinidad1*, Rafael Guzman Cabrera2, Blanca O Murillo Ortiz3 and Teodoro Cordova Fraga1
Received: November 02, 2024; Published: November 08, 2024
*Corresponding author: Aron Hernandez-Trinidad, Physical & Engineering Department-DCI, University of Guanajuato campus Leon, Loma del Bosque 103, Lomas del Campestre, Leon, 37150, GTO, Mexico
DOI: 10.26717/BJSTR.2024.59.009298
The diagnosis of pneumonia from X-rays is a widespread practice for early detection of the disease and initiation of timely treatment. Automatic classification models using convolutional neural networks (CNNs) have proven to be an effective and accurate aid in diagnosing this disease. In this work, a model that detects pneumonia in a set of chest X-rays is proposed, using two CNNs: ResNet50 and VGG16. The method consists of preprocessing and classifying the set of radiographs available in the Kaggle repository, which contains 2,224 records classified by an expert and segmented into two divisions: a training set (1,600 images: 800 normal and 800 pneumonia) and a test set (624 images: 234 normal and 390 pneumonia). The model achieves 97% accuracy when classifying the set of chest images into two categories: normal and pneumonia. The performance of the automatic classification scenario is robust and proves effective for classifying chest X-rays to detect pneumonia.
Keywords: Pneumonia Diagnosis; Convolutional Neural Networks; Automatic Classification Model; Chest X-rays
Abbreviations: CNNs: Convolutional Neural Networks; HOG: Histograms of Oriented Gradients; SVM: Support Vector Machines; VGG: Visual Geometry Group; TP: True Positives; FN: False Negatives; FP: False Positives; TN: True Negatives; OCT: Optical Coherence Tomography
Convolutional Neural Networks (CNNs) are a type of deep learning algorithm used to analyze images. These networks are based on machine learning, where a computer learns to recognize patterns in the input data, in this case chest X-ray images [1]. In pneumonia detection, CNNs can analyze specific image features that indicate the presence of disease, such as lung opacity, lung tissue consolidation, and inflammation [2]. In other words, CNNs are trained to identify specific patterns in images, such as edges, shapes, and textures [3]. In doing so, the network learns to extract the relevant characteristics of interest from the image. The accuracy of CNNs in detecting pneumonia on chest X-rays has been evaluated in several studies, which show that these networks represent a solid auxiliary tool in the diagnosis of the disease [4-6]. They can analyze large numbers of images quickly and automatically, helping clinicians make informed decisions and improving diagnostic efficiency. Although CNNs are powerful tools for image classification, they may still face some difficulties, including [7,8]:
a) Variability: Images can vary in size, quality, lighting, orientation, and other factors, which can make accurate classification difficult.
b) Overlapping Classes: In some cases, there can be similarities between different classes, which can make it difficult to differentiate them.
c) Classification System: CNNs contain an integrated classification section that allows them to be evaluated; however, because it is integrated, it is difficult to modify its parameters or to replace it with other, more advanced classification algorithms.
In this paper, a model is proposed that addresses these challenges to generate a scenario for the automatic classification of chest X-rays that identifies pneumonia optimally and accurately [9]. Instead of using CNNs as a complete, closed system, as is done in a wide range of computer vision and machine learning applications, they are used as feature extractors, allowing a model to be built from three blocks: image preprocessing, feature extraction, and a classification system [10]. This makes it possible to take better advantage of each section, without affecting the complete system, and to reach competitive classification values [11]. The main contributions of this work can be summarized as follows:
1. An automatic classification model is proposed to detect pneumonia in a set of chest radiograph images, using CNNs as feature extractors.
2. The images go through preprocessing that standardizes the set of radiographs to the same size (224×224) and applies filters: 2D convolution, low pass, high pass, Gaussian, median, and histograms of oriented gradients (HOG), which enhance the characteristics of lung opacity that suggest the presence of pneumonia.
3. The dataset found in the Kaggle repository contains 2,224 chest X-rays, previously classified by experts and labeled into two classes, normal and pneumonia, which makes it possible to differentiate between them.
4. Two classification scenarios are proposed in the model, training/test set and cross-validation, which make up the classification system with support vector machines (SVM) as the classifier and the evaluation metrics Accuracy, Precision, F1 Score, and Recall.
The rest of this document is organized as follows. Section 2 presents relevant research and lists existing methods and models. Section 3 explains the proposed model in detail, including the image preprocessing, the CNNs used as feature extractors, and the classification system. Section 4 evaluates the proposed model through experiments, discusses the advantages of the method, and reports the results. Section 5 concludes this work and outlines future work.
Detection of pneumonia using artificial intelligence is a constantly evolving area of research, and numerous studies have explored the use of machine and deep learning techniques, using neural networks to improve the accuracy and effectiveness of detecting this disease. P. Rajpurkar, et al. [12] developed a model to detect pneumonia from chest X-rays using a CNN. The authors evaluated the neural network’s performance on 420 radiologist-labeled images, where it scored 95% on its evaluation metric, compared to the radiologists’ 95% accuracy, demonstrating that a deep learning algorithm previously trained on labeled data can achieve high precision in detecting the disease. The authors suggested that CNN models could potentially be used to assist radiologists in clinical settings, improving the speed and accuracy of pneumonia diagnosis. T. Rahman, et al. [4] present four different methodologies using four CNNs through transfer learning. The pre-trained networks analyze a total of 5,247 chest images spanning three labels: normal, viral, and bacterial. The accuracy of their model in classifying the three labels was 93.3%, so the authors propose the model as a useful and rapid study for diagnosing pneumonia across the three corresponding classes. Such a computer-aided diagnostic tool can significantly help the radiologist to acquire more clinically useful images and to identify pneumonia, along with its type, immediately after acquisition. T. B. Chandra, et al. [13] present a method for early and automatic detection of pneumonia in segmented lungs using machine learning.
The authors focus on the pixels of the region of interest that contribute to the extraction of relevant features from the confined area. The proposed model is examined with five classifiers on a dataset of 412 chest X-ray images divided into 206 normal cases and 206 with pneumonia, thus representing a binary classification model. The experimental results show that the method proposed by the authors achieves a significant precision of 95.63% with the best classifier, thereby surpassing existing conventional methods. These studies demonstrate the efficacy of CNNs and machine learning in detecting pneumonia on chest radiographs and their potential application in clinical settings.
A model consisting of three sections is proposed: preprocessing, feature vector, and classification system. Figure 1 illustrates this model. As can be seen, the dataset is previously labeled into two classes, normal and pneumonia; the size, quality, and lighting of the set are then preprocessed using machine learning techniques. Once the first section is completed, the images enter two convolutional neural networks pre-trained using transfer learning, ResNet50 and VGG16, both well known for classifying chest images. However, the CNNs in the model work as extractors that obtain the feature vector from their convolutional bases, which represents an essential part of the method. Once the feature vector of the set of radiographs has been obtained, it is finally passed to the binary classification system to evaluate the method and obtain its performance.
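A compact sketch of how these three blocks could be chained is shown below, assuming a Keras convolutional base and a scikit-learn SVM; the function names, filter choice, and parameters are illustrative and do not reproduce the authors' exact implementation.

```python
# Sketch of the three-block model: preprocessing -> CNN feature extractor -> SVM.
# Function names and parameters are illustrative, not the authors' exact implementation.
import cv2
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.svm import SVC

def preprocess(path):
    """Block 1: load a radiograph, resize to 224x224, apply a low-pass filter."""
    img = cv2.imread(path)                                   # 3-channel BGR image
    img = cv2.resize(img, (224, 224))
    return cv2.GaussianBlur(img, (5, 5), 0).astype("float32")

# Block 2: pre-trained convolutional base used only to produce feature vectors.
base = VGG16(weights="imagenet", include_top=False, pooling="avg")

def feature_vectors(paths):
    batch = preprocess_input(np.stack([preprocess(p) for p in paths]))
    return base.predict(batch, verbose=0)                    # shape (n, 512)

# Block 3: binary SVM classifier on the feature vectors (paths/labels are placeholders).
# clf = SVC(kernel="rbf").fit(feature_vectors(train_paths), train_labels)
# predictions = clf.predict(feature_vectors(test_paths))
```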
Dataset Preparation
The chest X-ray images used in this article were obtained from the Kaggle repository, a modified version of the dataset published by Paul Mooney [14], containing 2,224 patient cases. The division of the dataset can be seen in Figure 2.
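As a hedged illustration, the set could be inventoried as follows, assuming the usual chest_xray/{train,test}/{NORMAL,PNEUMONIA} folder layout of the Mooney repository; the root path is a placeholder.

```python
# Count images per split and class, assuming the chest_xray/{train,test}/{NORMAL,PNEUMONIA}
# folder layout of the Kaggle dataset; the root path below is a placeholder.
from pathlib import Path

root = Path("chest_xray")                      # adjust to the local download location
for split in ("train", "test"):
    for label in ("NORMAL", "PNEUMONIA"):
        folder = root / split / label
        n = sum(1 for _ in folder.glob("*.jpeg"))
        print(f"{split}/{label}: {n} images")
```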
Preprocessing
The performance of the neural architectures is affected by the quality of the input images, so six filters are applied to the original images to improve the characteristics of the set. In addition, ResNet50 and VGG16 are pre-trained networks with an input dimension of 224×224, so the set of images is resized to that size. Figure 3 shows the corresponding preprocessing.
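A minimal sketch of this preprocessing stage, using OpenCV and scikit-image, is given below; the kernel sizes and HOG parameters are illustrative choices rather than values reported here.

```python
# Sketch of the preprocessing stage: resize to 224x224 and apply the filters named above.
# Kernel sizes and HOG parameters are illustrative, not values reported in the paper.
import cv2
import numpy as np
from skimage.feature import hog

def preprocess_variants(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.resize(gray, (224, 224))                      # match the CNN input size

    box = np.ones((3, 3), np.float32) / 9.0                  # averaging kernel
    variants = {
        "conv2d":    cv2.filter2D(gray, -1, box),            # generic 2-D convolution
        "low_pass":  cv2.GaussianBlur(gray, (5, 5), 0),      # smooths noise
        "high_pass": cv2.subtract(gray, cv2.GaussianBlur(gray, (5, 5), 0)),  # edges
        "gaussian":  cv2.GaussianBlur(gray, (3, 3), 1.0),
        "median":    cv2.medianBlur(gray, 5),
    }
    # HOG returns a feature vector plus an optional visualisation image.
    hog_vec, hog_img = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                           cells_per_block=(2, 2), visualize=True)
    variants["hog"] = hog_img
    return variants
```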
Convolutional Neural Networks
ResNet50: ResNet50 is a deep convolutional neural network model used for object recognition and image classification tasks. It was developed by Microsoft researchers in 2015 and has 50 layers [15]. The name “ResNet” refers to the residual connections used in its architecture (Figure 4). These connections allow data to move directly from one layer to another, avoiding vanishing gradient problems and improving the ability of the network to learn and generalize from training data. ResNet50 is widely used in machine vision applications and has demonstrated high performance in image classification tasks on large and complex datasets [16,17]. The notation (k×k, n) in each convolutional (conv) block denotes the size k of the square kernel and the number of channels n; the number below each stage indicates how many times the unit is repeated; the penultimate layer, FC 1000, is the fully connected layer with one thousand neurons. Finally, 1×n represents the feature vector.
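To illustrate the residual (skip) connection described above, the following is a minimal identity block written with the Keras functional API; the filter count and kernel size are illustrative and do not reproduce the exact ResNet50 bottleneck configuration.

```python
# Minimal illustration of a residual (identity) block: the input skips over two
# convolutions and is added back before the final activation. The filter count and
# kernel size are illustrative, not the exact ResNet50 bottleneck configuration.
from tensorflow.keras import layers, Input, Model

def identity_block(x, filters=64, kernel_size=3):
    shortcut = x                                             # the skip connection
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                          # residual addition
    return layers.ReLU()(y)

inputs = Input(shape=(224, 224, 64))
outputs = identity_block(inputs)
Model(inputs, outputs).summary()
```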
VGG16: VGG16 is a convolutional neural network model used for image classification tasks. It was developed by researchers at the Visual Geometry Group (VGG) at the University of Oxford in 2014 and has 16 layers [18]. The VGG16 architecture is characterized by very deep stacks of convolutional layers with small 3×3 convolution filters, which allows better feature extraction in high-resolution images (Figure 5). The model is trained on large and complex datasets, such as ImageNet, and has shown high performance in image classification tasks [19]. It is widely used in machine vision applications and has served as the basis for the development of other convolutional neural network models [20]. The input of the first convolution layer (Conv2D) is an image of size 224×224. For all convolution layers, the convolution kernel is 3×3 in size. The layers are accompanied by max-pooling layers (MaxPool) that reduce the size of the feature maps during training. At the output of the convolution and pooling layers, there are three fully connected layers: Dense, Dense, and a softmax layer that determines the image class.
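Both pre-trained convolutional bases can be used as feature extractors by removing their classification heads, as sketched below; with global average pooling, ResNet50 yields a 2048-dimensional vector per image and VGG16 a 512-dimensional one. The ImageNet weights and the random test batch are assumptions made purely for illustration.

```python
# Using the pre-trained convolutional bases as feature extractors (classification
# heads removed). With global average pooling, ResNet50 yields a 2048-dimensional
# vector per image and VGG16 a 512-dimensional one.
import numpy as np
from tensorflow.keras.applications import ResNet50, VGG16
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_prep
from tensorflow.keras.applications.vgg16 import preprocess_input as vgg_prep

bases = {
    "resnet50": (ResNet50(weights="imagenet", include_top=False, pooling="avg",
                          input_shape=(224, 224, 3)), resnet_prep),
    "vgg16":    (VGG16(weights="imagenet", include_top=False, pooling="avg",
                       input_shape=(224, 224, 3)), vgg_prep),
}

def extract_features(images, name="vgg16"):
    """images: array of shape (n, 224, 224, 3) with pixel values in [0, 255]."""
    model, prep = bases[name]
    return model.predict(prep(images.astype("float32")), verbose=0)

# Example with random data just to show the output shapes.
dummy = np.random.randint(0, 256, size=(2, 224, 224, 3)).astype("float32")
print(extract_features(dummy, "resnet50").shape)   # (2, 2048)
print(extract_features(dummy, "vgg16").shape)      # (2, 512)
```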
Cross-validation and the train/test set are two common techniques for evaluating the performance of a machine learning model. Both methods are used to assess a model’s ability to generalize to unseen data, but they differ in the way the data are partitioned for evaluation [21]. Cross-validation involves dividing the dataset into k folds. The model is trained k times, each time using k − 1 folds for training and the remaining fold for evaluation. Performance scores are calculated for each fold and averaged to obtain an overall measure of model performance. Cross-validation helps to reduce the variability of the model evaluation and to obtain a more accurate estimate of model performance. The train/test set, on the other hand, involves dividing the dataset into two exclusive sets: a training set and a test set. The model is trained on the training set and evaluated on the test set. The performance score is calculated based on the model’s ability to generalize to the unseen data in the test set. In general, cross-validation is used when the dataset is limited and a more accurate estimate of model performance is wanted, while a train/test set is a faster and simpler technique used when the dataset is large and a rapid evaluation of the model is desired.
From the 2,224 images in the X-ray set, the training and testing scenario was divided into 80% and 20%, respectively. For the case of cross-validation, k = 20 folds were applied over the entire dataset. The machine learning algorithm Support Vector Machines (SVM) was used for the binary classification of the ensemble. The SVM classifier has shown favorable and relevant results in separating two data classes using a hyperplane in a high-dimensional feature space [22,23].
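A minimal sketch of the two evaluation scenarios with scikit-learn follows, assuming X holds the CNN feature vectors and y the binary labels; the fold count k = 20 and the 80/20 split follow the text, while the SVM kernel and the placeholder data are illustrative.

```python
# Both evaluation scenarios on the CNN feature vectors: an 80/20 train/test split
# and 20-fold cross-validation, with an SVM as the classifier. X and y are assumed
# to hold the extracted feature vectors and binary labels (0 = normal, 1 = pneumonia).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2224, 512))             # placeholder for the real feature vectors
y = rng.integers(0, 2, size=2224)            # placeholder labels

# Scenario 1: 80/20 train/test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("train/test accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Scenario 2: k = 20 cross-validation over the whole set.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=20, scoring="accuracy")
print("20-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```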
Confusion Matrix and Evaluation Metrics
A confusion matrix is an evaluation tool used in machine learning and statistics to assess the accuracy of a classification model. The confusion matrix shows the number of times the test samples are classified correctly and incorrectly, relative to the true class labels. The confusion matrix is a 2×2 table showing four possible outcomes of the classification [24]:
• True Positives (TP): The model correctly classified the positive samples.
• False Negatives (FN): The model incorrectly classified the positive samples.
• False Positives (FP): The model incorrectly classified the negative samples.
• True Negatives (TN): The model correctly classified the negative samples.
From these results, the model evaluation metrics are calculated: accuracy, sensitivity, specificity, and the F1 score [25]. Accuracy is defined as the number of correct predictions divided by the total number of predictions made. Sensitivity (also known as the true positive rate) is defined as the number of true positives divided by the total number of true positives and false negatives. Specificity (also known as the true negative rate) is defined as the number of true negatives divided by the total number of true negatives and false positives. The F1 score combines precision and sensitivity into a single value. The confusion matrix is an especially useful tool for evaluating the quality of the classification model and provides valuable information about its ability to correctly classify the test samples into the true classes.
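For reference, the standard definitions of these metrics in terms of the confusion-matrix counts are given below (precision is included because the F1 score combines it with sensitivity/recall):

```latex
\begin{align*}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity (Recall)} &= \frac{TP}{TP + FN} \qquad
\text{Specificity} = \frac{TN}{TN + FP} \qquad
\text{Precision} = \frac{TP}{TP + FP} \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}
\end{align*}
```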
The proposed model obtained optimal values that are competitive with the state of the art. Image preprocessing acquires particular relevance in generating an efficient and accurate evaluation, classifying the two classes with 97% accuracy in the best classification scenario. From the results in the previous tables, in the training/test set technique the model improves its evaluation on all filters with the VGG16 convolutional base compared with its counterpart ResNet50. In the cross-validation technique it is the other way around: ResNet50 slightly outperforms VGG16 on most filters. However, it is observed that applying the low-pass filter to the dataset yields the highest values in both techniques and architectures. The model generates its best evaluation with 95% accuracy in the training/test set and 97% in the same metric in cross-validation with the VGG16 convolutional base. Although both architectures are popular in computer vision and are used for similar tasks with differing architectural depths, the VGG16 network performs slightly better at classifying and detecting pneumonia in a set of chest radiographs with two classes, normal and pneumonia, when preprocessing is applied to the images and SVM is used in the classification system. The results show a high reliability index for the proposed model.
The evaluation of the automatic pneumonia classification model on a set of chest X-rays was optimal, precise, competitive, and efficient, achieving 95% accuracy in the training/test set classification scenario with preprocessing and a low-pass filter, and 97% under the same conditions in the cross-validation scenario, with the VGG16 convolutional neural network. The choice of a deep learning model for a specific task depends on several factors, including the size and complexity of the dataset, the availability of computational resources, the experience of the developer, and the model’s performance on different metrics. In the specific case of pneumonia classification on chest radiographs, it has been shown that VGG16 may perform better than ResNet50, despite being a shallower model. This is partly because VGG16 was designed specifically for image classification, while ResNet50 was originally designed for more general object detection and classification tasks. Furthermore, VGG16 has a simpler and more uniform architecture than ResNet50, which can make it easier to train and optimize. It is also possible that the dataset used to train the models is not large or complex enough to take full advantage of the additional depth of ResNet50. In conclusion, the choice of a deep learning model depends on several factors, and it is not always necessary to use the deepest model to achieve the best performance on a specific task. It is important to evaluate different models and select the one that best suits the specific needs of the project.
The authors thank the University of Guanajuato for the partial support in the development of this work under the project DAIP/2023-59023.
The authors declare that they have no conflict of interest that could influence the results, or the interpretation of the findings presented in this article. Furthermore, we confirm that the work presented here is original and has not been previously published elsewhere.
The data that support the findings of this study are openly available in Kaggle at https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia, from: Kermany, Daniel; Zhang, Kang; Goldbaum, Michael (2018), “Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification”, Mendeley Data, V2, doi: 10.17632/rscbjbr9sj.2
