Author(s): Cristian Rodríguez

Originally published on Towards AI.

Photo by National Cancer Institute on Unsplash

This article delves into medical image analysis, specifically the classification of brain tumors. It introduces a novel approach that combines the power of stacking ensemble machine learning with sophisticated image feature extraction techniques. Through comparative evaluations, insights are provided into the effectiveness and potential applications of the proposed approach in medical imaging and diagnosis.

Introduction

A brain tumor, also known as an intracranial tumor, is an abnormal tissue mass in which cells grow and multiply uncontrollably, seemingly unchecked by the mechanisms that control normal cells. To date, more than 150 types of brain tumors have been detected; however, they can be grouped into two main categories: primary and metastatic [1]. The incidence of brain tumors has been increasing across all ages in recent decades. Metastatic brain tumors affect nearly one in four patients with cancer, or an estimated 150,000 people a year.

There are various techniques used to obtain information about tumors. Magnetic resonance imaging (MRI) is the most widely used method, producing many 2D images. Manual detection and classification of brain tumors is costly in both effort and time. Therefore, it is worthwhile to develop an automatic detection and classification procedure to obtain an early diagnosis and thus a faster treatment response, improving patients' survival rate [2].

Stacking Ensemble Method

An ensemble method is a machine learning technique that combines several base models to produce one optimal predictive model. By combining the output of different models, ensemble modeling helps to build a consensus on the meaning of the data. In the case of classification, multiple models are consolidated into a single prediction using a frequency-based voting system.
Ensemble models can be generated using a single algorithm with numerous variations, known as a homogeneous ensemble, or by using different algorithms, known as a heterogeneous ensemble [3]. As shown in Figure 1, the stacking method aims to train several different weak learners and combine them by training a meta-model that outputs predictions based on the multiple predictions returned by these weak models [4].

Figure 1. Stacking Model Representation Diagram. [4]

Dataset

The dataset comes from Kaggle [5] and contains 3206 brain MRI images. The images are separated into four categories: no tumor, glioma tumor, meningioma tumor, and pituitary tumor. Figure 2 shows a sample image for each category.

Figure 2. Sample Images for Each Category. [Image by Author]

Image Feature Extraction

Image preprocessing was necessary to obtain the final dataset used to train the models. Machines store images as a matrix of numbers, whose size depends on the number of pixels in the image. The pixel values denote intensity or brightness: smaller numbers represent black, and larger numbers represent white. For grayscale images, as in this case, the matrices are two-dimensional.

After obtaining the pixel matrices, five first-order and seven second-order features were extracted from each image. The first-order features are fundamental statistics computed on the pixel matrices:

- Mean: the average value of the pixel matrix.
- Variance: measures the average degree to which each pixel differs from the mean.
- Standard Deviation: measures how spread out the pixel values are from the mean.
- Skewness: measures the lack of symmetry of the intensity distribution.
- Kurtosis: defines how heavily the tails of a distribution differ from the tails of a normal distribution.

The grey-level co-occurrence matrix (GLCM) was used to obtain the second-order features.
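As an illustration, the five first-order statistics can be computed directly from a pixel matrix with NumPy and SciPy. This is a minimal sketch rather than the article's original code, and the synthetic 8 x 8 matrix merely stands in for a real MRI slice:

```python
import numpy as np
from scipy import stats

def first_order_features(image: np.ndarray) -> dict:
    """Compute the five first-order statistics of a 2-D pixel matrix."""
    pixels = image.ravel().astype(float)
    return {
        "mean": pixels.mean(),               # average intensity
        "variance": pixels.var(),            # spread around the mean
        "std": pixels.std(),                 # square root of the variance
        "skewness": stats.skew(pixels),      # asymmetry of the histogram
        "kurtosis": stats.kurtosis(pixels),  # tail weight vs. a normal curve
    }

# Tiny synthetic "image" standing in for a grayscale MRI slice.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8))
feats = first_order_features(img)
```

Applied to each image in the dataset, this yields the first five of the twelve feature columns used later to train the models.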
GLCM is a matrix representing the relative frequencies of a pair of grey levels present at a certain distance apart and at a particular angle. In this case, a distance of one pixel and angles of 0°, 45°, 90°, and 135° were used. Figure 3 shows how the GLCM is determined.

Figure 3. GLCM Calculation Example. [6]

The second-order features obtained from the greycomatrix are the following:

- Contrast: represents the difference in luminance across the image.
- Entropy: the measure of randomness.
- Dissimilarity: a numerical measure of how different two data objects are.
- Homogeneity: expresses how similar some aspects of the image are.
- ASM: a measure of the textural uniformity of an image.
- Energy: the rate of change in the brightness of the pixels over local areas.
- Correlation: gives information about how correlated a pixel is to its neighboring pixels.

Figure 4. General Overview of the Image Feature Extraction. [Image by Author]

Model Proposal

As mentioned, stacking runs multiple models simultaneously on the data and combines their results to produce a final model, as schematically illustrated in Figure 5.

Figure 5. Stacking Model Implementation Example. [7]

The general idea of how the model works is as follows [7]:

1. The initial training data has 2565 observations and 12 features.
2. Three different weak learner models are trained on the training data.
3. Each weak learner provides predictions for the outcome, which are then cast into second-level training data, now 2565 x 3.
4. A meta-model is trained on this second-level training data to produce the final predictions.

The three weak learner models used for this implementation were k-nearest neighbors, decision trees, and naive Bayes. For the meta-model, k-nearest neighbors was used again.

K-Nearest Neighbors

The KNN algorithm assumes that similar things exist in close proximity, so it classifies new data points based on their position relative to nearby data points.
In Figure 6, the data points have been classified into two classes, and a new data point with an unknown class is added to the plot. Using the KNN algorithm, the category of the new data point can be predicted from its position relative to the existing data points. For example, if k is set to 3, the three nearest data points comprise two of class B and one of class A, so the prediction for the new data point will be class B. On the other hand, if k is set to 6, the prediction will be class A. The chosen number of neighbors […]
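The stacking pipeline described in the model proposal — KNN, decision-tree, and naive-Bayes weak learners combined by a KNN meta-model — can be sketched with scikit-learn's `StackingClassifier`. The synthetic data below merely stands in for the real 2565 x 12 feature matrix, and the exact hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 12 features, 4 classes (no tumor + three tumor types).
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    final_estimator=KNeighborsClassifier(n_neighbors=5),
    cv=5,                    # out-of-fold predictions build the level-2 data
    stack_method="predict",  # class labels, so level-2 data is n_samples x 3
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

Setting `stack_method="predict"` matches the description above, where each weak learner contributes one column of class predictions to the second-level training data; the default (`"auto"`) would instead stack the per-class probabilities.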