Brain Tumor Classification and Detection from MRI Images Using a CNN Based on the ResUNet Architecture

Sanyukta Suman
15 min read · Jan 27, 2021

A brain tumor is a serious disease, and the medical treatment process depends mainly on the tumor's type and location. The final diagnosis by neurospecialists and radiologists depends chiefly on the evaluation of Magnetic Resonance Imaging (MRI) images. Manual evaluation is time-consuming and requires domain expertise to avoid human error. To overcome this, a Convolutional Neural Network (CNN) deep learning model based on the ResUNet architecture is proposed for detecting tumors and marking the area in which they occur. This architecture offers several advantages for segmentation tasks. First, residual units help when training deep architectures. Second, feature accumulation with residual convolutional layers ensures better feature representation for segmentation. Automatic brain tumor classification remains very challenging because of the large spatial and structural variability of the region surrounding a brain tumor. The proposed method achieves 96% accuracy on the test data.

For the implementation of this project, visit the GitHub Repo.

Problem Definition
Given a dataset of known brain tumor diagnoses, it is possible to develop and implement an image-based classifier that autonomously detects brain tumors and other anomalies.

Motivation and Objective
The motivation for the proposed application is to aid neurosurgeons and radiologists in detecting brain tumors in an inexpensive and non-invasive manner. The objectives of the proposed project are as follows:

  • to model a classifier that detects MRI brain images with and without tumors, with the help of their corresponding mask images;
  • to perform segmentation on MRI images to separate the tumor from the normal brain tissue.

System Architecture
The system architecture of the proposed system is illustrated in Figure 1.1. It can be divided into three steps: classification, segmentation and prediction.

Figure 1.1: System Architecture of Proposed System

Classification
In the classification step, a Convolutional Neural Network (CNN) model based on the ResNet50 architecture is used to classify the MRI brain scans into two classes: tumor and non-tumor. The algorithm terminates for images classified into the non-tumor class, while images classified into the tumor class are forwarded to the next step of the architecture.

Segmentation
Segmentation is a process that partitions an image into regions, which allows the separation of objects and textures in images. In this study, a CNN based on the ResUNet architecture is used for image segmentation. The segmentation process returns a mask that localizes the detected tumor in the input image.

Prediction
The prediction step of the algorithm either returns an empty mask in the case of no tumor, or it returns a predicted mask based on the segmentation algorithm for images with a tumor.

2. Description of Data

MRI images have been taken from The Cancer Imaging Archive (TCIA) [https://wiki.cancerimagingarchive.net/display/Public/TCGA-LGG], along with the corresponding manual Fluid-Attenuated Inversion Recovery (FLAIR) segmentation masks. The FLAIR sequence is a robust method that produces high-quality digital tumor responses (Rucco, Viticchi and Falsetti, 2020). Images of 109 patients are included in The Cancer Genome Atlas (TCGA) LGG collection [https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga], each with an associated FLAIR sequence mask and genomic cluster data. There are over 20 pairs of images and corresponding masks for each patient. The dataset is open source and available free of charge [https://www.kaggle.com/mateuszbuda/lgg-mri-segmentation#TCGA_CS_4943_20000902_11_mask.tif]. Each MRI image is a three-channel RGB image with dimensions 256×256×3. The masks are binary images with pixel values in the range [0, 1] (Yang et al., 2021). Figure 1.2 illustrates a sample tuple from the dataset: a raw MRI scan on the left, and the corresponding mask revealing the tumor tissue on the right.

Figure 1.2: Brain tumor MRI image and corresponding mask from a patient.
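As a quick sanity check, the shape and value range of one (scan, mask) pair can be inspected directly. The sketch below assumes the Kaggle archive has been extracted locally; the file names shown are hypothetical placeholders.

```python
# Minimal inspection of one (scan, mask) pair; the paths are
# placeholders for wherever the Kaggle archive is extracted.
import numpy as np
from PIL import Image

scan = np.array(Image.open("lgg-mri-segmentation/TCGA_CS_4941_19960909/TCGA_CS_4941_19960909_12.tif"))
mask = np.array(Image.open("lgg-mri-segmentation/TCGA_CS_4941_19960909/TCGA_CS_4941_19960909_12_mask.tif"))

print(scan.shape)       # expected: (256, 256, 3) -- three-channel RGB scan
print(np.unique(mask))  # expected: two values, background vs. tumor pixels
```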

We have 3929 tuples of data in the form (MRI_scan, MRI_mask). These scans come from 109 patients. Among the 3929 tuples, 1373 contain tumor tissue and 2556 do not. Figure 1.3 shows the count of scan images with (1) and without (0) tumor tissue.

Figure 1.3: Distribution of brain with and without tumor

3. Methodology

Data Preprocessing

Preprocessing is the term used for all the steps taken to enhance the dataset in preparation for analysis (Bakas et al., 2017). The images in the dataset have been skull-stripped and co-registered. The tumor segmentation labels were produced by an automated hybrid generative method and then manually corrected by a board-certified neuroradiologist. The final images carry rich imaging features, including intensity, volumetric, morphological, histogram-based and textural parameters. The brain data is measured in voxels, which are analogous to the pixels used to display images on a computer screen.

Allocating Train, Test and Validation Sets

The dataset is divided into three subsets: a training set, a validation set and a test set. The training set is used to train the model (Shah, 2017). The validation set is used to fine-tune the parameters of the model deduced from the training set (Shah, 2017); the validation step therefore indirectly affects the generated model. The test set is the sample of data used to provide an unbiased evaluation of the generated model (Brownlee, 2017). Following convention, 15% of the dataset is randomly allocated to the test set, 10% of the remaining data is allocated to the validation set, and the remainder is used for training the model.

Figure 2.1: Part of validation set shown in the table form
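A minimal sketch of this split is shown below, assuming the (scan, mask) file paths have first been collected into a pandas DataFrame named `df` (a hypothetical name, not the project's exact code).

```python
from sklearn.model_selection import train_test_split

# `df` is assumed to hold one row per (scan, mask) pair,
# e.g. with "image_path" and "mask_path" columns.
train_val_df, test_df = train_test_split(df, test_size=0.15, random_state=42)       # 15% test set
train_df, val_df = train_test_split(train_val_df, test_size=0.10, random_state=42)  # 10% of the remainder
```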

Data Augmentation

Image data augmentation generates new training samples from the original dataset. Random alterations of the images are used while generating the model, whilst maintaining the class labels of the data (Gu, Pednekar and Slater, 2019). Common methods to introduce feature variation are as follows:

  • flip horizontally or vertically
  • rotate by a fixed amount
  • shuffle and transform

The goal of data augmentation is to increase the generalizability of the model, so it is applied only in the training phase; the test set must remain untouched to provide an unbiased evaluation of the generated model. In Keras, augmentation can be implemented with the ImageDataGenerator class (Gulli and Pal, 2017): it is configured with a set of random transformations and then, at training time, transforms each batch of images it is given before returning the augmented images to the training loop.
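A minimal augmentation sketch along these lines is shown below; the transform ranges are illustrative rather than the project's exact settings, and `x_train`/`y_train` are assumed stand-ins for the training images and labels.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rotation_range=15,      # rotate by a small fixed amount
    horizontal_flip=True,   # flip horizontally
    vertical_flip=True,     # flip vertically
    rescale=1.0 / 255.0,    # normalize pixel values to [0, 1]
)

# flow() yields batches of randomly transformed images at training time;
# class labels are passed through unchanged.
train_generator = train_datagen.flow(x_train, y_train, batch_size=32)
```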

Classification

Machine learning is applied in two different ways: classification and regression. Classification is used for data in the discrete domain, whereas regression is applied to data in the continuous domain. In this investigation, we want to generate a model that predicts whether an image contains a tumor or not. This translates to a binary output, which is a problem in the discrete domain, so classification is the appropriate approach.

Convolutional Neural Network (CNN)

In this investigation, a Convolutional Neural Network model is used to classify the MRI brain scans, revealing whether they contain a tumor or not. CNN models are commonly used in object recognition applications (O'Shea and Nash, 2015). CNNs are comprised of three types of layers: convolutional layers, pooling layers, and fully-connected layers. When these layers are stacked, a CNN architecture is formed. A simplified CNN architecture for classification is illustrated in Figure 2.2.

Figure 2.2: A simple CNN architecture and its constituent layers (O'Shea and Nash, 2015).

The basic functionality of a CNN can be broken down into four key areas:

  • The input layer holds the pixel values of the MRI scan of the brain.
  • The convolution layers determine whether the brain contains a tumor. These layers are connected to local regions of the input through the calculation of the scalar product between their weights and the region of the input volume they are connected to. A rectified linear unit (ReLU) then applies an element-wise, non-linear activation function to the output of the previous layer.
  • The pooling layer performs down-sampling along the spatial dimensionality of the given input, reducing the number of parameters within that activation.
  • The fully connected layers produce class scores from the activations, to be used for classification.

Through this sequence of transformations, CNNs are able to transform the original input layer by layer, using convolution and down-sampling, to generate class scores for the classification of MRI scans of the brain.
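To make the four stages concrete, here is a toy Keras classifier following the same input, convolution, pooling and fully-connected structure; the layer sizes are illustrative assumptions, not the architecture used in this project.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),             # input layer: raw pixel values
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + ReLU activation
    layers.MaxPooling2D((2, 2)),                   # pooling: spatial down-sampling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),           # fully connected layer
    layers.Dense(1, activation="sigmoid"),         # class score: tumor vs. non-tumor
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```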

ResNet50

Instead of building a new network from scratch, the ResNet50 model is used as the base for training. The residual network is a deep convolutional neural network model introduced by Microsoft in 2015 (Szegedy et al., 2015). ResNet50 is used as the base because it is desirable to benefit from pre-trained networks, and because the ResNet50 model has achieved strong results on biomedical data. Furthermore, rather than learning features directly from the layers, a residual network learns from the residuals, i.e. the result of subtracting the features learned from the inputs of the layers (H. A. Khan et al., 2020). The architecture of ResNet50 can be seen in Figure 2.3.

Figure 2.3: ResNet50 Model Architecture.

The residual network used in the project consists of six types of layers: input, padding, convolution, batch normalization, activation (ReLU) and max pooling. On top of these, additional layers are added to the model, i.e. average pooling, flatten, dense and dropout. This is done to increase the total number of parameters and, consequently, the number of trainable parameters. The result after performing ResNet50 and adding the additional layers is illustrated in Figure 2.4.

Figure 2.4: Result after performing ResNet50.
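A hedged sketch of this transfer-learning setup is given below: a frozen ResNet50 base followed by the additional head layers named above. The head sizes and dropout rate are assumptions, not the project's exact values.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Pre-trained residual base; include_top=False drops ImageNet's classifier head.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
base.trainable = False  # reuse the pre-trained features as-is

model = models.Sequential([
    base,
    layers.AveragePooling2D(pool_size=(4, 4)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(2, activation="softmax"),  # tumor vs. non-tumor class scores
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```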

Segmentation using ResUNet to localize the tumor

In their paper, Ronneberger, Fischer and Brox (2015) state that the typical use of convolutional networks is the classification task, where the output for an image is a single class label. However, in many biomedical image processing settings, the desired output includes localization, i.e. a class label should be assigned to each pixel. Segmentation is a process that partitions an image into regions, which allows the separation of objects and textures in images. U-Net is a successful architecture for pixel-wise segmentation; it takes its name from its shape, which, when visualized, resembles the letter 'U', as shown in Figure 2.6 (Ronneberger, Fischer and Brox, 2015). ResUNet consists of two parts: a contraction path and an expansion path. The contraction path consists of several contraction blocks; each block passes its input through res-blocks followed by 2×2 max pooling. The number of feature maps doubles after each block, which helps the model learn complex features effectively. The most significant aspect of this architecture lies in the expansion (decoder) section. Each block takes the up-sampled input from the previous layer and concatenates it with the corresponding output features from the res-blocks in the contraction path. This ensures that the features learned while contracting are reused while reconstructing the image. In the last layer of the expansion path, the output from the res-block is passed through a 1×1 convolution layer to produce a mask of the same size as the input.

Figure 2.7: ResUNet Architecture.
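As a compact illustration of these building blocks, here is a hedged Keras sketch of a shallow ResUNet: residual conv blocks, 2×2 max pooling on the contraction path, and upsample-plus-concatenate on the expansion path. The filter counts and depth are illustrative, not the exact architecture from the repo.

```python
from tensorflow.keras import layers, models

def res_block(x, filters):
    # Projection shortcut so the skip connection matches the new depth.
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Output is input-plus-residual, which eases training of deep stacks.
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

inputs = layers.Input(shape=(256, 256, 3))
# Contraction path: the number of feature maps doubles after each block.
c1 = res_block(inputs, 16)
p1 = layers.MaxPooling2D(2)(c1)
c2 = res_block(p1, 32)
p2 = layers.MaxPooling2D(2)(c2)
# Bridge between contraction and expansion.
b = res_block(p2, 64)
# Expansion path: upsample, then concatenate with the matching contraction
# features so detail learned while contracting is reused in reconstruction.
u2 = layers.Concatenate()([layers.UpSampling2D(2)(b), c2])
d2 = res_block(u2, 32)
u1 = layers.Concatenate()([layers.UpSampling2D(2)(d2), c1])
d1 = res_block(u1, 16)
# Final 1x1 convolution produces a mask the same size as the input.
outputs = layers.Conv2D(1, 1, activation="sigmoid")(d1)
model = models.Model(inputs, outputs)
```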

Performance Analysis

In this section, the performance metrics used to evaluate the models, i.e. accuracy, precision, recall, F1-score and loss, are discussed.

Accuracy is one metric for evaluating classification models. It can be calculated from the confusion matrix using the formula (Suganthe et al., 2020):

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where:

  • TP: True Positive
  • FP: False Positive
  • TN: True Negative
  • FN: False Negative

Precision is related to random errors. It effectively describes the purity of the positive detections relative to the ground truth: high precision means that the algorithm returned more relevant results than irrelevant ones. Precision can also be calculated from the confusion matrix (Anithadevi and Perumal, 2016):

Precision = TP / (TP + FP)

Recall effectively describes the completeness of the positive predictions relative to the ground truth: high recall means that the model returned most of the relevant results. Recall can be calculated using the formula:

Recall = TP / (TP + FN)

Finally, in statistical analysis, the F-score is a measure of a test's accuracy. The F1-score is the harmonic mean of precision and recall and can be defined as (Anithadevi and Perumal, 2016):

F1 = 2 × (Precision × Recall) / (Precision + Recall)
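These four metrics follow directly from the confusion-matrix counts, as the small helper below illustrates:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```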

Loss function

Deep learning algorithms use stochastic gradient descent to optimize and learn the objective of the model (Jadon, 2020). For the model to learn the objective quickly and accurately, the mathematical representation of the objective must cover the edge cases. In addition, class imbalance is a frequent problem when performing segmentation; the loss function deals with it while updating the weight vector from the labelled and predicted outputs of the model. In the medical community, the Dice Score Coefficient (DSC) is widely used to assess segmentation. The Dice coefficient is known as the preferred evaluation metric for image segmentation and can also serve as a loss function; however, it considers only the segmentation class and not the background class. Therefore, a novel approach proposed by Abraham and N. M. Khan (2019), the focal Tversky loss (FTL), is used to train the model. For a binary mask it can be written in terms of the Tversky index (TI) as:

TI = TP / (TP + α·FN + β·FP)

FTL = (1 − TI)^(1/γ)

where α and β weight false negatives against false positives, and γ controls the focal behaviour.
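A minimal sketch of this loss for a binary mask is shown below, following Abraham and N. M. Khan (2019). The defaults α = 0.7, β = 0.3 and exponent 0.75 are the paper's suggested settings (the exponent plays the role of 1/γ in the paper's notation), not values confirmed for this project.

```python
import tensorflow.keras.backend as K

def focal_tversky_loss(y_true, y_pred, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    # Flatten the masks so the index is computed over all pixels.
    y_true, y_pred = K.flatten(y_true), K.flatten(y_pred)
    tp = K.sum(y_true * y_pred)           # true positives
    fn = K.sum(y_true * (1.0 - y_pred))   # false negatives
    fp = K.sum((1.0 - y_true) * y_pred)   # false positives
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)  # Tversky index
    # Focal exponent reshapes the loss to focus training on harder examples.
    return K.pow(1.0 - tversky, gamma)
```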

In practice, if a pixel is misclassified while the Tversky index (an asymmetric similarity measure on sets that compares a variant to a prototype) is high, the FTL is barely affected. However, if the Tversky index is small and a pixel is misclassified, the FTL changes significantly, correcting the model more strongly. This dynamic can be observed in Figure 2.8, which shows two curves: the focal Tversky loss and the overall Tversky score. The peaks and troughs reveal the implicit correction of the learning process.

Figure 2.8: Focal Tversky Loss

4. Results and Analysis

The proposed CNN classifier categorizes the images into tumor (1) and no tumor (0) with a test accuracy of 96%. As shown in Figure 3.1, the ResUNet model used to localize tumors by image segmentation achieves an accuracy of 91.21%. The proposed CNN architecture requires no explicit feature extraction algorithm.

The classification loss and accuracy curves are shown in Figure 3.2.

Data Visualization

Figure 3.3 visualizes a sample of the processed dataset. It shows the raw image, its true mask, the classifier's predicted mask, and the input scan overlaid with the predicted mask. The important comparison in this figure is between the middle two columns, the true mask and the predicted mask.

Figure 3.3: Images correctly classified and segmented by the CNN.

5. Conclusion

The main goal of this investigation was to design efficient, autonomous brain tumor classification and localization with high accuracy, good performance and low complexity. First, brain tumor classification is performed using a CNN based on the ResNet50 architecture; the result indicates whether a tumor exists or not. The complexity is low and the accuracy is high, although the computational time is also high. Then, to localize the tumor in a given image and draw a boundary around it, a second convolutional neural network, a ResUNet-based segmentation model, is introduced. The result is the brain MRI image with the predicted position of the tumor in the form of a mask. Finally, the focal Tversky loss function is applied to achieve high accuracy; the training accuracy is 96%. Given the importance of the physician's diagnosis, the proposed method can help doctors diagnose tumors and treat patients with increased accuracy.

6. Future Scope

Careful attention should be paid to unclassified and misclassified samples. Unclassified samples are related to detections with low scores: such samples may be classified as "no tumor" even though they contain one. In the future, this situation may be mitigated by adding more healthy images. The preprocessing step may also include algorithms that emphasize subtle features in the given images. Finally, optimization algorithms (such as genetic algorithms, particle swarm optimization or simulated annealing) may be utilized to find the parameter set giving the highest classification accuracy (Salçin et al., 2019).

References

Abadi, Martín et al. (2016). 'TensorFlow: Large-scale machine learning on heterogeneous distributed systems'. In: arXiv preprint arXiv:1603.04467.

Abraham, Nabila and Naimul Mefraz Khan (2019). 'A novel focal Tversky loss function with improved attention U-Net for lesion segmentation'. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE, pp. 683–687.

Anithadevi, D and K Perumal (2016). 'A hybrid approach based segmentation technique for brain tumor in MRI images'. In: arXiv preprint arXiv:1603.02447.

Bakas, Spyridon et al. (2017). 'Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection'. In: The Cancer Imaging Archive 286.

Brownlee, Jason (2017). 'What is the difference between test and validation datasets'. In: Machine Learning Mastery 14.

Gu, Shanqing, Manisha Pednekar and Robert Slater (2019). 'Improve Image Classification Using Data Augmentation and Neural Networks'. In: SMU Data Science Review 2.2, p. 1.

Gulli, Antonio and Sujit Pal (2017). Deep Learning with Keras. Packt Publishing Ltd.

Hall-Beyer, Mryka (2017). 'GLCM texture: a tutorial v. 3.0 March 2017'.

Haralick, Robert M, Karthikeyan Shanmugam and Its'Hak Dinstein (1973). 'Textural features for image classification'. In: IEEE Transactions on Systems, Man, and Cybernetics 6, pp. 610–621.

Jadon, Shruti (2020). 'A survey of loss functions for semantic segmentation'. In: 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE, pp. 1–7.

Khan, Hassan Ali et al. (2020). 'Brain tumor classification in MRI image using convolutional neural network'. In: Mathematical Biosciences and Engineering 17.5, pp. 6203–6216.

O'Shea, Keiron and Ryan Nash (2015). 'An introduction to convolutional neural networks'. In: arXiv preprint arXiv:1511.08458.

Rai, Hari Mohan and Kalyan Chatterjee (2020). 'Detection of brain abnormality by a novel Lu-Net deep neural CNN model from MR images'. In: Machine Learning with Applications 2, p. 100004.

Ronneberger, Olaf, Philipp Fischer and Thomas Brox (2015). 'U-Net: Convolutional networks for biomedical image segmentation'. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 234–241.

Rucco, Matteo, Giovanna Viticchi and Lorenzo Falsetti (2020). 'Towards Personalized Diagnosis of Glioblastoma in Fluid-Attenuated Inversion Recovery (FLAIR) by Topological Interpretable Machine Learning'. In: Mathematics 8.5, p. 770.

Salçin, Kerem et al. (2019). 'Detection and classification of brain tumours from MRI images using faster R-CNN'. In: Tehnički glasnik 13.4, pp. 337–342.

Shah, Tarang (2017). 'About train, validation and test sets in machine learning'. In: Towards Data Science 6.

Suganthe, RC et al. (2020). 'Deep Learning Based Brain Tumor Classification Using Magnetic Resonance Imaging'. In: Journal of Critical Reviews 7.9, pp. 347–350.

Szegedy, Christian et al. (2015). 'Going deeper with convolutions'. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.

Yang, Qifan et al. (2021). 'Evaluation of magnetic resonance image segmentation in brain low-grade gliomas using support vector machine and convolutional neural network'. In: Quantitative Imaging in Medicine and Surgery 11.1, p. 300.

Zuiderveld, Karel (1994). 'Contrast limited adaptive histogram equalization'. In: Graphics Gems IV. Academic Press Professional, Inc., pp. 474–485.


Sanyukta Suman

Engineer + Loves Computer Vision, ML, Programming, Robotics and Technology. https://sanyuktasuman.com.np