3. Results and discussion
In this section, we provide details about the experimental process and the setup of each step of the proposed approach. We also provide a comparative analysis of the SVM, NB, and RF classifiers with respect to the use of an ensemble classifier for selecting the best classifier.
The proposed method is tested on 400 cytology images provided by the pathology department of Lady Reading Hospital, Peshawar, Pakistan. The use of a real, local dataset is one of the significant contributions of this article, given the lack of available data, and it supports the accuracy of the proposed method. Images were collected through an Olympus BX-51 microscope with dimensions of 640 × 400 and a resolution of 72 dpi.
Fig. 6. Results obtained during each step of the proposed approach.
3.2. Experimental result
The most critical set of features (shape- and texture-based) is extracted for each cell nucleus from the 400-image dataset. Most of the microscopic images are affected by blood lesions, which make them noisy and unclear.
Thus, the input image, given in Fig. 6a, is preprocessed by the linear contrast enhancement technique to produce the image presented in Fig. 6b. This is further smoothed by applying a non-linear median filter with a small mask size of 3 × 3, as shown in Fig. 6c. A global threshold is then applied to Fig. 6c to segment foreground and background objects; the resulting image is given in Fig. 6d. Next, morphological operations are employed to remove unwanted objects, producing the image in Fig. 6e. Similarly, objects which are close to the nucleus are removed using an area-based threshold with α = 30, i.e., any object whose area is below 30 is removed. The resulting image is given in Fig. 6f.
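The preprocessing chain described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the threshold value `t = 128`, the assumption that nuclei are darker than the background, and the use of 4-connectivity are all choices made here for illustration, and the morphological step (Fig. 6e) is omitted for brevity. Only the 3 × 3 median mask and the area threshold α = 30 come from the text.

```python
from collections import deque
from statistics import median

def contrast_stretch(img, lo=0, hi=255):
    """Linear contrast enhancement: rescale grey levels to span [lo, hi]."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # constant image, nothing to stretch
        return [row[:] for row in img]
    scale = (hi - lo) / (mx - mn)
    return [[round(lo + (p - mn) * scale) for p in row] for row in img]

def median_filter3(img):
    """3 x 3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median(window))
        out.append(row)
    return out

def global_threshold(img, t):
    """Mark pixels darker than t as foreground (assumes dark nuclei)."""
    return [[1 if p < t else 0 for p in row] for row in img]

def remove_small_objects(mask, alpha=30):
    """Drop 4-connected foreground components whose area is below alpha."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, component = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= alpha:  # keep only large enough objects
                for cy, cx in component:
                    out[cy][cx] = 1
    return out

def preprocess(img, t=128, alpha=30):
    """Contrast stretch, median smoothing, thresholding, area filtering."""
    smoothed = median_filter3(contrast_stretch(img))
    return remove_small_objects(global_threshold(smoothed, t), alpha)
```

In this sketch an image is a list of rows of grey levels, so `preprocess(img)` returns a binary mask of the same shape. For real 640 × 400 images, an array library would be the more practical choice.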
Theoretically, a large number of features can be extracted for malignant cell classification, but in practice, an abundance of features creates the problem of overfitting. In the proposed approach, 20 features are extracted from the 400 microscopic images. To select the optimal features, the CAGA algorithm is used, which helps in obtaining the features that give higher accuracy in malignant cell detection and classification. Note that CAGA benefits from a structure formed by a chain of agents, together with dynamic neighboring genetic operators, for the selection of optimal features.
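To give a rough feel for how a chain of agents with neighbor-based genetic operators can search over feature subsets, the following toy sketch may help. It is not CAGA as used in this work: the update rule, the parameter values, and the fitness function `toy_fitness` are assumptions made purely for illustration. In a real setting the fitness would more plausibly be a classifier's validation accuracy on the selected features.

```python
import random

def chain_agent_search(fitness, n_features, n_agents=12, generations=40,
                       p_mut=0.05, seed=0):
    """Toy chain-of-agents genetic search over binary feature masks."""
    rng = random.Random(seed)
    agents = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(n_agents)]
    for _ in range(generations):
        nxt = []
        for i, agent in enumerate(agents):
            # Each agent only sees its two neighbours on the closed chain.
            left, right = agents[i - 1], agents[(i + 1) % n_agents]
            rival = max(left, right, key=fitness)
            if fitness(rival) > fitness(agent):
                # Uniform crossover with the stronger neighbour.
                child = [rng.choice(pair) for pair in zip(agent, rival)]
            else:
                child = agent[:]
            # Bit-flip mutation with probability p_mut per feature.
            child = [bit ^ int(rng.random() < p_mut) for bit in child]
            # Keep the child only if it is at least as fit as its parent.
            nxt.append(child if fitness(child) >= fitness(agent) else agent)
        agents = nxt
    return max(agents, key=fitness)

# Hypothetical fitness: reward three "informative" features and
# penalise large masks (a stand-in for classifier accuracy).
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

best = chain_agent_search(toy_fitness, n_features=20)
selected = [i for i, bit in enumerate(best) if bit]
```

Here `best` is a 0/1 mask over the 20 features and `selected` lists the indices that were kept. Because each agent only accepts a child that is at least as fit as itself, the fitness of every agent never decreases from one generation to the next.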
Table 1 (column headers only): Predicted (malignant) | Predicted (benign) | Total
3.3. Data performance measurement of classifier
The performance of each classifier is evaluated using the confusion matrix. In the confusion matrix, true-positive (TP) represents malignant cell cases that are correctly classified as positive (malignant), whereas false-positive (FP) represents non-malignant cell cases that are incorrectly classified as positive (malignant), also known as a Type 1 error. Similarly, true-negative (TN) denotes non-malignant cell cases that are correctly classified as negative (non-malignant), and false-negative (FN) denotes malignant cell cases that are incorrectly classified as negative (non-malignant), also known as a Type 2 error.
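The four cells of the confusion matrix can be counted directly from true and predicted labels. The sketch below is a minimal illustration; the function name, the label strings, and the example data are hypothetical and are not taken from the experiments.

```python
def confusion_counts(y_true, y_pred, positive="malignant"):
    """Count TP, FP, TN and FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # malignant, predicted malignant
            else:
                fp += 1   # non-malignant, predicted malignant (Type 1 error)
        else:
            if truth == positive:
                fn += 1   # malignant, predicted non-malignant (Type 2 error)
            else:
                tn += 1   # non-malignant, predicted non-malignant
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Made-up labels for six cells, for illustration only.
y_true = ["malignant", "malignant", "benign", "benign", "malignant", "benign"]
y_pred = ["malignant", "benign", "benign", "malignant", "malignant", "benign"]
counts = confusion_counts(y_true, y_pred)
```

For these six made-up cells, `counts` holds two true positives, one false positive, two true negatives, and one false negative.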
For the proposed approach, the performance evaluation through the confusion matrix is given in Table 1, where n = 29 000 is the total number of cells in the image.
From the confusion matrix, various measures such as accuracy, sensitivity, specificity, precision, false-positive rate, and F-measure can be determined to assess the performance of a given classifier. Each of these measures is briefly discussed below:
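• Accuracy: The proportion of all cases that are classified correctly. Mathematically, it is given by:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Sensitivity: The proportion of actual positive cases (e.g., a person who has the tumor) that are correctly identified. Mathematically, it is given by:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
• Specificity: The proportion of actual negative cases (e.g., a person who does not have the tumor) that are correctly identified. Mathematically, it is given by:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
• Precision: The proportion of predicted positive cases that are actually positive. Mathematically, it is given by:
$$\text{Precision} = \frac{TP}{TP + FP}$$
• False Positive (FP)-Rate: The proportion of negative cases that are incorrectly classified as positive. Mathematically, it is given by:
$$\text{FP-Rate} = \frac{FP}{FP + TN}$$
• The Recall or TP-Rate: The proportion of positive cases that are correctly classified (identical to sensitivity), which can be shown by the following equation:
$$\text{Recall} = \frac{TP}{TP + FN}$$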
Sometimes higher precision is more important, while in other situations higher recall matters more.
• F-Measure: It combines precision and recall (as their harmonic mean), and can be represented by the following equation:
$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Fig. 7. Results without CAGA.
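The measures listed above follow directly from the four confusion-matrix counts. The sketch below computes them using the standard definitions; the function name and the example counts are hypothetical and are not the values reported in Table 1. Returning 0.0 when a denominator is zero is a convention chosen here, not something specified in the text.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard performance measures from confusion-matrix counts."""
    def ratio(num, den):
        # Convention used in this sketch: undefined ratios are reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # same quantity as sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "fp_rate": ratio(fp, fp + tn),
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }

# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these made-up counts the accuracy is 0.85 and the precision is 0.8. Note that specificity and FP-rate always sum to 1, since both share the denominator TN + FP.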
3.3.1. Experimental results with and without CAGA