A Novel Implementation of Image Processing and Machine Learning for Early Diagnosis of Melanoma
A multi-step system was created for the early diagnosis of melanoma. Image processing algorithms (edge detection and image segmentation) were used to extract the standard ABCD (Asymmetry, Border, Color, and Diameter) features of a skin mole. The extracted ABCD features were analyzed statistically to understand the impact of each characteristic. The features were then fed to a machine learning algorithm, an Artificial Neural Network, for a comprehensive diagnosis. Combined, these steps provided about 80% accuracy and can function as a preliminary cancer diagnosis.
Melanoma is one of the most deadly cancers, but when diagnosed early, it can be cured. The purpose of this project was to create an accurate system that allows for early diagnosis. My project uses computer-aided analysis of skin lesions based on the ABCD guidelines used by dermatologists. The project consists of the following sections: image capture, pre-processing, edge detection, image segmentation, feature extraction, statistical analysis, and machine learning. Morphological closing was used to remove noise such as hair from the image. A feature extraction program was implemented to define a border around the lesion. An "outside-in" radial search was used in this program to improve border accuracy, because the accuracy of the features dictates the validity of the final result. Certain features were extracted (area, perimeter, etc.) and used to calculate the border and asymmetry irregularity indexes. These index values were analyzed statistically using a normal distribution curve to better understand the impact of each independent factor. Finally, machine learning methods such as Artificial Neural Networks (ANN) were trained and tested on the extracted values. The ANN performs the final diagnosis from the input data. The entire multi-step system had a promising accuracy of about 80%. Various advanced programming methods were used in order to provide patients with information that allows them to be proactive in melanoma diagnosis. Future steps include creating a user-friendly interface and extracting accurate color features.
Hello! My name is Elizabeth Zhao, and I live in Portland, Oregon. I go to Jesuit High School and have learned so much about myself and the world around me there. It has taught me to appreciate the blessings in my life and to use them to better our society. That is what drives me to do science: my duty to do whatever I can to make our world safer, stronger, and healthier. Science, I believe, is real-world magic. The innovations and technology that have come out of this field continue to amaze me and inspire me each and every day. I want to change lives, and the best way to do that, I believe, is through science.
When I was a freshman in high school, my best friend's mother passed away from melanoma cancer. I saw how deeply it affected her, her family, and her relationships. Cancer has touched everyone's lives and continues to hurt thousands of people every day. The most common cancer is skin cancer, and the most deadly skin cancer is melanoma. I wanted to do something to help the countless people affected by this disease, and that is what drove me to do my project.
Winning would mean getting the word out about my project and what it entails. At the very least, it would mean more people reading about melanoma and realizing just how easily it goes unnoticed. Having more people understand this disease would ultimately work toward eradicating it, which is the ultimate goal.
The Problem: Melanoma is almost 100% curable if diagnosed early. The issue with melanoma lies not in the cancer cells themselves but in the fact that it is often not diagnosed early enough. My project aims to make melanoma diagnosis easier so that patients are more inclined to check themselves earlier.
The Aim: To write, test, and combine various image processing and machine learning algorithms to determine which yields the most accurate diagnosis. I will create a system that combines both kinds of algorithm and delivers the probability that a given mole is cancerous. The system will analyze an image of a lesion and output values for “A”symmetry and “B”order. Using these values, a statistical analysis will determine the distribution curves of border and asymmetry. These curves can assist the user in determining the likelihood of a lesion being cancerous: from them, the probability that a lesion with a given index value is malignant is provided. The A and B values will also be used with machine learning algorithms such as an ANN (Artificial Neural Network) that has already been trained with over 300 images for even more accurate determination. Based on what it has learned, the ANN will deliver the diagnosis. This multi-step system should optimize melanoma diagnosis based on numerous previous cases and mathematical analysis.
Melanoma is a type of skin cancer caused by irregular growth in melanocytic cells. It is less common than other skin cancers, but it is the most deadly.
This graph plots information from the AJCC (American Joint Committee on Cancer) and illustrates how the survival rate rapidly decreases as the years progress and as the stages progress in severity. It is also easy to see that when the cancer is diagnosed within the first year, the survival rate is almost 100%.
Current Methods of Diagnosis
Melanoma is currently first diagnosed by dermatologists using the ABCD method:
If the lesion is asymmetric, meaning that the halves do not match when the lesion is bisected, then it is more likely to be malignant. If the border is irregular, there is more than one color, or the diameter of the lesion is greater than 6 mm (the size of a pencil eraser), then it is also very likely to be malignant. At the dermatologist's discretion, a biopsy may be ordered for further confirmation. I based my program on these criteria.
A newer approach that is beginning to develop uses imaging techniques to diagnose melanoma. Although this is a good concept, current imaging technologies are not intended for individual home use. Making the system more accessible to individual users can help in the early diagnosis of melanoma. These imaging techniques consist of a few key steps: image capture, preprocessing/noise reduction, and feature extraction. From the extracted features, other methods such as statistical analysis and machine learning can be employed to determine a trend among the lesions.
There are a variety of methods for image capturing. These include a video camera, an analog camera, a digital camera, a high-resolution camera, and many more.
Preprocessing can enhance images in ways that are not possible during image acquisition, and it allows for more accurate feature extraction. One example of preprocessing is hair removal.
One important step in feature extraction is edge detection, or creating an accurate border. Feature extraction as a whole involves measuring the aspects of the lesion that correspond to the ABCD method dermatologists use.
A normal distribution curve can indicate how probable it is that a lesion is benign. If the values from feature extraction lie in the "highly probable" area of the benign curve, the lesion is more likely to be benign. Conversely, if the lesion's values are at the edge of the curve, it is more likely to be malignant.
The information from feature extraction is saved into a database along with whether the mole was cancerous or not. From there, the machine learning algorithm learns what characterizes a cancerous mole based on the information given. Machine learning is a type of artificial intelligence in which computers learn behaviors from a given database. Various machine learning algorithms exist, such as decision trees and neural networks.
1. Image Capture
Take a picture of the lesion and upload it.
2. Image Segmentation and Processing
This step removes the hair so that the feature extraction will be more accurate. The RGB color image is first converted to the L*a*b* color space and then undergoes a morphological closing. The closed image is subtracted from the original L*a*b* image, and a threshold is applied to the resulting difference. If the difference at a pixel is above the threshold, that pixel in the original image is replaced with the pixel from the closed image.
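The hair-removal step can be illustrated with a minimal Python sketch. This is not the MATLAB program used in the project: it works on a single lightness channel stored as a list of lists, and the 3x3 window size, the threshold of 30, and the function names are assumptions chosen only for illustration.

```python
def _window_op(img, size, op):
    """Apply op (max or min) over a size x size window around every pixel."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = op(vals)
    return out


def closing(img, size=3):
    """Grayscale morphological closing: dilation (max) then erosion (min)."""
    return _window_op(_window_op(img, size, max), size, min)


def remove_hair(lightness, size=3, threshold=30):
    """Replace pixels that the closing changed by more than `threshold`."""
    closed = closing(lightness, size)
    return [[c if abs(c - p) > threshold else p
             for p, c in zip(row, closed_row)]
            for row, closed_row in zip(lightness, closed)]


# Hypothetical test image: bright skin (200) crossed by a thin dark hair (40).
image = [[200] * 7 for _ in range(7)]
image[3] = [40] * 7
cleaned = remove_hair(image)
```

A thin dark line is narrower than the window, so the closing fills it in and the thresholded difference flags it for replacement. A dark region wider than the window, such as the lesion itself, is left unchanged.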
Figure panels: "Hair" (original image) and "No Hair" (after hair removal)
Border detection is a key component: only an accurate border can yield accurate results. The method I wrote for border detection is called radial search. First, the user chooses a center. Then the MATLAB gradient function is used. Lines are drawn from the edge of the picture to the center. This "outside-in" method improves the accuracy of the border. Along each line, a peak in the first derivative given by the gradient function corresponded to the change between the skin and the lesion. This is where the edge is located and where the point stopped. This was repeated 200 times, and the resulting border was then smoothed: each point is averaged with its 6 neighbors, and the process continues around the lesion until the optimal border is achieved.
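The radial search and smoothing can be sketched in a few lines of Python. This is a simplified illustration, not the original MATLAB code. It assumes a grayscale image stored as a list of lists, treats the first intensity jump of at least `min_step` (a hypothetical parameter) along each ray as the gradient peak, and interprets "6 neighbors" as three points on each side.

```python
import math


def radial_search(img, center, n_rays=200, min_step=50):
    """For each ray, walk from the image edge toward the center and return
    the radius at which the first strong intensity change occurs."""
    h, w = len(img), len(img[0])
    cy, cx = center
    max_r = int(min(cx, cy, w - 1 - cx, h - 1 - cy))  # stay inside the image
    radii = []
    for k in range(n_rays):
        theta = 2 * math.pi * k / n_rays
        prev, found = None, 0
        for r in range(max_r, -1, -1):  # outside-in
            y = int(round(cy + r * math.sin(theta)))
            x = int(round(cx + r * math.cos(theta)))
            val = img[y][x]
            if prev is not None and abs(val - prev) >= min_step:
                found = r  # edge located; this ray stops here
                break
            prev = val
        radii.append(found)
    return radii


def smooth(radii, neighbours=6):
    """Average each border radius with its neighbours, wrapping around."""
    half = neighbours // 2
    n = len(radii)
    return [sum(radii[(i + d) % n] for d in range(-half, half + 1))
            / (2 * half + 1)
            for i in range(n)]


# Hypothetical test image: a dark disc of radius 15 on bright skin.
size, true_r = 61, 15
img = [[50 if (x - 30) ** 2 + (y - 30) ** 2 <= true_r ** 2 else 200
        for x in range(size)] for y in range(size)]
border = smooth(radial_search(img, (30, 30)))
```

On this synthetic disc every smoothed radius lands within one pixel of the true radius. A real lesion image would need the hair removal described above first, since a hair crossing a ray would stop the search too early.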
Example of radial search/smoothing method
3. Feature Extraction
Feature extraction's primary objective is to use accurate methods to obtain the best results. For this reason, emphasis was placed on the aforementioned edge detection method to create an accurate border. With an accurate border, feature extraction is more reliable.
The MATLAB function regionprops() (short for region properties) takes an image and returns its area, perimeter, orientation, and major and minor axes.
Example of regionprops output
The lesion is folded over its major and minor axes, and the overlapping area is shown in green. The ideal is 50%.
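The folding idea can be illustrated with a short Python sketch. It is an assumption-laden simplification rather than the project's checkSymmetry method: the lesion is a binary mask, the fold line is the vertical line through the centroid (the mask is assumed to be already rotated so that an axis is vertical), and the function name is hypothetical.

```python
def fold_overlap_percent(mask):
    """Fold a binary mask over the vertical line through its centroid and
    return the overlapping area as a percentage of the total area."""
    area = sum(sum(row) for row in mask)
    xs = [x for row in mask for x, v in enumerate(row) if v]
    axis = sum(xs) / len(xs)  # x coordinate of the fold line
    overlap = 0
    for row in mask:
        for x, v in enumerate(row):
            if v and x < axis:  # pixel on the left half
                mirror = int(round(2 * axis - x))  # where it lands when folded
                if 0 <= mirror < len(row) and row[mirror]:
                    overlap += 1
    return 100.0 * overlap / area


# Hypothetical masks: a symmetric rectangle and a lopsided shape.
symmetric = [[1] * 6 for _ in range(4)]
lopsided = [[1] * 6, [1] * 6, [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0]]
```

For the symmetric rectangle, every pixel on one half lands on a pixel of the other half, so the overlap is exactly half the area (50%). The lopsided shape scores well below 50%.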
Bindex, the border irregularity index, was computed from the properties returned by regionprops.
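The exact equation for the border index is not reproduced in this text. One common measure that uses only area and perimeter, and that equals 1 for a perfect circle and grows as the border becomes more irregular, is the compactness P²/(4πA). The Python sketch below assumes that form purely for illustration; it may differ from the equation actually used, and the function name is hypothetical.

```python
import math


def border_index(points):
    """Compactness P^2 / (4*pi*A) of a closed polygon given as (x, y) points.
    Assumed form of the index: 1 for a circle, larger for irregular borders."""
    n = len(points)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2
    return perimeter ** 2 / (4 * math.pi * area)


# A 200-point circle (matching the 200 border points) and a unit square.
circle = [(math.cos(2 * math.pi * k / 200), math.sin(2 * math.pi * k / 200))
          for k in range(200)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Under this assumed definition the circle scores almost exactly 1 and the square scores 4/π, about 1.27.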
4. Statistical Analysis
A t-test was used to determine whether the two groups (benign and melanoma) were statistically different.
If we use the normal distribution of the benign samples as the “criterion”, we can determine how likely a lesion is to be benign based on its location on the curve. First we check its asymmetry value, then its border value.
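As a rough illustration of locating a lesion on the benign curve, the Python sketch below converts an index value to a z-score and maps it to one of six regions. The region boundaries (one standard deviation wide, with regions 3 and 4 on either side of the benign mean) are an assumption, and the standard deviation of 0.4 is a made-up value; only the benign mean of 2.1201 and the malignant mean of 3.098 are averages reported in this project.

```python
import math


def region(value, mean, std):
    """Map a value to regions 1..6, each one standard deviation wide.
    Regions 3 and 4 straddle the mean; 1 and 6 are the outer tails."""
    z = (value - mean) / std
    return min(6, max(1, math.floor(z) + 4))


def two_sided_tail(value, mean, std):
    """Probability that a benign sample lies at least this far from the mean,
    assuming the benign values are normally distributed."""
    z = abs(value - mean) / std
    return math.erfc(z / math.sqrt(2))


BENIGN_MEAN = 2.1201  # average benign border index reported in this project
BENIGN_STD = 0.4      # hypothetical value, for illustration only

near_mean = region(2.2, BENIGN_MEAN, BENIGN_STD)
far_out = region(3.098, BENIGN_MEAN, BENIGN_STD)
```

With these assumed numbers, a border index of 2.2 falls in region 4, while the average malignant border index of 3.098 falls in region 6.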
5. Artificial Neural Networks
Figure of various neurons
Extracted (B)order and (A)symmetry Table
Here is a small sample of the border and asymmetry indexes found by my method for benign and melanoma cases. The ideal border (a perfect circle) would be 1, and it can be seen that the border index of the benign cases is much closer to 1 than that of the malignant (melanoma) cases. Furthermore, the ideal asymmetry value would be 50, and the benign values are much closer to it than the malignant values.
The first step was to show that the benign and melanoma values were different enough to be considered two separate groups. This was verified with a t-test, the first statistical test I performed, which showed that for both the “border” and “asymmetry” data, benign and melanoma cases belong to different groups. The t-test results indicate that the groups are strongly different, because for both indexes the probability that the two sets of values come from the same group is extremely low.
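The text does not say which form of t-test was used. As one possibility, the Python sketch below computes Welch's unequal-variance t statistic and its degrees of freedom. The sample values are invented for illustration and are not the project's measurements.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of each mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


# Hypothetical border index samples, not data from this project.
benign = [1.9, 2.0, 2.1, 2.2, 2.3]
melanoma = [2.8, 3.0, 3.1, 3.2, 3.4]
t_stat, dof = welch_t(benign, melanoma)
```

A large magnitude of t relative to the degrees of freedom corresponds to a very small probability that both samples come from the same group, which is the kind of result described above.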
Data Distribution for the Border and Asymmetry Indexes
As mentioned in the methods, I used normal distribution curves to categorize my extracted data. To my knowledge, this approach is the first of its kind, and it yielded promising results.
I defined each region relative to the benign standard. If a lesion falls in region 3 or 4, it has a high probability of being benign (and not malignant). However, if its indexes place it in the outer regions (such as 1 or 6), it has a higher chance of being malignant. A sample of the images tested and their corresponding region values is provided below:
Border Irregularity Index
The following information pertains only to the border irregularity index and its corresponding regions.
Machine Learning--Artificial Neural Networks (ANN)
In summary, statistical analysis was employed to understand how each value (A or B) affects the overall diagnosis and the weighted importance of each characteristic. However, a machine learning algorithm relates both the A and B irregularity indexes in a comprehensive analysis by utilizing large databases for learning. For final diagnosis, the machine learning method was employed and provided about 80% accuracy.
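To show how a neural network can relate the A and B indexes in a single decision, here is a minimal Python sketch of a tiny network trained by gradient descent. It is not the network used in this project: the architecture (three hidden units), the learning rate, the input scaling, and especially the training data, which is randomly generated around made-up averages, are all assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """2 inputs -> `hidden` tanh units -> 1 sigmoid output (1 = melanoma)."""

    def __init__(self, hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b)
             for w, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, out

    def predict(self, x):
        return self.forward(x)[1]

    def train(self, samples, epochs=300, lr=0.1):
        """Stochastic gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for x, y in samples:
                h, out = self.forward(x)
                d_out = out - y  # loss gradient at the output pre-activation
                for j, hj in enumerate(h):
                    d_h = d_out * self.w2[j] * (1 - hj * hj)
                    self.w2[j] -= lr * d_out * hj
                    self.w1[j][0] -= lr * d_h * x[0]
                    self.w1[j][1] -= lr * d_h * x[1]
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out


def log_loss(net, samples):
    eps = 1e-12
    total = 0.0
    for x, y in samples:
        p = net.predict(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)


def scale(border, asymmetry):
    """Hypothetical centring and scaling so both inputs are of order 1."""
    return (border - 2.6, (asymmetry - 45.0) / 5.0)


# Synthetic training data (NOT the project's 300 images).
rng = random.Random(1)
samples = []
for _ in range(30):
    samples.append((scale(rng.gauss(2.1, 0.3), rng.gauss(48, 2)), 0))  # benign
    samples.append((scale(rng.gauss(3.1, 0.3), rng.gauss(40, 2)), 1))  # melanoma

net = TinyANN()
loss_before = log_loss(net, samples)
net.train(samples)
loss_after = log_loss(net, samples)
accuracy = sum((net.predict(x) > 0.5) == bool(y)
               for x, y in samples) / len(samples)
```

Calling `net.predict(scale(border, asymmetry))` returns a value between 0 and 1 that can be read as the network's estimate that the lesion is melanoma. Because the synthetic groups are well separated, accuracy on this toy data says nothing about accuracy on real lesion images.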
Through this project, a system was created that determines the probability that an unknown lesion is cancerous. This system featured a MATLAB program that combines a variety of methods (edge detection, image segmentation) to optimize feature extraction. Normal distribution curves of benign characteristics were used to determine a lesion's probability of being cancerous and to better understand the relative importance of each characteristic. An Artificial Neural Network was used as the final method of diagnosis.
The MATLAB program takes an image as input, removes the hair, creates an accurate border, and then provides the border and asymmetry irregularity indexes. The border value is ideally 1; the average benign value is 2.1201, and the average malignant value is 3.098. Malignant lesions have a significantly higher border irregularity value. This value is calculated from the properties given by regionprops. It was found that 77% of malignant cases were diagnosed correctly, a promising success rate. The second parameter was the asymmetry irregularity index, calculated from ΔArea and the total area. ΔArea was calculated using the checkSymmetry method, which calculates the area of each region by taking the integral of the curve and finding the difference. The total area is calculated using the regionprops method. These values (for over 300 images) were then used to train a neural network. When tested, the network yielded about an 80% accuracy rate.
A common difficulty in computer-aided diagnosis of melanoma is atypical nevi, which are benign moles that have irregular borders and are asymmetric. However, this system still had a 75% accuracy rate on such cases. This high success rate illustrates the breadth of my system in its ability to diagnose melanoma.
This diagnostic system was also compared to other free, downloadable software such as SkinSeg. SkinSeg essentially takes an image, performs segmentation on it, and gives different characteristics of the lesion. However, it failed to automatically provide accurate borders, and the user often had to create the borders manually or accept an inaccurate segmentation. In comparison, my system is simpler and does not require the border to be drawn by hand.
In conclusion, the results show that my diagnostic system and analysis yield relatively accurate results. The approach may also be applicable to a variety of other cancers. A future improvement is to convert the program to a common language such as C or Java to make it readily available to the public. The ultimate goal is to promote early diagnosis of melanoma by creating a program that is easily accessible, so that patients can get an idea of how likely it is that their mole is cancerous through statistical and machine learning based information. Based on the diagnostic result, they can decide whether to visit a dermatologist for a professional diagnosis.
Thanks to Dr. Perkowski, PSU Professor, for teaching me and encouraging me to continue in this project.
Thanks to Jabeer Ahmed, OHSU BSMD department, for providing me with mole images and assistance in MatLab.
Thanks to my family for encouragement and constant support throughout this entire process.
All tables and graphs were created by me.
All images were taken and uploaded by me with the exception of the figures listed below.
Graph of various stages (Research section) : "AJCC TNM stages and overall survival of patients with cutaneous melanoma." Graph. 2010. TNM Staging. MMMP.org. 17 March 2012.
ABCD Method (Research Section) : ABCD's of Diagnosing Melanoma. Chart. n.d. Melanoma. skincancer.org. 10 March 2012
Normal Distribution curve (Method & Results) : Greenhut, Josh. Normal Distribution Bell Curve. Graph. n.d. Ring the Bell Curve. joshgreenhut.com. 22 September 2012
Petrou, Maria and Panagiota Bosdogianni. Image Processing: The Fundamentals. New York: John Wiley & Sons, Ltd, 1999.
Chang, T., Kuo, C. C. J., “Texture Analysis and Classification with Tree-Structured Wavelet Transform.”
DeLaCruz, Jomer and Dr. Dinesh Mital. “Classification of Malignant Melanoma and Dysplastic Nevi Using Image Analysis: A Visual Texture Approach.” University of Medicine and Dentistry of New Jersey.
“DermWeb: Dull Razor.” UBC Dermatology Department. www.dermweb.com/dull_razor/
Fikret Ercal, Anurag Chawla, William V. Stoecker, Hsi-Chieh Lee, and Randy H. Moss, “Neural Network Diagnosis of Malignant Melanoma From Color Images”, IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 41,
Karen Cheung, “Image Processing for Skin Cancer Detection: Malignant Melanoma Recognition”, Masters Thesis, (ECE, University of Toronto) ,1997.
Nikita V. Orlov, John Delaney, D. Mark Eckley, Lior Shamir, and Ilya G. Goldberg, “Pattern Recognition for Biomedical Imaging and Image-Guided Diagnosis”.
Nock, R., Nielsen, F., “Statistical Region Merge”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 11, NOVEMBER 2004
Perkowski, Marek. “Machine Learning: Approach Based on Decision Trees.” Class Lecture Notes. 15 December 2011.
Piolette, Paul. "Neural Network Toolbox." MATLAB. Mathworks, n.d. Web. www.mathworks.com/products/neural-network
Rios, Daniel. "Neuro AI - Intelligent Systems and Neural Networks." ARTIFICIAL NEURAL NETWORKS. N.p., n.d. Web. www.learnartificialneuralnetworks.com
Russ, John C. The Image Processing Handbook Second Edition. Boca Raton: CRC, 1995.
Sachin V. Patwardhan, Atam P. Dhawan, Patricia A. Relue, “Classification of Melanoma using tree structured wavelet transform”, Computer Methods and Programs in Biomedicine 72
Skinseg. Wright State University. 27 Oct 1998. www.cs.wright.edu/~agoshtas/skinseg.html.
Stanganelli, Ignazio. “Dermoscopy.” Center for Cancer Prevention, Italy. http://emedicine.medscape.com/article/1130783-overview
Tannenbaum, Bruce. "Image Processing Toolbox." - MATLAB. Mathworks, n.d. Web. www.mathworks.com.
“Understanding Melanoma.” The Skin Cancer Foundation. New York, New York. http://www.skincancer.org/Melanoma/
Zhao Zhang, William V. Stoecker, and Randy H. Moss, “Border Detection on Digitized Skin Tumor Images”, IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 11, NOVEMBER 2000