Heart Smart: A Novel Deep Learning Approach to Improving Heart Disease Diagnosis

Summary

TOPIC: Heart disease is the leading cause of death worldwide. I chose this topic because heart disease is a silent killer -- it can lurk in millions of people without them or their doctors having the slightest idea. This is a huge problem, since nearly half of heart attacks occur in people who have not been flagged as 'at risk.'

TECHNIQUE: My research applies machine learning to detect heart disease from routine clinical data. Machine learning is a type of artificial intelligence that enables a computer to learn from existing data and predict outcomes for new data.

OVERVIEW: This work led me to code a novel software tool called HEARO -- Heart Evaluation for Algorithmic Risk-reduction and Optimization. HEARO is a variable-layer deep neural network enhanced with two optimizations: regularization and hyperparameter tuning.

FINDINGS: HEARO achieves 99% accuracy, outperforming previously published results -- including those from Stanford. This is helpful in the area of heart disease risk assessment because the software is more accurate than previously published models on this dataset, providing a potentially life-saving tool that helps doctors make a more informed diagnosis.

CONCLUSION: These results support my hypothesis, as the optimized deep neural network was the most accurate model for this particular problem.

NEXT: Future directions involve developing this scientific research into a commercial product. I am working towards this goal by launching a business, migrating the software to a web-based platform, and communicating with cardiologists.

 

Slides: https://docs.google.com/presentation/d/1vlqPPDAYZy-40uWun3TE9t8rB5yyGcjQwjnY_pW_l1A/edit?usp=sharing

Question / Proposal

PROBLEM: Heart disease is the leading cause of death worldwide, and doctors misdiagnose nearly one-third of patients as not having heart disease.

Doctors’ basic screening tools can be inaccurate and slow. Common evaluation guidelines such as the American Heart Association Guidelines rely on analysis of features such as cholesterol, blood pressure, or blood sugar individually.

Research shows that these tools may be inaccurate because they “assume a linear relationship between factors that may not be linearly related.”

QUESTION

To address this problem, this research investigates the following questions:

1. How can I improve the accuracy of heart disease diagnosis by analyzing relationships between medical metrics?

2. Which machine learning algorithm -- logistic regression, a 2-layer shallow neural network, a deep neural network, or a regularized deep neural network -- creates the most accurate model for diagnosing heart disease?

3. What novel combination of hyperparameters best optimizes the algorithm's accuracy and generalization capability on test data?

HYPOTHESIS: Deep learning is known to be effective at making predictions. One challenge of coding a deep learning algorithm is determining how to set the hyperparameters, the values that define its mathematical model.

The expected outcome is a novel deep learning framework that predicts heart disease outcomes and improves upon existing research.

I hypothesize that a regularized deep neural network is the most accurate for this problem because regularization explicitly helps the model apply its predictions to new data. During experimentation, I expect to refine the tuning of the hyperparameters until the algorithm achieves optimal accuracy.


 

Research

EXISTING RESEARCH: Several research papers use artificial neural networks to improve heart disease diagnosis. Yu et al. concluded that a neural network topology with two hidden layers was an accurate model, reaching 90% test data accuracy. They focus on the multiplicity of risk factors in constructing their model, classifying features before determining a possible diagnosis, and concluded that neural networks are an effective method of analyzing cases where a strict mathematical model cannot be constructed. Vinodhini et al. build on this research by performing feature classification with statistical models such as the chi-square test and then using a neural network as a predictive model. This method proved successful overall but exhibited weaker performance when given redundant attributes. Loh et al. demonstrate the accuracy of deep neural networks by showing their ability to learn nonlinear relationships in data. However, they faced overfitting, where an algorithm learns too much from training data and becomes less capable of generalizing to unfamiliar data. Kim et al. help address overfitting by ranking features, training the neural network with each feature ranking, and then training it to output a potential diagnosis.

CONTRIBUTION TO FIELD: This existing research has shaped my project, as it highlights areas for improvement in algorithmic framework design. Part of the challenge of coding a neural network involves structuring it so it is accurate on both training and test data. If an algorithm fits training data too closely, it can be unable to generalize to unfamiliar data, preventing it from being implemented in a clinical setting. If the algorithm is inaccurate on training data, it will also tend to be inaccurate on test data, leading to incorrect results.

My project helps address this challenge by combining the optimizations of hyperparameter tuning and regularization on a variable-layer deep neural network. The contribution to the field is as follows:

1. This is a novel method that builds on current research to derive quick and precise diagnostics.

2. The method outperforms other published research in this area, achieving higher accuracy on the same dataset.

BENEFIT: The existing research validates my project by demonstrating the need for more accurate software solutions to improve heart disease diagnosis. According to the NIH, “approximately half of myocardial infarctions occur in people who are not predicted to be at risk.” Not only is there a need for better diagnostic technology, but there is also a need for optimized neural network frameworks to achieve this goal.

This shows how my research can benefit the real world by helping doctors save lives and improve patient care. By analyzing relationships between medical metrics, HEARO can find patterns that are difficult for doctors to spot when examining individual risk factors. I have conducted primary research with seventeen doctors who evaluated my software and described its potential impact. According to Dr. Myiesha Taylor of the University of Texas Southwestern Medical Center, “having the HEARO software tool that helps risk-stratify will guide treatment and decrease morbidity and mortality.”

Method / Testing and Redesign

TESTING PROCESS

1. I downloaded the Cleveland dataset from the University of California, Irvine (UCI) Machine Learning Repository.

2. I divided this dataset into two-thirds training data and one-third test data.

3. I used the following features as contained in the dataset: age, sex, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression, slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, and thallium scan results.

4. I coded a 2-layer neural network using Google TensorFlow.

5. I coded a variable-layer deep neural network and tested a 5-layer network and a 7-layer network.

6. I coded the regularization optimization (a minimal sketch of the resulting pipeline appears after this list).

7. I varied the following hyperparameters: number of nodes (computational units that function as artificial neurons), activation function (the process used to translate numerical output to a classification outcome), learning rate (the increment the algorithm uses to iteratively converge to the minimum of the cost function), and regularization parameter (the numerical value by which regularization penalizes abnormal weights). I used the tool OpenTuner to randomly generate 1000 different combinations and tested each of them (see the tuning sketch after this list).

8. I tested each algorithm by training it on the training data and evaluating it on the test data. Dividing the dataset this way allows the algorithm to learn from existing data and apply the same procedure to an unfamiliar dataset.

9. To further evaluate accuracy, I used the statistical methods of the Matthews Correlation Coefficient (MCC) and ten-fold cross-validation. MCC evaluates how well the algorithm performs on all possible data outcomes regardless of their ratio within the dataset. It is computed as MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)), where TP represents true positives, TN true negatives, FP false positives, and FN false negatives (an evaluation sketch follows this list).

10. Ten-fold cross-validation evaluates accuracy on a variety of test data: the full dataset is divided into ten partitions, and each partition is held out once as test data while the model trains on the remaining nine.
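The project's own source is not reproduced in this write-up; the following is a minimal sketch of how steps 2 and 4-6 might be implemented with TensorFlow's Keras API. The helper name build_dnn, the synthetic placeholder data, the example layer sizes, and the use of an L2 weight penalty (the write-up does not name the penalty form) are all illustrative assumptions, not the project's exact code.

```python
# Illustrative sketch only -- not the project's exact code.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

def build_dnn(hidden_sizes, l2_lambda, learning_rate, activation="sigmoid"):
    """Variable-layer DNN (steps 5-6): one Dense layer per entry in hidden_sizes."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(13,))])  # 13 clinical features
    for units in hidden_sizes:
        model.add(tf.keras.layers.Dense(
            units, activation=activation,
            kernel_regularizer=tf.keras.regularizers.l2(l2_lambda)))  # assumed L2 penalty
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))  # outputs P(heart disease)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Synthetic stand-in for the 303-record Cleveland dataset; loading code omitted.
rng = np.random.default_rng(0)
X = rng.normal(size=(303, 13)).astype("float32")
y = rng.integers(0, 2, size=303).astype("float32")

# Step 2: two-thirds training data, one-third test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

model = build_dnn([12, 8, 4], l2_lambda=0.5, learning_rate=0.01)  # arbitrary example values
model.fit(X_train, y_train, epochs=200, verbose=0)  # the project ran 6000 training iterations
print(model.evaluate(X_test, y_test, verbose=0))
```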
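Step 7 used OpenTuner; the sketch below illustrates the same randomized search idea in plain Python rather than through OpenTuner's own API. The search space shown is an assumption (the ranges the project actually explored are not listed), and the code reuses build_dnn and the data split from the previous sketch.

```python
import random

# Hypothetical search space; the actual ranges explored are not published.
SPACE = {
    "hidden_sizes":  [[9, 7, 5, 3], [12, 8, 4], [16, 8, 4, 2], [10, 10, 10]],
    "activation":    ["sigmoid", "relu", "tanh"],
    "learning_rate": [0.001, 0.01, 0.1],
    "l2_lambda":     [0.1, 0.3, 0.5, 0.7, 1.0],
}

def sample_config(rng):
    # Draw one random value per hyperparameter.
    return {name: rng.choice(values) for name, values in SPACE.items()}

rng = random.Random(0)
best_acc, best_cfg = 0.0, None
for _ in range(1000):  # 1000 randomly generated combinations, as in step 7
    cfg = sample_config(rng)
    model = build_dnn(cfg["hidden_sizes"], cfg["l2_lambda"],
                      cfg["learning_rate"], cfg["activation"])
    model.fit(X_train, y_train, epochs=200, verbose=0)
    _, acc = model.evaluate(X_test, y_test, verbose=0)
    if acc > best_acc:
        best_acc, best_cfg = acc, cfg  # keep only the best-scoring combination
print(best_cfg, best_acc)
```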
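For steps 9 and 10, the sketch below computes the MCC directly from confusion-matrix counts, matching the formula in step 9, and runs ten-fold cross-validation with scikit-learn's KFold. It again reuses X, y, and build_dnn from the first sketch.

```python
import numpy as np
from sklearn.model_selection import KFold

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts (step 9)."""
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Step 10: each of the ten partitions is held out once as test data.
fold_accuracies = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    fold_model = build_dnn([12, 8, 4], l2_lambda=0.5, learning_rate=0.01)
    fold_model.fit(X[train_idx], y[train_idx], epochs=200, verbose=0)
    fold_accuracies.append(fold_model.evaluate(X[test_idx], y[test_idx], verbose=0)[1])
print(np.mean(fold_accuracies))  # cross-validated accuracy
```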

VARIABLES

Manipulated:

1. Type of computer algorithm tested

2. Type of optimization used

3. Hyperparameters (number of nodes, activation function, learning rate, regularization parameter)

Responding:

1. Heart disease severity output

2. Accuracy

Controlled:

1. Type of computer (MacBook Pro)

2. Source of dataset (University of California, Irvine Machine Learning Repository)

3. Number of input parameters to each algorithm (13)

4. Input parameters to each algorithm

5. Number of training iterations each algorithm runs (6000)

FAIRNESS OF PROCESS: To ensure the fairness of the testing procedure, I used statistical evaluations in addition to measuring percent accuracy. While percent accuracy is suggestive of an algorithm's classification performance, it can be misleading -- if an algorithm is skewed towards a particular outcome in an unbalanced dataset, percent accuracy may overlook the flawed model because the algorithm's bias happens to match the imbalance of outcomes. Data scientists regard the MCC as an effective evaluation metric because it accounts for an unequal distribution of classes in the dataset, as the short example below illustrates. Therefore, I calculated the MCC to ensure that my evaluations were unbiased.
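As a small hypothetical illustration: on an imbalanced set of 90 negatives and 10 positives, a degenerate model that always predicts "no disease" scores 90% accuracy yet has an MCC of 0, exposing that it has no real predictive power.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [0] * 90 + [1] * 10  # imbalanced toy labels: 90 negatives, 10 positives
y_pred = [0] * 100            # degenerate model: always predicts "no disease"

print(accuracy_score(y_true, y_pred))     # 0.9 -- looks strong
print(matthews_corrcoef(y_true, y_pred))  # 0.0 -- no predictive power
```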

MATERIALS: This research took place at home, on a MacBook Pro computer. I used Google TensorFlow and downloaded a public dataset from the University of California, Irvine Machine Learning Repository.


https://docs.google.com/document/d/1kGY1KY35WVsFnomADvK0EyuubOzSZhh0gzl92ebYPgQ/edit?usp=sharing

Results

RESULTS: The hypothesis that the five-layer regularized deep neural network would be the most accurate was supported for this dataset. Because this model was already the most accurate of the ones I tested, I applied hyperparameter tuning to it to further improve accuracy. The accuracy for each of the 1000 generated and tested hyperparameter combinations is not shown here, because I coded the optimization to output only the combination of hyperparameters that yields the highest accuracy. This maximizes the computational efficiency of the algorithm and keeps the results clear.

Hyperparameter tuning found the most accurate combination of values to be as follows: 9x7x5x3x1 node dimensions across the layers, a learning rate of 0.01, a regularization parameter of 0.7, and a sigmoid activation function.

This algorithm exhibited 99 percent accuracy on test data, and a Matthews correlation coefficient of 0.98.
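For concreteness, this reported configuration corresponds to a model like the following, reusing the hypothetical build_dnn helper sketched in the Method section (four hidden layers of 9, 7, 5, and 3 sigmoid nodes feeding a 1-node output):

```python
# Tuned configuration reported above, expressed with the Method section's
# hypothetical build_dnn helper -- not the project's exact code.
final_model = build_dnn([9, 7, 5, 3], l2_lambda=0.7,
                        learning_rate=0.01, activation="sigmoid")
```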

SIGNIFICANT PATTERNS: While the unregularized DNN exhibits a discrepancy between training and test accuracy (99% on training, 93% on test), regularization and hyperparameter tuning increased the test accuracy to 99%. Regularization improved test accuracy by reducing the impact of outliers in the training data. On a relatively small dataset, outliers can inhibit the algorithm's ability to learn consistent relationships in training data and do not add scientific value. Therefore, by regulating the effect of outliers on learning, regularization improves the algorithm's ability to generalize while maintaining the same scientific standard. Because regularization reduces overfitting, the algorithm's accuracy on training data is expected to decrease slightly.

It is also worth noting that logistic regression was the least accurate algorithm, likely because it fits a single linear decision boundary to features whose relationship to heart disease is non-linear.
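Such a baseline can be reproduced in a few lines of scikit-learn; the snippet below is a hypothetical stand-in for the project's own logistic regression code, reusing the data split from the Method sketch.

```python
from sklearn.linear_model import LogisticRegression

# A single linear decision boundary cannot capture the non-linear
# feature interactions that the deep network learns.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(baseline.score(X_test, y_test))
```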

COMPARISON TO PREVIOUS RESULTS: Stanford researchers used a deep neural network and obtained precision, recall, and F1 scores of 0.80, 0.82, and 0.80, respectively. HEARO outperforms these results, achieving precision, recall, and F1 of 0.98, 1.0, and 0.99, respectively. In 2016, Aravinthan et al. applied a Naive Bayes classifier and an artificial neural network to this dataset with accuracies of 81.3% and 82.5%, respectively. A study published in the International Journal of Computer Applications (Marikani) reported 95.4% accuracy for classification tree and random forest algorithms. At 99% accuracy, HEARO outperforms these results.

In addition, ten-fold cross-validation confirmed that HEARO effectively reduces overfitting, as the cross-validated accuracy was approximately the same as the accuracy on the fixed train/test split.

The 0.98 MCC illustrates HEARO's balanced evaluation of all class outcomes.

The Matthews correlation coefficient ranges from -1 to 1, where 1 represents perfect prediction across classes. Therefore, results of 0.98 MCC and 99% accuracy indicate a comprehensive data analysis model that is not skewed towards any particular outcome.

Graphs are included in this document, and are attached as well.


 

Conclusion

In this project, I developed a novel software model that can predict the probability that a patient has heart disease, based on the American Heart Association diagnostic guidelines.

My results show that a regularized deep neural network with optimized hyperparameters achieves 99 percent accuracy and 0.98 Matthews Correlation Coefficient, thereby using a novel method to outperform previously published results.

The original problem is that heart disease is the number one killer worldwide, and doctors' diagnostic screening tools can be inaccurate and slow. My research helps address this problem by using accessible patient information and a unique method to improve the accuracy of heart disease diagnosis. These results support my expected outcome by showing that a regularized deep neural network with hyperparameter tuning is extremely accurate for this problem. In addition, this research won First Place in Medicine at the Intel International Science and Engineering Fair (ISEF).

The results from this research are reliable, as I ran ten trials with cross-validation and validated generalization capabilities through statistical tests. Further research is necessary, as the features used are not the only factors that indicate the presence of heart disease. Future directions include extending this analysis to construct a more thorough model that includes heart visualizations and CT image data.

My results inspired me to keep asking questions about how I can incorporate more features such as image scans and electrocardiograms directly into the software, perform feature analytics, and test the software with large amounts of real patient data.

This research has huge potential for future impact. HEARO can improve the accuracy of heart disease diagnosis by helping doctors diagnose patients who have heart disease and identify those at risk of developing it. An accurate prediction of heart disease often leads to positive action that can be the difference between life and death for a patient. The software I coded can give doctors a potentially life-saving tool in the diagnostic toolbox.

This potential for impact inspired me to ask another question: how can I commercialize this software to make it accessible to real doctors?

I founded a business called Qardian Labs, which builds on this software by incorporating it into a user interface where a doctor can upload patient data, run the algorithm, and receive results in under 5 minutes. I am currently working to make the software accessible on the web through Google Cloud, and I have commitments from doctors to use it. The business was awarded a $15,000 grant from the Boyd Venture Challenge. To make the software further accessible to those in need, I plan to start a nonprofit to donate HEARO to under-resourced hospitals in the US and Brazil.

 


 

About me

My name is Sofia Tomov, and I am passionate about solving problems and helping others. I am a high school student enrolled at the University of Tennessee in Knoxville and the founder of Qardian Labs, a business that develops artificial intelligence-based software for evaluating heart disease risk. 

I am a 2018 First Place winner in medicine at Intel ISEF. As an aspiring computer scientist, I have pursued projects on algorithms for genomic analysis as well as machine learning. I was recognized by Business Insider as one of "15 Young Prodigies Who Are Already Changing the World," and my work has been featured in US News and World Report. Community activism is also important to me, as I have founded Teen Vote, a nonprofit dedicated to lowering the voting age and promoting civics education, and started a local chapter of Project CS Girls to teach middle and high school girls computer science.

One scientist who inspires me is Jean Bartik because of her vital role in programming ENIAC, which laid the foundation for today's computers. Her perseverance and problem solving benefited America during World War II and led to computing breakthroughs.

My future plans involve graduating high school and college simultaneously, obtaining a PhD and MBA, and growing my medical software business. Winning would help me make this dream into reality. The prizes would change my life by giving me resources and recognition to make my software accessible to doctors, hospitals, and clinics around the world, which can help save lives.

Health & Safety

My research did not require any specific health and safety guidelines. I coded the software independently at home on my personal computer using Google TensorFlow.

The dataset I used contains real anonymized patient data, is publicly available through the University of California Irvine Machine Learning Repository, and has been well-studied by numerous researchers.

I received some guidance from my mentor, Beverly Tomov. Her email is beverlytomov@gmail.com, and her phone number is +1 865-317-3590.

 

Bibliography, references, and acknowledgements

BIBLIOGRAPHY

[1] Akhil, Jabbar. ”Intelligent heart disease prediction system using random forest and evolutionary approach.” Journal of Network and Innovative Computing, Jan 2016.

[2] Aravinthan, K., Vanitha, M. ”A comparative study on prediction of heart disease using cluster and risk based approach.” International Journal of Advanced Research in Computer and Communication Engineering, Feb 2016.

[3] Beant, Kaur. ”Review on Heart Disease Prediction system using Data Mining Techniques.” International Journal on Recent and Innovation Trends in Computing and Communication, Vol 2(10), 2014.

[4] Beckett, Jamie. ”Change of Heart: How AI Can Predict Cardiac Failure Before It's Diagnosed.” Nvidia Technology Blog, 11 Apr. 2016.

[5] Ben-Hur, Asa. ”Support Vector Machines and Kernels for Computational Biology.” PLOS Computational Biology Journal, 31 Oct. 2008.

[6] ”CDC: U.S. deaths from heart disease, cancer on the rise.” American Heart Association News, 24 Aug. 2016.

[7] Chen, Chao et al. ”Using Random Forest to Learn Imbalanced Data.” UC Berkeley Department of Statistics. Jul. 2004.

[8] Chitra, R. ”Heart Disease Prediction System Using Supervised Learning Classifier.” Bonfring International Journal of Software Engineering and Soft Computing, Volume 3, Issue 1.

[9] Choi E, Schuetz A, et al. ”Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction.” Cornell University Library, 11 Feb. 2016.

[10] Das, Resul. ”Effective diagnosis of heart disease through neural networks ensembles.” Expert Systems with Applications. Volume 36, Issue 4. May 2009.

[11] Davie, AP et al. ”Value of the electrocardiogram in identifying heart failure due to left ventricular systolic dysfunction.” British Medical Journal. Volume 312, Issue 7025.

[12] Davies, S W et al. ”Clinical presentation and diagnosis of coronary artery disease: stable angina.” British Medical Bulletin. Volume 59, Issue 1. Oct. 2001.

[13] Detrano, Robert. (1990). Heart Disease Data Set [processed.cleveland.data]. Retrieved from https://archive.ics.uci.edu/ml/datasets/Heart+Disease

[14] Dreiseitl S, Ohno-Machado L. ”Logistic regression and artificial neural network classification models: a methodology review.” Journal of Biomedical Informatics, 2002.


[15] ”FHS Research Policies.” Framingham Heart Study: A project of the national heart, blood, and lung institute and Boston University.

[16] Fisher, Edward. ”Coronary Artery Disease - coronary heart disease.” American Heart Association. 26 Apr, 2017.

[17] Ghumbre, Shashikant. ”Heart Disease Diagnosis using Support Vector Machine.” International Conference on Computer Science and Information Technology, Dec. 2011

[18] Goff DC, Lloyd-Jones DM, Bennett G, Coady, et al. ”2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines.” National Institute of Health.

[19] Goldberg, Robert et al. ”Symptom Presentation in Patients Hospitalized with Acute Heart Failure.” NIH PubMed Central library. 2011.

[20] Guidi, G. et al. ”Random Forest for automatic assessment of heart failure severity in a telemonitoring scenario.” NIH PubMed Library. 2013.

[21] ”Heart Disease Facts.” American Heart Association 2015 Heart Disease and Stroke Update, compiled by AHA, CDC, NIH and other governmental sources.

[22] Heidenreich PA, Trogdon JG, Khavjou OA, et al. ”Forecasting the future of cardiovascular disease in the United States: a policy statement from the American Heart Association.” NIH PubMed Library. Jan. 24 2011.

[23] Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. ”Validation and validity of diagnoses in the General Practice Research Database: a systematic review.” British journal of clinical pharmacology, 2010.


[24] Hutson, Matthew. ”Self-taught artificial intelligence beats doctors at predicting heart attacks.” Science Magazine, 14 Apr. 2017.

[25] Jeserich, Michael, Merkely B., Schlosser P., et al. ”Concurrent Exercise-Associated Ventricular Complexes and a Prolonged QT Interval are Associated with Evidence of Myocarditis.” Journal of Cardiovascular Diseases and Diagnosis, Volume 6, Issue 1. 2018.

[26] Jin, Michelle. ”Arrhythmia Classification for Heart Attack Prediction.” 2014.

[27] Kawachi I, Sparrow D, Vokonas PS. ”Symptoms of anxiety and risk of coronary heart disease. The Normative Aging Study.” Circulation Journal, Volume 90, Issue 5. 1 Nov. 1990.

[28] Kim, Jae et al. ”Neural Network-Based Coronary Heart Disease Risk Prediction Using Feature Correlation Analysis.” Journal of Healthcare Engineering, 6 Sept. 2017.

[29] Knight, Will. ”The machines are getting ready to play doctor.” MIT Technology Review. 7 Jul. 2017.

[30] Lepeschkin, Eugene. ”The Measurement of the Q-T Interval of the Electrocardiogram.” Circulation Journal, Volume 6, Issue 3. 1 Sept. 1952.

[31] Loh, Brian et al. ”Deep learning for cardiac computer-aided diagnosis: benefits, issues & solutions.” mHealth Journal, 19 Oct. 2017.

[32] Marikani, T. Shyamala, K. ”Prediction of Heart Disease Using Supervised Learning Algorithms.” International Journal of Computer Applications, Volume 165, May 2017.

[33] Mcnulty, Eileen. ”Machine learning can make cardiology diagnoses better than doctors can.” Dataconomy, 8 Jul. 2014.

[34] Mortazavi, Bobak et al. ”Analysis of Machine Learning Techniques for Heart Failure Readmissions.” Circulation: Cardiovascular Quality and Outcomes. 8 Nov. 2016.

[35] Mukherjee, Sabyasachi et al. ”Diagnosis and Identification of Risk Factors for Heart Disease Patients Using Generalized Additive Model and Data Mining Techniques.” Journal of Cardiovascular Disease Research. Oct. 2017.

[36] Ng, Kenney. ”Using AI and science to predict heart failure.” IBM Research, 5 Apr. 2017.

[37] Obermeyer Z, Emanuel EJ. ”Predicting the Future: Big Data, Machine Learning, and Clinical Medicine.” The New England Journal of Medicine, 2016. National Institute of Health PubMed Library.

[38] Paschalidis, Yannis. ”How machine learning is helping us predict heart disease and diabetes.” Harvard Business Review. 30 May. 2017.

[39] Rajpurkar P, Hannun A, et al. ”Cardiologist-level Arrhythmia Detection with Convolutional Neural Networks.” Stanford University Research Publications. 6 Jul. 2017.

[40] R Chitra et al. ”Analysis of myocardial infarction risk factors in heart disease data set.” Allied Academies Biology and Medicine Case Report. Volume 1, Issue 1. 3 Aug. 2017.

[41] Strickland, Eliza. ”AI Predicts Heart Attacks and Strokes More Accurately Than Standard Doctor’s Method.” IEEE Spectrum. 1 May, 2017.

[42] Sudarshan, Vidya, Acharya U, Oh S, Adam M et al. ”Automated diagnosis of congestive heart failure using dual tree complex wavelet transform and statistical features extracted from 2 s of ECG signals.” Computers in Biology and Medicine. Volume 83. Apr 1, 2017.

[43] Taylor, Rod. ”Exercise-based rehabilitation for patients with coronary heart disease: systematic review and meta-analysis of randomized controlled trials.” The American Journal of Medicine. Volume 116, Issue 10. May 15, 2004.

[44] Tripoliti, Evanthia et al. ”Heart Failure: Diagnosis, Severity Estimation and Prediction of Adverse Events Through Machine Learning Techniques.” Computational and Structural Biotechnology Journal. Volume 15, 2017.

[45] Vadicherla, Deepti. ”Classification of Heart Disease Using SVM and ANN.” International Journal of Research in Computer and Communication Technology. Volume 2, Issue 9.

[46] Vinodhini, G. ”A comparative performance evaluation of neural network based approach for sentiment classification of online reviews.” Journal of King Saud University. Volume 28, Issue 1.

[47] Weng, Stephen. ”Can machine-learning improve cardiovascular risk prediction using routine clinical data?” PLOS Journals, 4 Apr. 2017.

[48] World Health Organization. Global Status Report on Noncommunicable Diseases. Geneva, Switzerland: World Health Organization, 2014.

[49] Yu, Oleg. ”Coronary heart disease diagnosis by artificial neural networks including genetic polymorphisms and clinical parameters.” Journal of Cardiology. Volume 59, Issue 2.

DATASET USED

Detrano, Robert. (1990). Heart Disease Data Set [processed.cleveland.data]. Retrieved from https://archive.ics.uci.edu/ml/datasets/Heart+Disease

COURSES TAKEN

To prepare for this project, I took several online machine learning courses through Coursera.

1. Machine Learning, taught by Andrew Ng on Coursera, offered by Stanford University

2. Neural Networks and Deep Learning, taught by Andrew Ng on Coursera (obtained certificate of completion)

3. Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization, taught by Andrew Ng on Coursera (obtained certificate of completion)

 

ACKNOWLEDGEMENTS

I would like to thank my parents/teachers, Beverly Tomov and Stanimire Tomov. My mother advised me on my project by suggesting resources for my literature review and providing feedback on my research plans and write-ups. My father gave me feedback on my code -- I coded everything on my own, and he recommended ways to make it more readable and computationally efficient.

I would also like to thank Dr. Myiesha Taylor, a critical care specialist at the University of Texas Southwestern Medical Center, and Dr. Jeffrey Hirsh, a cardiologist at the University of Tennessee Medical Center. Both doctors tested my software with the user interface and gave me feedback and testimonials. In addition, Dr. Hirsh has given me the potential opportunity to conduct a clinical trial in summer 2019 by testing my software with real patient data. As of November 2018, I have written a proposal and we are in the process of obtaining IRB approval to use patient data with informed consent.