INCORPORATING EMR AND GENOMIC DATA USING NLP AND MACHINE LEARNING TO REFINE CANCER TREATMENT

Electronic Theses and Dissertations

Item Details

abstract
Electronic medical records (EMR) hold vast amounts of clinical data, including genomic testing results. Unlike numerical data, the majority of EMR content is unstructured free text that computers cannot easily process. In this study, we explored how natural language processing (NLP) and machine learning can help evaluate the impact of genomic testing results on clinical practice, using free-text progress reports of cancer patients. We obtained 5,889 de-identified progress reports for 755 cancer patients from Wake Forest Baptist Health Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract information related to next-generation sequencing (NGS). Three types of recurrent neural network (RNN), namely gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into a treatment-change group and a no-treatment-change group. The performance of the RNNs was compared with that of five machine learning algorithms: Naive Bayes (NB), K-nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR). Our results suggest that, overall, the RNNs outperformed the traditional machine learning algorithms, and that LSTM_Bi performed best among the RNNs. In addition, pre-trained word embeddings improved the results of the RNNs and reduced their training time. Our findings demonstrate that RNN-based algorithms have advantages in classifying unstructured clinical progress reports.
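To illustrate the kind of document classification the abstract describes, the following is a minimal sketch of a bag-of-words Naive Bayes baseline, one of the five traditional algorithms named above. It is not the thesis's implementation: the tokenizer, the function names (`tokenize`, `train_naive_bayes`, `predict`), the label names, and the four toy notes are all assumptions invented for illustration, and the record does not describe the actual features or preprocessing used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    # Count documents per class and word occurrences per class.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    # Return the class with the highest log-posterior,
    # using add-one (Laplace) smoothing for word likelihoods.
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy notes, invented for illustration only (not study data).
toy_docs = [
    "ngs results reviewed therapy switched to targeted agent",
    "mutation detected plan to change regimen",
    "ngs results reviewed continue current regimen",
    "no actionable mutation continue current therapy",
]
toy_labels = ["change", "change", "no_change", "no_change"]

model = train_naive_bayes(toy_docs, toy_labels)
```

Calling `predict(model, some_note_text)` returns one of the two toy labels. An RNN-based classifier such as those compared in the thesis would replace the word-count features with a learned sequence representation.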
subject
cancer
Deep learning
electronic medical records
Natural language processing
neural network
contributor
Guan, Meijian (author)
Cho, Samuel (committee chair)
John, David (committee member)
Topaloglu, Umit (committee member)
Ballard, Grey (committee member)
date
2018-05-24T08:36:17Z (accessioned)
2018-11-23T09:30:12Z (available)
2018 (issued)
degree
Computer Science (discipline)
embargo
2018-11-23 (terms)
identifier
http://hdl.handle.net/10339/90752 (uri)
language
en (iso)
publisher
Wake Forest University
title
INCORPORATING EMR AND GENOMIC DATA USING NLP AND MACHINE LEARNING TO REFINE CANCER TREATMENT
type
Thesis
