INCORPORATING EMR AND GENOMIC DATA USING NLP AND MACHINE LEARNING TO REFINE CANCER TREATMENT
Electronic Theses and Dissertations
- title
- INCORPORATING EMR AND GENOMIC DATA USING NLP AND MACHINE LEARNING TO REFINE CANCER TREATMENT
- author
- Guan, Meijian
- abstract
- Electronic medical records (EMR) have collected vast amounts of clinical data, including genomic testing results. Unlike numerical data, the majority of EMR content is unstructured free text that is not easily processed by computers. In this study, we explored how natural language processing (NLP) and machine learning can help evaluate the impact of genomic testing results on clinical practice, using free-text progress reports of cancer patients. We obtained 5,889 de-identified progress reports for 755 cancer patients from the Wake Forest Baptist Health Comprehensive Cancer Center for our analyses. An NLP system was implemented to process the free-text data and extract information related to next-generation sequencing (NGS). Three types of recurrent neural network (RNN) were applied to classify documents into a treatment-change group and a no-treatment-change group: the gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi). The performance of the RNNs was compared with that of five machine learning algorithms: Naive Bayes (NB), K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and logistic regression (LR). Our results suggested that, overall, the RNNs outperformed the traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs. In addition, pre-trained word embeddings can improve the results of the RNNs and reduce their training time. Our findings demonstrate that RNN-based algorithms have advantages in classifying unstructured clinical progress reports.
- subject
- cancer
- Deep learning
- electronic medical records
- Natural language processing
- neural network
- contributor
- Cho, Samuel (committee chair)
- John, David (committee member)
- Topaloglu, Umit (committee member)
- Ballard, Grey (committee member)
- date
- 2018-05-24T08:36:17Z (accessioned)
- 2018-11-23T09:30:12Z (available)
- 2018 (issued)
- degree
- Computer Science (discipline)
- embargo
- 2018-11-23 (terms)
- identifier
- http://hdl.handle.net/10339/90752 (uri)
- language
- en (iso)
- publisher
- Wake Forest University
- type
- Thesis
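The abstract lists Naive Bayes among the five traditional baselines compared against the RNNs. As a minimal sketch of that kind of baseline, and not the thesis's own implementation, the code below trains a multinomial Naive Bayes classifier with add-one smoothing to separate treatment-change notes from no-treatment-change notes. The `tokenize` helper, the `NaiveBayesClassifier` class, the labels `change`/`no_change`, and the six training sentences are all hypothetical, written for illustration; they are not drawn from the Wake Forest progress reports.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase a note and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # number of notes per class
        self.word_counts = defaultdict(Counter)   # class -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total tokens per class, used in the smoothed likelihood denominator.
        self.total_tokens = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label | doc): log prior plus summed log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_tokens[label] + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Invented example sentences for illustration only; not study data.
train_docs = [
    "NGS panel shows EGFR mutation, switch to targeted therapy",
    "start new regimen after sequencing results",
    "change treatment to immunotherapy based on tumor profiling",
    "continue current chemotherapy, patient tolerating well",
    "stable disease, no change in plan, continue regimen",
    "follow up in three months, continue current therapy",
]
train_labels = ["change"] * 3 + ["no_change"] * 3

model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict("switch to targeted therapy after NGS results"))  # change
print(model.predict("patient stable, continue current regimen"))      # no_change
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids floating-point underflow on long progress reports.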