Efficient Information Extraction Using Statistical Relational Learning
Electronic Theses and Dissertations
Item Details
- title
- Efficient Information Extraction Using Statistical Relational Learning
- author
- Picado Leiva, Jose Manuel
- abstract
- Information extraction has gained significant importance due to the dramatic increase in information stored as natural language text. In this thesis we explore a machine learning-based approach to support a natural language processing (NLP) algorithm, as well as an application of information extraction. One of the challenges of learning-based approaches is the requirement for human-annotated examples. Current successful approaches alleviate this problem by employing some form of distant supervision. In this work, we take a different approach: we create weakly supervised examples for relations by using commonsense knowledge. The key innovation is that this commonsense knowledge is completely independent of the natural language text. This helps when learning the full model for information extraction, as opposed to simply learning the parameters of a known model. We demonstrate on two domains that this form of weak supervision yields superior results when learning structure compared to simply using the gold standard labels. In the second part of this thesis, we consider the problem of Adverse Drug Event (ADE) discovery. Several methods have been proposed for ADE discovery, exploiting various information sources such as health data, social network data, and scientific literature. We propose an NLP-based method that exploits scientific literature to quantitatively evaluate proposed ADEs. We validate our approach on a common ADE dataset, where we find better agreement than state-of-the-art ADE discovery methods.
- subject
- information extraction
- Markov logic networks
- probabilistic examples
- statistical relational learning
- weak supervision
- contributor
- Natarajan, Sriraam (committee chair)
- John, David J (committee member)
- Turkett, William (committee member)
- date
- 2013-06-06T21:19:34Z (accessioned)
- 2013-06-06T21:19:34Z (available)
- 2013 (issued)
- degree
- Computer Science (discipline)
- identifier
- http://hdl.handle.net/10339/38554 (uri)
- language
- en (iso)
- publisher
- Wake Forest University
- type
- Thesis