Loading...
Thumbnail Image
Item

Machine Learning Methods for Finding Textual Features of Depression from Publications

Wang, Zhibo
Citations
Altmetric:
Abstract

Depression is a common but serious mood disorder. In 2015, WHO reports about 322 million people were living with some form of depression, which is the leading cause of ill health and disability worldwide. In USA, there are approximately 14.8 million American adults (about 6.7% percent of the US population) affected by major depressive disorder. Most individuals with depression are not receiving adequate care because the symptoms are easily neglected and most people are not even aware of their mental health problems. Therefore, a depression prescreen system is greatly beneficial for people to understand their current mental health status at an early stage. Diagnosis of depressions, however, is always extremely challenging due to its complicated, many and various symptoms. Fortunately, publications have rich information about various depression symptoms. Text mining methods can discover the different depression symptoms from literature. In order to extract these depression symptoms from publications, machine learning approaches are proposed to overcome four main obstacles: (1) represent publications in a mathematical form; (2) get abstracts from publications; (3) remove the noisy publications to improve the data quality; (4) extract the textual symptoms from publications. For the first obstacle, we integrate Word2Vec with LDA by either representing publications with document-topic distance distributions or augmenting the word-to-topic and word-to-word vectors. For the second obstacle, we calculate a document vector and its paragraph vectors by aggregating word vectors from Word2Vec. Feature vectors are calculated by clustering word vectors. Selected paragraphs are decided by the similarity of their distances to feature vectors and the document vector to feature vectors. For the third obstacle, one class SVM model is trained by vectored publications, and outlier publications are excluded by distance measurements. For the fourth obstacle, we fully evaluate the possibility of a word as a symptom according to its frequency in entire publications, and local relationship with its surrounding words in a publication.

Comments
Description
Date
2017-12-14
Journal Title
Journal ISSN
Volume Title
Publisher
Research Projects
Organizational Units
Journal Issue
Keywords
Information retrieval, Text representation, Outlier document detection, Textual feature extraction, Word2Vec, Latent Dirichlet allocation
Citation
Wang, Zhibo (2017). Machine Learning Methods for Finding Textual Features of Depression from Publications. Dissertation, Georgia State University. https://doi.org/10.57709/11192612
Embargo Lift Date
2017-12-04
Embedded videos