Language as a Reflection of Human Minds: Machine Learning-Based Corpus Analyses of Linguistic Patterns in Online Mental Health Communities
Abstract
Mental disorders such as depression, post-traumatic stress disorder (PTSD), and anxiety disorders affect millions of people worldwide, disrupting social functioning and often leading to serious impairments. Many individuals now turn to online mental health communities, which offer anonymity and may reduce the social stigma, self-stigma, and financial barriers associated with traditional care. As these platforms continue to expand, a closer examination of their discourse is essential for understanding and supporting potentially vulnerable people. Previous research in linguistics and psychology has shown that language reflects cognitive and emotional states. Studies of mental health discourse have used qualitative approaches, such as discourse analysis, and quantitative methods, such as frequency-based analyses of words and phrases associated with mental health conditions. More recently, computational research using natural language processing (NLP) and machine learning (ML) has advanced online mental health studies through predictive modeling and topic modeling. Although these approaches have generated valuable insights, they also have limitations. Qualitative studies often rely on small datasets, while quantitative and computational methods may depend on predefined linguistic categories or prioritize scalability over detailed linguistic interpretation. To address these limitations, this study integrates NLP, ML, and corpus linguistic approaches to examine online mental health communities on Reddit, focusing on forums dedicated to depression, PTSD, and anxiety disorders. It investigates distinctive linguistic features and prevalent discussion topics while developing an ML classification model based on a broad range of linguistic features, including n-grams, TF-IDF terms, part-of-speech tags, and dependency labels. Beyond model development, the study conducts an in-depth linguistic analysis of recurring patterns and topics in community discourse.
By combining computational efficiency with linguistic depth, this research offers a more comprehensive account of language use in online mental health communities and contributes both theoretical insights and practical implications for mental health diagnosis and intervention.
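To make the feature types named above more concrete, the following is a minimal sketch of how word n-grams can be weighted with TF-IDF. It is an illustration only and not the study's actual pipeline: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenization, the unsmoothed IDF formula, and the three toy posts are all assumptions made for this example. Part-of-speech tags and dependency labels are left out because they would require a syntactic parser.

```python
import math
from collections import Counter

def word_ngrams(text, n_max=2):
    """Return lowercase word n-grams (n = 1..n_max) for one post.

    Hypothetical helper: tokenization is a plain whitespace split.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def tfidf_vectors(posts, n_max=2):
    """Map each post to a dict of n-gram -> TF-IDF weight.

    Assumed weighting: relative term frequency times log(N / document frequency).
    """
    counts = [Counter(word_ngrams(p, n_max)) for p in posts]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each n-gram counted once per post
    n_docs = len(posts)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {}
        for gram, k in c.items():
            idf = math.log(n_docs / doc_freq[gram])
            vec[gram] = (k / total) * idf
        vectors.append(vec)
    return vectors

# Invented toy posts, not drawn from any corpus.
posts = [
    "i feel empty and tired",
    "i keep having nightmares",
    "i feel nervous and tired",
]
vectors = tfidf_vectors(posts)
```

Under this weighting, an n-gram that occurs in every post (here the unigram "i") receives a weight of zero, while an n-gram confined to one post (such as "empty") receives the highest weight, which is why TF-IDF terms can serve as candidate distinctive features for a classifier.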
