Author ORCID Identifier
https://orcid.org/0000-0003-1355-1330
Date of Award
12-13-2021
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science
First Advisor
Yubao Wu
Second Advisor
Zhipeng Cai
Third Advisor
Yingshu Li
Fourth Advisor
Yan Huang
Abstract
As an increasing share of our lives is spent interacting online through social media platforms, more and more people seek out and consume information from social media rather than from traditional sources. The pervasive use of social media produces massive amounts of data at an unprecedented speed. It is cheap to provide information online and much faster to disseminate it through social media. Large volumes of fake data are produced online for various purposes, such as financial and political gain. Social media platforms have proved to be a popular venue where vendors can post illicit drug ads with relative ease, expansive reach, and little cost. We present two machine learning methods to detect illicit ads on Google+ and use graph theory to detect communities that post spam on social media.
However, this approach has one significant drawback: it needs labeled data for the classification process. There are too many posts to tag, which makes labeling tedious and time-consuming and prevents the previous framework from working effectively and efficiently. One remedy is to keep users in the loop by asking them to mark posts. Nevertheless, users' concerns about their privacy have risen sharply in recent years. Federated learning provides an alternative approach that trains models without collecting users' data. There are two main problems with this structure. The first is the extent to which we can trust users' reports; some users may even intentionally report regular posts as spam. We design a weighted federated averaging framework, a system that evaluates each user's credit to detect attackers or dishonest users. The underlying idea is to identify poorly performing returned local models through a testing procedure. The second is that it is challenging for the global model to perform well across different devices; that is, the model may not be fair to all users. Considering that clients have similar behaviors or features, we organize the clients into groups, give each cluster a weight according to the clustering results, and select participants across these clusters to train the global model.
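As a rough illustration of the weighted-averaging idea described in the abstract, the sketch below scores each returned local model on a server-side test, gives low-scoring clients zero weight, and averages the remaining models in proportion to their scores. The scoring scale, the hard threshold, and the names `credit_weights` and `weighted_fed_avg` are assumptions made for this example; the abstract does not specify how credit is computed, so this should not be read as the dissertation's actual procedure.

```python
from typing import List, Sequence

Vector = List[float]


def credit_weights(scores: Sequence[float], threshold: float) -> List[float]:
    """Turn per-client test scores into normalized aggregation weights.

    Assumption: a client whose local model scores below `threshold` on the
    server-side test is treated as untrusted and receives weight 0.
    """
    kept = [s if s >= threshold else 0.0 for s in scores]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("no client passed the credit threshold")
    return [k / total for k in kept]


def weighted_fed_avg(local_models: Sequence[Vector],
                     weights: Sequence[float]) -> Vector:
    """Weighted average of flat parameter vectors, one vector per client."""
    if len(local_models) != len(weights):
        raise ValueError("need exactly one weight per local model")
    dim = len(local_models[0])
    return [
        sum(w * model[i] for model, w in zip(local_models, weights))
        for i in range(dim)
    ]


# Hypothetical round: two honest clients and one client whose model
# performs poorly on the test set (score 0.1).
models = [[1.0, 2.0], [3.0, 4.0], [100.0, -100.0]]
scores = [0.9, 0.9, 0.1]

weights = credit_weights(scores, threshold=0.5)
global_model = weighted_fed_avg(models, weights)
```

In this toy round the third client is excluded, so the global model is the plain average of the first two. A softer scheme, such as weights proportional to score with no cutoff, would fit the same interface.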
DOI
https://doi.org/10.57709/11964673
Recommended Citation
Zhao, Fengpan, "Spam Detection in Social Media With Privacy Protection." Dissertation, Georgia State University, 2021.
doi: https://doi.org/10.57709/11964673