Author ORCID Identifier

Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Yubao Wu

Second Advisor

Zhipeng Cai

Third Advisor

Yingshu Li

Fourth Advisor

Yan Huang


As an increasing amount of our lives is spent interacting online through social media platforms, more and more people seek out and consume information from social media rather than from traditional sources. The pervasive use of social media produces massive amounts of data at an unprecedented speed. Providing information online is cheap, and disseminating it through social media is much faster than through traditional channels. Large volumes of fake data are produced online for various purposes, such as financial and political gain. Social media platforms have proven to be a popular venue where vendors can post illicit drug ads with relative ease, expansive reach, and little cost. We present two machine learning methods to detect illicit ads on Google+ and use graph theory to detect communities that post spam on social media.

However, these detection methods have one significant drawback: they need labeled data for the classification process. There are too many posts to tag, which makes labeling tedious and time-consuming and prevents the previous framework from working effectively and efficiently. One solution is to keep users in the loop by asking them to mark the posts. Nevertheless, users' concerns about their privacy have risen sharply in recent years. Federated learning provides an alternative approach that trains models without collecting users' data. There are two main problems in this setting. The first is the extent to which we can trust our users' reports. Some users may even intentionally report regular posts. We design a weighted federated averaging framework, a system that evaluates each user's credit in order to detect attackers or dishonest users. The underlying idea is to identify returned local models that perform poorly by means of a testing procedure. The second problem is that it is challenging for the global model to perform well across different devices; that is, the model may not be fair to all users. Considering that clients have similar behaviors or features, we organize the clients into groups, give each cluster a weight according to the clustering results, and select participants across these clusters to train the global model.
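
The credit-weighted aggregation described above might be sketched as follows. This is a minimal illustration and not the dissertation's actual implementation: the names `credit_scores` and `weighted_fed_avg`, the `min_score` threshold, and the use of a held-out test score as the credit are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Model = List[float]  # a model flattened to a list of parameters (assumption)


def credit_scores(
    local_models: Sequence[Model],
    evaluate: Callable[[Model], float],
    min_score: float = 0.5,
) -> List[float]:
    """Score each returned local model with a testing procedure.

    `evaluate` is a hypothetical function returning a test score in [0, 1].
    Models scoring below `min_score` (a hypothetical threshold) get credit 0,
    so suspected attackers or dishonest users do not influence the average.
    """
    scores = [evaluate(m) for m in local_models]
    return [s if s >= min_score else 0.0 for s in scores]


def weighted_fed_avg(local_models: Sequence[Model], credits: Sequence[float]) -> Model:
    """Average parameters coordinate-wise, weighting each client by its credit."""
    total = sum(credits)
    if total == 0:
        raise ValueError("no client has positive credit")
    n_params = len(local_models[0])
    return [
        sum(c * m[i] for m, c in zip(local_models, credits)) / total
        for i in range(n_params)
    ]


# Toy usage: the third client returns an implausible model and scores poorly.
models = [[1.0, 2.0], [3.0, 4.0], [100.0, -100.0]]
toy_evaluate = lambda m: 0.1 if abs(m[0]) > 10 else 0.8  # stand-in for a real test set
credits = credit_scores(models, toy_evaluate)
global_model = weighted_fed_avg(models, credits)
```

In this toy run the third client receives zero credit, so the global model is the average of the first two clients only. A real system would likely use a softer weighting and track credit over multiple rounds.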

