Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Yubao Wu

Second Advisor

David Maimon

Third Advisor

Yanqing Zhang

Fourth Advisor

Yingshu Li


In this dissertation, we investigate the application of data mining algorithms to online criminal information. Since the start of the information era, the development of the World Wide Web has made people's lives far more convenient. At the same time, however, criminals use the web for illegal activities such as drug smuggling and online fraud. Cryptomarkets and instant messaging software are the two most popular online platforms for criminal activities. We apply data mining algorithms to open-source intelligence from these two platforms to extract useful information.

Cryptomarkets (or darknet markets) are commercial hidden-service websites that operate on The Onion Router (Tor) anonymity network, and they have grown rapidly in recent years. In this dissertation, we discover interesting characteristics of Bitcoin transaction patterns in cryptomarkets. We present a method to identify vendors' Bitcoin addresses by matching vendors' feedback reviews with Bitcoin transactions in the public ledger. We further propose a cost-effective algorithm that accelerates both steps of the method. Comprehensive experiments demonstrate the effectiveness and efficiency of the proposed method.
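As a rough illustration of the matching idea, the sketch below pairs each feedback review with ledger transactions whose amount and timestamp fall within a tolerance, then tallies how often each receiving address is matched. Everything in it is an assumption made for illustration: the `Review` and `Tx` record types, the `match_reviews` helper, the tolerance values, and the sample data are hypothetical. The dissertation's actual matching rules and acceleration algorithm are not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One feedback review on a vendor page (hypothetical record type)."""
    timestamp: int      # Unix seconds when the review was posted
    amount_btc: float   # order value shown with the review


@dataclass(frozen=True)
class Tx:
    """One ledger transaction output (hypothetical record type)."""
    timestamp: int      # Unix seconds of the confirming block
    amount_btc: float   # value received
    address: str        # receiving Bitcoin address


def match_reviews(reviews, txs, max_delay=3600, rel_tol=0.01):
    """Count, per address, how many reviews have a plausible matching tx.

    A tx is "plausible" for a review when its amount is within rel_tol of
    the review amount and its timestamp is within max_delay seconds of the
    review. Both thresholds are illustrative assumptions.
    """
    hits = Counter()
    for r in reviews:
        matched = set()
        for t in txs:
            close_in_time = abs(t.timestamp - r.timestamp) <= max_delay
            close_in_value = abs(t.amount_btc - r.amount_btc) <= rel_tol * r.amount_btc
            if close_in_time and close_in_value:
                matched.add(t.address)
        for addr in matched:  # each review votes at most once per address
            hits[addr] += 1
    return hits


# Made-up sample data: three reviews for one vendor, five ledger outputs.
reviews = [Review(1000, 0.50), Review(5000, 0.20), Review(9000, 0.75)]
txs = [
    Tx(1200, 0.50, "addr_A"),
    Tx(5100, 0.20, "addr_A"),
    Tx(5200, 0.20, "addr_B"),
    Tx(9100, 0.75, "addr_A"),
    Tx(9100, 0.10, "addr_C"),
]
candidates = match_reviews(reviews, txs)
best_address, votes = candidates.most_common(1)[0]
```

In this toy data, the address that matches the most reviews is the strongest candidate for the vendor. The nested loop compares every review with every transaction, which would not scale to the full public ledger; a practical version would need some form of indexing or pruning.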

Instant messaging (IM) software is another base for these criminal activities. Users of IM applications can easily hide their identities while interacting with strangers online. In this dissertation, we propose an effective model to discover hidden networks of influence between members of a group chat. By converting the whole chat history into a sequence of events, we model the message sequences as a multi-dimensional Hawkes process and learn the Granger causality between individuals. We learn the influence graph by applying an expectation-maximization (EM) algorithm to our text-biased multi-dimensional Hawkes process. IM users often maintain multiple accounts, so we also propose a model to cluster the accounts that belong to the same user.
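For orientation, the conditional intensity of a standard multi-dimensional Hawkes process is commonly written as below. This is the textbook form only; the text-biased variant proposed in the dissertation is not specified in this abstract and is not reproduced here. The symbols are conventional notation assumed for illustration: $D$ is the number of chat members, $\mu_i$ is the baseline message rate of member $i$, $t^{j}_{k}$ is the time of the $k$-th message sent by member $j$, $\phi$ is a non-negative decay kernel, and $\alpha_{ij} \ge 0$ is the strength with which messages from $j$ excite $i$.

```latex
\lambda_i(t) \;=\; \mu_i \;+\; \sum_{j=1}^{D} \; \sum_{k:\, t^{j}_{k} < t} \alpha_{ij}\, \phi\bigl(t - t^{j}_{k}\bigr),
\qquad i = 1, \dots, D
```

Under this form, and assuming a kernel $\phi$ that is not identically zero, member $j$ does not Granger-cause member $i$ exactly when $\alpha_{ij} = 0$, so the non-zero entries of the matrix $(\alpha_{ij})$ can be read as the edges of an influence graph.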

