Bot or User? - Twitter Classification

In this repository, the Heterogeneous Graph Transformer (HGT) [1] is applied to the TwiBot-22 Twitter dataset [2] to distinguish bots from real users on the platform.

Rather than classifying accounts by their text or images, this project focuses on behavioral and interaction patterns, together with user metadata, to separate real users from fake ones.
These patterns are harder to camouflage, which makes them a promising direction of research.

The reason is the rise of generative AI. As text and image generation models become more effective and easier to access, impersonations and fake media grow more sophisticated and can be difficult or impossible to distinguish from real content. Operators of malicious bots now have access to tools backed by multi-billion-dollar investments, which they can use to camouflage their bots' behavior. Because these generative models are trained with human feedback to produce output that is hard to tell apart from real content, the distribution shift between generated and genuine content, which used to be easy to detect, shrinks.

The experiments can be found in main.ipynb.

[1] https://arxiv.org/abs/2003.01332
[2] https://arxiv.org/abs/2206.04564

Experimental Results

Below, the training loss and the validation F1 score are shown (Figures 1 and 2).
The loss uses equal class weights for positives and negatives.
The model is trained for 25 epochs, which takes 50 hours on an NVIDIA P100 GPU.

One training epoch corresponds to 300,000 node classifications. The model is trained with the Adam optimizer, a learning rate of 0.0002, and a minibatch size of 32. Evaluation is done on the remaining 90,000 test-set nodes, which the model has not seen during training.
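
As a quick sanity check on these figures, the short Python sketch below derives the number of optimizer steps and the approximate time per step. It assumes that every epoch is split into full minibatches of 32 and that the 50 hours cover training only; the variable names are illustrative and are not taken from the notebooks.

```python
# Back-of-the-envelope training budget from the numbers in this README.
# Assumption: each epoch is processed in full minibatches of 32 nodes.
NODES_PER_EPOCH = 300_000
BATCH_SIZE = 32
EPOCHS = 25
TOTAL_HOURS = 50

steps_per_epoch = NODES_PER_EPOCH // BATCH_SIZE        # minibatches per epoch
total_steps = steps_per_epoch * EPOCHS                 # optimizer steps overall
hours_per_epoch = TOTAL_HOURS / EPOCHS                 # wall-clock time per epoch
seconds_per_step = TOTAL_HOURS * 3600 / total_steps    # wall-clock time per step

print(steps_per_epoch)             # 9375
print(total_steps)                 # 234375
print(hours_per_epoch)             # 2.0
print(round(seconds_per_step, 3))  # 0.768
```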

The result (37.1% F1, 69% precision, 25% recall) falls a few percentage points short of the F1 score reported in the TwiBot-22 paper (39.6%), possibly because the model uses only graph-based information, unlike the paper's implementation.

The naive logistic regression baseline (user vector plus degree counts from the graph) outperforms the model with an F1 of 43% (30% precision, 77% recall).
Since the GNN achieves results similar to those in the TwiBot-22 paper, we can tentatively infer, without knowing the paper's exact implementation, that the logistic regression baseline outperforms over 58% (17 of 29) of the algorithms implemented to demonstrate the TwiBot-22 dataset.
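
For illustration, the baseline's input could be assembled as in the sketch below: in-degree and out-degree counts are derived from an edge list and appended to each user's feature vector. This is only a sketch under assumptions; the function name `add_degree_features`, the edge-list format, and the toy values are hypothetical and are not taken from the repository's notebooks.

```python
from collections import Counter


def add_degree_features(user_vectors, edges):
    """Append out-degree and in-degree counts to each user's feature vector.

    user_vectors: dict mapping user id -> list of numeric features (hypothetical format)
    edges: iterable of (source_id, target_id) pairs, e.g. "follows" relations
    """
    edges = list(edges)  # materialize so the edges can be scanned twice
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    # Counter returns 0 for users that never appear in the edge list
    return {
        user: features + [out_degree[user], in_degree[user]]
        for user, features in user_vectors.items()
    }


# Toy example with made-up values
users = {"a": [0.1, 1.0], "b": [0.5, 0.0], "c": [0.9, 1.0]}
follows = [("a", "b"), ("a", "c"), ("c", "b")]
features = add_degree_features(users, follows)
print(features["a"])  # [0.1, 1.0, 2, 0]
print(features["b"])  # [0.5, 0.0, 0, 2]
```

The resulting vectors could then be passed to any standard logistic regression implementation.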

Without any tuning, the GNN might still show more desirable performance for real-world applications than the logistic regression model (69% vs. 30% precision), since higher precision means fewer real users are mistakenly flagged as bots.
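
The F1 scores quoted above follow from precision and recall via the harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes them, treating the reported precision and recall as fractions (0.69 and 0.25 for the GNN, 0.30 and 0.77 for the baseline). Small differences from the reported 37.1% and 43% are expected because the inputs are rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gnn_f1 = f1_score(0.69, 0.25)       # GNN: high precision, low recall
baseline_f1 = f1_score(0.30, 0.77)  # logistic regression: low precision, high recall

print(f"{gnn_f1:.3f}")       # 0.367
print(f"{baseline_f1:.3f}")  # 0.432
```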
