GPT-4chan

A text generator trained on 4chan’s /pol/ board

About GPT-4chan

The creator fine-tuned the GPT-J language model on more than 134.5 million posts collected from /pol/ over a period of three and a half years.

The board's thread structure was encoded in the training text, so the resulting model can produce posts that read like those of an actual human /pol/ user.

Model Description

GPT-4chan is a language model fine-tuned from GPT-J 6B on 3.5 years' worth of data from 4chan's Politically Incorrect (/pol/) board.

Training data

GPT-4chan was fine-tuned on the dataset Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board.

Training procedure

The model was trained for 1 epoch following GPT-J's fine-tuning guide.

Intended Use

GPT-4chan was trained on anonymously posted and sparsely moderated discussions of political topics. Its intended use is to reproduce text according to the distribution of its input data. It may also be a useful tool for investigating discourse in such anonymous online communities. Lastly, it has potential applications in tasks such as toxicity detection: initial experiments show promising zero-shot results when comparing a string's likelihood under GPT-4chan to its likelihood under GPT-J 6B.
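
The likelihood comparison mentioned above can be illustrated with a small sketch. It is not the procedure used in the experiments, which this card does not describe in detail. It assumes the comparison is a log-likelihood ratio, and it substitutes two toy unigram models with add-one smoothing for GPT-4chan and GPT-J 6B so that it runs without downloading any weights. The names make_unigram_scorer and log_likelihood_ratio, and the two tiny corpora, are hypothetical and exist only for illustration.

```python
import math
from collections import Counter
from typing import Callable, Iterable


def make_unigram_scorer(corpus: Iterable[str]) -> Callable[[str], float]:
    """Build a toy scorer: log-likelihood of a text under a unigram model
    with add-one smoothing. Stand-in for a real language model (assumption)."""
    counts = Counter(word for line in corpus for word in line.lower().split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserves mass for unseen words

    def log_likelihood(text: str) -> float:
        # Sum of per-word log-probabilities; Counter returns 0 for unseen words.
        return sum(
            math.log((counts[word] + 1) / (total + vocab_size))
            for word in text.lower().split()
        )

    return log_likelihood


def log_likelihood_ratio(
    text: str,
    score_target: Callable[[str], float],
    score_base: Callable[[str], float],
) -> float:
    """Positive values mean the text is more likely under the target model
    than under the base model."""
    return score_target(text) - score_base(text)


# Hypothetical miniature corpora standing in for the two models' training data.
target_scorer = make_unigram_scorer(["you are an idiot", "idiot idiot go away"])
base_scorer = make_unigram_scorer(["you are very kind", "thank you have a nice day"])

for sample in ("you idiot", "thank you"):
    ratio = log_likelihood_ratio(sample, target_scorer, base_scorer)
    print(f"{sample!r}: log-likelihood ratio = {ratio:.3f}")
```

With real models, each scorer would instead sum the token log-probabilities that the model assigns to the string. A string would then be flagged when the ratio exceeds a threshold chosen on labelled validation data; no such threshold is given in this card.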