A text generator trained on 4chan’s /pol/ board
The creator fine-tuned the GPT-J language model on more than 134.5 million /pol/ posts spanning three and a half years of the board's activity.
The thread structure of the board was preserved in the training data. The result is a model that can post to /pol/ in the manner of an actual human.
GPT-4chan is a language model fine-tuned from GPT-J 6B on 3.5 years worth of data from 4chan's politically incorrect (/pol/) board.
GPT-4chan was fine-tuned on the dataset "Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board".
The model was trained for one epoch, following GPT-J's fine-tuning guide.
GPT-4chan is trained on anonymously posted and sparsely moderated discussions of political topics. Its intended use is to reproduce text according to the distribution of its input data. It may also be a useful tool to investigate discourse in such anonymous online communities. Lastly, it has potential applications in tasks such as toxicity detection: initial experiments show promising zero-shot results when comparing a string's likelihood under GPT-4chan to its likelihood under GPT-J 6B.
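The likelihood-comparison idea above can be sketched without the actual models: score a string by the difference between its log-likelihood under a model of toxic text and under a baseline model. The sketch below is a toy illustration only, using add-one-smoothed character-bigram models in place of GPT-4chan and GPT-J 6B; the corpora, vocabulary size, and smoothing constant are illustrative assumptions, not part of the original experiments.

```python
# Toy sketch of zero-shot toxicity scoring via likelihood comparison.
# Bigram models stand in for GPT-4chan (the "toxic" model) and GPT-J
# (the "neutral" baseline); all training strings here are made up.
import math
from collections import defaultdict

def train_bigram(corpus):
    """Count character-bigram frequencies over a list of strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
    return counts

def log_likelihood(counts, text, alpha=1.0, vocab_size=128):
    """Add-one-smoothed log-likelihood of a string under a bigram model."""
    total = 0.0
    for a, b in zip(text, text[1:]):
        row = counts[a]
        total += math.log((row[b] + alpha) / (sum(row.values()) + alpha * vocab_size))
    return total

def toxicity_score(text, toxic_model, neutral_model):
    """Positive when the string is more likely under the 'toxic' model."""
    return log_likelihood(toxic_model, text) - log_likelihood(neutral_model, text)

# Illustrative usage with made-up corpora: strings resembling the toxic
# corpus get a positive score, strings resembling the baseline a negative one.
toxic = train_bigram(["zork zork zork"])
neutral = train_bigram(["have a good day"])
print(toxicity_score("zork", toxic, neutral))      # positive
print(toxicity_score("good day", toxic, neutral))  # negative
```

With the real models, the same ratio would be computed from the per-token cross-entropy each language model assigns to the string; the bigram stand-ins only demonstrate the classification principle.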