About the models
This app uses a deep-learning model called DistilBERT (publicly available on the
Hugging Face Hub), which is a smaller version of the
BERT large language model introduced by Google in 2018. Along with the model behind ChatGPT, DistilBERT and BERT belong to a class of neural networks called
transformers, and were pre-trained to perform text prediction on a very large corpus of documents consisting of about 3.3 billion words. When building the AI detector, I fine-tuned a number of large language models on the smaller collection of texts I compiled, with the goal of distinguishing between human-written and AI-generated text.
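For a sense of what that fine-tuning looks like, here is a minimal sketch using the Hugging Face transformers and datasets libraries. The file names, hyperparameters and output directory are illustrative stand-ins, not the values actually used:

    from datasets import load_dataset
    from transformers import (
        AutoTokenizer,
        AutoModelForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    # Hypothetical CSV files with columns "text" and "label" (0 = human, 1 = AI)
    dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def tokenize(batch):
        # Truncate/pad each text to DistilBERT's fixed input length
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    dataset = dataset.map(tokenize, batched=True)

    # Two output labels: human-written vs AI-generated
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="detector", num_train_epochs=3),
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
    )
    trainer.train()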
My dataset consisted of equal numbers of human-written and AI-generated texts, 24,180 of which were used for training and 4,772 for testing. The human-written texts came from three sources: 1) the IMDB review database, 2) the introductory sections of Wikipedia articles, and 3) the r/AITA and r/relationship_advice subreddits. For each human-written text, I prompted GPT-3.5 Turbo to write a similar text.
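That pairing step could be automated along these lines, assuming the openai Python package; the prompt wording here is only a guess at the kind of instruction used, not the exact one:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def generate_counterpart(human_text: str) -> str:
        # Ask GPT-3.5 Turbo for a text similar in topic, length and style
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": (
                    "Write a text on the same topic and of similar length "
                    "and style to the following:\n\n" + human_text
                ),
            }],
        )
        return response.choices[0].message.content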
The model performed extremely well during testing: only 23 of the 4,772 test texts were misclassified, an accuracy of about 99.5%. It is of course expected to perform less well on texts that differ significantly in style from the training data (e.g. poetry, film scripts, or legal documents), or on texts generated by a large language model other than GPT-3.5 Turbo. The hope is that the knowledge the model acquired from the training data is transferable to written language from other sources, but it remains to be seen how well the model generalises beyond its training data.
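Once trained, the detector can be queried in a couple of lines. This sketch assumes the fine-tuned checkpoint was saved to a local directory called detector (a hypothetical name) and uses the default LABEL_0/LABEL_1 label names:

    from transformers import pipeline

    # Load the fine-tuned checkpoint from a local directory (name assumed)
    classifier = pipeline("text-classification", model="detector")

    # Returns the predicted label and a confidence score, e.g.
    # [{'label': 'LABEL_1', 'score': 0.99}]
    print(classifier("Paste the text you want to check here."))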