The poor success rate of OpenAI’s classifier

On January 31st, 2023, OpenAI’s team released a “text classifier” meant to determine whether a text was written by a language model or by a human. We are not sure why they released it, because there are already several classifiers on the internet that are much better than what the OpenAI team released.

The classifier from OpenAI is so bad you really wonder what it is supposed to do. The only good thing about it is that they are open and transparent with their data, so you can see for yourself how poor it is and then use your own judgement.

Here are the current statistics, from their website: 

In other words:

  • 80% of human-written texts are classified as either “Unclear”, “Possibly AI-generated” or “Likely AI-generated”.
  • Only 20% of human-written texts are classified as “Unlikely” or “Very unlikely to be AI-generated”. That’s one in five.

Correctly identifying one out of five texts as written by a human is very poor. 

And it gets worse: three out of ten texts (30%) written by humans are wrongly classified as AI-generated.

This means that a text written by a human is 50% more likely to be classified as “AI-generated” than correctly classified as written by a human.
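The arithmetic behind these figures can be checked with a short sketch. It uses only the percentages quoted above; the variable names are ours, and it assumes that whatever is neither correct nor wrong ended up as “Unclear”.

```python
# Shares of human-written texts per outcome, in percent (figures quoted above).
correct = 20       # "Unlikely" or "Very unlikely to be AI-generated"
not_correct = 80   # "Unclear", "Possibly" or "Likely AI-generated"
wrong = 30         # wrongly classified as AI-generated

# Assumption: everything that is neither correct nor wrong is "Unclear".
unclear = not_correct - wrong

# How much more likely a wrong verdict is than a correct one.
relative_excess = wrong / correct - 1

print(f"unclear: {unclear}%")
print(f"wrong vs. correct: {relative_excess:.0%} more likely")
```

So for every two human texts the classifier gets right, it gets three wrong.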

Here is a pie chart showing how bad it is: 

And here is a bar chart showing the same thing. The red is wrong, the yellow is useless and the green is correct.

As you can clearly see from the graph, when trying to classify a human-written text, OpenAI’s classifier is wrong more often than it is correct. By a huge 50% margin.

From their own statistics, you can also see that a classification of “Likely AI-generated” or “Very likely AI-generated” can’t be trusted either.

When something is flagged as AI-generated, there is a 42% chance it was written by a human. That’s how many false positives this “classifier” churns out. That’s really bad.
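For readers who want to see how such a figure comes about, here is a minimal sketch. The function name and the counts are hypothetical, chosen only to reproduce the 42% quoted above; they are not OpenAI’s actual numbers.

```python
def human_share_of_flagged(human_flagged: int, ai_flagged: int) -> float:
    """Fraction of flagged texts that were actually written by a human."""
    total = human_flagged + ai_flagged
    if total == 0:
        raise ValueError("no texts were flagged")
    return human_flagged / total


# Hypothetical counts: out of 100 flagged texts, 42 human-written and 58 AI-written.
share = human_share_of_flagged(human_flagged=42, ai_flagged=58)
print(f"{share:.0%} of flagged texts were written by a human")
```

Note that this share depends on the mix of human and AI texts being checked: if most of the texts are written by humans, the share of false accusations among flagged texts goes up.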

If you are a school teacher and you are using OpenAI’s classifier to check students’ essays, you run a high risk of falsely accusing students of cheating.

And this is from their own data. 

Classifiers we recommend you use instead are:

UPDATE 2023-02-27: It seems they have removed the statistics from their website, without explanation. So the only good thing about the classifier (being open with the data) is no more.

