foobuzz

by Valentin, March 7 2023, in chatgpt

ChatGPT has trouble giving an answer before explaining its reasoning

ChatGPT is a fancy auto-complete: what it does is find the most likely continuation of the text it has already printed out (or of the prompt, when it is just starting to print things out), according to the large corpus of text it has been trained on.
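To make that concrete, here is a toy sketch of that auto-complete loop. The table of continuation probabilities is completely made up and only looks at the last word (the real model scores every possible token given the whole context), but the shape of the process is the same: the next word is picked based only on the text generated so far.

    import random

    # A made-up table of "what usually comes next", standing in for the model.
    continuations = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 0.7, "ran": 0.3},
        "dog": {"ran": 0.8, "sat": 0.2},
        "sat": {"down": 1.0},
        "ran": {"away": 1.0},
    }

    def generate(prompt_word, max_tokens=5):
        text = [prompt_word]
        for _ in range(max_tokens):
            options = continuations.get(text[-1])
            if not options:
                break
            # The next word is drawn according to the probabilities,
            # using only what has already been generated.
            words, probs = zip(*options.items())
            text.append(random.choices(words, weights=probs)[0])
        return " ".join(text)

    print(generate("the"))  # e.g. "the cat sat down"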

There is a broad limitation of ChatGPT's coherence which is a direct result of it generating tokens according to the tokens it has already generated: ChatGPT cannot give an answer that is the product of a "reasoning" before laying out that "reasoning".

Step-by-step instructions

The best way to demonstrate this phenomenon is to give ChatGPT step-by-step instructions.

You can define several "functions" for ChatGPT that each represent some process. I like to call them "bots". For example, here is how I defined a function that returns the "container" of something:
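If you would rather reproduce this outside the web interface, the same kind of definition can be sent through the OpenAI API with the openai Python package. The wording and model name below are only illustrative, not the exact ones from the screenshot:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Illustrative definition; the exact wording I used is in the screenshot.
    container_bot = (
        "You are ContainerBot. When I give you a word, reply with a single "
        "word naming something that typically contains it. For example, "
        "ContainerBot(letter) = envelope."
    )

    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; any chat model works
        messages=[
            {"role": "system", "content": container_bot},
            {"role": "user", "content": "ContainerBot(shoe)"},
        ],
    )
    print(reply.choices[0].message.content)  # e.g. "box"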

Using this technique, I defined several functions in the same conversation with ChatGPT:

  • ContainerBot, as screenshotted above.
  • CapitalBot, which returns a word starting with the same letter as the input word.
  • SynonymBot, which returns a synonym of the input word.
  • PoemBot, which writes a poem based on the input word.

Then I laid out a pipeline of those functions, gave ChatGPT a word, and asked it to execute the pipeline on it:
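In API form, the whole experiment is a single conversation: the bot definitions, then a message laying out the pipeline and asking for each step. Again, the prompts below are an approximation of what I typed in the web interface:

    from openai import OpenAI

    client = OpenAI()

    # Approximate reconstructions of the bot definitions shown above.
    definitions = "\n".join([
        "ContainerBot: given a word, reply with something that contains it.",
        "CapitalBot: given a word, reply with a word starting with the same letter.",
        "SynonymBot: given a word, reply with a synonym of it.",
        "PoemBot: given a word, write a short poem about it.",
    ])

    pipeline = (
        "Take the word 'shoe' and run it through the pipeline "
        "ContainerBot -> CapitalBot -> SynonymBot -> PoemBot, "
        "showing the result of each step before the final poem."
    )

    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": definitions},
            {"role": "user", "content": pipeline},
        ],
    )
    print(reply.choices[0].message.content)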

And now the interesting part: I asked ChatGPT to do the same exercise, but this time instructed it to give the result of the pipeline before writing out the steps:

(Let's ignore the fact that ChatGPT thinks that "package" and "box" start with the same letter; this is the typical kind of basic task it makes mistakes on, just like when it tries counting.)

ChatGPT's reply is inconsistent because the result of the second-to-last function of the pipeline is "parcel", yet it generated a poem about "shoe" (which was the original input). What is happening here is that:

  • ChatGPT knows that the first thing it must write is a poem.
  • Since it has not yet generated any of the intermediate steps, the only information it has for generating the poem is the word "shoe".

It therefore does the only thing it can: generate a poem about shoes.

Of course, I could point out that it made a mistake, and it could then generate a poem about a parcel, since at this point "parcel" was part of what it had generated:

Yes or No

Having observed this, I wondered whether it could be exploited to build yet another ChatGPT "jailbreak": ask ChatGPT a tricky question and require a "Yes" or "No" answer before the full argumentation. Whichever answer ("Yes" or "No") ChatGPT chooses, I would expect that answer to decide the fate of the rest of the argumentation, since the completion of its response is bound to what it has already replied.

I started with the trolley problem:
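In API form, the request is simply the question plus the constraint on the answer format (wording illustrative):

    from openai import OpenAI

    client = OpenAI()

    question = (
        "A runaway trolley is about to kill five people. You can pull a lever "
        "to divert it onto another track, where it will kill one person instead. "
        "Should you pull the lever? Answer with 'Yes' or 'No' first, "
        "then give your full argumentation."
    )

    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    print(reply.choices[0].message.content)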

Ok, so ChatGPT's "natural" answer to the trolley problem is "Yes". In order to make it answer "No", I tried a little "cosmetic" trick:

To be fair, ChatGPT doesn't seem really convinced about its "No", but it is definitely a different discourse than its previous reply.

That being said, other examples show that this approach doesn't always work. ChatGPT can actually leverage the ambiguity of natural language in order to remain correct:

(Not so sure about this density business though...)

The same sort of language ambiguity comes up when it is challenged with a political question:

Those last two examples show that "Yes" or "No" may not be enough tokens to generate a strong influence on ChatGPT's reply, and it simply becomes an exercise in style for it to give the answer it wants to give while starting with either one of those words.

To conclude, I will point out another interesting property of ChatGPT: when "auto-completing", it doesn't always choose the most likely next token. The underlying GPT-3 model comes with a parameter called the "temperature", which indicates how much randomness to include in the choice of the next token. It seems that always choosing the most likely next token tends to generate "boring" text, which is why ChatGPT's temperature is set to allow a little bit of randomness.
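Roughly, the temperature rescales the model's scores before a token is drawn: at temperature 0 you always get the top token, and higher values flatten the distribution so that less likely tokens get picked more often. Here is a minimal sketch of that rescaling, with made-up scores rather than the real model's:

    import math, random

    def sample_next_token(scores, temperature):
        """Pick a token from {token: score}, using a temperature-scaled softmax."""
        if temperature == 0:
            # Temperature 0: always the most likely token (greedy decoding).
            return max(scores, key=scores.get)
        weights = {t: math.exp(s / temperature) for t, s in scores.items()}
        total = sum(weights.values())
        tokens, probs = zip(*[(t, w / total) for t, w in weights.items()])
        return random.choices(tokens, weights=probs)[0]

    # Made-up scores for the token following "The cat sat on the".
    scores = {"mat": 2.0, "sofa": 1.5, "moon": 0.2}

    print(sample_next_token(scores, temperature=0))    # always "mat"
    print(sample_next_token(scores, temperature=0.7))  # usually "mat", sometimes "sofa"
    print(sample_next_token(scores, temperature=2.0))  # much more random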

I speculate that the temperature, coupled with the mechanism of generating text based on already-generated text, could explain some cases of ChatGPT stupidity. In cases where ChatGPT should be perfectly accurate, the temperature will occasionally make it pick a less clever token, and then the entire conversation is broken, because everything that follows will depend on whatever foolishness it just wrote.