
Natural Toxicity in LLMs: Left-Brain Antics Require Right-Brain Control

By Vernon Budinger, CFA

New studies reveal that ChatGPT’s performance is declining; some results even favor ChatGPT 3.5 over ChatGPT 4. Many AI pundits speculate that ChatGPT 4.0 has become a victim of the constraints of human-supervised learning. The developers of Large Language Models (LLMs) claim that supervision is necessary because, left unsupervised, LLMs tend to gravitate toward toxic language and behavior.

As I study and use LLMs and Generative AI, I am struck by assertions from industry professionals that unsupervised Generative LLMs tend to respond with “toxic” information without Reinforcement Learning from Human Feedback, or RLHF (the AI crowd’s term for the human-supervised training used to correct and direct the responses of AI models).

This paper explores toxicity in LLMs and then compares it to the behavior of humans who rely on the left side of the brain because of damage to the right. The aim is to suggest a new path that improves the Generative LLM experience: less censorship and more common sense in controlling toxicity.

How Do LLMs Function?

First, an overview of the mechanics of ChatGPT and other Generative LLMs. A human provides a “prompt” to ask for information or request some other action. The model first “tokenizes” the prompt: the process of converting each word or word fragment into numeric units called tokens. The tokens are then fed to ChatGPT, which analyzes (decodes) the prompt and uses an autoregressive process to generate the response one word at a time.

Autoregressive means that the first word generated is used as the seed for the second word; the first and second words are then combined and fed back to the model to provide the seed for the next word. This is observable as ChatGPT or Bard answer questions: the autoregressive process repeats until the prompt has been answered.
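To make the mechanics concrete, here is a minimal Python sketch of tokenization and autoregressive decoding. It uses the open-source GPT-2 model through the Hugging Face transformers library as a stand-in; ChatGPT’s actual implementation is proprietary, and greedy decoding shown here is only one of several sampling strategies.

# Tokenize a prompt, then generate one token at a time (autoregression).
# GPT-2 via Hugging Face transformers is a stand-in for ChatGPT here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
ids = tokenizer(prompt, return_tensors="pt").input_ids   # text -> integer token IDs

with torch.no_grad():
    for _ in range(10):                                  # generate ten tokens
        logits = model(ids).logits                       # scores for every vocabulary token
        next_id = logits[0, -1].argmax()                 # greedy: most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # feed it back in

print(tokenizer.decode(ids[0]))                          # tokens -> text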

Another statistical process, called “attention,” keeps the autoregressive process moving in the direction set by the prompt. Attention, in this case, is a statistical formula that applies weights to the keywords so that ChatGPT does not “drift” away from the main topic.

Source: “Attention Is All You Need” (Vaswani et al., 2017)
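The core formula in that paper, scaled dot-product attention, is compact enough to show directly. The NumPy sketch below is illustrative only; production transformers add multiple attention heads, learned projection matrices, and masking.

# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # relevance of each token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V                                 # weighted blend of the value vectors

# Toy example: 4 tokens represented by 8-dimensional vectors
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4, 8))
print(attention(Q, K, V).shape)                        # (4, 8)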

 

Toxicity

I was curious why a class of AI models would tend to generate “toxic” content. At first pass, this does not seem like something one would expect from any model. I use quotes around “toxic” because the term has no clear definition in the AI dictionary. Furthermore, wouldn’t the model’s tendencies and responses to questions be a function of finding balanced training data and employing a balanced training process?

Since commercial Generative LLMs can explain how they work, I asked one (Bard, more honest than ChatGPT in my opinion) to explain this phenomenon, and this was the response:

“There are a number of reasons why LLMs can generate toxic responses. One reason is that the training data that they are exposed to may contain toxic language. Another reason is that LLMs are trained to predict the next word in a sequence, and they may not be able to distinguish between toxic and non-toxic language. Finally, LLMs may be able to generate toxic responses if they are prompted to do so.

Some examples of toxic responses that LLMs can generate include:

·        Hate speech: This includes language intended to denigrate or attack a person or group based on their race, religion, ethnicity, sexual orientation, or other protected characteristics.

·        Discriminatory language: This includes language that is intended to promote or justify discrimination against a person or group based on their race, religion, ethnicity, sexual orientation, or other protected characteristics.

·        Violent language: This includes language that is intended to incite violence or harm against a person or group.

·        Sexually explicit language: This includes language that is intended to be sexually suggestive or offensive.”

Transformer-based LLMs also have a couple of other undesirable traits:

·       They hallucinate and fantasize

·       They sometimes miss the big picture

Model trainers should be able to control part of the toxic tendency with “clean” training data and with controls on prompts that prevent users from tricking LLMs into generating toxic responses. However, Bard’s mention of word-by-word text generation stands out, because this is the process employed by transformers and it is also a tendency of left-brain-dominant individuals. Even more striking, the results are similar: left-brain humans also exhibit an attraction to fantasy and bizarre thinking.
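As a concrete illustration of “control over prompts,” the sketch below screens a prompt before it ever reaches the model. The blocklist and the call_model stub are hypothetical placeholders invented for this example; real deployments use trained toxicity classifiers rather than keyword lists.

# Naive prompt-screening sketch. The blocklist and call_model are
# hypothetical placeholders; production systems use trained classifiers.
BLOCKLIST = {"badword1", "badword2"}      # placeholder terms, not a real list

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt looks safe to pass to the model."""
    words = {w.strip(".,!?").lower() for w in prompt.split()}
    return BLOCKLIST.isdisjoint(words)

def call_model(prompt: str) -> str:       # stub standing in for a real LLM API
    return f"(model response to: {prompt})"

def answer(prompt: str) -> str:
    if not screen_prompt(prompt):
        return "I can't help with that request."
    return call_model(prompt)

print(answer("What is attention?"))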

Left Brain vs Right Brain: Lessons in Behavior

What light can studies of the left and right brain shed on this behavior? To clarify, the term left-brain dominant here does not mean someone who is more logical than artistic or emotional. It refers to humans with real physical problems, such as lesions or trauma that damage the right brain and force the individual to depend almost exclusively on the left brain. These true left-brain dominants may provide key insights into the root cause of toxicity, hallucinations, and other unwanted characteristics in transformer-based LLMs.

The split-brain theory stems from outdated research by Roger Sperry, which held that people are either left-brained or right-brained, meaning that one side of the brain is dominant. If you are mostly analytical and methodical in your thinking, the theory says you are left-brained. If you tend to be more creative or artistic, you are right-brained.

More recent research offers a more nuanced version of Sperry’s observations. Iain McGilchrist has compiled an impressive review of this research on the left and right brain in his book “The Master and His Emissary.”

McGilchrist writes:

“The conventional neuropsychology literature distinguishes five types of attention: vigilance, sustained attention, alertness, focused attention, and divided attention.”

“The right hemisphere is responsible for every type of attention except focused attention.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, pages 38-39.

McGilchrist goes on to say:

“There is evidence of left-hemisphere dominance for local, narrowly focused attention and right-hemisphere dominance for broad, global and flexible attention. The scope of the right hemisphere world is broad. Patients with a right-hemisphere lesion (therefore relying on their intact left hemisphere) start with pieces and put them together to get to the overall picture, whereas those with a left-hemisphere lesion (relying on their right hemisphere) prefer a global approach.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, pages 39-40.

McGilchrist then examines the behavior of individuals with damage to one side of the brain or the other.

“Patients with right-hemisphere damage don’t seem to be able to adjust the ‘spotlight’ of their attention: they suffer ‘an excessive and more or less permanent narrowing of their attentional window’. That’s what happens when we rely on left-hemisphere attention on its own.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, page 40.

“The right hemisphere prioritizes whatever actually is, and what concerns us. It prefers existing things, real scenes, and stimuli that can be made sense of in terms of the lived world.”

“At the same time, the left hemisphere is more at home dealing with distorted, non-realistic, fantastic - ultimately artificial - images. This may be because they invite analysis by parts rather than as a whole.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, page 56.

Now we can see why it is important to understand the inner workings of AI, and especially of LLMs. As I explained above, the “transformer” in Generative Pre-trained Transformer (GPT) generates material word by word, just like the workings of the left hemisphere of the brain.

The parallel is stunning; even Bard points to word-by-word text generation as a possible source of “toxic behavior.” Both left-brain-dominated humans and transformer-driven LLMs generate responses in pieces, and both seem attracted to distorted reality and fantastic images. There is too much similarity between procedure and results for serious AI professionals to let this go unexamined.

Is RLHF Degrading ChatGPT?

Even more alarming is the possibility that we are compounding the problem by using RLHF to solve the “toxicity” problem. Could it be that RLHF is more left-brain medicine that doesn’t work?

ChatGPT was expected to learn and become smarter over time. However, studies by researchers at Stanford and UC Berkeley indicate that ChatGPT’s performance is not improving as expected and may be deteriorating in some areas. Their paper makes a compelling case for using ChatGPT 3.5 (the free version) rather than paying a monthly fee for ChatGPT 4.0.

The scientists reached the following conclusions in the paper “How Is ChatGPT’s Behavior Changing over Time?” 

·       LLM drift explains how a model can change over time: “Perhaps surprisingly, substantial LLM drifts emerge on this simple task. As shown in Figure 2(a), GPT-4’s accuracy dropped from 84.0% in March to 51.1% in June, and there was a large improvement of GPT-3.5’s accuracy, from 49.6% to 76.2%.” The researchers also found that “chain-of-thought” behaviors used for mathematics were less effective.

·       Less Adherence to Formatting Instructions: ChatGPT failed to follow formatting instructions in the prompt.

·       Poor prompt stability: ‘GPT-4 in March was actually able to find the correct answer to the question of the political affiliation of two test subjects: they both were Democrats. However, the LangChain agent expected a specific format: the generation from LLM must be “[action]+text”, which was encoded in its prompts. Unfortunately, GPT-4 in March failed to follow this format, and thus the LangChain agent simply generated an error message “could not parse LLM Output”.’

·       There were small improvements in visual reasoning for both models.

Source: ChatGPT’s performance is slipping

 

Many AI professionals wonder if the widespread use of RLHF to continually implement stricter controls on ChatGPT has impeded the pace at which ChatGPT learns and improves.

Conclusion

McGilchrist observes that the West, and especially Europe, believes that the left brain rules our thought process because the West values logic and methodical thinking. However, his studies show that the right brain is the master, and the left brain is the emissary. 

Maybe this is the answer to the problem of toxicity in LLMs: instead of using RLHF to modify weights calculated during training, the model could be constructed with two opposing “minds.” One mind would be an autoregressive model that pieces concepts together one at a time with focused attention. The other would use a right-brain process with more general attention in place of RLHF: a more overarching model (for example, an RNN) that controls the first model’s thinking, pairing two forms of attention in much the same format as the human brain.
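To make the proposal concrete, here is a speculative Python sketch of the two-mind loop. Every function in it is a hypothetical stand-in for models that would have to be designed and trained; nothing here describes an existing system.

# Speculative sketch of the "two minds" idea: a focused, autoregressive
# generator paired with a global reviewer that attends to the whole draft.
# All functions are hypothetical stand-ins, not an existing system.

def focused_transformer(prompt: str) -> str:             # stub: autoregressive generator
    return f"draft answer to: {prompt}"

def global_supervisor(prompt: str, draft: str):          # stub: global-attention reviewer
    return True, ""                                      # (draft acceptable?, feedback)

def respond(prompt: str, max_rounds: int = 3) -> str:
    draft = focused_transformer(prompt)                  # left-brain analogue: narrow focus
    for _ in range(max_rounds):
        ok, feedback = global_supervisor(prompt, draft)  # right-brain analogue: big picture
        if ok:                                           # coherent, on-topic, non-toxic
            return draft
        draft = focused_transformer(prompt + "\n" + feedback)  # regenerate with guidance
    return draft

print(respond("Why is the sky blue?"))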