
Natural Toxicity in LLMs: Left-Brain Antics Require Right-Brain Control

By Vernon Budinger, CFA

New studies reveal that the performance of ChatGPT is declining; some results even favor ChatGPT 3.5 over 4.0. Many AI pundits speculate that ChatGPT 4.0 has become a victim of the constraints of human-supervised learning. The developers of Large Language Models claim that supervision is necessary because, left unsupervised, LLMs tend to gravitate toward toxic language and behavior.

As I study and use LLMs and Generative AI, I am struck by assertions from industry professionals that unsupervised Generative LLMs tend to respond with “toxic” information without Reinforcement Learning from Human Feedback, or RLHF (the AI crowd’s term for the supervised learning used to correct and direct the responses of AI models).

This paper explores toxicity in LLMs and then compares it to the behavior of humans who rely on the left side of the brain because of damage to the right side. It aims to suggest a new path that improves the Generative LLM experience for users, with less censorship and more common sense in controlling toxicity.

How Do LLMs Function?

First, an overview of the mechanics of ChatGPT and other Generative LLMs. A human provides a “prompt” to ask for information or request some other action. Generative LLMs first “tokenize” the prompt: they convert each word or sentence fragment into a numeric ID called a token. The tokens are then fed to ChatGPT, which analyzes (decodes) the prompt and uses an autoregressive process to generate the response roughly one word at a time.
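As a concrete illustration, here is a minimal tokenization sketch using OpenAI’s open-source tiktoken library (an assumption on my part; it implements the same encodings the GPT models use, though ChatGPT’s production pipeline is not exposed directly):

```python
# Minimal tokenization sketch using OpenAI's open-source tiktoken library.
# cl100k_base is the encoding used by the GPT-3.5/GPT-4 model family.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Natural toxicity in LLMs"
token_ids = enc.encode(prompt)                    # text -> list of integer token IDs
fragments = [enc.decode([t]) for t in token_ids]  # the text fragment behind each ID

print(token_ids)
print(fragments)  # note that rare words are split into sub-word pieces
```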

Autoregressive means that the first word generated is used as the seed for the second; the first and second words are then combined and fed back into the model to seed the next word, and so on. This is observable as ChatGPT or Bard answers a question, and the process repeats until the prompt has been answered.
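In code, that loop looks roughly like the sketch below; `model` and `tokenizer` are hypothetical stand-ins for the real network, which is far too large to show:

```python
import random

def sample(probs):
    """Draw one token ID from a {token_id: probability} mapping."""
    ids, weights = zip(*probs.items())
    return random.choices(ids, weights=weights, k=1)[0]

def generate(model, tokenizer, prompt, max_new_tokens=100):
    """Sketch of autoregressive decoding: each generated token is appended
    to the sequence, and the whole sequence is fed back in as the seed
    for the next token."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        probs = model.next_token_distribution(tokens)  # hypothetical call
        next_token = sample(probs)
        tokens.append(next_token)
        if next_token == tokenizer.eos_token_id:       # stop at end-of-sequence
            break
    return tokenizer.decode(tokens)
```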

Another statistical process, named “attention,” keeps the autoregressive process moving in the direction set by the prompt. Attention, in this case, is a statistical formula that applies weights to the key tokens in the input to keep ChatGPT from “drifting” away from the main topic.

Source: “Attention Is All You Need,” Vaswani et al., 2017.
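The formula behind that mechanism is the paper’s scaled dot-product attention. A minimal single-head NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each output row is a weighted average
    of the value vectors, weighted by how well each query matches each key."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V

# Toy self-attention over 3 tokens with 4-dimensional embeddings
x = np.random.randn(3, 4)
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
```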

 

Toxicity

I was curious why a class of AI models would tend to generate “toxic” content; at first pass, this is not something one would expect from any model. I use quotes around “toxic” because the term has no clear definition in the AI dictionary. Furthermore, wouldn’t the model’s tendencies and responses to questions be a function of finding balanced training data and employing a balanced training process?

Since commercial Generative LLMs can describe how they work, I asked one (Bard – more honest than ChatGPT, in my opinion) to explain this phenomenon, and this was the response:

“There are a number of reasons why LLMs can generate toxic responses. One reason is that the training data that they are exposed to may contain toxic language. Another reason is that LLMs are trained to predict the next word in a sequence, and they may not be able to distinguish between toxic and non-toxic language. Finally, LLMs may be able to generate toxic responses if they are prompted to do so.

Some examples of toxic responses that LLMs can generate include:

·        Hate speech: This includes language intended to denigrate or attack a person or group based on their race, religion, ethnicity, sexual orientation, or other protected characteristics.

·        Discriminatory language: This includes language that is intended to promote or justify discrimination against a person or group based on their race, religion, ethnicity, sexual orientation, or other protected characteristics.

·        Violent language: This includes language that is intended to incite violence or harm against a person or group.

·        Sexually explicit language: This includes language that is intended to be sexually suggestive or offensive.”

Transformer-based LLMs also have a couple of other undesirable traits:

·       They hallucinate and fantasize

·       They sometimes miss the big picture

Model trainers should be able to control part of the toxic tendency with “clean” training data and with control over prompts to prevent users from tricking LLMs into generating toxic responses. However, Bard’s mention of word-by-word text generation stands out, because that is the process employed by transformers, and it is also a tendency of left-brain-dominant individuals. Even more striking, the results are similar: left-brain humans also face an attraction to fantasy and bizarre thinking.

Left Brain vs Right Brain: Lessons in Behavior

What light can studies of the left and right brain shed on this behavior? To clarify, the term left-brain dominant here does not mean someone who is more logical than artistic or emotional. It refers to humans with real physical problems in the right brain, such as lesions or trauma, that force the individual to depend almost exclusively on the left brain. These true left-brain dominants may provide some key insights into the root cause of toxicity, hallucinations, and other unwanted characteristics in transformer-based LLMs.

The split-brain theory stems from Roger Sperry’s research, since popularized into the now-outdated claim that people are either left-brained or right-brained, meaning that one side of the brain is dominant. If you’re mostly analytical and methodical in your thinking, the theory says that you’re left-brained. If you tend to be more creative or artistic, then you’re right-brained.

More recent research offers a more nuanced version of Sperry’s observations. Iain McGilchrist has compiled an impressive review of this research on the left and right brain in his book “The Master and His Emissary.”

McGilchrist writes:

“The conventional neuropsychology literature distinguishes five types of attention: vigilance, sustained attention, alertness, focused attention, and divided attention.”

“The right hemisphere is responsible for every type of attention except focused attention.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, pages 38–39.

McGilchrist goes on to say:

“There is evidence of left-hemisphere dominance for local, narrowly focused attention and right-hemisphere dominance for broad, global and flexible attention.48 The scope of the right hemisphere world is broad.49 Patients with a right hemisphere lesion (therefore relying on their intact left hemisphere) start with pieces and put them together to get to the overall picture, whereas those with a left-hemisphere lesion (relying on their right hemisphere) prefer a global approach.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, pages 39–40.

McGilchrist then examines the behavior of individuals with damage to one side of the brain or the other.

“Patients with right hemisphere damage don’t seem to be able to adjust the ‘spotlight’ of their attention: they suffer ‘an excessive and more or less permanent narrowing of their attentional window.’51 That’s what happens when we rely on left-hemisphere attention on its own.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, page 40.

“The right hemisphere prioritizes whatever actually is, and what concerns us. It prefers existing things, real scenes, and stimuli that can be made sense of in terms of the lived world.”

“At the same time, the left hemisphere is more at home dealing with distorted, non-realistic, fantastic - ultimately artificial - images. This may be because they invite analysis by parts rather than as a whole.”

The Master and His Emissary, Iain McGilchrist, Chapter 2, page 56.

Now we can see why it is important to understand the inner workings of AI, and especially of LLMs. As I explained above, the “transformer” in Generative Pre-trained Transformer (GPT) generates material word by word, just like the workings of the left hemisphere of the brain.

The parallel is stunning; even Bard points to word-by-word thought processing as a possible source of “toxic behavior.” Both left-brain-dominated humans and transformer-driven LLMs generate responses in pieces, and both seem attracted to distorted reality and fantastic images. There is too much similarity between procedure and results for serious AI professionals to let this go unexamined. 

Is RLHF Degrading ChatGPT?

Even more alarming is the possibility that we may be compounding the problem by using RLHF to solve the “toxicity” problem. Could it be that RLHF is more left-brain medicine that doesn’t work?

ChatGPT was expected to learn and become smarter. However, studies by Stanford and UC Berkeley scientists indicate that ChatGPT’s performance is not improving as expected and may be deteriorating in some areas. Their paper makes a compelling case for using ChatGPT 3.5 (the free version) rather than paying a monthly fee for ChatGPT 4.0.

The scientists reached the following conclusions in the paper “How Is ChatGPT’s Behavior Changing over Time?” 

·       LLM drift explains how a model can change over time: “Perhaps surprisingly, substantial LLM drifts emerge on this simple task. As shown in Figure 2(a), GPT-4’s accuracy dropped from 84.0% in March to 51.1% in June, and there was a large improvement of GPT-3.5’s accuracy, from 49.6% to 76.2%.” The researchers also found that “chain-of-thought” behaviors used for mathematics were less effective.

·       Less Adherence to Formatting Instructions: ChatGPT failed to follow formatting instructions in the prompt.

·       Poor prompt stability: ‘GPT-4 in March was actually able to find the correct answer to the question of the political affiliation of two test subjects: they both were Democrats. However, the LangChain agent expected a specific format: the generation from LLM must be “[action]+text”, which was encoded in its prompts. Unfortunately, GPT-4 in March failed to follow this format, and thus the LangChain agent simply generated an error message “could not parse LLM Output”.’

·       There were small improvements in visual reasoning for both models.

Source: ChatGPT’s performance is slipping

 

Many AI professionals wonder if the widespread use of RLHF to continually implement stricter controls on ChatGPT has impeded the pace at which ChatGPT learns and improves.

Conclusion

McGilchrist observes that the West, and especially Europe, believes that the left brain rules our thought process because the West values logic and methodical thinking. However, his studies show that the right brain is the master, and the left brain is the emissary. 

Maybe this is the answer to the problem of toxicity in LLMs: instead of using RLHF to modify weights calculated during training, perhaps the model should be constructed with two opposing “minds.” One mind would be an autoregressive model that pieces concepts together one piece at a time with focused attention. The other would apply a right-brain process with broader, more general attention instead of RLHF: a more overarching model (perhaps an RNN) that controls the thinking, using two forms of attention in much the same format as the human brain.
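Purely as an illustration of that idea, here is a sketch of such a two-“mind” decoding loop. Every name in it is hypothetical; nothing like this exists in today’s production LLMs:

```python
def two_mind_generate(left_model, right_model, tokenizer, prompt, max_new_tokens=100):
    """Hypothetical two-'mind' decoder: a focused autoregressive generator
    (the 'left brain') proposes candidate tokens, and a broad-context model
    (the 'right brain') re-ranks them for global coherence instead of RLHF."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        # Narrow, focused attention: the usual next-token candidates.
        candidates = left_model.top_candidates(tokens, k=5)    # hypothetical call
        # Broad, global attention: score each continuation against the whole
        # context for realism, coherence, and toxicity.
        scored = [(right_model.global_score(tokens + [c]), c)  # hypothetical call
                  for c in candidates]
        tokens.append(max(scored)[1])                          # keep the best one
    return tokenizer.decode(tokens)
```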

 

Unleashing the Power of Data Science and Financial Expertise: Fueling Growth and Profitability

Reach out to me today to discuss how I can assist your business in harnessing the full potential of data science, AI, R programming, and the rich resources of the Census Bureau. Together, let's unlock a new era of profitable growth.


Question for ChatGPT: How do ChatGPT tokens work?

Tokens are a fundamental concept in ChatGPT and other language models. A token can be as short as a single character or as long as a word, but in general, it represents a discrete unit of text. The model processes text in chunks or batches, and each chunk is made up of a certain number of tokens.

In ChatGPT, both input and output consume tokens. The total number of tokens affects the cost, response time, and even whether a conversation fits within the model's maximum limit. The specific details regarding tokens in ChatGPT are as follows:

  1. Token count: Each message, including user messages and assistant responses, is tokenized into a sequence of tokens. For example, the sentence "How are you?" would be tokenized into four tokens: ["How", "are", "you", "?"].

  2. Token limit: ChatGPT has a maximum token limit. If a conversation exceeds this limit, you will need to truncate or reduce the text to fit. Note that very long conversations are more likely to receive incomplete responses due to the token limit.

  3. Cost: When using the OpenAI API, you are billed based on the total number of tokens in both input and output. Both messages sent to the API and received from it contribute to the token count for billing purposes.

  4. Response time: Generating a response takes more time as the number of tokens increases. Longer responses generally require more computational resources and may take more time to receive a reply.

  5. Interaction constraints: The total token count affects the ability to have back-and-forth conversations. Longer conversations can consume a significant portion of the available tokens, leaving less room for context and meaningful responses.

To manage tokens effectively, it's important to be mindful of the token limit, prioritize essential information, and be concise in your communication to ensure optimal performance and cost efficiency.
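A small sketch of what point 2 means in practice, again assuming the tiktoken library: count the conversation’s tokens and drop the oldest messages until it fits the budget.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fit_to_budget(messages, max_tokens=4096):
    """Drop the oldest messages until the rough token count fits the limit.
    (A real count would also include per-message formatting overhead.)"""
    def count(msgs):
        return sum(len(enc.encode(m["content"])) for m in msgs)

    msgs = list(messages)
    while msgs and count(msgs) > max_tokens:
        msgs.pop(0)  # discard the oldest message first
    return msgs
```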

From ChatGPT: ATTENTION is Key to Good Prompts

Question for ChatGPT: How can I use knowledge of AI transformer attention to improve the effectiveness of my prompts for ChatGPT?

1.     Provide Sufficient Context: Transformer models use attention to weigh the importance of each word in the input when generating a response. (This is the purpose of role-playing.) Therefore, providing clear and sufficient context in your prompts will help the model generate more relevant and accurate responses. Keep in mind that if a conversation is very long, some of the older parts may be ignored due to the model's token limit (e.g., 2048 tokens for GPT-3), so the most relevant information should be closer to your current prompt.

2.     The Sequence of Information: Since attention mechanisms weigh all words in the input but give more importance to recent ones, the sequence in which you present information can impact the response. Important information is often better placed toward the end of your prompt.

3.     Avoid Ambiguity: Transformer models can struggle with ambiguous prompts because they cannot ask clarifying questions. They will make their best guess based on the attention scores, which may not align with what you intended. So, strive for clarity in your prompts.

4.     Use Instructive Prompts: Instructive prompts, where you tell the model what kind of answer you're looking for, can help get the desired output. For example, if you want a summarized answer, you can say, "In a short summary, explain...".

5.     Experiment with Different Prompts: Different prompt styles can lead to different responses because of how attention weights are assigned. If you're not getting the results you want, try rephrasing your prompt or presenting the information in a different order.

6.     Temperature & Max Tokens: You can adjust parameters like 'temperature' and 'max tokens’ to influence the response. Higher temperature values make the output more random, while lower values make it more deterministic.

7.     Token Limitations: The 'max tokens' parameter limits the length of the response, and there is a maximum limit on the number of tokens the model can handle. For GPT-3, this is 2048 tokens, but it may vary for later versions. If a conversation exceeds this limit, you might need to truncate your text, which could cause the model to lose context.

8.     Incremental Queries: If your query is complex, you can break it down into multiple simpler queries. This can often result in better responses, as the model can handle each part separately.

These suggestions are based on general principles and might need some experimentation and adjustment to get the best results for your specific case. The model doesn't genuinely understand the context or have real-world knowledge beyond its training data, so manage your expectations. The sketch below puts points 1, 2, and 4 together.
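(The figures in this example are invented purely for illustration.)

```python
# Context first, key details near the end, instruction last.
prompt = (
    "You are reviewing a quarterly financial report.\n"            # role/context (point 1)
    "Revenue grew 12% year over year; margins fell 3 points.\n"    # key details late (point 2)
    "In a short summary, explain the main risk to profitability."  # instructive ask (point 4)
)
```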

Sources:

“Attention Is All You Need,” Vaswani et al., 2017: This paper introduced the Transformer architecture, which is the foundation of models like GPT-3 and GPT-4. Understanding how attention works in these models can help you craft effective prompts.

OpenAI's GPT-3 paper: This paper presents the model's capabilities and suggests some strategies for generating effective prompts.

"The Illustrated Transformer" by Jay Alammar is a helpful resource for understanding how the Transformer model works, with visualizations to assist comprehension.

Various blog posts, articles, and tutorials by the AI community: Many practitioners share their experiences and insights online, providing practical tips on using AI language models effectively.

 

The temperature parameter is used in the sampling process of a language model's output. It determines how focused or diverse the output will be. When you use an API like OpenAI's to interact with a model like GPT-4, you can specify the temperature as a parameter in your API call.

The temperature parameter controls the randomness of the model's responses:

  • High temperature values (close to 1.0) make the output more diverse and creative, but also more likely to be inconsistent or make errors.

  • Low temperature values (close to 0.0) make the output more focused and deterministic, often sticking closely to the style and content of the input. However, the responses can also be more repetitive and less creative.

You can adjust the temperature value according to the needs of your application. If you want more creativity and are okay with a bit of randomness, use a higher temperature. If you want more consistency and focus, use a lower temperature.
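For instance, with the pre-1.0 openai Python package (the interface has since changed, so treat this as a sketch):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Name three uses of attention in LLMs."}],
    temperature=0.2,  # low: focused, near-deterministic output
    max_tokens=150,   # cap the length of the reply
)
print(response.choices[0].message["content"])
```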

Remember, as of my knowledge cut-off in September 2021, the actual effect of the temperature parameter can depend on various factors, including the specific model version and the complexity of the input prompt. For up-to-date information, always refer to the latest documentation from the API provider.

 

Gathering Financial Information with ChatGPT

ChatGPT does more than answer questions, yet it is closer to a search engine than a thinking app. It does, however, have real uses for gathering financial data.

To gather financial planning and analysis information with ChatGPT, you can follow these steps:

  1. Define the objectives: Determine the specific information you need for financial planning and analysis. It could include financial statements, budgeting, forecasting, performance analysis, variance analysis, key performance indicators (KPIs), or any other relevant metrics.

  2. Prepare questions: Create a set of questions or prompts that you can ask ChatGPT to obtain the desired financial information. Structure your questions to be specific, concise, and clear. For example, you could ask:

    • "What were the total revenues for the previous quarter?"

    • "Can you provide a breakdown of operating expenses by category?"

    • "What is the projected net income for the next fiscal year?"

    • "What are the key drivers affecting profitability?"

  3. Engage in a conversation: Interact with ChatGPT using natural language to ask your questions. Many AI platforms, including ChatGPT, provide an API that allows you to integrate the model into your application or system for seamless communication (a minimal sketch follows this list).

  4. Interpret responses: Analyze the responses provided by ChatGPT and extract the relevant financial information. Ensure that the information aligns with your expectations and matches the context of your financial analysis requirements. Exercise critical judgment and verify the accuracy of the information obtained.

  5. Refine and iterate: If the initial responses from ChatGPT are not satisfactory or require further detail, refine your questions and prompts to seek the desired information. Experiment with different phrasing and approaches to improve the effectiveness of your communication with the model.

  6. Provide context and constraints: When seeking financial planning and analysis information, it can be helpful to provide additional context or constraints to guide the AI model. For example, you can specify a particular time period, assumptions, or specific data sources to narrow down the response.

  7. Cross-validate with other sources: While ChatGPT can provide insights, it's advisable to cross-validate the information obtained from the model with other trusted sources or financial data. Use the AI-generated information as a reference point and validate it against established financial reports, internal data, or inputs from domain experts.

  8. Iteratively improve the model: Provide feedback on the responses given by ChatGPT, particularly when it misinterprets or misunderstands your questions. This feedback can help improve the model's performance over time.
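A minimal sketch of steps 2 and 3, again using the pre-1.0 openai package; the system message and questions are illustrative only:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

questions = [
    "What were the total revenues for the previous quarter?",
    "Can you provide a breakdown of operating expenses by category?",
]

for q in questions:
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a financial analyst. Answer only "
             "from figures supplied in the conversation; say so if data is missing."},
            {"role": "user", "content": q},
        ],
        temperature=0.0,  # deterministic answers for factual queries
    )
    print(q, "->", reply.choices[0].message["content"])
    # Cross-validate every figure against the source reports (step 7).
```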

Remember that ChatGPT is a language model that relies on the data it was trained on, and it may have limitations or biases. Therefore, it's important to exercise critical thinking, validate the information, and use the AI-generated insights as a supplement to human judgment and expertise.