ChatGPT & LLMs 101 – The 10-Minute Guide I Give Colleagues

TL;DR Summary:

“Getting” ChatGPT and LLMs is not as hard as you think. After 10 minutes with this article, you’ll have a solid, hands-on overview of the “magic under the hood”.

How can a machine generate text that sounds almost human? The answer lies in understanding the power behind ChatGPT and Large Language Models (LLMs). No worries, apart from an interest and a few minutes of your time, you don’t need anything else.

My article breaks down how ChatGPT and LLMs work: their “building blocks”, their training, and their capabilities and limitations. We’ll also look at future developments (e.g., “agentic systems” or humanoid robots) that promise even more mind-blowing applications.

Table of Contents

Understanding ChatGPT and “LLMs”

What is “ChatGPT”?

Maybe you’ve already experimented with it: ChatGPT, developed by OpenAI, is a “generative AI” chatbot designed to generate human-like text. Its text generation is powered by an “LLM” (currently the variant “GPT-4o”). ChatGPT excels at various tasks, from drafting emails to answering complex questions, by predicting the next word in a sentence.

Breaking Down the Abbreviations

“GPT” stands for “Generative Pre-trained Transformer”:

  • Generative: Refers to the model’s ability to generate content.
  • Pre-trained: Indicates the model is trained on a large dataset before being fine-tuned for specific tasks.
  • Transformer: Describes the type of neural network architecture used, which is highly effective for language understanding.

“LLM” means “Large Language Model”:

  • Large: Shows the model’s size, which includes many parameters and extensive training data.
  • Language: Specifies the model’s focus on understanding and generating human language as part of the AI subfield NLP. That stands for “natural language processing”.
  • Model: Refers to the trained AI system itself that does the (language) processing.
(Venn diagram: ChatGPT ⊂ LLMs ⊂ Generative AI ⊂ Deep Learning ⊂ Machine Learning ⊂ AI)

The (simplified) Venn chart puts it into context: ChatGPT, a chatbot, is powered by an LLM. LLMs are a type of generative AI (systems that create novel content). GenAI is a subset of deep learning which involves training models on vast amounts of data. Deep learning, in turn, is part of machine learning (ML) which employs statistical methods for data analysis. All of these fall under the umbrella “AI”, which aims to endow machines with human-like capabilities. This includes subfields like ML, robotics, etc.

LLMs vs. Small Language Models (SLMs)

LLMs, equipped with larger training datasets and more parameters (weights), excel at generating responses across various domains. This offers greater flexibility, esp. for typical “general chatbot” applications (like ChatGPT). However, SLMs can outperform LLMs in (many) specific, narrow tasks and are more efficient at them (thanks to their smaller size). For instance, an SLM can be more effective at processing particular types of documents quickly and accurately. This makes them ideal for “niche” applications.

A Diverse Bouquet of LLMs

ChatGPT isn’t the only fish in the sea. Other notable LLMs include Anthropic’s Claude, Google’s Gemini, etc. Each of these models (and countless others) brings unique features to the table. Comparing them goes beyond this article, but check out my post where I did the “dirty work” of comparing the leading AI chatbots for you.

Inside the Engine: How LLMs Operate or “Think”

“Auto-Complete on Steroids”

You can think of LLMs as a “supercharged version” of the auto-complete feature on your smartphone. They predict the next word (or “token”) in a sequence by calculating which word most likely follows – based on patterns from the training data. By recognizing and “applying” these patterns, LLMs can generate “plausible” sentences.

This analogy hopefully helps demystify how LLMs work and shows how they create seemingly “thoughtful” sentences from your prompts. Yes, it’s (almost) that simple – if that’s already enough “understanding how it works” for you, you may skip the following “deeper dive”…
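To make the “auto-complete” analogy concrete, here’s a deliberately tiny next-word predictor in Python. It’s only a sketch of the idea: real LLMs use neural networks with billions of parameters instead of raw word counts, but the principle – “which word most likely follows?” – is the same.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which word in the training text."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent next word - like auto-complete."""
    followers = model[word.lower()]
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # -> "cat" ("cat" follows "the" most often)
```

An LLM does essentially this, but over “tokens” instead of whole words and with learned statistical patterns far richer than simple pair counts.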

Deep Dive Excursus: “Transformers Under the Hood”

LLMs consist of several key elements working together in the “transformer model” (not what you think… 😉), a type of “neural network” modeled after our brain’s inner workings. Simpler put, think of LLMs as sophisticated machines that process and generate text through a structured process:

1. Receiving and “Tokenizing” Input: When you input text (at the model’s “input layer”), the model first breaks it down into smaller units called tokens. These tokens can be whole words or parts of words. Imagine this step as chopping up a sentence into bite-sized pieces that the model can handle more easily.
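The “chopping up” in step 1 can be sketched in a few lines of Python. Note the vocabulary here is hand-picked for illustration; real models learn their subword vocabulary from data (e.g., via Byte Pair Encoding).

```python
# Toy greedy subword tokenizer (a sketch only - real tokenizers use
# large learned vocabularies, not this tiny hand-picked set).
VOCAB = {"un", "break", "able", "think", "ing"}

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        # take the longest vocabulary piece that matches at position i
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(tokenize("unbreakable"))  # -> ['un', 'break', 'able']
```

This is why LLMs sometimes split rare words into several tokens: the model only “sees” pieces it has in its vocabulary.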

2. Applying “Embeddings”: Next, these tokens are transformed into vectors – lists of numbers that the model can process. For example, if “red” = 1, “round” = 1 and “plant” = 1, then “tomato” = [1, 1, 1] to the AI. The closer (or further) another word’s vector is to [1, 1, 1], the closer (or further) its meaning is to “tomato”. The model uses these vectors to understand the relationships between words, incl. synonyms and antonyms.
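Here is the tomato example in code, using cosine similarity – a standard way to measure how “close” two vectors point. The three dimensions and their values are made up for illustration; real embeddings have hundreds or thousands of learned dimensions.

```python
import math

# Hand-made 3-dim "embeddings": [red-ness, round-ness, plant-ness].
embeddings = {
    "tomato": [1.0, 1.0, 1.0],
    "apple":  [0.9, 1.0, 1.0],
    "brick":  [1.0, 0.2, 0.0],
}

def cosine_similarity(a, b):
    """1.0 = same direction (similar meaning), near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

sim_apple = cosine_similarity(embeddings["tomato"], embeddings["apple"])
sim_brick = cosine_similarity(embeddings["tomato"], embeddings["brick"])
print(sim_apple > sim_brick)  # True: "apple" is closer to "tomato" than "brick"
```

This “closeness in vector space” is exactly how the model captures that tomatoes and apples are more alike than tomatoes and bricks.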

3. “Self-Attention Mechanism”: The magic happens here. The self-attention mechanism helps the model focus on important words in the sentence. It evaluates the context by weighing the relevance of each word, like how we emphasize certain words to convey meaning. This allows the model to understand complex relationships and generate relevant responses (instead of “word-by-word” translations).
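The core of self-attention fits in a short sketch. This stripped-down version drops the learned query/key/value weight matrices that real transformers use, keeping only the central move: each token’s new representation is a relevance-weighted mix of all tokens.

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Minimal scaled dot-product self-attention (queries = keys = values,
    no learned weight matrices - the core idea only)."""
    d = len(vectors[0])
    output = []
    for query in vectors:
        # how relevant is every token to this one?
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        # new representation = relevance-weighted mix of all token vectors
        output.append([sum(w * v[i] for w, v in zip(weights, vectors))
                       for i in range(d)])
    return output

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]  # toy token vectors
print(self_attention(tokens))
```

The weighting is what lets the model figure out, say, that “it” in a sentence refers back to “the cat” – every token can attend to every other token directly.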

4. Processing with “Hidden Layers”: These layers do the heavy lifting. They process the input data, enabling the model to understand and synthesize the information. Think of these layers as the brain or neurons of the model, where all the thinking happens.

5. Generating the Output: Finally, the processed info is used to predict and generate the next word or sequence of words. This step is like the model’s “mouth” (“output layer”). It produces coherent responses based on what it has “understood” from the input.
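Step 5 can be sketched too: the model outputs raw scores (“logits”) for every possible next token, which a softmax turns into probabilities before one token is picked. The scores below are invented for illustration; the “temperature” knob shown here is the same setting many chatbot APIs expose.

```python
import math, random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Turn raw model scores into probabilities and sample one token.
    Lower temperature = safer, more predictable picks."""
    scaled = {tok: s / temperature for tok, s in logits.items()}
    max_s = max(scaled.values())
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}  # stable softmax
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    rng = random.Random(seed)
    r, cum = rng.random(), 0.0
    for tok, p in probs.items():
        cum += p
        if r <= cum:
            return tok, probs
    return tok, probs  # guard against float rounding

# Hypothetical scores for the prompt "The sky is ..."
token, probs = sample_next_token({"blue": 3.0, "green": 1.0, "falling": 0.5})
print(token, probs["blue"])
```

Because the pick is probabilistic, the same prompt can yield different answers – one reason LLM outputs vary from run to run.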

Slightly advanced version… Source: Yuening Jia, CC BY-SA 3.0.

By working through these stages, the model generates language that feels natural and relevant. In this context, the “transformer model” stands out. It is also the architecture behind ChatGPT. This model processes sequences of words efficiently and understands broader and longer contexts.

Other architectures like “Recurrent Neural Networks” (RNNs) and “Convolutional Neural Networks” (CNNs) preceded transformers in language processing, but transformers have largely replaced them because they process whole sequences in parallel and capture longer-range context much better.

For those curious for more, you can even experiment with “neural networks” in a fun, hands-on way via interactive web tools like this TensorFlow sandbox. It’s a great way to see these concepts in action – without needing any tech skills. 😉

“Train the Brain”: An LLM’s Lifecycle

Step 1: The Pre-Training Phase

This stage involves training on massive (terabytes+) datasets to develop a base understanding of language aka the LLM’s “world knowledge.” This stage is resource-intensive, requiring significant costs and time (months…). For example, developing a model like GPT-4 involves millions of dollars and extensive computational resources (thousands of high-end graphics cards).

The output of this stage is called a “base model”, which contains a lot of “knowledge” but is still practically useless since it hasn’t been fine-tuned for its target use case yet (e.g., to become the engine of a general-purpose chatbot like ChatGPT). Also, keep in mind that the quality of AI models is highly driven by the quality of the data used in training (and, later, running) them. “Garbage in = garbage out”.

Fun fact: Creating smaller models like GPT-2 is relatively affordable today and anyone can do it, as Andrej Karpathy teaches in one of his amazing tutorials (click me).

Step 2: Customizing with Fine-Tuning

After pre-training, the “base model” must be fine-tuned on specific datasets to become useful for any particular task. This iterative process allows customization for specific applications using labeled training data that is optimized for the intended use case (e.g., sample questions and answers for chatbots). This training data is often created by humans or other AI models or jointly.

Fine-tuning is also used to keep the model up-to-date and relevant, and to improve its performance when mistakes occur. This stage ensures the model adapts to the latest info and specific (user) needs for better utility.

Step 3: RLHF – Aligning AI

“Reinforcement Learning from Human Feedback” (aka RLHF) ensures that the model’s outputs are aligned with human values and expectations. This step is critical in refining the model’s behavior and making it more useful and ethical in real-world applications.

Technically, typically humans rate the outputs generated by an LLM: Positive (negative) feedback for desirable (undesirable) outputs. Based on this feedback, the LLM then adjusts its parameters/weights (“neurons”) and becomes more likely to produce desirable outcomes. It’s a bit like “Pavlov’s conditioning” of dogs if you’re more familiar with these psychological experiments…
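Here’s the “Pavlov” intuition in code – only the intuition: real RLHF trains a separate reward model and updates the LLM with policy-gradient methods like PPO, not by nudging two scores. The candidate replies and reward values are invented for illustration.

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    return [e / sum(exps) for e in exps]

# Two candidate replies with the model's current preference scores
candidates = ["Here's a helpful answer.", "I refuse to answer."]
scores = [0.0, 0.0]  # the model starts indifferent

def apply_feedback(scores, chosen_index, reward, lr=1.0):
    """Nudge the rated reply's score up (reward > 0) or down (reward < 0)."""
    scores = scores.copy()
    scores[chosen_index] += lr * reward
    return scores

# A human rates the helpful reply positively, the refusal negatively
scores = apply_feedback(scores, 0, +1.0)
scores = apply_feedback(scores, 1, -1.0)
print(softmax(scores))  # the helpful reply is now much more likely
```

Scale this loop up to millions of human ratings adjusting billions of weights, and you get the “alignment” step that turns a raw base model into a helpful assistant.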

This step is also a rather work-intensive process, especially for larger models and some bigger companies have been criticized for how this led to new types of “sweat shops” in third world countries: Workers have to evaluate (disturbing) content for little money and under tough conditions…

“Step” 4: Prompt-based In-Context Learning

In-context learning, achieved through prompt engineering, offers a flexible and cheaper alternative to customize AI performance. Instead of changing a model’s parameters through training or fine-tuning, you guide the LLM to desired outputs by carefully crafting input prompts.

This involves giving the model examples of desired outputs and instructions within your input prompt. For many use cases, especially when relevant databases and input data are available, this approach is more efficient compared to (re-)training AI models.
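A classic form of this is the “few-shot” prompt: a couple of worked examples followed by the new input. The template below is purely illustrative (the task and reviews are made up), but the resulting string could be sent to any LLM chatbot or API as-is.

```python
# Build a few-shot prompt: the "teaching" happens inside the prompt itself,
# without changing a single model weight.
def build_few_shot_prompt(examples, new_input):
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for review, label in examples:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")  # the model completes from here
    return "\n".join(lines)

examples = [
    ("Loved it, would buy again!", "Positive"),
    ("Broke after two days.", "Negative"),
]
prompt = build_few_shot_prompt(examples, "Works exactly as described.")
print(prompt)
```

The model picks up the pattern from the examples and continues it – no retraining, no fine-tuning, just a well-crafted input.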

Proprietary vs. Open-Source Models

On a related note: There are two general approaches to creating AI models: proprietary and open source. In this article (and generally), I won’t draw a “conclusion” on which one is “better”: Both approaches have their merits (and limitations).

Proprietary models, i.e. where developers don’t release the model’s neural network/weights, benefit from commercial incentives, often leading to advanced capabilities. But, they risk becoming “black boxes” (see the chapter about current AI models’ limitations around “explainability”). Notable examples include OpenAI’s GPT or Anthropic’s Claude series.

Open-source models, i.e. where the developers make the model’s neural network publicly available, gain from community-driven efforts. They risk increased misuse since they can easily get into the wrong hands and then even get fine-tuned etc. for malicious purposes. Striking examples of open-source models are Mistral’s “Mixtral” and Meta’s “Llama”.

Key Players in the AI Ecosystem

Big research-heavy companies like OpenAI or Meta focus on building the large foundational models. Other companies tend to fine-tune and integrate them into specific applications. This differentiation allows for a diverse ecosystem where foundational research and practical application complement each other (to a certain degree…).

So, both “big tech” like Google and Microsoft and “smaller” startups and scale-ups as well as other institutions, e.g. universities, all play vital roles in making AI tech applicable in business and all other life areas. For deeper insights into the “flourishing” (Gen)AI industry, check out my other article (click me).

Performance and Evaluation of LLMs

Benchmarking: Who’s the King of the Hill?

Tests like the “Massive Multitask Language Understanding” (MMLU) benchmark are used to evaluate LLMs’ performance across various tasks. While benchmarks provide standardized ways to measure progress, they also have limitations: They often fail to capture the real-world practical utility of AI models. For example, there are “wild” stories of ChatGPT passing bar exams, or AI programs trying to sell IT services on Reddit, etc. – feats that go far beyond such tests…

So what does the “LLM leaderboard” currently look like? Well, that’s not so easy to answer: The picture changes every day. Generally, proprietary models like OpenAI’s GPT (and “o” series) and Anthropic’s Claude have been leading in recent years. However, open-source models like Llama are catching up and occasionally outperforming proprietary ones. This competition (hopefully) drives more innovation across the board.

An important aspect here is the “scaling law” of LLMs: There is a clear correlation between LLMs’ performance and the amount of data and parameters they are trained with, i.e., the bigger (and more compute-intensive) the model, the better it performs. So far, this trend doesn’t seem to “slow down”. This will lead to very expensive developments in the future (billions or trillions of dollars for the next generations of LLMs)…

LMSYS: Chatbots Competing in the Arena…

Furthermore, the “LMSYS Chatbot Arena” offers a platform where users can actually test different LLMs/chatbots against each other. The leaderboard with “Elo scores” provides an alternative benchmark, reflecting user evaluations. While valuable, the effectiveness of such evaluations depends on the quality of human user feedback (for better or worse…).

FYI: Sometimes even cutting-edge models get (pre-)released there, to gather some first market feedback. Feel free to try it out yourself here. You can also find the Elo leaderboard of current AI models there. Share your experiences in the comments: Any new “mysterious model” spotted? 😉

Current Capabilities & Limitations of LLMs

“Core” Use Cases around Text Generation

As explained, LLMs can generate coherent text, summarize info and engage in interactive conversations. Practical applications include content drafting, intent detection, ideation and feedback, but also writing computer code. As an everyday example, an LLM-powered chatbot like ChatGPT can provide movie recommendations based on what you tell it about your interests. Please check out my other article with 7 more use case ideas to incorporate ChatGPT into your everyday life.

Expanding Horizons with Multimodal Systems

LLMs are now integrating text with other modalities like images and audio inputs/outputs, expanding their functionality. Systems like “Retrieval-Augmented Generation” (RAG) are “Compound AI”: They combine LLMs with other “digital tools” like databases, web access, APIs, etc. These smart integrations enable richer interactions and more comprehensive AI solutions. In this article, for example, I show you some more examples of how you can use ChatGPT’s multimodal capabilities, e.g. to decipher restaurant menus, identify plants with your cam, etc.
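The RAG idea can be shown in miniature: find the most relevant document, then inject it into the prompt. Real RAG systems retrieve via embedding similarity over vector databases; this sketch uses simple word overlap, and the documents and query are invented for illustration.

```python
import string

def words(text):
    """Lowercase, strip punctuation, split into a set of words."""
    clean = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(clean.split())

def retrieve(query, documents):
    """Pick the document sharing the most words with the query."""
    q = words(query)
    return max(documents, key=lambda d: len(q & words(d)))

documents = [
    "Our refund policy allows returns within 30 days.",
    "The office is open Monday to Friday, 9 to 5.",
]
query = "How many days do I have to return a product for a refund?"
context = retrieve(query, documents)

# The retrieved facts get "thrown into the pot" with the user's question
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

Because the answer now comes from supplied, up-to-date documents instead of the model’s frozen training data, RAG reduces both hallucinations and the knowledge cut-off problem discussed below.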

Current Limitations

Despite already impressive capabilities, LLMs still have limitations, including lack of “true reasoning” and sentience. They can produce hallucinations (i.e. confidently output wrong info since they generate “likely next tokens” and not “certainly true next tokens”). They also typically have “knowledge cut-offs” because they were trained with data up to a certain date only.

Also, LLMs are still bad at performing maths calculations which is ironic because computers are typically so good at it… These “unintuitive examples” also show how “unpredictable” some of the strengths and weaknesses of LLMs (or (Gen)AI in general) are.

Techniques like “RAG” partially mitigate these issues by “throwing relevant real data into the pot” or hooking the LLM up to a calculator like “Wolfram Alpha”. However, other limitations remain challenging, especially around the explainability of AI-generated outputs, due to LLMs’ (partial) “black box” nature.

Needless to say (…or is it maybe not needless to say, thanks to our tendency to humanize things?): It’s vital to always be aware that when we interact with an LLM, it’s not a human – despite its advanced conversational abilities and the impression it may make on us…

Future Developments around LLMs

Towards Advanced Multimodal & Agentic AI

Future advancements will likely even more integrate multiple data modalities (text, image, sound, etc.) for richer outputs and more versatile applications of LLMs. “Agentic AI” systems can perform tasks autonomously. They base their actions on goals which they break down into plans. They use the LLM as the “brain” to coordinate all the other tools like databases or APIs. These “agents” are the next level of today’s “compound AI” systems.

“Humanizing” Robotics

LLMs could also serve as the “brains” for robotic systems, enabling more natural and autonomous interactions and performance. This evolution from digital to physical-digital agents will be a significant leap for AI capabilities, although it also brings many new challenges and concerns…  

Integrating AI with robotics will transform most industries (e.g. manufacturing) but also our everyday life (no more housework, right?!). Early prototypes like the one shown in this video demo by startup “Figure” already hint at the future of robotics. OpenAI also invested heavily in this startup recently. Many experts believe we’ll soon have the “ChatGPT-moment” for robots, too – I’m curious… and you?

The Hunt for “AGI

The longer quest for “Artificial General Intelligence” (AGI) aims to develop models that can understand, learn and apply knowledge across a wide range of tasks. They may do this in a way that is comparable or even superior to humans.

However, current LLMs are still specialized tools designed for specific tasks, like “predicting the next token” and not “general thinkers” (yet). AGI remains distant (for now), but the current advancements in LLMs are surely significant steps forward.

If you wonder what to expect from an “AGI-powered” future, check out my take on this.

Wrap-up: (Artificial) Brainpower at our Fingertips

In a nutshell, ChatGPT, driven by advanced Large Language Models, uses extensive data and smart algorithms to generate relevant text in various contexts. We covered their creation, from the resource-intensive pre-training to fine-tuning for specific tasks. Despite their strengths, LLMs still have limitations like a lack of “true” reasoning and limited explainability. Future advances in LLMs, incl. improved multimodal systems and integration with robotics, promise exciting changes.

Exploring ChatGPT and LLMs shows their incredible potential for both business leaders and everyday users. These innovations could revolutionize various aspects of life, enhancing both professional workflows and personal interactions with technology. Consider integrating AI language models like ChatGPT into your daily routines and business strategies for efficiency and new opportunities. You could, e.g., as a “first step”, try using ChatGPT for drafting your next email.

Anyway, please share your experiences with ChatGPT/LLMs in the comments below. What is your favorite LLM? This article became a bit longer since there is so much to cover. I’m keen to hear from you if it helped you. Let me know if there is anything to change, add or delete to make it “the intro to LLMs for everyone”.

Cheers,
John



I'm John

John Isufi is the author of Upward Dynamism, on a mission to empower people for sustainable AI transformation.

I help people and teams stand taller on AI's shoulders. I share what actually works from leading AI transformations in Fortune Global 500 orgs – the messy reality of making AI stick. Biweekly posts, no hype. Let's talk.