Stackology #6 - ChatGPT, LLMs and Beyond
ChatGPT became a buzzword and entered mainstream conversations in late 2022, following the release of the model ChatGPT-3.5 and its subsequent public access.
Its ability to interact conversationally, provide detailed answers across various subjects, and even emulate creative writing caught the attention of the tech world and the general public. This surge in popularity was further fueled by widespread media coverage and social media discussions, highlighting its capabilities and potential implications for various industries and everyday use. The buzz around ChatGPT was not just among tech enthusiasts but extended to a wide range of audiences, showcasing the growing interest and curiosity in AI and its applications in daily life.
But what exactly powers ChatGPT, and what future developments and impacts can we anticipate from this evolving technology? We'll delve into these questions in this week's issue, unraveling the complexities and envisioning the road ahead for AI.
Introduction
Artificial Intelligence (AI) is a term encompassing a variety of technologies, some of which have been part of our lives for years. These include spam filters in our email programs and recommendation systems that suggest the next movie to watch or song to listen to on streaming platforms tailored to our preferences.
Recently, generative AI, a specific application of AI, has seen a marked increase in popularity, primarily due to the widespread attention garnered by OpenAI's ChatGPT. However, this field rapidly evolves with other major players entering the arena. Google, with its Bard, Microsoft integrating AI in its Bing search engine, and the upcoming Grok Chatbot by Elon Musk are also contributing to the growing interest and advancement in generative AI. Let's explore how they all work.
ChatGPT Unraveled: the Engine of Large Language Models
What are Large Language Models (LLMs)?
Large Language Models (LLMs) are sophisticated AI systems developed in the realm of deep learning, specifically using transformer architectures. They fall under the category of supervised learning, where they are trained on vast datasets of text labeled by humans. This human input helps the models learn the nuances of human language, enabling them to understand and generate text in a way that closely resembles natural human communication. LLMs can perform various tasks, such as writing, conversing, summarizing, and answering questions. Their extensive training on diverse language data allows them to recognize and replicate complex patterns in language, making them highly advanced tools for natural language processing.
LLMs Training vs Inference
Large Language Models (LLMs) like ChatGPT undergo two key phases: Training and Inference. During training, the model is exposed to vast datasets (think billions of words) and learns language patterns. This phase requires immense computational power and can take weeks or even months. Inference, on the other hand, is when the trained model is used to generate text based on input prompts. This phase is relatively faster, happening almost in real-time when users interact with the model.
How do LLMs work?
The training of LLMs like ChatGPT is twofold. In the Pre-Training stage, the model becomes a sophisticated 'next word predictor', creating a Base Model. This forces the model (Neural Network) to compress a vast understanding of the world into its parameters (weights). However, this Base Model mainly predicts text, which isn't always useful for specific tasks. That's where Fine Tuning comes in, transforming the Base Model into an Assistant Model tailored for specific tasks like answering questions or writing in certain styles.
Stage 3: Fine Tuning with Comparison Labels, RLHF, and Beyond
After fine-tuning, there's a crucial third stage involving Comparison Labels and Reinforcement Learning from Human Feedback (RLHF). This process refines the model's accuracy using human-selected responses. Looking ahead, new directions like Direct Preference Optimization are being explored to enhance LLMs' effectiveness further.
Hallucinations
LLMs can sometimes 'hallucinate', producing plausible but incorrect information. This is a challenge when the model extends beyond its training data or misinterprets prompts. Tackling these hallucinations is crucial for improving AI reliability.
Prompt Engineering
Prompt Engineering is emerging as a crucial skill for guiding LLMs. The right prompt (the question you ask the model) can significantly affect the quality of the AI's output. However, with the rise of 'prompt gurus' offering 'magic prompts' on Twitter threads or Udemy courses, it's essential to approach this field with a critical eye. Prompt engineering is nuanced and evolving, not a one-size-fits-all solution.
Advanced Use of LLMs: Beyond Standard Answers
Tools and Actions
Large Language Models are evolving beyond offering just text-based answers. With the right tools, they can now execute external actions, like booking flights or setting reminders, transforming a simple chat interface into a powerful automation tool. This leap is possible through integrated capabilities within the models, like ChatGPT's plugins, which allow for more dynamic interactions with the digital world or external tools such as Zapier.
LLM Customization and GPT Store with RAG
Customization of LLMs is gaining momentum, particularly with platforms like the GPT Store, recently launched by OpenAI, where users can access or create tailor-made models. A notable feature is the Retrieval-Augmented Generation (RAG), which allows models to fetch and integrate external information (such as PDF files), significantly enriching responses. This customization enables personal and business users to adapt AI tools to their specific needs, from handling specialized queries to integrating niche knowledge areas.
I have created one GPT to test this possibility, focused on the security of web apps. It’s called Secure Stack Dev, and you can try it for free if you’re interested.
Rabbit R1 and the Hardware Dimension
At the forefront of innovation, companies like Rabbit, who are presenting their Rabbit R1 at CES in Las Vegas, propose embedding AI and LLMs in hardware devices.
Rabbit’s vision goes beyond simply embedding AI like ChatGPT in devices; they aim to integrate a variety of tools to assist users in performing actions such as booking flights. The concept involves the seamless use of different services, managing tasks on behalf of the user, including complex steps like authentication. This approach is designed to offer a transparent and hassle-free experience, handling tasks and decision-making without burdening the user.
While the idea of a dedicated AI device is intriguing, questions linger about its practicality, especially in handling sensitive tasks involving passwords and two-factor authentication. The company's vision might not be to replace smartphones but to integrate their technology into them. This could be a stepping stone towards AI-powered devices becoming ubiquitous in our daily lives.
Voice Assistants
As LLMs continue to evolve, their integration into voice assistants like Alexa, Siri, and Google Assistant seems inevitable. This will transform these assistants into entities resembling the AI in the movie 'Her', offering a more intuitive, conversational, and helpful experience. Such advancements signal a future where AI assists us not just through screens and text but as a seamless part of our verbal and interactive lives.
Challenges and Implications of Large Language Models
Copyright Concerns
As LLMs like ChatGPT learn from vast datasets, copyright issues arise, notably in cases like the New York Times. The concern is about using copyrighted material in training these models without explicit permission, sparking debates over intellectual property rights in the AI era.
Impact on Employment
The advent of AI poses significant implications for job roles. A Goldman Sachs forecast suggests 300 million jobs could be at risk due to automation, with AI efficiencies potentially leading to layoffs. The emergence of AI in various sectors challenges us to consider retraining, upskilling, and possibly universal basic income (UBI) as responses to this shift.
Security Challenges
With the growth of LLMs, new security concerns emerge. These include vulnerabilities like 'jailbreaks' where AI models are manipulated to behave unexpectedly, 'prompt injection' attacks that trick the AI into undesirable outputs, and other novel security risks. These challenges parallel those faced in traditional software and require rethinking security measures in AI applications.
The Future of Large Language Models
LLM Scaling Laws – The Computational Gold Rush

LLMs are demonstrating that their performance can be predictably determined by the model size ‘N’ (number of Parameters) and the dataset size 'D' (number of Tokens).
This insight has triggered a computational gold rush, prompting tech giants and startups alike to invest in larger GPU clusters. The underlying principle is straightforward: the larger the dataset, the more complex the model needs to be to effectively process and learn from this data. Consequently, more powerful GPU clusters are required to handle these complexities. As the model grows in size and complexity, its ability to process information and generate accurate outputs improves, following a predictable performance function. This trend highlights a direct correlation between the scale of data, model complexity, and the resulting performance, paving the way for more advanced and capable AI models.
The Advent of Multimodality
The future of LLMs isn't limited to text. We're entering an era of multimodal AI that can understand and generate images, video, and audio. This evolution will enable AI to interact and assist more dynamically and versatilely, bridging the gap between digital and real-world experiences.
Incorporating System 2 Thinking
Drawing inspiration from the book 'Thinking, Fast and Slow' by Daniel Kahneman, there's a push to give LLMs a 'System 2' – a more deliberate and analytical way of thinking. This would shift from rapid, intuitive responses to more thoughtful, reasoned outputs, significantly enhancing the depth and utility of AI interactions.
LLM OS – A New Computing Paradigm
Envision a future where LLMs form the core of an Operating System (OS) – a new paradigm in computing. This LLM OS could revolutionize how we interact with technology, making interfaces more conversational, personalized, and intuitive. It's a step towards an AI-centric computing world, redefining our relationship with technology.
More Than Just Generative AI
Medical Marvels: AI in Healthcare
AI's influence in healthcare extends beyond generative applications, bringing revolutionary changes. Notably, AI is reshaping our understanding of biological mechanisms, such as protein folding, leading to significant progress in drug development and disease treatment.
A recent breakthrough in this domain is biologists' use of neural networks to discover a new class of antibiotics against methicillin-resistant Staphylococcus aureus (MRSA). By training models on a dataset of chemical compounds, researchers have been able to screen for potential antibiotics, demonstrating AI's emerging role in addressing global health challenges like antibiotic resistance. This underscores AI's pivotal role in advancing healthcare, with its ability to catalyze a new era of medical discovery, as highlighted by the increasing interest of pharmaceutical companies in machine learning expertise.
AI's Role in Industrial Automation: A Personal Insight
In my area of expertise, industrial automation, AI is steadily becoming a transformative tool. While still evolving, models focused on predictive maintenance are no longer just theoretical but are being actively explored and implemented. These models aim to proactively identify equipment issues, optimize efficiency, and minimize downtime. My work in this sector has shown me the tangible impact and growing importance of AI in industrial contexts, illustrating a future where its application is essential for innovation and operational excellence.
Final Thoughts
While AI advancements challenge traditional job roles, they simultaneously create new opportunities. The increasing demand for machine learning expertise, especially in fields like biomedicine, is a clear testament to this. AI isn’t merely displacing jobs; it’s creating paths for new careers, much like past industrial revolutions.
However, these changes at an unprecedented and exponential pace leave me contemplating my 1-year-old child’s future.
What skills will be essential in such a rapidly evolving society? Amidst this uncertainty, I believe that nurturing soft skills like critical thinking, creativity, and curiosity will be invaluable. Also, fostering an Antifragile mindset and the drive to pursue one’s goals and passions tirelessly is crucial. This aligns with the philosophy in Arnold Schwarzenegger’s recent book 'Be Useful', which advocates for embracing challenges with vigor, or as he puts it: “Work Your Ass Off.”
I hope these skills will enable my child to successfully navigate and flourish in the dynamic landscape shaped by AI and technological advancements.











