Channel: Machine Learning | Towards AI

Build the Smallest LLM From Scratch With PyTorch (And Generate Pokémon Names!)

Author(s): Tapan Babbar. Originally published on Towards AI.

Source: Image by Author

So, there I was, toying with a bunch of Pokémon-inspired variations of my cat's name, trying to give it that unique, slightly mystical vibe. After cycling through names like "Flarefluff" and "Nimblepawchu," it hit me: why not go full-on AI and let a character-level language model handle this? It seemed like the perfect mini-project, and what better way to dive into character-level models than creating a custom Pokémon name generator?

Beneath the vast complexity of large language models (LLMs) and generative AI lies a surprisingly simple core idea: predicting the next character. That's really it! Every incredible model, from conversational bots to creative writers, boils down to how well it anticipates what comes next. The "magic" of LLMs is in how they refine and scale this predictive ability.

So, let's strip away the hype and get to the essence. We're not building a massive model with millions of parameters in this guide. Instead, we're creating a character-level language model that can generate Pokémon-style names. Here's the twist: our dataset is tiny, with only 801 Pokémon names! By the end, you'll understand the basics of language modeling and have your own mini Pokémon name generator in hand.

Here's how each step is structured to help you follow along:

- Goal: A quick overview of what we're aiming to achieve.
- Intuition: The underlying idea; no coding required here.
- Code: Step-by-step PyTorch implementation.
- Code Explanation: A breakdown of the code so it's clear what's happening.

If you're just here for the concepts, skip the code and you'll still get the big picture. No coding experience is necessary to understand the ideas. But if you're up for it, diving into the code will help solidify your understanding, so I encourage you to give it a go!
The Intuition: From Characters to Names

Imagine guessing a word letter by letter, where each letter gives you a clue about what's likely next. You see "Pi," and your mind jumps to "Pikachu" because "ka" often follows "Pi" in the Pokémon world. This is the intuition we'll teach our model, feeding it Pokémon names one character at a time. Over time, the model catches on to the quirks of this naming style, helping it generate fresh names that "sound" Pokémon-like. Ready? Let's build this from scratch in PyTorch!

Step 1: Teaching the Model Its First "Alphabet"

Goal: Define the "alphabet" of characters the model can use and assign each character a unique number.

Intuition: Right now, our model doesn't know anything about language, names, or even letters. To it, words are just sequences of unknown symbols. And here's the thing: neural networks understand only numbers, so converting characters to numbers is non-negotiable. To make sense of our dataset, we assign a unique number to each character.

In this step, we build the model's "alphabet" by identifying every unique character in the Pokémon names dataset. This includes all the letters, plus a special marker to signify the end of a name. Each character is paired with a unique numeric identifier, giving the model the basic building blocks for creating Pokémon names and helping it begin to learn which characters tend to follow one another. With these numeric IDs in place, we set the foundation for the model to grasp the character sequences in Pokémon names, all from the ground up!
```python
import pandas as pd
import torch
import string
import numpy as np
import re
import torch.nn.functional as F
import matplotlib.pyplot as plt

# Load the Pokémon names
data = pd.read_csv('pokemon.csv')["name"]
words = data.to_list()
print(words[:8])
# ['bulbasaur', 'ivysaur', 'venusaur', 'charmander', 'charmeleon', 'charizard', 'squirtle', 'wartortle']

# Build the vocabulary
chars = sorted(list(set(' '.join(words))))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0  # Dot represents the end of a word
itos = {i: s for s, i in stoi.items()}
print(stoi)
# {' ': 1, 'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8, 'h': 9, 'i': 10, 'j': 11, 'k': 12, 'l': 13, 'm': 14, 'n': 15, 'o': 16, 'p': 17, 'q': 18, 'r': 19, 's': 20, 't': 21, 'u': 22, 'v': 23, 'w': 24, 'x': 25, 'y': 26, 'z': 27, '.': 0}
print(itos)
# {1: ' ', 2: 'a', 3: 'b', 4: 'c', 5: 'd', 6: 'e', 7: 'f', 8: 'g', 9: 'h', 10: 'i', 11: 'j', 12: 'k', 13: 'l', 14: 'm', 15: 'n', 16: 'o', 17: 'p', 18: 'q', 19: 'r', 20: 's', 21: 't', 22: 'u', 23: 'v', 24: 'w', 25: 'x', 26: 'y', 27: 'z', 0: '.'}
```

Code Explanation: We create stoi, which maps each character to a unique integer. The itos dictionary reverses this mapping, allowing us to convert numbers back into characters. We include a special end-of-word character (.) to indicate the end of each Pokémon name.

Step 2: Building Context with N-grams

Goal: Enable the model to guess the next character based on the context of preceding characters.

Intuition: Here, we're teaching the model by building a game: guess the next letter! The model will try to predict what comes next for each character in a name. For example, when it sees "Pi," it might guess "k" next, as in "Pikachu." We'll turn each name into sequences where each character points to its next one. Over time, the model will start spotting the familiar patterns that define the style of Pokémon names. We'll also add a special end-of-name character after each name to let the model know when it's time to wrap up.

Character N-grams.
Source: Image by Author

This example shows how we use a fixed context length of 3 to predict each next character in a sequence. As the model reads each character in a word, it remembers only the last three characters as context for its next prediction. This sliding-window approach helps capture short-term dependencies, but feel free to experiment with shorter or longer context lengths to see how it affects the predictions.

```python
block_size = 3  # Context length

def build_dataset(words):
    X, Y = [], []
    for w in words:
        context = [0] * […]
```
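The snippet above is cut off mid-function. Here is one plausible completion of build_dataset, following the sliding-window pattern the text describes: the context is left-padded with the end-of-name token (index 0) and shifted forward one character at a time. This is a sketch, not necessarily the author's exact code, and it uses a tiny stand-in word list instead of the full 801-name CSV:

```python
import torch

# Small stand-in word list; the article loads 801 names from pokemon.csv.
words = ['bulbasaur', 'ivysaur', 'venusaur']

# Rebuild the vocabulary from Step 1
chars = sorted(list(set(' '.join(words))))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0  # end-of-name marker

block_size = 3  # Context length

def build_dataset(words):
    X, Y = [], []
    for w in words:
        context = [0] * block_size        # start each name with an "empty" context
        for ch in w + '.':                # '.' marks the end of the name
            ix = stoi[ch]
            X.append(context)             # the last block_size character IDs
            Y.append(ix)                  # the character they should predict
            context = context[1:] + [ix]  # slide the window forward
    return torch.tensor(X), torch.tensor(Y)

X, Y = build_dataset(words)
print(X.shape, Y.shape)  # one (context, target) pair per character, plus the end marker
```

With the full dataset, each row of X is a vector of three character IDs, and the matching entry of Y is the ID of the character the model should predict next.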
