Let’s build GPT: from scratch, in code, spelled out.

We build a Generatively Pretrained Transformer (GPT), following the paper “Attention is All You Need” and OpenAI’s GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!). I recommend people watch the earlier makemore videos to get comfortable with the autoregressive language modeling framework and basics of tensors and PyTorch nn, which we take for granted in this video.

Source: Let’s build GPT: from scratch, in code, spelled out.

Interesting. I’d like to know more about how this works, but I would probably stop once I understood the basics and then use a pre-built open-source solution. That said, the real problem is training the model and then having the computing resources to run it. It would be nice to have complete trust in how the input is used, though.