How I'd Recommend You Self-Teach Artificial Intelligence (Or Any Other Subject)
A casual commentary
I frequently receive requests for a guide on how to learn AI, what topics to focus on, what resources to use, etc. Rather than repeatedly sifting through my text or discord messages from the last time I had this conversation, I’ve decided to share my thoughts and resources in this article
Of course, these are only my own opinions and I am by no means an expert in the field; I'm a self-taught learner just like you who only happens to maybe be a bit ahead. This article contains plenty of general insights and advice for anyone looking to self-teach any subject, but some parts are built specifically around what I'm interested in rather than whatever role you might be leaning towards
Table of Contents
Principles to keep in mind when self-teaching
Using LLMs
Assume you are not aware of that which you do not know
How to take many steps forward and no steps back
Prerequisite knowledge
Math
Code
Learning Resources
General AI sources
AI sources specific to my interests
My own self-teaching / research pipeline
My studying strategy (not recommended)
How I spend my time
1. Principles to keep in mind when self-teaching
1a. Using LLMs
Be wary of using ChatGPT and other Large Language Models (LLMs) for learning. While extremely knowledgeable in theory, they have a tendency to spew incorrect information and generally kiss your ass. If you provide an idea you’ve had and ask for feedback, you’ll receive praise even if it’s nonsensical and/or unoriginal. When you ask for information on a topic, you’ll often get high level metaphors and vibes rather than concrete definitions
If you plan on using LLMs to assist your learning, which I suggest you do, here are some things to keep in mind:
Demand brutal critiques of your ideas
Have the model point you in the direction of textbooks, research papers, etc. and then use them to fact check everything it says. There is still a use for Google even in 2024
Ask the model to quiz you, but take the grades it gives with a heavy pinch of salt
Ask for any math written out step-by-step in LaTeX and ensure you walk through every single tiny step yourself. I do not recommend using LLMs for math proofs that go beyond what you’d find in the first chapter of an introductory proofs textbook. For some types of math you can have the model write code designed to test and confirm whether what you’ve been talking about is correct
DO NOT blindly use code written by LLMs. Even if it seems to work as far as you can tell, there’s a significant chance of a problem somewhere that’ll bite you in the butt later. Beyond that, the fact that you don’t understand how the code works is a red flag for your learning process. Generally speaking, when I ask for code I also provide a full conceptual explanation of how I want said code to work in reference to a process that I understand; the LLM is only there to speed up a process that would otherwise take me far longer if I had to sift through documentation and stack overflow myself. In May of 2024 we’re definitely not at the point of capabilities nor reliability for you to be blindly trusting any of these models to execute arbitrary code
Many learning resources such as this one are written for a specific audience, and you may not be that ideal audience. I love making LLMs aware of my specific circumstances so that they can cater recommendations. For example, you might copy & paste portions of this very substack article and say something like “This guy is too into language modeling and has a stronger math background than me. Please provide me with equivalent information about self-driving with reinforcement learning and for someone who never took calculus”
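To make that "have the model write code to confirm the math" point concrete, here's a minimal sketch of the kind of check I mean (the specific identity and function names are just my illustrative choices): numerically verifying that the derivative of the sigmoid really is sigmoid(x)·(1 − sigmoid(x)). Ask the LLM for something like this, then walk through it yourself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def numeric_grad(f, x, eps=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Claimed identity: d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
xs = np.linspace(-5, 5, 101)
analytic = sigmoid(xs) * (1 - sigmoid(xs))
numeric = numeric_grad(sigmoid, xs)
assert np.allclose(analytic, numeric, atol=1e-6), "identity fails!"
print("sigmoid derivative identity confirmed")
```

If the assert passes across a range of inputs, you've got real evidence the derivation was right instead of just the model's word for it.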
For more general strategies when interacting with LLMs, I recommend LearnPrompting, the original guide to prompt engineering that every other guide you’ll find on the internet tends to copy. FYI, the maker of those docs also has some paid courses with more up-to-date information that I helped create but no longer profit off of in any way
1b. Assume you are not aware of that which you do not know
Many of my YouTube subscribers describe themselves as self-learners. Some of those will go as far as to claim they could hold their own with legitimate researchers who have traditional university-provided credentials. I’ll tell you right now, 99% of them don’t know 99% of what they’re talking about 99% of the time. For myself, change those numbers to maybe 98% if you’re feeling charitable. For researchers with university-provided credentials, let’s be cautious scientists and call it 80% (unless they’re economists, in which case make it 99.9%)
The number of times I’ve had people in my YouTube comments and Discord server make absurd claims is astounding. Most of the time I give them the benefit of the doubt and assume that they’re the ones who know what they’re talking about and I’m just not as deep into the given topic they’re discussing, but frequently there’s no way of cutting around it. “Why don’t you do a guide on model X, it’s easier than the guide you just posted.” Umm no, it’s literally twice the number of lines of code and stems from an area of math that a university course load would expose you to later in the major; you’re just convinced that it’s easy because you’ve been watching Wes Roth videos about it instead of reading the appendices to the paper and sifting through the source code
I think this issue stems from a larger problem in modern science education. Rather than learning from first principles, in school we were given shitty metaphors at best and completely incorrect descriptions at worst. For example, protons, neutrons, and electrons are actually weird soup-y quantum field stuff that I definitely do not understand, but in second grade I was told that they’re hard little balls floating around in a vast open space and now I can’t get that image out of my head. Past that, most people’s exposure to science after high school comes in the form of five minute Neil deGrasse Tyson clips on YouTube in which he takes an approach along the lines of “The truth is too complex, it’s more important that we get them interested and hope that 0.01% of the viewers are inspired to go get STEM degrees.” Unfortunately, these pop-scientists rarely bother to clarify that all they’re telling you is a loose metaphor, and that the truth is way crazier and more difficult to understand. These issues have created a population of pop-sci nerds that are overconfident in their knowledge. I’m not trying to gatekeep here, rather I’m saying that the gate is heavy and you should be prepared to put the effort into pushing it open. I refuse to lie by letting you think that you’re already in the garden
I cannot overstate this point. If you have not completed the most difficult practice problem for that chapter in the textbook, then 100% guaranteed, no arguing with me, you DO NOT understand that chapter. Likewise, if you have not completely replicated the core part of that paper on your own, from scratch, in code, then you do not understand how the architecture it proposed works. Replication is the name of the game when it comes to self-teaching. You should at the very least be thoroughly reading through the source code of the papers you're most interested in and starring the GitHub repo to revisit in the case that you ever need to do a deeper dive
1c. How to take many steps forward and no steps back
When self-teaching it's easy to wander off course, head down a dead end, or get completely lost. Here are a couple pieces of advice on how to steer the ship:
Set (a) specific goal(s): It’s okay to have a very broad idea of just wanting to learn or gain a new skill in the long term, but in the short term you need something to anchor to if you want to make the slightest bit of progress on anything. Mine often look like “confirm whether idea x is feasible,” “make a prototype of idea y,” or “become knowledgeable enough about topic z that I’d feel confident in my ability to produce insightful questions that an expert in the area would find intriguing.” This is not available as an option to most people, but for that latter example I’ve been working on actually having said conversations over on my YouTube channel
I’d also like to point out that oftentimes goals can succumb to the “New Years Resolution Effect.” Because of this, I try to keep my goals on the scale of what can be finished in a day or maybe a week at most. For the long term, instead of using “goals” I think in terms of trend lines. It’s okay if I didn’t hit that many work hours today as long as the trend line for hours worked per day has been going up or staying level over the past week, month, etc. Switching to a trend line mindset removes the demoralizing feeling you get from a temporary failure and allows you to instead focus on bringing the average up slowly over time, a task that’s far more mentally sustainable in my experience.
Limit your scope: My interests vary far too widely to ever fit into a single lifetime. In college I started off as a psychology major, then added economics, then a math minor; eventually I tried to triple in all three, dropped psych, began sitting in on philosophy & physics classes, and somewhere in there I was asking my computer science friends so many questions that one of them complained about how much of his time I was taking up. Since then I've limited most of my areas of interest to "you can spend a few minutes jotting down thoughts/ideas you have, but absolutely no work or studying in the area is allowed" so that I can focus my effort on higher priority projects.
Many of the resources I’ll be providing in this article are relatively broad, and you definitely should not go through every single one of them word-for-word. Instead, sift through them and select topics to learn based on your specific goals. Do you think language models are cool but don’t care about image generation? Then read everything you can find on transformers and nothing at all on diffusion models
Find a feedback mechanism: I have to credit this gem to Paul Christiano, a prominent figure in the field who I was lucky to talk with last summer (it was quite the waste of his time). He pointed out to me that researchers in academia have the luxury of learning from experienced peers who’ve already curated the key material for them, and then are available to critique their ideas and methodology. Entrepreneurs are blessed with the free market, a mechanism all too happy to take a metaphorical baseball bat to their financial knee caps. People self-teaching in their spare time have no such feedback mechanisms; we are free to be wrong and continue being wrong in our seclusion
You need to find some sort of mechanism that’ll let you know when you’re wrong as soon as possible so that you can learn from the mistake and move on. One way to do this is to find a mentor (HMU if you’d like to be mine). If your idea or subject involves some kind of work product with a built-in way to verify its functionality, such as code, then that’s great. If you’re concocting an idea for a business then you need to go out and talk to your target demographic to see if they’d even be willing to pay for your idea
The somewhat unique feedback mechanism I’ve stumbled upon is posting my learning journey and research ideas to YouTube. Although the information was sparse at first, I’ve begun to develop a network of people who send me in the direction of papers that I should probably be aware of or point out problems with my ideas. I strongly believe more people should be doing this with any projects they’re working on, but if you don’t like being in front of a camera then I’d suggest finding some discord servers or subreddits full of experts kind enough to provide you with feedback
Develop a system: Finding a way to plan out and record progress has been a game changer for me. I keep track of the number of papers I read per week, hours per day I spend working, tasks completed in a day, videos posted per week, YouTube subscribers gained per month, etc. I can’t say too much on this subject since your system needs to be catered to you, but if you’re curious about what I’ve got going then stick around to read section 4b
2. Prerequisite knowledge
2a. Math
Artificial Intelligence is a broad term encompassing a large variety of techniques that require varying levels of math knowledge—sometimes little to none and other times PhD level stuff. Luckily, the core sub-field that's blowing up right now, called Deep Learning, is actually pretty middle-of-the-road. If you happen to have minored in mathematics or attained an equivalent level of knowledge through another educational background such as physics, engineering, statistics, etc. then you may be able to skip this section
If you were to learn entirely through university then I’d tell you to take the following three classes (and all their prereqs of course):
Multivariable Calculus
Linear Algebra
Whatever statistics course has you do a full derivation of Ordinary Least Squares Regression, which is usually a 200 or 300 level in the math major
However, actually taking these classes might be a bit overkill, as you'll also be paying to learn material that won't actually get used in deep learning. That's why I instead recommend you check out Part 1 of this beautiful textbook written by legends in the field. If it feels like you don't have the prerequisite knowledge to read that book, then check out this video by The Math Sorcerer that you can use to build yourself a learning plan based on what you're missing. As an alternative to buying those specific textbooks, you can take the equivalent Khan Academy courses. For a list of resources that's more detailed than the beautiful textbook but also more guided than the Math Sorcerer video, check out this list of textbooks and other media called Mathematics for Machine Learning
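If you're wondering why I call out the OLS derivation specifically, here's a minimal NumPy sketch (the data and numbers are made up for illustration): the closed-form solution β̂ = (XᵀX)⁻¹Xᵀy that the derivation produces, cross-checked against NumPy's own least-squares solver. If you can derive where that formula comes from, you're in good shape for the stats side of deep learning:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
# Design matrix: intercept column plus 3 random features
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
true_beta = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_beta + 0.1 * rng.normal(size=n)  # noisy targets

# Closed-form OLS: beta_hat = (X^T X)^{-1} X^T y
# (solve() is more numerically stable than explicitly inverting X^T X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat.round(2))
```

With only 0.1 noise on 100 samples, the recovered coefficients land very close to the true ones.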
P.S.
I definitely do not recommend you look up ways to pirate free textbook pdfs since piracy is illegal. Just to make sure you don’t stumble upon them accidentally, steer clear of anything on the internet called “libgen” or “Anna’s Archive.” I do not recommend going to those sites under any circumstances; you should instead pay an arm and a leg for old, used, falling apart, physical textbooks
2b. Code
I tried to learn how to code multiple times before it clicked. First there was Java in middle school, then Python, then Java again, then R (doesn't count, I know). Sometimes it was way too difficult and complicated, and other times it was super easy but I just didn't enjoy it. It wasn't until my junior year of college, when I had an actual product I wanted to build, that everything finally came together. Once I found something I actually wanted to make, suddenly I had all the motivation and mental clarity in the world to learn whatever it took to get my idea up and running, no matter how inefficiently (I completely melted an M1 Mac mini over the course of a summer as it ran an ungodly amount of Python for loops and poorly designed NumPy operations. If you're curious, check out one of the first videos I made on YouTube)
Anyways, for deep learning the one and only answer is Python. Some very silly academics will try to get you to use R instead but you can safely ignore or make fun of them. Also, if you get into engineering and making models hella efficient then eventually you’d want to learn C, but that’s well outside the scope of this article. For an intro to Python course idk what to recommend, maybe Codecademy?
The question is really which deep learning library to go with. To be fair I am very much biased and far from knowledgeable about all the frameworks. I just went with PyTorch because I follow the herd, and now I’m about to talk off the cuff about each framework without doing the slightest bit of research or fact checking. If you’d like some better advice that’s also more catered to you, I recommend having a conversation with ChatGPT about which one is right for you. In the conversation, include things like what hardware you have access to, what types of models you think you’ll be training, etc. Here are some of my low quality thoughts on the subject:
PyTorch: Most Popular / Broadest Compatibility
The default, most popular, and probably your answer. Unless one of the below libraries stands out to you, then pick this. There are more tutorials on the internet available for PyTorch than any other library, and for any given research paper that comes out they probably open-source their code in PyTorch. Hell, even if they didn’t do the experiment in PyTorch, there’s a good chance they’ll go out of their way to write up a PyTorch version because it’s that popular. And even if the researchers don’t, someone else will. There are benefits to being part of the herd
Tensorflow: Google Product Focus
Designed primarily for Tensor Processing Units (TPUs) as opposed to Graphics Processing Units (GPUs). Google invented and makes TPUs, so only use this if you work at Google or plan on doing all your model training/inference in Google Colab. Thanks to being so old it is well supported, there are plenty of tutorials online, and to my memory it does still support GPUs so honestly if you’re just a Google fanboy this is probably fine
Jax: Speed??
I've heard it's pretty fast/efficient, and I know Elon Musk's xAI used it for their Grok model for some reason. Not sure why Google feels the need to have two different machine learning frameworks. Very Google of them, but that's about all I can say
MLX: Apple Product Focus
The fastest framework for running on Macs with Apple Silicon. Use this if you’re an Apple fanboy who will only ever need to train tiny models or if you plan to integrate models into software you’re developing for Apple’s ecosystem. I’ve not used it but I’ve heard good things, and they seem to be updating it constantly and building some very interesting small models. The syntax is based on the Python package Numpy, so if you’re used to working with that then maybe there’s another reason
TinyGrad: Long-Term Potential
Created by our lord and savior and still in beta yet apparently it’s already faster than PyTorch on CPUs, AMD, and Apple Silicon. I can’t really say I understand what’s behind TinyGrad, but supposedly it works differently from all the other frameworks at some low level in a way that ensures it’ll end up being far faster than any of the other options in the long run. And, they designed it to feel like PyTorch so that it’d be easy to switch over to. I do personally plan on making the jump at some point since it seems the most future-proof, but today is not that day
Assuming you’ve chosen PyTorch, here’s the guide I used forever ago. There are probably newer/better ones out there ngl. What I noticed is that it’s definitely meant for programmers who don’t care about math and just want to learn some simple models. I don’t think that’s a great approach since if you don’t understand what you’re building then you’re likely to run into trouble building it, but whatever. Later in this article I’ll be linking to some guides I like a whole lot more, but are also more specific to my interests
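For a sense of what your first PyTorch code will look like, here's a minimal sketch of a complete training loop (toy data, illustrative hyperparameters) that fits a line with gradient descent. The model/loss/backward/step skeleton here is the same one you'll use all the way up to GPTs:

```python
import torch

# Toy data: y = 3x + 1 plus a little noise
torch.manual_seed(0)
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)  # one weight, one bias
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(500):
    opt.zero_grad()                 # clear old gradients
    loss = loss_fn(model(x), y)    # forward pass
    loss.backward()                # autograd computes gradients
    opt.step()                     # update parameters

print(model.weight.item(), model.bias.item())  # should land near 3 and 1
```

If you can explain what every one of those five lines inside the loop does and why the order matters, you're past the hand-holding stage most tutorials leave you in.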
3. Learning Resources
3a. General AI sources
The best hub of learning resources I’ve stumbled upon is Open DeepLearning. They aggregate a bunch of different links based on topic including publicly released college courses, free online textbooks, key papers, etc. Another would be the aforementioned list Mathematics for Machine Learning. For a one-stop shop on the most important parts of the field, I’d recommend the aforementioned MIT textbook Deep Learning
Once you've roughly caught up and want to start keeping on top of the latest AI research, you need to be aware of arXiv, the place where almost all AI preprints get published. For a far more pleasurable experience than actually perusing the raw uploads, I recommend Bytez, which allows you to listen to papers, links to references and relevant YouTube videos, has a built-in LLM to chat with about each paper, and a whole lot more. Others that might be worth looking at include arxiv-sanity, Papers with Code, Deep Learning Monitor, and LLM Articles
As I mentioned earlier, the best way to ensure you’ve learned what you’ve read in a research paper is to replicate it. To get you started, check out Papers in 100 Lines of Code. Although they rarely actually stick to that 100 line characterization, this is a great list of key papers and minimal implementations of them in PyTorch. Another would be Deep Learning Models which has both PyTorch and TensorFlow versions of the guides. These lists are great for the stage of learning where you think you’re kind of understanding the papers you read but can’t consistently replicate them from scratch, whether that be because you’re not quite there with the coding abilities or because authors rarely write in a manner designed to make replication easy for those of us who haven’t already read and replicated all the prerequisite citations
Then as a bonus I'll list off some of my favorite YouTube channels in the space, in a very rough descending order of how accessible they are for a broad audience:
And I can’t quite vouch for most of the following since I haven’t seen enough videos, but if you want your YouTube algorithm to start sending you more niche stuff then check out these smaller creators I see in my subscription list:
3b. AI sources specific to my interests
This is the fun part where I get to point you towards my real favorites and plug my own stuff. I mentioned earlier how broad my interests can be, but here we’ll focus on what I’ve restricted myself to studying over the past few months. That can be described roughly as whatever looks to me like the most promising route towards the interaction/intersection between Artificial General Intelligence (AGI) and progressions of consciousness, changing economic systems, accelerating information propagation and/or decentralization of power. In order to get to all those broad goals though, I’m having to first develop a good model of (and hopefully do some shaping of) the form and function that AGI will take. More specifically for the purposes of this guide we’ll be discussing transformers and large language models
By far the best guide to learn about neural networks is Andrej Karpathy’s Neural Networks: Zero to Hero course which starts off building foundational concepts from scratch like what gradients are and how they are tracked. One of the many great things about this playlist is that it completely skips over all the useless models that other courses will try to shove down your throat and instead gets right to building what sometimes feels like the model to end all models, the decoder-only transformer. Lately he’s been working on building a GPT from scratch in C and I seem to remember him mentioning creating more educational courses, so be on the lookout for those. As an alternative, you could try Neel Nanda’s What is a Transformer? playlist which assumes you’re already confident with PyTorch, prompts you to do certain parts yourself as exercises, and includes some key insights from his interpretability work that’ll really help you understand what we think might be happening under the hood
For a spiritual continuation of Andrej’s work, check out my recent guide that walks you through Llama 3’s architecture step-by-step. If you’re also interested in experimenting with novel GPT architecture ideas you’ve got, consider using this template on GitHub which allows you to easily debug, quickly iterate on, and comprehensively evaluate tiny GPT models with practically no compute
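If you want a preview of the core operation Karpathy's course builds up to, here's a minimal single-head causal self-attention sketch in PyTorch (no batching, no multi-head, made-up dimensions; the function name and shapes are my own illustrative choices):

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention, the heart of a decoder-only transformer.
    x: (T, d) sequence of token embeddings; Wq/Wk/Wv: (d, d) projection matrices."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / d ** 0.5                       # (T, T) similarity scores
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))  # no peeking at future tokens
    return F.softmax(scores, dim=-1) @ v              # weighted mix of value vectors

torch.manual_seed(0)
d = 8
x = torch.randn(5, d)                     # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # torch.Size([5, 8])
```

One nice sanity check: the first token can only attend to itself, so its output is exactly its own value vector. That's precisely the kind of from-scratch verification I keep preaching about.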
4. My own self-teaching / research pipeline
4a. My studying strategy (not recommended)
I’ll be straight up with you, in my own journey I’ve not actually followed the above advice or used most of the resources that I just gave you. For example, even though I spoke with Paul Christiano in June I did not actually internalize his point until December; prior to then I was still setting my goals too high with projects that had no clear feedback mechanisms. Similarly, I’ve done very little textbook reading or online course taking
Warning: The following strategy that I use may not work as well for you. Even though I may not always show it, I began this journey with a math degree under my belt and fresh off a short stint working for a law-adjacent firm of economists that taught me what it means to demand the highest possible standard when doing research. Unlike the vague, hand-wavy citations you find in academia, we had to cite sources down to the specific page number and write in such a way that absolutely no stone was left unturned. Unlike YouTube lessons that hold your hand and let you copy & paste their code, every time I proofread a coworker's report it was expected that I could and did replicate all of their findings from scratch on my own. Use this strategy at your own intellectual peril
I’ll jump right into a paper that I have absolutely no business reading and I know it. First I’ll do a high level skim where I’m looking for the key takeaways and not expecting to have any clue about how any of it works
If I deem the paper sufficiently interesting, I’ll begin the deep dive. What this means is that I’ll start reading from the beginning, and the moment I’m the slightest bit unsure of something I’ll mark my place, put the paper down, and start researching. Usually ChatGPT is good for pointing me in the right direction for where I need to be looking
If I’m looking for the meaning of some technical word and find that its definition has even more terms that I don’t understand, then I go deeper. I do this until I’ve hit “bedrock,” meaning math or computer science principles that I can write proofs or code of myself
After hitting bedrock, I climb my way back up until I eventually get back to the original paper. Sometimes this takes ten minutes and other times it can legitimately take weeks; this strategy is not for the faint of heart. It allows me to learn the minimal necessary information for any given paper I want, but I acknowledge that if it weren’t for that math degree then this process might take more like months or even years rather than weeks for a single paper. And even with the math degree there are many articles that I give up on
Beyond crucial for this process is my notes app of choice, Obsidian. If you’re not already aware, the key feature of obsidian is the ability to link to other notes just like hopping between links on wikipedia. My Zoomer brain is incapable of memorizing anything, so the ability to not only revisit a note but to click down the hyperlink rabbit hole until I reach something that is in my memory is extremely helpful for facilitating this recursive “digging” process
4b. How I spend my time
Earlier in the autodidact process I had what felt like a never-ending pile of textbooks, papers, GitHub repos, etc. to sift through, but eventually that stack did dwindle away. Now, in theory, I'm able to keep a relatively consistent schedule where I churn through new information just as quickly as it gets added to the pile
My week revolves around arXiv uploads (if you're still waiting for peer review before you read papers then someone needs to drag you out of the 18th century) and I've built a set of Python scripts to help me sift through them. Basically, every weekday I read through all the titles of the papers that came out and download whichever ones catch my eye. Then at the end of the week I take all the downloaded PDFs, read through their abstracts, and do another round of filtering to see which are actually worth reading. You can watch me do this part of the process over on my YouTube channel. This usually leaves me with anywhere between 5 and 20 papers, which I'm supposed to have finished reading by the following Friday. In reality I don't end up reading all of them; for many I get partway through, change my mind, and then delete
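My actual scripts aren't anything fancy, but to give you the idea, here's a minimal sketch of the title-sifting step (the keywords and the category query are placeholders for whatever your interests are; it assumes arXiv's public Atom API, which is free to query politely):

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder keywords -- swap in whatever your interests are
KEYWORDS = ["transformer", "language model", "attention"]

def fetch_titles(query="cat:cs.CL", max_results=50):
    # arXiv's public Atom API: one polite request, no scraping needed
    url = (f"http://export.arxiv.org/api/query?search_query={query}"
           f"&sortBy=submittedDate&max_results={max_results}")
    with urllib.request.urlopen(url) as r:
        feed = r.read()
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    root = ET.fromstring(feed)
    return [e.findtext("atom:title", namespaces=ns).strip()
            for e in root.findall("atom:entry", ns)]

def filter_titles(titles, keywords=KEYWORDS):
    # Case-insensitive match against any keyword
    pattern = re.compile("|".join(map(re.escape, keywords)), re.IGNORECASE)
    return [t for t in titles if pattern.search(t)]

sample = ["Scaling Laws for Neural Language Models", "A Study of Bird Migration"]
print(filter_titles(sample))  # ['Scaling Laws for Neural Language Models']
```

Calling `filter_titles(fetch_titles())` once a day gets you a shortlist to eyeball; the human second pass over abstracts is still the important part.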
I've also elected to publish two paper breakdown videos per week on Monday and Wednesday, but usually instead of doing them in the moment I'll pre-record a dozen or so in one day so that I don't have to worry about it for a while. Other than that, the remainder of my time goes towards working on projects, which are occasionally worthy of their own unscheduled videos
Looking at my total work hours for 2024 so far, they break down to 25% creating content such as YouTube videos or this substack, 40% developing projects, 30% studying and 5% working for hire, which mainly has to do with the last few weeks of my time writing courses for LearnPrompting back in the beginning of the year. This averages to 39.6 hours per week (this is my full time gig, AKA I’m unemployed) but keep in mind the way I track time is very strict. Getting up to use the bathroom or grab a glass of water? Those two minutes don’t count. Checking to see how a YouTube video is doing? Interacting with my subscribers over on my discord server? A more reasonable person might count those under creating content and building the community, but I categorize them as fun. It’s my belief that most people only get 2-5 hours of “core work” done at their 9-to-5 after we take out lunch & bathroom breaks, time spent zoned out gazing blankly at the screen, etc. so I’m hella proud of my 39.6 hours
Outro
If you’ve enjoyed this article, consider subscribing to this Substack or my YouTube channel, joining my Discord server, connecting on LinkedIn, or supporting me financially either on Patreon, Youtube or Substack. For everything else, here’s my Linktree
Post-hoc addition: apparently this is Ilya Sutskever's list of the 30 papers you should read to understand 90% of what's happening in the field rn
https://arc.net/folder/D0472A20-9C20-4D3F-B145-D2865C0A9FEE