Infinite AI Interns: How Young Professionals can Win in an AI-Powered World


Join us as we explore the future of work in the age of artificial intelligence.





Join us as we explore the future of work in the age of artificial intelligence. Nathan shares essential strategies for college students to succeed in a landscape dominated by AI tools. Discover the most important skills to develop, how to use AI for efficiency, and insights into AI-driven task automation. Get valuable advice on incorporating AI into your work and resources for beginning your journey into AI professionalism.

SPONSORS:
The Brave Search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API free for up to 2,000 queries per month at https://bit.ly/BraveTCR

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off: https://www.omneky.com/

Access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention "Turpentine" to skip the waitlist.


CHAPTERS:
(00:00:00) Introduction
(00:05:18) AI Closing in on Human Experts
(00:08:48) Limitations of Current AI
(00:13:16) AI's Speed and Availability Advantages
(00:16:26) AI's Evolutionary Advantages
(00:19:01) Viewing AI as Entry-Level Employees
(00:20:38) Sponsors: Brave | Omneky
(00:22:05) Modes of AI Production
(00:25:09) Agent Mode: The Best of Both Worlds
(00:30:08) Prompting Techniques
(00:32:17) AI Coding Assistants
(00:37:01) Automating Processes with AI
(00:37:55) Sponsors: Squad
(00:39:23) University Curriculum and AI Tools
(00:44:50) AI-Generated Music and Video Editing
(00:46:04) Introduction to AI Tools
(00:47:11) Devin AI Agent
(00:49:26) Waymark AI Evaluation Example
(00:52:51) Setting the Context
(00:54:30) Mantras for AI Adoption
(00:56:48) Recommended AI Thought Leaders
(00:59:53) Setting up AI Training
(01:04:25) Avoiding Over-engineering Solutions


Full Transcript


Nathan Labenz: (0:00) Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Hello, and welcome back to the Cognitive Revolution. Today, I'm excited to share a presentation that I recently gave to the Texas State University Computer Science Club, in which I offered my thoughts on how current college students can best position themselves for success in a world of infinite AI interns. As I've lived and breathed applied AI for the last couple of years, it's become clear to me that most institutions are really struggling to make sense of what's happening with AI, let alone keep up with all of the latest developments. Many colleges, in particular, are really at risk of failing their students by continuing to focus too heavily on skills that, while still important fundamentals in some sense, are increasingly best provided by AI systems. This concern, to be very clear, is not theoretical. I am hearing more and more software companies saying AI tools make them less inclined to hire junior software developers, simply because they believe that AI tools create more leverage for their senior team members and generally promise better ROI. My goal with this lecture was to help bridge that gap, helping students start to grapple with and get ahead of the AI megatrend that will shape their entire careers. That means everything from mastering Copilot mode, to learning how to effectively delegate repetitive tasks to AI, to using AI for quality control, a paradigm that I discuss in the context of AI task automation and which actually fits perfectly well into all sorts of business processes, including human-powered ones.
This is my attempt to outline the key skills that early-career knowledge workers should be developing now. Before we begin, a couple quick caveats. First, this is all, of course, just my personal outlook based on my experience, and as with any advice, it should never be applied uncritically or as a one-size-fits-all solution. Second, things are changing so fast in AI that I fully expect my recommendations to change before long as well. In just the couple weeks since I gave this talk, we've seen notable advances in areas like transformer working memory, which hint at near-term changes to the tale of the cognitive tape that I present here. I also wanna note that while there is a ton of AI snake oil being sold online these days, I do think well-made and continually updated tutorial libraries can provide a lot of value, if only as a sort of shopping guide to the rapidly expanding world of AI tools. For what it's worth, I personally subscribe to the one that NLW is building, besuper.ai. Overall, my expectation is that AI is going to radically reshape the world of work, and that to compete effectively, early-career professionals in particular need to adapt and adopt as fast as possible. So while I would not go as far as to recommend that anyone drop out of their college program or ignore their classes, I absolutely do encourage students to dedicate real time and energy to educating themselves and starting to incorporate AI into their work now. I hope this presentation provides a solid, practical starting point for that journey. And if you find it valuable, I would really ask that you take a moment to share it with a student or an early-career professional in your life. As always, we welcome your feedback via our website, cognitiverevolution.ai, or by DMing me on the social media network of your choice.
And if you are a student, particularly in computer science or another information technology field, who's looking to get into AI and could use a little guidance, definitely feel free to message me. I can suggest resources and projects, and when you're ready, I can help you connect with hiring companies, including some previous guests on the Cognitive Revolution. For now, I hope you enjoy my take on how today's college students can best position themselves for professional success in the AI era. This is how to stand out in a world of infinite AI interns. What we'll do today is go through a list of things that you guys should be thinking about, as the workforce is in for some significant change, and it is probably going to impact you the most. So I wanted to explain why that's the case, and then also explain what I think you guys should do about it to give yourselves a competitive advantage early in your career, which also happens to be early in the AI era. This was what I thought would be the most useful for students today. So I've titled this 10 super practical AI tips for current college students, or how to stand out in a world of infinite AI interns. My first couple tips are really around just calibrating yourself to what's out there today. I'm sure you guys have all used ChatGPT, so forgive me if some of this is a little bit obvious or repetitive. I speak to all different kinds of audiences, and you would be amazed by how many people who are actually very smart, accomplished leaders have remarkably little curiosity about this technology and little awareness of just how far it has come already. So I always like to set a little bit of a baseline by talking about where the technology is today. My first recommendation in terms of what to do is: understand the current state of the art. This is a recommendation to just about everyone, and obviously there are a lot of ways to describe what the state of the art is.
But the high-level description that I tend to give is that AI is closing in on human expert performance on routine tasks. And the word routine there is really doing a lot of work. AI is getting really good at things where we know what good looks like. So, a few examples of that. We're starting to see this on a benchmark called MMLU, which is a set of exams from the college and graduate school level across all disciplines. A typical human might get 35% of those questions right. An expert in the domain of interest gets 90% right. GPT-4 gets 86% right. So it's closing in on human expert performance at these discrete tasks where we know what the right answer is. Here's another one. This is a paper out of Google, from the Med-PaLM series, Med-PaLM 2 in particular. This is performance on medical licensing exams, so it's, again, another one of these test-like environments. A passing score on the licensing exam is 60%, which, by the way, should give you some pause as to just how accurate your human doctor is gonna be on any given topic that you might have for them. Med-PaLM 1 got to 67% correct, and Med-PaLM 2 blew past what you needed to pass, again closing in on that expert level at 86%. This task is specifically medical question answering. And here, they asked a bunch of human doctors the same questions that they asked the AI. And then they asked human doctors to evaluate whether the AI answer was better or whether the human doctor's answer was better. The top chart shows the good things. These are the desirable traits: better reflects the medical consensus, better reading comprehension, etcetera. This is how often the AI was judged by human doctors to have done a better job as compared to other human doctors. So you see that, overwhelmingly, the AI is getting the higher marks. This is how often the human doctor was judged to have done a better job, and this is in between, where it was considered to be a tie. Then these are the bad things.
So these are the things that you don't want to have. This is the only category where the AI lost, and that is basically on hallucinations. I'm sure you guys are all familiar with the problem of hallucinations. Here, they describe it as more inaccurate or irrelevant information, and the AI did that more compared to the human doctors. But all the other bad things, the things that you don't want to happen in your medical question answering, the human doctors did more. So as judged by human doctors, the AI is beating human doctors on 8 out of 9 evaluated criteria. This is basically happening across the board. This is another version where they're doing diagnosis. Again, it's beating humans. There are tons and tons of examples like this. I know there's a lot of CS students in the room. Coding challenges are another area where AIs are now outperforming the majority of people who participate. They'll enter an AI into coding challenges alongside humans. It won't be the number one contestant, but it will be above average. These are pretty striking results. Now, one thing that the AIs are not really able to do yet, or at least do very infrequently, is come up with genuinely new, really high-quality ideas. I call these eureka moments. I used to say there were no eureka moments coming from AI. Now we've started to see a few of them, so I now say there are precious few eureka moments. Here's one example of a eureka moment. This is actually from a paper called Eureka. Oddly enough, I used to say that, and then they titled the paper Eureka, so it was a very sort of meta thing. I was like, wow, they're reading my mind here. The task in this case is writing a reward function for a robot to learn how to do a certain task. To unpack that: if you're trying to do reinforcement learning to teach a robot to do a task, you need a reward function to evaluate how well it's doing on the task, so that you can then train it to maximize that reward.
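To make that concrete: a hand-written reward function is just ordinary code that scores the robot's current state. Here's a minimal sketch for a hypothetical pen-twirling task; the state fields and weights are invented for illustration, and a real environment would define its own observables.

```python
# Hypothetical proxy reward for a pen-twirling task, in the spirit of the
# hand-written reward functions that papers like Eureka compare GPT-4 against.
# Every field name and weight here is made up for illustration.
def pen_twirl_reward(state: dict) -> float:
    # Reward spin around the pen's long axis, a leading indicator of twirling.
    spin_term = 1.0 * abs(state["angular_velocity_z"])
    # Small bonus for keeping the pen roughly horizontal (tilt in [0, 1]).
    orientation_term = 0.5 * (1.0 - abs(state["tilt_from_horizontal"]))
    # Penalize letting the pen drift away from the palm.
    distance_penalty = 2.0 * state["pen_to_palm_distance"]
    return spin_term + orientation_term - distance_penalty
```

The craft is in choosing proxy terms that give a nonzero gradient of reward long before the robot can actually do the task, which is exactly what makes these hard to write by hand.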
A problem, though, is that at the beginning, when the robot is just fumbling around and making very little progress, you have the problem of what's called sparse reward. That is to say, if the robot is coming nowhere close, then it doesn't get any reward, and then it doesn't have anything to learn from. Right? It's so far off that we can't even identify, oh, that was good, do more of that. It's just totally fumbling and flailing around. This is actually why, a couple years ago, OpenAI had a project where they were gonna try to get AIs to browse the web, and they were gonna try to do this with reinforcement learning. But they abandoned that project because they found that it was making no progress. It was getting no reward, and so it had nothing to learn from. So when you're doing a project like this with robotics, you oftentimes have to write a custom reward function. In this case, the robot is trying to twirl a pencil in its hand. Now, the robot starts with no ability to do that, and it's not coming close. So you need a reward function that at least proxies for progress, that tries to detect even the flicker of getting close, so we can reward that and hopefully bootstrap the robot into learning this task. If you were to sit down and try to come up with a custom reward function, you might think maybe angular momentum around a certain axis of the pen or whatever could be a leading indicator that possibly this is headed in the right direction. So humans do this. This is how it's done in robotics. Humans will sit there and try to think about what would be close, and then write a reward function for it. GPT-4 is actually a lot better at writing reward functions for these tasks than humans are. And this is something that really only experts do. Right? Like, you don't have a lot of amateur reward functions. If you go approach a person on the street and say, hey, can you write a reward function for this robot?
They'll just look at you like, I have no idea what you're even talking about. So you can't even compare to an average person. This is all compared to people who have expert, or at least specialized, skills. And GPT-4 is beating them and coming up with better reward functions, better able to train the robots. But you still see very few of these things where the AI is outperforming genuinely expert people at genuinely novel tasks. So that's important, just to understand what AI can do today. Again, to summarize: it is closing in on expert performance on routine tasks, but there are still precious few eureka moments, precious few genuinely new insights coming out of AI systems. When they do happen, they make news because they're still very rare. Okay. Next, understand the tale of the cognitive tape. This is: why is it happening that way? There's a lot of dimensions to this, so I'll try to go through it quickly. I break down all these different aspects of cognition and then give a human expert and an AI each a score. Again, this is not an average person; this is an expert that I'm evaluating here. For breadth, obviously, AI has a huge advantage over humans. It has read the whole Internet. Right? It's read all the books. It can score not just 86% on some of these exams, but well across all of the exams in all the subjects. It can speak all the languages, etcetera, etcetera. In comparison, an individual human is super narrow. Humanity as a whole is more on the level of the AI, but that's one area in which the AI is already pretty clearly superhuman. On the other hand, depth is still an advantage for the human experts as opposed to the AIs. They're getting pretty good, but they're definitely not at the level of command of a subject that a human expert has. Breakthrough insight, that's the same thing we just talked about with the eureka moments. That is really humanity's biggest edge right now in comparison to AI systems.
So I only give the AI a 1 on that dimension. AI is a lot faster. It can typically generate content faster than we can read it, so that's a pretty notable threshold: we can't even keep up reading what it's able to write. And it's also way less expensive than a human. You can expect it to be at least 10 times faster and probably at least 10 times cheaper. It's also super available and parallelizable, meaning it's on 24/7. It can just sit there and do nothing until you come back to it and say, hey, I have another question for you, or whatever. And it doesn't cost anything to do that; it only costs something when you're using it. If you want to, you can also spin up 10 or 100 at the same time, which obviously you can't do with humans. So these are probably AI's biggest advantages. Memory: as much as our memories are definitely very fallible and imperfect, they are still way better than AI memories. We have this sort of integrated sense of who we are, what we're doing, what our purpose is, what our long-term goals are, how what we're doing now fits into the big picture. That's all very intuitive for us. The more I study AI systems, the more impressed I am with human memory, because today's batch of AI systems don't have that. They, at best, can, like, search the Internet and find stuff. Of course, they know a lot internally; they have a good long-term factual memory in some sense. But if they need to go query information on the fly, it's a brittle process. And they also have a finite working memory. They're pretty good within that working memory, but outside of it, things start to get tough. I'm sure you guys have seen systems like RAG, retrieval-augmented generation. But that's just one of the ways that people are trying to improve AI memory, and it's very much a work in progress right now. Technology diffusion speed: this is gonna be a big focus of the talk.
Actually, it's a big part of what you guys should be thinking about, because humans are not great at learning new technologies, especially people who are mid and late career. They are gonna be pretty reluctant in a lot of cases to embrace a new way of working. Of course, everybody's heard the saying you can't teach an old dog new tricks. That is actually key to a lot of the opportunity that I see for young people entering the workforce: to take advantage of AI in a way that their more senior coworkers may be, for whatever reason, not inclined or not able to do. AI, on the other hand, does take advantage of this stuff really quickly. When somebody figures out a new way to do whatever, whether that's a new memory technique or a new optimization, something that makes it faster, something that makes training more efficient, whatever, these things spread super quickly, because a lot of times it's like, oh, okay, cool, you found a way to optimize the learning process so it works 40% faster; I'll plug that into my system. Boom. Now everybody can take advantage of that. Broadly speaking, most of the research has been published, though these days we are in a period of closing up. Because human researchers are actually now sharing fewer of their breakthroughs, especially out of the top labs, your OpenAIs, your Anthropics, your DeepMinds, the pace of diffusion in AI might actually be slowing a little bit, but it's still pretty fast. And here's one paper, if you want a sobering read: "Natural Selection Favors AIs over Humans." That's an academic paper. Okay. Bedside manner is another one. People often think you could never have an AI that could match the warmth or the empathy of a human. On the contrary, that's not really the case. The AIs are extremely patient. You can get them to behave weirdly, but by default, they're actually quite nice, quite patient, quite understanding, very willing to explain things to you.
And indeed, we see this in the medical system: doctors are overworked, they're stressed out, they don't have time, but the chatbots will explain things to you 10 times, and they'll be polite every single time. So it's maybe a little bit of a radical position for me to give the AI a higher mark on bedside manner than humans, but it's certainly at least in some ways true. And then this final one, maybe alongside breakthrough insight, is probably the biggest weakness of AI relative to humans: adversarial robustness. We are broadly pretty robust to crazy stuff. If a crazy person comes up to you on the street, you get this quick sense: wait a second, I think I'm dealing with a crazy person here. If somebody's trying to scam you, the alarm bells go off pretty quick. Something about this doesn't seem quite right. Your guard goes up, and you're scrutinizing these inputs at a different level. AI systems don't really do that very well, so they're much easier to trick. We see all these jailbreaks. We see them divulging information they're not supposed to divulge. We see chatbots on car dealer websites agreeing to $1 car sales. All these things happen because AIs are not very adversarially robust. This is not just a chatbot phenomenon. I'm sure you've heard of AlphaGo, the AI that is the best Go player in history. It's broadly considered to be superhuman at playing Go. I recently did a podcast episode on this; if you wanna learn more about it, it's from a group called FAR AI. They found a way to create an adversarial attack on AlphaGo and consistently defeat it with a strategy that a human would never lose to. I don't play Go, so I don't know a lot about that. But they basically found that it has major blind spots. Even this superhuman Go player, when they were able to adversarially optimize against it, they were able to find ways to beat it. And, again, in ways that a human will look at and be like, you lost to that? That's crazy.
But the AIs just have these big blind spots. They are not adversarially robust. It's really important to keep these strengths and weaknesses in mind, because you wanna play to your strengths as a human, and you don't wanna be competing with the AIs in the ways that they are superhuman. A lot of people cash this out, and this is the kind of advice that people mid-career are oftentimes getting these days, including from me: you can think of an AI as a day-one employee who's pretty bright, eager, hardworking, but doesn't know anything about your business, totally lacks context, and can also make really weird mistakes. You gotta be super careful to spell everything out for them, give them super clear directions, show them what good looks like, all these sorts of things. But if you do that, then you have infinite interns. Or in a coding, CS type of context, people will say you have infinite entry-level software developers. So that is a radically different working environment that you guys are entering into, as compared to anyone in human history. Never before could any software firm say, oh yeah, we have infinite junior coders, or we have infinite interns. That's never been a thing, but it is now starting to be a thing. And I was just at an event not too long ago where people were like, yeah, we're not hiring as many junior coders anymore because we're really just focused on making our senior people more productive. And we think that with all these tools and the infinite interns that we can give them, that's a more advantageous strategy for us. It's gonna be higher ROI for our business. So I'm definitely not one to sugarcoat things. I think this is a real challenge for a lot of people entering the workforce. You're now competing against infinite interns, infinite entry-level coders, and you wanna make sure you are angling toward human strengths and away from AI strengths. Hey.
We'll continue our interview in a moment after a word from our sponsors. Okay. Three: understand the modes of AI production. I think there's just a ton of confusion here, so I like to try to clarify this for people. There are basically three ways that I see people working with AI systems today. One that's very familiar is often called Copilot mode; Microsoft uses that term for their product. But this is basically the ChatGPT experience where you, as the human, are doing your thing. You might be doing your coding work. You might be writing a letter. You might be putting together an analysis. You might be brainstorming a list of things. But at some point, you think, oh, I could ask AI for help. And then you go over to it, you put in some instructions, and it gives you something back. And you can look at it in real time and think, oh yeah, that's good, bad, whatever. I can use it. Maybe not. There's a couple of good ideas there. This is the real-time, back-and-forth mode where you are the pilot and it is the copilot. And this is definitely something that you'll see in one of my later tips: you wanna get good at this. This is only one of the three modes, though. The other mode, which I think is actually gonna be even more important in some ways in business contexts, especially big business contexts, is what I call delegation mode. You could also call it task automation mode. It is the idea where you're not just interacting real time, ad hoc, haphazard. Instead, you're saying, let's find some bottlenecks in our business. Let's find some things that right now we have to put a lot of time and energy into, where maybe we really have to manage people very carefully to get the quality and consistency where we want it to be. Maybe this inbox is overflowing, and we think we're gonna have to go hire a bunch more people to do this thing. Or maybe we would like to 10x what we're doing, and we just don't have the resources to do that.
Identifying these sorts of tasks and then setting up a system for AI to do them, and to do them consistently in the broader context of the business, that's what I call delegation mode. And the key thing there is you wanna get to the point where you are not evaluating every single AI output anymore. When you're doing Copilot mode, you have to evaluate every single output. You would be very unwise to just take the output of ChatGPT and turn it in as your paper or turn it in as your coding assignment. There have been examples, I'm sure you've seen them, where lawyers have done that. They thought ChatGPT didn't make mistakes or whatever, and so they submitted a brief to the court. And some of those guys have lost their licenses, because that's just outright malpractice. You have to review what you're getting out of Copilot mode. But if you do your setup right, then you can get to the point, for many tasks, where the AI can consistently do the task, and then you don't have to evaluate every single one anymore. You can get to the point where you actually can trust it. There's a trade-off there, where you're dialing in, you're narrowing the scope, you're zooming in on a very particular problem, you're setting it up, you're controlling what the inputs are gonna be, you're working through it, you're testing a bunch of inputs to make sure that the outputs are what you want. And then at some point, depending on how important it is and how high risk it would be if it did make a mistake, you can say, actually, this feels good enough now that we can use it as a process in our business and not have to supervise it every single time. So that's delegation mode, and that's gonna come back again in a minute too. And then in the middle, the best of both worlds would be if you could delegate in real time like you do in Copilot mode, but trust the results like you can get to with work in delegation mode. This would be the dream of agents. Right?
This would be like saying, hey, I just had this idea. Can you go out and find 20 websites of businesses that offer accounting services in Detroit, Michigan, where I'm based, go look at the rates that they have, put those into a spreadsheet, write an analysis of all that, and come back to me when it's done. The sort of mid-scale problem that you would give to a person. If you can do that on the fly and actually get good results back, now we're into agent mode. That's not quite there yet, but it is coming quite soon. The general consensus in the field is that the next big OpenAI release, whether that's GPT-5 or GPT-4.5 or whatever, is probably going to power a lot of those multi-step, multi-app agent use cases. So that's not quite there yet, but it's pretty safe to say it's coming. Okay. So those are the modes. Just getting clear on what mode you're actually using, and being able to talk about that and have conversations where you're helping other people understand what's going on: having this conceptual framework in your head is super useful for that. Very simple tip, but super important: definitely use the best available AIs. If you're using free ChatGPT, that's not good enough; it's absolutely worth the $20 a month. And if you have to go do an Upwork project to get the $20 a month to do it, again, it's absolutely worth it. There's no real reason to use anything other than one of the top-tier models. The top-tier models today are Claude 3 and GPT-4. Actually, when I wrote this 2 days ago, Claude 3 had taken the top position. I would now say GPT-4, which just released a new version, is probably top again. This goes to show how quickly the leaderboard can change. Claude 3 is great; it's very good for writing. And Gemini Advanced / Gemini 1.5 is also extremely good, and Google is very much a live player in this game. There are a couple of open source options that are decent.
They're all very good compared to what we had even a year ago, but I tend to stick with the very best tools, and I don't mess around too much with anything else. I basically use nothing other than these three tools. If you're gonna go into open source, you're into hobbyist land, and you also have to figure out where you're gonna run it. The very best open source models are also big, so you can't just run them on your laptop. It's just a pain in the butt. I would generally advise sticking with the best tools. This is, by the way, also my advice to business owners: buy the best tools for your team. Don't cheap out. Don't be a cheapskate. And I would say for you guys, don't work at a place that isn't willing to buy you the best tools either. Alright. So now we get into the modes. Mastering Copilot mode: I think you guys are probably all well on your way here. Everybody has different frameworks for prompting best practices, but they largely end up being the same. OpenAI has published their official guide to prompting. Anthropic has their official guide to prompting. I have my official guide to prompting. They're basically all saying the same thing: you wanna make sure your instructions are very clear. One of the great rewards, especially in a software environment, of working with AIs is that it forces you to take a breath before just diving in. I used to do this all the time, and I'm sure some of you do too: oh, I'm gonna code. I just immediately start typing code. Class. Whatever. Have I even thought about what I'm trying to do? I personally struggled with that in the past. Working with AI helps me there, because I can't expect the AI to do anything for me until I've articulated what I want. And I have to do that pretty clearly, or I'm gonna get something in the general direction of what I want, but not really what I want. So everybody is always gonna say: you're gonna need clear, accurate, unambiguous instructions. That's an art in and of itself.
Definitely something worth practicing, and probably pretty clear to you guys already. The next thing that is super important, or super useful, is examples: some things are easier shown than told, so give examples of what good looks like. In some ways, this is the thing that has unlocked the current AI era. The GPT-3 paper title was "Language Models are Few-Shot Learners." Few-shot learning means that if you give the model a couple of examples, it can pick up on the task, and it can begin to do the task even if you didn't describe it, just based on the examples that it sees. That's a pretty profound thing. No AI system before GPT-3 could ever do that, certainly not in a general-purpose way. So we're only a couple years into the few-shot learning era, and for many tasks, it's a huge advantage over trying to tell the model exactly what to do: if this, then that; do this, then that. So often, you can just say, here's a sample input, and here's what a good output looks like. Give it a few of those, and you'll get much better results. Again, I think this is probably fairly obvious, so I don't wanna waste too much time on prompting. You can also get good results by telling it what role you want it to play. That can be a professional role: you're the senior software engineer supervising my work; give me feedback on it; your job is to do a code review of my work, that kind of thing. You can also use specific names. Sometimes, when the AIs write in a very verbose, flowery, wordy way, I'll tell it: I want you to think Einstein-Hemingway. That's my phrase. So Einstein-Hemingway, that's your role; it's this mashup. I want super smart, but I also want terse, crisp, clear, simplest-possible language. Sometimes I also say we want to demonstrate our intelligence in part via economy of words.
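Putting those ingredients together, a prompt combining a role, the Einstein-Hemingway style instruction, few-shot examples, and an output template can be assembled programmatically. This is a sketch; the wording and the SUMMARY/ISSUES template are my own invention, not any vendor's official format.

```python
# Build a prompt from the ingredients discussed above: role, style,
# few-shot examples, and an explicit output template. All wording here
# is illustrative.
def build_prompt(task: str, examples: list) -> str:
    lines = [
        "You are a senior software engineer doing a code review of my work.",
        "Think Einstein-Hemingway: super smart, but terse, crisp, and clear.",
        "Demonstrate intelligence in part via economy of words.",
        "Use the format:",
        "SUMMARY: <one sentence>",
        "ISSUES: <bulleted list>",
        "",
    ]
    # Few-shot: show what good looks like instead of describing it.
    for sample_input, sample_output in examples:
        lines += ["Input:", sample_input, "Output:", sample_output, ""]
    # Finally, the real task, ending where the model should continue.
    lines += ["Input:", task, "Output:"]
    return "\n".join(lines)
```

You'd then send the resulting string as the message to whichever top-tier model you're using; the point is that the role, examples, and template live in one reusable function rather than being retyped ad hoc.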
You'll get a very different style of writing out if you say you want Einstein Hemingway versus just letting it write in its normal default way. And I personally strongly prefer the Einstein Hemingway. Other simple things I'm sure you've seen before: labeling your data, giving the model time to think. That's chain of thought. Explain your reasoning. These days, they tend to do that by default. You can also tell it exactly how you want your answer formatted. Use the format, and literally give it a template for how you want it to return information to you. Again, I imagine you guys have seen this sort of thing. So everybody has their different framework on prompting. You wanna be good at this. When you get a job, likely, your boss will not be good at this. Possibly, they will. Most of the time, they won't. So even though this stuff is, like, fairly basic, it is a huge advantage relative to not knowing how to do it. So a relatively simple crash course on this kind of thing can be worth a huge amount in terms of the ROI. And not just money, but time, energy. You'll just get way better results from AI. Now AIs are also starting to get pretty good at this. This is the latest thing from Anthropic. They call it the metaprompt. Basically, what you do here is go to a Colab notebook that they provide and give it a prompt, and it will give you a way better prompt based on your initial prompt. So it's a funny meta thing where the AIs are already starting to help us prompt. Crazy, but I've used this, and it definitely really helps. It fleshes it out for you, and you can read all the more detailed instructions, and you can start to refine them. You go, oh, that's not exactly what I meant. Tweak. Very useful. Also, in Copilot mode, definitely get used to these coding assistants. I'm sure you guys have all used some or maybe all of these things. GitHub Copilot is the most common one.
Codium is a really interesting one that's lesser known, but they specifically focus on code integrity. So they help you generate unit tests. They do all these sort of type checking things, all the things that you get dinged on in your homework that you might also get dinged on in a code review in your job because, like, some stickler is telling you, oh, you gotta do this every time, whatever. And you're like, oh, do I really care? Whether that really matters is debatable and definitely depends on context, but that's what Codium is there to help you do. And it's quite good at it. I've never been great at that stuff. And so I really like Codium because it comes around and cleans up my mess and reminds me of the things that I should be doing that I wasn't doing. And Cursor is the next evolution on Copilot, where it's an AI-first coding environment. How many of you guys have used Cursor, just out of curiosity? No Cursor users? Oh my god. Okay. Get Cursor. Definitely try it. It blows people's minds with how helpful it is. Okay. Cool. If you get nothing else out of this, you should all go try Cursor. Obviously, you're gonna wanna watch out in Copilot mode for hallucinations and other mistakes. AI makes mistakes. It makes a lot fewer mistakes than it used to, and it's probably gonna continue to make fewer mistakes in the future, but it does still make mistakes. Okay. Next one. Distinguishing yourself with delegation mode. I have a whole presentation here on AI task automation 101. This is something that young people entering the workforce can literally blow their bosses' minds with, and even crack into places that you might not otherwise be able to crack into, by doing a little project like this and demonstrating to them what's possible. The reason these are tips is that people do not understand the state of the art. They do not understand the relative strengths and weaknesses.
So if you have that, and then you can bring some task automation to them and show them what's possible, a lot of times they'll be literally mind blown by what is possible. And the fact that you can do it, they're gonna look at you like you have magic powers. So we don't have time for the whole task automation thing, but it's often done with no code platforms. Many of you are in CS. Like, you shouldn't have any trouble using these no code platforms. But one thing to really understand about organizations, like how businesses really work, is that a lot of times their processes are not documented and not really formal at all. There is a way that work gets done, but nobody actually sat down and designed that with a flowchart. So this is why I say in this presentation, these process diagrams do not exist. You guys may remember "this person does not exist." That was, like, a classic one of the early AI things. You can go to thispersondoesnotexist.com, and it just makes a new person for you on every page load. This is from a GAN, a generative adversarial network, that just makes headshots. And all these people are fake, and it's a mind blowing wow. So whatever. That's an aside. This is a sort of reference to that, but the key thing to understand is businesses have implicit processes. Nobody really designed it in a lot of cases. Nobody really documented it. It's just one person who knows, I do this, and then I hand it off to them, and then they do something, and then it goes wherever. And people know where they fit, and they know what their responsibility is. They don't necessarily have command of the bigger picture. So a lot of what you have to do to be successful is map out these processes. They exist implicitly, but nobody's ever really gotten specific about, like, how actually do we do this?
So just understanding that that's the state of play, and you're gonna probably have to go figure that out, map out this territory, figure out what are the inputs, what are the outputs, what's happening in between, whose responsibility is it. I use these terms, inputs, logic, and outputs, for the AI portion, but then there's also: when does it happen? What causes the process to start? What happens at the end of the process? You have to answer these questions. If you do that, then one of my other little best practices is prompt before app. What I mean by that is the first thing you wanna do is understand what this core task is and demonstrate that the AI can do it. Work on the prompt. Make sure the inputs and outputs are working. Work with whoever kinda owns the process to say, give me 10 inputs and 10 examples of what good looks like, and let me see if I can match that with the AI. And possibly, by the way, you might just take their 10 examples and use them as few-shot examples and just say, hey, AI, here's nine. Can you do the tenth? And just see; maybe that's enough. It can often be quite simple. Sometimes it takes more work. Each case is different. But if you can get that working, then you can do all this other stuff with the no code platforms and the triggers and the automations and Zapier. And maybe you have to write a little custom code at some point in the Zapier zap to get it to work. But if you can get that core thing to work, then you can build the process around it. You can pipe it into where it needs to go. You will blow people's minds. I think one of the most common new AI jobs is just automating existing processes with AI. And these are not, like, sexy things a lot of times. They're things that nobody really wants to do. Write the first draft, and sometimes there may still be a human in the loop as well. Maybe it's just like, we get a ton of customer service tickets, and nobody really enjoys answering those tickets.
Can you write the first version of the response to the tickets? Here's 100 examples of what we've done in the past. If you can get that to work, though, in an environment where nobody else knows how to do that, you are immediately a difference maker. And that can open doors into contexts where you're gonna have remarkable access to expertise, because these people do have something that they're bringing to the table, but it's not AI task automation. And they will love you for it if you can actually bring this and make it work for them. And one of the reasons they're gonna love it is it's gonna be a lot faster and a lot cheaper than having a human do it. Hey, we'll continue our interview in a moment after a word from our sponsors. So, there's a question, please.
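The inputs, logic, and outputs framing, plus the trigger and the human in the loop, can be sketched as a tiny pipeline. Everything here is a hypothetical illustration: `draft_reply` is a stub standing in for a real model call, and in a real automation the trigger step would live in a platform like Zapier rather than in code.

```python
# Sketch of an implicit process made explicit: trigger -> input -> AI logic
# -> output with a human in the loop. draft_reply is a stub; a real version
# would call a model with the past tickets as few-shot examples.

def draft_reply(ticket_text, past_examples):
    """Stub for the AI step: produce a first-draft response to a ticket."""
    return (f"DRAFT (based on {len(past_examples)} past tickets): "
            f"re: {ticket_text}")

def handle_ticket(ticket_text, past_examples):
    # Trigger: a new ticket arrives (the Zapier trigger step, in practice).
    draft = draft_reply(ticket_text, past_examples)      # logic
    # Output: route the draft to a person rather than sending it directly.
    return {"draft": draft, "needs_human_review": True}

result = handle_ticket("My order never arrived",
                       ["past ticket 1", "past ticket 2", "past ticket 3"])
```

The design choice worth noticing is the `needs_human_review` flag: the AI writes the first draft, and a person still owns what actually goes out the door.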

Audience Member: (36:41) I have one question. A lot of the people doing their degree in CS right now, they're doing a bunch of data structures type stuff, and it's all what is a stack, what is a queue, like, all that sort of thing. ChatGPT and AI tools are really good; they understand a lot of that already. And I feel like there's this gap between what most students are spending a lot of their time on in classes and all the stuff that you're talking about, which is all new, but none of this is in university. My question to you is, given the fact that you've spoken with people in industry, what do you think the right balance is? Is it like, just stop, just do the bare minimum for school? Or is it like 50-50? What?

Nathan Labenz: (37:17) Yeah, it's a good question. School's a little fucked up, to be honest, in a lot of cases. Simultaneously, I'm a believer that there's no world more real than the one you're living in right now. It's not like there's some mythical real world out there that's totally different. But in the work-for-pay world, which is definitely different from school, nobody cares about using ChatGPT. There are maybe a few idiosyncratic people out there who are purists or who have extreme data security issues. There are a few legitimate reasons to think that you should never use a certain tool like this. Even then, I would say data concerns are fairly overblown, and there are ways to use these tools that protect your data privacy. But by and large, people just want the job done. They want it done well. They want it done fast. And if you can deliver that and AI is part of that recipe, then that's a win. So I don't really know what that means for school. It's hard to give advice because, like, you need to pass your classes. Actually, you could drop out. And if you're sweet at AI, there's infinite opportunity. Like, the AI world does not care if you have a degree if you can automate tasks effectively. I do believe that there is a lot of opportunity that is increasingly independent of a degree. But I'm not telling you to drop out. And if you're not gonna drop out, you should pass your classes. And if your classes have, like, final exams that involve "do this programming task and you're not allowed to use AI," that's a little retro, but it is the reality, and so you gotta do it. But I don't code by hand really anymore at all. I pretty much only go to an AI and describe what I want. Sometimes I will curate context. So I might bring documentation from something that I'm using, or, if it's a codebase that I'm already working in, I might copy one class in and another class in and be like, hey, I wanna make a new method that does this.
I wanna implement the caching pattern from over here, or whatever. And it's a much better typer than I am, and frankly a better coder as well. But the typing alone saves a ton of time, just to have it generated quickly. So, yeah, not knowing enough about the specifics of the context of the classes and the requirements, I can't say with confidence, like, ignore this or don't do that. But I will say, generally speaking, people don't care if you're using AI. They want results. The classic thing is, like, good, fast, and cheap: you can only pick two. And AI is starting to break that paradigm. Or another way to say that is, good, fast, and cheap will be redefined so you can still only pick two. But relative to traditional good, fast, and cheap, you can only pick two. With AI, I feel like I routinely deliver good, fast, and cheap, and people don't have to pick. And if you can deliver that, people will love you for it, and nobody's really like, oh, it's cheating to use ChatGPT. Very few have that attitude. And I probably wouldn't work at that sort of place. Unless there's a really good reason, like we're a military contractor or whatever, you can come up with reasons. But unless there's a really good reason, if it's just like the boss doesn't like it or something, then I would be like, this is...

Audience Member: (40:27) Yeah. That that's great. Thank you.

Nathan Labenz: (40:29) You bet. Another thing, by the way, and this is, like, the easiest thing in the world, but it would radically change how so many organizations do things right now: Claude for Sheets. It's just an API call that they've wrapped up into a typical spreadsheet function, and it's amazing. It can structure data for you. It can answer questions for you. It can fill in gaps in data. It can do all sorts of things that you might wanna do, and it takes two seconds to install. And especially with their cheap version, it's, like, insanely cheap too. They even have a caching layer in there. It's an extremely useful tool, and, like, 0.1% of businesses have probably installed Claude for Sheets right now. If you literally just went in and were like, hey, have we thought about using Claude for Sheets? Yeah? What's that? Nobody will know, and it's immediate alpha. Okay. Mindful of time. Scout out all the applications. I describe myself as an AI scout. I literally spend all my time trying new stuff, reading research, trying to understand what's going on from all angles. I definitely find it remarkable how often people just can't be bothered to try a new thing. The fact that nobody raised their hand for Cursor is a little bit of a warning sign for that. Go try some of these new things. There's an AI app for everything these days. Anytime I have an unfamiliar task that I haven't really done in a while, whether it's, like, making slides, for example (I mean, these slides actually predate Gamma getting quite good, but Gamma.app is a really good little slide maker now). If I'm editing video, Descript has all sorts of cool AI tools for helping to edit video. Suno and this other one, Udio, make unbelievably sick music. Listen to this. This is AI.

Nathan Labenz: (42:16) I don't know about you, but, like, it's getting good enough that I actually would listen to it just for enjoyment of the music. Like, it's not even just a novelty anymore. The lyrics are honestly pretty good too. My favorite part is when it goes to deep minds. And this was made by a friend of mine. I asked him, did you ask for that? And he said, no, I basically just did nothing. I just gave it a quick prompt. It took no effort. So it's definitely worth going out and trying a lot of these new products. There's a million of them popping up all the time. Again, it's something that other people won't do, and it's certainly something that mid and late career people won't do. So it's something that you, with your youthful energy, can go out and do. Other people will find it remarkable that you do that. So it's an extremely easy way to get an edge, and you can just, again, routinely blow people's minds. I'm sure you guys are familiar with Replit at least somewhat. Their Ghostwriter product is pretty cool. Watch that space, because they're gonna have a lot more stuff coming soon. I've been trying Julius recently. It's another kind of real time coding thing. I gave it a prompt earlier today. Maybe I can just show you how easy this was. I just went to it and said I wanted to get audio off a video. Simple as that. But all I had was the YouTube URL. So I said, can you fetch the audio from a YouTube video and give me an MP3? It wrote the code, executed the code, gave me a download link, and it worked flawlessly the first time. No problem. Amazing. Now why would I do this instead of ChatGPT? First of all, I'm not sure if ChatGPT would do it. It may or may not refuse me. So that's one issue. Second, ChatGPT can execute code, but this is, like, even more robust code execution as part of the process. If you went to Claude, it might write you this code, but then you'd still have the question of, like, where am I gonna execute this code?
And maybe that's no problem for you, because you've got, like, development environments all over the place, whatever. Other people don't. The ability to just go do this in a browser and get this done in 15 seconds? Otherwise, how much of a pain would this be? I'm sure you've seen Devin as well. Who's seen Devin? A couple hands? Okay. Cool. Yeah. Devin is a coding agent. Actually, this is the beginning of the agent moment. I haven't used it yet because it's, like, still waitlisted. But from what I've heard, it's getting good, but not quite good yet, and you can start to see how it's happening. It will actually hit a bug, not know what to do, go to the documentation online, read the documentation, come back, and try to fix the bug based on reading the documentation. It's starting to run this actual loop process of: just keep going. Just because I failed doesn't mean I'm done. Just like a person, I'm not gonna give up because I hit a first bug. Like, I gotta keep going. I gotta try something else. I gotta go find some new information. I gotta find another approach. Maybe I gotta change my plan. That's what Devin does, and there's also an OpenDevin, which you can go download and run in your own environment. Another one coming soon will be Magic, at magic.dev. I'll brush past that for now, because nobody really knows what it is other than that it's allegedly a huge deal. Okay. Almost there, and then I can do a couple questions. I've covered most of it. This evals expert concept is pretty similar to, or is at least intimately related to, delegation mode, but it can apply in a lot of contexts. But, basically: is this AI thing working? How would we know if it's not working, or if something changes and it's not working as well as it used to? What are the things that we wanted it to always do? What are the things that we wanted it to never do? And how are we sure that it's always and never doing those things? It becomes quite challenging.
So setting this up is basically like a unit testing type of thing, except you're doing it potentially on an ongoing basis for all the inputs and outputs of these systems. And this can really help you assure yourselves, and assure the company that you're working with, that this is actually working and we have visibility into what's happening. We can quantify. We said we never want it to do X. As of right now, we're seeing that it is doing X 3% of the time. Is that tolerable? Is that intolerable? That obviously depends on context. There's gonna be judgment calls to be made, but you need to have the ability to describe what is actually happening in order to have an informed discussion. So these are sometimes called benchmarks. Evals and benchmarks are the same thing. Benchmarks are more public. Evals tend to be, again, the same thing, but more internal. I'll show you an example. Okay. For Waymark, we have an AI write video scripts, and we have all these things that we want it to do and not do, and you don't have to worry about the details of this too much. But here's the sort of thing that you can do. Let's say, for example, we give the AI a template for a video, and its job is to write a script that fits that template. I've installed Claude for Sheets here. That's what I'm using. I used the metaprompt to help flesh out all these instructions. And now I'm gonna give it a script structure that was the input to the AI, and then the output. And what I wanna check in this particular eval is how contact information is being used. We serve small businesses, so these are all, like, local restaurants and law firms and whatever that are making these videos. So we wanna confirm that the AI is using the business contact information in the same way that it's being used in the original script. Because if they put it in the wrong place, the videos end up looking weird or whatever. So this is a highly idiosyncratic problem. A lot of this stuff is gonna be idiosyncratic.
The businesses you guys are gonna work at are all gonna be very idiosyncratic. Everybody has their own weird shit. And nobody likes looking at this stuff, to sit there and say, oh, hey, can you go through 100 of these and check to see if the AI put any contact information in the wrong place? People hate that kind of work. Super tedious, super time consuming. That also means it's expensive, and people aren't very good at it because it just gets super boring and their minds go elsewhere. And just in every way, it sucks. This is definitely the perfect job for AI. So here's what that looks like. We have a JSON structure; it doesn't really matter. And now I'm literally just calling the Claude for Sheets function and saying, here is this full script. This was just my template instruction, and I actually now put the variables here. So you can see the variables are now populated. And now we're calling to Claude, and it's gonna tell us at the end of the thing, are there any places where contact information is incorrectly used? I had to workshop this for a while to get it to work, but here you have it. Okay? This is the expected answer. So we're, like, actually testing in this phase that Claude can do the job. But now we're gonna start to use it on all of the stuff that we do, so we can monitor: how often do we see that the AI script has contact information in the wrong place? So here it's like, oh, expert hair regeneration. But here, in the actual one that it wrote, it gave a URL. That, according to our rules, is a violation, so boom, it flags it. Here's another example of that. In this one, there was one: it had "call today" with a phone number, but the one that it wrote didn't have any contact information. So that's a violation in our rules. We get to define our rules. That's a violation going the other way. And in this one, there wasn't a violation, so it's just told to return true, and it returns true.
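The contact-information eval Nathan walks through can be sketched as a pass-rate loop over (template, AI output) pairs. This is a toy stand-in: `check_contact_info` here is a trivial regex check in place of the actual Claude for Sheets call, and the example scripts are invented, but the shape, running a checker over every case and quantifying violations, is the point.

```python
# Eval harness sketch: run a rule checker over (template, AI output) pairs
# and report how often "contact info used the same way" is violated.
# check_contact_info is a toy regex stand-in for a real model-based check.

import re

PHONE = re.compile(r"\d{3}-\d{4}")  # toy pattern for a phone number

def check_contact_info(template_script, ai_script):
    """True if both scripts agree on whether a phone number appears."""
    return bool(PHONE.search(template_script)) == bool(PHONE.search(ai_script))

cases = [
    ("Call today: 555-1234", "Call now: 555-1234"),   # number preserved: ok
    ("Call today: 555-1234", "Visit our website"),    # number dropped: violation
    ("Stop by the shop", "Stop by the shop today"),   # neither has one: ok
]

violations = sum(1 for tpl, out in cases if not check_contact_info(tpl, out))
rate = violations / len(cases)
# With numbers like this in hand, you can have the informed
# "is X% of the time tolerable?" discussion.
```

Swapping the regex stub for an actual model call turns this into the ongoing monitoring setup described above, without changing the loop.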
So, anyway, all this just to say: be a person who can set up these evals to quantify what is happening. Oh, you wanna make sure that our AI is never doing this? Okay. Cool. Let's set up a framework. It can be as simple as Claude for Sheets. There are way more advanced and complicated tools out there. But a simple Claude for Sheets setup where it's: give me five examples of where it's doing it right and five examples of where it's doing it wrong. I'll use the metaprompt. I'll workshop the prompt. I'll set it up. I'll be able to demonstrate that it can evaluate things accurately, and then we can use that going forward however we want. That is, again, another skill that people don't even have their heads wrapped around needing yet. So if you can understand how to use evals and actually develop them, you're gonna be like a unicorn in 99% of businesses in the country today. Okay. I'm gonna land the plane here because I know we're just about at time. You're gonna be the one that knows this. The mid career and late career people are not. They're gonna have expertise, of course, that you're not gonna have, but I think you really want to position yourself as somebody who understands this stuff and can help lead them through understanding what they don't know. These could be really basic questions like, what are the new jobs that companies are hiring for in AI? There is this concept of the AI engineer. If you haven't heard that, look it up. That's kind of software engineering but with a heavy AI emphasis. It's like a software engineer that can really use the AI tools and models well and actually build models into products. The MLOps specialist is, like, fine-tuning models, curating datasets, some of this benchmarking and eval type stuff. And then the AI implementation specialist is a little less technical. And by the way, these are all very much in flux, and businesses call them different things. These are just kind of things that I observe.
This would be somebody who's like, oh, I'm gonna help us set up an internal chatbot that has access to our database or has access to all of our knowledge base. So here's a super simple example. I work with a company that has a thousand employees. They just wanted to set up a simple chat so that people can ask AI their day-to-day questions instead of having to go ask a person. They have a whole team that sits there and answers questions for the thousand employees, and now they're able to shift half of that work to AI. AI doesn't know all the answers, but we gave it 250 documents, and it's a very simple thing to set that up with an off-the-shelf product. I literally just used Chatbase for that one. Load the documents in. Couldn't be easier. But just knowing how to set that kind of stuff up is a job unto itself these days. I call that AI implementation. Also, I think sometimes it really helps to have these simple mantras for people. At Waymark, I say AI or die. I also say done-for-you beats do-it-yourself. It used to be, to make your own video, you had to write all the copy. Now the AI writes it for you. It's a 10 times better experience. So having these little mantras that people can wrap their heads around when they're very unfamiliar with the technology really helps. Three general ones that I say are: summary, not strategy. This kinda goes back to the eureka moments. A lot of times business leaders will be like, can I have this thing help me with business strategy? And I always tell them, no. Unfortunately, that's still on you. You can have it do a lot of routine stuff, repetitive stuff, the work that nobody wants to do, or the work that you'd like to scale. That stuff it can do, but it can't do your business strategy. Similarly, process, not product. Whatever it is in your business that is, like, your unique product, don't delegate that to the AI.
Delegate all the other stuff to the AI, the stuff that sucks to do, or that's at least not that awesome to do. But whatever makes you super special, hold on to that. Double down on that as humans, and use AI in all the other places. Again, convert, not create, is similar. These are just simple ways that you can communicate to people what the AI is good for and what it's not really so good for. This may change, and that leads us to the last thing, which is just: continue to update your understanding, continue to update your world model on this. We are only a couple years into AI. I would say, really, we're just one year into AI. GPT-4 was really the moment when it went from not that useful to, like, often very useful to a naive user. That's only been one year. So the systems are still gonna get a lot better. The state of the art is gonna change. The tale of the cognitive tape is gonna change. All this stuff is gonna happen. It's gonna continue to evolve. So just keep in mind, we are nowhere near a static, fixed end state right now. You're gonna have to keep evolving with it and follow people, because this is definitely a community thing. Going back to the breadth on the tale of the cognitive tape, the AIs are bigger than us in very profound ways. They know more than us. They can speak all the languages. Because they are so big, the surface area is so vast, it takes a community to understand what is going on. Even a full year after GPT-4 was released, people are still finding better ways to prompt it that are bringing out new state of the art. So this is a very short list, but these are people that I really highly recommend, specifically for coding. This dude, McKay Wrigley, on Twitter posts super good demos. There's quite a few like him. Another one that I really respect a lot is Swyx on Twitter, and his podcast is called Latent Space. He also has a blog. Really good content.
And if you are less into coding but more into just kind of general business and knowledge work, then I think the number one commentator in today's world is a guy named Ethan Mollick. He's a professor at Wharton and is super prolific. Just came out with a book. And if there's one business school professor that your future boss will maybe have heard of, it might be this guy. So it would definitely be a good one for you to know about too. These people I trust to have good content. There's a lot of AI snake oil out there these days, of the "99% of people are using ChatGPT wrong, buy my course" variety. Don't buy that course. All the real information you need is free, and it's, like, freely available from good thought leaders. Buy the products. Buy GPT-4. Don't buy the, like, prompting secrets, because there aren't really any prompting secrets you can't get for free. Alright. Happy to do a couple questions.

Audience Member: (55:08) Can you tell me something about how AI is used in finance startups, like in the finance sector a little bit?

Nathan Labenz: (55:15) Yeah. To be honest, it's not an area I know a ton about, and a big reason for that is that a lot of it is held much more closely and proprietary, as trade secret, in finance than in other areas. This is actually something I just heard Swyx say the other day. He was like, in finance, everybody is keeping everything a secret, whereas in software, people are sharing methods and open sourcing stuff all over the place. So, certainly, like, high frequency trading has been an AI game for years, and I'm not sure how much language models are breaking into, like, highly quantitative finance. Speed is really super important in finance, especially for trading-style finance. So those systems tend to be much more narrowly focused versus the, like, broad purpose AIs that are dominating the news today. Bloomberg tried to train their own model based on all the proprietary data that they had, and they found that GPT-4 was better. Even though they had all this special data and all the whatever that they thought would give them an advantage, in the end, still, they found GPT-4 was better. So that's an interesting data point. Finance is obviously a huge thing. Right? The mortgage industry is, like, part of finance. I think there's definitely a ton of opportunity for language models to do, like, document review. Are these documents in order? Is anything missing? Does anything seem wrong? There's a lot of task automation work that you could do there. Anything where there's, like, a checklist. Right? Did the person submit this? Did the person correctly fill this form out? Does this match with this and this other form? All those sorts of things, those are, like, ways in which finance is similar to other business. And in that way, I think that'll happen in finance just like it happens in other places. But the more sort of trading you get into, the more it's like a lot of hedge fund and big bank secrets.
Where can I go to find more information on setting up my own AI to train? Say I had a lot of data, and results from people that took action on that data, and I wanted to train an AI to do that. It depends on how much data you have, and it depends on how technical you wanna get. You are gonna be almost for sure limited to fine-tuning, because of the amount of data that it takes to train a model from scratch. And here, I'm referring again to general purpose language models. If you're talking about narrow, you know, you can train a linear regression. That's a model in some sense. You can train a really simple thing with very little data. But assuming we're talking about, like, relatively general purpose language model type things, they take a huge amount of data to train from scratch. So you're gonna be much more likely fine-tuning one. And the easiest way to do it would be to use the OpenAI platform. They have a fine-tuning API. It doesn't cost much, a couple bucks. Typically, like, 100 examples is enough if you wanna just do one task. But one of the things to realize about fine-tuning is that they've trained these models. Like, it's a high art to train these chatbots to handle everything in the way that they do. Tons of trade offs, tons of data to get it to say no to the things that it's supposed to say no to and do the things it's supposed to do, and there's a ton that goes into that. Fine-tuning is relatively easy if you are narrowing down to one task. So with Waymark, for example, we do fine-tune GPT-3.5 to be our script writer, and we have hundreds of examples. The quality of the data is probably the most important thing, because that's what your model is learning from. If there are mistakes in there, it's gonna learn those mistakes. So you really need to have high quality data. That, again, comes back to the evals; they're really important there. How do I know what's high quality?
I might define a bunch of rules as to what's high quality, and then I have to actually examine all the examples with those rules in mind. Maybe AI can help with that. So keep in mind that you're fundamentally narrowing the scope of what the system can do when you fine-tune it, though it is possible to fine-tune in a way that preserves more than one task. A standard starting place would be a few hundred examples, maybe even fewer than that; a few tens of examples could be enough to start. To really get good performance, you might need a few hundred, and if it's a hard task or there are a lot of edge cases, you might need a few thousand. But somewhere in that range is enough to do reasonably good fine-tuning on the OpenAI platform. You just need to assemble high-quality data and run it through. You can also do that open source. You could take a Llama model, or any number of open-source models, and use techniques that are generally called PEFT, parameter-efficient fine-tuning. You've heard, of course, that GPT-3 has 175 billion parameters; that's a lot to mess with. Today's open models are smaller. You might have a Llama model that's 7 billion parameters, which is still a lot to mess with. With parameter-efficient fine-tuning, most of those parameters are held constant, and just 3% of them or whatever are trained, to fine-tune the model to do whatever you want it to do. You can run that in a Google Colab notebook with very limited compute requirements. Techniques there are called LoRA and QLoRA; those are examples of parameter-efficient fine-tuning. LoRA stands for low-rank adaptation, meaning it's not changing that much of the matrices. That's a pretty good start, and it'll take you pretty far. Just know that when you do it, you're gonna be narrowing what the model can do.
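The "just a few percent of the parameters" claim above can be made concrete with some back-of-the-envelope arithmetic. In LoRA, instead of updating a full weight matrix W of shape d_out x d_in, you freeze W and train two small low-rank factors B (d_out x r) and A (r x d_in), using W + BA at inference. The dimensions below are illustrative, not taken from any specific model:

```python
# Why LoRA is "parameter efficient": count trainable vs. frozen parameters.
# Instead of updating W (d_out x d_in), train B (d_out x r) and A (r x d_in).

def lora_trainable_fraction(d_out: int, d_in: int, r: int) -> float:
    full = d_out * d_in          # frozen parameters in W
    lora = r * (d_out + d_in)    # trainable parameters in B and A
    return lora / full

# An illustrative 4096 x 4096 projection matrix with rank-8 adapters:
frac = lora_trainable_fraction(4096, 4096, 8)
print(f"{frac:.2%}")  # → 0.39%
```

Because the rank r is tiny relative to the matrix dimensions, the trainable parameter count (and the optimizer memory that goes with it) collapses, which is what makes Colab-scale fine-tuning of a 7-billion-parameter model feasible.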
If you bring it a hundred examples of writing a script and then you come ask it to write you a recipe for a birthday cake or whatever, it literally can't do it. It's lost that ability entirely. Our Waymark script writer can only write scripts; it can do nothing else. That's in a way good, because we don't want it to do anything else, but it's in a way bad. This is actually a good tip, too. I have this podcast, and I do an intro essay for each episode. I literally write one page, and I read it. After doing that a bunch of times, I started to think, maybe I could fine-tune a model to write as me. Very hard to do. I write a lot of different things; it's tough. What works a lot better is taking 30 of my essays, dropping them into Claude 3, and saying, use the style of all these essays and write me something new. It will do a much better job. So in a lot of cases, you may think you need fine-tuning when you really don't; you just need to give the AI a lot of examples. Every case is different, so just keep in mind that fine-tuning is not always the answer. Especially, I find that software engineers often wanna go to more technical solutions than are really necessary. If I were to talk to a software engineer and ask, how can I get AI to write as me? They're gonna immediately be like, oh, let's fine-tune, let's use LoRA, let's use PEFT, and so on. The actual best answer today is to give Claude 3 a ton of examples and have it do it, and that's gonna beat any fine-tuning you're gonna do. Keep that in mind. Go out there and establish yourselves as the AI experts that these dinosaurs will never become, and they will love you for it. The infinite interns idea is on the horizon. It's not quite here yet, but you're in a moment where, if you can establish that you're the person to go to for this kind of technology, all the business leaders are gonna be looking for that.
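The "many examples instead of fine-tuning" approach above amounts to packing prior writing samples into one long prompt. A minimal sketch follows; the prompt structure and tag names are our own invention, and the API call at the end (which assumes the `anthropic` package, an API key, and a Claude 3 model name) is shown commented out. The prompt assembly itself is plain Python:

```python
# Sketch: few-shot style imitation by stuffing example essays into a prompt,
# rather than fine-tuning. Tag names and prompt wording are illustrative.

def build_style_prompt(essays: list, topic: str) -> str:
    """Assemble a prompt containing example essays plus a new writing request."""
    parts = ["Here are examples of my writing style:\n"]
    for i, essay in enumerate(essays, 1):
        parts.append(f"<essay_{i}>\n{essay}\n</essay_{i}>\n")
    parts.append("Using the style of all these essays, "
                 f"write me a new intro essay about: {topic}")
    return "\n".join(parts)

prompt = build_style_prompt(["Essay one text...", "Essay two text..."],
                            "the future of AI interns")

# Then (requires the `anthropic` package and an API key):
# import anthropic
# client = anthropic.Anthropic()
# msg = client.messages.create(
#     model="claude-3-opus-20240229", max_tokens=1024,
#     messages=[{"role": "user", "content": prompt}])
```

Unlike a fine-tuned model, nothing is narrowed here: the same base model still writes recipes tomorrow, and swapping in different essays changes the style with zero training.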
The next talk I'm giving is to a group of business leaders who are all mid- and late-career. They're all worth millions of dollars, but they're all asking these same questions. So if you can answer these questions for them, they will hire you.

Audience Member: (1:02:40) Awesome. Thank you so much.

Nathan Labenz: (1:02:41) My pleasure, guys. I hope this was helpful. It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
