OpenAI DevDay: Beyond the Headlines with Logan Kilpatrick, OpenAI's Dev Relations Lead
Logan Kilpatrick discusses GPT Store, Assistant API, custom models, and multimodal GPT at OpenAI DevDay, offering insights into the future of AI.
Watch Episode Here
Video Description
We’re deep diving into OpenAI DevDay with Logan Kilpatrick, Dev Relations Lead at OpenAI. Logan and Nathan discuss the GPT Store and GPT agents, Assistant API, custom models, finetuning, multimodal GPT, and much more. If you need an ecommerce platform, check out our sponsor Shopify: https://shopify.com/cognitive for a $1/month trial period.
SPONSORS:
Shopify is the global commerce platform that helps you sell at every stage of your business. Shopify powers 10% of ALL eCommerce in the US. And Shopify's the global force behind Allbirds, Rothy's, and Brooklinen, and millions of other entrepreneurs across 175 countries. From their all-in-one e-commerce platform to their in-person POS system – wherever and whatever you're selling, Shopify's got you covered. With free Shopify Magic, sell more with less effort by whipping up captivating content that converts – from blog posts to product descriptions using AI. Sign up for $1/month trial period: https://shopify.com/cognitive
With the onset of AI, it’s time to upgrade to the next generation of the cloud: Oracle Cloud Infrastructure. OCI is a single platform for your infrastructure, database, application development, and AI needs. Train ML models on the cloud’s highest performing NVIDIA GPU clusters.
Do more and spend less like Uber, 8x8, and Databricks Mosaic. Take a FREE test drive of OCI at https://oracle.com/cognitive
NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.
Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
X/SOCIAL:
@labenz (Nathan)
@OfficialLoganK (Logan)
@OpenAI
@CogRev_Podcast
TIMESTAMPS:
(00:00:00) - Episode Preview
(00:02:08) - How many startups did OpenAI kill?
(00:05:50) - Current employee count at OpenAI
(00:06:59) - OpenAI's mission being focused on developing safe AGI to benefit humanity
(00:07:10) - How the GPT Store relates to AGI and progressing agent development
(00:08:22) - OpenAI's strategy to release AI iteratively so society can adapt
(00:10:50) - Safety considerations around the OpenAI Assistant release
(00:11:30) - Capability overhangs and is the internet ready for agents?
(00:14:13) - Why certain agent capabilities like planning aren't enabled yet by OpenAI
(00:15:28) - Sponsors: Shopify | Omneky
(00:17:34) - GPT-4-1106 Preview designation
(00:21:50) - 16k fine-tuning for 3.5 Turbo
(00:25:13) - GPT-4 Finetuning and how to join the experiment
(00:27:53) - Custom models: $2-3 million pricing to build a defensible business
(00:29:48) - Bringing costs down to bring custom models to more people
(00:30:19) - Sponsors: Oracle | Netsuite
(00:33:53) - Copyright shield
(00:35:42) - OpenAI doesn’t train on data you send to the API
(00:36:37) - New modalities and low res GPT vision
(00:37:26) - GPT Vision Assessment for Aesthetics
(00:42:30) - Whisper Large v3
(00:44:15) - Text-to-speech API: the voice strategy and AI safety
(00:49:20) - Is there an Omni API coming?
(00:50:17) - Reproducible outputs
(00:51:45) - Log probabilities coming soon
(00:53:45) - The evolution of plugins to GPTs: the challenges with plugins
(00:55:33) - GPT Instructions, expanded knowledge, and actions
(01:00:18) - How is auth handled with GPTs
(01:01:04) - Hybrid auth
(01:02:50) - GPT Assistant API Billing
(01:07:58) - AI Safety: redteaming and efforts that went into the release
(01:10:28) - OpenAI Jailbreaks and Bug Bounties
(01:11:57) - The OpenAI roadmap for a year from now
The Cognitive Revolution is brought to you by the Turpentine Media network.
Producer: Vivian Meng
Executive Producers: Amelia Salyers, and Erik Torenberg
Editor: Graham Bessellieu
For inquiries about guests or sponsoring the podcast, please email vivian@turpentine.co
#OpenAIDevDay #OpenAI #GPT #ChatGPT #artificialintelligence #ai #samaltman
Full Transcript
Nathan Labenz (0:00) How many startups do you think you guys killed yesterday?
Logan Kilpatrick (0:02) There's so much opportunity for startups being created. And I think like the reality is if your startup was like a super, super thin wrapper around something, like you didn't really have a startup. You just had some like project that you were working on that was building off of an API. And like my takeaway from yesterday is the barrier to entry for new startups and for people to build like extremely compelling products has never been lower. And there's never been more amazing technology to build with. I think we probably created thousands and thousands of new startups. If you take a step back, if you're venture backed, or even if you are later stage, Series A, B, C, whatever it is, and have raised enough money to do this and have cash, it actually makes a lot of sense. This is an opportunity to truly build a defensible business. You can have a model that's, you know, trained on, I think, in the order of like billions of tokens, which is like not something that anybody else has access to. And again, it requires a sufficient investment to make that happen. But I do think that it's like, this is what people have been asking us for. Like how can we build a defensible business using your products and services?
Nathan Labenz (1:08) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas. Together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Logan Kilpatrick from OpenAI. Welcome to the Cognitive Revolution.
Logan Kilpatrick (1:34) How's it going, Nathan? I'm excited to be here.
Nathan Labenz (1:37) It is an exciting time. You guys have just launched a whole smorgasbord of stuff at OpenAI. Congratulations on that, and thanks for taking a little time to dig into the details and all the little nuances of it with us. I wanted to start with a couple big picture questions and then really get into the weeds. So for starters, obviously going into yesterday's event, the sort of meme sphere and certainly coming out of it too was like, how many startups is OpenAI going to kill with this release? So, you know, kind of a tongue in cheek starter question. But if you had to put a number on it, how many startups do you think you guys killed yesterday?
Logan Kilpatrick (2:12) I've been seeing this trend on Twitter and I get such a visceral negative reaction to it. I really do think there's so much opportunity for startups being created. And I think the reality is if your startup was a super, super thin wrapper around something, you didn't really have a startup, you just had some project that you were working on that was building off of an API. And like, of course, as things change, like you'll be disrupted in some capacity, like true genuine startups like are thinking about from the ground floor, how do I build a defensible business? And like, you have to go and like build an innovative product to do that. So my takeaway from yesterday is the barrier to entry for new startups and for people to build like extremely compelling products has never been lower. And there's never been more amazing technology to build with. I think we probably created thousands and thousands of new startups and gave the ones who are building on our platform today, you know, lower prices, a ton of new features. They can go and compete with, you know, other big companies. So I'm really excited. I think there's never been a better time to build something cool using AI.
Nathan Labenz (3:22) Yeah. Well, no doubt about that. So we're going to get into all the tools. Just another angle on that same question. It does seem like there is a bit of a shift where, you know, not even a year ago yet, and it's crazy to think about. Right? ChatGPT, I understand was kind of motivated at OpenAI by this sense that, like, hey. These models are getting pretty good, but we're not seeing kind of the traction or the sort of everyday use that we would, you know, expect to see or hope to see. So, like, let's put a product out there and try to kind of kick start that. That obviously really worked. And now this you know, a year later, it seems like there's a bit of a shift toward more of an Apple, like, not first, but best, maybe you could say, kind of product direction where, you know, the community has been out there kind of exploring all these things of rag and, you know, agent frameworks and all that stuff. And here comes OpenAI with, like, presumably, by a significant margin, would imagine, the best in class version of that. So how would you kind of frame that shift? And is that something that we'll see more of in the future from OpenAI? Very curious about that.
Logan Kilpatrick (4:24) I think part of it is and the community has an advantage in the sense I think this is probably the same thing for Apple. Like, I think we would love to be first to a bunch of things. I think the reality is, like, for us to build something, it has to have, like, you have to be able to scale it to the hundreds of millions of weekly active users of ChatGPT and like really make sure that it fits into like what is the core strategy from a product perspective. So there's always like a million and one different things that we could be doing, and it's just hard to ship things quickly when you are building at that scale. Like there's just like a lot of really difficult technical challenges to solve. So it's not like we've been sitting there, like, scratching away at like every last little detail being like, ah, we wanna make sure that we're putting out the very best thing and we don't wanna move quickly. It's like, really feels like at least to me internally that like we're always in an all out sprint all the time. And like, as soon as something is ready in the pipeline, like it's coming out to people. The reality is like the team has just been growing and there's not enough like human hours in the day to do all the stuff. I think people forget, like it's not like we didn't ship anything for the last 9 or 12 months. Like, there's been a constant slew of things. It's just there's a lot of work that has to go into making these things happen.
Nathan Labenz (5:38) Yeah. Even in just the last month or 2, I mean, it wasn't that long ago that 3.5 fine tuning dropped and, you know, people were like, boy, if they're releasing this ahead of DevDay, DevDay is gonna have some real fireworks. What is the employee count at OpenAI now? Do you know the current number off the top of your head?
Logan Kilpatrick (5:55) Yeah. I know I know the general number. I think when I joined last year, it was something around, like, 300, and I think we're well and beyond that. I think the only general comment I make, I'm not sure how public facing the internal numbers are, is that I think the estimate on LinkedIn is a little bit higher than what it actually is in practice. But we've grown significantly. It's been awesome to see it, and like throughout these, we needed to because there's so much work that has to happen. But, yeah, it's interesting to see the shift that's happened with going from a, you know, when I again, when I joined in, I wasn't even that early, a 300 person company to now something that's much, much larger than that and obviously is going to continue to grow in the next year or so. It'll be interesting.
Nathan Labenz (6:38) Yeah. So obviously, one of the keys to scaling an organization is having a strong sense of kind of mission and high level strategic clarity. I'm sure not unrelatedly, OpenAI recently updated its core values. And there is now a statement, even I'd say more explicitly than previously, that the company is focused on AGI. AGI focused, we are committed to building safe, beneficial AGI that will have a massive positive impact on humanity's future. Anything that doesn't help with that is out of scope. All these things that got launched yesterday, a lot of details, a lot of features, the GPT store. How are you guys thinking about that as being on the path to AGI? Because I could certainly imagine somebody saying like, wait a second, a GPT store? Is that really on the, you know, kind of critical path to AGI? But I know that's obviously something you've thought deeply about as a team. I'd love to kind of hear how this is all contextualized as part of that grand mission.
Logan Kilpatrick (7:34) Yeah. It's a good question. So I think, like, to put it, like, very precisely, like, OpenAI is not we don't exist to make the best chat platform. We don't exist to make the best API. These are mechanisms which help get us to the point of AGI that benefits humanity. I think the piece around ChatGPT and the piece around the API is another angle of our philosophy, which is we need to get this technology out to people iteratively. And you heard Sam say this actually in the keynote yesterday where Assistants and GPTs are really our first shot at and moving in the direction of agents. And we didn't want to go all the way to the sort of autonomous agent type workflows that you see in some other products and services today, really in the vein of we need time for society and for people to adapt to these changes. And I think that's really, ChatGPT today is really like the mechanism to allow people to see where this technology is going and how to adapt. Like, I think Sam also made this comment sometime in the last few years, but like it is entirely possible that OpenAI could have continued to do the research that we were doing and build what could become AGI in the confines of OpenAI, not sharing it with the world. And in 5 years, we just sort of emerged with this thing that works. That's actually like a plausible outcome. I actually think that someone, whether us or someone else, could actually do that. The reality is like, that's not the best thing for humanity. Us showing up with this thing that's now all of a sudden super powerful and no one's ever seen ChatGPT, no one's ever used GPTs or any of the future products, it would cause a tremendous amount of turmoil for everybody. I'm like, that's not great. So I think that's really the strategy is like the core research is what's going to enable AGI to happen. But ChatGPT and our API are the way of making sure that that technology benefits everyone and everybody gets a chance to see what's coming.
Nathan Labenz (9:35) And shape it too. The 2 quotes that stood out to me were, we think it's important to give people a chance to experience and adjust to this technology, which you're speaking to. And then also we want more people to shape how AI behaves. I thought that was also pretty telling. Obviously, there's been a number of different things there, but the GPT store is kind of seemingly a way to really open up access to how you can shape the behavior of an AI assistant without even necessarily needing to get into the API docs. So that's definitely very interesting as well. So I'm definitely somebody who's pretty concerned about AI safety big picture, but I've definitely grown a lot more convinced over time by this kind of capabilities overhang argument that like things have to kind of get into the public sooner rather than later or else, you know, it gets to be super disruptive when there's that huge or potentially disruptive when there's that huge delta. So let's kind of get into the weeds and help close the, capability overhang gap. The big themes that I saw, I sort of wonder if you would add anything to it. Multimodality, obviously huge. I'll call it accessibility via price drops across the board and also the Turbo speed up. I'll call it platform expansion as kind of a shorthand for saying these formerly kind of adjacent complementary things like vector database and runtimes, which have existed a little bit in ChatGPT but haven't been to the API. Now that's kind of available for everyone. And then finally, agents. And we'll get into it all, of course. But I'm curious, maybe for starters, you said something interesting a second ago on not going all the way to autonomous. How are you guys thinking about how far these assistants, not yet maybe full agents, how far are they kind of intended to go right now in terms of their autonomy?
Logan Kilpatrick (11:28) The context around how far the limitation of these models, how far the assistants can go and how far GPTs can go today, really there's no mechanism for them to do like self planning. And if you look at like a lot of the like traditional agents products, AutoGPT, BabyAGI, whatever they are, they do this whole like planning process and they say, I'm gonna go and execute these tasks and I'm gonna sort of follow-up afterwards. GPTs don't have that capacity today. It's really still the core, like send a message, get a response, send a message, get a response. So the like long term queuing up of actions, I think there's also the sort of missing link towards the more agent workflow is like, I'm gonna go and run these things in the background, maybe without you self directing me to do it. Like, I'm kind of just gonna go and randomly check and see whether your email has been updated or whatever that might be. So I think those are really the big missing pieces right now. I think with the Assistants API, you could probably build some of these yourself if you really wanted to. I do think like we'll eventually get there in the API. We'll eventually get there with GPTs, assuming that it can be done in a safe way. I think there's also like that question of reliability. Like, I don't think that anybody has really like hit the home run yet with agents products, whether in an API or a consumer product. And I think that's another missing piece of like, it's probably just going to take us a little bit more time and others as well to figure out like, how do we actually do this reliably? Like make sure that it really provides value. I've tried a bunch of the agents products and I'm just like, this basically takes as much work as if I was going to do it myself. So like, what's really the point of using this? Like, it's just kind of, if anything, you're just like making yourself, at least the current state today, my feeling personally is like, you're making yourself like liable for more risks. Like I'm giving my credentials to some system where I don't really get to see every single step that it's taking and it might not really do the things that I want. And I think there's just like a little there's like some small little missing piece of refinement for that workflow to get really, really good. And then people will just Yeah, I think it's going to be the explosion of those agents. I also think like the challenge today with agents is the internet and society are really not ready at the moment, I think, for like a hundred million or a billion agents. And kind of like, you know, ChatGPT has a hundred million weekly active users. If all of a sudden, every single one of those weekly active users has access to agents who can go out and do things on their behalf, the internet and products and services are just not ready for that. I think it's going to take a little bit of time and hopefully GPTs are a nice way for companies and people to start thinking about this.
Nathan Labenz (14:13) Interesting. Is this sort of a conscious decision to say we're gonna, you know, not train this kind of planning capability past a certain point right now? Or is it just that, you know, OpenAI also hasn't quite, you know, figured out how to do it? Most of that kind of suggested that it probably is, like, within the scope of what OpenAI could do, at least in the near term, if not already.
Logan Kilpatrick (14:39) Yeah. I think it's a little bit of both. Like, my guess is there's just some, like, core technology things that need to be figured out. Like, even with what was released in the last couple of days, it was an all out sprint to get there. So the scope had to be constrained in a lot of ways. I do think that there needs to be deeper thought put into actually rolling these things out and making them accessible. So I think that's like a huge angle of it. I do think the other angle is like actually making sure that the core technology works really well. I'm less concerned about that piece just because our team is incredible and can continue to make amazing things. I think the deeper challenge is getting everyone else in the world on board with this idea and comfortable with it. I think that's just a very difficult problem.
Nathan Labenz (15:21) No doubt. There's a lot of adjustment that people are gonna need to make over the not too long time horizon. Hey. We'll continue our interview in a moment after a word from our sponsors. So let's kinda start, you know, with some fundamentals and then, you know, work our way back up, you know, I think kind of a complexity ladder towards some of the highest end, you know, higher up the stack as we go. Starting just with the models. GPT-4-1106-preview. First question, what is meant by preview? I guess I noticed that there's, like, a 100 requests per day limit right now. So I guess that's probably the main thing. Right? Obviously, you can't deploy that. Anything else that we need to be aware of on the preview designation and any sense of timeline for when that limit goes up?
Logan Kilpatrick (16:05) Yeah. The preview designation is just intended to tell people that this isn't the final version of the model. So this was, like, essentially an early release candidate of what the finalized model will actually look like. So there's still, like, a bunch of actual modeling work, post training work, and other things that need to happen to get this model, like, to the standard of something that we would normally release in the API. So that's why that preview designation exists. I'd imagine in the order of, like, weeks, hopefully less than months, it's available as, like, the regular general version and no longer accessible through preview. You know, we need to make sure that we get feedback from people as we release these new models and make sure, especially given how many changes are baked into this model, that it's still performing to the level that people would expect. So hopefully you can try it as a drop in replacement for some of your existing use cases, see how it performs, make sure there's no major regressions. And then if there are major regressions, come and yell to us and let us know so that we can explore and make sure that the model is gonna be generally good.
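For anyone who wants to try the drop-in swap Logan describes, here is a minimal sketch using the OpenAI Python SDK (v1-style client); the prompt is just a placeholder, and you would substitute your own existing GPT-4 call:

```python
# Minimal sketch: trying gpt-4-1106-preview as a drop-in replacement for an
# existing GPT-4 call. Assumes the openai Python SDK v1.x and OPENAI_API_KEY
# set in the environment; the messages below are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # preview model; swap back to "gpt-4" if you hit regressions
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key GPT-4 Turbo changes in one sentence."},
    ],
)
print(response.choices[0].message.content)
```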
Nathan Labenz (17:09) You know, yesterday, the sort of highest level statement on performance was just it's better than GPT-4. How is that being characterized? Like, I didn't see, maybe this was in a breakout session. Funny enough, neither you nor I were there in person yesterday, for personal reasons. But is there, like, a model card coming out? Should we expect to see, like, the MMLU score and the, you know, bar exam score in the same way that we got for the original GPT-4? What's the outlook there?
Logan Kilpatrick (17:38) Yeah. I think so. One example is, and I don't know how specifically this generalizes to 4 Turbo, but for 3.5, we can use this as a proxy. The quote is, for instance, our internal eval showed a 38% improvement on format following tasks, such as generating JSON, XML and YAML. The point of feedback and the question around releasing benchmarks, I think, is one that we've heard. And I think with our last model API release, we didn't do a good job of fully characterizing, what are the performance differences between these models? The TLDR is I think we want to release more of these benchmarks. I don't know yet if the full plan is to release a bunch of benchmarks when this model comes out of the preview version, I would hope so, but I don't have a definitive answer right now. I also think the nice part is, like, hopefully people can run some of these evals as well and, like, keep us honest. And I think people have done that with previous model iterations as well.
Nathan Labenz (18:40) Yeah. It'll take a couple days' worth of your 100 requests per day to get the MMLU suite run, but I am sure somebody is on it somewhere. Dan Hendrycks and team, I'm sure, have already started pooling their requests, and, you know, it can't be too long now before we'll have that score. So on the vision side, it seems like the note there is it's basically the same model under the hood. It's just kind of a presentation, like, system message difference. So we basically should expect one and the same. Given that, is there a reason that it's two? I guess it's the system message, but I'm a little confused as to why it's even presented as, like, 2 different models via the API.
Logan Kilpatrick (19:21) Yeah. The big background context is it's 2 separate, like, tech stacks, if you will, to get it to work right now. They used to be completely independent separate systems, and there just wasn't enough time to unify things, which is why they're being presented as 2 different options, because there's, like, some very slight nuance differences. But behind the scenes, it is the same thing. It will end up being unified into 1, and people should look at it and expect it to behave in the same way, with the caveat that with Vision, there's a very small addition to the system message, which could, again, technically, like, throw off some of the prompts that people have in some cases.
Nathan Labenz (20:04) Next up, 3.5 Turbo 16k fine tuning. This is the first fine tuning that supports function calling. Correct?
Logan Kilpatrick (20:13) So when we originally released 3.5 Turbo fine tuning, it didn't support function calling, then we fast followed, like, a couple weeks later with function calling. So the last version did support it. This is the first version that supports 16k out of the box.
Nathan Labenz (20:27) Definitely very interested to crack into that. We're using 3.5 fine tuned now with the Waymark, you know, core script writing model. And, you know, we've got a lot of stuff that we're throwing at it. So we do, you know, sometimes have to even truncate our inputs to make sure we stay within the 4,000, but the 16,000 is gonna obviously open that way up and create the possibility for, like, downstream, you know, iteration, chat style back and forth, function calling. So that'll be a a big, you know, expansion of possibility just for us with our, you know, pretty narrow product. So excited to get into that in more detail.
Logan Kilpatrick (21:02) It's been crazy to see how much value people have gotten from 3.5 fine tuning. I think 16k has been like the number 1 ask for people. And yeah, I think now, especially with extra context, you're probably getting even closer to GPT-4 level. Like, I think we've already had a bunch of customers, I think we shared some of these use cases, of people getting like GPT-4-like performance using 3.5 Turbo. And I think now the gap is even gonna be closer because you have more tokens to work with. You can give more intricate examples. And I think that was like 1 of the core limitations before is like, there just wasn't enough context in some cases to really give everything that you wanted the model to do, plus improved instruction following and everything else that's coming in this latest 3.5 Turbo version. So yeah, the gap between the models in some cases continues to shrink, which is really cool.
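For context on what kicking off a 3.5 Turbo fine-tuning job looks like in practice, here is a rough sketch with the v1 Python SDK; the file name is hypothetical, and the base model identifier for the 16k-capable 3.5 Turbo is an assumption, so check the fine-tuning docs for the exact name available to you:

```python
# Rough sketch of starting a 3.5 Turbo fine-tuning job. The JSONL filename is
# hypothetical; the model identifier is assumed to be the 16k-context release
# discussed here (verify against the current fine-tuning documentation).
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of chat-formatted training examples
training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off the fine-tuning job against the uploaded file
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo-1106",  # assumed identifier for the 16k-context 3.5 Turbo
)
print(job.id, job.status)
```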
Nathan Labenz (21:52) Yeah. I wonder if you have any big tricks. And I know you guys are doing, I guess I don't know if you're cosponsoring it, but there's a Scale webinar tomorrow that I'm looking forward to attending that's on kind of best practices for fine tuning 3.5. For me, the number 1 thing that has driven performance is training it on reasoning. So there's a huge difference between if I just give it good examples and ask it to, you know, follow, you know, obviously, the implicit ask of fine tuning is learn from these examples. But then layering in the reasoning and explaining why it's writing, in our case, you know, the script for the video in a particular way is a huge delta in performance. Any other tips along those lines that are, like, kinda simple but, like, drive a lot of value in the 3.5 fine tuning?
Logan Kilpatrick (22:37) fine tuning? Yeah. We actually had a breakout session on this at DevDay. It was recorded, and and it should hopefully be available in in, like, the next few days, hopefully, by the end of this week. I'll defer people to that recording. I know John, who's 1 of the folks on our Fine-tuning team and Colin, who actually is maybe the person who's doing that webinar tomorrow, were the 2 speakers for that. And I saw a bunch of the chatter online. People seemed really excited about it. I haven't actually seen the talk, so I'll defer to them because they're the sort of experts in the space right now. But they went through a bunch of like customer use cases and the pitfalls and all the things like that. So it should hopefully be the perfect answer for people looking for tips like that.
Nathan Labenz (23:17) Cool. Okay. Well, then next, there's even 2 tiers of, you know, kind of fine tuning possibility beyond that. GPT-4 fine tuning coming soon. Is this the 128k that will be the base? I assume so. Can you get me into the program? And, more generally, like, what kind of frontier use cases are you looking for as you're, you know, gonna pull people into that experimental program?
Logan Kilpatrick (23:43) Yeah. The context for the experimental program is just wanting to make sure that there's a bunch of safety concerns, obviously, because GPT-4 is much, much more powerful than 3.5 in the way that it's trained and with more context and things like that. So we just need to be a little bit more initially cautious about rolling that out wider. It's also more expensive. So I think people, you really need to be somebody who is pushing on the boundaries. And part of the concern was a bunch of random developers are going to show up and kick off a bunch of fine tuning jobs and not really have perspective on the cost for GPT-4 fine tuning, which is more expensive than 3.5 as you would assume. And then be like, Oh wow, I just paid all this money. And by the way, like I probably am not a machine learning expert and like, didn't make my training data right. And like, now I just paid all this money for something that's not super great. So I think there's like, from a product perspective, a little bit of concern of that as well as like, you just have to be more thoughtful as you're spending more money and making a more powerful model about like how you're actually training it. And I think it takes folks who are really thoughtful about this, like you and the folks at Waymark, to do it well and actually have maybe a little bit of data science and machine learning background. I don't think there are specific use cases that we're looking for on the GPT-4 early access side. It's really about just, like, deploying it safely and specifically to people who have already used 3.5 fine tuning. Like, you should really try 3.5 fine tuning for your use case, especially now with 16k. So you won't even have the option to opt in to like apply for fine tuning if you haven't used 3.5. And this has been 1 of the mistakes that we've made in the past with our like general wait lists that we've done is like everybody just says, Yeah, I'm interested in this. Like, that would be cool to do for fine tuning without actually having spent the time or looked at the data and all those things. And like, it just makes it really, really difficult for us when a million people say that they want something and we're like, well, how do you filter through those people? And it's a lot of work. So hopefully this new approach of like gating it based on actually having used previous fine tuning will make sure that we get, like, a much higher signal to noise ratio from customers.
Nathan Labenz (26:04) And then beyond that, custom models. This was kind of the highest end thing. I saw via Twitter that $2 to $3 million is kind of the entry level price expectation. I was surprised. My initial guess was $10 million just to start, just given how much, you know, kind of demand I would expect you guys to get for that from companies for whom, you know, money is presumably not much of a constraint. I'm priced out of that at Waymark, but there's a lot of companies out there with some pretty healthy balance sheets. So any guidance for kind of what you're looking for there beyond obviously people with the money to spend?
Logan Kilpatrick (26:41) Yeah. I think part of the context with custom models is really wanting to push the boundaries of what is actually possible if you put in a bunch of your proprietary data. And I think there's companies who are building their core product around AI are going to be well suited for this. And I think like, if you take a step back, if you're venture backed or even if you are like later stage, Series A, B, C, whatever it is, and have raised enough money to do this and have cash, like it actually makes a lot of sense. This is an opportunity to like truly build a defensible business. Like you can have a model that's trained on, I think in the order of billions of tokens, which is not something that anybody else has access to. And again, it requires a sufficient investment to make that happen. But I do think that it's like, this is what people have been asking us for. How can we build a defensible business using your products and services? And I think custom models is gonna be 1 of the ways to do that. I'm hopeful that like, I feel like I deeply resonate with the voice of like the average developer. And obviously most people who are hacking on a project or sort of getting their feet wet in this space don't have $2 to $3 million. So I'm hopeful that we'll learn a ton of stuff from this process and like figure out a way that we can bring those costs down to make it more widely accessible to more companies, more startups and people who are interested in pushing the limits of what's possible with these models without putting in $2 to $3 million. But I think we just need to learn a bunch of stuff still. It takes like, actual OpenAI researchers are gonna help you make those models better from end to end. So it's a huge investment from our side, especially given there's so many things that we could be doing, but I do think it's important to help people with these problems.
Nathan Labenz (28:29) Hey, we'll continue our interview in a moment after a word from our sponsors. I thought it was a bargain. It's like if you are an enterprise that is sitting on a huge amount of data that, you know, you've kind of been collecting and not really knowing what to do with and you've got $100 million plus sitting on a balance sheet, it's like, I really can't come up with anything better to spend it on than to go do something like this. Right? I mean, you talk about what is going to transform your business. What is a threat to every business, but also what is a massive opportunity for every business? I don't know how public company CTOs, CIOs are not, like, basically all, you know, on the waitlist for this product. And I'm not paid to say that, but it just seems like a pretty clear value prop at not that extreme of an entry point.
Logan Kilpatrick (29:13) You know, if you're somebody who's just pushing into this space today, like to go and procure, hire a bunch of people to do the actual model training, get the resources, all those things, and you're like a genuine technology company, you'd probably spend in the order of like at least a few million dollars in CapEx anyway, and likely end up with something that's like not as performant in many cases as what we can hopefully give you for $2 to $3 million. So I do think there's that angle of it as well where you can sort of outsource some of that stuff to us.
Nathan Labenz (29:50) Yeah. Zero doubt about that. I mean, that doesn't go that far in machine learning, you know, researcher salaries in today's world and, you know, good luck, I would say, competing. So, a lot to go. Let's keep moving on. So the copyright shield is another big thing. I'm kind of just interested in a little bit of the strategy on this and if there are any footnotes. I believe I saw with Google's policy, which is kind of similar, that there was a carve out, if I understand correctly. Like, you know, if you're trying to kind of jailbreak and create copyrighted stuff, then it, like, doesn't cover you. I wonder if there's anything similar here. And also just kind of wondering, like, I guess, what is the strategy for this? Is this like you wanna get into the litigation, like, sooner rather than later or you, like, totally confident that everything you need to license is licensed? I could kinda see multiple different reasons for trying to put something like this out there.
Logan Kilpatrick (30:40) Yeah. I don't wanna touch on the specifics because I'm looking at the blog post now and we don't link out to, like, a specific page right now. I think we'll have something that's forthcoming that, like, goes into the nitty gritty. So I won't make any sort of definitive statements about it. I think generally the context on why we want this is we want people to feel comfortable that they can build with us and not have to worry about copyright claims. I think that other providers have done this. I think it's important for us to do this as well. So that, I think going back to the comment you made about like public company CTOs and stuff like that, I think there's a lot of people who want to build with this technology and they perhaps have other folks who they work with in a legal department or something else like that, who are worried about the risks involved. And it's a limitation on who can use this technology if we don't provide things like this. So the hope is we provide this, we're able to work with more people and bring more people into the ecosystem. I don't think anybody wants to be in litigation, but I think it's important. It's an important service to our customers.
Nathan Labenz (31:37) In general, I would agree. Nobody wants to be in litigation, but it does feel like there's maybe a moment here where, you know, the leading developers kind of feel like we need to be party to some of these things to, you know, help shape the precedents. I know some of that is already happening. Now would maybe be a good time for you to remind everybody that you don't train on data via the API. We've gone how many minutes and you haven't said that yet, so here's your chance.
Logan Kilpatrick (31:59) Yeah. We don't train on data that you send to the API. I think it's super important. I think in the context of Assistants, we don't train on any of that data either if you upload files, all that good stuff. Like again, in a lot of ways, it's a downside for certain customers because there's people who want actually the models to improve for their use cases. We do have a way to opt in if folks are interested in that. But in general, if you don't want people, if you don't want us to have your data, we won't use it. We don't want to use it. We want people to be comfortable building with the platform. And I'm very happy that we made that change in March. I think it's, you know, I had so many conversations with people who, even though it was like a very tiny sliver of the sampling of the data, it's still, you know, there were folks who weren't comfortable with it, and I'm very glad that we don't have to deal with that challenge today.
Nathan Labenz (32:47) Cool. So let's get into some of the new modalities. I think probably the 1 that's most exciting for me is image understanding. It just seems like first of all, it's really good, you know, from what I've seen in ChatGPT, amazing. And, you know, for thinking about Waymark, you know, 1 of our longest standing challenges has been we collect all these images off the web for the businesses that are gonna make video, and we use those. But how do you know which ones are good and bad and whatever? It's tough. So we've, you know, incrementally improved, but this seems like it should unlock a notable step improvement. The price is pretty interesting, like, pretty aggressive, I would say. You know, kinda maxes out at about 1¢ per image. And then there's this low resolution option. You can go all the way down to basically, it looks like 12 images per cent if you do the kind of cheapest version. Love to get a little bit better color on, like, what is that low res thing, like, suitable for? And also the question on aesthetics. Almost no image understanding model that we've seen has a sense of, like, how beautiful is this? It's all about the content and very little about kind of the quality. So yeah, my 2 questions there are, what's the low res thing for? How do we think about kind of that, you know, order of magnitude price range? And then is there an aesthetic component to it?
Logan Kilpatrick (34:04) Yeah. On the low res version, the principal idea is that we take an image, we reformat it to 512 x 512. So the easy comparison, I think this is actually in the documentation right now as I was writing it at a wedding in Mexico this weekend. The idea is if you as a human could take an image and reformat it to 512 x 512 and answer the same question, the model should be able to do it as well. If what you're asking about in that 512 x 512 cropped image is some tiny little detail that's shown in 1 little pixel somewhere, might not
Nathan Labenz (34:40) be able to pick it up.
Logan Kilpatrick (34:41) But if you're like, you know, here's a 512 x 512 image, it's a picture of a highway with a car on it. And you ask like, is there a car in the picture? The model should be able to do that. Like you and me as humans would be able to do that. The model should be able to do it as well. So it really depends on like, what is the input image size that you're working with and what is the level of detail that you need the model to actually go through. With high res mode, it goes through on like a patch by patch basis. And depending on the image size and dimensions, it depends on the number of patches that are created. And it goes in and it again, looks at those 512 x 512, but it like zooms in on each individual section, which is why the cost is higher because it's processing additional patches. So again, the reasoning for folks is look at it as you would as a human. If you can answer that question, you should be able to do it with low res. If you can't and you need the high res version, think about it from that patching perspective.
Nathan Labenz (35:39) 512 x 512 is the low res or the low res is smaller than that?
Logan Kilpatrick (35:42) Low res is 512 x 512. Wow. Okay. So it's not super small. Like I think 512 x 512 is like reasonable. You could take a lot of images and like still understand the gist if you reformat them to 512 x 512, especially given the price perspective. In a lot of cases, it might be worth just seeing if 512 x 512 gives you the result that you want for certain use cases before you go and pay more money for high res mode.
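For reference, here is a minimal sketch of how the low-res versus high-res trade-off Logan describes surfaces in an API call, via the detail field on an image input; the model name follows the vision preview naming at the time, and the image URL is a placeholder:

```python
# Sketch of a vision request using the "detail" field: "low" gets the single
# 512x512 pass described above, "high" triggers the patch-by-patch analysis.
# The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Is there a car in this picture?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/highway.jpg",
                        "detail": "low",  # switch to "high" for the more expensive patch-based pass
                    },
                },
            ],
        }
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)
```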
Nathan Labenz (36:06) 1 angle I've been kind of taking to all these different things is just kind of comparing the prices to other things that are out there. And on the image side, it seems to be lower than basically anything else I've seen that's hosted. For example, if you go use LLaVA 13B on Replicate, it seems to be, you know, they kind of charge by, like, the GPU second, but it seems like you can do, like, 3 images, 3 images per cent, I should say. With the low res, it's more like 12 images per cent, but you do still have the output tokens too of, like, what's generated that's kind of included. So it seems like it's, like, competitive or maybe, like, even a little bit cheaper than other options. 1 area where that does not seem to be the case is the DALL-E 3 integration. If I understand correctly, that still is at 4¢ per image. And that's, like, definitely way higher than, you know, a lot of the sort of open semi open source, you know, kind of hosted open source options. Curious if there's a driver there. Is that, like, a licensing deal that's gone on in the background? Presumably, it's not compute. Right? Because, like, I would think DALL-E 3 Turbo would exist if it was a matter of compute. It's a good question.
Logan Kilpatrick (37:17) Also, just to go back to your question before about aesthetic stuff within vision, I haven't tried that out. I would imagine it's like, it has the same level of reasoning and capabilities as like base GPT-4. So like you could probably get the model to like make general assessments of whether something is beautiful or not. I don't know if, again, it's gonna be like real, because, you understand, in some sense the saying is beauty is in the eye of the beholder. I don't know how it will react to certain things. You could probably ask it those questions. But on the question of pricing for DALL-E 3, I would imagine, and this is just like the strategy in general, we're priced as aggressively as we can be given the technology, given the compute that's required. And you see this in every API release that we do, where we're always lowering prices. We're always trying to make things more competitive for our customers. I don't know what the specific drivers are for DALL-E 3. I know that it's like, again, at the end of the day, it's like a really, really powerful model. So I imagine that there's significant costs that are associated with us generating those images. Hopefully, it gets cheaper. But yeah, I also think there's like a ton of safety system stuff that's happening around DALL-E 3 and some of the other, like, image modality models. So that could be a driver as well, but not a 100% sure.
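For comparison with the pricing discussion, a DALL-E 3 generation call looks roughly like this; the prompt and size are arbitrary examples:

```python
# Minimal sketch of a DALL-E 3 image generation request via the images API.
from openai import OpenAI

client = OpenAI()

image = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor storefront of a small-town bakery at sunrise",
    size="1024x1024",
    n=1,
)
print(image.data[0].url)  # URL of the generated image
```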
Nathan Labenz (38:40) Gotcha. Okay. So also announced yesterday, Whisper Large v3, not yet available in the API, but interesting to note that English is not the lowest word error rate. So definitely a big focus on kind of lots of different languages, so much so that, compared to some other languages, I think English was, you know, on 2 benchmarks, like in the number 7 spot and maybe the number 3 or 4 spot. So I don't know if you have any comment on that, but that was kind of interesting.
Logan Kilpatrick (39:06) Yeah. I'm super excited because, again, 1 of the core challenges of language models and just like the internet in general is like there is a disproportionate focus on English and it just makes these products and services that are built around language, like, much less accessible in places where English isn't the first language. And I think Whisper focusing on non English is a super positive step. I think it's something that we'll have to do more broadly across all of our models is really make them sort of world class in all these languages that exist around the world. So I'm excited for people to get v3 in the API because I think it's just like a lot of work to spin up that model yourself and, like, run it and do all the things that are required. And we've gotten such positive reception from putting v2 of Whisper in the API. People really liked it, and it made it so that anybody could use it and not have to worry about, like, all the GPU stuff that's required to go and run it yourself. But I also think it's important to make some of these things open source, so I'm glad that we did that as well.
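For anyone who wants to try the hosted Whisper endpoint Logan mentions, here is a quick sketch; note that at the time of this conversation the API serves Whisper v2 under the name whisper-1, with v3 announced but not yet available, and the audio filename is a placeholder:

```python
# Sketch of transcription via the hosted Whisper endpoint ("whisper-1" is the
# v2 model currently served; v3 was announced but not yet in the API).
from openai import OpenAI

client = OpenAI()

with open("interview_clip.mp3", "rb") as audio_file:  # placeholder filename
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```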
Nathan Labenz (40:09) Yeah. Cool. I've seen some interesting demos already too of, like, people changing language kind of mid audio file and handling that seamlessly from what people are reporting. So some very cool possibilities there. Obviously, the other side of understanding audio is generating audio. So the text to speech API, here, pricing seemed pretty aggressive again. Comparing the 1.5¢ per thousand characters, or 3¢ per thousand characters for the HD audio, to ElevenLabs, it's like a full order of magnitude cheaper. They're, like, in kind of 15 to 30¢ per thousand characters from what I was seeing today. So that's a notable price drop that's definitely gonna probably spur some additional price drops if I had to guess. I wonder if you have any comments on the kind of the strategy of the voices. This is something where I've commented many times that, you know, while it may have been sort of an accident or kind of, you know, not expecting the thing to take off as much as it did, the name ChatGPT, I think, is really kind of a service to the public in that it's not easy to anthropomorphize ChatGPT. It just sounds like a robot, and I think there's something really good about that that, like, it's a constant reminder that, you know, it's not a human even though it may, you know, pass the Turing test or whatever else. I feel kind of the same way hearing the voices. It feels to me like they are at a really nice balance point. And I've been a, you know, connoisseur shopper of different, you know, text to speech solutions. It feels like it's at a really nice balance point where it's, like, still slightly uncanny valley and doesn't sound human. Like, I think I will know, you know, when I'm hearing it, but at the same time, it's, like, very pleasant and kind of natural. Is that what you were going for with that, to try to kind of, you know, still make clear to people, like, when this is an AI voice?
Logan Kilpatrick (42:03) I'm not a 100% sure, to be honest with you. I think you are probably, as a connoisseur of these voices, uniquely in the position where you could make that assessment, kind of like how I feel these days when I see text and it's sometimes easy to tell whether or not it was generated by AI. I think if you look at like the average person who is not deeply in the AI space, you could probably show them those voices and convince them that it was a real human. I think that's 1 of the challenges of TTS, and this is explicitly part of our usage policies and terms today, is you have to tell people that this is not a real human voice. So regardless, it is not intended to be so extremely human-like that it convinces humans, to a fault, that they're talking to a human. We want this to be clear that you're talking to an AI. Just like, again, with the terms of service of all of our products, we don't want people to be misled. And again, I can't speak to whether or not that was specifically part of the strategy for making those voices, but yeah, they are extremely good. I was playing around with a bunch of them and I think we'll have more in the future as well, but the existing ones just do a good job of striking a nice balance, as you said. I think there's gonna be so many cool use cases that people come up with for TTS. It'll be exciting. I was just talking to my younger brother today and he was working on a blog post and I was like, use TTS, give an audio version of it too. And it's going to be so easy. I think the thing with ElevenLabs, to my understanding at least, and I haven't played around with the product a ton, but the nice part for them is I think they have a whole front end setup where like you really, as somebody who's never programmed before, could go in and generate, like take your text and turn it into audio. And I think people underappreciate the costs associated with doing things like that. And that could be part of the reason why they have some of the higher costs than us potentially. And I haven't actually looked at the numbers myself, but I trust you. So there's always those non trivial expenses. And like we have some of this, if you look at why ChatGPT is more expensive than the API, well, there's more going on than just like the bare metal API powering that experience for people. So I think they'll continue to be competitive. Like at the end of the day, our TTS is only available through an API. So like it requires a sufficient level of customer knowledge to use a curl request or a Python request or what have you.
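As a concrete reference for the text-to-speech discussion, here is a minimal sketch of the API call; tts-1 is the standard tier and tts-1-hd the higher-quality tier mentioned in the pricing comparison, and the voice and output path are arbitrary choices:

```python
# Sketch of a text-to-speech request: "tts-1" for standard, "tts-1-hd" for HD.
# Voice selection and the output filename are arbitrary examples.
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Welcome to the audio version of this blog post.",
)

# Write the returned audio bytes to an MP3 file
with open("post_audio.mp3", "wb") as f:
    f.write(speech.content)
```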
Nathan Labenz (44:39) Yeah. I think there's a couple other things too that mean that the text to speech companies are not killed by this release. First, the voice cloning is a huge 1. You know, I cloned my voice with both ElevenLabs and PlayHT, and that's just not something OpenAI is offering. And seems like it's, you know, probably enough of a kinda hairy ball of wax that you just, like, might not get there for a while, if ever. And then there's also kind of the emotional direction, which I could see that being something OpenAI might eventually take on. But if you're trying to make games or trying to make characters and really trying to bring kind of an energy to it that's not your sort of kind of NPR narrator or, like, call center agent energy, you know, then that's something that's just not in scope right now, which those other companies are already doing, you know, quite well. So I definitely think they will continue to have, you know, a very significant, you know, amount of usage for those reasons. Stitching these modalities together, is there, like, an Omni API coming? Like, it seems like, you know, right now, I have to, like, call Whisper, and then hopefully, I can, like, stream my Whisper tokens into the language model somehow. And then, you know, it will then stream me back tokens, and then I gotta send that to the text to speech. But it seems like integrating that into 1 API where, you know, I stream audio in and stream audio out, maybe with kind of the text almost as, like, logs, is the future. Is that kind of what you would envision the future looking like as well?
Logan Kilpatrick (46:01) Yeah. I would imagine we do something like that. And mostly because, like, right now, and this is the case across all of our APIs, like, if you integrate multiple things together, you pay the round trip penalty many, many times. And that's just the least good experience that you could have. So it makes a ton of sense. Again, I don't know when that would come, but
Nathan Labenz (46:22) I would imagine we end up doing something like speech to speech. It seems like that is definitely on the natural path. Reproducible outputs, I thought that was really interesting. Is that essentially equivalent to me caching responses on my side? Or, like, what does it add, if anything, beyond kind of if I were to implement my own cache on generations?
Logan Kilpatrick (46:44) The challenge is, even if you cached the generations previously, the generations might be different for the same prompt. So this is fixing that, or it's attempting to fix that, it's still in a beta right now. But now you can pre-specify the seed, and for a given input that is the same, with the seed being the same, you should get the same output. And the way that you verify that all of these things should be the same is the system fingerprint. Because again, there's much more going on than just that seed parameter behind the scenes. There's tons of things baked in. For 2 outputs, if the system fingerprint is the same, it should give you the same output. And this just makes the whole process more reproducible. So I don't think caching would solve this problem today. Again, you could cache the inputs and the output and then give people the same thing, but that's, yeah, you could have done that, but I think it's in the vein of solving a different problem, which is people want to be able to thoroughly test these systems, and you can't really do that if you don't know what the expected output is supposed to be. It's just hard to make unit tests and stuff like that.
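A small sketch of the reproducible-outputs beta Logan describes: fix the seed, then compare the system_fingerprint across calls; when the fingerprint matches, the same input and seed should give the same output. The prompt is a placeholder:

```python
# Sketch of seeded, reproducible generations: same input + same seed should
# yield the same output whenever the returned system_fingerprint matches.
from openai import OpenAI

client = OpenAI()

def generate(prompt: str, seed: int = 42):
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": prompt}],
        seed=seed,
        temperature=0,
    )
    return response.choices[0].message.content, response.system_fingerprint

out_a, fp_a = generate("Name three primary colors.")
out_b, fp_b = generate("Name three primary colors.")

# If the fingerprints match, the outputs should match too (useful for unit tests)
print(fp_a == fp_b, out_a == out_b)
```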
Nathan Labenz (47:53) Probably the biggest surprise to me was the log probabilities coming soon. I had never expected to see that again, at least for the best models, because my thought was, once you give out those logprobs, you're opening up much easier imitation learning for everybody that wants to match GPT-4. I assume that that's still the case, right? So does this mean that basically there's just no concern about anybody matching GPT-4?
Logan Kilpatrick (48:21) Generally, we are concerned about the leakage of intellectual property. I think it's specifically going to be logprobs for output tokens, which is slightly different than pure logprobs for inputs and outputs. I think that slightly changes it. I also think, at the end of the day, developers have really asked for this, and there are certain use cases that just are not really possible without logprobs. So it's mostly for those people. For the people who have asked, for the people who want to build products that require this, I'm happy that we're doing it. There's been a ton of interest for the last 6 to 12 months. It's just that how to do it in a way that balances all the other things is the hard part.
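For context, here is a sketch of what requesting output-token log probabilities looks like. The feature was only announced at the time of this conversation, so the parameter names below reflect how it later surfaced in the chat completions API and should be read as illustrative rather than definitive.

```python
# Sketch of output-token logprobs: useful for confidence scoring or calibrated
# classification. Parameters shown here are how the feature eventually shipped.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,  # also return the 5 most likely alternatives per position
)

# Each generated token carries its log probability, plus the top alternatives.
for token_info in resp.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)
    for alt in token_info.top_logprobs:
        print("  alt:", alt.token, alt.logprob)
```

Note this only exposes probabilities over the model's own output tokens, which is the distinction Logan draws versus full input-and-output logprobs.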
Nathan Labenz (49:08) My read on that is that it really does reflect that all of these "we match GPT-4" things basically are not close. Because if it was happening that way, presumably it would be a different decision. But I take this as an update that, yeah, all that stuff is kind of bullshit, but you don't have to say that I'm saying that.
Logan Kilpatrick (49:28) I trust your opinion, but I'm not sure if
Nathan Labenz (49:31) Narrow domains. Yeah, a lot of caveats there, but we got more fish to fry. So, GPTs and Assistants. This is obviously the big star of the show. Coming into this, the earlier generation of the expansion of ChatGPT was plugins. Plugins, Sam Altman and others had said, hadn't really achieved product-market fit. I guess for starters, what is conceptually the evolution that you think takes plugins, which weren't quite working, to GPTs, which hopefully will really work?
Logan Kilpatrick (50:05) Yeah. I think part of the challenge with plugins was that, a, you had to be a developer to make them. B, there were a bunch of system slash product things that we just never had time to perfectly optimize. There were certain things that just weren't visible to the user that probably should have been, and other things like that. I think GPTs are like this perfect evolutionary story of where plugins are today. Really, you get all of the value of custom instructions and all these different modalities. And, you know, if I go back, as somebody who was running the ChatGPT plugin store, people were asking, how do I get code interpreter with my plugin? How do I get X, Y, and Z with my plugin? How do I really go deep on making this a truly custom experience, owning the experience end to end? You can now do that with GPTs, and you can still have the best of both worlds, where you take the API that you were using with ChatGPT plugins and make that accessible as an action inside of it. So I think it's a slight refactor of the overall framing of the problem, but intuitively, if you're somebody who had a plugin, who had users, you can now go and make a GPT, and you should be able to do similar things with a ton more features built in. Again, it was a real trade-off for customers to be like, oh, I want to use X, Y, and Z plugin, but now I'm sort of losing out on a bunch of the modalities that I had in my other chat sessions. And GPTs really sort of put a nice bow on solving that problem.
Nathan Labenz (51:39) So there are kind of 3 elements of this, and then it gets more nitty gritty on the API side, of course, but: instructions, expanded knowledge, and actions. Instructions are pretty straightforward; that's your classic prompt engineering. Expanded knowledge is where the platform expansion of bringing the vector database into the platform is obviously being powered, so people can upload their own documents, etcetera, etcetera. And then the actions, as I understand it, there are basically 3 kinds. One is using the retrieval to actually go get the knowledge out of the data store. There was a really interesting tweet posted from a breakout session that showed that all the tricks had been thrown at that, from your hypothetical document embeddings to your re-ranking to whatever, and at least per the metrics on that slide, it looks like it's going to perform really well. So I'm looking forward to getting into that more. Then there's code interpreter, which is where the system can write and execute code on OpenAI's runtime. And then there's the ability to call functions, either ones that the GPT kind of defines and makes available, or, on the API side, ones that the developer defines and makes available. So that's all just foundation laying; I think people largely know that.
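Here is a minimal sketch of those three pieces as they appeared in the Assistants API beta at the time: built-in retrieval over uploaded files, the code interpreter tool, and a developer-defined function. The file name and the get_order_status function are hypothetical placeholders.

```python
# Sketch: one assistant configured with retrieval, code interpreter, and a
# developer-defined function tool (Assistants API beta).
from openai import OpenAI

client = OpenAI()

# Upload a document to back the built-in retrieval tool (hypothetical file).
doc = client.files.create(file=open("handbook.pdf", "rb"), purpose="assistants")

assistant = client.beta.assistants.create(
    name="Docs helper",
    model="gpt-4-1106-preview",
    instructions="Answer questions using the attached handbook when relevant.",
    tools=[
        {"type": "retrieval"},          # built-in retrieval over uploaded files
        {"type": "code_interpreter"},   # write and run code in OpenAI's sandbox
        {   # function the model can ask the caller to run
            "type": "function",
            "function": {
                "name": "get_order_status",  # hypothetical function
                "description": "Look up an order's shipping status.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
        },
    ],
    file_ids=[doc.id],
)
```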
Nathan Labenz (53:04) How do you kind of see these things coming together? One thing: when I was initially playing with it, I was like, okay, here's the DALL-E 3 GPT and here's some other GPT, but can I have them together? With the demos too, it was like, okay, here's Zapier, and there are a couple of different things happening there. What if I want to use Zapier and something else and my Google Calendar and my Slack? Do I have to tie all those things through Zapier weirdly? Or can I have all these different things together? Because that was one thing that plugins did seem to handle at least a little bit more clearly, as of now.
Logan Kilpatrick (53:38) I hear that. I think that that's true. I think with plugins, you could sort of enable 2 or 3 things at the same time and have those available together. You can do this with GPTs today; you would just use multiple actions. I think with the current product experience, you have to sort of know what the URL for those APIs is if you were going to use third party plugins. So it's really more so designed today, with actions, around you using your own API and your plugin manifest or OpenAPI file. I think it's possible that that changes over time to allow more for integrating things like Gmail. At the end of the day, people really want that. They want to be able to say, I have this super powerful single GPT that does these 5 different things for me, and I go in and I check the different modalities that I want, or I check the different features and services that I want. We're just not there today. I think eventually we'll get somewhere that has that, because that's what consumers are going to want. Whether it's actions or some different iteration of that, I don't know for sure, but it's definitely something that makes a ton of sense to have. You don't want every single developer, every person making a GPT, to have to go reprogram their integration with the Gmail API, for example. You just want something that sort of works across all of them, and you can just one-click check that it works and then you're good to go.
Nathan Labenz (55:02) Yeah. So for now, if I'm an enterprise and I've got G Suite over here and I've got Slack over here, basically, I can go and just set all those different things up as actions, kind of mix and match on my own. But I don't have, like, a higher level, click this, click that, and just kind of naturally import. But you can do it if you have the kind of wherewithal to go list out all the actions that you need.
Logan Kilpatrick (55:31) You can do it, and you could also technically use some of the third party plugins to do this as well. I think the real challenge, and this goes back to why hopefully GPTs are a better evolution of this, is that when you don't control the plugin yourself, when it's not you writing the code and stuff like that, you just have less control over what's happening with the data you're sending back and forth. And I think there's a strong incentive there: Gmail hasn't actually made a plugin; any email plugin is from a third party provider. So you really have to be comfortable with, I'm making my data accessible to some third party, and chances are you don't really know who that third party is. There should be, I think, a bias for companies making these things to think about that; you should probably make the Gmail integration yourself. It makes a lot of sense.
Nathan Labenz (56:24) I'm very, very careful about who I allow into my Gmail and G Suite more broadly, certainly. How are you handling auth on this stuff? That was kind of skipped over in the demo, but with the Zapier one, for example, how do I connect this to my Zapier? And then on that end, I assume, if Zapier is integrating to Slack and Gmail or whatever, that's kind of within the Zapier platform. Is that right?
Logan Kilpatrick (56:49) Yeah. For the Zapier case, that's true. I think Zapier has done some interesting things, I don't wanna say getting around this, but to enable it in a more seamless way. You do: each individual endpoint that's part of an action, you can specify the auth for. And I actually think, again, going back to the differences between plugins and actions, this is one of the many very positive things about actions: you now have access to hybrid auth. So you can have certain endpoints that don't require auth, and you can have certain endpoints that require auth. You can also set certain actions as consequential or inconsequential, and that controls whether or not the user is prompted to explicitly allow a certain action to happen, so that the model's not doing things on your behalf without your consent. So I think both of those things are core features that plugin developers have been asking for: how can I make sure that users are knowingly sending data somewhere? And how can I make sure that I don't gatekeep everybody at the very first step and force them to sign in, and instead allow them to sort of get their feet wet with my product? And then when they want to do something like save it or what have you, it forces them into auth. So I think both of those things are now possible with actions.
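To illustrate the hybrid-auth and consequential/inconsequential distinction, here is a hedged sketch of an action's OpenAPI spec, expressed as a Python dict. The x-openai-isConsequential extension key is the field OpenAI documented for GPT actions around this time; the endpoints, paths, and schemas are hypothetical.

```python
# Sketch of a GPT action spec: a read-only endpoint marked inconsequential and
# a write endpoint marked consequential, which triggers an explicit user prompt.
action_spec = {
    "openapi": "3.1.0",
    "info": {"title": "Example email action", "version": "1.0.0"},
    "servers": [{"url": "https://api.example.com"}],  # hypothetical server
    "paths": {
        "/drafts/search": {
            "get": {
                "operationId": "searchDrafts",
                # Read-only: no user confirmation prompt required.
                "x-openai-isConsequential": False,
                "responses": {"200": {"description": "Matching drafts"}},
            }
        },
        "/drafts/send": {
            "post": {
                "operationId": "sendDraft",
                # Consequential: the user is asked to explicitly allow this call.
                "x-openai-isConsequential": True,
                "responses": {"200": {"description": "Send confirmation"}},
            }
        },
    },
}
```

Auth (none, API key, or OAuth) is configured per action in the GPT builder, which is what allows some endpoints to work before a user has signed in.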
Nathan Labenz (58:07) That saving mechanism, is there anything like that for the retrieval that's built in? Like, could I update a document or sort of add to the files?
Logan Kilpatrick (58:20) Yeah. For GPTs, you can go in and edit them after you've created them. So you can actually change anything. The instructions can be iterated on, the files can be iterated on, you can go in and change your custom actions if you want to, and you can turn off certain capabilities. So it's all fully customizable even after you've created the first incarnation of it.
Nathan Labenz (58:40) The retrieval, again, looks pretty amazing, and the annotations are cool as well. There are some really nice developer conveniences that have been added too, like the files and several new tabs within the platform website. I'm looking forward to when files have a little preview of the first however many lines of the file, as well as maybe a breadcrumb. Not that that one's unobvious, but a question on the billing: it's 20¢ per gigabyte per assistant per day of content backing, right? Is 1 gigabyte the minimum, or is some smaller gradation of storage billable?
Logan Kilpatrick (59:21) It's a good question. I don't know off the top of my head. That's also specifically for file retrieval, so it's specifically for uploaded files that get embedded. I would imagine there's also a breakdown where, if you only upload half a gigabyte, you're being charged half that price. I'm guessing it's not just that you get charged 20¢ regardless of how much data you put in there; I imagine it's broken down. I could be wrong about that, so I'll test and follow up afterwards. But I haven't actually tried it to know off the top of my head what the billing looks like.
Nathan Labenz (59:57) I guess the vision over time would be maybe more control over retrieval as well. Right now, it seems like it's very black box. As far as I could tell, I don't have any way to say what I want my chunk size to be or how many records to pull. That does have some implications if I'm pulling a bunch of records and then using that in context, because those are tokens, right? So am I right that it's pretty black boxy right now, and I guess more to come?
Logan Kilpatrick (1:00:27) Yeah. You should have some level of control, because you can tell the model in the user message or in the system message how you want it to perform in those cases. And at the end of the day, it's still GPT-4, or GPT-4 Turbo in this case, so it will follow those instructions. If you say, always only answer questions based on the first 3 images that I upload, it should be able to follow those instructions and ignore some of the other stuff. So you do have that level of control. From a chunking perspective, maybe that's something that we'll end up releasing. What tends to happen is, we end up making the thing that is going to be most usable for most people. You actually see this with fine tuning, which is a good example, where we automatically set the hyperparameters for you, and in certain cases you can override those and change them yourself. But we're doing it based on the best practices of us fine tuning many, many, many models, and when people modify those parameters, more often than not it ends up being a net negative for them instead of a net positive. So I would imagine it's possible that the patching or chunking for the images ends up being something that we give people more control over, but it's not possible today.
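The fine tuning analogy Logan draws maps to the API roughly like this: omit hyperparameters and the service chooses them automatically, or pass them explicitly to override. A minimal sketch with the OpenAI Python SDK; file names and hyperparameter values are placeholders.

```python
# Sketch: auto-selected vs. explicitly overridden fine-tuning hyperparameters.
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("train.jsonl", "rb"), purpose="fine-tune"  # placeholder file
)

# Default: the service auto-selects epochs, batch size, and LR multiplier.
auto_job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo-1106",
    training_file=training_file.id,
)

# Explicit override, which Logan notes is often a net negative versus the defaults.
manual_job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo-1106",
    training_file=training_file.id,
    hyperparameters={"n_epochs": 3, "batch_size": 4, "learning_rate_multiplier": 0.1},
)
```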
Nathan Labenz (1:01:40) The fine tuning UI within the platform site is also definitely a really nice upgrade. I love being able to see the loss curve, even just to get a little bit of a sense for how long it keeps dropping, where it starts to flatten off, why one run is just generally lower than another. Definitely a lot for me to dig into more there and better understand what's happening with fine tuning. And we can use fine tuned 3.5 Turbo 16k with function calling for Assistants. Is that right, or is that not yet?
Logan Kilpatrick (1:02:13) You should be able to do it for Assistants. I'll double check that it's compatible today, but you fundamentally should be able to.
Nathan Labenz (1:02:20) Clearly, this is where it's going. Cool. So a couple questions on best practices or conceptual stuff for the Assistants. I'm doing a bunch of prototyping with this company Athena in the executive assistant space. We've got all these clients, and they're all very different. And so I'm kind of wondering here, is this a situation where I create one Athena assistant and then provide runtime context differently, with kind of our client profile file and maybe some recent history or whatever, but it's all one assistant with a thread per client? Or should I be thinking about it more as one assistant per client? And how would I know the difference?
Logan Kilpatrick (1:02:58) Yeah, that's a really good question. I think it fundamentally depends on whether or not each assistant would have different instructions and different capabilities. You can make images or files accessible at a specific thread level, so if that's the only thing that you wanna change between them, then that would make sense. I do think you probably would want to tailor the assistant on an individual basis, or at least that's the thing that I would consider trying. Otherwise you just have to be more general, and it'd be more useful to take all the context that you have about somebody, whether it's the actual person who's the assistant or the person who's the client, and put that into the system instructions for the assistant specifically. So you probably just lose a little bit of customization if you try to make a general assistant. But I do think that would be a worthwhile starting place, and then having an option to go in and customize deeper potentially as a next step.
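A minimal sketch of the "one assistant, one thread per client" pattern under discussion, using the Assistants API beta. The assistant name, instructions, and client profile are invented for illustration; per-client context lives in the thread (and optionally in run-level instructions) rather than in the assistant itself.

```python
# Sketch: one shared assistant, with per-client context injected per thread.
from openai import OpenAI

client = OpenAI()

# One shared assistant with general instructions (name/instructions hypothetical).
assistant = client.beta.assistants.create(
    name="Athena EA",
    model="gpt-4-1106-preview",
    instructions="You are an executive assistant. Be concise and proactive.",
)

def new_client_thread(profile_summary: str):
    # Per-client context goes into the thread rather than the assistant itself.
    return client.beta.threads.create(
        messages=[{
            "role": "user",
            "content": f"Client profile and preferences:\n{profile_summary}",
        }]
    )

thread = new_client_thread("Prefers morning meetings; reports to the CFO.")

# Run-level instructions can further tailor behavior for this client/session.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
    instructions="Today, prioritize flagging calendar conflicts.",
)
```

The alternative Logan leans toward, one assistant per client, would instead bake the client profile into each assistant's own instructions at creation time.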
Nathan Labenz (1:04:02) Okay. Cool. Makes sense. Definitely a lot of discovery ahead on that front. Safety. You mentioned safety; there wasn't too much talk about safety yesterday at the keynote. Could you give us any information on the red teaming that went on behind this? OpenAI has obviously been very engaged in policy and has made White House commitments. So per those White House commitments, if nothing else, clearly there was some red teaming. I don't know if there was, like, a report submitted to the government or if there will be any sort of system card coming, especially with this kind of assistant paradigm. But it does seem like probably a lot went into that, and I'd love to hear anything about it that you could share.
Logan Kilpatrick (1:04:43) Yeah, it's a good question. If you look broadly at what Assistants is, it's not fundamentally a different capability than has existed previously. It's really chat completions strung together with a database, and then a bunch of other modalities that we already had accessible other places now accessible in Assistants specifically. So we've red teamed and done a lot of the safety work for each of those individual modalities. I think the thing that I can publicly point to from a safety perspective is around vision. We did release a system card for GPT-4 Vision that folks can go out and read. It's available on the OpenAI website, and there's a full 18 page breakdown that goes through all the questions and challenges around vision. I think it's actually really interesting if you want to get a sense for the things that we've thought about, the high level problems of trying to make a general reasoning system. And it's incredible to see the work. I think there's also a page that goes through all of the people that we've worked with to make vision come to light. And if you think about what the limitation was between getting Greg's demo of vision working back in March and actually making it accessible in first party and third party products and in our API, all of that safety work really is one of the huge things that we've done. And again, it follows the same trend of what happened with GPT-4, where GPT-4 was ready long before the model was actually released to the public. There's just so much safety work and thought that goes into how to build systems in a scalable and safe way. I'm happy, and, yeah, huge kudos to the teams that have done the work and all the red teaming that's happened.
Nathan Labenz (1:06:28) Cool. Yeah. I'm going to go in and see what I can do on the jailbreak front. A while ago, there was a bug bounty program, or kind of a teaser of that. Is that actually happening? Do I get paid for jailbreaks now?
Logan Kilpatrick (1:06:40) Jailbreaks is a good question. I don't know if jailbreaks fall into the considerations of the bug bounty program. I believe the bug bounty program is constrained to things like finding an exploit to access one of OpenAI's systems, or something like that. That program is definitely live and accessible, and we definitely find bugs and people definitely get bounties, which is awesome. I don't know if jailbreaks are a part of that or not. We use the platform that everybody uses for that, Crowd something; the name escapes me at the moment, but it's the same platform that every company uses for this. So it has all the eligibility details, and it, I believe, will say whether or not jailbreaks are one of the things. My understanding is that they're not, because it's more of a system level security vulnerability program rather than a model behavior thing.
Nathan Labenz (1:07:33) At the next roundtable about this, put one in for me on moving jailbreaks into the eligibility category for that. I honestly think it would serve everybody pretty well, because Lord knows there's just insane surface area here, and you need a lot of people to go out and explore weird spaces. So a little incentive, I think, is pretty helpful.
Logan Kilpatrick (1:07:53) I think that's fair.
Nathan Labenz (1:07:54) Yeah. So the last thing is, at the end of the keynote, Sam said this is all going to look quaint compared to what OpenAI is cooking up for us now. I mean, that's pretty insane. Obviously, I know you aren't going to tell us what we're going to have a year from now. I guess my real question is, to what degree do you guys even know that? Do you have a roadmap that is kind of well established in your own minds for the next year? Or is it going to be the kind of thing where you've raised the platform level, and obviously now another thousand things, if not a million things, are going to kind of blossom on top of that platform, and then you take all that into account and figure out what the next platform raise is? I guess it's probably both, but how do you think about that balance?
Logan Kilpatrick (1:08:39) Yeah. I think the research roadmap is both more laid in stone but also less predictable. And I think that's going to drive a lot of what ends up being released: what research are we able to do, and what are the new sort of breakthroughs that we're able to achieve that then enable us to bring products to market? So I think that's the biggest open question. The research team knows where they're making bets, because they actually have to allocate compute to these things. They know off the top of their heads, where are the 10 places we're making bets, and what can those things potentially yield? I think the real question is, do those things actually work and result in an amazing new product, modality, whatever it is, and then making that accessible? So I have less of that insight, because they're actively doing the research right now and we don't yet know. But every time after we do a big release like this, I'm like, there's nothing left to ship, there'll be nothing. And just talking to you today and seeing the announcements yesterday and listening to a lot of the conversations, there are so many things that are yet to come that are going to be super, super important. And it's a lot of obvious things too. It's not things that are crazy groundbreaking, but they're going to enable people to build the incredible, crazy groundbreaking products. I'm excited about that.
Nathan Labenz (1:10:02) Fast times at OpenAI always, and in the AI space more broadly as well. We'll be digesting this for weeks, if not months, to come. For now, I just wanna say again, thank you for all the time and all the clarifying answers. Logan Kilpatrick, thank you for being part of the Cognitive Revolution.
Logan Kilpatrick (1:10:22) Thanks for having me, Nathan.
Nathan Labenz (1:10:23) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.