Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools
Composio CTO Karan Vaidya explains a smart tool platform that lets AI agents access tens of thousands of tools across many apps, discussing discovery, security, feedback loops, and how robust skills reduce model lock-in for job-like agent use cases.
Watch Episode Here
Listen to Episode Here
Show Notes
Karan Vaidya, CTO of Composio, explains how their “smart tool” platform lets AI agents access over 50,000 tools across 1,000+ apps through a single interface. He details how Composio handles tool discovery, authentication, sandboxes, and logging, and how an AI-powered feedback loop continuously improves tools in real time. The conversation explores avoiding model lock-in through robust skills and instructions, translating capabilities across model providers, and why the best agent use cases look more like full jobs than isolated tasks.
Google: Try Google's latest and greatest model, Gemini 3.1 Pro, in AI Studio (https://aistudio.google.com/) or the Gemini app.
Sponsors:
Tasklet:
Build your own Cognitive Revolution monitoring agent in one click.
Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai
VCX:
VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com
Claude:
Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr
CHAPTERS:
(00:00) About the Episode
(03:38) Special Sponsor
(05:10) Composio overview and harness
(10:20) Users, trust, security
(19:45) Sandboxes and execution (Part 1)
(19:53) Sponsors: Tasklet | VCX
(22:46) Sandboxes and execution (Part 2)
(28:07) Smart MCPs and skills (Part 1)
(34:25) Sponsor: Claude
(36:38) Smart MCPs and skills (Part 2)
(44:10) Context, access, upgrades
(54:05) Skills and model lock-in
(01:03:51) Memory and agent tools
(01:09:21) AI and SaaS disruption
(01:20:20) Agents, costs, labor
(01:31:18) Monetization and interfaces
(01:36:13) Episode Outro
(01:39:56) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathanlabenz/
Youtube: https://youtube.com/@CognitiveRevolutionPodcast
Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk
Transcript
This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.
Introduction
Hello, and welcome back to the Cognitive Revolution!
Today my guest is Karan Vaidya, CTO of Composio, a platform that allows AI agents to access more than 50,000 tools, spanning more than 1,000 apps, all through a single interface, and which is one of the best examples of the "smart tool" pattern that I've been watching out for since the MCP paradigm was introduced.
This is a sponsored episode, but as always, I have played around with the tool for the last couple of weeks, and it's clear to me that Composio does address several real problems.
For starters, core platforms like Gmail, Google Drive, and Slack don't make it easy for do-it-yourselfers to grant access to AI agents. The number of clicks required to get started is a serious barrier for casual users.
Most other tools are simpler to connect, but many aren't popular or well-documented enough for AIs to know how best to use them from the start, and sometimes quite a bit of iteration is required to get things working well.
And looking ahead, as people delegate larger and larger projects to their agents, those agents often need tools that the human never anticipated.
All of this is indeed made much easier by simply giving your agent access to Composio, which allows agents to express high-level intent, identifies the right tools for the job, and provides authentication, execution sandboxes, and logging infrastructure that few developers really want to build on their own.
In this conversation, we get into the details of how Composio works, and how they are delivering on the "smart tool" promise by using an AI-powered continuous improvement process that can detect when a tool isn't working for an agent, generate a new version in real time, and swap the upgrade into the agent's context – and which, over time, automatically identifies and diffuses successful patterns across the entire Composio customer base.
One of the most interesting arguments Karan makes is that excellence in tooling and skills can help developers avoid model lock-in. The idea is that while models have different default behaviors, they are all very good at following instructions these days, such that if you have very thorough instructions, you can probably get similar performance from any frontier model. And for cases when that doesn't work, Karan and team are also working on meta-skills to translate skills from one model provider to another, reducing switching costs even further.
Beyond that, we also hear about Karan's favorite agent use cases – which, notably, look more like full jobs than discrete tasks; his perspective on which technology companies are gaining strength from the AI wave, which are most threatened, and how sticky agent products like Intercom's Fin will prove to be over time; his thoughts on memory platforms, payment frameworks, and other tools built specifically for AI agents; and how Composio works today, which includes individual engineers managing tens of AI agents, and, for the team that manages Composio's own agentic pipeline, a token bill that exceeds human payroll.
For me, this conversation couldn't be more timely. I've put in the work to curate the context that Claude Code needs to serve as a second-brain and capable assistant, and it's become my go-to interface for just about everything I do on a computer. The next level up will be to get agents doing large-scale projects autonomously on my behalf, and as I enter this next phase, Composio will definitely be part of my stack.
I'll report back on how I'm doing, but for now, I hope you enjoy this conversation about building smart tools for AI agents, with Karan Vaidya, CTO of Composio.
Main Episode
Nathan Labenz: Karan Vaidya, CTO at Composio. Welcome to the Cognitive Revolution.
Karan Vaidya: Hey, Nathan. Thanks for having me.
Nathan Labenz: I'm excited for this conversation. For folks who don't know what Composio is, I've been playing around with it a little bit over the last couple of weeks, and I've come to think of it essentially as a Swiss Army knife for AI agents. Obviously, we're all developing agents for a wide range of use cases, some of which are ad hoc, minute-by-minute assistance. Others are built in much more intentional and structured ways into products. But all these agents need tools. And some of the tools we have the time and the luxury of building out in a really intentional, bespoke way. And then there's just a ton of other things that a lot of people have common needs for. And this is where I see Composio coming in, with a sort of ready-to-go thousand tools that you can plug into your AI agent to give it a much broader reach than it would have if you were building out every tool one by one. That's my takeaway from getting under the hood a little bit. How do you like that description, and what would you add to it for starters?
Karan Vaidya: So you're on point in understanding the problem that we solve: we provide a thousand-plus apps, 50,000-plus tools to your agents, to anybody building agents. But that's not the final solution. Because at this point in the LLM journey, if you provide a thousand tools to the agent, it will probably use the wrong blade and suicide via context overload. So that's where we are essentially building the whole agentic tool harness. What I call Composio is the agentic tool execution layer. So the whole harness for tool execution that, while building agents, you would need to develop, or that you would want to give to your Claude Code or Codex, that's what we provide. That includes a few meta-tools so that you don't have to put thousands of tools into your agent. It includes managed authentication and authorization, giving the right scopes to the LLM. Like the problem I just mentioned, you can't give a thousand tools to the agent, so we do just-in-time tool discovery. That's one of the tools, with dynamic tool calling around it, so that only the right set of tools that the agent needs for a given use case gets loaded into the context. Another problem people face while building agents is that in a bunch of cases, direct function calling is not the best way to solve the use case. If I want my agent to process 10,000 emails, it will probably context-overload in, let's say, a hundred of them. So that's where we provide sandboxes where the agent can do programmatic tool calling on top of our apps, where it can process 10,000 or even a million emails by writing code. And on the back of it, I think the strongest thing that we do is continual learning.
So all our integrations are built by our internal agent pipeline, which goes through the agent first getting the developer app and all the credentials required, then creating the actions, then finding dependencies and testing them in real-world scenarios, including a bunch of edge cases. That's the whole process it goes through. And what it gives us is that at runtime, when the agent is using us, we figure out that a particular tool, let's say, is not usable by the agent for whatever reason. There's an error, or there's some failure, or it's not able to understand the tool. In real time, that agentic pipeline is invoked, a new version of that tool gets created, and the newer, improved tool is added into the LLM's context, the agent's context. We also have continual learning where, when we see that the agent is taking a zigzag trace to reach an outcome, we convert that zigzag trace into a set of skills, because we have the whole end-to-end agent trace of what it is executing, what the use case was, et cetera. So the next time the agent does something similar, it will take a straight path, making it more reliable, robust, and token- and time-efficient as well. We also learn from failures: do's and don'ts for using particular tools, use cases, pitfalls, et cetera. So that's the whole harness that we provide. We also have a notification system, which we call triggers, so the agent can be notified when, let's say, an email is received, or a Slack message is received, or a PR gets created. So the whole system or harness around the agent communicating with apps, the knowledge work apps, that's what Composio provides.
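The just-in-time tool discovery Karan describes can be sketched roughly as follows. Everything here, the registry contents, the `search_tools` meta-tool, and the keyword scoring, is an illustrative stand-in, not Composio's actual API; the real system would do semantic matching over the catalog.

```python
# Hypothetical sketch of "just-in-time tool discovery": instead of loading
# all 50,000 tool schemas into the model's context, the agent sees a single
# meta-tool that returns only the handful of tools relevant to its intent.

TOOL_REGISTRY = {
    "GMAIL_FETCH_EMAILS": "Fetch emails from a Gmail inbox with filters.",
    "GMAIL_SEND_EMAIL": "Send an email from a connected Gmail account.",
    "SLACK_POST_MESSAGE": "Post a message to a Slack channel.",
    "LINEAR_CREATE_ISSUE": "Create an issue in a Linear project.",
    # ...imagine ~50,000 entries across ~1,000 apps
}

def search_tools(intent: str, limit: int = 3) -> list[dict]:
    """Return only the tool schemas matching the agent's stated intent."""
    words = set(intent.lower().split())
    scored = [
        (sum(w in desc.lower() or w in name.lower() for w in words), name, desc)
        for name, desc in TOOL_REGISTRY.items()
    ]
    scored.sort(reverse=True)
    return [
        {"name": name, "description": desc}
        for score, name, desc in scored[:limit]
        if score > 0
    ]

# The agent expresses high-level intent; only matching schemas enter context.
tools = search_tools("send an email summarizing today's Slack messages")
print([t["name"] for t in tools])
```

The point is that the agent's context only ever holds the meta-tool plus the few matching schemas, never the full catalog.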
Nathan Labenz: Okay, cool. I want to go one by one through, I think, all of those topics and go a level deeper on each.
Karan Vaidya: For sure.
Nathan Labenz: Before we do that, though, I would love to understand a little bit more who your users are. And I'm sure this is changing, because obviously we have phenomena like OpenClaw popping up and whole new populations of users coming online, and I don't think that process has reached its endpoint by any means just yet. But as I was using the tool, I was kind of like, okay, I see two ways, or two sort of broad scenarios, that I might use this in. One is, it's become almost reflexive at this point for me to go to Claude Code first whenever I want to do almost anything on my computer, even if that's something as generic as searching for an email. I'll go to Claude Code and ask it to search for the email rather than go into Gmail and search directly. So I guess I would call that the sort of hobbyist market, or the individual user who has their individual assistant agent. And those folks I could see really benefiting from something that allows them to expand their toolkits really quickly. Just this morning, actually, I was onboarding a teammate who hasn't used Claude Code so far, and one of the questions she had for me was, how did you give it access to our Google Drive? And I was like, well, actually, that was kind of a pain in the ****. Claude actually talked me through the steps, but the steps were pretty gnarly. I had to go into the console and create an app, and then click over to here, add this permission, whatever, I don't even remember all the steps, so that wasn't super easy. And I can see a lot of people just stumbling over that. Slack is another one, just an absolute nightmare of permission adding and all that kind of stuff. So there's real appeal, just for ease of use, in, hey, if I can get 1,000 of those without having to go through all that. I see that persona; I am that persona. And then I also see you have an SDK, which seems to be really geared more toward production apps.
And for those folks, I'm like, hmm, that's interesting, because how many people want to dynamically bring tools into their app? It seems like it starts to make the app itself potentially kind of unwieldy. On the other hand, I do see a lot of value in managing auth, for example, for a thousand apps. That doesn't sound like a lot of fun. So if you can make that a simple process for developers, that sounds quite interesting. But I guess I see these profiles going in somewhat different directions, or at least getting the bulk of the value from different parts of what you've built. So I'm interested in how you segment the market, and what you see the primary value drivers being for those different profiles.
Karan Vaidya: Yeah, that's a great question. So as you rightly pointed out, we have a bifocal product. One is for the prosumer market, which is people using Claude Code, OpenClaw, et cetera, and plugging in Composio Connect. That's what we call that product: a single MCP server inside all these agentic runtimes, Codex, whatever they are using. And for them, the value prop is exactly what you pointed to. You don't need to go to this MCP server, plug it in, understand the instructions of Google Drive, then Zoom, then Datadog, et cetera. You just get one MCP server, which is connect.composio.dev/mcp. It's as simple as that. You put it inside your Claude Code, and then you can manage your authentication directly via Claude Code. If you ask it, "I want to connect with a new app," it will give you the link. Or, if you like the GUI experience, you can just go to Composio's dashboard and do it all there, with managed permissions, managed scopes, et cetera. So there, the value prop is simplicity, and getting the power of almost anything at your fingertips. On the other hand, on the developer side, everybody's building agents at this point, from startups to bigger enterprises. And one of the biggest problem statements people face while building agents is that, to give them actual real power to be able to do things, they want to connect their agents with actual knowledge work apps. And that's where we come in and solve it. The value that we provide there, other than auth and all the integrations, is managing the scopes: you can give whatever granularity of scopes you want via us, controlling it at the action level, and providing the whole harness, because at scale, everybody who's building agents wants to create this same harness that we have seen works really well.
The pattern works really well for a bunch of agentic paradigms, specifically general agents, which everybody's building right now. So the whole harness and its bits and pieces are available in full modularity. If people want to use just our tool discovery, with Composio's tools as well as their own tools, they can use that. If they just want to use Workbench, which is our sandbox where execution can happen, where auth and everything is controlled by us, they can just use that, essentially just CodeAct. So on the developer side of things, we have, on one hand, the whole harness, where you can just plug in that single thing via MCP, via API, via SDK into your agentic system. We also have bits and pieces of modularity where, okay, if you just want to use this piece, you can use this piece. And if you just want to use our tools, you can also do that, where you manage the whole harness and we just provide you the auth and actions. The idea there is people want things like governance, observability, and auditability, and that all sits inside Composio's dashboard. And also, we obviously have some amazing enterprise customers, which gives you trust, because this is very critical data that you wouldn't want to give to just any company. But at this point we have AWS using us and building their core agentic product on top of us. Zoom doing the same, Glean doing the same, Airtable. So a bunch of tech-first hyperscalers are trusting us. That gives that trust level, because they have already evaluated us for all the things that you would want to evaluate us on.
Nathan Labenz: What are the big things that they want to evaluate you on that maybe I should be thinking harder about? Because I'm a pretty prolific tester of a lot of products, and I'm increasingly mindful of this. Certainly I don't give full access to my Gmail or whatever to just anything that I happen to sign into, but I do connect a lot of accounts to a lot of things over time. What are the most vulnerable attack surfaces, or the biggest risk vectors, that the big companies have beat you up on already to make sure you're solid, that the rest of us can take to the bank?
Karan Vaidya: Yeah, I mean, first of all, on providing least-privilege access control, I think we have done a pretty good job. You can define what actions you want to give the agent, and the agent will have access to only those actions. So to start with, if you don't want to give write-email or send-email access to the agent, you can just give read. Same thing with Slack, same thing with all the work-related apps. Having that granular access control is pretty important. That's one. Then, second level of control: we have a bunch of ways in which you can control what the agent can take action on, via hooks. So before the tool is called, you can check what the tool execution is doing and create guardrails around it, like human-in-the-loop, and we have patterns for all of these pre-built. After calling the tool, before the agent gets the response, you can have those hooks see what the agent is doing, what the agent is going to get, what kind of data the agent is going to have. So if you want some guardrails around that, you can do that. All those types of guardrails are already present in the product. That's second. Then third, obviously, compliance is a big thing. So we have all the compliances that people want, SOC 2, et cetera, which makes them somewhat more comfortable. And the fourth one, which is pretty valuable for enterprises, is we also do self-hosting. In some cases you wouldn't want to use our cloud, so we also self-host in the customer's VPC, which gives them much more breathing room. In the case of AWS, for example, we have self-hosted Composio inside AWS.
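The pre- and post-tool-call hooks Karan mentions can be sketched like this. The hook names, the approval flow, and the redacted field are all illustrative assumptions, not Composio's real API; they just show the shape of guardrails that run before a tool executes and before its result reaches the agent.

```python
# Hedged sketch of tool-call guardrails: a pre-hook can require human
# approval for write actions, and a post-hook can filter what data the
# agent is allowed to see. All names here are hypothetical.

WRITE_ACTIONS = {"GMAIL_SEND_EMAIL", "SLACK_POST_MESSAGE"}

def before_execute(tool: str, args: dict, approve) -> dict:
    """Pre-hook: block write actions unless a human approves the call."""
    if tool in WRITE_ACTIONS and not approve(tool, args):
        raise PermissionError(f"{tool} blocked by human-in-the-loop guardrail")
    return args

def after_execute(tool: str, result: dict) -> dict:
    """Post-hook: redact fields the agent should never see."""
    return {k: v for k, v in result.items() if k != "auth_token"}

# A read-only call passes straight through (the approver is never asked).
args = before_execute("GMAIL_FETCH_EMAILS", {"query": "is:unread"},
                      approve=lambda t, a: False)

# A write call is stopped because the (stubbed) human denies it.
try:
    before_execute("GMAIL_SEND_EMAIL", {"to": "x@example.com"},
                   approve=lambda t, a: False)
except PermissionError as e:
    print(e)
```

Swapping the `approve` stub for a real prompt to the user is what turns this into the human-in-the-loop pattern he describes.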
Nathan Labenz: Gotcha. Okay, cool. That's a good rundown. Let's talk about these sandboxes a little bit. The paradigm there is a little bit confusing to me. I'm not exactly sure how to think about what execution should and does happen in different places. Obviously, if I run Claude Code on my local machine, which I do, it is mostly running things on my local machine, right? All the bash commands and stuff are literally happening on my system. There are also some tools, like their search tool that's built in, that happen on their side, in their runtime, on their infrastructure. And those can communicate back and forth, in the sense that the result of that search can get sent down to my system, and the result of commands run on my system can get sent up to the cloud to be part of inference. But I imagine that gets kind of fuzzy or weird when people have a mix of things. How do they decide, or how do you guide people on deciding, what should run in your infrastructure and what they should run on their own infrastructure? I assume it wouldn't make sense for them to try to bring their whole app into your sandboxes, right? So yeah, how should I unclutter my own mind to think about, in general, who should be running what?
Karan Vaidya: So essentially, in our sandbox we provide a ton of utilities, which makes it easier for the LLM to write code on top of it. That same Docker image, of sorts, we are making available locally also, very soon. So if you want to use your own tools, internal tools or local tools, and want them to be available inside the Composio sandbox, you can also do that. That's coming very, very soon. But the idea is that in the sandbox, the agent has to write very minimal code, and the auth side of things, and a bunch of side things, mostly auth and some abstractions around tool calling, are taken care of already in the primitives that the agent gets. So that's the benefit: the agent doesn't have to deal with a bunch of things it shouldn't, and it just writes very simplified code, and everything around it, like deciding what auth to use, converting things from code to function calling, et cetera, whatever it needs, is done by the tooling that we are providing.
Nathan Labenz: Gotcha. So the biggest value driver you're highlighting there is actually making tool calling, or not exactly tool calling, but making code writing easier for the language model, by providing those primitives. And you said mostly auth. What else is in that? Because you fall generally into the category of harness, right?
Karan Vaidya: Yes.
Nathan Labenz: So what else besides auth do you see? And I have experienced that certainly where, as we kind of mentioned, it can be hard to get these things set up. So it's intuitive, it looks pretty intuitive to me to say, yeah, if you could provide all solid auth code ready to go so that the LLM doesn't have to recreate that all the time. That sounds like a clear win. What else is like that where there is enough deterministic stuff that you've built out that it makes things a lot simpler for the agent?
Karan Vaidya: So, for example, file sharing is the other thing. Basically, we have mounted folders, and the LLM knows about them because they're part of the harness, in the descriptions of the functions and the system prompts, et cetera. Everything put in those mounted folders is by default uploaded to S3, and we have shareable links available for them. So whenever the agent wants to share anything outside, it's very simple: it just moves or copies files to that particular folder. Things like that, and more will be coming; we come up with newer use cases every now and then, right? So file sharing is definitely one of the biggest ones: the LLM, going through all 10,000 emails, has generated a report, or going through all the Stripe activity, it has generated a report, and it wants to share it with you, the user. We make it very easy for it to do that. Also, the LLM can write code which uses an LLM. We've made it very simple to do that. It's kind of like inception, an LLM writing code which uses an LLM, but it's needed for processing 10,000 emails. Otherwise, how would you do that? So, utilities around that. All of these are minor things individually, but overall they increase the efficacy and accuracy of agents to a big extent.
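The programmatic-tool-calling and file-sharing ideas from the last few exchanges can be sketched together. `fetch_emails` and `share_file` are stubs standing in for the authenticated primitives the sandbox would expose; the names, the batch size, and the link format are all assumptions for illustration.

```python
# Sketch of "programmatic tool calling": rather than issuing 10,000
# individual function calls (and overflowing the context window), the
# agent writes a short script inside the sandbox that loops over the data
# and keeps only a compact summary plus a shareable link.

def fetch_emails(batch: int, size: int = 100) -> list[dict]:
    """Stub for an authenticated, paginated email-fetch primitive."""
    start = batch * size
    return [{"id": i, "subject": f"Invoice #{i}" if i % 7 == 0 else "FYI"}
            for i in range(start, start + size)]

def share_file(path: str) -> str:
    """Stub for the mounted-folder upload: files copied there get a link."""
    return f"https://files.example.com/{path}"

# The agent's code: scan 10,000 emails, keep only a count, share a report.
invoice_count = 0
for batch in range(100):                      # 100 batches x 100 emails
    for email in fetch_emails(batch):
        if email["subject"].startswith("Invoice"):
            invoice_count += 1

report = f"Found {invoice_count} invoices in 10,000 emails"
link = share_file("reports/invoices.txt")
print(report, link)
```

Only the one-line report and the link ever reach the model's context; the 10,000 emails stay inside the sandbox, which is the whole trick.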
Nathan Labenz: Okay, cool. On the topic of discovery, or what I think you also call smart MCPs. I had been on the lookout for a while, back when this sort of MCP phenomenon first popped up. I was like, where are the smart MCPs? It seems like at first we just had this massive wave of people wrapping APIs in the MCP layer. And okay, that's fine, I was kind of like, okay, sure, but it seems like where this really gets helpful is if the MCP itself is smart in some way, so that it can take in not a very specific command that could just as well have been an API call, but something that is higher-order, that maybe involves the composition of multiple tool calls, or even potentially multiple different APIs working together to do some higher-level job. And honestly, I haven't seen a lot of that. I asked and asked, and looked around in repos to try to figure out if anybody was doing this, and there wasn't much. Now you guys are doing that, and it seems like it is a pretty big focus of your value. So I'd love to unpack how that is working, and especially maybe get into the sort of progressive disclosure of it, because this, I think, has also come to the fore in developer conversations recently. First it was, MCP is going to take over the world. And now we've heard a little bit of the trough of disillusionment: well, it's a lot of context bloat, and maybe CLI is better. I kind of always end up thinking that, like, one of my AI mantras is, everything is isomorphic to everything else. Meaning, whether it's an MCP or a CLI or whatever, you can probably do progressive disclosure, and you can avoid crazy bloat if you think about things the right way. It doesn't seem like those decisions are as sharp as they used to be, because there is just so much room to create flexibility in the context of intelligent systems.
But take me through the smart MCP paradigm that you're developing, what makes it smart, how you're building that to, again, make things as easy on the agent as possible, to do as much for the agent as possible, et cetera.
Karan Vaidya: Yeah, for sure. I think, as I mentioned earlier, if you give a thousand MCPs to the LLM, the problem is obvious. We are hitting 1-million-token context windows; maybe in time that will increase to five million or whatever. But attention is definitely not free. So the less you give, the better the performance will be. And that's where things like just-in-time tool discovery matter, so that your LLM is not overwhelmed by seeing a thousand tools, and you're giving it the right set of tools. Given we have 50,000-plus tools, and most of our users want access to all of them, because it's dynamic, right? Specifically in the case of developers building products, they want to give their users the power of whatever Composio has. So the idea is that the LLM doesn't see all the tools; it sees a few tools, and then, dynamically, new tools get added to the context. So that's one part of the smartness. The other, as I was mentioning, is around learning, what we call background learning of sorts. Whenever we see that a tool is not comprehensible by the agent, which means the LLM is not able to understand what the tool does, or it's trying a lot but it's always erroring out, then in the background, automatically, a new version of the tool gets created in real time, one which we feel is more valuable for this particular use case, or which fixes general issues in the tool, and gets added to the context. So that's another level of smartness: we can create multiple versions of a tool really quickly, and get a new version that might be better suited for the particular purpose the agent is pursuing at this point into the agent's context. Then another thing that happens in the background: skills have gotten really popular, right?
And the reason for that popularity is that a skill is an abstraction level above tools, where you can have instructions for particular use cases baked in. Skills inherently use tools, but there are instructions, there are scripts, which make them more robust and more repeatable compared to just providing tools to the agent. So what we do is, since we have the whole end-to-end agent trajectories, we convert them into a set of reusable skills, and provide them during just-in-time tool discovery. So it's also just-in-time skill discovery, of sorts. The agent already has how to do it: what tools to call to achieve this particular outcome, what code to write in the Workbench or sandbox to achieve this particular outcome. And then maybe the use case is a bit different, so it has to use that skill but do a bit of fine-tuning for that particular use case, and reuse that code or skill for its purpose. So that's the smartness, the background learning. And it also includes failures. If there are some failures that we see happen again and again, we just tell that to the agent beforehand, in context: okay, these are the pitfalls, these are the do's and don'ts of using this particular tool, of achieving this type of outcome. In the end, in my opinion, a harness is nothing but context. You have to engineer the context. And that's what we are doing: giving some effective context engineering around tools, which makes it smarter.
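The "zigzag trace to skill" idea can be sketched as distilling a meandering but ultimately successful trajectory down to its working steps. The data shapes, tool names, and the `distill_skill` helper are illustrative assumptions, not how Composio actually represents traces or skills.

```python
# Sketch of trajectory-to-skill distillation: a successful agent run that
# included dead ends is reduced to the minimal ordered tool calls plus a
# note, producing a reusable skill that just-in-time discovery can serve
# the next time a similar intent appears.

def distill_skill(trace: list[dict], intent: str) -> dict:
    """Keep only the calls that succeeded, in order, dropping dead ends."""
    steps = [
        {"tool": step["tool"], "args": step["args"]}
        for step in trace
        if step["ok"]
    ]
    return {"intent": intent, "steps": steps,
            "notes": "Derived from a prior successful trajectory."}

trace = [
    {"tool": "LINEAR_LIST_PROJECTS", "args": {}, "ok": True},
    {"tool": "LINEAR_GET_ISSUE", "args": {"id": "?"}, "ok": False},  # dead end
    {"tool": "LINEAR_SEARCH_ISSUES", "args": {"q": "bug"}, "ok": True},
    {"tool": "SLACK_POST_MESSAGE", "args": {"channel": "#eng"}, "ok": True},
]

skill = distill_skill(trace, "post open Linear bugs to Slack")
print([s["tool"] for s in skill["steps"]])
```

The next agent that declares a similar intent gets the three-step straight path up front, which is where the reliability and token savings come from.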
Nathan Labenz: So, double-clicking first on that first one, the discovery of tools: what do those requests typically look like? Maybe you can answer this with specific examples from specific customers, or however you want, but I'm imagining sometimes I might just say, I want to connect to my Google Drive. And then it's like, okay, great, we know what we're going to do, we're going to get the Google Drive tool. Pretty straightforward, and definitely still very nice to have that as a natural-language interface. Other times, though, maybe I don't even know what I want. Like, is there a tool for this? Or, how would I go about doing something like that? So I'm interested in the breakdown of the kinds of requests that you get. And I think there's another version of this question too, which is: as we actually get into doing the work, how much of the work is happening in ways where the user specified it? I want to go call Linear and get some details and then put them over here. You might think of that as sort of automated copying and pasting, right? Stuff exists and we're kind of moving it around. Versus really figuring out higher-order stuff: I don't really know how to do this, but this is broadly what I'm trying to accomplish, and then the system figuring out, what are the tools, what are the steps to make that happen? I guess that exists both at discovery and at the level of execution. But the key question there is, where are we in terms of people defining what they want their agent to do and the agent following those instructions, versus people describing intent and the system really figuring out how to serve that intent, even if the person doesn't have it all mapped out?
Karan Vaidya: Sure. I just want to clarify one thing there. We have a pretty intelligent mediator in between, so the direct user request is not what we get. We have Claude Code sitting in between, and that sends the request. It's a very intelligent mediator: it navigates the user request and sends the right level of intent to the tools. In most cases, Claude Code already knows the power of what Composio can do. So it figures out, okay, if the user asks how to connect to Google Drive, it will directly call Manage Connections, which is our auth management tool, to do that for the user. It won't send a search intent to the tool discovery tool. That's where the intelligent mediator, the LLM or agentic runtime, comes in. But to answer your question, I think it's becoming more and more intentful. We have all seen how people are using the OpenClaws of the world. I think December was a big shift, where people realized these models have gotten to a point where much more is possible and they can be trusted much more than before. And that has happened across all domains, in my opinion. Software engineering was obviously the first to bite the bullet, but it's happening more and more across knowledge work: people are becoming more intentful with these agents, and in a lot of cases they just give the agent the outcome they want to drive and let the agent figure it out. We are the harness which provides all the right tools for it to do that. But the agent is smart enough to throw the right intent at the right tool.
Nathan Labenz: Can you maybe give some examples of that? Instances of people expressing intent that then gets mapped onto those tools. Intuitively, it seems like many of them would come from individual Claude Code or OpenClaw users, but I'd be really interested, too, if there are examples where that is actually happening in an app that a developer has running in a production environment. I don't know if we're getting there yet, or even if we want to. With a lot of these things, what's the app at that point is kind of an interesting question. Yeah, give me some examples of intent that you have seen resolved into actual successful execution.
Karan Vaidya: At this point, people are so confident that they give their whole Gmail access to the agent and ask it to, say, go through the last month of my email and archive all the emails that don't seem useful. So the agent goes and writes code to do that, which uses an LLM and so on to figure out which emails are not useful. Another use case: I use my OpenClaw for hiring. My OpenClaw goes through a lot of GitHub repositories, finds good commits from individuals, and creates a pipeline, specifically for engineering hires, for me. It looks at good open source agent tech repos, Python repos, TypeScript repos, et cetera, figures out the best contributors, and then enriches the data: where they're based, like SF or outside, their emails, LinkedIn, all the socials data. I've also given it its own email, so it reaches out to them on my behalf. So it's end-to-end hiring, a recruiter job I've literally offloaded. It has emailed thousands of really good folks, and in the last week or two I've gotten something like 30 or 40 calls set up. That whole thing is fully done by my agent and Composio. That's the idea: a lot of exploratory to actual end-to-end knowledge work is getting offloaded to these agents, if that answers your question to a certain extent.
Nathan Labenz: Yeah, I mean, as it gets up to sort of job scale things, it starts to be both pretty intuitive, I guess, what that ultimately looks like, and also potentially quite transformative for many aspects of life.
Karan Vaidya: A salesperson is doing something similar for sales. Obviously the agent isn't directly emailing prospects in that case, but it has drafts ready, and the salesperson just needs to press the send button. That's the level where we are. It does the whole sales motion: figuring out the right people to reach out to, say people building agents at different companies, ready with their data, email, LinkedIn, et cetera. And then he just has to press the send button.
Nathan Labenz: So how do you make sure that those agents have the right context? Because this is something I'm also thinking about right now with, so one of my favorite things that I've set up, and I kind of can't stop talking about this, so I'll keep it brief because I've probably talked about it on a few episodes already. But I have exported to a local database basically the last five years of my communications. All e-mail exported out of Gmail, all Slack, basically every DM platform that I use. All of my calls, which I've been recording for like the last three years, transcribed. Every like turn in the call becomes sort of, everything's organized into threads. So a Gmail thread is a thread, but also a call, a single call would be a thread. And then each statement back and forth between the people is like a message kind of corresponds to the emails back and forth. Even the podcast, I kind of break down with a transcript and put it in there in the same way. So it's like a lot of the communication that I've had, probably a majority, is now in this database. And it is extremely helpful. for getting the context that is needed to understand, like, who is this person? What is my relationship with them? Do we have projects that are ongoing? Who all was it? If I start with a project, like, who was involved with those projects? How did that evolve over time? Whatever. I am a little reluctant to, like, throw that into some random cloud container, you know, that I'm just testing out, right? So that still only lives on my local machine. So this kind of brings me back a little bit to, like, the containers and the context and what should sit where. 'Cause I assume just like me as an individual, you as a company have a lot of information about what makes a good candidate, like who are the good candidates that we had before, you know, good examples, bad examples, you would compare them to. It's endless, right? 
How should people think about sending these agents off to do long running things, making sure they have the context they need, but not putting themselves too much at risk? Because my personal database is already a gigabyte, and it's my whole life in there, right? So I do need to be a little mindful of where I send that around, or what ports I open up to access it. And I don't even know what all is in there. I'm sure I've emailed myself credit card numbers and passwords, and probably recovery codes are in there too. So I'm sure I've done all sorts of stupid things which could come back to bite me. How do you think about that balance between making sure the context is there so they can succeed, and managing the associated risks?
Karan Vaidya: So yeah, that's where managed access and least-privilege access control come into the picture. The idea we have at Composio is that in the future you'll have not just a single agent but multitudes of agents, and you'll have different profiles and access control for each agent. Some agents will have only read-only access to your data, so they can't do anything malicious like sending or executing anything, but they need all the data. And you'll want them to be very self-contained, because they're sort of research agents: they have a lot of data, but they don't have the permissions to divulge that data by mistake. So you'll have very tight access control on what they can do, but they have read-only access to everything. That's a profile you can create inside Composio and manage: what access permissions you want. There can be another profile where you've given the agent a lot of write permissions, but that agent gets very limited personal or company-wide information, because what if it emails some secret token by mistake? There, the granular access control is: you want to give a lot of write control, but you have things like human-in-the-loop to guardrail what the agent is doing and keep it very controlled. These are the types of profiles that will exist in the future, and you'd create different OpenClaws, different Claude Codes, different Codexes to do a mix of it, or all of it.
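The two profiles Karan contrasts — a read-everything research agent with no write powers, and a write-capable outreach agent with narrow data access — could be sketched as below. All names and the shape of the profile object are hypothetical, not Composio's actual product surface:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentProfile:
    """A least-privilege profile an agent runs under (illustrative sketch)."""
    name: str
    read_scopes: frozenset   # data the agent may read
    write_scopes: frozenset  # side-effecting actions the agent may take
    human_in_loop: bool      # require human approval before side effects

# Research agent: reads everything, can't send or execute anything.
RESEARCH = AgentProfile("research",
                        read_scopes=frozenset({"email", "crm", "calendar"}),
                        write_scopes=frozenset(),
                        human_in_loop=False)

# Outreach agent: may send email, but sees only a narrow slice of data.
OUTREACH = AgentProfile("outreach",
                        read_scopes=frozenset({"contacts"}),
                        write_scopes=frozenset({"email.send"}),
                        human_in_loop=True)

def authorize(profile: AgentProfile, action: str, scope: str) -> bool:
    """Gate every tool call through the agent's profile before it executes."""
    if action == "read":
        return scope in profile.read_scopes
    if action == "write":
        return scope in profile.write_scopes
    return False  # unknown actions are denied by default
```

The point of the sketch is the asymmetry: the agent holding the most data has no way to exfiltrate it, and the agent that can act on the world holds as little data as possible.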
Nathan Labenz: Let's talk about the continual learning aspect. It sounds like a huge value driver, and probably a necessary one for Composio's success. I don't know if you would go that far, but what I see is that the barrier to spinning up a new tool is certainly dropping, yet the value of having seen a ton of uses of that tool, and being able to figure out what the actual effective pattern is, is something that's not going to be easy for people to recreate on their own. So if you can make a step change difference in the results people get, even if they spin their own tool up real quick, versus, okay, but we've seen 10,000 uses of this and know what has actually been effective, that strikes me as the moat. And we're all searching for the increasingly elusive moats in the AI space. You described in some detail already how it works. The philosophy part I'm wondering about is: how does it work across users? How does it work across apps? There's obviously a depersonalization or anonymization aspect that I'm sure is critical. But also, if you find an upgrade, does everybody automatically get that upgrade, or do you have to subscribe to upgrades? Or maybe some of the upgrades are very specific to an individual or a particular app, such that you can upgrade it for them without changing how it is for everybody else. I feel like I both want those upgrades as a user, but when I do have something working well, I'm a little bit afraid of those upgrades, just like I'm afraid when, say, the model makers leapfrog each other, which they do all the time. Similar thing there, right? If a new model comes out, it might be better, but is it going to be better on the thing that I already dialed in to my satisfaction?
Maybe, maybe not. So that all seems quite fraught. What's the philosophy that guides you in figuring out who gets what upgrades?
Karan Vaidya: That's a great question. We think about it a lot internally, by the way, and that's why we've designed our infra to make it very easy. As I was describing, we have multitudes of versions of tools; of the same tool, you might have tens of thousands of versions. So the idea is this. There are some personalized upgrades: we see that you are using a particular tool in a particular way and the tool can be better specifically for your use case, so we make the upgrade only for you. I don't fully buy the fear argument, because we all know the models are changing, the model behavior is changing every other day. The way you control the behavior of the model, even when the model changes or some of the tools change, is by having skills. And that's where we're very, very thorough about when we change the skills we have developed for users. Those don't change often. That's the fixed layer of repeatable behavior: you like something the way it is, so it's ingrained into your skills, the personalized skills we create for you in the background. But the tools themselves keep getting better and better. You were asking about the moat, right? We have a gazillion instances where we've found the docs are totally wrong for a bunch of tools. Because so many agents have gone through our tools, and agents use tools in insanely different ways compared to the previous generation where humans were using these APIs and tools, you hit a new edge case every now and then. And that just makes our tools better and better.
Because we have seen it all across. Just last week, we found a bunch of cases in Google Calendar where our tools are much better than what the docs propose. And it happened autonomously; in some cases we didn't even get to know. And that's not true for just one app, it's true across apps. So that's a moat we have developed, because we have had so many agents poking holes in different apps and making our tools better. And those upgrades are available across the board, right? Why would we want those upgrades to be limited to a particular user? If a tool is getting better and better, that should apply to everyone. So that's how I think about it. There are some improvements which should happen across the board because they are very generic; some skills should be available across the board because they capture how agents should use particular tools; and some are very personalized to you and your use case.
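That routing logic — generic fixes ship to everyone, personalized upgrades ship only to their owner — reduces to a small lookup. The function, the version labels, and the tool slug below are all hypothetical, a sketch of the policy rather than Composio's implementation:

```python
# Latest globally-shared version of each tool (hypothetical labels).
GLOBAL_VERSIONS = {"GOOGLECALENDAR_CREATE_EVENT": "v7"}

# Personalized upgrades keyed by (user, tool); only that user sees them.
PERSONAL_OVERRIDES = {("alice", "GOOGLECALENDAR_CREATE_EVENT"): "v7-alice"}

def resolve_tool_version(user: str, tool: str) -> str:
    """A personalized upgrade wins for its owner; everyone else
    automatically rides the latest generic improvement."""
    return PERSONAL_OVERRIDES.get((user, tool), GLOBAL_VERSIONS[tool])
```

Under this policy a generic fix is a single write to `GLOBAL_VERSIONS`, and no per-user migration is ever needed for users without an override.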
Nathan Labenz: You said something I thought was quite interesting, and I'm not sure everyone would agree, which was that skills basically tame the models. I forget exactly how you said it, but you said models are always changing, of course, but that's where the skills come in. Once you've really defined a skill, it sounds like you're of the opinion that you can swap out models underneath and get pretty consistent behavior even across models. Obviously there are caveats, in the sense that you can't massively downgrade to a 1B local model and expect frontier performance. But if we take a narrower understanding of that statement and restrict ourselves to frontier models, or whatever reference class you want to use, that's still pretty interesting. I think a lot of people would say they don't feel confident in that. How confident are you in that statement? I guess the implication would be that all the frontier models are good enough at following instructions that, if your instructions are really well built out, they become kind of interchangeable. Is that a good summary of your view?
Karan Vaidya: Yeah, in most cases I've seen that hold true, because in most cases the skills are detailed enough, giving a decently granular account of the outcome you want and the path, the trajectory, that the agent should take to achieve it, that if the model is good enough and follows instructions, the trajectory of the models remains consistent. And that's a known pattern that a lot of research and industry are also seeing: you get Opus to create a skill and then swap to Sonnet for using that skill. The first time, Opus, being smarter, is better at navigating and figuring out how to reach the outcome. But once you have the skill Opus created, you can swap to a cheaper model and achieve a similar outcome.
Nathan Labenz: Do you think that also holds true even swapping across frontier model providers?
Karan Vaidya: Sometimes not. There are behavioral patterns across different providers which make them different. For example, something I see day to day: Anthropic models are somewhat more agentic around things like polling. If some tools require polling, the model will write code to wait and continuously poll until the job finishes. That agentic polling is somehow much better ingrained in Anthropic models, whereas GPT just stops; it waits for user input after that. Those are behavioral patterns that differ, which makes the way GPT uses or builds those skills a bit different from Anthropic's models. But to a majority extent, except for those nuances, I think it holds true.
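The polling behavior Karan attributes to Anthropic-trained agents is just a wait loop written by the agent instead of handing control back to the user. A minimal sketch of that loop, with a hypothetical `check_status` callable standing in for whatever long-running job the tool exposes:

```python
import time

def poll_until_done(check_status, timeout_s: float = 60.0,
                    interval_s: float = 2.0) -> str:
    """Keep polling a long-running job until it terminates, instead of
    stopping and waiting for user input. `check_status` is any
    zero-argument callable returning a status string (assumed contract)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = check_status()
        if status in ("done", "failed"):
            return status  # terminal state reached
        time.sleep(interval_s)  # back off before the next poll
    raise TimeoutError("job did not finish within the timeout")
```

The skill-portability problem he describes is exactly that one provider's model emits this loop unprompted while another's does not, so a portable skill may need to spell the loop out explicitly.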
Nathan Labenz: Interesting. In practice, what would you say is the max efficiency play, and do you recommend it? If I go develop a bunch of skills with Opus, what's the cheapest model I can trade down to that you would expect to work some large majority of the time?
Karan Vaidya: For sure Sonnet, because I do that regularly. I use skills a lot. Locally, I'll write some skills for the first time via Opus, but then trade off for speed to Sonnet, and that works phenomenally well. I don't do it with Haiku, because I've seen that doesn't work really well. I have tried some with GPTs, and something like 90% of the skills just work. There are 10% of the time where there are some nuances present in the skills, because of the model that wrote the skill and how it operates, so they're not exactly plug and play. But in 90 to 95% of cases, it just works out of the box.
Nathan Labenz: Interesting. Would you expect like Gemini Flash to hit that level as well?
Karan Vaidya: I think so. I think Gemini Flash is decently smart, so if not now, then maybe the next iteration. I should try it myself, so I can't confirm directly, but I've used it in production settings, and it feels very, very close.
Nathan Labenz: Skills as the re-commoditization layer. The prevailing narrative recently has been that the models are starting to diverge, not exactly in capabilities in the macro benchmark sense, because they're all climbing the same curve, obviously, but in qualitative ways that are hard to wrap your head around, and that this creates some stickiness and some pricing power for companies once they get people using their stuff. There's been an increasing sense that it's harder to switch off. But you're making a provocative point, not deliberately provocative, but it's definitely provoking thoughts in me: the boomerang might come back, because all these skills are getting so thoroughly defined that you don't necessarily need great judgment in a model. You just need good instruction following. That's very interesting.
Karan Vaidya: That's where we actually position Composio: as a one-shot way of not being locked in to a model provider. If you use Composio's harness, you can use it with Anthropic, with OpenAI, with open source models. So say today you are using Anthropic, and OpenAI is gradually improving on a bunch of these things. If you want to switch to OpenAI, you have all your auth and all your skills in a single place, and you can make that switch and get 99% reliability. The day after, if you decide open source models are becoming equally great and they're probably 10x cheaper, you can make that switch too, and still continue to work with 99% reliability.
Nathan Labenz: Something I'm still in the middle of doing right now: a member of Anthropic's technical staff, I think it was Thariq, though I don't know them personally, put out a post in the last 48 hours or so that was very well received. It was like, here's how to use skills; we've learned a lot; here's what we've learned. I didn't even read it. Instead, I copied and pasted the whole thing into Claude Code and said, here are some best practices that I just heard were popular and that people say are giving good results. Can you apply these, or go into plan mode first and tell me how you would think about applying these to all the various skills we've worked on together? And naturally, it has a lot of good ideas for what to do there. I wonder if you already do this, or have invested or might invest in skills specifically for translating skills from provider to provider? Because I can imagine that 90 or 95% could easily become 99% if you applied another layer of transformation that compensates for the known quirks. Is this something you already do?
Karan Vaidya: Yeah. We are developing a bunch of metrics and benchmarks around this, and we already do some of it. Nobody wants lock-in, and enterprises specifically want the optionality of moving across providers, because in the current AI race you actually never know who the winner is. It changes very often: today it's Anthropic, tomorrow it's OpenAI, Google, Chinese models, xAI. You want that optionality, because if you completely lock in, these skills specifically are, in my opinion, to a certain extent very addictive. And on the other hand, like you said, if the skills have those behavioral patterns of how the model works ingrained, they can be harder to change, because they're not structured. I think the 90 to 95% is very easy, because most of the models are at that level now. But reaching the 100% mark with these skills, which are not structured, is actually a very hard problem. That's something we are already trying to solve, and we have made it somewhat better, but we want to solve it to probably 100%.
Nathan Labenz: Yeah, okay, that's really interesting. And I do feel my worldview changing a little bit in real time here, toward expecting a little more commoditization, less pricing power. That also brings to mind another angle on lock-in or moats, which is memory. Although, again, I think there are ways to get around this. Claude recently did this thing where they said, go ask ChatGPT this, come paste the answer in over here, and we'll pick up right where you left off in terms of memory. So that, too, is not really a great moat, or at least it's debatable. But in looking at all the tools that you have, and just browsing through them, I was struck by a couple of things. One, of course there are many tools that are relatively simple API wrappers: these APIs have existed, and now they can be used by agents. Okay, that's sweet; a lot of automation powered that way. But then there's another class of tool that is built for agents in the first place. Memory being one, where I saw you have Mem0 and Zep and probably some others; those are specifically designed to unhobble the AI or enable the agents, if you will. There are also things around allowing agents to transact, whether in fiat or cryptocurrency or whatever. Tell me, what does that landscape look like? What are the agent enabler tools that are actually important, actually working, maybe even strategically important? Memory could be one, I can imagine, where decoupling memory from your model provider, if it works well enough, could be another way to insulate yourself from lock-in effects. But I would love to hear your survey of this new generation of tools built for AI specifically.
Karan Vaidya: Yeah, we are very bullish on all of these agent enabler tools you mentioned: memory-based ones like Mem0, Supermemory, Zep; payment-based ones like Skyfire. We have traditional e-commerce ones like Shopify as well, which are almost agent enablers in that sense now; they are all gearing toward it. We partner with all of them, and they all partner with us, because we want the people building on top of Composio to have access to the best quality agent enablers, to improve the ecosystem. A bunch of good ones are already there, and more are obviously coming day to day. Our position is just to give our customers access to all the best quality ones. So I don't have any favorites there per se; we like to give all of them to our users and let them make the choice.
Nathan Labenz: I understand the idea that you can't pick favorites among your partners, but are there classes of these things that you find to be particularly powerful?
Karan Vaidya: I think all of them are getting decent usage. Memory is obviously a big one; everybody wants to use that. On the payment side, there are a bunch like Skyfire for different payment use cases, and things like that are getting increasingly better adoption given the OpenClaw movement specifically, where a lot of background agents are running and want to do commerce stuff. Search is obviously one of the big use cases, where people use things like Exa, Firecrawl, Tavily, et cetera. It's broad, so depending on the use case and the problem statement the builder is solving, or whether people are using it in their OpenClaw or Claude, all of them, or a mixture of them, are being used heavily.
Nathan Labenz: Are there any kind of missing categories or missing tools that you're like, why has nobody built this yet?
Karan Vaidya: There are so many coming up that at this point I might have to think more to come up with an answer here. People are building every sort of thing a human does and needs; there's even agent-to-human delegation happening. A bunch of things are happening, but I don't have an answer out of the box. Still, I would say most usage comes from the traditional software, because that's where all the users' data sits. Things like Slack and Salesforce are still the major sources of usage, because those are the systems of record.
Nathan Labenz: Yeah, perfect transition. I'm interested in your take on those platforms: which ones are maybe even advantaged, or at least safe from major disruption by the AI wave, versus which are more likely to be under threat. People are, of course, tired of paying their Salesforce bills and their Slack bills. And there have been, in my mind, some interesting but kind of confused debates about this, where one person will say, I had my coding agent spin up a Slack clone, and look what it did. And then another person will say, well, that's nowhere near what Slack really has to do; think about all the complexity Slack has to handle. And then I always think in my head, sure, but for that one user and maybe their small company, all that Slack complexity was irrelevant anyway. It's just a bunch of stuff they're paying for and never even using. I'm not really sure where that settles, though. And obviously there are a lot of different categories, from collaboration to task management to customer service platforms of all kinds. With the obvious disclaimer that this is not investment advice: what are you buying, and what are you selling, based on what you're seeing?
Karan Vaidya: Yeah, so I have a twofold take on that. One: in my opinion, the core infrastructure layer is getting much, much stronger, because, as you rightly said, making software is getting easy. So a lot of software will get built on the core infrastructure pieces, and your dependence on software is just going to increase more and more, because now you're chatting with your agent more than you're chatting with a human doing some task. So the core infrastructure driving it, things like AWS and Cloudflare, is getting massively stronger, because it's very hard for anybody to rebuild those infrastructure pieces. That's one. On the whole SaaS war you're talking about: the way I'm thinking about it is that, sure, there are some places where you can build a mini version of some bigger SaaS app for your particular small niche use case. That's fair. But in most cases, what will happen is that the interface through which you use a lot of these SaaS apps is going to change. And that's where startups will build those new interfaces and pose a competition to the SaaS apps. But in this particular wave, I think all the older SaaS software companies, Salesforce, Slack, et cetera, are also pretty fast to catch up, and they are coming out with their own new agentic interfaces to operate on. So it comes down to who is fast enough and does the innovation. And we are here to support both, honestly. At Composio, we are working with startups, and at the same time we are working with the incumbents building new agentic interfaces.
So it's all about who is fast enough to build a new interface for their users to operate on.
Nathan Labenz: So, yeah, I've been back and forth on that myself a little bit. One paradigm I had bought into, and I think I still mostly do, is that things like Salesforce will probably not be disrupted by startup CRM competitors, because sure, you can go build an AI-first CRM or whatever, and it might be sweet, but it's still going to take a long time to sell the big customers. And by the time you can win a lot of that business, they'll figure out how to clone your best features, close ranks, and prevent you from taking too much of their customer base away. So if I think about how much of the value of the AI wave accrues to incumbents versus AI-first challengers, I go mostly incumbents. But the flip side of that is: what about people bringing things in-house? Another example I've been thinking about, because I actually had a chance to study it a little, is Intercom. Especially in the last week, there's been a bunch of praise going around for Intercom, saying this is one of the companies that has adapted the best. So I'm not picking on them by any means; in fact, they provided a past guest on the show, and they're a really strong example of what it looks like for a company to catch the wave and ride it successfully. One of the big things they've done, of course, is create this Fin agent, which has the ability to resolve, when I spoke to them, something approaching 70% of all customer service tickets, across many thousands of customers, at 99 cents each, flat pricing. Okay, cool, that's sweet. It sounds like it's driven a huge growth boom for them.
My company is an Intercom user, and we pride ourselves on customer service, but the speed of response the Fin agent gives is just something we can't match with our staffing size. So it has all these advantages. But then I looked at the Composio tools associated with Intercom: 133 tools just for Intercom. I'm not sure about this, but it sure looked to me like I could do literally anything I wanted or needed to do with the entire Intercom platform through those tools. And that got me thinking: maybe this Fin agent is actually going to become a skill for me, and rather than pay them 99 cents each, what I really need to do is dial in exactly what I want. I can probably do that better owning the skill on my side versus trying to populate it on their side, do it all through the tools, and probably save, I don't know, 90%? I imagine it would be something like 10 cents in token cost. So what's your take on that? These agents are awesome, but it seems to me like it wouldn't necessarily be that hard for a lot of companies to say: Fin is doing well, but not quite so well on some things, and I've got all these tools, so give me the 133 Composio tools, let's work on our own skill, and we'll get this thing better than Fin at 10% of the cost. Is that realistic? Why wouldn't that be realistic?
Karan Vaidya: By the way, Fin is a great product. We used it early on, when we were just launching the prosumer side of Composio, because obviously a ton of support comes in when you have a prosumer product. It's really easy to get started, and that's what they're solving, honestly, in my opinion. A lot of companies don't want to spend a lot of time; they want to get started and offload this to someone else, and Fin is great for them. Where you made the right point is customizability. Cost is obviously a big reason, but customizability and governing your agent is what building gives you: the freedom to do what you want. You can create multiple skills, you can specifically limit your agent to particular tools, and you can give access to the other apps Composio has while building those support agents. That customizability is what people prefer when making the build-versus-buy decision. Fin is a great product for a lot of people, and that's why it's doing great, but there will be some people who want to customize and make the agent more powerful. That's where I think we come in and provide those 100-plus tools for Intercom, and you can do whatever you want.
Nathan Labenz: Yeah, I agree with you that they've done really well, just to reiterate that point. But I do see a world where the friction keeps getting lower and lower: in part because you've already got the 133 tools, in part because somebody can publish their skill, in part because I can give that skill to Claude Code and say, hey, interview me about my use cases and how I would change this skill to suit me, and then it's already plugged into all 133 tools. And then I'm just like, wow, the barriers are really crumbling everywhere I look; they're getting pretty easy to overcome. I guess, bottom line: nobody can project five years into the future, and even two years is a long time, but a year from now, would you expect that a lot of companies are actually trying to realize these gains and say, hey, we were spending a million dollars a year on a million tickets with Fin, let's do our own thing and try to save 80 to 90% with custom skills? Do you think that will be a movement?
Karan Vaidya: I don't know if it will be a widespread movement; it will happen in bits and pieces. For some companies, spending that much time is probably not worth it; for other companies, it is. So that will be the idea. But I agree, the friction will keep getting lower, and we will make sure, from our side, that it keeps getting lower and lower. As you rightly pointed out, there will be skills around it. And on build versus buy: as these models get better and better, people will definitely inch toward build in the future.
Nathan Labenz: Let's talk a little bit about agent-to-agent. So far we've talked about agent-to-tools and then agent-to-smart-tools, and I think smart tools start to look like agents in their own way. Everything is a little fuzzy; the boundaries between a tool, a smart tool, and an agent are not always super crisp. But when I think about agents, I think about representing someone's interest. If I were going to venture a conceptual difference between a tool and an agent, it would be that the tool is exposed for me to use to serve my own interests, while the agent may represent somebody else's interests, but I can still interact with it. You may offer your own definition. How are you planning for agent-to-agent as part of the future of Composio?
Karan Vaidya: Yeah, for sure. We think about it a lot, and I have some models for how I think about the multi-agent world, or agent-to-agent. By the way, I just want to call out that we have a bunch of tools at Composio which are agentic. For example, there are tools where you can do almost anything in a natural language format on a particular app; internally, that's an agent doing the thing for you. There are some places where agent delegation works really well, and some places where it doesn't. In a lot of cases, the main agent has the full context about the use case and about what the user wants, and it has the full set of tools to figure out anything more it needs. So giving the right set of tools to the main agent, if you're not overloading its context, is generally better. To give you an example: say I have to book an appointment with someone, and say I have a sub-agent which can do that for me. My main agent has all the right things, like my calendar, to figure out whether I have a collision at that time, but my sub-agent probably doesn't; it's just an appointment booker. If I delegate the task to the sub-agent, it's possible it books an appointment at a time where I have a collision with some important board meeting, and then I'll be in a "what happened?" situation. The main agent, because it has all the context, my calendar and so on, will do a much better job if it's just given a tool. So the way I look at it is: if it's a low-context task, one that doesn't overload your context a lot, then it's better to provide it as a tool.
And if it's a more exploratory task where a lot of context will be used, for example deep research, we all know that in most companies parallel sub-agents run in different streams, do research on a particular topic, condense the results, and give them to the main agent. In cases like that, where the work is exploratory and takes a lot of context, it's better to use sub-agents. In a lot of other use cases, it's better to just bring it back to the main agent and give it the right set of smart tools. So there's no single answer here; it will be a mix and match of what's best for the problem statement.
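Karan's heuristic can be sketched in a few lines of Python. This is a hypothetical illustration, not Composio's API: a low-context action (the calendar example) is exposed as a tool so the main agent can cross-check its own context before acting, while context-heavy exploratory work is delegated to a sub-agent that returns only a condensed summary.

```python
from dataclasses import dataclass, field

@dataclass
class MainAgent:
    """Hypothetical sketch of the tool-vs-subagent heuristic from the episode."""
    calendar: list = field(default_factory=list)  # context only the main agent holds

    def book_appointment(self, slot: str) -> str:
        # Exposed as a TOOL: the main agent keeps the calendar context,
        # so it can catch collisions before acting.
        if slot in self.calendar:
            return f"conflict: {slot} already booked"
        self.calendar.append(slot)
        return f"booked {slot}"

    def deep_research(self, topic: str, run_subagent) -> str:
        # Delegated to a SUBAGENT: exploratory work burns lots of context,
        # and only the condensed summary comes back to the main agent.
        return run_subagent(f"research {topic} and condense the findings")

agent = MainAgent(calendar=["tue-10am"])
print(agent.book_appointment("tue-10am"))  # → conflict: tue-10am already booked
print(agent.book_appointment("wed-2pm"))   # → booked wed-2pm
print(agent.deep_research("MCP vs CLI", lambda prompt: f"[summary of: {prompt}]"))
```

The point of the sketch is the division of labor: the collision check only works because the tool runs inside the agent that owns the calendar, exactly the failure mode Karan describes when a context-free sub-agent books over a board meeting.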
Nathan Labenz: Are you seeing anything meaningful today in the agent-to-agent space? Any examples you would highlight? It has felt largely theoretical to me so far; there's been a lot more talk of agent-to-agent than actual agent-to-agent happening.
Karan Vaidya: I think Claude's team of agents is a very interesting paradigm: there's a shared task list that all the sub-agents have and map onto, and they can do inter-agent communication via that shared task list, where one agent assigns a task it wants another agent to do, et cetera. That's a pretty good shared-agent paradigm. In the case of Composio, as I mentioned, we have some tools which are exposed as agents, where you can do a to-and-fro as well: you give the task to the sub-agent tool in pure natural language, like "go and find this" or "go and do this on this particular app," and if it needs something, it gives that back in the response, and then you can use a sort of session ID to continue the conversation back and forth. So those are some paradigms, but I think it's still very early. It is in production, though: the Claude team of agents is available, and a sub-agents API is coming; we have a preview version available where the agent can manage sub-agents. Internally, we open-sourced this thing called the agent orchestrator. A lot of our engineers use a single orchestrator agent to manage 20 to 30 Claude Code or Codex agents, and that single orchestrator figures out what the different agents are doing and whether any action is needed from the engineer to control them. Those are some interesting paradigms people are using.
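The shared-task-list coordination Karan describes can be sketched as a small data structure. This is a hypothetical illustration of the idea, not the actual Claude or Composio implementation: agents communicate only by assigning, claiming, and completing entries on one shared board.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class SharedTaskList:
    """Hypothetical sketch of inter-agent communication via a shared task list:
    agents hand work to each other by posting tasks addressed to an assignee."""
    tasks: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def assign(self, description: str, assignee: str) -> None:
        with self._lock:
            self.tasks.append({"desc": description, "assignee": assignee, "done": False})

    def claim_next(self, agent_name: str):
        # An agent picks up its next open task, if any.
        with self._lock:
            for task in self.tasks:
                if task["assignee"] == agent_name and not task["done"]:
                    return task
        return None

    def complete(self, task: dict) -> None:
        with self._lock:
            task["done"] = True

board = SharedTaskList()
board.assign("draft release notes", assignee="writer-agent")
task = board.claim_next("writer-agent")
board.complete(task)
# Having finished, writer-agent hands follow-up work to another agent:
board.assign("review release notes", assignee="reviewer-agent")
print([t["desc"] for t in board.tasks if not t["done"]])  # → ['review release notes']
```

The lock matters because, in the paradigm described, many sub-agents read and write the same list concurrently; the board itself is the only communication channel between them.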
Nathan Labenz: What does your cost structure for running Composio look like in terms of human cost versus token cost? And how is that shifting, or how do you expect it to shift over time?
Karan Vaidya: As I mentioned, all of our integrations are actually built by agents, so the engineering team is building agents to build those integrations. Two to three years back, that was not the case; everybody, specifically the tool providers, had a big team to build these integrations. We have literally a three-member team doing it all, setting up the whole agentic pipeline, and over the last month we probably spent $100K on the pipeline that builds those agents. So our token cost is definitely much higher than our human cost right now, to answer your question.
Nathan Labenz: Wow. So you have a three-member engineering team, three humans.
Karan Vaidya: For the agent tech, overall we have around 15 people, but the team that builds the whole end-to-end agentic pipeline, the one that builds, fixes, and improves agents (sorry, tools) over time, is just a three-member team.
Nathan Labenz: Do you see the engineering team growing substantially in the future? Is there anything that will require a lot more headcount, or is it just going to be a lot more tokens?
Karan Vaidya: At Composio, we are definitely hiring; it's just that our bar is very high, sometimes I feel a bit too high, so we are not able to hire fast enough. But we definitely need humans to control the agents. We are still not at the point where agents work fully autonomously without any supervision.
Nathan Labenz: We're getting a lot of different predictions about what the future of the software labor market looks like. It sounds like you think token costs will grow faster than human costs, but do you think the size of the human team levels out at some point? What's the scaling law of humans at Composio?
Karan Vaidya: Very honestly, we are a startup, so the human-versus-AI scaling laws operate a bit differently for us, because we are already very AI-first in that sense. But we still need humans to make decisions in certain cases, and that's where we are hiring. We're all seeing what's happening across the board at the bigger tech companies, where human capital is still probably much higher than token spend; in our case, LLM token spend is already a multiple of human capital, even for internal development. So in our case, we need more humans in order to spend more tokens; that's the idea. But that's not true for incumbents, where the ratio runs the other way. To answer your question, as models improve, that ratio will definitely shift toward more token usage compared to human.
Nathan Labenz: One other question I had for you in terms of business strategy, and this connects back to the agent concept as well, although it doesn't require the agent paradigm: why not resell more? Why not try to control the customer relationship more? Obviously, in some cases, I have a Slack account that's my Slack account, and I want to keep it my Slack account; I don't want to use Slack through you. That wouldn't really make any sense. But then there are other things, and I'm thinking of things I tried with Composio: Brandfetch, to get logos and color schemes of companies, or Perplexity, the Brave Search API, or Exa, which you mentioned, any of these generic utility-style APIs. As far as I saw, the only way I could connect those accounts was to supply an API key that I already have for them. And that got me wondering: why not just take the money yourself and have your own big Perplexity bill that your users pay their way through you? From my perspective, that seems like it would be to your advantage, and it might also be a nice friction reducer for users, because I was like, oh, okay, I guess I have to go get my Brave Search API key now. But if I already had a payment method on file with you, and it was, oh sure, I'll enable this at $5 per thousand calls or whatever, you could even charge me six for the convenience, and it seems like it would work. Is that something you think you will do, or is there a reason you're not doing it today?
Karan Vaidya: No, definitely, that's a place we are moving toward. Right now, we do have a bunch of services bundled into our paid plans, but we are pretty soon launching this thing called premium toolkits, which is exactly what you mentioned: a single wallet with Composio gives you access to all these services, whatever you enable, from a single dashboard. That's something we'll probably be launching by the time this episode comes out, or in the next couple of weeks. You can set up your credits and so on just at Composio's end and use all these services, so you don't get overwhelmed maintaining so many accounts and different billing at different places.
Nathan Labenz: Yeah, okay, cool. I'll look forward to that. I think those are pretty much all the angles I wanted to cover. What have I not touched on that's on your mind, that I should have thought to ask about already?
Karan Vaidya: One of the things a lot of people on Twitter are talking about is MCP versus CLI. That's a pretty heated debate right now, specifically with the GitHub CLI and so on. I have my viewpoints there; it's interesting because it affects us a lot. Like I mentioned, we are just the reliable execution layer, and that's why we are launching a universal CLI next week, which is the last week of March; I don't know, it depends on when the episode comes out. With a single CLI, you can access all the different apps, so you don't need the GitHub CLI just for GitHub, the Vercel CLI for this, another CLI for that: one CLI that can manage all your apps from a single point of usage. But on CLI versus MCP, I don't know what your view is, but I personally think it will be a multipolar world. You can't compare against the GitHub CLI, which has been used since eternity by me and other engineers and is baked into the pre-training data, so obviously the models will use it very well. But MCPs are also now in the post-training data, with the Claude Codes of the world already using MCP so much. So in terms of accuracy, it's going to be a multipolar world, and I personally think MCP definitely gives you a lot more control.
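The "universal CLI" idea, one entry point routing subcommands to many apps instead of one CLI per app, can be sketched as a simple dispatcher. This is a hypothetical illustration under my own naming, not Composio's actual product: the handler functions and app names here are made up.

```python
# Hypothetical sketch of a "universal CLI": a single entry point that routes
# subcommands to per-app handlers, instead of installing one CLI per app.
# The handler names and apps below are invented for illustration.

def github_handler(args: list) -> str:
    return f"github: {' '.join(args)}"

def slack_handler(args: list) -> str:
    return f"slack: {' '.join(args)}"

HANDLERS = {"github": github_handler, "slack": slack_handler}

def dispatch(argv: list) -> str:
    """Route `universal-cli <app> <args...>` to the right app handler."""
    if not argv or argv[0] not in HANDLERS:
        return f"unknown app; available: {', '.join(sorted(HANDLERS))}"
    app, *rest = argv
    return HANDLERS[app](rest)

print(dispatch(["github", "pr", "list"]))  # → github: pr list
print(dispatch(["nonexistent-app"]))
```

The appeal of this shape, as Karan frames it, is that the agent learns one invocation pattern and one auth surface, while the routing table can grow to cover every connected app.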
Nathan Labenz: MCPs are better for observability and traceability, and for that reason better for enterprise use cases. Is that fundamental? It feels to me like everything can be patched; couldn't you create hooks on the CLI? I had a question on this in the outline, as you saw, but as we were talking, I came to the conclusion that maybe this debate is much ado about nothing, in the sense that in the end they can both work. They may have some relative strengths and weaknesses now, but as they mature, they're kind of two sides of the same coin. That's where I think we're headed. Would you dispute that, or what edits would you make to that outlook?
Karan Vaidya: No, I think I agree; that's where I was going. It's not going to be a unipolar world; both of them will coexist. One of the deciding factors will be where more and more tokens are getting spent, because that goes into the agentic traces, gets RL'd more and more, and improves over time. But I think it will be a bipolar world.
Nathan Labenz: Gotcha. Yeah. Makes sense. Anything else we should touch on? Anything else you want to make sure people know about Composio before we break?
Karan Vaidya: No, I think just that we are hiring in SF. For people listening: if anybody is interested in building the future of agentic tool execution, I'd love to talk.
Nathan Labenz: Perfect. Karan Vaidya, CTO of Composio, thank you for being part of the Cognitive Revolution.
Karan Vaidya: Thanks, Nathan.