The Customer Service Revolution: Building Fin, with Eoghan McCabe & Fergal Reid of Intercom

Eoghan McCabe and Fergal Reid of Intercom discuss developing their AI customer service agent Fin, detailing how they achieved a 65% resolution rate through custom model training and introduced outcome-based pricing for customer support.

Show Notes

Today Eoghan McCabe and Fergal Reid of Intercom join The Cognitive Revolution to discuss building their AI customer service agent Fin, exploring how they achieved a 65% resolution rate through rigorous optimization and custom model training rather than relying on base model improvements, while pioneering outcome-based pricing at $0.99 per resolution.

Shownotes brought to you by Notion AI Meeting Notes - try one month for free at: https://notion.com/lp/nathan

Sponsors:

Linear:

Linear is the system for modern product development. Nearly every AI company you've heard of is using Linear to build products. Get 6 months of Linear Business for free at: https://linear.app/tcr

AGNTCY:

AGNTCY is dropping code, specs, and services.
Visit AGNTCY.org.
Visit Outshift's Internet of Agents

Claude:

Claude is the AI collaborator that understands your entire workflow and thinks with you to tackle complex problems like coding and business strategy. Sign up and get 50% off your first 3 months of Claude Pro at https://claude.ai/tcr

Shopify:

Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

PRODUCED BY:

https://aipodcast.ing

CHAPTERS:

(00:00) About the Episode

(03:43) Keeping Up With AI

(09:56) The Evaluation Stack

(14:15) Incumbents and Market Pressure (Part 1)

(19:00) Sponsors: Linear | AGNTCY

(21:34) Incumbents and Market Pressure (Part 2)

(23:42) Real-World Business Impact (Part 1)

(32:11) Sponsors: Claude | Shopify

(36:13) Real-World Business Impact (Part 2)

(36:19) Improving Resolution Rate

(45:00) Intelligence Is Not the Bottleneck

(52:03) The Future Knowledge Worker

(59:53) Human vs. AI Performance

(01:04:48) Considering Paradigm Shifts

(01:09:31) Outcome-Based Pricing Model

(01:18:46) Proprietary Product Insights

(01:26:05) Company-Wide AI Adoption

(01:31:54) Boosting Engineering Productivity

(01:36:00) Outro


Transcript

Introduction

Hello, and welcome back to the Cognitive Revolution!

Today I'm excited to share my conversation with Eoghan McCabe and Fergal Reid, CEO and Chief AI Officer at Intercom, makers of Fin, the AI customer service agent that's been a market leader since its launch some 2 and a half years ago.

Regular listeners will know that Intercom has recently been a sponsor of the podcast, so it's worth noting that this episode was NOT part of that sponsorship deal – on the contrary, because I've been an Intercom customer for years at Waymark, and also noticed that leading AI companies – and past guests! – Anthropic, Gamma, and Lovable all have testimonials on the Fin website, I wanted to understand what's really working and what remains a challenge for a company that's been among the most successful at creating practical business value with LLMs.

And I'm glad to say that this conversation really delivers!  

With a diverse customer base of more than 400,000 businesses, and Intercom's ability to measure successful resolution rate differences as small as a tenth of a percentage point, Fin is one of the most intensively tested LLM applications in the market today, and as you'll hear, Eoghan and Fergal are both remarkably candid about what they've learned and what they still don't know.

One perhaps-surprising finding that stood out to me, especially considering how much the AI discourse tends to focus on new model releases and frontier capabilities, was Fergal's assessment that intelligence is no longer the limiting factor for customer service automation.  On the contrary, he says that GPT-4 was already intelligent enough for the vast majority of customer service work, and that model improvements have only contributed a few percentage points to the greater-than-30-percentage-point increase in resolution rate that the Fin team has delivered since launch.  The vast majority of gains have actually come from better context engineering, which they've achieved through many rounds of careful optimization of retrieval, re-ranking, prompting, and workflow design.

Of course, we cover a lot more than that, including:

  • The fact that most customer support teams are "underwater", which means that, for now at least, Fin is allowing companies to support more customers and is beginning to affect their hiring plans, but generally isn't yet leading to layoffs
  • How Intercom thinks about the importance of speed, and how they balance the desire to be first to market with the critical need to maintain customers' confidence. 
  • The culture of awareness, engagement, and constant experimentation that has allowed them to deliver a 1-percentage-point improvement in resolution rate, month after month, for 30 months in a row. 
  • The intricate workflows that power Fin, and why Intercom is now training custom models for specific tasks, including a custom re-ranker.
  • How Intercom dogfoods Fin, and why their resolution rate, while above their customer average, is actually quite a bit lower than top performers. 
  • Fergal's observation that, no matter how sophisticated your offline evals, the messiness of real human interaction means there's no substitute for large-scale A/B tests in production.
  • How the $0.99-per-resolution pricing model that they pioneered, while initially unprofitable, has created strong alignment between Intercom and their customers and become profitable thanks to improved success rates and lower inference costs. 
  • The 2X productivity goal that Intercom's CTO has set for their technology teams in light of AI coding assistance.
  • And how their vision is now expanding from "service agents" to what Eoghan calls "customer agents" that can work across the entire customer lifecycle, including sales and onboarding.

Bottom line: if you're building AI products, you'll find in this conversation a bunch of valuable insights from a team that has brought real rigor and sustained discipline to the challenge of making LLMs work reliably for businesses and their customers, at scale.

This is Eoghan McCabe and Fergal Reid of Intercom.


Main Episode

speaker_0: Eoghan McCabe and Fergal Reid, CEO and Chief AI Officer at Intercom, makers of Fin. Welcome to the Cognitive Revolution.

speaker_1: Thank you.

speaker_2: Thank you.

speaker_0: So, I'm excited for this conversation. Um, my company, Waymark, has been a customer of Intercom for years, and, uh, so I've been following what you guys have been doing with AI with interest, both, uh, you know, intellectual and applied over the last, uh, couple years. And you've done some really interesting stuff and, and been, uh, in, in some ways really innovative leaders in the market. So, excited to dig into all of that with you. I thought first question, um, just because AI is moving so fast and obviously you guys are running and, you know, sitting in the leadership position of an 1100-person organization that's distributed across, you know, all the timezones of the world, how are you going about keeping up with AI? What is it that you're doing, you know, mix of hands-on, sources, whatever, to stay current and make sure that you know where we are in the, in the development of the technology?

speaker_2: Yeah, I mean, I'm sure Fergal and I will each have different answers. Uh, f- for me, I, I rely on Fergal. And, uh, Des and others. Des is one of my co-founders and runs all of R&D. Um, and many people who are very close to the action. Y- y- you kind of have to pick your battles a little. And, um, when you have, you know, those trusted longstanding relationships that I have with Des and with Fergal, y- they know what I need to know, and I know I can trust what they say. And so that's a big, big part of it for me. Uh, I actually got really disinterested in tech news many years ago. I found it to be super boring and very, very samey, so I don't follow all the fundraising announcements and who the latest, hottest companies are. Um, uh, and so, yeah, that, that, that's kind of what works for me. I, I myself have, you know, I guess a t- technical t- training, if you like to call it that. I studied computer science, but I never, um, p- practiced. Um, and so I, I can pick up these things pretty damn fast. And funny enough, I mean, I graduated college in 2006. I studied AI in 2004. You know, like machine learning and AI was like s- the thing I specialized in. And, you know, f- tr- truth be told, they had, they did not anticipate this moment, but, um, it primed me a, a little bit. But maybe Fergal can tell you how he learns the things he teaches me.

speaker_1: Yeah, a- a- absolutely. And, you know, just to say like, look, you know, eh, we've been in this game for a while. We've had AI products in production, I, I think shortly after I joined, I remember working with Eoghan on the first brief of what later became Resolution Bot. So w- we have those working relationships, and we have a certain organi- organizational or institutional knowledge in the company for a long time of like dealing with and communicating with these things. And so that, that kind of gives us an ability to communicate rapidly inside when something new happens or changes. There's just, there's that shared context and vocabulary and state in the leadership org. Um, I, I would say that like in terms of the external environment, it's really difficult. It's really a full-time job to kind of keep track of everything that goes on out there. And there's so much hype, there's so much rubbish, and you almost have to like test and verify everything yourselves. Anytime there's a new benchmark or some new model, it sounds amazing and you're like, "Wow, this is really good, if it bears out." But we now have to go through a period of like testing and analysis ourselves to, to be kind of sure that there's something really interesting here. So it's busy. Of course there's, you know, there's like Twitter or X, there's like papers, there's, you know, you kind of learn who, who the top labs are and the people doing the top work in the space over time. But, uh, but yeah, i- it's busy. It's hard. I don't know anyone in this space who isn't struggling to just keep track of the, the biggest developments.

speaker_2: I, I, I would add one thing that Fergal did well is that he, you know, we have a, a 50 plus person team under Fergal that are, we call them the AI group, and there's, you know, real scientists and researchers amongst that team. And part of what he has done well is create space for experimentation. When- whenever there are new models or new technologies, it's not long before someone on the team has hacked together a new version of Fin that experiments with it. So there's, you know, fundamental research as part of their work, rather than keeping up with what's happening on the side. I, I think that's important.

speaker_1: Yeah, uh, absolutely. I, I, I completely agree.

speaker_2: Yeah.

speaker_1: Like, you know, we, we have a very experimental and scientific mindset, which enables us to kind of quickly integrate information. So if something changes in the external world, we get to validate that. And, you know, i- it's pretty common where it's like, hey, some new model has dropped. And, you know, the next day or five hours later, we have like the results of it on back tests, and then we can kind of integrate that and like think, do we need to change anything here? Does this look promising or not? So yeah, we, we definitely have that pipeline o- and then that kind of, that setup of people to kind of, to do that. We've invested in that and it's paid off numerous times.

speaker_0: Yeah, that's cool. You mentioned it's a full-time job. It's funny you say that, 'cause I've basically found a way to make it my full-time job just to generally keep up with what's going on with AI. Have you actually hired somebody with that specific job description to basically say like, "Your job is to go track external developments and synthesize them, report them to the team"? Or is that kind of a distributed responsibility across the team?

speaker_1: It, it is definitely a distributed responsibility, and I would say it is a core competence of the AI group, and it is a core competence of any AI org in today's world. You just... Otherwise, you're just gonna get outdated. You're gonna fall behind. It's a fast game. You have to move fast. So no, uh, we, we, we, we distribute that, eh, throughout the group. It's not any one person's job, but it is absolutely a core competence of the org.

speaker_0: Yeah, gotcha. What does that eval stack look like? Would you say, you know, you can take a new model, plug it in, backtest it i- within a, a day or so? Um, you know, that ... And you could break that down across a lot of different dimensions. Like, we always welcome shout-outs to vendor, you know, companies that you particularly like, frameworks. Also just conceptual stuff like, to what degree do you trust LLM as judge paradigms, you know, to help you accelerate the evaluation? Um, you know, a lot of companies ... There's been this argument recently that we don't need evals, we can just do it all on vibes. But it sounds like you're not, uh, not on that side of that argument.

speaker_1: I, I mean, evals is a tricky thing. Uh, I, I would say we're a little skeptical about evals. Um, you know, six months ago there was this sort of meme going around that like, "Every product team needs to specialize in evals and be great at evals." And like, people started bigging up evals to be this like amazing thing that will solve all your problems. And you should always be skeptical of that. So, you know, um, I, I'd say a number of things. I'd say ... Thing number one is, yeah, we're pretty good at like backtesting. We've always had a good backtesting framework for Fin, for our core things that we're interested in from an LLM. You know, does it hallucinate much? Eh, how often does it answer questions? And, you know, w- we, we have, we, we have a setup where, you know, given sort of a, a RAG-style framework, and with like known good answers to particular questions, hey, does a, a particular LLM that we evaluate, does it do a good job at finding the right content? So, I, I would say like we, we have pretty mature backtests. But we also have this, this kind of battle-tested wisdom of like always test in production, always test with A/B tests at scale in production. And we can move quite fast to that posture with a new piece of technology. And the reason we've done that is because we have seen so many times, hey, there's something that's new and it's very exciting, and it does well on our backtests. But then in production, it actually underperforms. Just the real world of humans and the real messiness of human communication is like so messy that you can't build a perfect eval for it. You can build like something that's good enough to tell you there's signal here, but you have to test in production. And like we are quite sensitive to changes of like a tenth of a percentage point in resolution rate; that's something that we care about and we sweat. And so we have to test in production in order to really see those things with like massive at-scale, massively overpowered A/B tests. And we really, we consider that to be an edge. We consider that to be something that, that we can do that like maybe smaller competitors or competitors with smaller numbers of customers just can't, 'cause we have this very large deployment of Fin with many very large diverse apps. And so we lean in, lean into that. And anytime we've kinda gotten, you know, too far into the evals, gotten too scientific, uh, y- y- you gotta just test in production. Don't fall in love with your offline stuff.
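
To put the scale Fergal describes in perspective: detecting a 0.1-percentage-point change in a roughly 65% resolution rate with a conventional two-arm test takes millions of conversations per arm. Here is a rough back-of-the-envelope sketch (a standard two-proportion power calculation, not Intercom's actual methodology; the hardcoded z-scores correspond to a two-sided 5% significance level and 80% power):

```python
# Back-of-the-envelope: sample size per arm needed to detect a
# 0.1-percentage-point lift in resolution rate (65.0% -> 65.1%)
# with a two-proportion z-test, alpha = 0.05 (two-sided), power = 0.80.
# Pure stdlib; the z-scores are hardcoded rather than looked up.

def n_per_arm(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Approximate sample size per arm for a two-proportion z-test."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    delta = abs(p2 - p1)
    return int((z_alpha + z_beta) ** 2 * variance / delta ** 2)

print(n_per_arm(0.650, 0.651))  # roughly 3.6 million conversations per arm
```

Numbers on that order are why a large, diverse deployment is itself an evaluation advantage: smaller deployments simply cannot resolve deltas that fine in production.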

speaker_0: Yeah, I wanna come back to that notion of scale, um, that you mentioned as an edge, 'cause, uh, that ties into some macro thesis that I'm developing, and, uh, maybe wanna test against your world view. Before going there, though, how much pressure do you feel in terms of, you know, from customers, from, I don't know, other stakeholders, to be really quick to market with kind of qualitatively new AI capabilities? I, I think one thesis that I've had is that there's probably enough time for incumbents to implement the best new stuff before s- you know, a brand new startup will spin up and rebuild everything and eat your lunch. Um, and we've got like quite a sp- You know, our people are t- are testing all parts of that spectrum, right, with Apple kind of bringing up the very rear of, you know, being, uh, clearly last to, to market. But th- they get, can hope, I guess, still maybe to be best one day. Where do you guys try to be in terms of like the bleeding edge of capability, somewhere in the middle? You know, not last, but later, but best. Like, wha- what's your sort of philosophy of that and, and what do customers, you know, demand or expect from you?

speaker_2: Uh, I think we feel pressure from, um, hmm, anyone in this space who would wanna do the job that we wanna do. The larger the incumbent, the more they have these incumbency advantages where they have prior customer relationships and, um, some degree of lock-in. Apple has got the ultimate lock-in. And I've been pissed off with Apple for years now, not just on the AI stuff, but every third new interaction on my iPhone has a bug. I mean, at present, if I try to type in a contact's name in iMessage, then it, it just doesn't return a list of names. Um, and I'm not gonna leave anytime soon. Anytime soon. Because I use the entire Apple ecosystem, and if I thought that that insanely annoying bug in iMessage slowed down my day, wait until I try to integrate Android and other Google products with my iMac and other Apple things. Like, they'd have to push it, and I would eventually switch at some point. But it's not about to happen soon. Um ... We don't have that same level of lock-in. And you could compare us with someone like Salesforce. So, the bigger guys obviously have substantially more lock-in. Um, Salesforce, for example, they have large customers. Those customers move slower. Salesforce have multi-year contracts, and they have a broad platform, and so many other things that work with it, and they also have multiple products that they sell. And people become like a Salesforce shop, and if you kinda remove one thing, it makes the whole commercial contract less, you know, attractive, et cetera. So, they've a lot going for them. So, they do have more time. But they-What, what happens is that there's a certain momentum in categories, and once a company kind of breaks out, particularly with the early adopters, it's very hard for the incumbents to kinda show up and new, and kind of take away that sheen. Once a new disruptor has broken out and they've established their brand, they've got momentum with their technology, it's untypical for, um, the incumbent to just completely unseat them. It's worth, I, it's worth looking at the Dropbox story, and, and, and, and there's probably nuance in it too. If you recall, many years ago, um, this is coming from the guy who just told you he doesn't follow tech news, but this is back when I guess I did. Um, famously, famously Steve Jobs offered to acquire, uh, Dropbox and Drew Houston said no. And my understanding at the time was that Steve Jobs said, "You know we're gonna build this," and Drew still said no. And it took them many years, but they did build it, and now iCloud exists and it actually works. It works just fine. But Dropbox still exists. Now I'm not saying that Dropbox are totally killing it. I don't know if it's their fault, though. It could be the whole category. But eh, the ways in which we reduce all of these narratives or concepts down to, like, winning or losing, catching up, falling behind, is too blunt. And so if I was to kinda summarize it all, I would say that, like, the bigger the co- the bigger the incumbent, the more established they are, the more time they have. Salesforce and others will probably catch up. Um, but I think that the newcomers, once they actually catch some momentum, they will have established themselves very early in the market. And then some of them totally run away with it and the incumbents can never catch up. And I would say that it's like kind of a meme to say this, and I've been saying it a lot, but I do think that this time is different in that it's not just, "Hey, let's build ABC instead of XYZ." 
It's more like building AI requires a fundamentally different culture, set of talents. You just heard Fergal talk about the fact that keeping up with the changes and using a highly qualitative and scientific approach was of fundamental importance. The older companies need s- uh, way more of a fundamental reset to catch up than just deciding to build something else. So I would be relatively bearish on any incumbent catching up where there's now some significant momentum in any of the AI categories.

speaker_0: Yeah. Interesting. So how does that lead you to make product decisions when there's sort of a cool but immature perhaps, y- you know, new feature or extension of the product that you could offer versus holding off until maybe the next generation of model that gets a little bit better? Uh, how do you think about, like, how much risk to take in terms of things not necessarily working as well as you might dream, but being there versus, you know, the risk of somebody else beating you to be the first to deliver that?

speaker_2: I think in previous years we've, we've had a very strong pro- product inclination. In previous years, we would just try and build all the sexy things first. But now we know that while we're super product oriented and we're very careful to make sure that in our category that we've been leading, which is, you know, service agents or kind of AI customer agents, um, we wanted to make sure that we, we'd always be the company that people could trust on to get the new innovation. We've realized that if a smaller company no one has heard of, or even has heard only little about builds a sexy shiny feature before us, it's gonna be okay. So for example, in our space, we have these agents that can talk on the phone, and by text, and chat, and email, and they're incredibly effective at that. I mean, it's surprising and shocking. An obvious thing to do that would be really sexy and cool and that a big part of me would like to build before anyone else is, you know, 3D avatars and little talking heads in your app. And not just a chat bubble, but an actual little digital AI person. It'd be fucking cool. It's just so exciting to me as a builder. But does it really move the needle today? Not really. People are still absorbing the other stuff. Might someone else build it first? They probably will. Will that hurt our business? Not at all. So a part, there's a lot of art and judgment to this in knowing what the customer wants and needs, and where you're at.

speaker_0: Yeah. Um, on just business results, the, I look back, it's been 18 months since Klarna made a bunch of headlines by saying, you know, that they're going in all, all in on AI and, you know, look at these amazing results. Um, again, this is February 2024, right? They said that they had cut 700 full-time jobs. They were a little cagey about that. I think they said that nobody lost their job because they were actually working in an outsource firm that was able to reallocate those people. But they, you know, were employing 700 fewer ... They say, you know, millions of conversations, improvements in all the metrics, faster resolution, higher resolution rate, et cetera, et cetera. Um, then that became a big debate, like, is that real? You know, are they really good at this when other people aren't? Uh, you know, are they kinda, uh, is there some, you know, um, snake oil there that w- was being peddled ahead of an IPO or whatever? How would you look back on the last year and a half or so and tell the story of what the real impact has been? Like, how consistent is that Klarna story, even if it was leading, um, relative to, like, the broad customer base that you guys serve? How does, how does the impact that your customers are seeing line up against that story?

speaker_2: Yeah. I, I think Fergal will have some fun things to say about this, but I'll first start by saying the things that Fergal won't say. Uh, like I, I never, you know, p- p- p- shit on other technology companies. That's really hard. And I don't know Klarna or the Klarna people at all, so I don't know anything about their story. But as a man who likes marketing moments and opportunities to tell brave stories, eh, you know, that looked like a good one. And, um, you know, they say it takes one to, to know one and, mm, that looked like a bit more of a show than a reality. Like I, for one, and maybe I'm completely wrong, and I'm, I'm totally open to being called out. I, for one, don't believe, didn't believe that story because I didn't see anyone else do it, to your, to your, um, uh, question. And I think I heard that they backtracked on that. I could be wrong, but I think they announced that they backtracked on that and they hired people or they didn't fire people. I don't know. But less about Klarna. The fascinating thing about this moment in time is that... And this, this has happened in all disruptive technologies, is that at least in our category, but I expect this way in every category, that the new economics and the accessibility and ease of deployment means that before it replaces a bunch of humans, it actually increases the supply for the things the humans were doing that one could never afford to deliver in the past. And so, for example, in our category, service, service agents, people are deploying their service agents to their free customers that they never gave support to before. They're responding to their customers more quickly, which means that the customers will ask more questions and they'll get more service. Um, they'll put it on email addresses or maybe they'll deploy a chat, uh, feature that they never had before. And so, they're just net doing more service, which means that their customers are more satisfied, um, more effective at doing the things that they want their customers to do, um, et cetera. I will say that for us as a, you know, a, a prime customer of Fin, we deploy it in all the ways, we certainly have dramatically slowed the rate at which we hire service agents and haven't substantially grown our team since we launched Fin over two years ago. And so I think that that's where the first human disruption is gonna be, where it's going to, uh, eat the, the supply, the future supply that was gonna come from new headcount. Um, but, you know, I'll, I'll, I'll, I'll finish by saying this, which is that, you know, we have this metric we look at, resolution rate. It's the percentage of customer queries that Fin can resolve according to the customer. Um, and that resolution rate has been increasing on average by 1% every single month. We did a couple 2% months in the last few months. Um, and it's currently, I think in the low to mid 60s. I'm looking for Fergal's nod. Um, which means that it's some years before it gets to even like the high 80s or 90s. Um, and that assumes that people deploy it to all of the places where they get service, uh, requests, and they don't typically do that. And then the other dynamic is that even as the resolution rate increases and we do a higher percentage of work, each additional point is slightly harder work. All the easy work was done a long time ago. And so even if it takes 10 months to add 10 resolution rate points, um, that's probably not 10 points of work. 
And so even if in a number of years, um, when we're in the high 90s of resolution rate, it still might be 20 or 30 or 40% of the work left to humans. And so I'm trying to lay on the idea that, uh, even when it starts to disrupt the work and doesn't just serve to augment demand, there'll still be substantial human work required. And I think that that's something that people in AI and in this space, you know, have been commenting on it, have got wrong, where they've f- flipped to the future really fast, imagine this brave new, if not scary world, and have just not realized that all technology adoption takes time, and that disruptive technology serves to augment demand, and that it takes a phenomenally long time to kill the disrupted categories. Like, if text disrupted email, if email disrupted letters, if letters disrupted fax, guess what? Fax still exists. So that's a very, very, very long way to say that, um, I don't see these big layoff things happening and I don't see them about to happen.

speaker_1: Yeah. If, if I may be coming in on that as well, like I definitely agree w- what Eoghan said there. Um, like absolutely Fin is resolving like a really meaningful sizable chunk of the overall volume of people's businesses. Most support teams are like underwater. Like I haven't met a support team that isn't underwater by like 30% versus the capacity they wished they had. And so when you come along on day one, you resolve 30, 50% of their, their queries, maybe 30% of their workload, you just take them from being like underwater to being like roughly at parity. And then, you know, this was always the good news story. This is the way we used to hope it would play out before it happened, and we're like, "Oh, we kind of hope it, it doesn't really, you know, have massive job losses and we hope people move up the value chain." But that's what we see. People move up the value chain. Now, there's one major exception to that, which I would say is BPOs. If there are customers that have outsourced their like tier one or their frontline customer support to BPOs, they do very frequently deploy Fin and instantly get rid of the BPO. But most of the time, the internal team, like, pivots and goes up the stack and just isn't replaced, you know? And we saw that internally in Intercom. I think we, we, we had, like, our support team needed to grow by 25% the first year of Fin. Uh, that was the, the projected headcount, and it never happened. It just stayed static the first year we deployed Fin, and then Fin has continued to grow on and improve its resolution rate. So yeah, it's complicated. I, I do also wonder if there's, like, an economic downturn or something like that. Like, I think the support teams of the world are starting to realize that this is valuable and it's real. But I think the CFOs haven't kinda cracked the whip yet. And maybe if there's, like, a downturn or something like that, you'll see, then you'll see impacts. Then you'll see a much harder conversation about operational efficiency post-AI. But right now, it's a good news story. And we don't know the future.

speaker_0: Yeah, I've, um, have done two ad reads for Intercom over the last however many months-

speaker_2: Mm-hmm

speaker_0: ... I don't know if you guys have been aware that you've been sponsors of the show, but thank you whether you were or weren't. In that time, I think it's gone from 56% advertised resolution rate to 65.

speaker_2: Right.

speaker_0: And I guess I'd like to dig in a little bit on, like, first of all, how does that published resolution rate compare to your own resolution rate as Intercom? Second, what has driven that? You said, you know, it's 1% a month, and obviously there's a lot of work and testing and whatever. But, like, in terms of the tailwinds that have made that possible, what are they? Um, and then what do you think is kind of most likely to come next, you know, that will take you from 65 to 75 over, you know, whatever the next however many months?

speaker_2: I'll just answer the very first of that, part of that question, Fergal, and then would love you to jump, jump in. I think we're in the high 60s for, for Intercom. And that's pretty damn good because Intercom was this sprawling product, like so many features. Frankly, too many. Um, so the fact that it's actually able to provide that level of coverage is, is incredible. Um, and we have deployed it in all the places, so a really large chunk of our customer service is now done by Fin. And we certainly are one of the, you know, biggest deployments. Um, but we actually have customers in the high 80s and in 90s, so if you've got a kinda narrower set of questions that people might ask, it can get very high. Over to you, Fergal.

speaker_1: Yeah, absolutely. And then in, in terms of the actual, you know, the process by which we get that resolution rate up, like, it, it's a lot of work. And, you know, it's really weird how there's sort of a scaling law, or a Moore's Law-like phenomenon, where it has really consistently improved at about one percentage point month on month. And we have this thing internally where so often we're like, "Okay, well, we have three more things to try over the next cycle, over the next six weeks. Uh, I'm not too confident." And about, like, two of them work and one of them doesn't. Or one of them works, and two of them don't. And we get this net increase of about a percentage point month on month. We've this, like, ever-growing machine in the AI group always trying and testing more things. And that's roughly what it nets out at. And, like, you know, that is this constant process of, like, optimization. Trying to, like, let's refine the retrieval model. Let's go and work on the re-ranker. Let's go and change the prompts. Yet there's so much stuff to do there that kinda slowly gets it up. Only a very small amount of it, you mentioned tailwinds, only a very small amount of it has been, like, core LLM performance. That's been, like, a couple of percentage points over the last two years as the overall performance has gone from about 35% at launch to about 65% now. And so, like, so much of it is just this testing and optimization process we do. And, you know, we have the data on that. We have the A/B tests, and then we also have the sort of, the cohorted view, and we can see that, like, recent customer cohorts get a similarly high resolution rate to customers that have been with us for a long time. There's always variance within those cohorts. Some customers have a very mature setup where they've done a lot of optimization work with all these product features to help you optimize your help center content. But overall, you know, most of it is just improvements in the core engine of Fin that we have invested very, very heavily into. So the only kind, kind of tailwinds would be the space as a whole where, you know, there's more and more AI and it's available. And sort of the future is really, you know, for us, we've done a lot of work recently on investing in our custom AI models. We're sort of training our own models in-house for the first time. We've trained a custom retrieval model, a custom re-ranker model. We're very pleased that our custom re-ranker model, it beat out, you know, one of Cohere's top models, which we previously had used. And, um, yeah, we're really happy with that. So we really think that's the future of our investment in Fin is kinda taking all the data we have from all these different customers and using them to, like, help Fin learn and make Fin better overall. And that's been working for us. It's been delivering resolutions, and we're, we're really excited about continuing to invest in that.
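
For readers less familiar with the retrieve-then-re-rank pattern Fergal mentions, here is a minimal sketch of where a re-ranker sits in a RAG pipeline. Intercom's retrieval and re-ranker models are proprietary and trained in-house; the open-source cross-encoder and the retrieve_candidates() helper below are illustrative stand-ins, not their stack:

```python
# Sketch: first-stage retrieval followed by cross-encoder re-ranking.
from sentence_transformers import CrossEncoder

# Any open-source cross-encoder works for illustration; Intercom's is custom.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve_candidates(query: str, k: int = 50) -> list[str]:
    """Placeholder for first-stage retrieval (embeddings, BM25, etc.)."""
    help_center = [
        "How to reset your password",
        "Understanding your monthly invoice",
        "Exporting conversation data as a CSV file",
    ]
    return help_center[:k]

def rerank(query: str, top_n: int = 5) -> list[str]:
    candidates = retrieve_candidates(query)
    # The cross-encoder scores each (query, passage) pair jointly, which is
    # slower than bi-encoder retrieval but usually more accurate, so it runs
    # only over the shortlist produced by the first stage.
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_n]]

print(rerank("I can't log in after changing my email"))
```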

speaker_0: Do you think that will continue going forward? 'Cause that's, I would say, a quite different story from the most common one that I hear. And frankly, that I tell, which is that, you know, for, to take my own company, Waymark, for example, we help small businesses, increasingly bigger and bigger businesses, 'cause the quality of what we're able to deliver is improving, you know, quite fast as well. But it started with, you know, local brick and mortar, very small, you know, niche online retail. We helped them create video content. And you look back at what we were doing three years ago when we first brought an AI video generation thing to market, versus today, it's dramatically improved.... I would say a lot, you know, a lot of credit goes to the team. Certainly, like, we've tried a lot of things, figured a lot of things out. But I would say probably two-thirds if not more goes to the fact that the models themselves have just got so dramatically better during that time. Everything has gotten easier. You know, just the qua- even when it comes to something like generating a voiceover, you know, it went from, like, robotic to still mostly robotic to now, like, very expressive, right? And we had to figure out how to prompt that and how to adopt it and how to get things to sync up time-wise. But there were fundamental advances that were just, like, wow, you know, we're just... The, the, the giants on whose shoulders we're standing are getting taller, you know, at a, at a rapid rate. Um, so yeah, I don't know. How do you see... You're telling a, a quite different story where that seems to be less of the lift for you. Uh, how do you think about that, and do you think that will, will continue into the future?

speaker_1: Yeah. Uh, it- it's definitely a different story for us, and there's probably a few different reasons for that. Um, one reason for that is, you know, we built the first version of Fin on GPT-4, and we were, like, very early to that. We had, like, advanced access to GPT-4 for a couple of months, and it really took us, like, across the threshold that we wanted for, like, accuracy and quality. But it involved using this, like, big beast of a model, and we used to have to run it on dedicated hardware. And, you know, we would... We kinda, we had to really architect around it quite carefully. So I- I would say we have gotten some improvements due to improvements at the model layer. In terms of our architecture, our architecture is a little bit simpler, and it can do, like, more powerful things. But yeah, in terms of the actual resolution rate, um, yeah, it- it's only a few percentage points that have been just to... due to, like, pure model improvements, and it's almost all to do with the, like, testing and iteration around it. But, like, you know, that said, Fin is a different architecture as well. The architecture has changed as we've gone. So yeah, so i- i- it is, it's a complicated story, but, uh, w- we're, we're highly confident in that. Almost all of the improvement is in, like, you know, what you might call the RAG layer or the AI layer outside the core models. Um, core models have definitely gotten better, and that's been great for us. It's been great for everybody. But, uh, you know, that, that has, that has improved like... It's reduced the cost a bit, and it's improved reliability a little bit. But it hasn't driven core resolution rate that much, but it has given us a platform to be able to go and, like, you know, provide more flexibility to our customers. We, we had a feature called Guidance. Guidance is where customers can go and they can, like, you know, make Fin more personalized to their brand so that it talks in their tone of voice, and, like, features like that would have been harder to achieve with the models and the architecture, uh, a year or two ago. So yeah, the, the, the reality is, is multifaceted and it's complicated, but I would say, like, if you were to look at the... You're focusing on resolution rate, which is the core metric that we care about, that we build for, that our customers care about, I would say the story is a bit simpler, and the story is, is less about the, the underlying models. I think video will be very different than that. Um, GPT-4 is a really powerful model, you know? GPT-4, which we had two years ago, is a very, very powerful model. It was a very big model. It was a beast of a model to run. Um, and so yeah, we, we could, we could talk through the whole history. GPT-4 Turbo, which a lot of people have forgotten about, was, like, much smaller and much more efficient, um, but, like, very similar in terms of power to GPT-4. Um, yeah, and then, you know, moving to Sonnet, uh, we got, like, a couple of percentage point improvement, which we care about a lot. But against the backdrop of the sort of, like, 35% to 65% that we've had over the two years, uh, th- the actual models is, is a relatively small part of that.

speaker_0: Yeah. Yeah, that's fascinating. So is it, would it be accurate to say that intelligence is not the bottleneck? And would that also suggest that you are able to use smaller, faster, cheaper models? Like, this might suggest that you are a Haiku user today or a 2.5 Flash. As opposed to needing to maximize intelligence, it sounds like you're maybe more emphasizing...

speaker_1: So if, if-

speaker_0: The cost.

speaker_1: Yeah, uh, if, if you look at, like, core Fin... So again, i- it's complicated, but if you look at, like, core Fin, uh, there is a sense in which customer service is a task that is less cognitively demanding than, like, you know, the Maths Olympiad or something like that. And so a given level of maths... or sorry, a given level of model intelligence will saturate that task earlier. And so yes, I would say that, like, a large part of customer service probably was saturated by models of GPT-4 level intelligence. In which case, you know, yeah, y- you know, you spend a certain amount of time optimizing that and, you know, we have, we have gone and we have trained our own small models to kind of, to do parts of Fin. So for example, one part of Fin was summarization. We summarized the end user query before we go and do RAG on it. And that was worth doing because sometimes end users, they put a lot of random stuff in their query, and you don't want to pass all of that through, like, your search pipeline and your embeddings, so you do a canonicalization or a summarization piece first. That's always been very valuable for us. We used to use initially GPT-3.5 Turbo to do that. Then over time, we used different models. We used, uh, you know, GPT-4.1. I think that we used Haiku for a while, didn't have great experience with Haiku. Recently, we switched that. That model is now a combination of a proprietary encoder-decoder model that we have trained ourselves, and then a fine-tuned version of Qwen 3 that we have fine-tuned to be really excellent at that summarization task. That has made it cheaper and lower latency and more predictable and more reliable and higher quality than we were able to get from third-party LLMs. So yeah, so absolutely, we are taking some tasks that... for which, you know, model intelligence is sufficient to saturate the task, and we're replacing them with small models. They happen to be small models we've trained ourselves. Mm, smaller third-party models have never really given us the exact trade-offs that we want. You know, they become less steerable and, you know, there's all these different complicated set of trade-offs. But yeah, that's absolutely what we've been doing. We're very excited about that. And, um, that's a big investment we made recently in Fin. And then there are other parts of Fin that are real frontier challenges. Uh, so, you know, a big part of that is, like, tasks, right, or procedures where Fin is, like, interacting with external systems. We have a very high reliability bar to hit there. And yeah, for that we need frontier models. Uh, we use Anthropic's, like, excellent Sonnet models for that sort of task at the moment. And, um, yeah, it's great. Um, or like, you know, the- the hardest part of our answering prompts, we're still using, um, third-party models as well. So it's- it's nuanced. Like, our- our product is pretty big at this stage. Fin was always 10, 15 prompts, and then we had a different architecture for email. Now we have Fin voice, which is a whole different story. So we really, we have this big cloud of AI services at this point. And so you end up in a nuanced discussion of each one. But, uh, but yeah, speaking about core Fin, that narrative I had is- is- is correct. Yeah.
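
The summarization step Fergal describes is essentially query canonicalization ahead of retrieval: strip the noise out of a raw customer message before anything is embedded or searched. A minimal sketch of the idea follows; the prompt and the generic llm callable are illustrative placeholders, whereas Intercom's actual implementation uses a proprietary encoder-decoder model plus a fine-tuned Qwen 3:

```python
# Sketch: canonicalize a noisy customer message into a clean retrieval query.
from typing import Callable

CANONICALIZE_PROMPT = """Rewrite the customer message below as one short, \
self-contained support question. Drop greetings, signatures, and irrelevant \
detail; keep product names, error messages, and identifiers.

Customer message:
{message}

Rewritten question:"""

def canonicalize(message: str, llm: Callable[[str], str]) -> str:
    """Reduce a raw customer message to a clean query for the RAG pipeline."""
    return llm(CANONICALIZE_PROMPT.format(message=message)).strip()

# Wire in whatever small model you run; `complete` below is a placeholder for
# any text-completion call. The cleaned query, not the raw message, is what
# gets embedded and passed to retrieval and re-ranking.
# query = canonicalize(raw_message, llm=complete)
```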

speaker_0: Cool. That's, um, excellent. And, uh, just the kind of thing people tune into this podcast for. Uh-

speaker_1: Great. Right

speaker_0: ... I- I always hear that people, you know, they want these little nuggets of, um, understanding. It's a pretty AI-obsessed audience. So... And I think this is, like, a very timely and sort of th- thematically relevant conversation, too, because there's this broad question around, like, well, what's missing from AI that's gonna, you know, take it to the next step? Or if we imagine a drop-in knowledge worker of the future that it does seem like all the frontier companies are racing to create, what will that have that the current things don't have? And if we can take just raw intelligence mostly off the table, you know, if you're at high 60s today and you got 30 percentage points-ish to go to, you know, fully automate your own customer service, and it's not intelligence itself that's gonna close those 30-point gaps. Like, what are the other gaps? Uh, how- how would you taxonomize, you know, that which is missing? And it may not all be on the AI side, too. I mean, I guess with you guys, I would assume it would be more with your customers. I- I could imagine you might say, "Well, they gotta give us more information. They gotta actually use the features." But I assume you guys are using all the features you have, you know, pretty well to the fullest. So, what is missing that you think will, um, will need to come online to get you, you know, climbing the rest of that 30%?

speaker_1: Look, I- I would say that, like, there- there are definitely intelligence bottlenecks still. So, like, the core task of, given this article, answer the question from this article as well as a human customer support rep, we're kind of there. You know, the models are, like, really, really good at that stuff. They're intelligent, they're human competitive at constrained tasks like that. However, the more complex tasks of, like, given the whole bunch of external systems and a very vague, fuzzily defined policy about when to do a refund and when not to do a refund, and needing to do that reliably with common sense, not 99 times out of 100, but 100 times out of 100. That's still a frontier task. That's still a task where the whole ecosystem is leveling up. And so, you know, a bit like how with informational queries, we have to go and use RAG and we have to tune them quite well to get to the performance we needed. It's gonna be... The- the right way to do that, the right way to attack that is gonna be to use the models as a building block and then to go and tune the envelope in which the model works to be able to, like, give the right performance. So we need to do that. That's thing number one. And then thing number two is this huge task of actually deploying these things. All the human factors, convincing the security team, making sure they're secure enough, penetrating the organization. And this cuts back to what Eoghan says earlier, people in the discourse around AI, you know, it's very easy to- to fall in love with some metric on a back test and to not see all those messy human factors of deployment and adoption and penetration. And, you know, the more valuable something is, the faster it will penetrate. That's absolutely true. Amazing breakthrough technology penetrates fast, but it's still a process. And so that process needs to roll out, so that's gonna take- gonna take some time yet, you know?

speaker_2: I- I... Yeah, I would also add that the more systems, um, a- a product needs to touch, the slower it will penetrate the organization. And so, like, in some senses, like the informational queries for service were nicely isolated, and so it's easy to kind of pick that up, switch it on. But this future imagined knowledge worker that has to collaborate with many different teams and individuals, and use different systems and pay attention to permissions and talk to external stakeholders too, that just sounds like it's a lot, a way harder adoption. So the- the first thing that comes to mind when you ask that question is kind of Fergal's answer at a higher level, which is that, like, the- the- the recent trajectory of the actual foundational models tells me intuitively that the- it's not gonna be the base models themselves that just show up someday and they're good to go. But rather, there will be companies that deploy them to specific use cases and build, as Fergal calls it, the envelope around that to do this work highly effectively. I mean, if someone built the- the sophistication of system that Fergal and the AI group at Fin have built for a range of different use cases, it would probably be ready today. I don't think that the technology itself is actually holding people back. It's the raw hard work needed to, um, point these things in the right direction and help them be effective. And all of the work around the R&D to help companies adopt them, that is the difference between where we're at today and companies having, you know, real AI knowledge workers.

speaker_1: I, I, I completely agree with that. And I, I would also say that, like, you know, I do think there's something maybe missing from some parts of the discourse. You know, when people talk about, like, the models are gonna get so good, they're a, you know, country of geniuses in a data center, that sort of thing. You know, like, certainly there needs to be changes to the model capabilities to really achieve that. And so, like, you know, the models are missing a whole bunch of key capabilities today. They're missing memory, right? They're, they're typically run in a stateless fashion. And yeah, you can go and you can put memory outside, and that kinda works, but it kinda doesn't. They're also, they're missing, like, what someone might call a system tree. They're, they're missing the ability to, like, grind on a problem and to, like, learn about that problem, right? So if, if you have, like, an intern, never mind, you know, some PhD-level, you know, genius or whatever. If you just have, like, an intern outta school, and you go and give them a task, they'll do pretty bad at the task the first time in a way that, like, probably Sonnet won't. Sonnet will give you consistent performance. But, like, the intern can learn, and they can... You know, they can really learn in a task-specific way over time. And, like, the models, they just can't do that. And there are research prototypes where, you know, they're trained with reinforcement learning to go and, like, update their weights when they hit a certain point in a maths problem after doing a lot of chain of thought or something. That's all, like... None of that's deployed yet. And so, you know, I, I think Crozier just did something interesting with, like, a live reinforcement learning system. I think people are gonna be looking at that. But even that's, you know, that stuff is, it, you know, it's risky. And so, so we'll see. There are direct fundamental capabilities that are still missing from the overall intelligence layer. We'll need that. In the meantime, companies like us will have to, like, build around that. And even after we have those, as Eoghan says, you know, there's gonna be a ton of work in building the envelope to make the system actually work for a business, you know? If you had some, like, crazy genius, you wouldn't want to give them, bring them to your company and let them just do everything, you know? You still wanna train them, you know? Sometimes if they're, if they're a real genius, they need a lot of training, you know? There, there is a value. If you wanna have someone come and do your customer support, do you want someone who has, like, 10 years of experience doing great customer service, or do you want a crazy genius? Like, it's not obvious that you want a crazy genius to do your customer service, you know? So yes, there's a lot to do here.

speaker_0: Yeah, I think that's really interesting. I, I, on the, um, refund point that you mentioned specifically, you said, you know, not 99 times out of 100, but 100 out of 100. I wonder if that is... Do you re- do you endorse that way of thinking for customers? 'Cause one thing I've noticed when it comes to, like, self-driving, for example, is Waymo is way safer than a human-driven taxi. And it's, like, almost 90%, maybe not quite, but getting to sort of 90%-ish reduction in accidents and in injuries, and yet we're just starting to deploy. So I wonder how often you see that essentially happening with your customers where, you know, it, uh, to quote Biden, who used to say, um, you know, "Don't compare me to the Almighty. Compare me to the alternative," how often do customers just fail to realize how accurate or inaccurate their humans are, how consistent or inconsistent they are amongst themselves? You know, again, at Waymark, to take a personal example, we had tried to evaluate aesthetically the image assets that our small business users would bring to the platform so we could make intelligent recommendations. We found that it was really hard to even establish agreement among people. Like, directionally, yes, there'd be correlation, but, you know, to say that everybody would rank them the same, like, far from it, right? You could... You... I always say you could tell the top end of the curve and the bottom end of the curve, but in the middle, you wouldn't necessarily even know which way was up or down. So, all that to say, is there a way in which measurably, if people measured, the AIs are as good or better than what is happening today, but people, for one cognitive bias or motivated reasoning- reason or another, don't wanna recognize that or are, are prepared to, um, you know, take the risk on the AI that they're already running with their humans?

speaker_1: I mean, I, I, I can definitely give probably a couple of conflicting answers to that because I think we all internally- ... spend a lot of time thinking about this. Uh, look, in, in one way it's like, yeah, some customers have a nuanced understanding that their humans are imperfect, and some of them even will have, like, an error rate that they know their CS reps will do incorrect refunds for, and then that's the bar. So that's one answer. But another answer is that, like, if I was a PM at Waymo, I wouldn't be just trying to equal the human accident rate. I'd be trying to exceed that by two orders of magnitude, because that's what's required to really build a product that will penetrate very fast through the market. And you can spend all your time obsessed that people are holding you to too high of a bar, but for better or worse, people do hold new products to a higher bar. And I think, you know, we've always kind of engineered for it. We've always wanted to give not just good enough or not just human competitive error rates. We've wanted to give the best error rates we possibly can, the lowest error rates we possibly can. And, you know, the, the, the understanding changes over time, and then it's different for some customers. We have some customers for whom they're like, "I'm a regulated industry. If, you know, a human makes a mistake, I can talk to the regulator about that. The regulator will understand it. But I'm worried that if the system makes a mistake, the regulator won't understand it." We have customers who are in that boat, and, you know... this is all so new to regulators. And then we have customers who are in the boat where it's like, "Nope, eh, I can make the judgment call. I know it's superhuman. I believe you have done the trial. Let's go for it." And so, you know, like with the adoption of any new technology, it gets complex. Uh, and Eoghan, I'm sure you have, y- y- we talked about this a little bit as well.

speaker_2: Yeah, totally. I think, uh, you know, initially as Fergal said, the expectation will just be remarkably high. There's just a great degree of kind of fear and skepticism. And so the lens, the microscope is really on every single interaction. Um, it's almost expected that these new disruptive technologies just won't be as good, and so they need to oversell. But I think as soon as people start to realize the ways in which it's vastly superior to humans, maybe not all the times, but quite often, they'll give it a lot more permission for error. And Waymo has been in the market and with consumers a lot longer than the technologies that we are building. You know, people are not actively talking about the incredible customer service experiences that they've had. Often it doesn't even register that it's AI. And you'll see even with Waymo that people endearingly will forgive its mistakes. "Oh, it like stopped behind this car that was pulled over and I got a little angry at it, and then it kind of like turned, you know, pulled past it." And it's, it's endearing, it's funny. Right? So, uh, I do think that just at large we as a society will warm to the eccentricities of AI and be quite forgiving because we just know how effective it can be. And in customer service for example, it turns out that people really hate having to reach out to human customer service. They know they're typically gonna have to wait, you know, days, maybe hours, but probably days. They're gonna reach someone who doesn't really wanna do their job. They're probably gonna get a crappy half-answer that doesn't really answer the question. Um, the whole thing is just really unpleasant. But when you have a snappy, happy, expert concierge agent apparently very keen to hear from you, and willing and ready to answer in seconds, they're gonna ask a lot more questions. And so a lot of it is just us as a society building more meaningful relationships with these things.

speaker_0: Yeah, speed is really killer. In preparing for this, I took a look, and it hasn't been my, uh, department for a while, but I took a look into our Intercom data. And, you know, for context, we have always really invested in customer service, and have got great people doing it. And they actually do, you know, show up with a smile and interact with customers in a really authentic way. And our customers have always really appreciated us for that, and specifically, you know, called out individuals on our team an awful lot. And yet, you know, one thing that it is impossible for the humans to do at the level that the AIs can do, is the speed of response. So, we typically respond in two minutes, which for a relatively small company with a small team is pretty good. And yet it's just enough time for the person to tab away and do something else. And then when they get our response, you know, they tab back five minutes later. And the next thing you know, we're at 30-plus minutes to get to resolution, even with everybody being attentive and doing a good job. And the AI's ability to just be there immediately is, you know, a pretty dominant, uh, advantage that we're not gonna catch up to anytime soon.

speaker_2: It's true. I'll tell you a couple of funny stories about that though. If it's too fast, people don't think it's really gonna give them the right answer; they assume it's probably, like, a pre-baked, crappy, automated macro answer. So if it's too fast, people don't quite trust it. Um, and if it's too slow, but still way faster than a human, let's just say 20 seconds, people will be pissed off because it's like, "Hey, this is AI." And the reality of where the technology is today is that if you actually gave it a couple of minutes, as opposed to requiring that it responds in seconds, which I have pressured my team a lot to make sure that it did, it can actually do a better job if you give it a little bit more time. We've seen that with, um, you know, the latest models that are really just the same foundational technology applied, you know, at run time, to just have the permission to think a little longer. Um, so the response time thing is actually super nuanced. And it also relates to people's understanding and expectations of AI. The one thing I will say, though, is that despite in some senses my pessimism about how quickly the base models are gonna improve, I do think that they will get far faster and cheaper. And so as we build all these technologies together, uh, we will be able to probably do the run-time stuff just quicker. We'll be able to provide that level of response in the amount of time that people expect.

speaker_0: How do you think about... If you kind of zoom out from where the product is today, and the, you know, particular architectural decisions that you've made, and all of the sort of history. This is, in some ways, still the most human thing relative to what AI is good at: how do you know when to be willing to consider changing the paradigm? You mentioned voice, which is obviously a, you know, quite different paradigm than chat. And I think another big trend that I'm sure you're thinking about a lot is what I call the choose-your-own-adventure style of agent, as opposed to the highly structured, scaffolded, optimized, you know, controlled input/output, more like a workflow agent. Seems pretty clear that you started with a very, you know, methodical, task-decomposition-based approach to getting every little step in the chain working well... and it seems like, you know, Anthropic and others are now kind of trying to push, like, well, you don't need to do that anymore. Hopefully we're gonna make this thing so good that you just kinda give it some tools, and it'll choose the right tool, it'll kinda figure it out. Um, are you doing, like, radical A/B testing where you have these kind of two different paradigms in competition with each other? And how do you think about, you know, if and when it would be time to make such a big paradigm shift?

speaker_1: Yeah.

speaker_2: I'll say one quick thing, which is that we're gonna share at our Pioneer event in New York, which is our big customer event... what date is it, Fergal? October?

speaker_1: The 8th of October.

speaker_2: October 8th. We're gonna share basically, you know, what we've been working on and what kinda comes next. And part of that is gonna be a paradigm change because I think that there is a big opportunity for stepping back a little bit and thinking more broadly about the problems that we're trying to solve. And I think that the, the, the kinda answer bot, Q&A bot thing is fine, but it's, it's still quite limited. So I don't really wanna say more about it than that, but, uh, these things will need to become more agentic, and there's big opportunities for that. But, you know, th- f- Fergal can probably talk to some of the A/B testing that we do do, but it... I don't think it tends to be very radical. There's probably a spectrum from radical to micro-optimizations, and I know that we do stuff in the middle at the very least. Fergal?

speaker_1: Yeah, I'm really excited for that announcement, eh, at Pioneer. Look, to your question, uh, specifically at kind of a technical level, yeah, it's a very interesting time for architecture. So firstly, let me say that your synopsis of Fin is correct. Fin originally, very deliberately, was, or still is, this building-block-style architecture, where you use the models as building blocks, and then you carefully isolate them, and you carefully test them and optimize them. That's worked really well for us. And we've invested in training our own custom models for some of those building blocks recently, and they've been more performant and, you know, way better than, uh, some of the third-party models we were using. So, that's true. And some of those building blocks are definitely durable. So in a RAG system, you need a retrieval engine, you need a re-ranker, you need a few other pieces like that that are definitely going to be durable. But I do think there is an interesting architecture switch or push on the cards in future, and Anthropic and others have been pushing it. I can tell you that we're pretty confident that switching to that today would reduce the quality of Fin at the question-answering task versus where it currently is. And so, like, we're pretty happy that, you know, for the highest quality, um, we are where we need to be at the moment. But we have built prototypes there. And yeah, if we get to the point where we have, um, a next-generation architecture of the form you're talking about, yes, we would A/B test that in the end. That is ultimately how we would get a sort of full-spectrum analysis of, you know, exactly its trade-offs and its strengths and weaknesses compared to our current architecture. And that's how we've done architecture migrations in the past. Internally, we're probably on generation four of core Fin at the moment, and for each architecture change, um, we've done something like that. Uh, and some we've A/B tested in production and at scale. And some architecture shifts have been comparable in size to sort of what you're talking about there. So yeah, we'll see. Um, but, uh, as Eoghan says, we will have things to share at Pioneer that are quite relevant to this, uh, from a product perspective. So, yeah.
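Fergal's description of Fin's original design amounts to a pipeline of isolated, individually testable building blocks: a retrieval engine, a re-ranker, and a generation step. A minimal sketch of that shape is below; every class name and the toy scoring logic are illustrative assumptions, not Intercom's actual implementation.

```python
# Hypothetical sketch of a "building block" RAG pipeline: each stage is an
# isolated component that can be evaluated and swapped independently.
# All names and scoring logic here are illustrative, not Intercom's code.
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float = 0.0


class RetrievalEngine:
    """Toy keyword-overlap retriever standing in for a real vector search index."""

    def __init__(self, documents: dict):
        self.documents = documents

    def retrieve(self, query: str, k: int = 5) -> list:
        q_terms = set(query.lower().split())
        scored = [
            Passage(doc_id, text, score=len(q_terms & set(text.lower().split())))
            for doc_id, text in self.documents.items()
        ]
        return sorted(scored, key=lambda p: p.score, reverse=True)[:k]


class ReRanker:
    """Placeholder for a cross-encoder re-ranker; here it just prefers shorter passages."""

    def rerank(self, query: str, passages: list) -> list:
        return sorted(passages, key=lambda p: (-p.score, len(p.text)))


class AnswerGenerator:
    """Placeholder for the LLM call that drafts the final reply."""

    def generate(self, query: str, passages: list) -> str:
        context = " / ".join(p.text for p in passages[:2])
        return f"Based on our docs ({context}), here is an answer to: {query}"


def answer(query: str, retriever: RetrievalEngine, reranker: ReRanker, generator: AnswerGenerator) -> str:
    candidates = retriever.retrieve(query)          # block 1: retrieval
    ranked = reranker.rerank(query, candidates)     # block 2: re-ranking
    return generator.generate(query, ranked)        # block 3: generation


if __name__ == "__main__":
    docs = {"refunds": "Refunds are processed within 5 business days.",
            "billing": "You can update billing details in Settings."}
    print(answer("How long do refunds take?", RetrievalEngine(docs), ReRanker(), AnswerGenerator()))
```

The practical benefit of this structure is that each block can be evaluated, swapped, or A/B tested on its own, which is what makes the kind of architecture migrations described above measurable.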

speaker_0: Cool. Well, we'll stay tuned for, uh, the next big update there. Um, when it comes to the impact that Fin has had on your business, as far as I know, I think it was the first pay-per-outcome pricing scheme that I remember seeing, at least in my, you know, corner of the world: the famous 99 cents per resolution pricing model. When we first scheduled this, I was like, "I wonder if they still have that?" And sure enough, you do. How is that going? Like, if you could go back and do that all again, um, would you, you know, do it the same way? Would you price it at the same point? Um, 'cause obviously, like, a lot has changed under the hood, right? Like, per-token prices are way down, but the amount of tokens put in for, you know, context is, at least in many use cases, way up. Plus you've got thinking tokens, which means outputs are way up. So I don't really know how to guess how the cost basis has evolved, other than to say there are clearly strong forces pushing in either direction. Uh, but having anchored at that price point, you know, and probably being pretty reluctant to move it, you've kinda gotta manage against it, I would assume. So, uh, what can you tell me about what's going on under the hood? And then, you know, what would you say you've learned, and maybe, you know, any advice you would give to people that are thinking about a per-outcome pricing model?

speaker_1: So, yeah, exactly as you said, um, you know, core LLM costs have definitely been falling. Cost per token has definitely been falling. I guess the converse thing for us is that, like, Fin is doing a lot more work now than it used to two years ago when we launched it, right? So it's resolving a much higher percentage of your inbound volume, but also it's resolving harder questions. And so, yeah, we have to spend more token budget on that, and we're putting more and more things into our RAG system. And then there are all these other kind of supporting AI systems around Fin. So we have things like our insights product. We have, you know, ways we kinda double-check the content that goes into Fin. Our chunking strategy is way more complex. And, you know, we've just added other pieces to kind of, you know, really make Fin better over time. And so, yeah, I do think, absolutely, the core, um... you know, cost per token has come down a lot. But Fin is doing a lot more work than it used to, as we're trying to, like, push up the resolution rate more and more. And, like, we're constantly struggling. Our big North Star, the thing we really care about, is, like, you know, what percentage of your inbound volume are we actually resolving? Under Eoghan's direction, we're investing a lot in, like, making the product better and better and better. And that's the thing we're pushing towards, rather than optimizing purely for cost. So yeah, um, that's kinda how I would say it from the applied level.

speaker_2: Uh, I'll tell you a little bit about our story and thinking. But first, I wanna set us apart from some of the narrative in the market where people are talking about negative gross margins. When we launched Fin, I was told it was gonna cost $1.21 per resolution. And we decided, for reasons I'll speak to in a moment, that we must be more aggressive. And we just really liked the attractiveness of $0.99. And so we actually took a hit on each resolution at the start. But pretty quickly we turned that dynamic positive, and then achieved software-level gross margins. And so, any narrative that these AI products, and particularly the agent products, need to have, you know, negative or even crappy gross margins is clearly not always true, 'cause we've shown that to be the case, even while there has been pressure, because we're doing more work. Um, philosophically, we wanted simple pricing, and we wanted pricing that would map to value. We learned a lot of hard lessons from having complex pricing in the past, because we used to do a lot of things for different types of customers. And so we had many aspects to our pricing models that tracked different metrics. And it just became a nightmare for people to track and follow. So we really have been simple to a fault, and probably far simpler than we need to be. There will come a point where, for certain classes of resolutions that save their reps an hour, or multiple hours, of work, we will charge more. It'll still be super cheap. Uh, I have no idea what it will be. Is it $1.99? Is it $2.99? Is it $3.99? Is it $4.99? I don't know. But we'll always make sure that it is easy to translate from a value perspective, so that it's always a no-brainer for customers. Our own work showed that we spend $26 per resolution, all in. So that's salaries, and everything like offices and benefits and all that stuff. And we've come across companies that say that they are as low as $5. We've come across some companies that say that they're lower. We don't think that they're doing the analysis correctly. And so we're highly confident that $0.99 is just a great deal. And what we really, really like about it, not only that it's simple or a great deal, is that it also aligns our incentives. If we charged for some sort of synthetic token, or for every conversation, it would actually not be in our interest to make Fin more effective at solving customer problems. Look, if we charged for every query we got, we're good, we're golden, we don't need to get better. Whereas if we charge per resolution, every time Fergal's team increases that by another percentage point, we make our customers way happier and we make our CFO way happier. Um, so it's just a beautiful alignment. The biggest cost to this outcome-based pricing is that it's novel. And so, there's a bit of education. Um, we were certainly the first in the market. My favorite part of that story is that two other companies also announced that they were first in the market. There was Zendesk, and I think Sierra did the same, although I need to double-click on Sierra. But they made a big deal about outcome-based pricing. Um, but we were certainly the crazy first people to do that. And, uh, we did experience an education cost.
But when we went back and surveyed customers, and tried to find out if they would prefer per-conversation costs, which some of our competitors do, um, they said no. Um, and so we think that outcome-based pricing makes sense for all those reasons. And I can't see us, uh, about to change it. Um, and certainly historically, when you read some of the kinda academic, um, approaches to pricing, there's a great old book I read years ago called Pricing on Purpose, and that analysis showed that value-based pricing always yielded a higher profit than cost-based pricing. Even though cost-based pricing has a baked-in profit, the problem is that it doesn't properly price discriminate. And, um, people who get more value from your product than others will pay the exact same. But when it's value-based and people say, "Okay, yes, I pay more for it doing more work," they're happy for it to do so, because it perfectly maps to the things that are important to them. Yeah, that makes sense to me. And we're trying to move toward, uh, value-based pricing and outcome-based pricing as much as possible as well. The one thing I would say is that it's not just an academic problem. So you can come up with all these beautiful theories about how this is simpler and maps to value, but if the education and the friction that comes from that is too much, it might totally fail in the market... and so you need a degree of kind of artistic license with your pricing, where it's not just science. There's real art. And you need to be willing to kind of leave money on the table in a bunch of cases. For example, we have this really deep insights product, the best in the market, that helps you understand exactly what's happening in your business using these modern AI technologies. Uh, but to this date, we've decided to not charge for it, because we know it makes people better at using our product and it helps them get a higher resolution rate out of it. And so that's an example of where the science has been ignored at Intercom for the sake of simplicity, and the art in pricing, which is making something easy to comprehend.
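To make the per-resolution economics described above concrete, here is a small worked calculation. It uses only the figures mentioned in the conversation ($1.21 cost at launch, the $0.99 price, roughly $26 all-in for a human-handled resolution); the later serving cost is a hypothetical placeholder, since the actual number wasn't shared.

```python
# Illustrative per-resolution economics using the figures from the conversation.
# COST_LATER is an assumed placeholder, only to show how margin flips positive
# as serving costs fall.
PRICE_PER_RESOLUTION = 0.99
COST_AT_LAUNCH = 1.21
COST_LATER = 0.30                # assumed, for illustration only
HUMAN_COST_PER_RESOLUTION = 26.00


def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (price - cost) / price


print(f"Margin at launch: {gross_margin(PRICE_PER_RESOLUTION, COST_AT_LAUNCH):.0%}")   # negative
print(f"Margin later:     {gross_margin(PRICE_PER_RESOLUTION, COST_LATER):.0%}")       # software-like
print(f"Customer saving per resolution vs. human: ${HUMAN_COST_PER_RESOLUTION - PRICE_PER_RESOLUTION:.2f}")
```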

speaker_0: Yeah. Well, it sounds like you feel like you nailed it, and it certainly has a beautiful simplicity to it. So I, um, I can see why it's working. I, I... We don't have too much time left. I did, um, in my, uh, prep for this, I had two things which kind of converged on the same, um, outcome for Waymark as a business. One was I went to the, uh, dev platform docs, and if I could make a customer request, it would be to create an LLMs.txt, if you haven't, uh, in the last like six weeks since I did this. But what I did at the time was sent, um... I guess it was ChatGPT Operator, I don't know if they'd introduced Agent at exactly that moment. Sent it against the docs and said, "Visit all the pages on this docs website, put them all into a Google Doc for me." I think I got to, you know, however many 100 pages as it kind of went back and forth copying and pasting. Then I took that, put it into, uh, Gemini and said, "Create a consolidated, you know, just what I need to know, cut all the crap, you know, that's repeated from this." And it did that and gave me basically an LLMs.txt. And then I went to Claude and said, "Code up this data export for me so I can do an analysis, uh, outside the product." And Claude, by the way, one-shotted that, and so I got my data exported into a CSV, and that was pretty cool. And then I went into the product and I was like, "Oh, I may have kind of actually already created this with these background insights, uh, product-type experiences." So I guess the main quest- two questions there would be, are you seeing people doing a lot more casual hacking against the APIs? And also then, your own proprietary product work, how have you... You told a little bit about it, uh, in terms of just making people better. But I was kind of surprised to see that there was so much happening in the background, that I wasn't even really aware of until I went looking for it. So I'm kind of interested in the, in the product philosophy there, to the degree that there's more that you haven't, um, already said.
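For readers curious what the kind of quick data-export hack described here might look like, below is a rough sketch: page through conversations from a REST API and write a few fields to CSV. The endpoint path, auth header, pagination scheme, and field names are assumptions modeled on Intercom's public API conventions and should be checked against the current developer docs.

```python
# Rough sketch of a one-off conversation export to CSV. Endpoint, headers,
# pagination, and response fields below are assumptions; verify against the
# current Intercom developer documentation before relying on them.
import csv
import requests

API_TOKEN = "YOUR_ACCESS_TOKEN"          # placeholder
BASE_URL = "https://api.intercom.io"     # assumed base URL
HEADERS = {"Authorization": f"Bearer {API_TOKEN}", "Accept": "application/json"}


def fetch_conversations(max_pages: int = 10):
    """Yield conversation records, following cursor pagination if present."""
    url = f"{BASE_URL}/conversations"
    params = {"per_page": 50}
    for _ in range(max_pages):
        resp = requests.get(url, headers=HEADERS, params=params)
        resp.raise_for_status()
        data = resp.json()
        yield from data.get("conversations", [])
        next_page = data.get("pages", {}).get("next")
        if not next_page:
            break
        params["starting_after"] = next_page.get("starting_after")


with open("conversations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "created_at", "state"])
    writer.writeheader()
    for convo in fetch_conversations():
        writer.writerow({k: convo.get(k) for k in ["id", "created_at", "state"]})
```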

speaker_1: I, I, I, I don't know if our API usage has increased.

speaker_0: Okay.

speaker_1: I would say, like, you know, you're describing a pretty sophisticated story there. Like, you're not an outlier, but you're a sophisticated user of AI in our customer base when you tell that story. Um, and so it might just be a little early yet. Uh, absolutely, internally within Intercom, we're really pushing ourselves to be early users of all these products. I remember being really blown away by Claude Code recently, when I saw it first. And, you know, uh, we definitely tried to up-level our own team to be good at those things. We've really built a big insights product, and there's an awful lot of AI in the insights product in order to deliver it. And so I guess I'll say two things on that. The first thing I'll say is, you know, we built that with sort of, you know, custom or semi-custom AI, rather than doing something like just expose everything to, you know, a deep-research-style environment. We did that because we needed to do that to hit the quality bar, the performance, the efficiency. We have some customers that have very high conversation volumes. And, you know, all the models have context lengths, and the context lengths get high, but the fidelity decreases when you try and use those very long contexts. And so, uh, in our opinion, the way we built that was still the right way to build it, which is very much: we use LLMs to process the conversations into kind of like topics and sub-topics, and then we use sort of a BERT-style, you know, medium-sized language model to deliver it at scale. And we think that's still the right way to build an insights product today. If you are a medium or large customer, you can't just throw everything into a big context window. Like, we've experimented with that, but the quality of output degrades and you end up with a product that looks nice superficially but isn't really reliable. So we think that today, to get a really industrial-strength insights product, you've still got to go the way we went. Um, now you're kind of like, "Oh, there's a big insights product there I hadn't really heard about." And yeah, candidly, um, it is difficult to market the depth of the technology. We have this very broad product. It's very deep. And, you know, we, like everybody else in this space, are struggling to communicate against all... You know, you mentioned earlier keeping track of AI. There's just so much noise in this space, and, you know, we're still struggling to find the right way to tell customers, "Look, we've built a deep product here." And clearly you can see that and you can recognize the depth there. And, um, a lot of folks, uh... and so we're still working, we're still up-leveling our product marketing game all the time in this new world of AI, to get great at telling people what it is. Um, I'm just really relieved that we've managed to improve our marketing of, like, complex queries and procedures recently. We had deep technology there for, like, a long time that, you know, people weren't ready for, or we weren't doing a great job at explaining it. But I think, you know, I think insights is a deep product as well. Yeah. Eoghan, I'm sure you have thoughts on that.
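The two-stage insights design Fergal describes, where LLMs distill conversations into topics and a smaller BERT-style model then applies those labels at scale, might look roughly like the sketch below. The stubbed llm_label function, the model choice, and the example topics are all assumptions for illustration, not Intercom's pipeline.

```python
# Illustrative two-stage insights sketch: an LLM labels a small sample of
# conversations with topics, then a cheaper embedding model + classifier
# applies those labels at scale. All names and labels here are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression


def llm_label(conversation: str) -> str:
    """Stand-in for an LLM call that assigns a topic to one conversation."""
    return "refunds" if "refund" in conversation.lower() else "billing"


# Stage 1: expensive LLM labeling on a small sample.
sample = [
    "I want a refund for my last order",
    "Why was my card charged twice this month?",
    "Refund still hasn't arrived after two weeks",
    "Please update the billing address on my invoice",
]
labels = [llm_label(c) for c in sample]

# Stage 2: cheap, scalable model trained on those labels.
encoder = SentenceTransformer("all-MiniLM-L6-v2")   # medium-sized BERT-family encoder
clf = LogisticRegression(max_iter=1000).fit(encoder.encode(sample), labels)

# Apply to the full (here: tiny) conversation volume without further LLM calls.
new_conversations = ["My refund request was ignored", "Invoice shows the wrong amount"]
print(list(zip(new_conversations, clf.predict(encoder.encode(new_conversations)))))
```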

speaker_2: No, only to say that I agree with you that, um... the depth of usage is just a problem amongst all software vendors. You know, it's customers discovering the full breadth of the things vendors have poured blood, sweat, and tears into. It's just a problem. And anyone who imagines that, you know, some better onboarding pop-ups are the solution is completely wrong. I mean, yeah, the amount of attention to go around is just at a bare minimum. There's just so much contention for that attention. Um, it's just a giant problem. So this is where some of these kind of high-touch, you know, more enterprise-oriented businesses are benefiting now, because the vendor is stuck and stitched into the organization. The organization has committed very large amounts of money to use a product, and so the vendor gets the benefit of their attention and an opportunity to train them on the whole thing. But if you're kind of more in the upper end of mid-market, bottom end of enterprise, where we are, um, having everyone discover your features is hard. That said, you also want some degree of progressive discovery. You can't overwhelm people, and most people do find this, uh, functionality, as you finally did. And doing so brings a level of connection and kind of commitment to that product, um, that is invaluable. And it'll be very hard for you now to go and move to one of our competitors, 'cause they don't have this damn Insights product. So it's not a solved problem. And trust me, it's something that causes me a lot of heartache, because you've got great people like Fergal killing himself to build stuff. And then it's like, "Well, don't look at the data, but only 26% of people actually use that feature."

speaker_0: Yeah, that was... I mean, OpenAI just said that, I think, only 7% of people were using reasoning models before GPT-5. So you're in good company there. Let's, um, maybe zoom out from all the AI product work, and I really appreciate you getting so deep into the weeds with me on that. Um, and just think about the rest of the company. You mentioned you've kind of slowed hiring on, um, human agents and Fin is doing a lot more of the work. Is there a similar impact when it comes to hiring junior developers, and how are you handling the adoption across functions? You know, is there a sort of "everyone must use AI" kind of mandate? Is there, you know, a budget where people can kinda, you know, go forth and spend it on whatever tools they think are interesting and cool? Uh, or have you picked, you know, something that is sort of the official stack? Um, I'm just really interested in how you're thinking about all that. And also, I guess, if I could tack on a 1B: does this all mean that you will be, uh, potentially even more ambitious in terms of expanding into sort of adjacent niches? Like, one thing I do wonder about is whether companies with a certain platform will see... You know, maybe in the past I had to be focused, but now I can kinda layer on, or append, these, um, adjacencies and actually feel like I have a good chance of making it work, because I get all this productivity benefit from AI. So there's a lot there. You can take it in any direction you like. But, um, just mess-

speaker_2: I have a number of things to say to cover that. I think Fergal can probably speak to developer efficiency, and there are certain things we're doing there. First thing I'll say is that I'm actually in two minds on the topic of AI usage in organizations. On one hand, I think that companies that don't adopt AI broadly will become the dinosaurs of this industry. And I believe that young companies that are just AI-native from the get-go have great advantages and a lot of efficiency, and can move more dynamically and maybe even be more creative. That said, there's just so much AI out there, overfunded, overhyped. There's, like, 10 products in each category. And I don't trust, you know, a company of 1,200 people, um, with a big budget, um, to properly discern what's net valuable or not. If they're told, "Use more AI," you can be guaranteed they'll buy a bunch of crap. And I've already seen us adopt some big AI platforms with big fancy names, and I'm like, "What, do we need this?" And probably we did, and maybe I'm, like, the old guy, and I'm too grumpy and jaded and cynical. But I'm quite certain that in a couple of years we'll have a more nuanced view of what AI was outstanding for and what was kind of, uh, a bit of a pipe dream. So that's the broad piece. Uh, I'll speak to the idea of going into more areas and spaces. So this is part of what we're gonna start to talk about soon, but, like, it's just patently obvious to us that people are not gonna have multiple agents talking to their customers. It makes no sense. You're gonna need, um, a coordinated approach to solving customer problems. Um, if you have a sales agent with one set of goals and a service agent with another set of goals and an onboarding agent with another set of goals, they're gonna be competing against each other. And if they come from different vendors, they'll have different styles, different approaches. They may even have different interfaces. How can you track goals and effectiveness across all those agents, and really coordinate and orchestrate them? It's not gonna happen. It's not gonna happen. So it's quite obvious that, for Fin, which is the leading service agent by customer count, revenue, and performance, and which wins all of its head-to-heads with our direct competitors and in our benchmarks, we need to also be, um, the leading customer agent. And so we'll work through the entire customer life cycle, uh, and make sure that not only do people not have to... split between different agents, but we can finally realize our dream of this beautiful, high-touch concierge experience for every single customer, where they have the attention and personal treatment that we've all dreamt of giving our customers, which has never actually been possible. So that's the way in which, very holistically and in a very qualitative way, um, these agents are gonna be so much better than humans. We spoke earlier today about how they're faster, but that's actually quite quantitative. You're gonna see pretty soon, when we really properly push the fact that Fin is a customer agent, and many people are doing things with it today, and we have it deployed in some other use cases, which we'll talk about at our event... you're gonna see people then really start to get excited about the experiences that they're having, you know?
Uh, incredible levels of attention and personalization. So that's the thing that we're really focused on. When it comes to engineering efficiency, what I'll ask Fergal to speak to, if he has any take on this, is the fact that our CTO, Darragh Curran, decided some number of quarters ago that we were gonna actually, uh, 2X our engineering output and our efficiency. And I thought that that was quite cool. Uh, I actually didn't have a good read on how doable or not that is. Some people on the outside of the company said, "Oh, that's easy." But I actually think that's probably a pretty lofty goal. Have you any take on that, Fergal, on how far we are from getting there?

speaker_1: Yeah, I mean, it's pretty funny. Earlier you talked about just mandating overall improvements and, like, people are gonna go and buy crazy tools. And look, you know, the 2X thing I definitely had mixed feelings on. Because, you know, we certainly put it in the perf system in some form, and yeah, that can always be a dangerous thing to do, right? You know, people will start optimizing for it in the wrong way. But on the other hand, it's like, there's a change in the world and people are resistant to change. People will not properly adopt a new technology because they're busy and change is hard. And so, you know, you do need to kind of say, "Hey, this is a priority for us as a company. We are carving out time, we're gonna make you focus on this." And so I really appreciated the push of it. Within my group, the AI group, you know, in one way it's like, "Well, we're all AI technologists, so we're gonna be great at adopting this new thing." But on the other hand, it's like, "We're all AI technologists. We better not take it for granted, we better not assume we're great at adopting this new thing." And I remember using Claude Code. I came across it online really soon after it was first publicly released, and I played with it. And it took me out for that week. I spent all that time that week, like, hacking on it late into the night, building prototypes and just being like, "Wow, there is a change here." And then I went and demoed it to the AI group at my all-hands on Friday. And, like, some people were pretty skeptical. Some people were like, "We already use Cursor." And I'm like, "Yeah, Cursor's great, but this is different." And you know, so I think people do need a push. And you have to push yourself constantly, because it can be fatiguing. And, you know, you have to experiment, and a lot of those experiments will be wrong. And like, look, you know, how close are we to 2X? I don't know. I don't even know if it's achievable. And it's one of these soft things. I know for a fact there are things that we do now that we just wouldn't have done before, due to the reduction in friction. I can, like, hack together a prototype. A designer can hack together a prototype. And so there are definitely qualitative changes there. You could fight endlessly about the quantitative impact, you know? There's the bottleneck question: if you improve part of the system, how much does the system overall improve? But, like, there's no doubt that there are incredibly powerful tools here, and it is a mistake to ignore them. You've got to engage with them positively and optimistically. You've got to try it out. And then, yeah, you can't be dogmatic about it. You've got to be pragmatic and stern in the end, and be ruthless, and like, "Okay, is this a toy? Or is it really valuable?" And if it's valuable, double down on it. And I think that's it. That's the sort of explore-exploit process. We want all our people running. We want them to not be stagnant. We want them to be curious and optimistic and, like, trying out the new technology. And then be rigorous about thinking, "Did this really make me faster or did it make me slower?" And when it works, double down, and double down on similar things in the future. And I think, with Darragh's push, the company is moving to a posture like that. Um, there's skepticism, but there's optimism too.
And I- I think that's probably the ideal take, optimism and skepticism together.

speaker_2: Hm.

speaker_1: Yeah.

speaker_0: I think that's a great note to end on. This has been a fantastic conversation, guys. I really appreciate, um, the level of depth and detail and, uh, simultaneously the level of strategic vision that you've been willing to share. So, definitely we'll keep our eyes open for your upcoming announcements at the Pioneer event. You wanna tell us when and where that is again?

speaker_1: It is October 9th in New York City, and it's gonna be a very exciting event.

speaker_2: And it'll be live-streamed also, for anyone that wants to watch it online.

speaker_1: New York City and on the internet.

speaker_0: Cool. October 9th. Um, thank you again, Eoghan McCabe and Fergal Reid from Intercom. Thank you both for being part of the Cognitive Revolution.

