In this episode of The Cognitive Revolution, Nathan interviews Michael Boyce, Director of DHS's AI Corps, about bringing modern AI capabilities to federal government. We explore how the largest civilian AI team in government is transforming DHS's 22 agencies, from developing shared AI infrastructure to innovative applications like AI-powered asylum interview training. Join us for an insightful conversation about the intersection of artificial intelligence and public service, and discover why AI professionals should consider a career in government.
Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse
SPONSORS:
Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognit...
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance, with compute costs 50% lower and outbound networking costs 80% lower than other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
80,000 Hours: 80,000 Hours is dedicated to helping you find a fulfilling career that makes a difference. With nearly a decade of research, they offer in-depth material on AI risks, AI policy, and AI safety research. Explore their articles, career reviews, and a podcast featuring experts like Anthropic CEO Dario Amodei. Everything is free, including their Career Guide. Visit https://80000hours.org/cogniti... to start making a meaningful impact today.
RECOMMENDED PODCAST:
Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more.
Apple: https://podcasts.apple.com/us/...
Spotify: https://open.spotify.com/show/...
CHAPTERS:
(00:00:00) Teaser
(00:01:00) About the Episode
(00:03:38) Introducing Michael Boyce
(00:05:49) What is Homeland Security?
(00:09:52) History of AI at DHS
(00:13:15) Generative AI at DHS
(00:16:03) Structure of the AI Corps (Part 1)
(00:18:17) Sponsors: Shopify | SelectQuote
(00:20:51) Structure of the AI Corps (Part 2)
(00:22:04) Opportunities for AI at DHS
(00:25:34) Bureaucracy Hacker
(00:30:34) The Manager's Role
(00:35:24) Sponsors: Oracle Cloud Infrastructure (OCI) | 80,000 Hours
(00:38:04) Internal Chatbot Project
(00:43:28) AI Role Playing for Training
(00:49:55) A Request for Startups
(00:57:46) Generative AI for Quality Check
(01:03:20) AI Training at DHS
(01:06:07) Metrics and the Future of AI
(01:13:26) Non-Generative AI at DHS
(01:19:08) AI and Automation at DHS
(01:23:03) Join the AI Corps
(01:28:39) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/na...
Youtube: https://www.youtube.com/@Cogni...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...
Full Transcript
Michael Boyce: (0:00) I'm not here from Silicon Valley to save anybody, but what I am here to do is to bring a bunch of really smart folks, both from inside and outside the federal government, into focus on AI to partner with the people who want to be partnered with. The most important thing is to listen. The most important thing is to learn. These are complicated bureaucratic structures, and relationship building, all those soft skills are just as important as the hard technical skills when you're trying to make progress here. If you don't think federal government agencies are interested in leveraging AI, I have a bridge to sell you in Brooklyn. They are very interested in doing it, but they're also looking for companies with really strong expertise and who can really get to niche services with niche perspectives. You kind of can't actually beat the level of difficulty and challenge of the problems because the problems aren't only technical, although there are many technical problems. They're societal, personal, politics. You're dealing with, by many definitions, the largest organization in the world and figuring out how to navigate that is an incredibly complicated and challenging piece.
Nathan Labenz: (1:00) Hello, and welcome back to the Cognitive Revolution. Today, I'm excited to share my conversation with Michael Boyce, director of the Department of Homeland Security's AI Corps, which is bringing modern AI capabilities to one of the largest departments in the United States federal government. While I've spent the last year exploring AI R&D applications and impacts throughout the private sector, how exactly the US civil service is approaching these technologies has been admittedly a bit of a blind spot for me. But as I learned in this conversation, there is already quite a lot going on. The DHS AI Corps, which Michael leads, is meant to become the largest civilian AI team in the government, and they're responsible for building both shared AI infrastructure and a range of special purpose applications to help the department's 22 different agencies begin to capture the benefits of modern AI. As a 10-year member of the department himself, including four years as a frontline refugee officer, two as a bureaucracy hacker at the US Digital Service, and four more as chief of innovation and design at the US Citizenship and Immigration Service, Michael combines a long-standing passion for technology with a deep understanding of both the incredible potential and unique constraints of bringing AI into government work. In this conversation, we discuss how the department was formed, what its IT infrastructure and AI adoption has looked like historically, how the department approaches information security and privacy now, and what all of this implies for its ability to use the best AI technology available commercially today. Overall, I was impressed by how reasonable and balanced the approach seems to be. We also get into the details of specific AI applications that Michael and his team are developing, including a training application in which officers conduct practice asylum interviews with a large language model playing the role of the asylum seeker.
Finally, Michael makes the case that more AI researchers and builders should consider a career in government service, which he describes as a rewarding opportunity to work with unique datasets, tackle complex consequential problems, and earn what I found to be a surprisingly competitive salary, all while serving the public interest. As always, if you're finding value in the show, we'd appreciate it if you take a moment to share it with friends, write a review on Apple Podcasts or Spotify, or just leave a comment on YouTube. Your feedback and suggestions, including for more AI leaders inside the US government, are welcome too. You can contact us via our website, cognitiverevolution.ai, or feel free to DM me on your favorite social network. Now I hope you enjoy this window into how one of the United States' largest federal departments is adopting AI with Department of Homeland Security AI Corps Director Michael Boyce. Michael Boyce, of the Department of Homeland Security AI Corps, welcome to the Cognitive Revolution.
Michael Boyce: (3:46) Thank you. Thanks so much, Nathan. It's really great to be here.
Nathan Labenz: (3:50) I'm excited for this conversation. I'm on a very wide-ranging scouting mission to understand how people are developing and also applying AI in all facets of life and society. And the US federal government is honestly a bit of a blind spot for me. So I'm looking forward to learning a lot over the next hour and change and coming away with a much better understanding of what's going on in the US government right now. Just for starters, I know your position is a career civil service position and obviously we're in the midst of a presidential transition. So I wanted to just get the ground rules on the table from your perspective. What can you talk about? What can't you talk about? Is there anything that is a no-go zone? Just to make sure that I'm taking us in a direction that works well for everybody.
Michael Boyce: (4:40) Yeah, Nathan, again, thanks so much for having me here. Excited. The department's really excited to engage with this podcast, with the community of your listeners. Absolutely. I'm a huge podcast addict myself, and I don't often hear existing federal government employees talking, especially in the AI podcast sphere, about the awesome work that's happening here. So, again, really excited to be here. And you're exactly right on your question. I'm a career civil servant. So in addition to laws like the Hatch Act that define what I can and can't say, I know the recent election is high on people's minds, but really my role is not a political role. I'll be here through the transition. And so the focus of my conversation here will be on that core functioning of the government, and I'm not really going to get into politics as much on this call. I'm sure there's plenty of podcasts for folks who want to dive more into the politics side. That's going to be a little less my role here.
Nathan Labenz: (5:40) Yeah. We've dabbled only a little bit in that in recent times and don't plan to make it too much of a focus, but it is certainly on folks' minds as you recognize. So let me start with a really naive question next. What does the Department of Homeland Security do? I know from my research that it was created after 9/11 and combined a bunch of different agencies into one, and that can be your springboard into telling us what it's like today.
Michael Boyce: (6:08) It is a great question. And I'll say, for me personally, since the age of 25, I've never received a salary from any organization other than the Department of Homeland Security. So to me, it feels very natural. But, of course, when I'm trying to explain it to folks, usually, the ideas that come to people's minds are Homeland or something they've seen in the media or the news. The Department of Homeland Security, I think, at a really high level, and we can quibble over the Veterans Affairs hospitals, but at a high level, it is the largest civilian agency in the federal government. There are 260,000 people that work in it, and I think actually some additional contract, non-permanent career staff as well. And then within it, we have 22 agencies and offices that make up the department. So if you ever want to play the most boring cocktail party game in the history of the world, you can name different federal agencies and ask someone, is that in the Department of Homeland Security or not? So I'll list off a few that I think probably many listeners have heard of. The Federal Emergency Management Agency, that helps us prepare and respond to natural disasters, is inside of it. The Coast Guard, that mans our maritime border, is part of it. The Secret Service, that protects the president, key individuals in our political system, or visitors, is also part of it, as well as a whole bunch of other missions. Our immigration agencies are, of course, a critical part of the Department of Homeland Security, as well as the agency that interacts the most with the American public on a daily basis, the Transportation Security Administration. Relevant to the techies who listen to this, we also have our Cybersecurity and Infrastructure Security Agency that deals with not only protecting our critical infrastructure, but protecting our cybersecurity both for the federal government and for the public.
And I probably missed a half dozen other really incredibly important departments. What I would say on that though is, if you want to think on a high level, at least the way that I think about it, the Department of Homeland Security fundamentally is about preventing and then responding to when bad things happen. I mean, there are some exceptions, but we are about borders. We are about security. We are about preventing. And then we also have a large service and delivery aspect. So we determine, among other things, who can become a citizen in the United States. We determine what goods and what customs, what packages, what shipments can come in and out of the country. We also determine when people, going back to FEMA, when people survive a disaster, the benefits that they're entitled to. So I think while we are also the largest federal law enforcement agency, we also have a huge wing of it that provides public-facing services and benefits in a civilian capacity, which you wouldn't think necessarily if you don't know very much about an organization named the Department of Homeland Security. But that gives a broad overview. It's a pretty interesting, fascinating organization.
Nathan Labenz: (8:59) Just quickly on the 260,000 employees, I assume the ratio of career folks that will be there providing continuity through the transition is like 99.9% plus, and it's just a handful of people at the highest level that are going to be replaced with new appointments. Is that accurate?
Michael Boyce: (9:22) That is exactly right. So I sit under our Chief Information Officer's office. And so of his roughly 700 federal employees, and I think it's around maybe 3,000 employees if you include our contract partners, there are three politically appointed individuals in that whole organization—himself and then two of his senior advisors. So that gives you a rough sense of the breakdown, and that's been my experience when you move across the different components of DHS. That's right.
Nathan Labenz: (9:52) Gotcha. Okay. So the department is roughly 20 years old, and that makes it older than the generative AI moment for sure. And you've been there for a number of those years. How about a little history just for context? I think people have a sense, when my dad tries to call the IRS, which he occasionally does, that the federal government is not on the cutting edge of IT, let's say. So what is a brief history of information technology and earlier eras of AI application at DHS?
Michael Boyce: (10:35) Sure, absolutely. And then here, of course, my policy government brain immediately kicks in. Because when you say AI, I want to define AI and make sure we're talking about it precisely. Using most of the standard definitions of AI, the department generally sees itself as having used AI for roughly the last 10 years. And actually due to a previous executive order, actually not the most recent one that came out, what is it now, last year, but actually one under the previous Trump administration. Since then, most agencies and certainly all the big federal government agencies have provided very transparent inventories of all of their AI use cases. So if you go online to dhs.gov/data/ai_inventory, you will see, I believe it's over 40 right now, publicly available reports of here's how we're using AI in the department. So really, AI is not actually that new to the department. Now certainly when you're talking more to my tech colleagues, I think there are some ideas around how we're using technology in the government. The government is a massive organization, and so you get a massively different set of experiences. I really think DHS has come quite a long way. And in fact, I spend a lot of my time trying to recruit people to join the department. What I often tell people is, sure, if you have your two-person startup and you kick up your Mac and you can do whatever you want, you'll have some freedom there. But you might actually be pretty pleasantly surprised at some of the capacities that you get when you are in a massive organization like the Department of Homeland Security that has access to some of the most rich and interesting data in the world, has some of the most interesting missions in the world, some of the most complicated problems. So I would also say that DHS has always been quite forward-leaning in terms of its technology. Probably for most people's day-to-day lives, they see it when they go to the airports. Right? We interact more with the American public than any other department in the federal government, and that's because of the airports. So you probably notice when you come back from an overseas country and you go to the CBP kiosks where you have to scan your face, or now more and more, and it's optional, but same thing with TSA: when you put your ID in the scanner and then present your face, it's checking your ID against your face to make sure that there's a match there.
We've been using AI for a long time and we're constantly innovating around it in unique spaces like this to both make sure we're protecting the homeland and also to ease a lot of the services and the critical things that we're providing. I can probably go on and on, but that's a pretty good summary of how I think broadly about AI in the department, its history there.
Nathan Labenz: (13:15) Yeah. It makes sense. So I guess the big phase change that has brought so much more attention and investment and people into the AI space has been, I would say, the shift from purpose-built narrow systems that do one thing, where you try to get it to do that thing as well as possible, but you have a pretty good sense that it's not going to do anything else besides that thing. Right? Your facial recognition systems are only going to do facial recognition. And now we're moving into this era of generative AI where they're general-purpose and open-ended. Also, I mean, not that the old systems weren't prone to mistakes, but the new ones are prone to different kinds of mistakes, can be quite unwieldy. That I imagine must present a ton of different challenges. Let's maybe start with, and I actually don't know the sequence of events. Did the AI Corps get created first or did the executive order come first and the AI Corps got created downstream of that? But what's the recent history of executive decisions and implementation?
Michael Boyce: (14:17) You know, it's so funny that you bring this up because before I took this job, I was on loan from DHS to an office called the Office of Management and Budget, which is part of the White House and focuses a lot on the internal administration of the government. And I remember getting there in what must have been late 2022. I joined the internal AI policy team. And back then, you talked to a lot of people who would say, oh, that's interesting that you're doing this fun little fad. And then right after that ChatGPT came out, and so I watched as inside of the government, just as outside of the government, the tenor and the interest in this built and built and built. I think purely from a sequencing perspective, the department announced the AI Corps back in February. And so that's after the recent executive order, which called for a national AI talent surge. And I think the department was really proud to help support a national AI talent surge with, frankly, what will be the largest civilian AI team in the federal government. We really want to put our money where our mouth is. This was not about posting one or two more AI positions. We really believed in bringing in a significant amount of AI talent. And while certainly there are many different needs from policy and governance, I think the focus of my team was to bring in experts who could help with the delivery and the mission implementation piece. So we can definitely get into this more, but I think the department, while there are absolutely risks and things we have to do smartly with these new technologies, the department wants to leverage generative AI and these new paradigms of AI to help make sure that our mission is moving forward while, of course, protecting civil rights, protecting privacy, and all of these key pieces. But we can definitely get into this more.
Nathan Labenz: (16:02) Yes. How are you guys structured? My sense is that it's a kind of internal consultancy or a center of excellence sort of model.
Michael Boyce: (16:15) That's a great question. So myself, I've been a career civil servant but had the chance to work at another team called the United States Digital Service. And actually, our Chief Information Officer, Eric Hysen, was also a member of the United States Digital Service. So this whole effort is partially Eric's brainchild; he really wanted to bring that same model of, I think an internal consultancy is a nice way of putting it, of outside experts, also even some career civil servants we now elevated to focus more on AI, but a focused team that will bring an injection of talent focused on AI into the department with a service implementation and delivery perspective. We do have what I call our roaming consultancy. So as we mentioned, the department is 22 offices and agencies. So probably about half of our team, which we are targeting right now and still building out, will be teams of two to five going in, partnering hand in hand with our component partners to deliver on some critical AI priority. Another fifth of the team roughly will work on scaling AI adoption, which really today mostly means policy and governance and oversight. I think over time, it will also mean shared infrastructure and other critical things that are truly cross-cutting across the department. And then the last piece is what we're calling our incubator. So in addition to the other two pieces, we also want to make sure that we're centrally owning some of the products and some of the AI delivery. It's good for a number of reasons. We have the ability to take on the risk of new types of AI ideas that a more operational office doesn't have the time to play around with and see what might be the art of the possible. I'm also a deep believer in, if we can't walk the walk and talk the talk, what are we doing going out to other people and telling them how they can better deploy AI?
So we have these three structures that we're molding around right now, and that's been a good model so far. And we've been able to make initial progress, and the team has only been around for six months. A lot of initial progress to advance on all three of those fronts.
Nathan Labenz: (18:18) Hey. We'll continue our interview in a moment after a word from our sponsors. So I've never done anything like this at the governmental level, although I did, still waiting to hear back from the city of Detroit where I got connected to a friend of a friend. And I was like, hey, if there's anything I could do, let me know. It's been crickets there. But even in my experience working with companies in the private sector, a challenge that I often encounter is, is it being demanded or is it being pushed on people? I feel like there are three, you can maybe even identify more, but you've got the presidential level executive order saying, we got to do some stuff, or at least we got to make a plan to do some stuff. You've got the people that are trying to do their jobs that might have an itch or an instinct that, hey, maybe AI could help. I've been hearing a lot about this. Then obviously, you have your team that's presumably evangelizing and trying to show off cool stuff.
Nathan Labenz: (19:21) What is dominating, if anything is dominating? How would you characterize, maybe a better way to ask it, where the opportunities are coming from? Are they being pushed on people? Are they being pulled from leads, or are you guys having to go out and inspire?
Michael Boyce: (19:35) You know, it is such a great question. I think the unfortunate answer to that question is yes, it's all of the above. And it's because the department is such a diverse and complicated organization that you really have real pockets of grassroots innovation. You have more senior leaders who have a vision and a path that they want to go on. And so, I mean, one of the reasons that I think I was brought into this role is I have been in the department for a while. I think they really wanted to bring in somebody who has a track record of being able to deliver in the federal government context, but also knows the department, knows what the challenges folks are facing. I'm not here from Silicon Valley to save anybody, but what I am here to do is to bring a bunch of really smart folks, both from inside and outside the federal government, into focus on AI to partner with the people who want to be partnered with. And I will say, look, from a broad perspective, I mean, just as there is outside of the government, inside of the government, there is tremendous demand to use these tools. What is it that they say, that ChatGPT was the fastest adopted online website in history? I rarely go past a day, especially when I'm talking to folks who I'm meeting for the first time, who don't mention, hey, Michael, I have an idea or I've been playing with ChatGPT or I've been playing with Claude or I've been playing with some other product, and I really think we could do this, this, and this using AI. But, with these broader structures, I mean, this is, I think for anyone who's thinking about working in the government, working in public service, why I've always found it so compelling: the challenge is there too. It's not just, okay, it's great to have a great idea, but there's this whole complicated set of the broader people problems, organizational problems.
Well, okay, you have a great idea, but how do you make good business cases to get the budget for that good idea? How do you structure it in the organization such that it's sustainable? Maybe you make a pilot and then it dies and never goes anywhere. And all of those pieces of how do you go from this moment of I have a good idea to now you actually have a thing that is working, by the way, that also protects people's privacy and protects critical information and protects people's rights and is sustainable and aligned with the direction of politics and also is a good steward of taxpayer dollars, how do you make all those pieces come together whenever you have a good idea? So that's also been part of the challenge. And so if we went through various different scenarios, there'd be some where we source the grassroots and we bring it up. There'd be other times where we start at the top really, and then use that as a chance to go down. The last thing though I will say about this is DHS actually, in addition to my team at AI Corps, recently started an entire directorate focused on customer experience. And so that focus that DHS has on customer experience, focused on human-centered design, also impacts and affects everything that we do. So when we're working, this is not, if you've ever worked in a large organization with legacy systems, you know the kind of, there's a committee writing waterfall requirements. I mean, my team will be cross-functional teams, product managers, designers, engineers, talking to users, understanding their needs, of course, working with stakeholders, getting business input, but we are very much committed to making sure the way that we deliver is focused ultimately on solving the end user's need, whether that be an internal end user or a public-facing end user as well.
Nathan Labenz: (23:01) So how do you square all those circles at the same time? I noticed in your LinkedIn that you, at one point, had a title of bureaucracy hacker, which is one of the better titles I've come across. What does it mean to be a bureaucracy hacker? And I guess maybe two, there's also a question around your own background in AI. It seems, and maybe I'm missing something, but it seems like you have not been a long-term AI guy. I actually think that's fairly normal in that there's not a lot of experts running around, especially in these new AIs. I personally always say, Malcolm Gladwell says you need 10,000 hours to be an expert. Nobody has it. So we're all just still on our way to any sort of real expert status. My guess is that it probably would be more important to be able to wrangle the bureaucracy or hack the bureaucracy than to know the latest prompting techniques. But yeah, tell me how you've ramped up on AI and how you're wrangling all these different challenges.
Michael Boyce: (24:06) Yeah, I think the background question's a great one, and let me go to that. I am certainly not an AI guy. By the standard definition of AI guy, I probably do not fall into that one. I will say, to do a deep cut, in high school, I went to a very weird, big public high school in New York City that had a really strong computer science program. Actually a friend of mine, this must have been back in 2007, wrote a neural net that we used to find Waldo, and we were, I think, semifinalists in the New York City Science Fair. So I always had a strong interest. Frankly, I went to college and then found it far more interesting to go to a seminar than to be late in the computer science lab doing problem sets, and so sort of moved away from that. But then when I came back to the government, I actually really found that technology was consistently where I thought I could have the most impact across the board.
And so my story, to continue to bore the listeners by telling it but at least provide some context, is actually my first job was I was an immigration officer who would actually go to different countries all over the world and determine which refugees were qualified to be resettled inside the United States. So I would sort of interview them, I would hear their stories, I'd make sure they're legally qualified. I also worked for the Department of Homeland Security, so I would also have to be involved in doing security checks and making sure that we don't have information that would lead us to not want that person to come to the United States. It was a very fascinating job, but while I was there, most of the problems I found myself interested in were problems of process and specific problems of technology. So that's where I did dust off my reasonably serious—I probably had about two to three years of college level computer science and just started writing code, frankly. And one thing led to the next. Ultimately, I did work for the United States Digital Service, which is sort of a kind of White House, they call it sort of a SWAT team for technology that would go to different agencies and try to improve the work. And then after that, my next step was actually going back to one part of DHS called U.S. Citizenship and Immigration Services, where I really oversaw technology for both sort of our overseas immigration, humanitarian processing, as well as the domestic asylum system. And there we did do some pretty major machine learning projects with an awesome team. We did some, I thought, very, very innovative work in machine learning. That was probably in the 2017, 2018 period. And then, a couple more steps, but finally ended up here. So again, big ramble, but to go back to your core question, where do I come from? I am technical enough to be dangerous.
On my computer, I've spun up side projects with, I don't know, Streamlit, or I'm trying to get my Next.js app to work with this other Node API that I spun up. But absolutely, I am not the person making the next frontier large language model. I have a team of people who certainly could do that, and I think I want to have enough technical credibility that when I'm talking to them, we can be speaking the same language. But ultimately, I do see my job as a guide. I'm here to help folks, as they come here, figure out how they can devote their energy to be maximally impactful and maximally helpful. Because I think folks, especially when they're new, want to come in and immediately show value and immediately add and immediately improve. And usually, I'm the one saying, the most important thing is to listen. The most important thing is to learn. These are complicated bureaucratic structures. And relationship building, all those soft skills, are just as important as the hard technical skills when you're trying to make progress here. So you're right. My job really is about being that guide, but I am proud that I've been able to dabble enough in serious ways in the technology that I don't feel totally out of my lane or out of my ability to be helpful.
Nathan Labenz: (28:00) Would it be fair to describe your role as kind of the manager's role? You might have what is called in Silicon Valley the manager's schedule, and then the people that work on your team are more the makers, and you're kind of helping focus their energy. And I know one of your priorities is recruiting for the team over time. So maybe paint a picture of the sort of protection and facilitation that you provide to the people doing the hands-on technical work.
Michael Boyce: (28:32) Absolutely. So I sit within our chief information officer's organization. I actually report also to our chief technology officer and deputy chief technology officer. That CTO role within the broader organization right now is sort of the key AI area at the Department of Homeland Security. A lot of our governance, a lot of our policy, a lot of our oversight comes here, and that's also why they wanted to put the AI Corps inside of it, because they also wanted that delivery arm, that implementation arm, to be tied in with those other critical parts of our AI work. And so where I sit is sort of between the broader strategic direction of the AI work and my staff who, as I mentioned before, are a very cross-functional team: ML engineers, data scientists. We have some more security-focused engineering roles as well as even some policy roles, because oftentimes it's a policy issue that's preventing us from delivering or from making progress or building relationships with colleagues. So there's that role. And then on our bigger projects, we'll usually have a team lead, a coordinator, someone running those larger projects. And then I have two deputies. Actually, right now I only have one permanent, but I essentially have a bunch of people filling in as I finish staffing up my team. One we call the deputy for delivery, who's really focused on the projects: What are we working on? What are we doing? And the other is focused on operations: this endless amount of work in terms of how do we build the team? How do we build the culture? How do we make sure people are talking to each other? How do we get people hired? How do we make sure we're getting the right budget? And all those sorts of pieces. So the other interesting piece that's been fun, but also frankly wild about this, and I've had an amazing team that has supported me the entire way, is we've grown extremely fast. So I started in April.
I had my first three employees in two weeks after I started in May, and we are now, I believe, at 33 people. And so any manager who's led a team that's basically added five people every month, if not faster, would know that that's a pretty decent scale for a team. And then in the government, people expect you to be a solid team. They expect you to have been there for 10 years, to know all the rules, to know all the relationships. And so we're both trying to grow really quickly, scale our own practices as a team that is getting large while also being responsible and organized sort of adults in this world of a broader organization where we're expected to be informed, be productive, know the rules, and that's been a really interesting blend. So what is my schedule? My schedule is I do try to keep quarterly check-ins with all my team. We'll see if that can last, but I have a fully remote team and I think it's really important to sort of check in on how people are doing at all times. I have your standard standing meetings with my bosses, including right now a weekly meeting with our Chief Information Officer. My other bosses attend where we review the project priorities as well as key staffing issues. I also maintain a lot of meetings and touchpoints, usually on a more monthly cadence with key component colleagues. We talk to the CISA Chief AI Officer every month. We talk to colleagues at CBP or colleagues at Customs and Border Protection, or we talk to colleagues at the Transportation Security Administration at TSA. And so by keeping that, I'm also keeping a pulse. And then of course there's any number of ad hoc things. But I will say one thing I've learned in the government is when it's very meeting-focused, when it's nonstop back to backs, is the importance of writing. And so what I have made my team do is when they have an idea for a really big project, they write me—we have a templated one- to two-page document that they have to basically write up for me. 
Then I tell them as much as they want to set up a meeting just to discuss it, I say, nope. You're going to send me that one- to two-page document. I'm going to review it. Usually, there's one or two people who check it first, and then we'll actually send it over to the stakeholders. And then I want the stakeholders to say, yep, that document looks good. And only when both sides agree or both our stakeholders and myself agree on that document, at that point, we can set up a meeting with our leadership to really pitch it and to try to move forward, devote the resources, that sort of thing. So we've also tried to set up some practices that are not purely just endless back-to-back meetings and constant conversation, but decision points and processes where people can be a little bit more thoughtful and really make sure they put in the time to figure out the appropriate next steps.
Nathan Labenz: (32:56) Hey, we'll continue our interview in a moment after a word from our sponsors. So that's maybe a perfect transition to getting into some of the nitty-gritty details of projects. I mean, so many different agencies and teams and various missions. What are some highlights of things that you have either deployed or are already working on but not yet deployed?
Michael Boyce: (33:22) Totally. Totally. So, look, the big thing that we are working on right now is an internal large language model chatbot, which, certainly to the listeners, no one is going to think is some incredible technological advancement. I'm sure half the listeners here could pull down 10 lines of Python and get that spun up in about 15 minutes. But we really believe that a key part of upskilling, and a key part of getting our staff to understand the possibilities, is just getting this in people's hands, while at the same time, going back to protecting privacy, protecting civil rights, and making sure we're protecting cybersecurity. Most of the work that my team has been doing as we've been rolling this out has focused not on that single API integration where you wire up the front end and pull responses back. Maybe you do some tricks to give the chat interface; obviously, you can just use the existing APIs, or if you want to do anything fancy with summarization or whatever else under the hood, sure, you can do all of those pieces. But what we've been working on is a lot of that key plumbing of logging. How are we connecting this up to our broader cybersecurity infrastructure to check for incidents? What is the infrastructure we're deploying it on? Because the main one that we're focused on is running on one of the hyperscalers' infrastructure. And so there, it's not that I just pull down exactly what I want. I want cloud infrastructure that has also gone through the federal controls, making sure that your information is protected, with the ultimate goal being that staff at DHS, literally thousands of staff, will be able to use this and put in information that is not classified, but that we're not allowed to make public, as well as things like people's personal information.
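The "key plumbing of logging" described here can be sketched as a thin wrapper that records every prompt and response before and after the model call, so the chat service can feed incident monitoring. This is a minimal illustrative sketch: `model_fn`, `audit_log`, and the in-memory `log` list are hypothetical stand-ins, not DHS's actual stack.

```python
import datetime
import json

def audit_log(entry, sink):
    # Append a timestamped JSON record; in production this sink would feed
    # the broader cybersecurity / incident-monitoring pipeline.
    record = dict(entry, ts=datetime.datetime.now(datetime.timezone.utc).isoformat())
    sink.append(json.dumps(record))

def chat(prompt, model_fn, sink, user="anonymous"):
    """Log the prompt, call the model, log the response, return the reply."""
    audit_log({"event": "prompt", "user": user, "text": prompt}, sink)
    reply = model_fn(prompt)  # model_fn stands in for the hosted LLM API call
    audit_log({"event": "response", "user": user, "text": reply}, sink)
    return reply

# Stubbed model for illustration; a real deployment would call a cloud API here.
log = []
answer = chat("Summarize this policy memo.", lambda p: "stub summary", log)
```

The point of the wrapper is that no request can reach the model without leaving an audit trail, which is the layer the team spends its time on rather than the model call itself.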
Because I think anyone who's done this knows the magic will be at that layer where it really gets into those core work processes, where we're analyzing the real data and moving forward. So that's been a really exciting project. I imagine around the time this is published, we'll have some really exciting news to share there. We have a number of different projects going on, but the one that I'd like to highlight actually isn't one that my team specifically did, though in the very end stages we helped shepherd it to the final point. Back in my old agency, USCIS, Citizenship and Immigration Services, they built something, and I love this use case, for our asylum officers. So an asylum officer is somebody who will do a long interview with someone seeking protection here in the United States. Usually it'll be a three- or four-hour interview with that person and their family: why'd you leave your country? This is similar to what I did when I first started working at the department. And when that officer is doing that interview, they're taking notes in a special application that we have that captures all the questions and all the answers. It's not exactly verbatim, but it's pretty close. So today, when we're training new asylum officers, what we literally do is pull other asylum officers off of their existing interview rotation and have them call into the training, or even go in person, to sit there and pretend to be asylum seekers so people can practice. And so one of the great applications that we've been working on, and that we released to incredible feedback, is that instead of having a human there, the trainee asks the question and the AI pretends to be the asylum seeker: this is my story, and this is the reason. And it's been a fascinating process because the department really wanted it to recreate the experience.
You're usually working through a translator. Things are being miscommunicated. You sometimes have to read legalese, and the person you're talking to might not even have an elementary school education. And so a lot of the work that the team did was crafting a set of prompts. We didn't do anything too fancy; it was mostly just writing really good prompts, but we did a lot of work on how the model responds. Normally, in the real world, if I said, why'd you leave your country? the person would respond, well, I was afraid. And then you'd have to keep going. You'd ask them questions. Well, of course, when we first started to create this application, it said, well, according to the Immigration and Nationality Act, section 104 subpart b. And so we had to really prompt it to make sure that it was actually recreating that experience. And I think that gets into how we tested, how we evaluated. A lot of that is a qualitative thing, because there isn't some objective measure for what recreates a real situation. So I thought that was just a very interesting process, but people have loved it. It allows you to change the scenarios on the fly. It actually provides a lot of consistency across scenarios, because you've given it a certain story or a certain template to tell, so you're not depending on a human actor who might go off script and all those sorts of things. So it's been a really interesting application. I have 400,000 other examples to give, but I'll leave it at that. I'm sure we'll get into it more.
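The prompt-crafting step described here, keeping the model in character and suppressing statute citations, might look something like the sketch below. The rule list and the `build_persona_prompt` helper are illustrative assumptions, not the team's actual prompts.

```python
def build_persona_prompt(scenario):
    """Assemble a system prompt that keeps the model in character as an
    asylum seeker during officer training (hypothetical sketch)."""
    rules = [
        "Stay in character as the applicant described below.",
        "Answer briefly and plainly, the way a frightened person would.",
        "Never cite laws, regulations, or legal terminology.",
        "If asked something the applicant would not understand, say so.",
    ]
    return (
        "You are role-playing an asylum seeker in a training interview.\n"
        + "\n".join(f"- {r}" for r in rules)
        + f"\n\nApplicant background: {scenario}"
    )

# Example scenario a trainer might swap in on the fly
prompt = build_persona_prompt(
    "Fled home country after threats; primary school education only."
)
```

Because the scenario is a parameter, trainers can vary the backstory without touching the behavioral rules, which is what gives the consistency across runs that a human actor can't.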
Nathan Labenz: (38:25) I mean, I think honestly, more examples is better. I'll just maybe ask a couple quick follow-ups and then invite you to go back for more examples. On the general-purpose chatbot, could you give more detail? You said hyperscaler. You can neither confirm nor deny if that's the right way to go, but if I had to guess, I would guess that it's Azure and that you are trusting the contractual guarantees that they are making around not retaining information and obviously not training on information. I think that's actually pretty important even for folks in the commercial audience, because it is still very common that people say, oh, our boss says we can't use whatever external API because of information security. I think you might even encourage folks in the private sector if they know that it's passed the federal government's checks.
Michael Boyce: (39:22) Totally. Totally. Well, look, the department, first of all—I mean, we talked about our four generative AI pilots, so I can definitely talk about those. I'm being, I guess, a teeny, teeny little bit cagey, but we definitely have AWS, Azure, and others of the big cloud platforms at the department. And to that point, we've been talking really publicly about our four generative AI pilots. In those, we actually tried to use frontier commercial LLM APIs, and, in one situation that my team was involved in, we also spun up our own open source LLM running on, I think it was, a cloud GPU. And I don't want to say never—maybe our Science and Technology Directorate—but I don't think we're going to create our own GPT-2, and other than that, I don't think we'll create our own GPU racks. But certainly we've spun up our own, so we've had a range of things. I mean, on your point about the APIs, you're right. This is something that's very interesting, because I think in the AI context, folks say, wait, but you must be reaching out to get that information. The federal government has been doing cloud and external APIs for a long time now. You can go back; there are all sorts of different federal strategies around this, cloud first and cloud smart. DHS has taken a really active role in that. So one of the big processes that I've had the chance to be part of—I'm actually still involved in it as a member of what they call our technical advisory group—is a program run by the General Services Administration called the Federal Risk and Authorization Management Program, FedRAMP, and that is, for cloud services, kind of the official certification that most agencies use as the gatekeeper to cloud APIs in the federal government.
The Department of Commerce's NIST, the National Institute of Standards and Technology, has a defined risk management framework, with special publications that define key sets of controls. Different cloud services have to get a third party to certify that they've implemented those controls, submit their results, submit the output of logs and other verifications, and then this is all reviewed by the FedRAMP program, essentially to make sure that we, to your point, are not just finding an API and sending some of the most frankly sensitive information in the world to any API we find. I think most people would not want to learn that the package they're having shipped, or maybe their personal immigration information or something like that, is just being sent to, oh, I found a fun API, I'm just going to ping it with your personal information. But what is happening is that these companies are spending a lot of money and a lot of resources to both implement the controls and then also prove that they have implemented those controls before the federal government can use their services. So anybody can go on to the FedRAMP marketplace and actually see all of the certified, compliant services that the federal government can use. And certainly, knowing the federal program quite well, I would imagine they'd love for more of those big companies to go through it. There are other standards, your SOC 2s, your other things like that, other ways of showing certification, but we have a very rigorous process. And I will say, I won't name who, but I was talking to a frontier lab that's looking at doing this process, and one of the individuals there said to me, I have to say it's a pain in the butt, but FedRAMP is one of the few processes I see that really actually adds to security. And so that type of feedback makes me feel good.
It's really hard to find the right balance of sort of trust and verification and letting things move quickly, but also frankly, this is important stuff. We have to protect people's information. But what I don't want people to think is that all of my staff are sitting there on their local machine or with a data center somewhere in the back office. I mean, we've been using cloud at the department. Frankly, I've probably done 30 to 50 software projects in the department. I've never done one that was not in some sort of fairly standard cloud infrastructure in my time here, and the same has been true for all my AI projects since I've started.
Nathan Labenz: (43:28) Yeah. That's good to know. Take note, folks at Fortune 5,000 companies. I think there are a lot of mistakes being made at the moment around way too much investment in setting up your own infrastructure. Sometimes that's in the name of cost savings, but that almost never pans out either, and it's just not really a great way to go for companies. And if the federal government can trust some of these enterprise providers, I think a lot of our listeners at mid- and large-sized companies should be able to find their way to doing that too.
Michael Boyce: (44:03) And, Nathan, I will offer to your listeners too: I'm sure they can find me on LinkedIn or wherever. If there is a small or midsize company that really wants to sell to the federal government, honestly, when you're trying to figure out how to go through the process, reach out. I'd love to connect you to folks. Sometimes it's just a matter of doing it. It seems daunting, but once you start to ask the questions, you can get enough information to evaluate whether it makes sense for your company to go through that process. I constantly talk to partners and companies that are working with the department who say, this has been the best job I've had since I've been at the company, because it's not just the profit motive that I feel like I'm working for. I'm also working for that broader mission: protecting the homeland, providing these critical services, keeping people safe. And so folks really get a kick out of that. Anyway, just an offer to folks who are listening.
Nathan Labenz: (44:51) Yeah, that's cool. I love it. Is there a request for startups? Are there needs that you would particularly hope people would come forward with the ability to solve?
Michael Boyce: (45:03) It's a great question. One thing that lowers the barrier to entry for startups is that the government doesn't just buy products; it also buys services. So one place where you do not need to go through these lengthy cybersecurity compliance and implementation regimes is competing for the service contracts the government offers. In the DC area and other key markets, there's actually a very vibrant culture of small business service providers offering services to the federal government, and there are a lot of flexibilities that make it far easier for small businesses. For example, if I have some money and I want to bring on some technology expertise, it's in many cases much easier for me to bring on a small business than a large business. And if you don't think federal government agencies are interested in leveraging AI, I have a bridge to sell you in Brooklyn. They are very interested in doing it, but they're also looking for companies with really strong expertise who can deliver niche services with niche perspectives that we oftentimes don't get, depending on the situation. So I think that's another great entry point if folks are willing to partner up.
Nathan Labenz: (46:21) Yeah. I mean, I'll give you one more follow-up and then just open it up again. You mentioned the project of essentially AI role-playing as an asylum seeker, and within that, the fact that it's not a super objectively evaluatable task. Right? There's not exactly a ground truth to evaluate things on. It's kind of, as we say in the AI space these days, a vibes question. How does that decision get made? And I suspect this may also connect to what a startup's experience would be if they wanted to sell into the department. Who makes that decision? Is it an executive decision? Is it a committee decision? Do you have to put things out for comment? I imagine it probably varies somewhat, but you've got this prototype, or you've got this small business that's pitched you a solution. How does the decision go from unclear to resolved? Michael Boyce: (47:20) Yeah, that's a great question. Look, I don't know in specifics, but we're dealing with such a complicated organization that there are going to be variations of all of these things. In the particular case of the agent that's role-playing an asylum seeker, that came about because DHS created what we called our AI Task Force, which is a governing body to figure out what our strategy would be toward both AI policy and some initial AI implementation, helping to implement the recent executive order as well as some additional policy that's come out since then. We wanted to, of course, make sure that we were also focusing on leveraging the technology. So we actually put out a call to all the components, with some money that we have at headquarters, to select three or four potential pilots that we would give additional money to, to help accelerate what they were already doing. Now, I think the team that worked on it had already been thinking about this idea.
Again, you have technologists across the federal government who are paying attention to this, who in some cases are dying to find the right opportunity to use generative AI, or not even dying, but it just seems inevitable given the problem they're trying to solve. And so they just needed a bit more motivation, a bit more prioritization, to say, okay, we have our backlog; let's push this to the top now that we have the money for it, we have that political support, this is really going to happen, and we have the attention and momentum. And that's really, I would imagine, true for any big organization. A good product manager or a good product team has a hundred different ideas, and what is the process to bring one to the top? In this particular case, we made a call to try these generative AI pilots, and this was one of the ones that was selected. Another one, frankly, was in our Homeland Security Investigations team, and that was a Llama project, an open source model project, and much more of a traditional RAG scenario. Homeland Security Investigations investigates things like child sexual abuse material, human smuggling, all sorts of different challenges like that. And they, of course, have significant unstructured holdings of potential leads and potential investigations. So there, we used generative AI with a fairly standard RAG pattern, though the member of my team who did it did some great work, both on the evaluation side and in how he structured his RAG pipelines, to really make it work for this use case, so that the homeland security investigators, the desk officers who are doing the research (if you watch Zero Dark Thirty, there are all these scenes where they're at their computers reading documents; it's the same kind of idea) can use this technology to make that work much more streamlined. What he did there was awesome. It's a laundry list of different things.
If I remember correctly, he first looked at your standard k-nearest-neighbor semantic vector search. Then he realized that for things that really show relationships, like phone numbers or names, bringing in lexical search as well and combining the two would be great. And then on top of that he added other techniques, like using Llama 3 (I think it was still Llama 3, not 3.1 or 3.2, at the 70B size) to judge the semantic relevance of what the RAG pipeline was pulling back. So it was a cool RAG implementation. He was really getting into the weeds, experimenting with the dataset to see how we could set up the pipeline in the best way possible, and it was another really exciting project. To go back to your question of how that got prioritized, that was another one of those generative AI winners. I think now we're past that phase. Different components are playing with different things. They're putting out contracts, experimenting. We're doing our own work or, in some cases, partnering with some of the components of DHS to move their generative AI projects forward. But how these things get prioritized really runs the full gamut. Sometimes it'll be an executive who has it in their brain. It'll also be a smart product team that makes a really compelling business case up the chain for some broader priority, and leadership says, okay, that makes sense, you can prioritize that one.
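The hybrid pattern described, semantic vector search combined with lexical matching for identifiers like names and phone numbers, can be sketched minimally as below. The toy vectors, the `alpha` weight, and the scoring functions are assumptions for illustration, not the actual HSI pipeline.

```python
import math
import re

def cosine(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def lexical_overlap(query, doc):
    # Exact-token overlap catches identifiers (names, phone numbers)
    # that embeddings often blur together.
    q = set(re.findall(r"\w+", query.lower()))
    d = set(re.findall(r"\w+", doc.lower()))
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query, q_vec, doc, d_vec, alpha=0.5):
    """Blend semantic and lexical relevance; alpha weights the two signals."""
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * lexical_overlap(query, doc)

# Toy corpus with made-up 2-D "embeddings" for illustration only.
docs = [
    ("Report mentions phone 555-0199 and subject J. Doe", [0.1, 0.9]),
    ("General smuggling trends overview", [0.2, 0.8]),
]
query, q_vec = "call records for 555-0199", [0.15, 0.85]
ranked = sorted(docs, key=lambda d: hybrid_score(query, q_vec, d[0], d[1]), reverse=True)
```

With purely semantic scoring the two toy documents are nearly tied; the lexical term is what pulls the document containing the exact phone number to the top, which is the behavior the combination is meant to produce.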
Nathan Labenz: (51:25) The importance of hybrid search, I think, is definitely something to watch continue to rise. I've been banging that drum every so often myself, so I appreciate that little insight.
Michael Boyce: (51:41) Well, to be a little bit more technical here, because I know I'm speaking somewhat in abstractions, I'm not sure that there's one RAG algorithm to rule them all. Once you know the data structure you're working with, it is really about experimenting. I tend to see a lot of success with that soft, qualitative, let-me-try-a-few-things approach. When you get some really good results coming back, you just sort of know that it worked well. Obviously, you can set up automated pipelines (certainly my employee who worked on that did do a lot of automated testing), but I think there is a little bit of trying different things, having hunches, and then seeing what comes back. And in his case, because we're looking for certain types of information to tie different holdings together, the lexical search really helped improve on just your standard, whatever, taking a cosine of the vectors or whatever it is.
Nathan Labenz: (52:37) I'm interested in, honestly, just more examples, maybe to prompt you a little bit on other possible deployments. I'm wondering to what degree people are getting any training across the board. Are they encouraged to experiment on their own in their own jobs, or does there have to be some sort of prior approval before somebody can use a product that's out there?
Michael Boyce: (53:02) Totally, all great questions. Maybe I'll make a broader statement about what we're seeing on the generative AI front. One interesting use of generative AI that folks don't focus on as much, but that comes up constantly in any large organization and that generative AI is so helpful for, is not on the automation side or even the knowledge management side, but on the quality-check and quality-improvement side. There are any number of offices—we have a project right now with one team, and I'm not going to name the exact team—where essentially they get reports of particular types of things that another team needs to review. And to do it today, the report isn't structured quite enough in the way it comes in that they can just automatically pass it into the other team's system to track it. So, of course, you have people sitting there hand-jamming the reports from one side into the other to transform them. And of course, when you're hand-jamming all day long, you inevitably make mistakes. So where generative AI is so great, as opposed to what happens in a bureaucracy, where you end up having a second-layer review or third-layer review or an ongoing quality-check process, is that I see a lot of opportunities where we'll be able to put an LLM there as a second pair of eyes, to double-check whether the hand-jammer, at 5 PM, rushing to get out, put the information in poorly. Of course, in some of those use cases, the LLM will be accurate enough that it can take that unstructured data and convert it to structured data directly. Those situations depend on the criticality of the information, how reliably we think the LLM could handle all situations, and how much you need that human set of eyes. But I see that popping up all over the place.
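The "second pair of eyes" pattern can be sketched as a field-by-field comparison between the hand-entered record and an LLM's extraction of the same source report, flagging disagreements for human review rather than making any decision. The field names and the `flag_mismatches` helper are hypothetical illustrations.

```python
def flag_mismatches(hand_entered, llm_extracted):
    """Compare a manually transcribed record against an LLM's extraction
    of the same source report; return fields a human should re-check.
    (Hypothetical sketch of the 'second pair of eyes' pattern: the LLM
    never overwrites data, it only queues discrepancies for review.)"""
    flags = []
    for field, value in hand_entered.items():
        other = llm_extracted.get(field)
        if other is not None and str(other).strip().lower() != str(value).strip().lower():
            flags.append((field, value, other))
    return flags

# Illustrative records: one field disagrees, so only that field is flagged.
manual = {"case_id": "A-1042", "city": "Dallas", "date": "2024-03-01"}
llm = {"case_id": "A-1042", "city": "Dulles", "date": "2024-03-01"}
review_queue = flag_mismatches(manual, llm)
```

Keeping the human entry authoritative and surfacing only disagreements is what lets this replace a second or third layer of manual review without the LLM ever making the final call.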
I think with all the focus on automation, when I talk to executives who are thinking about automation but are a little worried, because most of their careers have been about getting humans to do really high-quality work, one of the ways we've been introducing this technology is to say, hold on, we'll still have humans make the decisions. We'll still have humans do all the critical tasks, but you might be able to enhance the quality layer: next time you do a quality review, as opposed to being 94% accurate, we'll get you to 99% accurate across what you're doing. So that's another use case we've been seeing. And then, honestly, in a big organization like this, and I have to imagine other organizations are seeing the same thing, we see RAG being extremely helpful. I have a colleague, the CIO of the Air Force Research Lab, who talks about how RAG creates this very personal relationship between people and their data, which I think is a nice way of saying it. Before, you had to either go through your own documents and remember where you put things and keep them organized, or have a big team that could somehow spend a lot of time structuring it all. What's really neat about RAG is that people get really, really interested in why the system pulled back this set of documents here and that set there. So you're seeing a tremendous amount of interest in RAG from these big organizations from that perspective. I think sometimes the AI community gets ahead of itself; many people could say, well, couldn't you just fix the search in the first place? But I do think that RAG interface is a very nice and very accessible user interface that really draws people in. Let me jump to training and what people can use. I'm really proud of the role the department is taking in terms of empowering its employees. Actually, the recent executive order discourages agencies from blocking or banning generative AI.
Some agencies have blocked it, turned off VPN access to ChatGPT and things like that. DHS has not taken that approach at all. What DHS has said is: if you take the training, get approval from your supervisor, complete a couple of other things like your cybersecurity and privacy training, and sign a rules of behavior, you can sign up for a certain set of approved, commercially available generative AI solutions. So most employees in the department are able to log on to ChatGPT, log on to Claude, go into Bing, and use those generative AI tools and technologies. The rule, though, is that they can't put people's personal information in. So I cannot take an immigration application and drop it in. And that's why my team is working on the much more secure chat clone that folks can use. We're taking inspiration from other departments that have been really forward-leaning in this: the Department of State, the National Institutes of Health. I think the VA has done a lot of great work, and I know DoD has been doing a lot of great work in empowering their employees to use all of this, but everyone takes a training and everyone signs these rules of behavior. And then more broadly, literally moments before this recording, we've been doing brown bag sessions across the department, and I think we've had thousands of people across the department take brown bag sessions on how you can use generative AI and how you can apply it to your work. And especially once we have our internal For Official Use Only chatbot, as we call it, for more sensitive information, I really think opening that to the department will continue to democratize it and increase usage and understanding across the department.
Nathan Labenz: (58:17) What does that training look like, specifically the AI aspect of it? Can you measure that in hours, or in competencies that people are supposed to come away with?
Michael Boyce: (58:29) Yeah, I mean, that's a great question. I won't pretend to be the biggest expert in the training; I did take it when I started in the department. It's not a five-week-long AI training. It is an hour-long session for busy people who have important jobs and need to deliver for the public, and it covers: here are some great ways you can use it, here are some things you might not know about the technology that can go wrong, and here are some things that you absolutely can't do with the technology in the department. For example, you can't use it to make a final decision when you're determining public benefits. So if I take my immigration application, and I'm probably not even allowed to drop it into ChatGPT because it would have personal information, but let's imagine I could, I could not say, hey, tell me whether I should approve this or not. That is a huge no-no here in the department. I think this is also coupled with the fact that people are really interested in this. Civil servants are really interested in this technology. I know that colleagues in another federal agency, the General Services Administration, lead a government-wide AI training. I think it had 10,000 to 15,000 people across the government sign up last year, with multiple different tracks: a general informational one as well as one more tailored to leadership. And so folks are demanding trainings across the department. I know VA, DoD, and a lot of other agencies are really trying to train their staff. Most of the big departments I see do view these technologies as critical day-to-day internal technologies that we will use to get the work of managing the government done, and so they really want to get people used to them. The last point I'll make is about where my team fits in: we aren't really going to be the training deliverers. That's not, I think, how we're structured.
That's for other colleagues and peers at the department who are more focused on that. But I see our role in terms of bringing awareness as also just getting different types of tools to my colleagues across the department. For many colleagues, when we showed them that role-playing tool, the feedback was, how'd you get it so good? Many of them had heard of ChatGPT but never made the logical next-step jump to, oh, you can apply it to this very specific use case that's related to how I think about my work and get it to be so good. I think this is also partially due to a lot of the more procedural chatbots you see popping up; people imagine it to be the same thing. And so the more we can get the tooling into people's day-to-day work, the more you'll hear further ideas about, oh, I realize that's now a possible thing for us to do, and you'll see in turn more interest in it happening. So it's a nice virtuous cycle, and I think that's where I see us coming in from a training perspective.
Nathan Labenz: (1:01:18) What are the metrics that you ultimately hope to help move? I guess you probably don't define them, right? I suppose all the agencies have their own metrics, but I imagine it would be wait time at TSA and wait time in the immigration system, a lot of wait times. What else is on the list?
Michael Boyce: (1:01:35) Well, look, my biggest metric, and I'm sure we'll get to it in a second, is how many hires we make and what roles they were hired into. That's the first metric I'm most focused on. Because I'll be honest, I was given these 50 positions, which will be the largest civilian AI team in the federal government, at least AI-focused. There are bigger technology teams in the federal government, of course, but not purely focused on AI. And a lot of people told me, there's no way that will happen, Michael. You'll never find the talent; you'll never find the interest. We can talk about it more, but over 14,000 people have applied for these roles. So at least on the demand side, that has not been the issue. I think you're right, and my team teases me that I don't like metrics, which is not true. I do like metrics. But I want to make sure that the metrics are meaningful and actually driving decision-making, as opposed to being metrics that we make up after the fact. So look, the department cares about a lot of metrics when it comes to testing and evaluation. A couple of members of my team right now are working really closely with key folks in the department; I think the last draft I saw was a 30-page document along the lines of: if you have this scenario, here are some really key T&E things you can do, and for that scenario, here are some different approaches you can take. The goal is to give really strong tactical guidance on testing and evaluation and to make sure that we can set up automated pipelines as we produce AI products. And then the other point is that there are a lot of mission-oriented metrics, which in some cases are qualitative. So again, going back to that role-play example, a lot of the metrics we got there were: do you think that this was better than or equal to a human-level experience?
Do you think that this made you feel more comfortable before doing your very first asylum interview? Because what we're also trying to get at is that qualitative side of do you think these products helped you? I'm sure once we release our chatbot, the types of metrics that we'll be looking at are standard engagement metrics, top-line users, number of interactions. But we'll also want the qualitative feedback of, how important at this point is this work for you to get your job done? Is this just a helpful thing on the side that you use a little bit, or if we were to shut it off and stop funding it, would some critical process break? And so we're going to need to combine really hard metrics that are very objective with softer metrics that give us signals to whether we're on the right track and whether we're moving in the right direction.
Nathan Labenz: (1:04:05) I feel that for sure. I wouldn't call myself anti-metric either, but there's no substitute for just using the thing and reflecting on whether or not it was good. And that honestly can take you pretty far in a lot of contexts.
Michael Boyce: (1:04:21) Yeah. Sometimes I'll hear people say things like, we became 70% more efficient. What does 70% more efficient mean? How? Does it mean less time? Sometimes you'll see these metrics thrown out, often in terms of time reduction or money saved too. Even there, I often just want to dig in, because people will say, oh, this saves people on average five minutes, and they usually multiply that by the number of employees and the expected workdays, and conclude, oh, so it saved us $5 million. That's one of the other metrics I love to dig into, because no one's ever going to give those $5 million back. So what happened with those $5 million? To me, I am as interested in that emotional, cognitive-load question, is this moving in the right direction, as I am in trying to make things sound quantifiable that may or may not be. And that's the top line: we are looking at this across both the very quantifiable, very meaningful places, as well as these softer metrics that indicate we're making the right decisions.
Nathan Labenz: (1:05:29) I've got a few questions sequenced in my mind here. One is, what's the vision for the future of the department that's AI-enabled? Second is, are the models already good enough to do most of the stuff you need them to do, or are there things that they can't do? And then third, if we don't get there, if in five years we're mostly still doing things the same way that we are today, what is the most likely cause of that failure to transform?
Michael Boyce: (1:06:02) Yeah. I mean, I think the top line, and I think this is why I'm here and why we're bringing all these people in, is that the department does think this technology will be transformational. I think everyone listening to this podcast knows that that is likely to be the case. And hopefully one thing I'm communicating here is that civil servants know it too. People are not blind to this. People are taking notice in their own personal lives. Americans who work in public service, just like many people listening to this, are going home and using advanced voice mode and playing with Artifacts on Claude or whatever it is. And so it's hard for me to imagine a situation where this does not help with the transformation. I don't know, Nathan, in terms of what you want the models to do, I find them very powerful. Whether it's AGI or whatever else, we're not there yet; it's not Matrix or Minority Report technology here. But for a lot of the business use cases that we have, I don't often see the model being the issue so much as strong application development practices. It's just the time it takes the system to integrate these new technologies into its flows and to adapt. I think there's an endless number of possibilities and a huge amount of interest, and the main technologists working in government are excited about this technology. And I think where we would fail, therefore, is, A, if we can't build trust with the public that when we use these technologies, it will lead to better service delivery, so Congress doesn't fund us to do it and political appointees don't think it makes sense to invest in. And, of course, the other place we could fail is if we use them for ill.
If we don't put controls in to make sure that we protect privacy and civil rights and civil liberties and all these critical pieces. Or if we throw good money after bad, if we're not doing good design practices, good product practices, good engineering practices, so that the taxpayer dollars we think will lead to a more efficient federal government end up not being used in ideal ways. I think my team and the teams in the government that I see working on this are committed to making sure that doesn't happen, but those would be, I assume, the failure states. And I'll leave it at that.
Nathan Labenz: (1:08:24) Well, I was going to close with an opportunity to give the pitch for the Corps and get into a little bit of what it would look like to go through an application process and ultimately work there. Are there any other highlights, anecdotes, or specific tasks you're excited about getting people out of the business of having to do that you want to touch on before we go there?
Michael Boyce: (1:08:49) Yeah. I mean, the one piece of the AI portion of DHS that's pretty exciting and that we didn't talk about is the non-generative AI work. A lot of our focus has been on generative AI, and that makes sense; hopefully I gave a couple of strong examples, but we do a lot of other work. So I'll just touch on a couple of examples. One is that we do a lot of vision work. On the border, for example, there are a number of systems that track trucks moving in and out across the border to look for anomalous patterns. In one case, the system, which makes its assessment in about a second and a half, flagged a truck; the truck was pulled into secondary inspection, and they found 165 pounds of narcotics inside once they were able to go in there. So that's a very interesting use case, in my opinion. Another case in that vision space, which I think we see popping up all the time in the department, is FEMA in emergencies. We get these aerial photos after a disaster, and we have to assess where the damage occurred to figure out where we can do the response. We've been using vision models; I think it took us from having humans analyze over a million different images to spot where the damage happened down to something on the order of 70,000 images. That's much more manageable when you think about the cost of staffing all those people at full-time salaries. And then the last piece, because again, I know we don't want just generative AI: there's also a lot of great work happening on classifiers, your traditional kind of ML use cases, even with text. I think about my old agency, U.S. Citizenship and Immigration Services. On some of the projects I've worked on, we take in applications online, but we also make it available for folks to submit applications by paper.
And I think we still receive millions of paper applications every year. So we have huge facilities that are essentially working on bringing in these paper applications, processing them, scanning them, putting them into our system, and entering in basic information. What of course happens when applications aren't sent in through the internet is that nothing's labeled, right? They're all just piles of images. And so one of the key use cases we've been running for quite some time now is that one of our main case management systems has an image classifier that goes in, looks at the text as well, OCRs it, pulls it out, and then classifies it: okay, this is a birth certificate, this is a form for this benefit, or this is the slip or receipt the person got when they applied for this other thing. That type of work is the day-to-day work that ultimately means the public's saving money, the work's happening more efficiently, and we're making better, clearer decisions. So I did want to give just a few more broad anecdotes around AI too, because we've been so generative-AI-focused. But other than—
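The intake routing just described, OCR a scanned page and then classify it into a document type, can be sketched as below. This is a hypothetical illustration: the OCR step is assumed to have already produced text, and a simple keyword scorer stands in for the trained classifier; the labels and phrases are invented, not USCIS's actual taxonomy.

```python
# Assumed document types and indicative phrases (invented for illustration).
DOCUMENT_TYPES = {
    "birth_certificate": ["certificate of live birth", "date of birth", "place of birth"],
    "receipt_notice": ["receipt number", "notice of action", "uscis"],
    "benefit_form": ["form i-", "part 1. information about you"],
}

def classify(ocr_text, threshold=1):
    # Score each candidate label by how many of its phrases appear in the
    # OCR output; a trained text/image classifier would replace this step.
    text = ocr_text.lower()
    scores = {
        label: sum(phrase in text for phrase in phrases)
        for label, phrases in DOCUMENT_TYPES.items()
    }
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= threshold else "unknown"

page = "NOTICE OF ACTION - Receipt Number ABC1234567890 - USCIS"
print(classify(page))  # routes this page as a receipt notice
```

The value of even this crude router is the same as Boyce describes: once each scanned page carries a label, it can be filed into the right case automatically instead of by hand.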
Nathan Labenz: (1:11:45) Do you think those narrow systems get displaced by a more general system? I see that happening in a lot of corners, and it's for multiple reasons: often the accuracy is competitive or better, and the implementation is so much easier.

Michael Boyce: (1:12:02) Yeah. I mean, look. Do I see it being possible? Yes. I think one of the challenges is still going to be cost. The per-token cost of running millions of immigration applications through an LLM, never mind once I have the cloud provisioned and all these other security and privacy pieces I'm talking about, is still not there compared to a classification model that can run pretty cheaply inside the system. But, yeah, we are definitely looking at those types of things. One of the pieces, too, is that those core fundamental ML models are also so much easier to get up and running. Hopefully the same evaluation, the same set of test circumstances that we're using for the more traditional, straightforward ML cases, we can just move over to large language models, but I've seen certain situations where it's a little bit hard to map it one-for-one. So that can be another blocker, though probably a smaller one in the grand scheme of things once you really dig into the data. Again, that's one of the things my team is trying to do, and it's also relevant when you ask what the future of my team is: we're trying to get this into the hands of the agencies. There is a world where you don't necessarily need my team anymore because it's become such a day-to-day thing. You know, we don't have a DHS cell phone corps or a DHS internet corps, because everyone just uses those in their day-to-day work; they're standard technologies. So that is one of the different worlds I can imagine us operating in. I will say, I mentioned the incubator.
I do imagine that most, though maybe not all, of the work in my incubator will be generative-AI-focused, because those classic machine learning techniques are already pretty widespread across the department. If we're taking on the riskiest, most challenging use cases, it probably will be in the generative AI space, where we're going to want to focus our limited resources. Though I can imagine exceptions, especially if it's a tricky testing and evaluation situation where we want to try something novel, or something like that.
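The cost tradeoff raised above, per-token LLM inference versus a small dedicated classifier, can be made concrete with a back-of-envelope calculation. Every number here (document volume, tokens per document, prices) is an illustrative assumption, not a DHS figure or a real vendor price.

```python
def llm_cost_per_doc(tokens_per_doc, usd_per_million_tokens):
    # Per-document cost of sending the full document through an LLM.
    return tokens_per_doc * usd_per_million_tokens / 1_000_000

def fleet_cost(docs_per_year, cost_per_doc):
    # Annual cost at a given per-document rate.
    return docs_per_year * cost_per_doc

DOCS_PER_YEAR = 5_000_000          # assumed paper-application volume
LLM = llm_cost_per_doc(tokens_per_doc=4_000, usd_per_million_tokens=3.0)
CLASSIFIER = 0.0002                # assumed amortized cost of a small hosted model

print(f"LLM:        ${fleet_cost(DOCS_PER_YEAR, LLM):,.0f}/yr")
print(f"Classifier: ${fleet_cost(DOCS_PER_YEAR, CLASSIFIER):,.0f}/yr")
```

Under these invented numbers the gap is roughly $60,000 versus $1,000 a year, which is the shape of the argument for keeping cheap classifiers on high-volume routing tasks even when an LLM could do the job.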
Nathan Labenz: (1:14:05) This one could be a delicate one, but do you think there are roles that ultimately just get done better with AI? I'm thinking of, you mentioned spotting things, trucks at the border, and we've talked a little bit about the airport experience. When I catch a glimpse of the screen the security officer is looking at as they scan my luggage, I'm always like, man, I would get real distracted real quick in that job. God bless the people who are doing it, but it seems like the kind of thing that could be automated. I guess there's first a technology question, but I'll posit that if we're not there already, we're probably not too far away from being able to create a technology solution that is as accurate at identifying knives in carry-on bags or whatever. Is there a world where I walk through a staffless security checkpoint at an airport in the not-too-distant future, or does that feel impossible for some reason?
Michael Boyce: (1:15:07) You know, it's so funny. I just had the opportunity to go down to TSA. They have an innovation center outside of the Las Vegas airport. So if you ever fly into Las Vegas, try to linger, and maybe you'll be randomly selected to go through one of their future test facilities. TSA is definitely thinking about how to use these technologies. Nathan, you know, if you want to work on that, we still have vacant roles on the AI Corps. But look, people are thinking about ways that critical technologies can be automated and streamlined, especially in places where, and it's that quality piece we're really focused on, to your point, for now we want to make sure that we don't miss anything. And I see that as the next step. If I were to say we'll replace somebody, that gets into, A, our policy right now, which is that we are not replacing people for decision-making, for special benefits and critical decisions like that. And B, it's just thinking too far into the future. It also depends on our public engagement, right? We would really want to hear from the public, engage with the public, and do testing and evaluation. If we were ever to think about a critical piece like that, we'd want to do it very carefully. So I would start with the first step, which is: how can we use better AI algorithms to support the great TSA staff on our checkpoints, who work super hard? I had a chance to meet them, and they walked me through how they do it. They were like, I love that machine; this machine's good, but I really like the interface on that one. So they're really thinking it through, and they take it really seriously. But I do think there are ways we could use AI to further enhance the quality of the already pretty sophisticated technology on those machines.
I mean, the other thing I thought was interesting about personless checkpoints is that it's a numbers game on a lot of this. Sometimes the great thing about having humans is just the physical presence of someone saying, please put your shoes on the rack. When people don't have somebody there, they're going to dilly-dally and whatever else. So there's also this human element we need to think about, because if you add 10 seconds for every single person on a TSA line, you know, the line will run outside the airport. So it's a fascinating area. I know the TSA colleagues are thinking about it, and we have been engaging with TSA and looking for some interesting opportunities, so you're definitely right. But I will repeat, and I've already tipped my hand here, that I do think the quality use cases are the first step with these technologies, because not only is that safer and trust-building, but it's actually just a great way to test that it's all working. It's a great way to say, okay, well, why choose? Why not just improve how we're doing it? And it's something we're thinking a lot about.
Nathan Labenz: (1:18:00) Cool. Well, let's land the plane here with an overview of what you're looking for. You said you're halfway to 50 positions. What are you looking for? You mentioned that you do have a remote structure, which is intriguing. I suppose folks will also be interested in what salaries look like: are they market rates, and how does that get decided? So give us some of the logistics, and then we can zoom out for the big-picture pitch for the AI Corps.
Michael Boyce: (1:18:29) Totally. I'll also just give a pitch for folks to either go to ai.gov and look at the different available AI positions, or go to USAJobs, where almost all civilian jobs in the federal government are posted. You can set up an alert for positions that are tagged AI or have keywords like AI or machine learning in them, and it'll send you a weekly or monthly distillation of the opportunities. Just today, I saw an amazing opportunity to be an AI leader at Health and Human Services. For my team, you can go to dhs.gov/ai/join. We have 33 on board and are still going through the hiring process for the rest, which means I have a number of additional folks in selection, but it has been extremely competitive. As I mentioned before, we've received over 14,000 applications, so my acceptance rate is far less than 1% across all of the positions. And we've really been able to attract a range of folks. We are looking for application engineers with a backend focus, because as I was saying, a lot of this is not necessarily the ML layer; it's robust APIs, dealing with latency, dealing with fault tolerance. We have ML engineer positions, data scientist positions, and security engineer positions specifically focused on securing these systems. We have some roles for policy analysts as well as product managers and designers. There's also a catchall that we call technologists. So we're looking for all of these different roles to create a strong cross-functional team. And then the salary piece is interesting. I'm forgetting what our base salary is, but frankly, I think it's around $150K, though for my staff it usually tops out at around $230K a year, and that's in addition to all the benefits. You also get a pension if you stay three years in the federal government, which is a nice thing.
Nathan Labenz: (1:20:33) The pension kicks in at three years? I didn't know that.
Michael Boyce: (1:20:35) Well, I think it's three years. And so once you retire, you basically get free money every year after. There are some other nice benefits too: a 401(k)-match equivalent, that sort of thing. But what I tell people about the salary side is, look, if you're making $2 million a year, you're not going to want to take the cut down to $200K. That being said, we generally put our staff in the top 3% of American wage earners. No one here is poor. And I'll be honest, I've had pretty few situations where folks have turned us down over salary. I think the only situations where folks said, sorry, Michael, I'm not going to do it, were because of a vesting schedule they were on; they stood to lose their vesting if they left, and that would be the same with any company. So I personally haven't seen pay be as big an issue as some of the other things: just finding the job, understanding what it means, understanding the work that you're going to do. That's one of the reasons it's important that we speak publicly about what the work looks like. And it's really funny: some of my staff are career civil servants, some come from your sort of FAANG-type big companies, and some come from more of the startup world. And then I also have this category of folks, and maybe this is what I'll leave you on, who were in the private sector, Google, Meta, whatever it was, went into government a few years ago, stayed for a couple of years, left, and then missed the work so much that they applied again, and we've rehired them back into the federal government. Because what I always tell people is that what's cool about this is there are really interesting technical problems and amazing datasets.
You have this really nice culture of technologists in the federal government, but the last piece I'd say is that you kind of can't beat the level of difficulty and challenge of the problems. The problems aren't only technical, although there are many technical problems; they're societal, personal, political. You're dealing with, by many definitions, the largest organization in the world, and figuring out how to navigate that is incredibly complicated and challenging. People want to come back because they feel less bored when they work in the federal government; there are constant interesting challenges that keep them moving forward. So I appreciate you giving me the chance to make that pitch, because I do think a lot of technologists who like gnarly problems really like working in the federal government, or really any public sector role, and don't realize how interesting and how compelling it can be.
Nathan Labenz: (1:23:15) Cool. I love it. Well, that is a probably surprising pitch for many of our listeners, and it's been very interesting to get a peek behind the curtain at what you're doing at the Department of Homeland Security. Michael Boyce, director of the AI Corps at the Department of Homeland Security, thank you for being part of the Cognitive Revolution.
Michael Boyce: (1:23:34) Thanks so much, Nathan. Take care there.
Nathan Labenz: (1:23:36) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.