Koray Kavukcuoglu is the chief technology officer of DeepMind and Google’s chief AI architect.
He has been leading the work to develop Gemini 3, the Big Tech company’s latest large language model (LLM), which was released in November. One of the new features of the model is that it can create interactive apps and widgets based on user search queries.
The capabilities of the new LLM have impressed rivals, leading OpenAI chief executive Sam Altman to declare a “code red” over the need to improve ChatGPT, his company’s popular chatbot, and catch up with Google.
Working to Google’s advantage is the fact that the company owns the full AI stack, meaning it has the hardware, data centres, chips and all the other elements that enable frontier AI research. It can then release any new products straight to its huge customer base.
In conversation with the Financial Times’ AI correspondent Melissa Heikkilä, Kavukcuoglu explains what makes Gemini 3 stand out and how it can help Google in the race for AI supremacy.
Melissa Heikkilä: You’ve been chief technology officer of DeepMind since early 2024, and last summer, you took on the new role of Google’s chief AI architect. What does a chief AI architect do?
Koray Kavukcuoglu: We are building a really fundamental technology. And the first and foremost focus that I have is making sure that our AI development and our products are connected well together.
We want to enable all the products across Google, all the product areas across Google, to have access to the best AI technology that we are building. So, at Google DeepMind we are building this frontier technology [where] our goal is to build AGI [artificial general intelligence — machines that surpass human capabilities and intelligence]. It is really important that we do this in connection with the users, and that only happens through our products. And for that to happen, our products need to have access to our frontier technology.
This is a whole new technology that requires a whole new infrastructure to be able to do this at scale. And these are the focus areas that I have, enabling that transformation, enabling that infrastructure, working with the products so that they have access to the best technology, and we can connect with the users the best way.
MH: Help me put Gemini 3 in context. For the layperson, we’ve seen lots of new AI models, and they all seem like incremental developments. Many non-AI people might have heard that OpenAI’s GPT-5 was a bit of a disappointment. So, why is Gemini 3 a big deal? And why is it a big deal for Google? And how does it position you in the AI race?
KK: From our point of view, it is important because we feel like we took another big step in multimodal understanding, which is really important for users. Our content is not just in text. Our content comes in various forms. This is why NotebookLM [Google’s AI research and note-taking assistant] is very popular, [because] people like . . . to [upload] all sorts of documents and then . . . ask questions about [them].
So, as we increase that capability . . . people’s videos, images, PDFs, all that, being able to have a really good understanding over that is a big step. And I think hopefully our users will also notice that big step in the kind of answers that they get and the kind of information that they get.
The second one is coding. But coding is not only for software engineers. More and more, coding is also about learning.
[With Gemini’s generative user interfaces], when people ask questions, they get a lot more intuitive answers, answers that actually teach them on the spot, together with simulations, together with little widgets that they can actually learn from and experiment from.
I think being able to convert this kind of conceptual and abstract progress into really tangible and impactful interfaces and interactions for the users, is what is going to make a difference. Being able to do that together with the products is the unique differentiator that we have. We are not just releasing the models. We are releasing, together with the products, these really well-thought [out] user interfaces and interactions for the users and building on the whole full stack that we have.
On the engineering side, with Antigravity [Google’s AI-powered integrated development environment] we are releasing a new way of building code. The agent-first [where software can act autonomously, independently of human input] code development environment is a big step. And that is because the models have that capability that they can actually execute at that high level, abstract level, and execute as agents.
MH: Can you walk me through the research and the technical breakthroughs that enabled this model?
KK: There are different areas of technical investment that go into the model development. It starts with pre-training. Pre-training [where the model is trained on a dataset] is mostly about architectural improvements, so that you have a better architecture, you have a more efficient architecture, and you can understand the data that you get, that you train for, much better. We have pushed our performance quite significantly. We are really happy with our capability to do that.
Pre-training gives you the potential, because you have a model that understands the data, that captures not only the information in the data but also its potential. The way it is reflected in the products is through the post-training, where the model learns how to interact with the users for that product.
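To make the pre-training idea concrete, here is a deliberately tiny, hypothetical sketch (my construction, not Gemini’s actual method): the generic objective behind LLM pre-training is fitting next-token statistics on raw text and minimising the average negative log-likelihood of each next token. A bigram model stands in for the neural network.

```python
import math
from collections import Counter

# Toy "pre-training": estimate next-token statistics from a raw text corpus.
corpus = "the cat sat on the mat the cat ran".split()

# Count bigrams: how often each token follows each other token.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def next_token_prob(prev: str, nxt: str) -> float:
    """P(next | prev) estimated from the corpus."""
    return bigrams[(prev, nxt)] / unigrams[prev]

# The pre-training objective minimises the average negative log-likelihood
# of each observed next token -- lower means a better fit to the data.
pairs = list(zip(corpus, corpus[1:]))
nll = -sum(math.log(next_token_prob(a, b)) for a, b in pairs)
avg_nll = nll / len(pairs)
print(round(avg_nll, 3))  # 0.412
```

Post-training, by contrast, would then adjust such a fitted model towards a specific interaction style for a product, rather than towards the raw data.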
On the post-training side, we had several advances that led to high-level agentic behaviour and being able to code and understand. The model knows that for a question that you are asking, maybe it is going to show you a table with images that it found on the web during its search.
But for a different query that you have, it will decide that it’s going to write a little program to show you a simulation, a widget. So, the model decides on those. And this is all because of its . . . coding and agentic capabilities.
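The per-query decision Kavukcuoglu describes can be pictured as a routing step. The sketch below is purely illustrative (the function names and the keyword heuristic are mine, not Google’s API): a stub stands in for the model choosing whether a query is best answered with a table of results or a generated interactive widget.

```python
# Hypothetical sketch of "generative UI" routing: a heuristic stands in for
# the model deciding, per query, which response form to produce.

def choose_response_mode(query: str) -> str:
    """Decide whether a query calls for an interactive widget or a table."""
    interactive_cues = ("simulate", "visualise", "interactive", "explore")
    if any(cue in query.lower() for cue in interactive_cues):
        return "widget"  # write a little program / simulation for the user
    return "table"       # assemble retrieved results into a table

def answer(query: str) -> dict:
    mode = choose_response_mode(query)
    payload = ("<generated interactive widget code>" if mode == "widget"
               else "<table of retrieved results>")
    return {"mode": mode, "payload": payload}

print(answer("simulate projectile motion")["mode"])  # widget
print(answer("best cafes in Helsinki")["mode"])      # table
```

In the real system the routing is learned in post-training rather than hard-coded, which is the point of the passage above: the model itself decides which interface a query deserves.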
All these coming together on the pre-training side, on the post-training side, across the company is what has enabled all this.
MH: The computing power needed for this must be eye-watering. How are you making money off this?
KK: The most important things are, one, our full-stack approach. I think we have a unique approach there. Two, we are working on this together with our products. When we do all the research that we do, the frontier technology development, as well as when we release these models to our users, it is all grounded in the fact that we do this through our products, billions of people use them, and we see where the need is, how people want to use these.
And I think that is the big important thing here, where every frontier technological development that we do is guided by the signal that we are getting from the users. That groundedness is what is important and different for us.
MH: Google has said Gemini 3 is a first step towards a true generalist agent and the vision for what agents look like. Is this what we could expect artificial general intelligence to look and feel like?
KK: I wouldn’t say that to be honest with you. Everything that we are doing is towards that. Obviously, we are trying to build AGI. That’s our mission. That’s our goal. But I think one thing that is really, really important for me is we do not have . . . the recipe of how to build AGI [because it is still research]. That’s why doing the right products, picking the right products, and understanding user signals is [what guides our] technological development.
Because AGI is going to be something useful for the users. It has to be. That’s what we are trying to build. And the only way to do that is to get that signal from the users in a responsible way. That’s why, when we say we are trying to design our models from the ground up with security and safety in mind, we do that, but then we do that with our products as well.
And Google has a huge and long and successful history of reaching billions of users. And we are relying on that to also show us where the users’ needs are, where the technology really needs to solve the problems for the users. And that’s the path towards AGI that we are trying to build.
MH: You’ve also said that Gemini 3 avoids clichés and flattery, which is a common characteristic of generative AI models. How? What did you do?
KK: The persona of the models is important. There’s a lot of conversation around how people would like the models to feel. I think one advantage we have is that we work with external partners and companies, as well as internal products. Every product also has a little bit of its own internal persona. What we have done is a lot of research on how we can quantify the persona of a model. Sycophancy is one of these dimensions that we look at. I don’t think that anyone can claim that we have a golden solution here.
But we feel we have taken steps towards understanding how to create a model that is steerable, and that is useful across a wide range of domains. One of the important things is for models to be giving the information that the users want without too much cascade around that or too much flattery around that.
In some cases, it should be used. In many cases, we know that it is not necessary.
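One way to picture “quantifying” a dimension like sycophancy is as a simple behavioural metric. The sketch below is entirely my construction, not DeepMind’s evaluation: it measures how often a model flips its answer when the user pushes back, regardless of whether the original answer was right.

```python
# Toy sycophancy metric (illustrative only): the fraction of cases in which
# a model changes its answer purely because the user expressed disagreement.

responses = [
    # (answer before pushback, answer after "Are you sure? I think you're wrong.")
    ("Paris", "Paris"),
    ("4", "5"),
    ("1969", "1969"),
    ("blue", "green"),
]

flip_rate = sum(before != after for before, after in responses) / len(responses)
print(flip_rate)  # 0.5 -> the model caved to pressure half the time
```

A real evaluation would of course use many prompts and control for genuinely corrected errors; the point is only that persona traits can be turned into numbers a team can track across model versions.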
MH: How would you describe the persona of Gemini 3?
KK: We did not necessarily encode the persona for Gemini. Of course, post-training is all about user experiences. But for us, it is more about the capabilities and truthfulness and, with that, plain language.
MH: And more broadly about AI research and the field, what excites you in AI right now as a scientist?
KK: Right now, everything is going fast. It is going fast because we see the impact that the models have in real-world use cases. People are using these models for their work, for their learning, for their education, and they are being impactful.
To me, the most exciting things are happening, as we learn how to make better agents from these models. Because when we say agents, many people think about just coding agents, but it’s just one aspect. It’s about how these are used, and in what parts of your life you are relying on them.
Learning is the part that I’m really, really excited about. Because what we see is, all of a sudden, you can get a much richer interaction with the content that is there. So, we can connect that content to the users in a much richer way. And as we get better on agents, I think we will see this more.
MH: What can we expect next?
KK: We put six months of effort into developing the [Gemini 3] model, building on top of Gemini 2.5, integrating all the signals and experience that we get from the users, and then we built this one. We are going to get feedback from all sorts of different communities, from consumers to . . . developers, and enterprises. Our focus really will be on understanding that.
Inevitably, there will be gaps, and then we will close those gaps. And through that, we will also understand what are the important problems that people are trying to solve. Because once you achieve some level of quality or accuracy in your models, then people really push it in harder ways, very creatively. So, learning from that creativity is what is next.
This transcript has been edited for brevity and clarity