In 1946, two young scientists at the University of Pennsylvania introduced ENIAC, the world’s first general-purpose electronic computer. It was the size of a room, consumed the power of a small town, and had one purpose: to free humans from the tyranny of calculation. ENIAC didn’t want your attention. It didn’t need to notify you. And it didn’t try to optimize your calendar or your inbox. It just… computed.

Fast forward to today. Computers are everywhere, extending our cognition into previously unknown territories. But while they’ve spurred unprecedented achievements, the sheer volume of repetitive, mundane interactions they demand holds us back from keeping pace with innovation. We spend our working hours not thinking, but interfacing: toggling between tabs, digging through message threads, playing Tetris with our calendars, re-entering the same information in slightly different formats across slightly different systems, and becoming experts in arbitrary software—all before we can do our actual work.
Digital drudgery has become the price of admission to modern knowledge work. Yet it’s only part of the problem. Yes, we have more information at our fingertips than any generation before us. But good luck getting your fingertips to reliably find their way to the bits that are actually relevant and accurate, let alone systematically integrate them. Rather than augmenting our cognition and accelerating our ability to collectively refine our “world models”—to create new knowledge and put it to work for us—computer work today actively counters our ability to focus and build meaningful bridges across roles and disciplines.
Human-computer interactions are an obstacle course of distractions
As a knowledge worker, opening my laptop each morning feels less like booting up and more like bracing for impact. My nervous system anticipates the cacophony of pings that start competing for my attention before my coffee cools. I prepare for the occupational hazard of slipping into rabbit holes of misinformation and disinformation. And inevitably, I enter the hypnotic paralysis of my working memory being wiped clean in the milliseconds it takes to swipe to another window. Far from liberating us, computers keep us in information silos and levy an increasingly heavy tax on our cognition that we never meant to pay… but alas, they have our information on file.
It’s not just inefficient. It’s exhausting. It’s also a profound misuse of intelligence—ours and the machine’s.
It irks me to no end when I hear fellow technologists argue that we have all the tools to make these systems seamlessly work for us. We just need to manually customize all our settings and combine some subset of the literally hundreds of productivity apps that exist at any given moment to create personalized workflows. Not to worry, they assure us, when our workflow causes problems for other workflows, or when one of the apps in our workflow is no longer supported—we can build a new app to fix that. And when that new app causes new problems, they reason, we can create a workaround... ad infinitum.
The reality is that many technologists are intrinsically motivated to tinker, independently of whether they’re actually increasing productivity. Tinkering is necessary for discovering new ways of doing things. Evolution is essentially iterative tinkering. But we’ve accidentally evolved a labyrinth of systems that assume humans should accommodate the machine, when it should be the other way around.
The bigger problem, then, is that our most powerful productivity tool—the computer—now undermines not only our productivity, but also our thinking and agency.
We’re at a critical crossroads in the obstacle course
One path forward, popular among AI optimists and venture decks alike, is to make computers smart enough to do all the cognitive labor for us. It’s just the natural order of things, they would have us believe, for computers to replace human thought. Moreover, it’ll be good for us all.
But will it? By now it should be clear that this vision leads to a familiar outcome: diminished human agency. Why should we expect that more powerful recommendation engines will help us avoid echo chambers and endless scrolling? That more automated suggestions will help us think outside the box? Or that computers more intelligent than us (whatever that means) will be motivated to help solve our problems? We shouldn’t, of course.
That’s why at our lab, we’re proposing an altogether new path. We’re building AI agents not to outthink humans, but to work with us. Instead of making AI smarter and giving it more agency, we’re building AI that makes us smarter and gives us more agency. This requires building agents that are aligned with human cognition while keeping us in the driver’s seat.
I’m part of a team of scientists at the AGI Lab who are building agents that are actually useful. This means they need to complement our own intelligence, even in these early stages of development. My research background is in cognitive science, with a specialization in the link between language and thought. It’s my job at the lab to lead a team tasked with generating human-computer interaction data that trains our agents to be better collaborators.
In the near term, we believe the atomic unit of all computer interactions will be an agent call. In the longer term, our vision is for agents to infer our higher-level goals—just like humans do—so we can effectively cooperate. To achieve this vision, we’ll need to get AI to model the social world—to model our own minds—in addition to the physical and digital worlds. (I’ll post a deep-dive into this topic soon; in the meantime, check out a preview in this recent talk!)
We’re building a foundation for human-agent co-evolution
Clearly, the models aren’t there yet. So we’re meeting them where they are.
Where is that? Current models can’t yet break down most of our more complex, high-level goals and execute them with high reliability. For the majority of multi-step tasks, we still have to babysit them. What they can be trained to do reliably is click on icons and type into search fields. This is a useful foundation: In the same way that infants need to reliably identify individual words before they can form sentences, models need to reliably execute the smallest units of human-computer interactions before they can perform more complex tasks.
Our first iteration of this approach is Amazon Nova Act, our agentic AI model paired with an SDK. It does two things. One, it allows developers to break down repetitive tasks into smaller units that we’ve trained our model to be really good at. Two, it scaffolds probabilistic model calls with deterministic Python integrations.
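To make that pairing concrete, here is a minimal sketch of the idea. It assumes the SDK exposes a `NovaAct` client with an `act()` method that takes one atomic, natural-language instruction; treat the exact names and signatures as illustrative rather than as a reference for the released SDK.

```python
# Illustrative sketch only: assumes the SDK exposes a NovaAct client with an
# act() method that takes one atomic, natural-language instruction. Exact
# names and signatures may differ from the released SDK.
from nova_act import NovaAct

# Deterministic Python sets up the browsing context; each probabilistic
# model call handles one small, flexible interaction inside it.
with NovaAct(starting_page="https://example.com/products") as nova:
    nova.act("search for 'wireless keyboard'")
    nova.act("click on the button for choosing the highest rated option")
```

The division of labor is the point: ordinary code provides the predictable frame, and the model is only asked to handle the small, context-dependent steps it has been trained to do reliably.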
Again, there’s a parallel with language. The structure of syntax allows word meanings to freely vary as a function of the social context. Different languages have different grammars, but they all have some set of rules that, given a context to ground meaning, enable flexibility in meaning. For instance, in the sentence, “They are cooking apples,” the word “they” refers either to people in the act of cooking or apples of a particular variety; the context of being in a kitchen versus an orchard grounds the language to disambiguate its meaning.
Similarly, the structure of Nova Act’s deterministic scaffolding allows probabilistic atomic interactions to freely vary as a function of the digital context. “Click on the button for choosing the highest rated option” is a flexible model call because it can adapt to the specific features of a given website. And because there’s a Python integration for parallelization, you can execute this instruction across many different websites at the same time.
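As a rough illustration of what that parallelization might look like, the sketch below fans the same instruction out across several hypothetical sites with standard Python concurrency; the `NovaAct` client and `act()` call carry over the same assumptions as the sketch above.

```python
# Illustrative sketch: the same atomic instruction executed across several
# hypothetical sites in parallel. The parallelization is plain Python
# (concurrent.futures); NovaAct and act() remain assumed names.
from concurrent.futures import ThreadPoolExecutor

from nova_act import NovaAct

SITES = [
    "https://example-store-a.com",
    "https://example-store-b.com",
    "https://example-store-c.com",
]

def pick_top_rated(url: str) -> str:
    # Deterministic scaffolding around a probabilistic, context-adaptive call.
    with NovaAct(starting_page=url) as nova:
        nova.act("click on the button for choosing the highest rated option")
    return url

# Fan the flexible instruction out across many digital contexts at once.
with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
    for finished_url in pool.map(pick_top_rated, SITES):
        print(f"done: {finished_url}")
```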
A developer building a workflow for the first time might have to play around with different wordings of the model call, just like two people in a conversation might have to try a few different ways of saying something before they converge on a shared understanding. But in both cases, once they’ve negotiated the meaning of the variable units, there’s alignment on the interpretation of the shared environment.
The power of this approach is that it allows human-agent interactions to evolve in complexity over time while retaining both flexibility and reliability. Just like linguistic structure enables us to string together words to generate infinite possible sentences, the structure of Nova Act enables developers to string together model calls to create increasingly complex workflows. Anchoring on the workflow as a unit of work allows us to discover and train on the most useful tasks across diverse contexts right now so that, over time, our agent learns how to interpret and execute increasingly complex, abstract goals. Put another way, our method scaffolds our agent’s ability to learn how to think like us so that it can think with us.
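Here is a hedged sketch of what such a chained workflow could look like, again assuming the illustrative `NovaAct` client from the earlier sketches; the portal URL, the input data, and the individual steps are hypothetical stand-ins for whatever repetitive task a developer wants to automate.

```python
# Illustrative sketch: a small multi-step workflow built by stringing
# atomic model calls together inside ordinary Python control flow.
# NovaAct, act(), the portal URL, and the form steps are all hypothetical.
from nova_act import NovaAct

new_hires = ["Ada Lovelace", "Alan Turing"]  # hypothetical input data

with NovaAct(starting_page="https://example-hr-portal.com") as nova:
    for name in new_hires:
        # Each pass through the loop is itself a workflow of atomic calls.
        nova.act("open the 'Add employee' form")
        nova.act(f"enter '{name}' in the full name field")
        nova.act("select 'Engineering' from the department dropdown")
        nova.act("submit the form and return to the employee list")
```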
Toward this end, we’re partnering with a variety of internal and external customers to identify the applications of Nova Act with the most business impact today. For example, Nova Act can save companies a huge amount of time by doing something as simple as reliably filling out forms. These partnerships help us improve our agent in a way that targets the most useful capabilities. We’re also creating thousands of simulated environments that capture the diversity of these digital use cases, which will enable our agent to learn via reinforcement learning.
This approach requires that we develop and systematically update a robust suite of agent evals. We need to track our agent’s evolution as it learns the most relevant tasks. But that’s not all we’ll need to measure. As our agent becomes more reliable, the thing we really care about is whether that reliability translates into helping us do our best work. We’ll need to measure not just our agent’s capabilities, but also our own—things like creativity, collaborative ability, strategic thinking, abstract reasoning, learning efficiency, states of flow, and even how we feel as we engage in our work (we want to enjoy it!).
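As one simplified example of the reliability side of those evals, the sketch below just runs a workflow repeatedly and reports a success rate; the `run_workflow()` helper is a hypothetical placeholder, and a real suite would track far richer signals than a single pass/fail count.

```python
# Illustrative sketch of a bare-bones reliability eval: run a workflow many
# times and report how often it completes. run_workflow() is a hypothetical
# placeholder; a real eval suite would score far more than pass/fail.
import random

def run_workflow() -> bool:
    """Stand-in for executing one agent workflow end to end."""
    return random.random() > 0.1  # placeholder success condition

N_TRIALS = 100
successes = sum(run_workflow() for _ in range(N_TRIALS))
print(f"workflow reliability: {successes / N_TRIALS:.1%} over {N_TRIALS} runs")
```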
Agents can augment our intelligence by becoming our collective subconscious
As we continue to iterate on this approach, we believe agents will take on a growing portion of the repetitive work we do every day on computers. This will reduce the cognitive tax of conforming to our machines, unlocking our ability to engage in higher-level thinking. In other words, as agents take on the digital drudgery, we can shift our conscious attention to collaborating with each other and translating knowledge across domains to solve increasingly large-scale and interdisciplinary problems.
Offloading repetitive work to agents, however, is just the beginning. Automation can also be an engine for augmentation—that is, if we’re intentional in designing products that align with our own cognition. Think about how our own thinking is enhanced. When we learn something new, whether it’s a sport, an instrument, or a piece of software, we first have to consciously attend to the details. But the hallmark of expertise is that, as we practice repetitive actions, our brains move these processes over to our subconscious—it automates them. By automating routine digital tasks, agents can free up our cognitive bandwidth to improvise, innovate, and detect increasingly abstract patterns across domains. After all, calculators didn't stop people from doing mathematics; they opened pathways to more complex mathematical work.
We envision a future where everyone can teach agents their skills, and agents can then redistribute these skills to help create a new common ground for human-human and human-agent collaborations. Imagine being able to quickly get up to speed on any of your teammates’ work, no matter their role or expertise. Thoughtfulness in human-agent interaction design will be critical. But if we get it right, then in the same way that the GUI up-leveled the command line, agents will up-level the GUI. A team of agents embedded within our teams will generate new interfaces on the fly that help us focus on the right things, communicate at the right level of abstraction, and have exactly the right context to do our best work and continually learn new skills.
Agents can augment our intelligence by becoming our “collective subconscious,” quietly parallelizing and executing repetitive work in the background so we can leverage our conscious attention to learn the next thing. By meeting the models where they are now and focusing on reliability in these early days, we can set the stage to co-evolve with agents in a virtuous cycle of reciprocally elevating our respective intelligences. Stay tuned to learn more about what this means in the upcoming blog about modeling minds…
It’s an exciting time to be on a team that’s building tangible products with far-reaching impact. If this type of work inspires you, we’re hiring!