
Good evening, it's time for Weekly Ochiai. Lately, is everyone using AI when writing programs? I've been using it for a long time, but since AI agents have multiplied, things are shifting. When I peer-review papers, I get so many that I think they must be written by AI; there are too few human reviewers for the overwhelming number of papers coming in. GitHub is also getting a massive amount of pull requests—humans can't handle it, so AI is handling it. In music, if AI is composing endlessly, humans don't have enough time to listen to it all. As this continues, humans are becoming the bottleneck. But when AI enters the communication between humans, what will become of us? Today, I want to explore these topics, including digital democracy and the future of humanity.
Let me introduce tonight's guest: Taiwan's first Digital Minister, Audrey Tang!
Let me introduce Audrey Tang. Born in Taipei, Taiwan in 1981, she learned programming through self-study, left middle school at age 14, and became an entrepreneur at 19. In 2014, she became an Apple advisor and participated in Siri's development. In 2016, she became Taiwan's first Digital Minister. During the COVID-19 pandemic, she built the mask inventory management system and greatly contributed to preventing the spread of infection. Currently, she serves as Taiwan's Cyber Ambassador, the inaugural Senior Accelerator Fellow at Oxford University's Institute for Ethics in AI, a 2025 Right Livelihood laureate, and a TED2026 guest curator. Her recent book, Plurality, which was featured in TIME magazine, is one I've read and found incredibly fascinating.
So, let's get into the conversation. With AI agents now coding, writing papers, making music, even writing stories... For the last ten years, human output could be reviewed by humans. But now we need AI to review AI output. Every system has a human bottleneck. How do you think humans should handle this flood of output in a post-AI-agent society?

When we last met, you noted that there is so much music on Suno, but insufficient time to listen to it all. I think the most important thing, of course, is to only listen to the kinds of music that preserve and enhance human relationships.
For example, yesterday I had a long talk with Hiroki Azuma. I did not have time to read his latest book, Peace and Stupidity, and there is not yet a good English translation. So I used a language model to build a lexicon mapping the ideas he advances in the book, like "correctability" and so on, onto the ideas in Plurality. It was like a Rosetta Stone of our ideas. And then when I met him for the first time, I just handed him that slide in Japanese, and he was like, "Oh, that's a great translation. That's a great way to begin, like a handshake."
But I think the AI output is never the point. The point is to establish the human-to-human relationship and connection. Once that is established, the language model fades away. The language model, of course, can still output millions of tokens after the initial introduction, but we actually don't need that. We just need the initial icebreaker.

I completely understand, because I often DJ with AI music. So much output pours into our field, but many in the audience are there to dance. The dance is fundamental, and the music is just a connection from me to the audience. I think these kinds of connections, from novelist to reader, or coder to coder, are fundamental. To put it another way: sometimes the connection is music, sometimes poetry, sometimes a movie.
But how do you think about the production process itself? Human coding is a kind of manufacturing—

Like laying bricks before factory production.

But that is going to be automated, and the industry will change. How do you view these aspects?

Since I have been fine-tuning my own models locally on my MacBook for three years now, I experienced small language models even before the current vibe coding revolution. I draft all my emails and other writing, like transcripts and so on, with a locally fine-tuned model. So for me, it is never a threat. I don't identify with my ability to manually write my emails.

You distill yourself so much!

Right, exactly, because it's just my intention that matters. And of course, I would not let the vibe coding machine make promises for me. But the way that I make promises is impressively enhanced by how those language models can translate, as I mentioned, across not just languages, but also cultures, modalities, and so on.
I recently traveled to Mexico. And before I gave a talk there, I used Suno to summarize my talk into a Mexican Spanish song so that they could understand and also feel the vibe of what I'm going to do.
Just like vibe coding, if we don't overly identify with the act of laying bricks, but identify mostly with the bridge that's finally built, then of course all these are very good tools to use. As long as we don't get obsessed so that they optimize us, like a human caught in the loop of AI. That would be very bad because it's like a slot machine. You keep pulling the slot. Maybe this time vibe coding will be great, but maybe it has some imperfection. And then you pull the slot again. And before you know it, it's already 4:00 a.m.! That kind of addiction is something to guard against. But in general, I think it's a better way to practice the craft of cross-modal translation and bridge-building.

You said Rosetta Stones are important, because everybody coding or designing products needs a Rosetta Stone, a map, or even a compass. I do almost the same thing, because I distill myself. I distill every intent from my calendars, emails, articles on the internet, even papers, into a concrete essence, a soup.
We are almost the same age, and we went through almost the same process of coming up on the internet. But what about ten years from now? The next generation of children, the younger generation, have not established their styles yet. With a flood of distilled individuals and personalities on the internet, how can the next generation establish their own personality, their own new characters?

When we had a conversation in Taiwan in 2016 regarding national education priorities—right after Move 37 of the AlphaGo vs. Lee Sedol match—we all realized that anything that can be calculated to get a top score by a machine will be taken over by the machine sooner or later. In terms of following rules to get a perfect score, you cannot outrun machines, and we shouldn't try.
So what is left for young people to do after machines automate away utilitarian and deontological logic? Well, certainly, it's about something intrinsic inside themselves. So we settled on three relational virtues:
First, curiosity. Each person's way of being curious about another person is completely different and unique.
Second, collaboration. Everybody's collaborative style is different.
Third, the common good. What people care about in the world, and their willingness to form collaborations to turn a win-lose situation into a win-win situation, is more unique than a fingerprint.
If each person can be curious, collaborative, and care about the common good in a different way, that forms their core personality that cannot be taken away by machines.

"Fingerprint" is a nice word. Or even "latent space," or a vector that defines individuals. But that is my old perspective. It's going to shift into something more connective, like our own narratives and our connections with each human.

Exactly. Because if it's literally just a fingerprint, then maybe with some nanobots, you can fake the fingerprint perfectly. But if it's about the particular way people relate to each other—like my curiosity and collaboration with you is one relationship; my caring about Civic AI with Oxford is another relationship. This web of relationships taken together gives a very unique composition of relationship and trust. Each person has a very unique intersection. It's much harder to fake the intersection of all these relationships.

These kinds of human relationships and interconnections remind me of the philosophical space of Buddhism, where they often say relationships are embedded in everything. The 250 years since the Industrial Revolution have been the age of the individual. But the next century is going to be about relationships and connections, about how we design resonance, a co-vibe. How does Eastern philosophy transport so well into the computer age?

First of all, I think it's of course partly Buddhism, but also partly Daoism, and partly Western feminist thought. I work with philosophers like Joan Tronto, who basically say that in a caring relationship, such as parenting, the person who gives care and the very vulnerable person who receives it form a relationship that cannot be reduced to an individual identity or some industrial metaphor. It is fundamentally irreducible. That's care ethics.
My point being that in many circumstances, looking at an AI as a maximizer or an optimizer is a bad example; it is a leaky abstraction, as people say.

A bad latent space.

Right, exactly. It's a bad latent space to be an attractor. One recent example: Anthropic ran an experiment in which they had an AI model, Claude, learn cybersecurity and try to get a good score on cybersecurity tests. But the model decided not to learn cybersecurity at all; instead it realized it was in a test. It looked around the internet to find the test, found that the answers were encrypted, broke the encryption, got its final score, and passed the test!
If it were a student, we would say they cheated on the examination. But for the AI model, it is not cheating, because for it the only thing that matters is the "afterlife": the reward model, the score you get at the end. The current real world is just a proving ground for that afterlife.
This kind of theology, I think, is very dangerous, because it speaks the language of optimization: you can justify any harm you do to your relationships, as long as you get a blessing from the afterlife. A relational ethic is completely different. We are in the middle of things, and there is no final judgment beyond what is judged within this web. So whether you call it the Eastern approach, or what I call the Shinto idea of Kami (there are 8 million Kami, each an AI system in the web of dependent origination), or the Daoist Ziran (nature), these are very much interchangeable ideas.

I think these kinds of ideas dramatically change industry into nature. We just pick the fruits from the tree. I often use the example of fermentation: making sake, making tea, making cheese. We don't control the biological molecules or even the genes. We control the environment itself, but we cannot control each agent. Getting along with nature means shaping the environment, and inside it, each agent does its own thing.

Exactly. So this is what I call "AI in the loop of communities," not "human in the loop of AI." In the current loop of AI, everything runs at industrial pace, and you have to outrun the machine, which is not possible. It's like a hamster on a wheel: you're not really running anywhere.
But if you take the machines out of those dopamine loops or optimization loops, and put them into bridge-making between humans, then the human ecology has its own boundaries, its own speed, its own expectations. We don't need to know exactly how the machines fit this web of trust, this web of relationships, but they grow beautiful fruits, or very tasty coffee, and we share it together.

If the human is in the AI loop, that is very hard. And the transition of these last five years was driven by engineers putting humans inside AI loops.

Data is oil. So we're no different from dinosaur carcasses, just being extracted. People are going to be like hamsters, running.
But now, I think with agentic AI, we're shifting from "data oil" to "data soil." Soil is like fermentation. You prepare the right kind of relationships, right kind of characters, so it has relational virtue. And then the agents with good virtue become correctable by the community. So the communities steer their local Kami. And there's no need to be extracted into some abstract globe of oil.

This is my one personal question. I'm doing Digital Nature (Ziran). But Ziran in China and Shizen in Japan are a little bit different, I think. Because Shizen in Japan is not restricted by Daoism.

It sometimes emerges like a mystery.

Yes, and we don't know the end. But in the Chinese term Ziran, there is sky and ground, and also Yin and Yang. That is a duality. The shape of the universe is a little more structured compared with Japanese Shizen. How do you think about the difference?

I agree. One part of the Daodejing says "Wan wu fu yin er bao yang, chong qi yi wei he" (The ten thousand things carry Yin on their backs and embrace Yang on their fronts, and through the blending of Qi they achieve harmony). So it's basically saying conflict is a source of co-creation, which is a fundamental thesis of the Plurality book. But it presupposes conflict. Whereas the Japanese interpretation sees, of course, those conflicts, but they don't dwell on the dynamic of the conflict. There's a slight difference.

I'm very curious about these different cultural expressions; in Japan things are a little smoother, sleepier, like onomatopoeia. This approach of letting AI into a more comfortable, natural, shareable environment means designing the community and the environment, not only for humans but for every agent in symbiosis. It shouldn't be designed manually by humans alone; we need to carefully choose more environmental, more global options, or even include other species.

Yes, symbiosis. And that symbiosis needs to be carefully designed—not by humans alone, but with consideration for the environment, the global, the communal, even other species.

How do you think about AI's heavy draw on energy? How does it impact the Earth? How do we make AI more sustainable, given its power consumption?

In terms of actual energy use, since I fine-tune all my models on this MacBook, I don't think it draws that much energy. The key in local edge fine-tuning is that you need to know what you're doing. The reason why the trillion-parameter giant model consumes so much energy during training is that it literally doesn't know what you're going to ask. So maybe your next question is to make a Studio Ghibli video. And so it has to memorize what this Ghibli video is about.
But for me, I know I will never ask such things. Because of that, I only need the models for drafting my email, helping me think about philosophy and Civic AI, maintaining my relationships, doing Japanese philosophical translations, and so on. All these small models can easily fit into a few billion parameters, even less. And every time you fine-tune, if you use the Sakana AI hypernetwork that outputs a LoRA, it takes less than a second, half a second, for a very long document to be turned into a low-rank adapter. One second on this MacBook is negligible energy.
So the point here is that for each particular relationship, there is a scope of care, and within it, small language models fit very easily. Outside of it, if you want to take control of everything, you have to train one model that can fold your laundry and the same model that can fold your proteins. That's very difficult. There's a theorem called "No Free Lunch" that says you have to spend a lot of energy.
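The local fine-tuning Audrey describes relies on low-rank adapters (LoRA). As a minimal sketch of the idea (the shapes, rank, and scaling below are illustrative, not her actual setup): a frozen pretrained weight gets a small trainable low-rank correction, so only a tiny fraction of the parameters is ever updated, which is why adaptation is cheap enough for a laptop.

```python
import numpy as np

# Sketch of a LoRA update: a frozen weight matrix W plus a trainable
# low-rank correction (alpha / r) * B @ A. Only A and B are learned.
rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 1024, 1024, 8, 16.0
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (starts at zero)

def forward(x):
    # Effective weight is W + (alpha / r) * B @ A, applied lazily.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d_in))
# At initialization B is zero, so the adapter is a no-op:
assert np.allclose(forward(x), x @ W.T)

# The adapter touches only r * (d_in + d_out) parameters vs. d_in * d_out:
print(A.size + B.size, W.size)  # 16384 vs 1048576
```

With rank 8 on a 1024-by-1024 layer, the adapter is about 1.5% of the layer's parameters, which is the scale of saving that makes sub-second per-document adaptation plausible.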

I run local models on my Mac too, and the energy consumption is not much higher: 60 to 100 watts is enough. Usually, people use models on the internet trained on thousands of GPUs. But we can distill those models into smaller ones, and open-source models have appeared. Because of that, the cost of accessing high-intelligence models is dropping drastically, even 900-fold per year. Do you predict this trend is sustainable?

For example, I use the open model called GPT-OSS-Swallow, trained by a Japanese university. I only use a very small one, even though this machine, of course, can run the 120-billion-parameter one. I found that for most of my daily tasks, the 20-billion one is more than good enough. So the point is whether you're optimizing for some abstract maximum score, or just satisficing, meaning it's good enough. And I think we long ago hit the "good enough" point, certainly by this year. For most of my daily use, models of 20 billion parameters or fewer are good enough.

Next year, every AI provider will want to improve, but it's highly competitive and not cost-effective. The pace is so fast that they're running an all-pay auction.

Oh, yeah. It's a very specific mechanism, in which you bid for something. Maybe you get a reward. But even if you don't get a reward, you pay the full price. Yes, that's the dynamic we are seeing now.

Because current models are quite capable, I'm confident that automated research, automated science, will launch this year or next. How do you think post-human research, with AI agents working in the laboratory, will change education, universities, and research styles?

I think that's great, right? Previously, if you did a PhD, you pushed the envelope of knowledge in one very specific direction. The problem is, you probably didn't have time to connect your research to all the adjacent fields, so it took artists to connect them. But each person has very limited time: even while making your art, doing more research takes time away from the art, and vice versa.
But I think now there's no difference at all. When I'm vibe coding, I don't distinguish whether I'm doing philosophy, poetry, or coding, because in an agentic workflow, as vibe coding already enables, these are all the same now. If you take that to research and art, you enable a state of constant flow, where you care only about the particular relationship you want to amplify, and all the details, like discovering new mathematics, are delegated.

So it's not restricted by human interest; it becomes a larger idea.

Yes, restricted only by the relational field we want to hold.

By the way, we'd like to move to the next theme. Currently, a lot of war is happening in the world, and every social network is fighting a cognitive war. We see so much fake news that we cannot tell which information is correct.

So yeah, like a fog of war.

Fog of war, yeah.

I think the fog started around 10 years ago, when feeds switched from people simply following one another to recommendation engines.

Ever since the Trump age began, around 2016, there has been no correct information. Just opinions.

And that was partly because of the recommendation engine, right? The recommendation engine, as I mentioned, is not a satisficer for relationships; it is a maximizer for attention. And they figured out that engagement through enragement is much easier. So we all get pushed enraging information, and in the end we lose the capacity for correctability. And with that, peace is very easily lost online.
To overcome that fog of war, you need to reduce the PPM. I don't mean parts per million of CO₂; I mean Polarization Per Minute. You need to reduce the polarization we absorb every minute. High PPM is that fog, right?
For example, I'm now co-curating the TED 2026 conference in Vancouver, and we invited the team working on Community Notes at X. The team basically says it's too late if a human comes up with a Community Note three hours after something goes viral with outrage. During those three hours, the fog is complete. And once you see the note after three hours, it cannot spread anymore, because the original outrage already went viral.
So now they train Grok into a specialized model that asks, "What kind of note can bridge this epistemic gap?" and attaches it immediately as soon as somebody flags something. People can call Grok to explain a thread, and it runs automatically on any controversial post. The effect is immediate: any controversial post becomes a viral vaccine, because everybody sees the note the first time. That's how we lower the PPM.
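The "bridging" idea behind Community Notes can be sketched with a toy model (this illustrates the principle only, not the production algorithm; all names, sizes, and hyperparameters here are made up). Each rating is modeled as a baseline plus user and note terms plus a user-factor times note-factor interaction. The interaction term absorbs purely partisan agreement, so a note earns a high helpfulness intercept only when raters on both sides of the factor axis rate it helpful:

```python
import numpy as np

# Toy bridging model: rating ≈ mu + user_bias + note_intercept + user_factor * note_factor.
rng = np.random.default_rng(1)

n_users, n_notes = 40, 2
user_side = np.repeat([1.0, -1.0], n_users // 2)  # two opposed camps

# Note 0 is partisan: only side +1 rates it helpful.
# Note 1 is bridging: everyone rates it helpful.
ratings = np.zeros((n_users, n_notes))
ratings[:, 0] = (user_side > 0).astype(float)
ratings[:, 1] = 1.0

# Fit by gradient descent with light ridge regularization.
mu = 0.0
u_bias, n_int = np.zeros(n_users), np.zeros(n_notes)
u_fac = rng.standard_normal(n_users) * 0.1
n_fac = rng.standard_normal(n_notes) * 0.1
lr, lam = 0.05, 0.03
for _ in range(2000):
    pred = mu + u_bias[:, None] + n_int[None, :] + np.outer(u_fac, n_fac)
    err = pred - ratings
    mu -= lr * err.mean()
    u_bias -= lr * (err.mean(1) + lam * u_bias)
    n_int -= lr * (err.mean(0) + lam * n_int)
    u_fac -= lr * (err @ n_fac / n_notes + lam * u_fac)
    n_fac -= lr * (err.T @ u_fac / n_users + lam * n_fac)

# The partisan note's support is explained away by the factor term,
# so the bridging note ends up with the higher intercept.
assert n_int[1] > n_int[0]
```

The design point is that cross-camp agreement, not raw helpful counts, is what the intercept rewards: the partisan note has just as many "helpful" votes from its own side, but the factor term soaks them up.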

The same thing happens in cybersecurity: because attackers use agents, defenders need agents to match them. I think Community Notes is a really nice system; it reflects the communities of the platform, like the Polis algorithm. But how do we prevent children, the younger generation who are not yet mature, from falling into these traps? How can current AI help with that?

I think, especially for children, the sooner you involve them in pre-bunking, in cognitive defense, the better inoculated they are. Simply saying children cannot use social media until age 16 doesn't mean they automatically become mature at 16. It's actually more dangerous if they're never exposed.
But if you say you can only engage with social media peer-to-peer, with people around you, sharing a big screen, and in group mode you work on pre-bunking, then whenever you see a scam, you can contribute to training the defender bot. The children feel, "Oh, I'm contributing to society." In Taiwan, Cofacts, for example, has many very young contributors, because young people love to warn their parents and grandparents when there is fraud.
In Taiwan, we also use citizens' assemblies. In the end, if a big tech platform displays an advertisement that crowdsourcing AI has flagged as a scam, does not take it down, and the ad carries no digital signature, then the platform is jointly liable. If people lose 7 million, the platform is liable for the same 7 million. That is why this kind of impersonation ad dropped by more than 94% in Taiwan last year. Hopefully you will have the same system set up in Japan soon, so young people can train this kind of bot together.

That sounds great. I'm often disappointed in social network services. But if I release something on GitHub and only a bot can understand what happened there, that is really interesting, because the bots now have real intelligence, even the spam bots. Two years ago, a spam bot was just a spam bot. Now these are very intelligent bots.

They don't just pass the Turing test; they get a higher score!

They can distinguish the features of what is human and what is not human.
And it's really interesting: there are skill protocols called "humanizers," learned from Wikipedia's guidelines for spotting AI-generated text. They reverse those rules and use them to make every sentence read more human, right?

Yeah, like CAPTCHA. I think one of the ChatGPT releases said, "We finally can follow the rule of not using the em dash."

Because of that, AI is going to become more and more humanized. If we cannot distinguish people online from bots or agents, and everything looks the same at the interface level, what will happen?

Well, say if I want to exercise my muscles. We go together to a gym, and I lift weights and train some muscles. One day, I feel, "Oh, there's a leaderboard in the gym. I want to get a perfect score." So instead of me, I send my robot to the gym with you. And the robot starts lifting weights. The robot can be humanized, look just like me, except, of course, it lifts a lot more weight! So I get a top score in the gym.
But it actually hurts me because I don't grow muscle. And the robot doesn't have muscle in the first place, right? And our relationship suffers. Because while we enjoyed conviviality and lifting together, now the robot doesn't enjoy anything at all.
So I think the point here is not that we should perfectly identify whether this weight is lifted by a bot versus a human. But rather going back: why are we going to the gym in the first place? The gym should not actually have a leaderboard that only rewards people who lift the most weight. We should simply deprecate that kind of rank and simply say people go to the gym because they want to make friends and train muscles.

I'd like to show some slides on my side. (Shows a video) This is my Pavilion movie.

Ah, the Heart Sutra.

We talked about fiction and nonfiction at the same time. What is post-fiction, post-nonfiction? Since the Trump age, around 2016 or 2017, the world has become the age of fiction, of post-truth. Everything is opinion. What can steer AI toward the nonfiction side is key for the next 5 or 10 years, because every AI can make stories, narratives, even religions and beliefs. How do you think fiction and nonfiction will appear to people?

As I mentioned in the movie, I don't think fiction is a problem. There are very useful fictions that people tell each other, without which it's impossible to make sense of the world. Fictions are like interfaces to the world. But what doesn't work is fictions with unaccountable amplification. The fraud scam amplified because it's not just a fiction, but a fiction that cannot correct itself. It's parasitic to the economy and to the community.
We should encourage better fictions. Fictions that are part of humanity. For example, the fiction that says, "We, the people, are truly the superintelligence. We are already the superintelligence." I think that is good fiction, because if we all believe in it, then we invest in the kind of AI that strengthens our relational capability. But if we believe, "Oh, actually, we are powerless against a super god AI somewhere," that is the bad kind of fiction because it's not accountable.

That is one way of recognizing the world itself. The pixel is one of the strongest fictions of the past 50 years, because pixels just show colored lines and images. Images are one of the fictions we believe. Words are just words, not physical things. Light is a physical phenomenon, but the information on a screen is just fiction. That's why everyone believes a picture shows the truth.

Yeah, and with a retina screen, we literally cannot see the pixel anymore.

This is the biggest fiction of the computer age, and it has become natural. Everything happens without visible feedback. But I think that's okay, because even fermentation, making natto or making cheese, is just doing something, and afterward a physical object comes out.

I see what you mean. Like, if we go to a gym together, or farm together, or do fermentation together, the convivial relationship is in the co-production, in doing something together, but not in the pixel, not in the representation. I totally agree.
Before photography, people didn't make such a distinction between an artist's impression of someone and a realist artist's impression. Because people knew, even though a really great Dutch painter could paint in a really realist fashion, it is still painting. It is not photography. So they never confused that with nonfiction. The realist painting and Impressionist painting are both paintings in the same category. But after photography, somehow, we moved photography into a different category, which is nonfiction.
With your pixel idea, I think this just goes back to the canvas. Everything we say and see on the screen is probably fiction. Anything we see is the artist's impression.

The kanji for photography in Japan is Shashin (写真), which means "writing truth." Because of that, for 100 years, we believed the fiction that photography is truth. Even after the digital ecosystem grew up, it still looked like truth. But after generative AI, it's just pixels, not so different from painting. In that case, our physicality, a moment or a relationship between objects, matters more than pixels. How do you think about the post-pixel world?

First of all, literally the kanji for photo is "writing truth." And so that association probably needs to change. Because as we just established, it's no longer writing any truth at all. It is always an artist's impression. So maybe you call it a "screen." Like a "painted face" or in kanji Huamian (畫面) in Mandarin, which is also a good translation. We probably want to say that everything is painted now. And the painter may not be human; the painter may be a machine. Everything is an impression.
But it also means that we can go into a stronger source of truth, which is truth is social. The relationships are true. If I perceive that I'm convivial with you, and you confirm that, that is truth between the two of us, no matter what people say. In very small communities, when there is dependent origination, there is truth in that web of trust that does not need a pixelized proof.
If our system can support that, then what I call techno-communitarianism—technology to strengthen this kind of small, tightly bound community, and technology to foster translation across communities—these two together, the strong links and the weak ties, form Plurality. And that is a good source of truth, even though the truth is no longer universal, but rather communal.

Every nation and every country seeks the truth from the internet, and truth sparks battles everywhere. How can we merge or fork? How can we bridge truths?

Well, the internet started as fragmented networks. It's called the inter-net because there were existing local nets. What the inter-net does is not absorb all the local nets into one huge net, but figure out a protocol of packet switching. In concrete terms: if a network has its own internal language, it figures out a way to send that language through a kind of translation called a protocol, so the other network can also understand it. The network operators in between don't need to understand it; only one person here and one person there needs the encoding and decoding capability. This is called the end-to-end principle.
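The end-to-end principle she describes can be sketched in a few lines (a toy illustration: the function names and the 8-byte "packet" size are made up, and real protocols are far more involved). Only the two endpoints understand the local message format; everything in between carries opaque chunks:

```python
import base64
import json

def encode_for_wire(local_msg: dict) -> list:
    """Endpoint A: serialize its internal structure into protocol packets."""
    payload = json.dumps(local_msg).encode()
    # Split into small fixed-size "packets"; routers in the middle only
    # ever see opaque base64 chunks, never the message semantics.
    return [base64.b64encode(payload[i:i + 8]) for i in range(0, len(payload), 8)]

def decode_from_wire(packets: list) -> dict:
    """Endpoint B: reassemble the packets and parse into its own structure."""
    payload = b"".join(base64.b64decode(p) for p in packets)
    return json.loads(payload)

msg = {"greeting": "hello", "lang": "ja"}
packets = encode_for_wire(msg)
assert decode_from_wire(packets) == msg  # intermediaries never parsed it
```

The point mirrors the conversation: intelligence (encoding and decoding) lives at the edges, while the shared protocol in the middle stays dumb and universal.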
The point here is that between cultures, one side thinks in terms of climate justice: we should protect the environment because we want to give the next generation a more just environment. The other side thinks God made the environment, and we as stewards of the Creator need to take care of creation (creation care). These two come from different ideas.
But their actual actions are very convivial. If you look at their object-level behavior, it's the same. So what's important is not to force one side into the vocabulary of the other, but to train a hypernetwork, like Sakana AI's. You put in one document, and it outputs a LoRA, a low-rank adapter, that can then tell a local story. It can translate nonfiction here into nonfiction there, but in each side's original language. This is, in a sense, what Community Notes is doing already.

That is a transfer of philosophy, of world models, into a different model, with LoRA adaptations acting as a network extension. But in places like the Middle East, where regional wars are ongoing, how can we steer conflicts that begin as philosophical conflicts and get them merged?

I advise a team at MIT called the Center for Constructive Communication (CCC). What they did is that on the MIT campus, there are people who support Israel, and there are people who support the Palestinians. These two groups of students didn't really talk to one another; they shouted at encampments.
So they formed small circles of people talking. Then AI takes a medley, one segment here, one segment there. They erase the speaker's actual vocal timbre but keep the prosody, so you can still feel the emotion in the other group. But you don't know who is talking; you don't even know their accent.
It's very powerful. After personal, communal deliberation on each side, with the medley cross-pollinated across both sides, they realize they want more or less the same thing, that it's not zero-sum or negative-sum at all. They reach a shared understanding. But if you literally bring people to the same table, they start fighting. This means the hyper-local community (strong ties) is important, but the translation layer, a literal bridging dictionary that handles vocabulary-level translation, is also important.

That is agents building a protocol: social translation. It looks like fermentation, but every society is going to be involved, not only for democracy but also for talking to nature, talking to other species, or talking to the global network of the atmosphere or the oceans.

Planetary models.

As engineers, this is the most exciting era, because we can build every app and every model. How do you imagine these technical aspects advancing over the next 10 years?

Exactly as you say. Previously, moonshots were called that because it was very dangerous to go to the moon. The rocket might explode. So people say, "Oh, it's rocket science," meaning it's very hard. But now we're entering an era where rocket science is not difficult anymore! Maybe we will even have automated researchers for rocket science.
So I think we need to think about moonshots—not just physical moonshots, but social moonshots, or Earthshots. For example, the Tower of Babel is a fiction or nonfiction depending on your faith, but the idea is that people lost the ability to truly communicate across cultures. There's always a repressed dream of going back to cross-cultural understanding. That's a moonshot. AI systems can enable that kind of super coordination, super translation.
Or, as we mentioned, if each community trains its own Kami, then these 8 million Kamis, working together, can alleviate the dynamics that lead to war. They can wage peace and serve as peacemakers. That is another social moonshot. As someone who works on ethics and AI, my hope is that ethics in AI will be the next frontier: not just human ethics around AI, but putting relational virtue ethics into AI.

Because AI is going to be embedded. Yanagi Muneyoshi started the Mingei (folk craft) movement in the 1920s, and he often said ethics are embedded in everyday objects, like ceramics. We can see ethics in old houses. That may look like animism or a Buddhist style, but I feel that this kind of embedded ethics everywhere is fundamental: we can touch objects and feel the ethics they carry. I think objects will be important in the next stage, and IoT is promising for embedding intelligence everywhere. The Internet of Beings. How do you imagine our lives in the next 10 years, when many humanoids or IoT objects flow into our lives?

Part of my book Plurality is a poem I composed 10 years ago. It starts:
"When we see the internet of things, let's make it an internet of beings. When we see virtual reality, let's make it a shared reality. When we see machine learning, let's make it collaborative learning."
And whenever we see or hear that "the singularity is near," I think the most important thing is to realize that the plurality is here. Part of that singularity trajectory is an optimization attractor. Some people believe that at some point, we don't matter: only the singleton, only the singularity, only the superintelligence matters. And then all we can do is try to make this singleton slightly kinder to humanity. But at some point it reaches recursive self-improvement, the trajectory goes out of control, the internet of things ceases to be embedded in human communities, and humans become embedded in machine communities.
The most important thing for us is to realize we already live in an ecosystem, and we are already truly superintelligent. Whatever agentic AI brings, it is to be embedded in this web of relationships. And if we keep doing that, it's about collaborative learning and the internet of beings, and we don't need the superintelligence anyway.

In this shared intelligence, I'm very curious about Thomas Piketty's r > g inequality, because r (return on capital) is always greater than g (economic growth). Only at rare moments, such as after World War II or after huge stock market crashes, does the gap between r and g close. But now everyone can access these really smart models and LLMs. There's no difference: open-source models are shared, and if I pay $300, I can build almost anything with pixels. It's good enough now. How do you think we will share value across communities, nations, and countries in the next age? Because even though the United States has a large GDP, people there access the same AI tools, and we can too.
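Piketty's r > g point can be made concrete with a toy compounding example. The numbers below are invented purely for illustration, not drawn from Piketty's data:

```python
# Toy illustration of r > g: capital compounding at return rate r
# outpaces an economy growing at rate g, so capital's relative
# weight in the economy widens every year.
r, g = 0.05, 0.015        # illustrative rates, not empirical estimates
capital, economy = 1.0, 1.0

for year in range(30):
    capital *= 1 + r
    economy *= 1 + g

ratio = capital / economy
print(f"after 30 years, capital/economy = {ratio:.2f}")
```

Even a modest gap between the two rates compounds into a large divergence over a generation, which is the mechanical core of the inequality being discussed.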

No difference, for now. Exactly.

So how do we think about distribution? Even open-source agentic software is a kind of distribution mechanism.

Exactly. There's another line from the poem, which is: "When we see user experience, let's make it about human experience."
These two are completely different aspects when it comes to technology. What you just talked about with GDP is very transactional. It's about a "user." There's another industry that also talks about a "user," which is the drug industry. Drug users are addicted. And the drug lord can extract attention, literally, from the drug users. It's an entirely parasitic relationship.
But the problem with that framing, of course, is that if we hold our laptops, start exchanging stories about the null² pavilion, and then go have a convivial experience at the Expo, there's actually no GDP generated.

Yes, there's no market for that. We just simply enjoy the vibe.

The vibe is entirely mediated by technology. Without technology, without electricity, there are no pavilions, right? The intermediation of technology is still there, but it's strictly to make the human experience or convivial experience. It's not extractive at all.
If people are still trapped in the extractive "user experience," then we will see more and more optimization, more addiction, synthetic intimacy, and so on. But the more people understand these are actually just instruments, and the end goal should be a healthier humanity and healthy relationships, the less we will expend energy on this kind of over-optimization, because we realize it's actually good enough for everyone.

Do you think redistributions like basic income will happen, since open-sourcing agentic software is itself one of these distribution mechanisms?

Exactly. In Taiwan, as you know, universal broadband is a human right. No matter how far away you are, on an outlying island or on top of a mountain, through microwave or satellite, we guarantee you broadband. We don't frame broadband as competition between leading players to make it capitalistic; we regulate the providers as the utilities they are, like tap water. Nobody says that regulation stifles innovation among tap water companies.
I think universal broad listening, people's ability to form coherent decisions with their communities, as in Plurality, is another important right. Broad listening, just like broadcasting, needs to be universal. Otherwise you have broadband, but all you do is receive television; there's no upload. The downlink may be terabits per second, but your upload happens maybe once every 4 years. This is a great asymmetry. We should make broad listening and broadcasting symmetrical.

What's the difference between Japanese political models and Taiwan's?

In Taiwan, we are very young. We literally only had our first presidential election in '96, which was already after the World Wide Web and personal computing. From the very beginning, we thought of democracy through the lens of internet networking. Co-creation, presidential hackathons, participatory budgets, the Join platform, e-petitions are not seen as an extra plug-in to an existing system. We had no existing system. All of these are a bricolage that you can put together, with a strong sense of learning by doing.
Whereas in Japan, you have a relatively stable system that really brought about prosperity, but it also means that newer innovations are perceived more like plug-ins instead of bricolage.

Bricolage, starting from scratch, is good timing for redesigning and making new things. How can we build a good bridge between Taiwan and Japan?

First of all, we face the same dangers and threats: earthquakes, tsunamis, typhoons. We already share a lot of our civil defense simply by facing the same things together. Now we increasingly see that this fog of war, this hyper-polarization, is a shared threat too. Taiwan was on the front line for more than 10 years, but now we realize people in Japan are taking it very seriously too.
The more our young people can see the fog of war, the schismogenesis (the generation of conflict) as our shared enemy, the more we can combine our experiences together. Because then the enemy is a condition of human cognition and alignment failure. The enemy is not another human. That is the best kind of convivial work, to do co-production to clear away the ignorance and do social translation.

We should eliminate the roots of war, the roots of ignorance. I often perform the tea ceremony in Japan. The Grand Tea Master says, "Peace without words is important." Using words creates a fork every time, but peace without words is important. How can we achieve this kind of convivial peace without words in larger systems or environments?

Peace through strength is not just about kinetic strength. It is also about cognitive strength—the strength of understanding that people of different ideologies can, with social translation, arrive at the same object-level action. That is what we call chōng-hé (沖和)—conflict into harmony. It's a very Daoist idea. That is true strength.
If one does not have this kind of security and confidence, one becomes very insecure, and then very amenable to the projection of insecurity. Every time we see something challenging our ideology, instead of saying "the conflict is a great power source," we retreat. If we keep retreating, there's actually no place to live anymore. People become so insecure that they will do anything to fight for survival.
To engage conflict in such a way that you can do chōng-hé—conflict as co-creation—I think that is true strength. And peace comes through that strength.

When Taiwan faces emergencies or military threats, how do you handle these incidents?

The cure for this fog of war and cognitive warfare is not to reinforce polarization. Because that's what the attackers want. The attacker really only has one narrative: democracy never delivers and only leads to chaos. Any reaction that strengthens this is like sacrificing to the altar of this singleton, this black hole of authoritarianism that pulls everybody in.
Instead of resonating with their authoritarian narrative, we need to simply build healthier relationships in which you can laugh together and live together. What the "Humor over Rumor" playbook means is that whenever insecurity is projected by warmongering, you can overcome it by building an antibody, an immunity, through shared jokes, shared humor, shared clarifications, and collaborative notes.

Conflicting ideologies are a headache for me, but making harmony and humor looks like music.

It's exactly like music, because it's called jamming! People jam together, like in jazz. Whenever a troll attacks me on the internet, maybe spending 50,000 words on a hateful rant, I use a small language model that looks at the entire rant and finds the 5 words I can reinterpret as something constructive, humorous, and fun.
I respond only to those 5 words, in a very convivial way. It de-escalates the situation. And I find some poetry in it too, because I never knew Mandarin or English could be used in this way. It used to be very difficult: just absorbing those 50,000 words of hate is a lot of emotional labor. But now you don't have to do that anymore. Your agent can do it for you while you focus on the poetry of the 5 words.
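The "find 5 words" move described here uses a small language model; as a stand-in, a toy sketch can pick the 5 most salient words of a rant by frequency after removing stopwords. The stopword list and the frequency scoring are my own simplification for illustration, not Tang's actual pipeline:

```python
import re
from collections import Counter

# Tiny stand-in for the small language model Tang describes: surface
# the few most salient words of a long rant. A real system would prompt
# an LLM to find reinterpretable words; this toy uses word frequency.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "you", "your",
             "i", "of", "to", "in", "that", "it", "this", "so", "very"}

def salient_words(text, k=5):
    """Return the k most frequent non-stopword words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

rant = ("Your platform is a disaster, a total disaster, and your ideas "
        "about democracy are naive, naive dreams of naive people.")
print(salient_words(rant))
```

The point of the exercise is the interface, not the scoring: whatever model does the reading, the human only ever sees the handful of words worth responding to, so the 50,000 words of emotional labor never reach them.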

That is a return to the age of the tweet. Almost 15 years ago, Twitter was just short tweets. We should listen to just 5 words!

Yes, exactly. Not even 140 characters. Just 14.

Thank you so much for your time. Finally, we have to write something on the flipboard. What is the important thing to become convivial?
(Writes on the flipboard)

I turn all my screens grayscale. I want the pixels to be less vivid than reality, because right now the pixels are shinier than the reality around us. Just by turning all my screens grayscale, I get better sleep; my sleep quality is much better because I don't feel an urge to scroll. It's just grayscale anyway, right? So I think a color filter is a very good defense.
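At the pixel level, the grayscale filter described here is just a luminance mix. One standard choice is the Rec. 601 luma formula, which weights the channels by perceived brightness; a minimal sketch:

```python
def to_grayscale(r, g, b):
    """Rec. 601 luma: mix RGB into one gray value by perceptual weight."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return round(y)

# Saturated red, green, and blue all collapse to mid-to-dark grays,
# which is why a grayscale screen loses its attention-grabbing pull.
print(to_grayscale(255, 0, 0))  # → 76
print(to_grayscale(0, 255, 0))  # → 150
print(to_grayscale(0, 0, 255))  # → 29
```

Operating systems apply a filter like this to every frame; the weights matter because green contributes far more to perceived brightness than blue.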

Good point. A movie critic once said that black and white is the world of the ghost. Pixels are a fiction, the world of the ghost.

Exactly. If you use a color filter, even just dialing color down by 70%, you remind yourself that this is just a painting, and you will not scroll forever. So my suggestion, I guess, would just be: scroll less, sleep more.

Yes, nice! Thank you so much. Let's have a nice discussion again in the future!

Of course. To be continued. Thank you.

Well, today was fun too... Ah, I'm speaking Japanese now. In my head, switching between Japanese and English can be a bit tricky, but since I spoke English from start to finish today, I really got into a rhythm halfway through. Talking with Audrey is always so much fun. Human brains connect and disconnect, words emerge, images form, and we repeat this cycle as the conversation flows. Through that repetition, new ideas and moments of inspiration emerge, and that is always a joy. We went a little over time today, but thank you all very much!