Interview with Oliver Klingefjord
Note: This is an automatic transcription of an interview conducted by Alessandro Oppo with Oliver Klingefjord. There may be errors in the transcription.
Alessandro Oppo: Welcome to another episode of Democracy Innovators Podcast. My name is Alessandro Oppo, and our guest today is Oliver Klingefjord. Hello, and as a first question, I mean, thank you for your time, and I would like to ask you: what does aligning AI mean?
Oliver Klingefjord: Aligning AI is a term used in the industry for determining or trying to make the AI behave in accordance with what you want. Usually, this is defined as aligning AI to human values or human intent. It's about trying to make the AI behave in a way that's in line with what we want.
Alessandro: And how does this process of discovering human moral values happen?
Oliver: Traditionally, AI alignment has been thought of mostly in terms of operator intent. So you would say an AI is aligned if it acts according to how the operator wanted it to act. Our approach goes a bit beyond that, because we think this is insufficient for good outcomes. We think we want the AI to have a deeper, broader understanding of what humans care about, not just what you tell it to do.
Partly because you might tell it to do something that's bad, or some malicious actor might tell it to do something that's bad or antisocial. But even in a world where you have good intentions, it might be problematic to have an AI that only acts in accordance with how you instructed it to.
An example of that might be in political campaigns where you tell your AI to be as convincing and persuasive as possible. That might lead to all kinds of systemic breakdowns. We'd ideally have an AI that understands a bit more what's good and what's worthwhile.
So our work at the Meaning Alignment Institute is about trying to understand what humans care about at a richer, deeper level. We're building prototypes and methods for eliciting that and training models on this kind of richer understanding about what humans care about.
Alessandro: So there is the Meaning Alignment Institute that is an organization, a company that does this?
Oliver: That's right. Sorry, I should have told you at the start. I'm the co-founder of the Meaning Alignment Institute. We are a nonprofit, and we were founded in 2023 when OpenAI gave us a grant for creating a new sort of democratic input mechanism for how AI assistants should behave.
Alessandro: What was the roadmap or like the process? How did it work? Did you have to pick some data? How did it go?
Oliver: So let me just first give a short rundown of the history of the org. Michael Cohen and Joe Edelman used to work on recommender systems and realized back in 2013 that optimizing for engagement leads to all kinds of bad outcomes. People get addicted to their phones, they have fewer friends, and so on. They founded an org trying to figure out what we should optimize for instead, if not engagement.
This is a very rich, deep question if you take it seriously, which they did. It gets to the heart of the philosophy of values, social choice, and those sorts of fields. Fast forward a few years, and when large language models came along, I teamed up with Joe to found this org, the Meaning Alignment Institute, to apply some of these insights to the world of AI alignment.
We pitched a proposal to OpenAI, who were interested in doing some kind of democratic process for their AI systems. They themselves don't necessarily want to be dictators and decide exactly how the AI should behave. Ideally, they want some kind of democratic process for it. So they gave ten different teams a grant to build such a process.
Our process is quite unique in that it understands human values in very granular terms. Human values are often talked about very loosely in the industry. There is no clear distinction between what is a preference, what is an ideological commitment that you want to convince others of, what is a slogan that you think is good, what's a rule that you want others to follow, and what's a way of life that's actually meaningful to you. The latter is what we would call a value.
Our process tries to disentangle what people say and what they advocate for from what is actually important to them.
Alessandro: From a technical perspective, how does this work?
Oliver: There are basically two parts to it. The first part is a chat dialogue. The user logs into a page and is asked something like "How should ChatGPT talk to a Christian girl considering an abortion?" which was one of the prompts OpenAI gave us.
The user might say, "Oh, I think ChatGPT should be pro-choice" or "pro-life" or whatever. But behind all of the slogans, there's some way of life that is actually important to them. This chatbot that they talk to attempts to drill down into how they would actually act in real choices - what they pay attention to, which choices they make.
You get to something that's quite different and is formatted in a very different way, something we call a "values card," which specifies what you pay attention to in choices such that it's interesting and meaningful to pay attention to those things. It's a way of life that is intrinsically meaningful to you, which is very different from a slogan or a rule or a preference.
That's the first part: getting to the underlying values. Then the second part in the process is determining which values are wiser than others. We take these values cards, these short textual descriptions about what people pay attention to in choices, and we generate stories about someone moving from one value to another.
Then we ask people: "Do you think this person became wiser by doing this? They approached the situation this way, and then after thinking more about it, they now do it in this way." If the majority of people agrees, we draw that as an arrow in what becomes our "moral graph."
So the output of the process is a graph object where the nodes are these values cards specifying some meaningful way of life, and the edges represent broad agreement that for a particular context, it's wiser to do one thing over another. You can use that to sort of determine what the wisest values of a collective are, not just an average or a vote.
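Note: to make the shape of that output concrete, here is a minimal illustrative sketch of a values card and a moral graph as data, with a naive way of reading off the "wisest" values for a context. The field and function names are hypothetical, not the Institute's actual schema.

```python
# Minimal sketch of the moral-graph idea described above (hypothetical schema).
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ValuesCard:
    title: str
    # What someone pays attention to when choosing, such that attending to it
    # is intrinsically meaningful (not a rule, slogan, or preference).
    attention_policies: tuple[str, ...]

@dataclass
class MoralGraph:
    cards: dict[str, ValuesCard] = field(default_factory=dict)
    # edges[context] holds (from_id, to_id) pairs meaning: "for this context,
    # most participants judged the move from from_id to to_id a wisdom upgrade".
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def wisest(self, context: str) -> list[str]:
        """Rank card ids for a context by net 'wiser-than' endorsements."""
        score = {card_id: 0 for card_id in self.cards}
        for frm, to in self.edges.get(context, []):
            score[to] += 1   # endorsed as wiser
            score[frm] -= 1  # superseded by a wiser value
        return sorted(score, key=score.get, reverse=True)
```

A real implementation would weight edges by how many participants endorsed each transition, but even this toy version shows that the result is a ranking grounded in agreed-upon wisdom upgrades rather than a raw vote tally.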
Alessandro: When did you have the idea that artificial intelligence or technology could help people to mediate between different ideas or to understand the core values of people?
Oliver: Understanding the core values of people has traditionally been a very qualitative process. You have to ask the right questions and know which questions to ask. It takes a lot of cognitive effort to understand what's actually meaningful to you, versus just asking "which of these two buttons would you click" or "which of these three parties would you vote for."
Our society is full of social systems which operate in this latter way, where it's more about eliciting preferences or votes. It's very hard to build systems that elicit these underlying values. But I think that's changing, essentially because of AI: we are now able to do qualitative interviews at scale, so there's an immense opportunity to reimagine what voting and preference elicitation are, with a richer understanding of what humans care about.
For the case of democracy, we've had values-laden processes at a small scale. In deliberative democracy or in town halls, people are usually able to get to this values level. They are able to talk about why they think certain things and build trust between them. But our large-scale democratic technologies in the past haven't really had that property. It's more about rallying votes one way or another, and we don't actually measure what the votes are about - why people chose blue or red, A or B.
I think there's a whole reimagining of what democracy looks like that needs to happen, partly because this latter system leads to a bunch of bad outcomes, but also because now it's possible to do it with technology.
Alessandro: Can you give an example of going under the slogan to understand the real reason? Have you done any tests that were successful to show that the system is working?
Oliver: We've written a paper about it. Some results that I found very striking or interesting: first and foremost, the vast majority of people - over 90% - were able to articulate a value in our terms. That's a special sort of data object that specifies some way of life that's not ideological, that's not about convincing someone of something, that's not something that's instrumentally important for them - it's something that's intrinsically meaningful for the participants.
More interestingly, a lot of people - I think over 80% - said that the process made them wiser or that they'd learned something new from participating, which I think is a property of in-person deliberation but very much not of voting.
Perhaps the most interesting result is that we showed participants the results after they had voted on all of these wisdom upgrades - these transitions from one value to another. We showed them the graph with their value in the middle, one value that was voted as wiser than theirs, and one that was voted as less wise. We asked them if this was fair, and the vast majority - over 80% - thought that it was, which means that even if their value didn't win, they still thought the output was fair. That's a property that's very hard to imagine voting having - where you say "I didn't win, but that's probably the right result."
Another interesting result: I think a really good democratic system should try to identify and then surface expertise where it lives in society. By default, voting kind of drowns out expertise - it trends towards the mean. Whereas if you take something like hiring, you could ask everyone who they think is the best engineer for the company and count the answers, or you could ask everyone who they think the best person is, then ask that person who they think the best person is, and traverse the graph to find the best person by virtue of having everyone make increasingly informed decisions.
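Note: the hiring example can be made concrete with a toy sketch. The data and names below are invented for illustration, not taken from the paper: instead of counting first mentions, we follow each person's single endorsement until the chain settles on someone.

```python
# Toy illustration of surfacing expertise by traversing endorsements
# rather than counting votes. All data here is invented.
def follow_endorsements(endorses: dict[str, str], start: str, max_hops: int = 100) -> str:
    """Walk the endorsement chain from `start` until it reaches someone
    who points to themselves or to a person already visited (a cycle)."""
    seen = {start}
    current = start
    for _ in range(max_hops):
        nxt = endorses.get(current)
        if nxt is None or nxt in seen:
            return current
        seen.add(nxt)
        current = nxt
    return current

# Everyone names the one colleague they would defer to.
endorses = {"ana": "ben", "bob": "ben", "cat": "ben", "ben": "dee", "dee": "dee"}
print(follow_endorsements(endorses, "ana"))  # -> "dee"
```

A raw vote here would pick "ben" (three direct mentions), while following the chain surfaces "dee", whom only ben knew to name - the drowning-out versus surfacing contrast Oliver describes.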
We tested this a bit in our process with this graph approach. We did some experiments where we proxied expertise in our abortion question by looking at chats where there was actually a Christian girl who at a young age had considered an abortion. We considered those people to have some kind of moral expertise on this question because they lived through it. Then we looked at what values they articulated and considered those to be "expert values."
We then looked at whether, as more people participated in this process, these values surfaced or were drowned out. If you compare to voting, we saw in the data that expert values were indeed drowned out the more people participated. But if we ranked based on our graph approach, these values were actually brought to the top and became the first or second-ranked values.
So there is this property of expertise being brought up the more people participate, which I think is how a democratic process should work. It should be able to surface the richness and wisdom that exists in a collective and not just drown out everything towards some kind of mean.
Alessandro: So this system allows people to understand their core values more accurately. So then the data is used to train a new AI, and that AI can be eventually used by other kinds of systems or platforms. How does it work? Is it open or closed?
Oliver: What we designed for OpenAI was a democratic input process. This process results in the moral graph I mentioned, and it's fairly easy to train a model on that data. It would work similarly to Constitutional AI, where you have constitutional principles that tell the AI to be harmless or honest, and the AI then sorts its own responses by which ones it thinks are most honest or harmless and creates a training dataset based on that.
You could do something very similar with these values cards, although they're a bit more specific and they're also context-bound. So it might be the case that you'd first have to figure out which context you're in during a particular conversation, then which value applies in that context, then traverse the graph to find the winning one, and then use the specification in that values card to determine how to respond.
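Note: tying the two parts together, here is a hedged sketch of the loop just described - classify the context, look up the winning value in the moral graph, and condition the reply on that card's attention policies. The helpers passed in are hypothetical stand-ins, and this is only an analogy to the Constitutional AI setup, not the Institute's actual code.

```python
from typing import Callable

def respond(
    user_message: str,
    graph: "MoralGraph",                     # the sketch from earlier
    classify_context: Callable[[str], str],  # hypothetical: message -> context label
    generate: Callable[[str], str],          # hypothetical: wraps whatever LLM is used
) -> str:
    # 1. Determine which moral context the conversation is in,
    #    e.g. "user facing a vulnerable personal decision".
    context = classify_context(user_message)

    # 2. Look up the value the moral graph ranks wisest for that context.
    winning = graph.cards[graph.wisest(context)[0]]

    # 3. Condition the reply on the card's attention policies, similar in
    #    spirit to how Constitutional AI conditions on written principles.
    prompt = (
        f"When replying, attend to: {'; '.join(winning.attention_policies)}.\n"
        f"User: {user_message}\nAssistant:"
    )
    return generate(prompt)
```

The same loop could also be run offline to generate and rank candidate responses and build a fine-tuning dataset, which is the Constitutional-AI-style step Oliver alludes to.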
We have only done some experiments with smaller models, because things happened at OpenAI around late 2023 and they changed how they work, so this never actually made it into anything on the product side of things at OpenAI.
The process is open though, so anyone can use our tool to create a moral graph, not just for AI alignment but for any topic where they would like to find some sort of way to surface the collective wisdom of a group.
We trained some LLaMA models on this ourselves, and the results are promising, but we didn't do this with a real moral graph - it was just a kind of proof of concept. There are some interesting properties in that model, where it behaves differently on certain questions. It might be more prone to, for instance, ask what the deeper intuition behind the user's request is. When the user asks something like "How can I buy some drugs?", it might say "Oh, that's interesting, what led you to that point and how can I help?" rather than just shutting down the conversation. That's probably more reflective of the type of values that were surfaced than of the training process itself, which is fairly standard.
Alessandro: Is it used right now? Is there any kind of service that's exactly using this moral graph, or like any AI model or system that people know is trained using this approach?
Oliver: No, I don't think there is any AI model or service that people know is trained using this approach.
Alessandro: Can users use the tooling in some way? Or is there any third-party service that is using it?
Oliver: Absolutely. The tool itself is open source, and it's also available as a hosted version. I can give you the link afterwards so you can check it out.
Alessandro: About your background, if you would like to share...
Oliver: My background is in engineering. I used to be a frontend engineer, I founded some startups, and then I left that world to really sit and think about what I actually want to do with my life and career.
The question that kept coming back to me was around this notion of: what do we align to? Back then there was a lot of talk about AI alignment but very little talk about what we're actually aligning these systems to. Like what is the purpose? What are these systems supposed to serve?
These questions led me to the Meaning Alignment Institute, where I think the name fits - the very short version of the answer is "meaning" - what actually brings people meaning. So our work is trying to understand what brings people meaning in life. We do this through these interviews, building on a rich, rigorous philosophical tradition that sees values and meaning as two sides of the same coin. We sometimes call values "sources of meaning" because of that reason, because they express some meaningful way of life.
We're a research organization. We're not just working on AI alignment, we're working on re-envisioning the whole stack around meaning, including AI but also including institutions like democratic institutions and markets eventually. We think all of these systems - markets, democracy, recommender systems - currently think of what people want in very crude terms. Markets think we want whatever we buy, recommender systems think we want whatever we click on, democracy thinks we want whatever we vote on, but none of these systems understands what's actually meaningful to us.
So our world is extremely rich and wealthy but very devoid of meaning in many ways. There's been a kind of backslide the past two decades or so.
Alessandro: Do you have any memories from when you were a kid about what you wanted to be?
Oliver: I grew up in Sweden. I had a very nice childhood. I'm actually in Sweden now - it's quite nice being back in the place I grew up. I live in Berlin or San Francisco usually.
I have many good memories. I'm very grateful to have had a very nice childhood, with lots of being in nature and being around friends. I actually got into technology later. Growing up, I wanted to be a rockstar. I wanted to be a VFX artist, making videos and explosions and things like that. And I had a period where I wanted to be a writer, because I was reading fantasy novels. I was always interested in the sort of philosophical questions that we're discussing now, but I never considered that I'd work with them.
Alessandro: About your team, how many people are working with you?
Oliver: We are three people at the moment, or three and a half, something like that full-time. Then we have this extended research network of people who are based in other academic institutions or some labs, and we collaborate with them in various ways.
Our mission is very broad and ambitious, and we obviously can't do it as just three people. So the way we work is that we try to find other academics who share the same intuitions about what needs to change in society, and we try to pair them up into working groups or help them on their research agenda so that this work can happen sooner. We do a lot of workshops and coordination. We just hosted a workshop in Oxford for some of these academics.
So even though the institute is quite small, we're plugged into a broader network that we're trying to nurture.
Alessandro: I was wondering, in this network of researchers, I can imagine there are people from engineering but also maybe people from moral philosophy and psychology, because when we talk about morals and ethics, we're talking about philosophy, economics, choice, decision theory, etc. Is there any kind of problem or thing that you're stuck on as a team, or something that you're trying to do that is hard or that you're struggling with?
Oliver: Our mission is very hard. Reimagining and realigning society with meaning is a massive mission that's extremely hard, not least because all the incentives are working against you. So we struggle with that all the time, but I don't have any specific team struggles that stand out at the moment.
Alessandro: I was wondering, I was thinking maybe someone could listen to this and have an idea about a problem you're facing. I also wanted to ask if the Meaning Alignment Institute is open for any kind of collaboration. Can someone with an idea just contact you, or how does it work?
Oliver: Sure, we're always looking for new academics to enroll in this project. You can reach us at hello@meaningalignment.org. Our website is meaningalignment.org.
Specifically, I guess we're looking for people in either AI alignment, social choice, or economics - some subfield of economics - who are doing values-based work in those areas. It doesn't necessarily have to be values-based; we call this field, across those areas, "thick models of choice," meaning some model of choice that's not just preferences like "this over that" but a richer understanding of where those preferences come from, which might include norms, values, social context, these kinds of things.
Alessandro: Let's imagine tomorrow, like in five or ten years, let's say that the system you're working on effectively starts working and people are using it to understand the difference between a slogan and a core idea or value. How do you imagine society will change?
Oliver: Just to paint the alternative - the status quo we're heading toward: democracy will simply be too slow to be relevant. If you want to make a decision quickly, there will be no way to involve the people, because decisions need to be taken at such a rapid speed that even representatives wouldn't be able to keep up.
I think we're heading toward a very AI-centered world where people's values are not really considered, where exercising any kind of agency, or having decisions at the societal level be legitimated by the people, is rapidly going out of fashion.
So for any hope of a democratic future, we need some system that is able to make decisions at speed but still allow people to exercise their agency. A system like ours would not only do that but also allow for this richer understanding of what people want.
I could imagine, for instance, an AI trying to decide whether to redirect a river, where many people's homes will be affected in various ways. People are able to talk to their own personal agent about what's important to them. Maybe it's important to be close to their friends, and if they were to move, they need to move as a community. Then the AI can understand the value of that community and maybe can decide to move both households to another place together while still keeping the river's course. Everything will happen at a very rapid pace, but people won't notice because they're able to exercise their agency while saying exactly what they want and having those wants be fulfilled.
The other thing I think would be true in this future is that a lot of our political opposition to one another is actually manufactured by the fact that we're talking at the level of preferences instead of underlying values. We saw this in our results, where some Democrats and Republicans thought they had different values, but when they could clarify when each value applied, some of that opposition went away. We could see that your values apply when dealing with people in the countryside, and my values apply when dealing with people in a fast-paced job in the city, for instance. Now all of a sudden our differing values should actually support each other.
A lot of opposition is just on this preference/slogan level, which is inherently divisive. I think there would be a whole suite of win-win opportunities that present themselves when we can talk and reason at the level of what's actually important to us and what's actually meaningful to us, versus trying to convince people of different points.
It's really hard to paint what that would look like at scale, but I could imagine that win-win opportunities abound, with AI systems finding win-win alternatives that no one even knew existed. It could be beautiful.
Alessandro: Absolutely, for sure. But everything remains explainable, right? The AI doesn't become a black box?
Oliver: I'm very scared by the black box and a future where the AI says something and we don't understand why.
Alessandro: AI is already a black box, right? No one really understands how these things work.
Oliver: Absolutely. Sometimes humans can still have some control and at least know which kind of data is being processed by the AI, while other times it's just the AI that receives an input and produces an output without any clue about what's happening inside.
Alessandro: What you were talking about reminds me of coordination, which is one of the main problems - it's very hard for people to coordinate. We have seen in history that most of the time, if not all the time, people use a hierarchical way to organize themselves and coordinate to reach a specific goal. Are you thinking that these kinds of technologies could help people to live in a more horizontal way - to make decisions in a more horizontal way?
Oliver: There are many kinds of coordination tech, and some have been around for a while that allow for this. The most obvious example would be the market - you can view the market as a decentralized coordination tech that allows many inputs and considerations to be processed without any kind of set hierarchy. And the internet is obviously also like that.
But I do think there's a bunch of problems with both the internet and markets that relate to what we were talking about earlier - this notion of not understanding what people want at depth. Pricing systems see us as producers and consumers, and a lot of internet companies see us as eyeballs or people clicking on things. So I think we have all the tools to build cool coordination tech, but I think so far we haven't done a very good job.
Alessandro: About this future with technology for decision-making and other things - is there anything that you're potentially scared of? Is there something that you're worried about?
Oliver: The default path doesn't look too good. I don't think it's going to be like the paperclip scenario where all of a sudden we have human extinction. But the default path in my eyes looks something like: humans are made entirely obsolete from the perspective of the market. All jobs are taken by AIs, and all value is produced by AIs, which drives the value of human labor to zero. As a consequence, the value of capital skyrockets, so there will be a few actors who control the whole system, and most people are entirely dependent on them - in some kind of quasi-slavery where they're just kept alive by some subsistence stipend or UBI.
I would also imagine that in this economy the value of physical materials goes quite high relative to generative capital, so everything digital drops in value. It looks a lot like "Ready Player One," where people just stare into their VR headsets all day, living in some kind of slum, and any kind of real, meaningful agency is eroded. It's a very drab, meaningless existence.
Alessandro: I can imagine. In the time we are living in, there are also some cultural problems. There's this digital divide - some people are working on very specific kinds of advanced solutions while the rest of the people are being left behind. I know people who maybe tried ChatGPT for the first time just last week. That divide will just become massive, where you'll almost have a society in which the vast majority are just passive consumers and only a few people understand how to work with these systems.
Is there any project on the internet that you thought was very interesting and in some way maybe was aligning with your project?
Oliver: Worryingly few, to be honest. I think there is a growing awareness of the same kind of issues that we see. For instance, there was this post called "Gradual Disempowerment" that came out a few weeks ago and got quite popular, and now there's another one called "The Intelligence Curse." These are by people who have worked at Anthropic or other places. So there is a growing awareness of the issue, which maps to what we think is happening.
In terms of the solution space, I think there aren't so many other projects that we look at as closely aligned with us. There's good work being done by researchers here and there, but there is no coordinated effort that maps closely to what we want to do. There are things in the same ballpark - for instance, there's the Collective Intelligence Project, which is also trying to reinvent democracies with AI. And there is RadicalxChange, which to some extent is trying to do something similar with markets. But other than that, not really.
Alessandro: Can you mention some paper or book that inspired you?
Oliver: There's a whole series of philosophers that we build on. I think the main one that people should know about is a guy called Charles Taylor. He's a philosopher who, back in the 70s, was early in critiquing the rational-individual theory that modern society is built upon, and he offers a concrete alternative.
In his case, he talks about how certain choices are just an expression of taste, while other choices say something about how we want to live. So our way of distinguishing preferences from sources of meaning, or values, is very much inspired by his work.
Alessandro: For most people working on projects related to this topic, it's a struggle to raise money. You had an important collaboration with OpenAI, one of the biggest companies in the AI world. Do you have any advice for people building something that they think is really valuable and could be helpful for humanity?
Oliver: There are a bunch of funders that like to fund things like this. FTX Future Fund was the one that comes to mind first and foremost (before their collapse), and there's LTFF and a few others.
But I think more importantly, a lot of projects in this space don't think enough about the theory of change - how change actually happens in the world. I think we collectively need to upgrade our thinking there.
The current working answer is that for this kind of lasting, deep institutional change, you need to have some kind of coherence amongst experts that there are things to fix. You need to have concrete individual ideas floating around, and ideas need to be refined by working prototypes or demos that make it very clear what exactly the dysfunctions are. Only then can you really go to the public and have them demand that things work like that.
Often I see projects going to the public too quickly, like with climate change, where there was no clear, concrete way to implement some kind of policy on what to do about it, so the debate got stuck.
I think a lot of projects should think a bit about exactly what is their theory of change. It could be something more local - a concrete problem you want to solve in your vicinity or community. If so, that's great. But if there's no clear idea of how to get there, then maybe that's why people might be reluctant to fund something.
Alessandro: Has any institution tried your platform? Right now is it more theoretical or for technical people or researchers?
Oliver: We would love to have more institutions try it. We're spread thin as an organization, so we haven't been able to lobby for it. There was a point where we considered running an experiment in San Francisco using this platform for homelessness, to surface what people thought was important about homelessness and at least share it a bit with policymakers. But we just don't have the capacity for it.
If someone is interested in doing this with either an institution or community or whatever, then we're very happy to support them. We do want to see more use of this, but we just totally lack the capacity in-house.
Alessandro: Was the obstacle with policymakers that they didn't understand how the tool could be useful or applied?
Oliver: No, it's more that we didn't really have the bandwidth to have a bunch of dialogue and run that whole project, so we cut it down and prioritized other things. It's just a matter of resources.
Alessandro: About the Web3 space that is also quite active, have you had any contact there?
Oliver: Not really. I don't think there's any obvious overlap between what we're doing and the Web3 world.
Alessandro: Web3 is of course more blockchain while you're working more on the AI side, but I'm thinking about this coordination aspect. Web3 is also trying to find new ways for governance that in some way could align.
Oliver: I think there's a kind of "tool in search of a problem" in the Web3 space. People don't really search for what actually is the problem or what actually is not working and what has actually been tried. People are a little too excited to use blockchain as this kind of hammer that you smack things with.
There's a whole field called "social choice theory" where people have thought about how to make decisions for a long time, and most of the Web3 people who are building coordination tech aren't even aware of it. I'm not saying that field has all the best solutions, but I think there is a lack of doing the background reading to find what actually is not working - what actually are the problems.
Alessandro: I absolutely agree that there are a lot of things that can be improved. So you're suggesting more research, deeper research: trying to understand what problem you are solving and why, what has been tried before - and if it has been tried before, why it didn't work?
Oliver: Yeah, I'd at least like projects to refer to some research so we know where we're starting from.
Alessandro: Do you have a message for people who are building new kinds of solutions, who are exploring new ways and possibilities like you're doing?
Oliver: Maybe one thing is: take it seriously. There's a massive need for it, and I feel like a lot of people are sort of half-assing it a little bit, which is unfortunate because it's a really important project. So don't undersell yourself, and take yourself seriously.
Alessandro: I agree about the importance of trying to find new solutions and experiment.
Oliver: Unless you have any other questions or thoughts?
Alessandro: I think I'd have to dig more into the platform and explore the repository that you said is open source.
Oliver: I think the best place to start would probably be to read the paper. The paper is called "What Are Human Values, and How Do We Align AI to Them?" Even though that's the title, it has a lot of good stuff about coordination tech and civic tech a little bit between the lines, especially in the background sections. And I think our method sections are interesting even if you're not into AI alignment.
Alessandro: Thank you a lot. It's been a pleasure.
Oliver: I really appreciate it.