AI Alignment
Directions and desiderata for AI alignment
I lay out three research directions in AI alignment and three desiderata that I think should guide research in these areas.
Paul Christiano
Feb 6
Benign model-free RL
Reward learning, robustness, and amplification may be sufficient to train benign model-free RL agents.
Paul Christiano
Mar 19
Prosaic AI alignment
I argue that AI alignment should focus on the possibility that we build AGI without learning anything fundamentally new about intelligence.
Paul Christiano
Nov 18, 2016
Latest
Benign AI
Something is benign if it isn’t optimized to be bad. “Benign” is weaker than “aligned,” but I find it helpful for thinking about AI…
Paul Christiano
Nov 29, 2016
Hard-core subproblems
I think that discussions of AI control should aim to identify subproblems that are necessary to solve but on which we aren’t making progress.
Paul Christiano
Nov 26, 2016
AI “safety” vs “control” vs “alignment”
Defining what I mean by “AI safety,” “AI control,” and “value alignment.”
Paul Christiano
Nov 18, 2016
Handling destructive technology
Solving AI control is just “delaying the inevitable” with respect to the need for global coordination, but it seems high-impact anyway.
Paul Christiano
Nov 14, 2016
Thoughts on reward engineering
Addressing a bunch of details that come up when we try to convert our preferences into a reward function for RL.
Paul Christiano
Nov 8, 2016
Security amplification
Can we use agents with many vulnerabilities to implement an agent with fewer vulnerabilities?
Paul Christiano
Oct 26, 2016
Meta-execution
Building agents out of agents.
Paul Christiano
Oct 25, 2016
Of humans and universality thresholds
I’ve suggested that HCH might be a universal deliberative process if run with humans but not if run with apes. Is that suspicious?
Paul Christiano
Oct 23, 2016
Some thoughts on training highly reliable models
A grab bag of relevant considerations, mostly pointing out that the problem is even harder than it might at first appear.
Paul Christiano
Oct 22, 2016
Aligned search
Powerful searches are likely to pose a distinctive challenge for AI control.
Paul Christiano
Oct 21, 2016
Reliability amplification
Can redundancy increase the reliability of complex policies in the same way it can increase the reliability of computation?
Paul Christiano
Oct 20, 2016
ALBA on GitHub
A preliminary ALBA implementation is now on GitHub: https://github.com/paulfchristiano/alba
Paul Christiano
Oct 19, 2016
Not just learning
I’ve been focusing on aligned learning, but AI is more than just learning.
Paul Christiano
Oct 16, 2016
Imitation+RL
Imitation+RL might be a more natural model for powerful AI than either imitation or RL.
Paul Christiano
Oct 15, 2016
Security and AI alignment
AI alignment and AI security are probably more closely connected than I used to think.
Paul Christiano
Oct 14, 2016
Ignoring computational limits with reflective oracles
Reflective oracles provide a natural computational model where there is no such thing as “not enough time to find the answer.”
Paul Christiano
Oct 4, 2016
Extracting information
Can we incentivize experts to optimally gather relevant information? A clean open question relevant to AI control.
Paul Christiano
Oct 3, 2016
Capability amplification
Can we use a weak policy with a fast implementation to construct a stronger policy with a slow implementation?
Paul Christiano
Oct 2, 2016
The reward engineering problem
How can we define rewards which incentivize weak RL agents to behave in a desirable way?
Paul Christiano
May 30, 2016
Red teams
Training AI systems to avoid catastrophic errors — without causing catastrophes.
Paul Christiano
May 28, 2016
Learning with catastrophes
A catastrophe is an event so bad that we are not willing to let it happen even a single time.
Paul Christiano
May 28, 2016
Semi-supervised reinforcement learning
A problem at the intersection of AI control and traditional RL research.
Paul Christiano
May 6, 2016
Strong HCH
A more elaborate arrangement of humans consulting humans consulting HCH.
Paul Christiano
Mar 24, 2016
Efficient and safely scalable
A precise but overambitious goal for AI control research.
Paul Christiano
Mar 23, 2016