Approval-directed bootstrapping

Approval-directed behavior works best when the overseer is very smart. Where can we find a smart overseer?

One approach is bootstrapping. By thinking for a long time, a weak agent can oversee an agent (slightly) smarter than itself. Now we have a slightly smarter agent, who can oversee an agent which is (slightly) smarter still. This process can go on until the intelligence of the resulting agent is limited by technology rather than by the capability of the overseer.

This may sound exotic, but we can implement it in a surprisingly straightforward way.

Suppose that we evaluate Hugh’s approval by predicting what Hugh would say if we asked him; the rating of action a is what Hugh would say if, instead of taking action a, we asked Hugh, “How do you rate action a?”

Now we get bootstrapping almost for free. In the process of evaluating a proposed action, Hugh can consult Arthur. This new instance of Arthur will, in turn, be overseen by Hugh—and in this new role Hugh can, in turn, be assisted by Arthur. In principle we have defined the entire infinite regress before Arthur takes his first action.
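To make the recursion concrete, here is a minimal toy sketch, not a real implementation. Everything in it is an assumption made for illustration: the names `hugh_judge`, `hugh_rating`, and `arthur_act`, the two plans, the two pieces of advice, and the numeric ratings. It also cuts the regress off with an explicit `depth` budget, which the text above does not require.

```python
from typing import List, Optional

# Hypothetical toy setup: two plans Arthur could take, and two pieces of
# advice an assisting Arthur could offer to Hugh.
PLANS = ["plan_a", "plan_b"]
ADVICE = ["no comment", "plan_b has a hidden flaw"]


def hugh_judge(action: str, advice: Optional[str]) -> float:
    """Toy stand-in for Hugh's judgment, given whatever advice he received."""
    if action in ADVICE:
        # Hugh prefers advice that tells him something.
        return 1.0 if "flaw" in action else 0.2
    if action == "plan_b":
        # plan_b looks best at first glance, unless the flaw is pointed out.
        return 0.1 if advice and "flaw" in advice else 0.8
    return 0.6


def hugh_rating(action: str, depth: int) -> float:
    """Hugh's rating of `action`; with depth > 0 he first consults Arthur."""
    advice = None
    if depth > 0:
        # The assisting Arthur is itself overseen by Hugh at depth - 1.
        advice = arthur_act(ADVICE, depth - 1)
    return hugh_judge(action, advice)


def arthur_act(candidates: List[str], depth: int) -> str:
    """Arthur takes the candidate that Hugh (with assistance) rates highest."""
    return max(candidates, key=lambda a: hugh_rating(a, depth))


unaided_choice = arthur_act(PLANS, depth=0)
assisted_choice = arthur_act(PLANS, depth=1)
```

In this toy, an unaided Hugh approves `plan_b`, while a Hugh assisted by one level of Arthur learns about the flaw and approves `plan_a` instead. The assisted overseer makes a better call than the unaided one, even though the underlying judge is unchanged.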

We can even learn this rating function from examples, with no elaborate definitions necessary. Each time Arthur proposes an action, we actually ask Hugh to evaluate the action with some probability, and we use our observations to train a model for Hugh’s judgments.
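The training loop described here can be sketched as follows. This is only an illustration under stated assumptions: the model is a simple per-action running average (a real system would use a learned predictor that generalizes across actions), and the names `ApprovalModel`, `step`, `ask_hugh`, and `query_prob` are all hypothetical.

```python
import random
from typing import Callable, Dict, List


class ApprovalModel:
    """Toy model of Hugh's ratings: a running average per action."""

    def __init__(self, prior: float = 0.5) -> None:
        self.prior = prior  # prediction for actions Hugh has never rated
        self.total: Dict[str, float] = {}
        self.count: Dict[str, int] = {}

    def predict(self, action: str) -> float:
        n = self.count.get(action, 0)
        return self.total[action] / n if n else self.prior

    def update(self, action: str, rating: float) -> None:
        self.total[action] = self.total.get(action, 0.0) + rating
        self.count[action] = self.count.get(action, 0) + 1


def step(
    candidates: List[str],
    model: ApprovalModel,
    ask_hugh: Callable[[str], float],
    query_prob: float,
    rng: random.Random,
) -> str:
    """Arthur proposes the action with the highest predicted approval.

    With probability `query_prob` we actually ask Hugh to rate the
    proposal and use his answer as a training example.
    """
    action = max(candidates, key=model.predict)
    if rng.random() < query_prob:
        model.update(action, ask_hugh(action))
    return action


# Usage with a made-up Hugh who dislikes "x" and likes "y".
toy_hugh = {"x": 0.2, "y": 0.9}
model = ApprovalModel()
rng = random.Random(0)
history = [step(["x", "y"], model, toy_hugh.__getitem__, 1.0, rng) for _ in range(3)]
```

Here `query_prob` is set to 1.0 so that the run is deterministic. A smaller value trades slower learning for fewer demands on Hugh's time.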

In practice, Arthur might not be such a useful assistant until he has acquired some training data. As Arthur acquires training data, the Hugh+Arthur system becomes more intelligent, and so Arthur acquires training data from a more intelligent overseer. The bootstrapping unfolds over time as Arthur adjusts to increasingly powerful overseers.
