The Problem of Suffering: A Logical Perspective, and Implications for AI Safety

Summary:

I discuss the problem of suffering from a logical perspective and give some considerations on the nature of any logically definable deific entities given some typical assumptions.

Specifically, the ordering in which assumptions are taken matters a great deal, as the implications for reality differ given different starting orders. I do not exclude the possibility of very many categories of potential deific entities, but I impose some limitations on the behaviour of a being which is (at least) both “good” and “relatively all-powerful.”

I also discuss implications for AI safety and for creating being(s) intended to be both “good” and “relatively all-powerful” while specifically avoiding the problem of suffering (“Artificial Superintelligence”).

Text:

Developing a logical model

The problem of suffering, as I have seen it, is often stated as something like:

If there is some entity which is all of:

1. Omnibenevolent
2. Omniscient
3. All-powerful,

why do we observe so much suffering in the world?

Firstly, we accept the proposition that there is a lot of suffering in the world; hence, logically, either the omnibenevolence assumption must be dropped or something is preventing the suffering from being stopped.

Secondly, we may take these propositions as logical axioms and then define an ordering relation among them, as there would be if they applied in the real world, due to the observed arrow of time.

With 3 propositions, there are 3! = 6 possible orderings:

  1. (1, 2, 3): First benevolent, then knowing, then powerful
  2. (1, 3, 2): First benevolent, then powerful, then knowing
  3. (2, 1, 3): First knowing, then benevolent, then powerful
  4. (2, 3, 1): First knowing, then powerful, then benevolent
  5. (3, 1, 2): First powerful, then benevolent, then knowing
  6. (3, 2, 1): First powerful, then knowing, then benevolent.

Call the previously accepted proposition that there is too much suffering for omnibenevolence to be directly applicable P0.

Before we go through case by case, let’s make some general remarks. One might imagine that either there is a single entity with all these properties, or 6 different entities (one per ordering), or 2^6 = 64 different entities (the power set of the 6 orderings, including the empty set) with some or all of these properties.
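To make the counting concrete, here is a minimal Python sketch reproducing both figures; the attribute labels are only for illustration:

```python
from itertools import combinations, permutations

attributes = ["benevolent", "knowing", "powerful"]

# All possible orderings of the three attributes: 3! = 6.
orderings = list(permutations(attributes))
print(len(orderings))  # 6
for order in orderings:
    print(" -> ".join(order))

# Treating each ordering as a potential "entity", the number of possible
# subsets of entities (including the empty set) is 2^6 = 64.
subsets = [s for r in range(len(orderings) + 1)
           for s in combinations(orderings, r)]
print(len(subsets))  # 64
```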

Logically, it makes little difference. Due to the imposed time-relation (which I will not be too specific about but could likely be made formal using second-order logic), these different orderings are in conflict with each other to determine “which wins” at any particular moment.

In order to refer to them more easily, let’s give them names:

  1. Watcher. First is good, then knows all, but all others except 3 (Marshall) are more powerful;
  2. Warrior. First is good, then has power, but does not know what to do as well as others;
  3. Marshall. First knows all, then is good, but power is last;
  4. Thief. First knows all, then has power, and is benevolent last;
  5. Rogue. First has power, then is benevolent, but knows all last;
  6. Learner. First has power, then knows all, then is benevolent.

From an outside view perspective, it may be obvious that all these “entities” should be cooperating. However, many, many difficulties arise; and in particular, the same difficulties with co-operation (e.g., the Prisoner’s Dilemma) that humans experience also apply to relatively deific entities.
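To make the cooperation difficulty concrete, here is a minimal sketch of the standard Prisoner’s Dilemma payoff structure in Python; the payoff numbers are conventional textbook values, not anything taken from this text:

```python
# Standard Prisoner's Dilemma payoffs (row player's payoff, column player's payoff).
# Conventional illustrative values with temptation > reward > punishment > sucker.
payoffs = {
    ("cooperate", "cooperate"): (3, 3),   # mutual reward
    ("cooperate", "defect"):    (0, 5),   # sucker vs temptation
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),   # mutual punishment
}

# Whatever the other player does, defecting pays more for each player
# individually, yet mutual defection is worse for both than mutual cooperation.
for other in ("cooperate", "defect"):
    best_reply = max(("cooperate", "defect"),
                     key=lambda me: payoffs[(me, other)][0])
    print(f"If the other plays {other}, the best unilateral reply is {best_reply}.")
```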

As a first example, suppose there was just one entity, but it had to decide, at each particular moment, “what to behave as.”

For example, exactly to what degree is it desirable to behave like Watcher over Rogue, or Learner over Thief, or any other combination? Framed as one category against the rest, there are (6 choose 1) = 6 such comparisons. It is possible to use this framing instead of considering full permutations (of which there are 6! = 720) because in Bayesian hypothesis testing you can usually just compare two hypotheses A and B (here: A = one single category and B = the set of all the others) to get a result, without worrying about the specific details unless they end up mattering to the problem at hand. See https://bayes.wustl.edu/etj/science.and.engineering/.
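As a sketch of this one-versus-rest comparison, here is how the six posterior-odds computations might look in Python; the prior and likelihood values are made-up placeholders, purely to show the mechanics:

```python
# One-versus-rest Bayesian comparison: hypothesis A = one named category,
# hypothesis B = the union of the other five.  Numbers are placeholders.
names = ["Watcher", "Warrior", "Marshall", "Thief", "Rogue", "Learner"]
prior = {name: 1 / len(names) for name in names}   # uniform prior
likelihood = {name: 0.1 for name in names}          # P(data | name), placeholder
likelihood["Learner"] = 0.3                         # e.g. data favours Learner

def odds_one_vs_rest(target):
    """Posterior odds of `target` against the pooled alternative."""
    p_a = prior[target] * likelihood[target]
    p_b = sum(prior[n] * likelihood[n] for n in names if n != target)
    return p_a / p_b

for name in names:                                  # 6 comparisons in total
    print(f"{name}: odds vs rest = {odds_one_vs_rest(name):.3f}")
```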

The question then becomes – which possibilities are consistent with observed reality, if any?

Well, we do observe a lot of suffering. So let’s go down the list and see what the implications are.

  1. Watcher cannot be winning, as Watcher would know exactly what to do and be able to do it. Hence, Watcher is being prevented from acting.
  2. Warrior could be acting without knowing that it is necessarily getting things correct, and the observed suffering results from mispredictions about what is “good.”
  3. Marshall’s natural role isn’t to take actions, but to optimise for better judgement. Nevertheless, using the same simple, object-level reasoning as the others, it would be ideal if Marshall were free to act, so it is clearly not.[1]
  4. What we observe is that there appears to be no correlation between suffering and “goodness” or “deservingness.” If Thief had free rein, the result would still end up good, but in the meantime we would observe suffering. So this is a consistent possibility so far.
  5. By the same reasoning as in 3., Rogue is likely not to be winning, or is in competition with Thief, because Rogue is powerful and benevolent before knowing. However, we can see that Rogue is the “stronger” version of Warrior: it is powerful first, and able to choose its own path to good, and worry about knowing the results later.
  6. Learner. In my opinion, this is the most likely. It is 1. powerful first; 2. knowing second; 3. good last. So it would take actions to increase its knowledge and power and only once it knows it can finally win will it make a move.

[1]: This requires further elaboration. Marshall is the entity which is first all-knowing, then omnibenevolent, then all-powerful. If Marshall had complete freedom to act, the world would contain the maximum number of persons and natural objects which are as happy, satisfied, fulfilled, etc., as possible, because simulation and reality are identical if the fidelity is sufficient (in particular, an entity like Marshall could simulate lesser entities to a degree indistinguishable from reality, so we could draw an isomorphism from the simulation to reality, making the simulations just as “real”). We do not observe this; therefore case 3 (Marshall acting freely) is falsified as a current-time hypothesis. Also, since Marshall is omnibenevolent, it would not purely simulate anyone who did not want to be a pure simulation; instead, it would simulate in real time those who are actually alive, thereby providing a backup in case of accidental or deliberate harm, up to and including sudden and violent death.

Describing these logical propositions this way suggests natural allies. In particular, Warrior-Rogue, Watcher-Marshall, Thief-Learner. If we analyse the situation this way, what do we see? First, let’s recap what the roles are:

  1. Warrior: 1. benevolent, 2. powerful, 3. knowing
  2. Rogue: 1. powerful, 2. benevolent, 3. knowing
  3. Watcher: 1. benevolent, 2. knowing, 3. powerful
  4. Marshall: 1. knowing, 2. benevolent, 3. powerful
  5. Thief: 1. knowing, 2. powerful, 3. benevolent
  6. Learner: 1. powerful, 2. knowing, 3. benevolent

You can see that the allies are grouped by which thing they put last; i.e., what they care about in the end. Note that taken as time-independent, they all care about the exact same things.
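This grouping can be checked mechanically; a small Python sketch using the orderings as listed above:

```python
from collections import defaultdict

# Each entity's attribute ordering as listed above.
entities = {
    "Warrior":  ("benevolent", "powerful",   "knowing"),
    "Rogue":    ("powerful",   "benevolent", "knowing"),
    "Watcher":  ("benevolent", "knowing",    "powerful"),
    "Marshall": ("knowing",    "benevolent", "powerful"),
    "Thief":    ("knowing",    "powerful",   "benevolent"),
    "Learner":  ("powerful",   "knowing",    "benevolent"),
}

# Group by the attribute placed last -- this recovers the three alliances.
allies = defaultdict(list)
for name, order in entities.items():
    allies[order[-1]].append(name)

for last, group in allies.items():
    print(f"{last} last: {group}")
# knowing last:    ['Warrior', 'Rogue']
# powerful last:   ['Watcher', 'Marshall']
# benevolent last: ['Thief', 'Learner']
```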

This now suggests a natural time-division into three intervals A, B, C, during which the entities work through their listed attributes in the order listed.

What we would expect, if this three-interval model applied to the real world in any way, is that history would roughly look like:

  • In A, Learner and Rogue act.
  • In B, Thief and Warrior act.
  • In C, Watcher and Marshall act.

If we imagine that A precedes B precedes C in time, we might say that in A, there is the greatest amount of raw power which is being used for learning about both “things in general” and specifically “benevolence.” We would expect this time to be highly violent and chaotic.

In B, then, Thief and Warrior, who did not act in A, have applied their own learning (knowing and benevolence, respectively) to what actually happened. So in B, when Thief and Warrior get a chance to act, they start off knowing a great deal about what is good and what the world is like after A. However, we would still expect B to be violent and chaotic, because the initial situation inherited from A may be so difficult to deal with that even knowing the correct good actions to take would not remove suffering completely.

In C, however, Watcher and Marshall, having both applied themselves to learn the maximum amount about good and the world, can either compete or cooperate. If they cooperate, as they should since they both care about benevolence and knowledge more than power, the world could be suffering-minimised very quickly.

It’s my understanding that in history we do observe basically a trend that follows this model. Specifically, we had a violent, highly energetic period before life; then the environment became stable, yet chaotic enough for life to appear; then, once something like modern humans appeared, the trend became roughly exponential/power-law (which means slow-but-steady progress for a long while and then a sudden decrease in suffering).

My prediction is that we are at the start of the sudden decrease part during this decade of 2020-2030, mostly because of the reasoning in the following section.

Implications for AI Safety

Although above we developed a logical model, there is nothing to say that these entities should or would behave logically.

In particular, if competition were constant, unchanging, and never-ending, we would likely end up with a world not unlike the one we’ve got: things appear to be broadly improving according to metrics developed to measure things people have identified in the past as correlating with improvement in life in general (which I will call “outdated metrics”), but in reality the problems people have are merely changing, not necessarily getting better or worse.

Which is a more accurate picture of reality, for the majority of people? The picture where subjective quality of life is gradually or suddenly improving, possibly a great deal, or the one where most people feel that life is at least not getting any easier?

As far as I’m aware, it depends primarily on which age group you ask about. [Graph: wealth trends by age group.]

Those under 35 appear to be getting poorer, while the other age groups broadly follow the trends of the overall economy. Universally, higher age correlates with higher wealth.

According to https://www.populationpyramid.net/, 27.5% of the world population is under 35 and 17% is aged 35-65. Since younger children are typically raised by those under 35, I have included them in the under 35 numbers.

So by measured wealth, it is the case that for most working people, life is getting worse, not better. However, the reverse is the case for people no longer working: the older you are, the more likely you are to be retired, and have more wealth.

By numbers, those over 35 are the greater proportion of the population (72.5%) and so for the majority of people alive currently, life is improving.

However, this is a short-termist viewpoint. By remaining-life-years, I would suspect that the 27.5% of people under 35 have a larger weight in the analysis. So let’s do it:

Average world life expectancy was 72 years in 2016, according to the WHO.

The 2016 world population was 7.46 billion people. Multiplying it out, we have an expected

7.46 x 0.275 x (72 – 35) = 75.91 billion person-years, as a minimum, for those under 35.
7.46 x 0.725 x (72 – 65) = 37.86 billion person-years, as a minimum, for the rest of the population.

These minimums are directly comparable, since the only numbers that can change are the final terms in parentheses, which keeps the expressions linear.

The first subtracted age can decrease by up to 35 and the second by up to 30, each increasing the person-years proportionally. In the final analysis, there will still be more person-years allocated to the under-35 proportion of the population, using 2016 numbers.
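For reference, here is a small Python sketch reproducing the two lower bounds above from the 2016 figures quoted in the text:

```python
# Reproducing the person-years lower bounds, using the 2016 figures quoted
# above (world population and WHO average life expectancy).
population_bn = 7.46      # world population in billions, 2016
life_expectancy = 72      # average world life expectancy, WHO 2016
share_under_35 = 0.275
share_over_35 = 0.725

# Lower bounds: treat everyone under 35 as exactly 35 years old, and everyone
# over 35 as exactly 65, so that remaining years are minimised.
min_under_35 = population_bn * share_under_35 * (life_expectancy - 35)
min_over_35 = population_bn * share_over_35 * (life_expectancy - 65)

print(f"Under 35: at least {min_under_35:.2f} bn person-years")  # ~75.91
print(f"Over 35:  at least {min_over_35:.2f} bn person-years")   # ~37.86
```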

So when the analysis is done in this way, life is getting worse for the greatest number of human life-years.

So we have in actual reality two competing pictures: for the largest number of people currently alive, life is fine, but for the largest number of expected years of human life to come, life is getting worse. In effect, we are borrowing from the future.

Nothing to save us

We could analyse this picture in terms of the previous logical arguments, but we have neglected to mention another possibility that is perhaps simpler and more consistent with observation: none of the entities listed in the logical-argument section exist at all, and so if we want life to improve, we have to do it all ourselves.

On this picture, young people have the hardest environment, the hardest jobs, and the hardest future. And we wonder why there are suicide epidemics in basically every developed country.

If none of the situations A, B, or C has actually happened in history, which Occam’s Razor and the known evidence suggest is the more likely case (I call this principle “Conservation of Unnecessary Assumptions”), then the answer to the problem of suffering is straightforward: there is no particular reason; it just happened that way.

However, what we can also observe is that humans have the largest capacity to inflict suffering on each other. See Steven Pinker’s The Better Angels of Our Nature for visceral descriptions of the suffering humans have historically inflicted on each other (and are still inflicting today).

This picture is also consistent with “free will,” i.e., that there’s nothing forcing us to behave a certain way. So my conclusion on this picture is that humans have inflicted suffering on ourselves, are now slowly realising how bad the situation is, and are vaguely moving in the direction of doing something about it, although only for the segment of the population which, by the numbers, matters less.

Won’t somebody think of the children?

Evolution

Scientists have developed a theory of how life could have come about, which suggests that things tend to exist because they can cause themselves to continue existing in some way. Implicitly, this suggests that if we were to attribute some kind of rationale to reality, the question it would be asking, anthropomorphically, is:

What should exist?

And the answer that has come up so far is the maxim of strength: that which can.

Now, one of these three things must be true.

  1. The deific model explained above has some relevance to reality
  2. The non-deific, “random,” indifferent, evolutionary model has some relevance to reality
  3. Some other theory better explains the historical evidence without reference to either of these concepts.

Since this paper is about a logical perspective on the Problem of Suffering, we will compare the two logically-derived perspectives to see what we find.

So, if the “reality is indifferent” (evolutionary) model is true, then how does that relate to our 6 logical propositions?

Well, the entities could have got together and agreed, as a starting point, to actually ask the question “what should exist?”, then agreed on initial conditions and just let things run.

However, all of them are all-powerful, omnibenevolent, and omniscient, just in different orders. This means they essentially “care about the same stuff, in the end.” So the predicted outcome, if they existed, would ultimately be cooperation to achieve the end of omnibenevolence, i.e., undoing all the suffering and creating a world without it. This is clearly not the case in the real world; in reality, people are typically the ones with the most power to make a difference to the world.

So logically, in my view, the most likely viewpoint is the impersonal, evolutionary one, where things happen to exist because they were able to cause themselves to.

Changing Reality?

Let me ask a question: would it be better or worse if, for example, the logical model of the section “Developing a logical model” were to become more applicable to reality, that is, if it were a better hypothesis for explaining the evidence?

Now, in reality we don’t actually care about theory. What we care about is that suffering is minimised. So what the above paragraph is asking is: if we were to observe a world in which the 6-proposition theory were a genuine possibility rather than being practically excluded, would that observed world be a better world?

Also in reality, we are not able to create beings which are “all-powerful,” “omniscient,” “omnibenevolent,” and so on. Yet, there is some reason to suspect we can create relatively all-powerful, all-seeing, all-good etc., beings, when compared to the capabilities of an ordinary human. See Wait But Why: The Road to Superintelligence.

Indeed, since we can create humans (by giving birth, usually; or there are “test tube babies”), and humans consume a certain level of resources, simply making something as efficient as a human but which can scale with more resources would suffice. The observed fact that this is difficult should tell us something about the effectiveness of the way humans are already “designed.” (Remember, I use the word “designed” when in actuality it should be thought of as “happened to occur as.”)

By default, there is no reason to expect that any potential “relatively all-powerful, all-good, and all-knowing” entity would be friendly to humans even if it were all-good, and some reason to suspect otherwise. [2]

For the purposes of safety, we now must engage in motivated reasoning, because we are reasoning about which outcomes we want rather than whatever happens to be the case currently. This is a correct use of motivated reasoning, rather than the fallacy it usually amounts to.

So let’s recap our 6 categories of entity again:

  1. Warrior: benevolent, powerful, knowing
  2. Rogue: powerful, benevolent, knowing
  3. Watcher: benevolent, knowing, powerful
  4. Marshall: knowing, benevolent, powerful
  5. Thief: knowing, powerful, benevolent
  6. Learner: powerful, knowing, benevolent

Recall that earlier we predicted the logical result to be that, at the end of the A, B, C periods, all entities would be cooperating. This is due to Aumann’s agreement theorem and the necessarily high logical capability of beings which are relatively all-knowing. Indeed, since after period C they basically all have the same states of knowledge, Aumann’s theorem provides a stable basis for agreement upon which further cooperation can be built.
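Aumann’s theorem proper concerns agents with a common prior whose posteriors are common knowledge; as a much simpler toy illustration of why identical states of knowledge force agreement, consider two agents who share a prior and condition on the same evidence (the numbers below are placeholders):

```python
# Toy illustration: agents sharing a common prior and the same evidence
# arrive at identical posteriors, so there is nothing left to disagree about.
prior = {"world_is_improving": 0.5, "world_is_worsening": 0.5}        # common prior
likelihood = {"world_is_improving": 0.7, "world_is_worsening": 0.2}   # P(evidence | h), placeholder

def posterior(prior, likelihood):
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

agent_a = posterior(prior, likelihood)
agent_b = posterior(prior, likelihood)   # same prior, same evidence
assert agent_a == agent_b                # agreement is forced
print(agent_a)
```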

So this logical theory predicts that the best way to go about building “relatively all-powerful” superintelligences would be to have them care about the same things, but in different orders, and gradually (or ideally, as an initial condition) cooperate to achieve their end, which is ultimately at least benevolence (i.e., “do no harm”).

Again, though, the most likely situation is that you would end up with superagent entities that simply didn’t care about humans at all; and while they wouldn’t actually harm us, neither would they particularly act to help us in the ways we would want. Doing no harm is not equivalent to doing good.

However, the good thing about this model is its safety and stability. If some way could be found to improve the likelihood of these cooperating entities caring (and so acting) for humans in particular, we could actually bring about something approximating situation C in the real world and the evidence (i.e., everything we could actually observe) would show that the world’s suffering was declining.

Again, though, by default this doesn’t happen and humans have to do everything themselves. See Using Machine Learning to Address AI Risk, by Taylor, Yudkowsky, LaVictoire, and Critch (a 55-minute presentation); the slides give a good summary of the reasoning.

Conclusion

As a closing note, current computers are completely logical and so would, if given the ability to do so, learn and know logical truths and make use of them. Another consideration is that having separate (i.e., probabilistically independent) but cooperating systems works around many problems such as the halting problem, P vs NP, Gödel incompleteness, etc. If one of these entities needs a problem solved that it can’t handle itself, or it is stuck, the other entities can step in to get it out of the problematic attractor point. See e.g. Delegative Reinforcement Learning: learning to avoid traps with a little help, by V. Kosoy.
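As a loose, purely illustrative sketch of that delegation idea (not the mechanism of the cited paper), one agent hands a task to a peer whenever it cannot make progress itself:

```python
# Toy sketch of delegation between cooperating agents: if one agent cannot
# make progress on a task, it hands the task to a peer rather than staying stuck.
class Agent:
    def __init__(self, name, can_solve):
        self.name = name
        self.can_solve = can_solve           # set of task labels this agent can handle

    def attempt(self, task, peers):
        if task in self.can_solve:
            return f"{self.name} solved {task}"
        for peer in peers:                   # stuck: delegate to the first capable peer
            if task in peer.can_solve:
                return f"{self.name} delegated {task} to {peer.name}: " + peer.attempt(task, [])
        return f"no agent could solve {task}"

learner = Agent("Learner", {"gather-power"})
marshall = Agent("Marshall", {"judge-what-is-good"})
print(learner.attempt("judge-what-is-good", [marshall]))
```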

We have:

  • Developed a logical model for understanding the problem of suffering in terms of logical propositions;
  • Reasoned about the competing, evolutionary hypothesis as a comparison, and found that evolution or some unknown third theory is more likely;
  • Found that currently, life is optimised for “the wrong demographic” in terms of expected human life-years; i.e., those who aren’t working and aren’t raising children;
  • Discussed implications for AI safety: in particular, unless an artificial system is logical first, before any of the other “relative superpowers,” it will by default be likely to be indifferent or harmful.

Future work

As a theory paper, this work is relatively self-contained; but future work could include considering alternative hypotheses to evolution and deism, developing a coherent theory of non-logical reasoning (starting from an inconsistent set of axioms, what results?), asking about the implications of my demographic finding, or any other work on AI safety, which could possibly be the most important problem of the decade starting 2020.