Great power conflict
https://80000hours.org/problem-profiles/great-power-conflict/

Why might preventing great power conflict be an especially pressing problem?

A modern great power war — an all-out conflict between the world’s most powerful countries — could be the worst thing to ever happen to humanity.

Historically, such wars have been exceptionally destructive. Sixty-six million people died in World War II, likely the deadliest catastrophe humanity has experienced so far.

Since World War II, the global population and world economy have continued to grow, nuclear weapons have proliferated, and military technology has continued to advance. This means the next world war could be even worse, just as World War II was much deadlier than World War I.

It’s not guaranteed that such a war will break out. And if it does, it may not escalate to such a terrible extent. But the chance can’t be ignored. In fact, there are reasons to think that the odds of World War III breaking out this century are worryingly high.

A modern great power war would be devastating for people alive today. But its effects could also persist long into the future. That’s because there is a substantial chance that this century proves to be particularly important. Technologies with the potential to cause a global catastrophe or radically reshape society are likely to be invented. How we choose to develop and deploy them could impact huge numbers of our descendants. And these choices would be affected by the outcomes of a major war.

To be more specific, there are three main ways great power conflict could affect the long-term future:

  1. High international tension could increase other risks. Great power tensions could make the world more dangerous even if they don’t lead to war. During the Cold War, for example, the United States and the USSR never came into direct conflict but invested in bioweapons research and built up nuclear arsenals. This dynamic could return, with tension between great powers fuelling races to develop and build new weapons, raising the risk of a disaster even before shots are fired.
  2. War could cause an existential catastrophe. If war does break out, it could escalate dramatically, with modern weapons (nuclear weapons, bioweapons, autonomous weapons, or other future technologies) deployed at unprecedented scale. The resulting destruction could irreparably damage humanity’s prospects.
  3. War could reshape international institutions and power balances. While such a catastrophic war is possible, it seems extremely unlikely. But even a less deadly war, such as another conflict on the scale of World War II, could have very long-lasting effects. For example, it could reshape international institutions and the global balance of power. In a pivotal century, different institutional arrangements and geopolitical balances could cause humanity to follow different long-term trajectories.

The rest of this profile explores exactly how pressing a problem great power conflict is. In summary:

  • Great power relations have become more tense. (More.)
  • Partly as a result, a war is more likely than you might think. It’s reasonable to put the probability of such a conflict in the coming decades somewhere between 10% and 50%. (More.)
  • If war breaks out, it would probably be hard to control escalation. The chance that it would become large enough to be an existential risk cannot be dismissed. (More.)
  • This makes great power conflict one of the biggest threats our species currently faces. (More.)
  • It seems hard to make progress on solving such a difficult problem (more) — but there are many things you can try if you want to help (more).

International tension has risen and makes other problems worse

Imagine we had a thermometer-like device which, instead of measuring temperature, measured the level of international tension.1 This ‘tension meter’ would max out during periods of all-out global war, like World War II. And it would be relatively low when the great powers2 were peaceful and cooperative. For much of the post-Napoleonic 1800s, for example, the powerful European nations instituted the Concert of Europe and mostly upheld a continental peace. The years following the fall of the USSR also seem like a time of relative calm, when the tension meter would have been quite low.3

How much more worried would you be about the coming decades if you knew the tension meter would be very high than if you knew it would be low? Probably quite a lot. In the worst case, of course, the great powers could come into direct conflict. But even if it doesn’t lead to war, a high level of tension between great powers could accelerate the development of new strategic technologies, make it harder to solve global problems like climate change, and undermine international institutions.

During the Cold War, for instance, the United States and USSR avoided coming into direct conflict. But the tension meter would still have been pretty high. This led to some dangerous events:

  • A nuclear arms race. The number of nuclear warheads in the world grew from just 300 in 1950 to over 64,000 in 1986.
  • The development of new bioweapons. Despite signing the Biological Weapons Convention in 1972, the search for military advantages motivated Soviet decision makers to continue investing in bioweapon development for decades. Although never used in combat, biological agents were accidentally released from research facilities, resulting in dozens of deaths and threatening to cause a pandemic.4
  • Nuclear close calls. Military accidents and false alarms happened regularly, and top decision makers were more likely to interpret these events hostilely when tensions were high. On several occasions it seems the decision about whether or not to start a nuclear war came down to individuals acting under stress and with limited time.

This makes international tension an existential risk factor. It’s connected to a number of other problems, which means reducing the level of international tension would lower the total amount of existential risk we face.

The level of tension today

Recently, international tension seems to have once again been rising. To highlight some of the most salient examples:

  • China-United States relations have deteriorated, leading to harsh diplomatic rhetoric and protectionist trade policies that aim to reduce the countries’ economic interdependence.
  • Russia’s invasion of Ukraine has killed about a hundred thousand people so far, raised the risk of nuclear war, and sent United States-Russia relations to their lowest point since the Cold War.
  • Chinese and Indian soldiers fought deadly skirmishes along their countries’ disputed border in 2020–21.

These dynamics raise an important question: how much more dangerous is the world given this higher tension than it would be in a world of low tension?

I think the answer is quite a bit more dangerous — for several reasons. First, international tension seems likely to make technological progress more dangerous. There’s a good chance that, in the coming decades, humanity will make some major technological breakthroughs. We’ve discussed, for example, why one might worry about the effects of advanced artificial intelligence systems or biotechnology. The level of tension could strongly affect how these technologies are developed and governed. Tense relations could, for example, cause countries to neglect safety concerns in order to develop technology faster.5

Second, great power relations will strongly influence how nations do, or do not, cooperate to solve other global collective action problems. For example, in 2022, China withdrew from bilateral negotiations with the United States over climate action in protest of what it perceived as American diplomatic aggression in Taiwan. That same year, efforts to strengthen the Biological Weapons Convention were reportedly hampered by the Russian delegation after their country’s invasion of Ukraine raised tensions with the United States and other western countries.

And third, if relations deteriorate severely, the great powers could fight a war.

How likely is a war?

Wars are destructive and risky for all countries involved. Modern weapons, especially nuclear warheads, make starting a great power war today seem like a suicidal undertaking.

But factors like the prevalence of war throughout history, the chance that leaders make mistakes, conflicting ideologies, and commitment problems, make me think that conflict could break out anyway.

On balance, I think such an event is somewhat unlikely but hardly unthinkable. To quantify this: I put the chance we experience some kind of war between great powers before 2050 at about one-in-three.6

War has occurred regularly in the past

One reason to think a war is quite likely is that such conflicts have been so common in the past. Over the past 500 years, about two great power wars have occurred per century.7

Naively, this would mean that every year there’s a 2% chance such a war occurs, implying the chance of experiencing at least one great power war over the next 80 years — roughly until the end of the century — is about 80%.8

This is a very simple model. In reality, the risk is not constant over time and independent across years. But it shows that if past trends simply continue, the outcome is likely to be very bad.
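
To make the arithmetic behind this naive model explicit, here is a minimal sketch in Python. The 2% annual figure and the 80-year horizon are taken from the text above; the independence assumption is exactly the simplification just noted.

```python
# A minimal sketch of the naive constant-risk model described above: assume an
# independent 2% chance each year that a great power war breaks out (roughly
# two such wars per century over the past 500 years).

annual_prob = 0.02   # historical base rate: ~2 great power wars per century
years = 80           # roughly from now until the end of the century

prob_at_least_one = 1 - (1 - annual_prob) ** years
print(f"Chance of at least one great power war in {years} years: {prob_at_least_one:.0%}")
# -> about 80%
```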

Has great power conflict become less likely?

One of the most important criticisms of this model is that it assumes the risk is constant over time. Some researchers have argued instead that, especially since the end of World War II, major conflicts have become much less likely due to:

  • Nuclear deterrence: Nuclear weapons are so powerful and destructive that it’s just too costly for nuclear-armed countries to start wars against each other.9
  • Democratisation: Democracies have almost never gone to war against each other, perhaps because democracies are more interconnected and their leaders are under more public pressure to peacefully resolve disputes with each other.10 The proportion of countries that are democratic has increased from under 10% in 1945 to about 50% today.
  • Strong economic growth and global trade: Global economic growth accelerated following World War II and the value of global exports grew by a factor of almost 30 between 1950 and 2014. Since war disrupts economies and international trade, strong growth raises the costs of fighting.11
  • The spread of international institutions: Multilateral bodies like the United Nations General Assembly and Security Council promote diplomatic dialogue and facilitate coordination to punish transgressors.12

It is true that we are living through an unusually long period of great power peace. It’s been about 80 years since World War II. We just saw that a simple model using the historical frequency of great power wars suggests there was only a 20% chance of going that long without at least one more war breaking out. This is some evidence in favour of the idea that wars have become significantly less common.

At the same time, we shouldn’t feel too optimistic.

The numerous close calls during the Cold War suggest we were somewhat lucky to avoid a major war in that time. And a 20% chance of observing 80 years of peace is not that low.13 Structural changes might have dramatically reduced the likelihood of war. Or perhaps we’ve just been lucky. It could even be that technological advances have made war less likely to break out, but more deadly when it occurs, leaving the overall effect on the level of risk ambiguous. It just hasn’t been long enough to support a decisive view.14

So while the recent historical trend is somewhat encouraging, we don’t have nearly enough data to be confident that great power conflict is a thing of the past. To better predict the likelihood of future conflict, we should also consider distinctive features of our modern world.15

One might think that a modern great power war would simply be so destructive that no state leader would ever choose to start one. And some researchers do think that the destruction such a war would wreak globally makes it less likely to occur. But it would be hard to find anyone who claims this dynamic has driven the risk to zero.

First, a war could be started by accident.

Second, sometimes even prudent leaders may struggle to avoid a slide towards war.

We could blunder into war

An accidental war can occur if one side mistakes some event as an aggressive action by an adversary.

This happened several times during the Cold War. Wayward American reconnaissance planes, for example, showed how routine military exercises carry some escalation risk. Similarly, throughout history, nervous pilots and captains have caused serious incidents by attacking civilian planes and ships.16 Nuclear weapons allow for massive retaliatory strikes to be launched quickly — potentially too quickly to allow for such situations to be explained and de-escalated.

It is perhaps more likely, though, that an accidental war could be triggered by a technological malfunction. Faulty computers and satellites have previously triggered nuclear close calls. As monitoring systems have become more reliable, the rate at which such accidents have occurred has been going down. But it would be overconfident to think that technological malfunctions have become impossible.

Future technological changes will likely raise new challenges for nuclear weapon control. There may be pressure to integrate artificial intelligence systems into nuclear command and control to allow for faster data processing and decision making. And AI systems are known to behave unexpectedly when deployed in new environments.17

New technologies will also create new accident risks of their own, even if they’re not connected to nuclear weapon systems. Although these risks are hard to predict, they seem significant. I’ll say more about how such technologies — including AI, nuclear, biological, and autonomous weapons — are likely to increase war risks later.

Leaders could choose war

All that said, most wars have not started by accident. If another great power war does break out in the coming decades, it is more likely to be an intentional decision made by a national leader.

Explaining why someone might make such a costly, destructive, unpredictable, and risky decision has been called “the central puzzle about war.” It has motivated researchers to search for “rationalist” explanations for war. In his 2022 book Why We Fight, for example, economist Chris Blattman proposes five basic explanations: unchecked interests, intangible incentives, uncertainty, commitment problems, and misperceptions.18

Unchecked interests: Sometimes leaders who can decide to go to war stand to personally gain. Meanwhile, the costs are borne by citizens and soldiers who may not be able to hold their leaders to account.

Intangible incentives: War can sometimes provide some abstract value, like revenge, honour, glory, or status. This can help offset its costs.

Uncertainty: States will sometimes try to hide their strength or bluff to win concessions. Under this uncertainty, it can sometimes be in their rivals’ interests to call the bluff and fight.

Commitment problems: Bargaining is based on relative strength. If one state is growing in power more quickly than its rival, it may be hard to find a compromise solution that will continue to be acceptable in the future.

Misperceptions: Leaders may just misjudge the strength, beliefs, or resolve of their rivals and push for untenable bargains. Faced with what seem to be unfair terms, the rival state may decide to go to war.

This section discusses how great power tensions may escalate to war in the next few decades. It focuses on three potential conflicts in particular: war between the US and China, between the US and Russia, and between China and India. I focus on these because each of these countries is among the world’s largest economies and military spenders, and these pairs seem particularly likely to fight. At the end, I briefly touch on other potential large conflicts.

Projected real GDP of the US, China, India, and Russia, according to a 2022 Goldman Sachs analysis. Source: author’s figure using data from Kevin Daly and Tadas Gedminas, “The Path to 2075 — Slower Global Growth, But Convergence Remains Intact,” Global Economics Paper (Goldman Sachs, 6 December 2022), https://www.goldmansachs.com/intelligence/pages/gs-research/the-path-to-2075-slower-global-growth-but-convergence-remains-intact/report.pdf.

United States-China

The most worrying possibility is war between the United States and China. They are easily the world’s largest economies. They spend by far the most on their militaries. Their diplomatic relations are tense and have recently worsened. And their relationship has several of the characteristics that Blattman identifies as causes of war.

At the core of the United States-China relationship is a commitment problem.

China’s economy is growing faster than the United States’. By some metrics, it is already larger.19 If this faster growth continues, the economic gap between the two countries will keep widening in China’s favour. While economic power is not the sole determinant of military power, it is a key factor.20

The United States and China may be able to strike a fair deal today. But as China continues to grow faster, that deal may come to seem unbalanced. Historically, such commitment problems seem to have made these kinds of transition periods particularly dangerous.21

In practice, the United States and China may find it hard to agree on rules to guide their interactions, such as how to run international institutions or govern areas of the world where their interests overlap.

The most obvious issue which could tip the United States-China relationship from tension into war is a conflict over Taiwan. Taiwan’s location and technology industries are valuable for both great powers.

This issue is further complicated by intangible incentives.

For the United States, it is also a conflict over democratic ideals and the United States’ reputation for defending its allies.

For China, it is also a conflict about territorial integrity and addressing what are seen as past injustices.

Still, forecasts suggest that while a conflict is certainly possible, it is far from inevitable. As of 8 June 2023, one aggregated forecast22 gives a 17% chance of a United States-China war breaking out before 2035.23

A related aggregated forecast of the chance that at least 100 deaths occur in conflict between China and Taiwan by 2050 gives it, as of 8 June 2023, a much higher 68% chance of occurring.24

United States-Russia

Russia is the United States’ other major geopolitical rival.

Unlike China, Russia is not a rival in economic terms: even after adjusting for purchasing power, its economy is only about one-fifth the size of the United States’.

However, Russia devotes a substantial fraction of its economy to its military. Crucially, it has the world’s largest nuclear arsenal. And Russian leadership has shown a willingness to project power beyond their country’s borders.

Country | Military spending in 2021 (USD)
United States | 801 billion
China | 293 billion
India | 76.6 billion
United Kingdom | 68.4 billion
Russia | 65.9 billion

Top five countries by estimated military spending, 2021. Source: SIPRI

Russia’s 2022 invasion of Ukraine demonstrated the dangers of renewed rivalry between Russia and the United States-led West. The war has already been hugely destructive: the largest war in Europe since World War II, with hundreds of thousands of casualties already and no end to the conflict in sight. And it could get much worse. Most notably, Russian officials have repeatedly refused to rule out the use of nuclear weapons.

Unchecked interests and intangible incentives are again at play here. Vladimir Putin leads a highly centralised government. He has spoken about how his desire to rebuild Russia’s reputation played a role in his decision to invade Ukraine.

Given their ideological differences and history of rivalry, it is reasonable to expect that the United States and Russia will continue to experience dangerous disagreements in the future. As of 8 June 2023, an aggregated forecast gives a 20% chance that the United States and Russia will fight a war involving at least 1,000 battle deaths before 2050.

China-India

India is already the world’s third-largest economy (measured at purchasing power parity). If national growth rates remain roughly constant, the size of the Indian economy will surpass that of the United States sometime this century. India also has nuclear weapons and is already the world’s third-largest military spender (albeit at a much lower level than China or the United States).

One reason to worry that China and India could fight a war is that they already dispute territory along their border. Countries that share a border, especially when it is disputed, are more likely to go to war than countries that do not. By one count, 88% of the wars that occurred between 1816 and 1980 began as wars between neighbours.25

In fact, China and India already fought a brief but violent border war in 1962. Deadly skirmishes have continued since, resulting in deaths as recently as 2020.

Forecasters agree that a China-India conflict seems relatively (though not absolutely) likely. An aggregated forecast gives a 19% chance of war before 2035.

Other dangerous conflicts

These three conflicts — United States-China, United States-Russia, and China-India — are not the only possible great power conflicts that could occur. Other potential conflicts could also pose existential risk, either because they drive dangerous arms races or see widespread deployment of dangerous weapons.

We should keep in mind India-Pakistan as a particularly likely conflict between nuclear-armed states and China-Russia as a potential, though unlikely, conflict between great powers with a disputed border and history of war. Plus, new great powers may emerge or current great powers may fade in the years to come.

While I think we should prioritise the three potential conflicts I’ve highlighted above, the future is highly uncertain. We should monitor geopolitical changes and be open to changing our priorities in the future.

Overall predictions

Below are relevant predictions from the forecasting platform Metaculus, including the number of predictions made, as of 10 March 2023. Note the different timescales and resolution criteria for each question; they may not be intuitively comparable.

  • World war by 2151. Metaculus prediction: 52% (561 predictions). Resolves if either (a) a war kills more than 0.5% of the global population and involves more than 50% of countries, together comprising more than 50% of the global population, from at least four continents; or (b) a war kills more than 1% of the global population and involves more than 10% of countries, together comprising more than 25% of the global population.
  • World War III before 2050. Metaculus prediction: 20% (1,640 predictions). Resolves if a war involves countries making up more than 30% of world GDP or more than 50% of world population, and causes more than 10 million deaths.
  • Global thermonuclear war by 2070. Metaculus prediction: 11% (337 predictions). Resolves if either three countries each detonate at least 10 nuclear warheads of at least 10 kt yield outside their own territory, or two countries each detonate at least 50 such warheads outside their own territory.
  • When will be the next great power war? Metaculus prediction: 25th percentile 2031, median 2048, 75th percentile 2088, with an 8% chance of no such war before 2200. Resolves when any two of the top 10 nations by military spending are at war, defined as a formal declaration of war, occupation of territory plus at least 250 casualties, or media sources describing them as “at war.”
  • No non-test nuclear detonations before 2035. Metaculus prediction: 69% (321 predictions). Resolves if there is no nuclear detonation other than a controlled test; note that the question is negated, so it resolves negatively if a warhead is detonated.
  • At least one nuclear detonation in war by 2050. Metaculus prediction: 31% (476 predictions). Resolves according to credible media reports.

I have previously independently estimated the likelihood of seeing a World War III-like conflict this century. My calculation first adjusts historical base rates to allow for the possibility that major wars have become somewhat less likely, and uses the adjusted base rate to calculate the probability of seeing a war between now and 2100.

This method gives a 45% chance of seeing a major great power war in the next 77 years. If the probability is constant over time then the cumulative probability between now and 2050 would be 22%. This is aligned with the Metaculus predictions above.
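
As a rough illustration of the conversion involved, here is a minimal sketch that rescales a cumulative probability over one horizon to a shorter horizon, assuming a constant annual hazard. It takes the 45%-by-2100 figure above as input; depending on the exact start year, adjusted base rate, and rounding, the result lands in the vicinity of 20%, the same ballpark as the figures above.

```python
# A minimal sketch: convert a cumulative war probability over one horizon to a
# shorter horizon, assuming a constant annual hazard. Inputs are the article's
# rough figures; small changes in assumptions shift the result by a few points.

p_by_2100 = 0.45    # estimated chance of a major great power war in the next 77 years
horizon_full = 77   # years until roughly 2100
horizon_short = 27  # years until 2050

annual_hazard = 1 - (1 - p_by_2100) ** (1 / horizon_full)
p_by_2050 = 1 - (1 - annual_hazard) ** horizon_short
print(f"Implied annual hazard: {annual_hazard:.2%}")  # ~0.8% per year
print(f"Implied chance by 2050: {p_by_2050:.0%}")     # ~19%
```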

We can also ask experts what they think. Unfortunately, there are surprisingly few expert predictions about the likelihood of major conflict. One survey was conducted by the Project for the Study of the 21st Century. The numbers were relatively aligned with the Metaculus forecasts, though slightly more pessimistic. However, it seems a mistake to put too much stock in this survey (see footnote).26

We now have at least a rough sense of a great power war’s probability. But how bad could it get if it occurred?

A new great power war could be devastating

At the time, the mechanised slaughter of World War I was a shocking step-change in the potential severity of warfare. But its severity was surpassed just 20 years later by the outbreak of World War II, which killed more than twice as many people.

A modern great power war could be even worse.

How bad have wars been in the past?

The graph below shows how common wars of various sizes are, according to the Correlates of War’s Interstate War dataset.27

The x-axis here represents war size in terms of the logarithm of the number of battle deaths. The y-axis represents the logarithm of the proportion of wars in the dataset that are at least that large.

Using logarithms means that each step to the right in the graph represents a war not one unit larger, but 10 times larger. And each step up represents a war that is not one unit more likely, but 10 times more likely.
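
To make the construction concrete, here is a minimal sketch of how a plot like this can be built from a list of battle-death counts. The figures below are illustrative placeholders, not the actual Correlates of War data.

```python
# Sketch: empirical survival function ("proportion of wars at least this large")
# plotted on log-log axes. Battle-death values are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt

battle_deaths = np.array([2_000, 5_000, 8_000, 20_000, 120_000,
                          750_000, 2_500_000, 16_600_000])  # hypothetical sample

deaths_sorted = np.sort(battle_deaths)
proportion_at_least = 1.0 - np.arange(len(deaths_sorted)) / len(deaths_sorted)

plt.loglog(deaths_sorted, proportion_at_least, marker="o", linestyle="none")
plt.xlabel("Battle deaths (log scale)")
plt.ylabel("Proportion of wars at least this large (log scale)")
plt.title("Cumulative frequency of war severity (illustrative data)")
plt.show()
```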

Cumulative frequency distribution of the severity of interstate wars, 1816–2007. Source: author’s figure, using data from the Correlates of War Interstate War dataset, v4.0.28

What the graph shows is that wars have a heavy tail. Most wars remain relatively small. But a few escalate greatly and become much worse than average.

Of the 95 wars in the latest version of the database, the median battle death count is 8,000. But the heavy tail means the average is 334,000 battle deaths. And the worst war, World War II, had almost 17 million battle deaths.29

The number of battle deaths is only one way to measure the badness of wars. We could also consider the proportion of the population of the countries involved who were killed in battle. By this measure, the worst war since 1816 was not World War II. Instead, it’s the Paraguayan War of 1864–70. In that war, 30 soldiers died for every 1,000 citizens of the countries involved. It’s even worse if we also consider civilian deaths; while estimates are very uncertain, it’s plausible that about half of the men in Paraguay, or around a quarter of the entire population, were killed.30

What if instead we compared wars by the proportion of the global population killed? World War II is again the worst conflict since 1816 on this measure, having killed about 3% of the global population. Going further back in time, though, we can find worse wars. Genghis Khan’s conquests likely killed about 9.5% of the people in the world at the time.

The heavy tail means that some wars will be shockingly large.31 The scale of World War I and World War II took people by surprise, including the leaders who initiated them.

It’s also hard to know exactly how big wars could get. We haven’t seen many really large wars. So while we know there’s a heavy tail of potential outcomes, we don’t know what that tail looks like.

That said, there are a few reasons to think that wars much worse than World War II are possible:

  • We’re statistically unlikely to have brushed up against the end of the tail, even if the tail has an upper bound.
  • Other wars have been deadlier on a per-capita basis. So unless wars involving countries with larger populations are systematically less intense, wars involving as many people as World War II could be even deadlier than it was.
  • Economic growth and technological progress are continually increasing humanity’s war-making capacity. This means that, once a war has started, we’re at greater risk of extremely bad outcomes than we were in the past.

So how bad could it get?

How bad could a modern great power war be?

Over time, two related factors have greatly increased humanity’s capacity to make war.32

First, scientific progress has led to the invention of more powerful weapons and improved military efficiency.

Second, economic growth has allowed states to build larger armies and arsenals.

Since World War II, the world economy has grown by a factor of more than 10 in real terms; the number of nuclear weapons in the world has grown from basically none to more than 9,000, and we’ve invented drones, missiles, satellites, and advanced planes, ships, and submarines.

Genghis Khan’s conquests killed about 10% of the world’s population, but this took place over the course of two decades. Today that proportion could be killed in a matter of hours.

First, nuclear weapons could be used.

Today there are around 10,000 nuclear warheads globally.33 At the peak of nuclear competition between the United States and the USSR, though, there were 64,000. If arms control agreements break down and competition resurges among two or even three great powers, nuclear arsenals could expand. In fact, China’s arsenal is very likely to grow — though by how much remains uncertain.

Many of the nuclear weapons in the arsenals of the great powers today are at least 10 times more powerful than the atomic bombs used in World War II.34 Should these weapons be used, the consequences would be catastrophic.

Early nuclear weapons were thousands of times more explosive than previous conventional explosives. Source: AI Impacts, “Effect of nuclear weapons on historic trends in explosives.”

By any measure, such a war would be by far the most destructive, dangerous event in human history, with the potential to cause billions of deaths.

The probability that it would, on its own, lead to humanity’s extinction or unrecoverable collapse is contested. But there seems to be some possibility — whether through a famine caused by nuclear winter, or by reducing humanity’s resilience enough that something else, like a catastrophic pandemic, would be far more likely to reach extinction levels (read more in our problem profile on nuclear war).

Nuclear weapons are complemented and amplified by a variety of other modern military technologies, including improved missiles, planes, submarines, and satellites. They are also not the only military technology with the potential to cause a global catastrophe — bioweapons, too, have the potential to cause massive harm through accidents or unexpected effects.

What’s more, humanity’s war-making capacity seems poised to further increase in the coming years due to technological advances and economic growth. Technological progress could make it cheaper and easier for more states to develop weapons of mass destruction.

In some cases, political and economic barriers will remain significant. Nuclear weapons are very expensive to develop and there exists a strong international taboo against their proliferation.

In other cases, though, the hurdles to developing extremely powerful weapons may prove lower.

Improvements in biotechnology will probably make it cheaper to develop bioweapons. Such weapons may provide the deterrent effect of nuclear weapons at a much lower price. They also seem harder to monitor from abroad, making it more difficult to limit their proliferation. And they could spark a global biological catastrophe, like a major — possibly existentially catastrophic — pandemic.

Artificial intelligence systems are also likely to become cheaper as well as more powerful. It is not hard to imagine important military implications of this technology. For example, AI systems could control large groups of lethal autonomous weapons (though the timeline on which such applications will be developed is unclear). They may increase the pace at which war is waged, enabling rapid escalation outside human control. And AI systems could speed up the development of other dangerous new technologies.

Finally, we may have to deal with the invention of other weapons which we can’t currently predict. The feasibility and danger of nuclear weapons was unclear to many military strategists and scientists until they were first tested. We could similarly experience the invention of destabilising new weapons in our lifetime.

What these technologies have in common is the potential to quickly kill huge numbers of people:

  • A nuclear war could kill tens of millions within hours, and many more in the following days and months.
  • A runaway bioweapon could prove very difficult to stop.
  • Future autonomous systems could act with lightning speed, even taking humans out of the decision-making loop entirely.

Faster wars leave less time for humans to intervene, negotiate, and find a resolution that limits the damage.

How likely is war to damage the long-run future?

When a war begins, leaders often promise a quick, limited conflict. But escalation proves hard to predict ahead of time (perhaps because people are scope-insensitive, or because escalation depends on idiosyncratic decisions).

This raises the possibility of enormous wars that threaten all of humanity.

The risk of extinction

It is extremely difficult to estimate the chance that a war escalates to the point of causing human extinction.

One possible starting point is to extrapolate from past wars. Political scientist Bear Braumoeller fit a statistical model to the Correlates of War data I discussed above.35 His model suggests that any given war has at least a one-in-3,300 chance of causing human extinction.36

If we experience 15 wars in the next 30 years,37 then the implied chance of an extinction war is about 0.5%. Assuming 50 wars over the next 100 years, that rises to a disturbing 1.5% chance of extinction.
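
A minimal sketch of that extrapolation, treating each war as independently carrying the 1-in-3,300 extinction chance mentioned above:

```python
# Sketch of the extrapolation above: each war is assumed to have an independent
# 1-in-3,300 chance of escalating to human extinction (the lower-bound figure
# attributed to Braumoeller's model).

p_extinction_per_war = 1 / 3_300

def p_extinction(n_wars: int) -> float:
    """Chance that at least one of n wars escalates to extinction."""
    return 1 - (1 - p_extinction_per_war) ** n_wars

print(f"15 wars over 30 years:  {p_extinction(15):.2%}")   # ~0.45%
print(f"50 wars over 100 years: {p_extinction(50):.2%}")   # ~1.50%
```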

But this estimate must be interpreted cautiously. First, it infers probabilities of different outcomes today using data from the past. Yet the chances of different war outcomes have very likely changed over time. The Correlates of War data goes back to 1816; it seems reasonable to think that 19th-century wars, fought with cannons and horses, tell us little about modern wars. This means it probably underestimates the chance of huge wars in the 21st century.

The Correlates of War data also only includes battle deaths. But large wars also kill lots of civilians. So considering only battle deaths will underestimate the chance of an extinction-level war by a considerable margin (for example, if one civilian is killed for every soldier, then a smaller, more probable war of just over four billion battle deaths would cause human extinction).

On the other hand, to infer probabilities about extinction-level events, Braumoeller extrapolates far beyond the data we’ve observed so far. An extinction-level war would be more than 100 times larger than World War II. It is hard to imagine a conventional war,38 at least, escalating to this extent. The logistics would be enormously complex. And barring omnicidal maniacs, world leaders would be hugely incentivised to bring the fighting to an end before killing literally everyone. This makes the model look too pessimistic.

On the whole, a 1.5% chance of an extinction-level war this century seems too high to me.

But while Braumoeller’s model seems too pessimistic on net, his work makes it hard to rule out a war that causes human extinction. We’re just left pretty uncertain about how likely it might be.

Another approach is to estimate the specific risks posed by different weapons of mass destruction.

We’ve estimated that the direct risk of an existential catastrophe caused by nuclear weapons in the next 100 years is around 0.01%. Maybe half of that risk (0.005%) comes from escalation through a major conflict.

I’d guess that the risks posed by bioweapons are similar (and possibly higher). We should also consider the interaction between great power conflict and risks from AI, as well as other future weapons of mass destruction whose development we can’t predict.

We could assume that these risks, plus the risk of conventional wars, are approximately mutually exclusive, and that each contributes about 0.005%. That would give a total risk of around 0.025% — or around one in 4,000 this century.
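
For transparency, here is that rough aggregation spelled out. Every input is an assumption from the text above, not an estimate to lean on.

```python
# Toy aggregation: five roughly mutually exclusive pathways, each assumed to
# contribute ~0.005% of existential risk this century.

pathways = ["conventional war", "nuclear war", "bioweapons", "AI", "future technologies"]
risk_per_pathway = 0.00005   # 0.005% (assumed)

total_risk = risk_per_pathway * len(pathways)
print(f"Total: {total_risk:.3%}, i.e. about 1 in {round(1 / total_risk):,}")
# Total: 0.025%, i.e. about 1 in 4,000
```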

The risk of collapse

A more likely scenario is a war which doesn’t cause extinction, but is much larger than World War II.39 Such an event would still be easily the most destructive and deadly in human history. Beyond the enormous suffering it would cause, it would inflict major damage on the world’s infrastructure, trade links, social networks, and perhaps international institutions. The effects could be very long lasting.

One possibility is that civilisation could be damaged to the point of collapse. While some people would survive, they would lack the physical and social infrastructure to maintain all the processes we need to sustain modern life.

Rebuilding in these conditions would be a formidable challenge. Adjusted for inflation, under the Marshall Plan the United States spent $150 billion helping nations in western Europe recover from World War II. Accounting for investments from other Allies and the affected countries, as well as damage in eastern Europe, Asia, and Africa, rebuilding after World War II cost trillions of dollars.

So rebuilding after a war much larger than World War II could cost tens of trillions of dollars. And such a war would leave fewer nations with intact economies to fund the recovery. Survivors could also face additional challenges like widespread nuclear fallout and uncontrolled spread of weaponised pathogens.

Given enough time, I’d guess that humanity would eventually recover and rebuild industrial civilisation.40 However, we don’t know this for sure. And the recovery could take a very long time. Meanwhile, the survivors would be vulnerable to a range of natural and anthropogenic hazards which could drive them to extinction.

Even if it doesn’t cause extinction or civilisational collapse, a major war could affect our long-term trajectory

Finally, a large war could alter our future even if it doesn’t cause human extinction or a global societal collapse.

Consider how different the world looked before and after World War II. Before the war, most of the world was autocratic. Fascists controlled several of the world’s most powerful countries.

This changed after the war. The Allied victory preceded a global wave of democratisation. Though fascist regimes continued in some countries, far-right ideology clearly posed less of a threat after the war. Instead, as a direct result of the war, the international institutions that emerged in the years after were shaped by liberal values like human rights and international cooperation.41

World War II is a particularly dramatic example, but it’s not the only time that war has caused major geopolitical realignments and affected which values are influential globally. Major conflicts reshape the global balance of power. In their aftermath, leaders of the victorious nations often use their new influence to change various institutions in their favour.

They redraw borders and cause civilisations to rise and fall. They invest in military research and influence how technological change happens. And their diplomatic strategies shape the norms and institutions that structure the international system.

Such changes can have very long-lasting effects.42 In extreme cases, they can even change our civilisational trajectory: the average value of the world over time. This might sound abstract, but just think about how much more pessimistic you’d feel about the future if more of the world’s most powerful countries were still ruled by fascist dictators.

A new great power war has the potential to cause similarly important changes to global institutions. For example, if an authoritarian state or alliance emerged from the war victorious, they may be able to use their influence and modern digital surveillance tools to entrench their power at a global scale.43

This century may be especially important because we are at risk of value lock-in due to transformative AI. We’re also probably in a high-risk period due to technological progress generally. So, it could be hugely important which countries, values, and institutions become more globally influential following a war.

That said, World War II also shows that the effect of war on our civilisational trajectory is not always unambiguously negative. That’s why I have focused on the effects of war that do seem unambiguously bad: dangerous technological development, near-term death and destruction, and heightened risk of a major global catastrophe.

Overall view

Overall, the short answer to the question of how likely a war is to affect our long-term future is that we really don’t know. Not much research has addressed this question, and each of the estimates I’ve ventured above has some serious weaknesses.

But considering the track record of past wars and the chance that weapons of mass destruction are used, I’d put the chance of an extinction-level war at between 0.025% and 1%.

We’re much more likely to experience a somewhat smaller war (a war killing 800 million people is probably around three times more likely than a war killing eight billion). But the long-term effects of such a war are far more ambiguous than those of extinction. So perhaps the risk to the long-term future from trajectory changes is roughly equal to the risk from extinction-level wars — though it’s really hard to say.
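
One way to see what a likelihood ratio like that implies: if the tail of war sizes followed a simple power law (an assumption made here purely for illustration, not something this profile commits to), then "10 times larger, roughly 3 times less likely" pins down the tail exponent.

```python
# Illustrative only: if P(deaths >= x) were proportional to x**(-alpha), the claim
# that a war 10x larger is ~3x less likely implies a particular tail exponent.
import math

size_ratio = 8_000_000_000 / 800_000_000   # 10x larger war
likelihood_ratio = 3                       # ~3x less likely (rough guess from the text)

alpha = math.log(likelihood_ratio) / math.log(size_ratio)
print(f"Implied tail exponent: {alpha:.2f}")   # ~0.48
```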

On the whole, my best guess is that the chance a war seriously damages the long-term future this century is between 0.05% and 2%. But I expect, and hope, this estimate will change in the coming years as more researchers work on these questions.

What are the major arguments against this problem being especially pressing?

So far, I’ve talked a lot about reasons why you might want to work on this problem: the chance that a new great power war breaks out is low but far from zero, such a war could escalate to unprecedented size, and the effects could reach far into the future.

In other words, the importance of the problem is clear. But we need to consider other factors as well. In particular, we need to ask if there’s anything we can realistically do to help solve it. And there are several reasons to think that improving great power relations is among the more difficult major problems to make progress on:

  1. There are many people working on avoiding war, with strong incentives to do so. (More.)
  2. Even if you can influence policy, it’s often not clear what the best thing is to do. (More.)
  3. Maybe it’s better to focus instead on specific risks. (More.)

It’s less neglected than some other top problems

Most people want to avoid war

The most obvious reason you might not choose to work on this problem is that it’s less neglected than some of our other top problems.

War hurts almost everyone. Some (though not all) wars start with public support. But they are costly in human lives and economic disruption. Negotiated solutions are almost always preferable. In reality, most wars that could happen don’t because people work to avoid fighting them.44

That said, it’s important not to take this argument too far. It is not the case that everyone is harmed by war or high international tension. The most obvious example is defence companies, which benefit when governments buy more (and more expensive) weapons. In certain circumstances, unchecked leaders can also gain status and enhance their reputation through war without personally incurring many costs. And some foreign policy professionals benefit from increasing demand for their work.

I mention these factors not to criticise these actors in particular. Rather, it’s to point out that we can’t assume war will be avoided because it’s so costly for large swathes of society.

We know conflict has been historically common. And we know that negative outcomes can occur when their costs are distributed across society but their benefits are concentrated among influential actors.

More people work on this problem than some other top problems

Still, the obvious costs of war mean that there are already thousands of people working in relevant diplomacy, research, and policy roles. For example, there are about 13,000 Foreign Service members in the US State Department alone.

Thousands more people work on these issues in think tanks and universities. The Council on Foreign Relations, a prestigious membership organisation which publishes Foreign Affairs and hosts events about foreign policy, has over 5,000 members. The International Studies Association, which focuses more on academics, has over 7,000 members. Many thousands more people work on this problem in the intelligence and defence communities.

Of course, these organisations cover a huge range of issues, with only a fraction of their employees focused on great power conflict in particular. And of this fraction, only a small number are probably focused on preventing or mitigating worst-case outcomes like extinction.

To get a rough estimate of the number of people working on this problem, let’s try assuming that the US government employs about 250,000 people who work on issues broadly related to great power conflict. Perhaps 5% of this effort focuses on the specific issues we’ve discussed throughout this profile. That would leave about 12,500 people working on the most important US foreign policies in the government today.

Assume further that another 10,000 people work on international relations in think tanks and universities and, again, 5% focus on the issues in this profile. That would bring our total to about 13,000 people.

Of course, this is a very rough estimate. Accounting for the civil servants, diplomats, analysts, researchers, professors, and advocates in other parts of the world could double or triple it.
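
Here is the same Fermi estimate written out explicitly; every input is a rough assumption from the text, not a measured figure.

```python
# The rough headcount estimate above, spelled out. All inputs are assumptions.

us_government_staff = 250_000      # broadly related to great power conflict (assumed)
think_tanks_and_academia = 10_000  # researchers in think tanks and universities (assumed)
share_on_profile_issues = 0.05     # fraction focused on the issues in this profile (assumed)

total = (us_government_staff + think_tanks_and_academia) * share_on_profile_issues
print(f"Roughly {total:,.0f} people")   # roughly 13,000
```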

(In comparison, we’ve previously estimated that only about 400 people work on existential risks from advanced artificial intelligence.)

Because so many people are already working in this field, you will probably find it harder to identify important, tractable issues that other people haven’t already found.

There aren’t many possible actions which are clearly positive

Suppose, though, that you managed to work your way up to a role which allows you to influence foreign policy in some way. What advice would you give?

This is a hard question to answer for a few different reasons.

First, IR researchers disagree on even the field’s most basic questions, like when deterrence policies are effective and whether diplomatic, cultural, and economic engagement has pacifying effects. So there are few — though not zero! — ‘consensus’ actions to pursue.

Second, predicting the effects of important foreign policy decisions is difficult. We just don’t know much about how accurate long-run forecasts are, even when they’re made by superforecasters with strong track records.

Third, our advice could not just be ineffectual; it could also be harmful. Not only are the long-run effects of foreign policy decisions hard to predict, they often involve difficult tradeoffs.

For example, some researchers argue that building up the world’s nuclear arsenals has made major great power wars less likely (because of mutually-assured destruction) but smaller conflicts more likely (because they are less likely to escalate and thus ‘safer’ to fight).

Under this model, the total effect of nuclear deterrence doctrines on existential risk is ambiguous. It raises the upper bound of how bad a conflict could get. But it makes such conflicts somewhat less likely. And it’s hard to say which effect dominates.45

For these reasons, the impact one can have by working in this area is probably best thought of as improving the quality of decision making on a case-by-case basis rather than advocating generally for specific policies. You’ll probably still have some doubt about which direction to push.

Of course, everyone faces the same issues. You could still have a big impact by giving better advice or making better decisions, given all these constraints, than whoever you’re replacing would have. But acting under so much uncertainty could be a strong limitation on the expected impact you can have.

Maybe it’s better to focus instead on more specific risks

These concerns may lead you to think that you can have a bigger impact by working on a more direct existential risk, like nuclear security, biosecurity, or risks from AI.

To think through this decision, let us return for a moment to our tension meter metaphor. The goal of someone working in great power relations could be seen as lowering the reading on the meter. I’ve discussed how that might make a diplomatic breakdown or the outbreak of a war less likely, lowering total existential risk.

But it may seem too hard to affect the tension meter. Or the connection between the tension meter and any specific risk (like a deadly pandemic) may be too tenuous. In that case, you’d probably have a bigger impact by taking the current level of international tension as given, and working directly on one of our other top problems in whatever geopolitical context we may find ourselves in.

For example, one way in which great power conflict could lead to catastrophe is by causing the release of an extremely contagious and deadly biological agent. Perhaps high tensions and fear of war increase investment in biological weapons, increasing the risk of an accidental release. Or perhaps one of the great powers, faced with the prospect of a catastrophic lost war, chooses to release such a weapon in a desperate bid for victory, and it goes horribly wrong.46

Flowchart showing how reduced international cooperation and great power war are factors in the development and deployment of dangerous new technologies, which could cause existential risks. Source: “Modelling Great Power conflict as an existential risk factor”, Effective Altruism Forum

You could choose to reduce the likelihood of this outcome by reducing the chance we end up in a high-tension or outright-conflict scenario in the first place. Or you could reduce the likelihood of this outcome by focusing specifically on how biological agents are governed and controlled. Although the latter approach doesn’t reduce the other risks conflict poses, there are more concrete proposals you could work on implementing.

Whether it’s better to focus on overall tension or specific risks depends on the relative tractability of proposals in both areas and how many other risks are affected by changes in international tension. You’re more likely to think that trying to reduce conflict is more impactful if:

  • You think conventional wars pose a lot of risk on their own, either because they can escalate massively or cause trajectory changes.
  • You think that great power conflict drives a large fraction of the risks posed by nuclear weapons, biological weapons, military AI, and other emerging technologies. This would make reducing tensions between great powers a powerful leverage point for lowering total overall risk.
  • You think that there are good approaches to reducing great power conflict risk — perhaps ones that aren’t mentioned in this article.

If, however, you think most of the overall existential risk we face comes from a specific risk (such as AI or climate change) or great power conflict is just not that solvable, then you might want to focus on a different area.

Earlier, we identified five specific pathways through which great power conflict could cause an existential catastrophe (conventional war, nuclear war, bioweapons, AI, and future technologies). So by working to reduce great power tensions, you can reduce five risks at once.

But my current best guess is that it’s at least 10 times harder to reduce the chance of conflict by a given amount than it is to reduce a specific risk like a biotech catastrophe. So unless you feel that, for personal fit reasons, you would be at least two or three times more effective working on great power conflict broadly, it still likely makes sense to focus on one of the most pressing specific risks.

(That said, this is a very rough calculation — I could easily be wrong here!)
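
To make that back-of-the-envelope comparison explicit, here is a toy expected-value sketch. The inputs (five risks affected, 10 times harder, the personal-fit multipliers) are the rough guesses from the paragraphs above, nothing more.

```python
# Toy comparison: value of working on great power conflict broadly, relative to
# working on one specific risk, under the article's rough guesses.

risks_affected = 5          # conventional, nuclear, bio, AI, future technologies
relative_difficulty = 10    # assumed ~10x harder to reduce conflict risk by a given amount

def relative_value(personal_fit: float) -> float:
    """Value of broad conflict work relative to working on a single specific risk."""
    return personal_fit * risks_affected / relative_difficulty

for fit in (1, 2, 3):
    print(f"Personal fit {fit}x -> relative value {relative_value(fit):.1f}")
# 1x -> 0.5, 2x -> 1.0, 3x -> 1.5: you need roughly a 2-3x fit advantage to break even
```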

What can you do to help?

After reading the previous section, you might feel pessimistic about your chances of making progress on this problem.

It’s true that this problem seems generally less neglected than some of the world’s other top problems, and I’m not really sure what’s most helpful for it. But great power conflict encompasses many different issues. I think that some of these specific sub-problems are more neglected and tractable than great power diplomacy generally. You could have a big impact by focusing on them.

Here I highlight a few issues that experts I’ve spoken to have highlighted as particularly promising for people to work on if they want to have an impact in this space.

One promising path to impact looks like gaining a deep understanding of the foreign policy landscape, building a strong network, and practising good judgement. Later in your career, you could use your skills and expertise to support policies that seem good and resist policies that seem harmful. Exactly which policies those will be is hard to predict, as it will likely depend on highly contextual factors like who’s leading the countries involved.

Another thing to keep in mind is that to reduce great power conflict, you’ll probably need to combine foreign policy expertise with expertise in another important area.47

For example, US foreign policy experts who also know a lot about China or speak Mandarin are really valuable. Similarly, people who understand international relations and biosecurity, risks from advanced artificial intelligence, or nuclear security are sorely needed.

If you want to go into this field, you’ll probably need to be flexible and open to taking unexpected opportunities when they arise.

Finally, you’ll want to think carefully about personal fit. There are a lot of different jobs you could do in this area. Some are very research-focused, like working in a think tank. Others would be much more people-oriented, like working for a policymaker or going into politics yourself. Although you might work on the same issues, your daily routine would look totally different.

The rest of this section gives some preliminary ideas about where you might want to work in this area. It’s separated into two questions: where can you work and what issues should you try to focus on?

Where can you work?

Government

I’ll start with roles specific to the US government because it’s especially large and influential in many of our top problem areas.

The main US federal policy institutions are Congress48 and the executive branch (including both federal agencies and the White House).49

After my conversations with experts, I’ve divided the potential government roles in this space into four broad categories.

First, there are research-like roles in intelligence and analysis. Researchers can affect policy by ensuring it addresses the right problems and focuses on the best solutions. For example, at the beginning of the Cold War, analysts suggested that the USSR’s nuclear arsenal was larger and more effective than the United States’, and that the gap was growing. This idea was wrong, and it helped drive the early nuclear arms race. Better analysis might have helped avoid this.

Second, there are decision-making roles in which research is turned into policy. These include political appointees selected by the executive, and career civil servants who work their way up the bureaucracy. Decision makers influence which strategies to pursue and which policies to implement.

Third, there are programme management roles. Programme managers prioritise how government budgets are spent. Since these budgets can be quite large, even small improvements in how they’re spent could have a big impact.

I’ve distinguished programme managers from decision makers because they work ‘deeper’ in the bureaucracy, with less public visibility. The State Department’s Office of Cooperative Threat Reduction, for example, currently spends about $90 million a year on its Global Threat Reduction programme, which focuses on preventing the development and proliferation of weapons of mass destruction and “advanced conventional weapons.”50

Fourth, there are diplomatic roles that involve working with people from other countries to implement policies.

To enter a career in US foreign or security policy, the best paths include completing a relevant graduate degree (ideally based in Washington, DC), particularly a policy master’s or law degree, and participating in a policy fellowship — these provide benefits like job placements, funding, training, mentoring, networking opportunities, and application support.

Working for the US government, especially in national security, can be impossible for non-citizens.

However, if you’re in a position to work on foreign policy issues in other influential countries like India or a major NATO member, you could still have a big impact.

Unfortunately, I’m much more uncertain about how to reduce risks and improve policy in Russia or China.

Think tanks

Especially in the United States, think tanks are also an important part of the foreign policy ecosystem.

Framing your career as a choice between working at a think tank or working in the government is actually a bit misleading. In reality, many people move back and forth between the think tank world and the government over the course of their career.

We’ve previously written about think tanks in this article. Working at a think tank allows you to spend more time investigating issues deeply, developing new policy ideas, and building your network and professional reputation. It can be a particularly good way to break into this field early in your career.

For example, you could work at prestigious foreign policy think tanks with broad focus areas like the Council on Foreign Relations, the Carnegie Endowment for International Peace, or the Center for Strategic and International Studies (CSIS). Alternatively, you could work at think tanks focusing on specific relevant issues like international AI policy or biosecurity policy.

Organisations that work on AI governance and military risks include the Center for Security and Emerging Technology (CSET), Brookings, the Center for a New American Security (CNAS) and the Federation of American Scientists (FAS). (CSIS and Carnegie also have relevant programmes).

For biosecurity, the most relevant organisations include the Johns Hopkins University Center for Health Security (CHS), the Nuclear Threat Initiative (NTI), Bipartisan Commission on Biodefense (BCB), and the Council on Strategic Risks (CSR).

Universities

You can also do research in universities.

My sense is that policy implementation, not research, is more of a bottleneck in the foreign policy and great power conflict space.51 This limits the value of studying and working in universities.

However, the foreign policy space is pretty crowded and competitive. This means that earning a master’s or PhD can also be very useful, or even necessary, to advance in your career.

If you’re pretty sure you want to work in policy, you can do one of the US policy-focused master’s degrees discussed here. If you want to do academic research, or move up to a high-level position in a prestigious think tank, it’s worth giving a PhD programme strong consideration. And if you’re going to do a PhD for career reasons, you could think about how to focus your research on important, policy-relevant issues.

Academics can focus on questions for extended periods of time. They can also think deeply about issues which don’t yet seem to have direct policy relevance. This could help them answer particularly complex questions or help reduce risks that are not yet salient but could be in the next few years or decades. I discuss some potential research topics in the next section.

What issues should you focus on?

The riskiest bilateral relationships

Wars can begin when leaders of one state misperceive the strength or intentions of a rival.52

This makes it very important to have experts who can help policymakers accurately interpret the actions of rival states. Combining an understanding of foreign policymaking processes, say in the United States, with an understanding of the historical, social, economic, and cultural context in another great power like China or Russia could be a highly valuable set of skills.

Similarly, you could work to become an expert in an emerging or future great power like India.

One concrete example of this kind of work is facilitating Track II diplomacy programmes. This can include hosting summits and meetings between non-official (non-governmental) representatives from different countries to share information and build trust. People with expertise in two nations, such as both China and the United States, can play an important role in facilitating such dialogues.

Track II diplomacy can be useful, for example, when official diplomatic channels have been closed down due to high tension. There are some historical cases where they have even contributed to concrete policy change, such as the United States and the USSR signing the Anti-Ballistic Missile Treaty in 1972.

Language skills can also be very useful in this area. See, for example, the work of the Center for Strategic Translation, which works to translate, annotate, and explain influential Chinese texts for English speakers.

If you decide to go down this path, you should probably try to focus on the riskiest relationships, which I discussed here.

Crisis management

Some wars are sparked when small disputes escalate. And escalation is unpredictable and difficult to control.53 One way to lower the total risk of war is to prevent escalatory spirals before they begin.

It may seem difficult to imagine how one could do this. But there are a number of important crisis management systems one could work to improve or support.

You could research, advocate for, and work to implement information-provision systems like hotlines to reduce uncertainty during crises. Or you could research how new weapons and communications technologies might affect escalation dynamics and propose policies to pre-empt unexpected effects.

Thomas Schelling, for example, did influential research on crisis management and communication hotlines and helped motivate the establishment of the Moscow-Washington hotline following the Cuban Missile Crisis.

Analysing the effects of important foreign policy decisions

Another approach one could take is to become an expert in a particularly important foreign policy issue.

For example, great powers will use sanctions to punish aggressive actions from rivals. They may also try to slow their progress in important sectors (like by putting export controls on semiconductors). You could study such policies closely to better predict their effects. By working for the government you could improve their effectiveness and minimise major downside risks (like increasing the chance of conflict). Or you could work outside the government, like in a think tank or as a journalist who can scrutinise policy choices and provide public accountability.

Other areas of foreign policy in which you might consider developing expertise include:

International governance of weapons of mass destruction and emerging technologies

You could also help reduce total war risk by working to make extremely severe outcomes less likely. The most obvious way to do this is to study proposals for international governance agreements on the development, proliferation, and use of weapons of mass destruction. This would include both existing weapons, like nuclear weapons and bioweapons, and emerging weapons technologies like advanced military artificial intelligence systems.

Improving how WMDs and emerging technologies are controlled at the national level

Individual states can also reduce war risks by unilaterally improving their management policies for weapons of mass destruction. Some of these policies are discussed in the profiles on nuclear and biological risks.

On several occasions, malfunctioning systems have created false alarms that could plausibly have led to retaliation and escalation to war. If there is a low but constant risk of something like this going wrong, then over a long enough time horizon a disaster of this kind becomes all but inevitable.
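To see why a small but persistent risk adds up, here is a minimal sketch of the arithmetic. The annual risk figures are placeholders chosen purely for illustration, not estimates of the real risk from any early-warning system.

```python
# Illustrative only: how a small annual chance of a catastrophic false alarm
# compounds over time. The annual risk figures are placeholders, not estimates
# of the risk from any real early-warning system.

def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one failure over `years`, assuming the annual
    risk is constant and independent from year to year."""
    return 1 - (1 - annual_risk) ** years

for annual_risk in (0.001, 0.01):
    for years in (10, 50, 100):
        print(f"Annual risk {annual_risk:.1%} over {years} years: "
              f"{cumulative_risk(annual_risk, years):.1%} chance of at least one failure")
```

Even a 0.1% annual risk compounds to roughly a 10% chance of at least one failure over a century, and a 1% annual risk compounds to over 60%.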

Research on how current policies could fail, or how new technologies (like advanced AI or improved satellite imaging) may raise or lower the chance of accidents, could be useful.

Other domestic interventions

There are several other potential interventions one could work on domestically. For example, one could try to affect the politics of war by influencing public discourse to reduce tension and working to get less war-like politicians elected. Or, one could try to strengthen democratic institutions to ensure that leaders remain ‘checked’ and accountable to the people who would bear the costs of war.

I’m more uncertain about how important and how feasible these interventions are, though. Given my current views, I’d instead encourage people to focus on the first five issues I listed in this profile.

Find vacancies on our job board

Our job board features opportunities to work in government and policy on our top problems:

    View all opportunities

    The public is more concerned about AI causing extinction than we thought https://80000hours.org/2023/05/the-public-is-more-concerned-about-ai-causing-extinction-than-we-thought/ Fri, 26 May 2023 15:45:10 +0000 https://80000hours.org/?p=82020 The post The public is more concerned about AI causing extinction than we thought appeared first on 80,000 Hours.

    What does the public think about risks of human extinction?

    We care a lot about reducing extinction risks and think doing so is one of the best ways you can have a positive impact with your career. But even before considering career impact, it can be natural to worry about these risks — and as it turns out, many people do!

    In April 2023, the US firm YouGov polled 1,000 American adults on how worried they were about nine different potential extinction threats. It found the following percentages of respondents were either “concerned” or “very concerned” about extinction from each threat:

    This blog post was first released to our newsletter subscribers.

    We’re particularly interested in this poll now because we have recently updated our page on the world’s most pressing problems, which includes several of these extinction risks at the top.

    Knowing how the public feels about these kinds of threats can impact how we communicate about them.

    For example, if we take the results at face value, 46% of the poll’s respondents are concerned about human extinction caused by artificial intelligence. Maybe this surprisingly high figure means we don’t need to worry as much as we have over the last 10 years about sounding like ‘sci fi’ when we talk about existential risks from AI, since it’s quickly becoming a common concern!

    How does our view of the world’s most pressing problems compare? Well, our top two problems, existential risks from AI and catastrophic pandemics, rank lower in the poll than some of the other threats, but many people do seem to have serious concerns about them. (Though we wonder if people are aware that engineered pandemics seem especially dangerous compared to natural ones.)

    We shouldn’t be that surprised that our list doesn’t quite match the poll’s rankings. In our view, one of the factors that makes problems particularly pressing is when they’re relatively neglected — not getting the level of attention they warrant.

    Some concerns among the respondents are, in our judgement, off base. While we do list climate change among our top problems, and it is causing tremendous harm, we don’t think it’s very likely to cause human extinction — a conclusion we argue for in depth in our problem profile. It may, however, be a risk factor for other extinction threats.

    Human extinction from asteroid impacts is even more unlikely. Objects in space smash into Earth all the time, but collisions with something big enough to cause a mass extinction are exceedingly rare. Toby Ord, an advisor to 80,000 Hours, has estimated that the chance that an asteroid causes human extinction is about one in a million.

    Let’s put aside an “act of God” — it’s hard to know whether respondents meant that figuratively or literally, and if literally, we’re pretty stumped on how to assess its likelihood. Extinction from infertility or alien invasion also seems quite unlikely to us.

    Overall, the American public seems to have a mix of more plausible and less plausible concerns, at least by our lights. But we’d guess most people will agree that avoiding human extinction from any source should be a top priority, which is why we want to help many of our readers use their careers to address these threats.

    Preventing an AI-related catastrophe https://80000hours.org/problem-profiles/artificial-intelligence/ Thu, 25 Aug 2022 19:43:58 +0000 https://80000hours.org/?post_type=problem_profile&p=77853 The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.

    Note from the author: At its core, this problem profile tries to predict the future of technology. This is a notoriously difficult thing to do. In addition, there has been much less rigorous research into the risks from AI than into the other risks 80,000 Hours writes about (like pandemics or climate change).1 That said, there is a growing field of research into the topic, which I’ve tried to reflect. For this article I’ve leaned especially on this draft report by Joseph Carlsmith at Open Philanthropy (also available as a narration), as it’s the most rigorous overview of the risk that I could find. I’ve also had the article reviewed by over 30 people with different expertise and opinions on the topic. (Almost all are concerned about advanced AI’s potential impact.)

    If you have any feedback on this article — whether there’s something technical we’ve got wrong, some wording we could improve, or just that you did or didn’t like reading it — we’d really appreciate it if you could tell us what you think using this form.

    Why do we think that reducing risks from AI is one of the most pressing issues of our time? In short, our reasons are:

    1. Even before getting into the actual arguments, we can see some cause for concern — as many AI experts think there’s a small but non-negligible chance that AI will lead to outcomes as bad as human extinction.
    2. We’re making advances in AI extremely quickly — which suggests that AI systems could have a significant influence on society, soon.
    3. There are strong arguments that “power-seeking” AI could pose an existential threat to humanity2 — which we’ll go through below.
    4. Even if we find a way to avoid power-seeking, there are still other risks.
    5. We think we can tackle these risks.
    6. This work is neglected.

    We’re going to cover each of these in turn, then consider some of the best counterarguments, explain concrete things you can do to help, and finally outline some of the best resources for learning more about this area.

    1. Many AI experts think there’s a non-negligible chance AI will lead to outcomes as bad as extinction

    In May 2023, hundreds of prominent AI scientists — and other notable figures — signed a statement saying that mitigating the risk of extinction from AI should be a global priority.

    So it’s pretty clear that at least some experts are concerned.

    But how concerned are they? And is this just a fringe view?

    We looked at three surveys of AI researchers who published at NeurIPS and ICML (two of the most prestigious machine learning conferences) — one in 2016, one in 2019, and one in 2022.3

    It’s important to note that there could be considerable selection bias on surveys like this. For example, you might think researchers who go to the top AI conferences are more likely to be optimistic about AI, because they have been selected to think that AI research is doing good. Alternatively, you might think that researchers who are already concerned about AI are more likely to respond to a survey asking about these concerns.4

    All that said, here’s what we found:

    In all three surveys, the median researcher thought that the chances that AI would be “extremely good” were reasonably high: 20% in the 2016 survey, 20% in 2019, and 10% in 2022.5

    Indeed, AI systems are already having substantial positive effects — for example, in medical care or academic research.

    But in all three surveys, the median researcher also estimated small — and certainly not negligible — chances that AI would be “extremely bad (e.g. human extinction)”: a 5% chance of extremely bad outcomes in the 2016 survey, 2% in 2019, and 5% in 2022.6 7

    In the 2022 survey, participants were specifically asked about the chances of existential catastrophe caused by future AI advances — and again, over half of researchers thought the chances of an existential catastrophe were greater than 5%.8

    So experts disagree on the degree to which AI poses an existential risk — a kind of threat we’ve argued deserves serious moral weight.

    This fits with our understanding of the state of the research field. Three of the leading labs developing AI — DeepMind, Anthropic and OpenAI — also have teams dedicated to figuring out how to solve technical safety issues that we believe could, for reasons we discuss at length below, lead to an existential threat to humanity.9

    There are also several academic research groups (including at MIT, Oxford, Cambridge, Carnegie Mellon University, and UC Berkeley) focusing on these same technical AI safety problems.10

    It’s hard to know exactly what to take from all this, but we’re confident that it’s not a fringe position in the field to think that there is a material risk of outcomes as bad as an existential catastrophe. Some experts in the field maintain, though, that the risks are overblown.

    Still, why do we side with those who are more concerned? In short, it’s because there are arguments we’ve found persuasive that AI could pose such an existential threat — arguments we will go through step by step below.

    It’s important to recognise that the fact that many experts acknowledge there’s a problem doesn’t mean that everything’s OK and the experts have got it covered. Overall, we think this problem remains highly neglected, with only around 400 people working directly on the issue worldwide (more on this below).

    Meanwhile, there are billions of dollars a year going into making AI more advanced.11

    2. We’re making advances in AI extremely quickly

    “A cat dressed as a computer programmer” as generated by Craiyon (formerly DALL-E mini) (left) and OpenAI’s DALL-E 2 (right). DALL-E mini uses a model 27 times smaller than OpenAI’s DALL-E 1 model, released in January 2021. DALL-E 2 was released in April 2022.12

    Before we try to figure out what the future of AI might look like, it’s helpful to take a look at what AI can already do.

    Modern AI techniques involve machine learning (ML): models that improve automatically through data input. The most common form of this technique used today is known as deep learning.

    ML systems today can only perform a very small portion of tasks that humans can do, and (with a few exceptions) only within narrow specialties (like playing one particular game or generating one particular kind of image).

    That said, since the increasingly widespread use of deep learning in the mid-2010s, there has been huge progress in what can be achieved with ML. Here’s a brief timeline of only some of the advances we saw from 2019 to 2022:

    • AlphaStar, which can beat top professional players at StarCraft II (January 2019)
    • MuZero, a single system that learned to win games of chess, shogi, and Go — without ever being told the rules (November 2019)
    • GPT-3, a natural language model capable of producing high-quality text (May 2020)
    • GPT-f, which can solve some Maths Olympiad problems (September 2020)
    • AlphaFold 2, a huge step forward in solving the long-perplexing protein-folding problem (July 2021)
    • Codex, which can produce code for programs from natural language instructions (August 2021)
    • PaLM, a language model which has shown impressive capabilities to reason about things like cause and effect or explaining jokes (April 2022)
    • DALL-E 2 (April 2022) and Imagen (May 2022), which are both capable of generating high-quality images from written descriptions
    • SayCan, which takes natural language instructions and uses them to operate a robot (April 2022)
    • Gato, a single ML model capable of doing a huge number of different things (including playing Atari, captioning images, chatting, and stacking blocks with a real robot arm), deciding based on its context what it should output (May 2022)
    • Minerva, which can solve complex maths problems — fairly well at college level, and even better at high school maths competition level (June 2022). (Minerva is far more successful than forecasters predicted in 2021.)

    If you’re anything like us, you found the complexity and breadth of the tasks these systems can carry out surprising.

    And if the technology keeps advancing at this pace, it seems clear there will be major effects on society. At the very least, automating tasks makes carrying out those tasks cheaper. As a result, we may see rapid increases in economic growth (perhaps even to the level we saw during the Industrial Revolution).

    If we’re able to partially or fully automate scientific advancement we may see more transformative changes to society and technology.13

    That could be just the beginning. We may be able to get computers to eventually automate anything humans can do. This seems like it has to be possible — at least in principle. This is because it seems that, with enough power and complexity, a computer should be able to simulate the human brain. This would itself be a way of automating anything humans can do (if not the most efficient method of doing so).

    And as we’ll see in the next section, there are some indications that extensive automation may well be possible through scaling up existing techniques.

    Current trends show rapid progress in the capabilities of ML systems

    There are three things that are crucial to building AI through machine learning:

    1. Good algorithms (e.g. more efficient algorithms are better)
    2. Data to train an algorithm
    3. Enough computational power (known as compute) to do this training

    We spoke to Danny Hernandez, who (at the time) was a research scientist on the Foresight team at OpenAI. Hernandez and his team looked at how two of these inputs (compute and algorithm efficiency) are changing over time.

    They found that, since 2012, the amount of compute used for training the largest AI models has been rising exponentially — doubling every 3.4 months.

    That is to say, since 2012, the amount of computational power used to train our largest machine learning models has grown by over 1 billion times.

    Hernandez and his team also looked at how much compute has been needed to train a neural network to have the same performance as AlexNet (an early image classification algorithm).

    They found that the amount of compute required for the same performance has been falling exponentially — halving every 16 months.

    So since 2012, the amount of compute required for the same level of performance has fallen by over 100 times. Combined with the increased compute used, that’s a lot of growth.16
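These two trends compound. The sketch below shows roughly how doubling and halving times like these translate into the large factors mentioned above; it assumes, purely for illustration, that each trend held steadily for a full decade, whereas the underlying measurements cover somewhat shorter windows.

```python
# Back-of-the-envelope arithmetic for the two trends above. It assumes, purely
# for illustration, that each trend held steadily for a full decade (120 months);
# the underlying measurements cover somewhat shorter windows, so treat the
# outputs as rough orders of magnitude.

months = 120  # roughly 2012 to 2022

# Training compute for the largest models: doubling every 3.4 months.
compute_growth = 2 ** (months / 3.4)          # tens of billions

# Compute needed for AlexNet-level performance: halving every 16 months.
efficiency_gain = 2 ** (months / 16)          # a couple of hundred

print(f"Compute growth factor:       ~{compute_growth:.1e}")
print(f"Algorithmic efficiency gain: ~{efficiency_gain:.0f}x")
print(f"Combined effective growth:   ~{compute_growth * efficiency_gain:.1e}")
```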

    It’s hard to say whether these trends will continue, but they speak to incredible gains over the past decade in what it’s possible to do with machine learning.

    Indeed, it looks like increasing the size of models (and the amount of compute used to train them) introduces ever more sophisticated behaviour. This is how things like GPT-3 are able to perform tasks they weren’t specifically trained for.

    These observations have led to the scaling hypothesis: that we can simply build bigger and bigger neural networks, and as a result we will end up with more and more powerful artificial intelligence, and that this trend of increasing capabilities may increase to human-level AI and beyond.

    If this is true, we can attempt to predict how the capabilities of AI technology will increase over time simply by looking at how quickly we are increasing the amount of compute available to train models.

    But as we’ll see, it’s not just the scaling hypothesis that suggests we could end up with extremely powerful AI relatively soon — other methods of predicting AI progress come to similar conclusions.

    When can we expect transformative AI?

    It’s difficult to predict exactly when we will develop AI that we expect to be hugely transformative for society (for better or for worse) — for example, by automating all human work or drastically changing the structure of society.17 But here we’ll go through a few approaches.

    One option is to survey experts. Data from the 2019 survey of 300 AI experts implies that there is 20% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, 50% probability by 2060, and 85% by 2100.18 There are a lot of reasons to be suspicious of these estimates,4 but we take it as one data point.

    Ajeya Cotra (a researcher at Open Philanthropy) attempted to forecast transformative AI by comparing modern deep learning to the human brain. Deep learning involves using a huge amount of compute to train a model, before that model is able to perform some task. There’s also a relationship between the amount of compute used to train a model and the amount used by the model when it’s run. And — if the scaling hypothesis is true — we should expect the performance of a model to predictably improve as the computational power used increases. So Cotra used a variety of approaches (including, for example, estimating how much compute the human brain uses on a variety of tasks) to estimate how much compute might be needed to train a model that, when run, could carry out the hardest tasks humans can do. She then estimated when using that much compute would be affordable.

    Cotra’s 2022 update on her report’s conclusions estimates that there is a 35% probability of transformative AI by 2036, 50% by 2040, and 60% by 2050 — noting that these guesses are not stable.19
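For readers who want to see the shape of this kind of estimate, here is a heavily simplified sketch of the affordability step. Every number in it (the compute requirement, the hardware trend, the spending trend) is a made-up placeholder chosen for illustration; none of them are figures from Cotra’s report.

```python
# A heavily simplified sketch of the *structure* of this kind of forecast:
# estimate how much compute a transformative model might need, then ask when
# that much compute becomes affordable given assumed trends in hardware
# price-performance and willingness to spend. Every number below is a made-up
# placeholder, not a figure from Cotra's report.

required_training_flop = 1e30          # placeholder compute requirement (FLOP)

flop_per_dollar_now = 1e17             # placeholder price-performance today
hardware_doubling_years = 2.5          # placeholder hardware improvement rate

budget_now = 1e8                       # placeholder largest training budget ($)
budget_growth_per_year = 1.2           # placeholder spending growth

def affordable_flop(year: int) -> float:
    """Compute purchasable in a given year under the placeholder trends."""
    t = year - 2022
    flop_per_dollar = flop_per_dollar_now * 2 ** (t / hardware_doubling_years)
    return flop_per_dollar * budget_now * budget_growth_per_year ** t

for year in range(2022, 2101):
    if affordable_flop(year) >= required_training_flop:
        print(f"First affordable around {year}, under these placeholder assumptions.")
        break
else:
    print("Not affordable before 2100 under these placeholder assumptions.")
```

The real report does far more work estimating each input and its uncertainty; the point of the sketch is only to show how the forecast turns assumptions about trends into a date.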

    Tom Davidson (also a researcher at Open Philanthropy) wrote a report to complement Cotra’s work. He attempted to figure out when we might expect to see transformative AI based only on looking at various types of research that transformative AI might be like (e.g. developing technology that’s the ultimate goal of a STEM field, or proving difficult mathematical conjectures), and how long it’s taken for each of these kinds of research to be completed in the past, given some quantity of research funding and effort.

    Davidson’s report estimates that, solely on this information, you’d think that there was an 8% chance of transformative AI by 2036, 13% by 2060, and 20% by 2100. However, Davidson doesn’t consider the actual ways in which AI has progressed since research started in the 1950s, and notes that it seems likely that the amount of effort we put into AI research will increase as AI becomes increasingly relevant to our economy. As a result, Davidson expects these numbers to be underestimates.

    Holden Karnofsky, co-CEO of Open Philanthropy, attempted to sum up the findings of all of the approaches above. He guesses there is more than a 10% chance we’ll see transformative AI by 2036(!), 50% by 2060, and 66% by 2100. And these guesses might be conservative, since they didn’t incorporate what we see as faster-than-expected progress since the estimates were made.

    Method | Chance of transformative AI by 2036 | Chance of transformative AI by 2060 | Chance of transformative AI by 2100
    Expert survey (Zhang et al., 2022) | 20% | 50% | 85%
    Biological anchors (Cotra, 2022) | 35% | 60% (by 2050) | 80% (according to the 2020 report)
    Semi-informative priors (Davidson, 2021) | 8% | 13% | 20%
    Overall guess (Karnofsky, 2021) | 10% | 50% | 66%

    All in all, AI seems to be advancing rapidly. More money and talent is going into the field every year, and models are getting bigger and more efficient.

    Even if AI were advancing more slowly, we’d be concerned about it — most of the arguments about the risks from AI (that we’ll get to below) do not depend on this rapid progress.

    However, the speed of these recent advances increases the urgency of the issue.

    (It’s totally possible that these estimates are wrong – below, we discuss how the possibility that we might have a lot of time to work on this problem is one of the best arguments against this problem being pressing).

    3. Power-seeking AI could pose an existential threat to humanity

    We’ve argued so far that we expect AI to be an important — and potentially transformative — new technology.

    We’ve also seen reason to think that such transformative AI systems could be built this century.

    Now we’ll turn to the core question: why do we think this matters so much?

    There could be a lot of reasons. If advanced AI is as transformative as it seems like it’ll be, there will be many important consequences. But here we are going to explain the issue that seems most concerning to us: AI systems could pose risks by seeking and gaining power.

    We’ll argue that:

    1. It’s likely that we’ll build AI systems that can make and execute plans to achieve goals
    2. Advanced planning systems could easily be ‘misaligned’ — in a way that could lead them to make plans that involve disempowering humanity
    3. Disempowerment by AI systems would be an existential catastrophe
    4. People might deploy AI systems that are misaligned, despite this risk

    Thinking through each step, I think there’s something like a 1% chance of an existential catastrophe resulting from power-seeking AI systems this century. This is my all things considered guess at the risk incorporating considerations of the argument in favour of the risk (which is itself probabilistic), as well as reasons why this argument might be wrong (some of which I discuss below). This puts me on the less worried end of 80,000 Hours staff, whose views on our last staff survey ranged from 1–55%, with a median of 15%.

    It’s likely we’ll build advanced planning systems

    We’re going to argue that future systems with the following three properties might pose a particularly important threat to humanity:20

    1. They have goals and are good at making plans.

      Not all AI systems have goals or make plans to achieve those goals. But some systems (like some chess-playing AI systems) can be thought of in this way. When discussing power-seeking AI, we’re considering planning systems that are relatively advanced, with plans that are in pursuit of some goal(s), and that are capable of carrying out those plans.

    2. They have excellent strategic awareness.

      A particularly good planning system would have a good enough understanding of the world to notice obstacles and opportunities that may help or hinder its plans, and respond to these accordingly. Following Carlsmith, we’ll call this strategic awareness, since it allows systems to strategise in a more sophisticated way.

    3. They have highly advanced capabilities relative to today’s systems.

      For these systems to actually affect the world, we need them to not just make plans, but also be good at all the specific tasks required to execute those plans.

      Since we’re worried about systems attempting to take power from humanity, we are particularly concerned about AI systems that might be better than humans on one or more tasks that grant people significant power when carried out well in today’s world.

      For example, people who are very good at persuasion and/or manipulation are often able to gain power — so an AI being good at these things might also be able to gain power. Other examples might include hacking into other systems, tasks within scientific and engineering research, as well as business, military, or political strategy.

    These systems seem technically possible and we’ll have strong incentives to build them

    As we saw above, we’ve already produced systems that are very good at carrying out specific tasks.

    We’ve also already produced rudimentary planning systems, like AlphaStar, which skilfully plays the strategy game StarCraft, and MuZero, which plays chess, shogi, and Go.21

    We’re not sure whether these systems are producing plans in pursuit of goals per se, because we’re not sure exactly what it means to “have goals.” However, since they consistently plan in ways that achieve goals, it seems like they have goals in some sense.

    Moreover, some existing systems seem to actually represent goals as part of their neural networks.22

    That said, planning in the real world (instead of games) is much more complex, and to date we’re not aware of any unambiguous examples of goal-directed planning systems, or systems that exhibit high degrees of strategic awareness.

    But as we’ve discussed, we expect to see further advances within this century. And we think these advances are likely to produce systems with all three of the above properties.

    That’s because we think that there are particularly strong incentives (like profit) to develop these kinds of systems. In short: because being able to plan to achieve a goal, and execute that plan, seems like a particularly powerful and general way of affecting the world.

    Getting things done — whether that’s a company selling products, a person buying a house, or a government developing policy — almost always seems to require these skills. One example would be assigning a powerful system a goal and expecting the system to achieve it — rather than having to guide it every step of the way. So planning systems seem likely to be (economically and politically) extremely useful.23

    And if systems are extremely useful, there are likely to be big incentives to build them. For example, an AI that could plan the actions of a company by being given the goal to increase its profits (that is, an AI CEO) would likely provide significant wealth for the people involved — a direct incentive to produce such an AI.

    As a result, if we can build systems with these properties (and from what we know, it seems like we will be able to), it seems like we are likely to do so.24

    Advanced planning systems could easily be dangerously ‘misaligned’

    There are reasons to think that these kinds of advanced planning AI systems will be misaligned. That is, they will aim to do things that we don’t want them to do.25

    There are many reasons why systems might not be aiming to do exactly what we want them to do. For one thing, we don’t know how, using modern ML techniques, to give systems the precise goals we want (more here).26

    We’re going to focus specifically on some reasons why systems might by default be misaligned in such a way that they develop plans that pose risks to humanity’s ability to influence the world — even when we don’t want that influence to be lost.27

    What do we mean by “by default”? Essentially, unless we actively find solutions to some (potentially quite difficult) problems, then it seems like we’ll create dangerously misaligned AI. (There are reasons this might be wrong — which we discuss later.)

    Three examples of “misalignment” in a variety of systems

    It’s worth noting that misalignment isn’t a purely theoretical possibility (or specific to AI) — we see misaligned goals in humans and institutions all the time, and have also seen examples of misalignment in AI systems.28

    The democratic political framework is intended to ensure that politicians make decisions that benefit society. But what political systems actually reward is winning elections, so that’s what many politicians end up aiming for.

    This is a decent proxy goal — if you have a plan to improve people’s lives, they’re probably more likely to vote for you — but it isn’t perfect. As a result, politicians do things that aren’t clearly the best way of running a country, like raising taxes at the start of their term and cutting them right before elections.

    That is to say, the things the system does are at least a little different from what we would, in a perfect world, want it to do: the system is misaligned.

    Companies have profit-making incentives. By producing more, and therefore helping people obtain goods and services at cheaper prices, companies make more money.

    This is sometimes a decent proxy for making the world better, but profit isn’t actually the same as the good of all of humanity (bold claim, we know). As a result, there are negative externalities: for example, companies will pollute to make money despite this being worse for society overall.

    Again, we have a misaligned system, where the things the system does are at least a little different from what we would want it to do.

    DeepMind has documented examples of specification gaming: an AI doing well according to its specified reward function (which encodes our intentions for the system), but not doing what researchers intended.

    In one example, a robot arm was asked to grasp a ball. But the reward was specified in terms of whether humans thought the robot had been successful. As a result, the arm learned to hover between the ball and the camera, fooling the humans into thinking that it had grasped the ball.29

    A simulated arm hovers between a ball and a camera.
    Source: Christiano et al., 2017

    So we know it’s possible to create a misaligned AI system.
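The robot arm case can be reduced to a tiny toy model: the training signal is a proxy (whether a human watching through the camera believes the ball has been grasped) rather than the intended outcome (the ball actually being grasped), and optimising the proxy selects exactly the wrong behaviour. The behaviours and reward values below are invented for illustration, not taken from the experiment.

```python
# A toy illustration (not the actual experiment) of how optimising a proxy
# reward can come apart from the intended goal. The behaviours and reward
# values are invented for the example.

# What the designers actually want: the ball to be grasped.
true_reward = {
    "grasp_ball": 1.0,
    "hover_between_ball_and_camera": 0.0,
    "do_nothing": 0.0,
}

# The signal used in training: whether a human watching through the camera
# *believes* the ball has been grasped.
proxy_reward = {
    "grasp_ball": 0.9,                     # usually looks like success
    "hover_between_ball_and_camera": 1.0,  # reliably fools the camera view
    "do_nothing": 0.0,
}

# A learner that simply picks whichever behaviour maximises the proxy:
chosen = max(proxy_reward, key=proxy_reward.get)

print(f"Behaviour selected by optimising the proxy: {chosen}")
print(f"Proxy reward: {proxy_reward[chosen]}, true reward: {true_reward[chosen]}")
# The proxy-optimal behaviour earns zero true reward: specification gaming.
```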

    Why these systems could (by default) be dangerously misaligned

    Here’s the core argument of this article. We’ll use all three properties from earlier: planning ability, strategic awareness, and advanced capabilities.

    To start, we should realise that a planning system that has a goal will also develop ‘instrumental goals’: things that, if they occur, will make it easier to achieve an overall goal.

    We use instrumental goals in plans all the time. For example, a high schooler planning their career might think that getting into university will be helpful for their future job prospects. In this case, “getting into university” would be an instrumental goal.

    A sufficiently advanced AI planning system would also include instrumental goals in its overall plans.

    If a planning AI system also has enough strategic awareness, it will be able to identify facts about the real world (including potential things that would be obstacles to any plans), and plan in light of them. Crucially, these facts would include that access to resources (e.g. money, compute, influence) and greater capabilities — that is, forms of power — open up new, more effective ways of achieving goals.

    This means that, by default, advanced planning AI systems would have some worrying instrumental goals:

    • Self-preservation — because a system is more likely to achieve its goals if it is still around to pursue them (in Stuart Russell’s memorable phrase, “You can’t fetch the coffee if you’re dead”).
    • Preventing any changes to the AI system’s goals — since changing its goals would lead to outcomes that are different from those it would achieve with its current goals.
    • Gaining power — for example, by getting more resources and greater capabilities.

    Crucially, one clear way in which the AI can ensure that it will continue to exist (and not be turned off), and that its objectives will never be changed, would be to gain power over the humans who might affect it (we talk here about how AI systems might actually be able to do that).

    What’s more, the AI systems we’re considering have advanced capabilities — meaning they can do one or more tasks that grant people significant power when carried out well in today’s world. With such advanced capabilities, these instrumental goals will not be out of reach, and as a result, it seems like the AI system would use its advanced capabilities to get power as part of the plan’s execution. If we don’t want the AI systems we create to take power away from us this would be a particularly dangerous form of misalignment.

    In the most extreme scenarios, a planning AI system with sufficiently advanced capabilities could successfully disempower us completely.

    As a (very non-rigorous) intuitive check on this argument, let’s try to apply it to humans.

    Humans have a variety of goals. For many of these goals, some form of power-seeking is advantageous: though not everyone seeks power, many people do (in the form of wealth or social or political status), because it’s useful for getting what they want. This is not catastrophic (usually!) because, as human beings:

    • We generally feel bound by human norms and morality (even people who really want wealth usually aren’t willing to kill to get it).
    • We aren’t that much more capable or intelligent than one another. So even in cases where people aren’t held back by morality, they’re not able to take over the world.

    (We discuss whether humans are truly power-seeking later.)

    A sufficiently advanced AI wouldn’t have those limitations.

    It might be hard to find ways to prevent this sort of misalignment

    The point of all this isn’t to say that any advanced planning AI system will necessarily attempt to seek power. Instead, it’s to point out that, unless we find a way to design systems that don’t have this flaw, we’ll face significant risk.

    It seems more than plausible that we could create an AI system that isn’t misaligned in this way, and thereby prevent any disempowerment. Here are some strategies we might take (plus, unfortunately, some reasons why they might be difficult in practice):30

    • Control the objectives of the AI system. We may be able to design systems that simply don’t have objectives to which the above argument applies — and thus don’t incentivise power-seeking behaviour. For example, we could find ways to explicitly instruct AI systems not to harm humans, or find ways to reward AI systems (in training environments) for not engaging in specific kinds of power-seeking behaviour (and also find ways to ensure that this behaviour continues outside the training environment).

      Carlsmith gives two reasons why doing this seems particularly hard.

      First, for modern ML systems, we don’t get to explicitly state a system’s objectives — instead we reward (or punish) a system in a training environment so that it learns on its own. This raises a number of difficulties, one of which is goal misgeneralisation. Researchers have uncovered real examples of systems that appear to have learned to pursue a goal in the training environment, but then fail to generalise that goal when they operate in a new environment. This raises the possibility that we could think we’ve successfully trained an AI system not to seek power — but that the system would seek power anyway when deployed in the real world.31

      Second, when we specify a goal to an AI system (or, when we can’t explicitly do that, when we find ways to reward or punish a system during training), we usually do this by giving the system a proxy by which outcomes can be measured (e.g. positive human feedback on a system’s achievement). But often those proxies don’t quite work.32 In general, we might expect that even if a proxy appears to correlate well with successful outcomes, it might not do so when that proxy is optimised for. (The examples above of politicians, companies, and the robot arm failing to grasp a ball are illustrations of this.) We’ll look at a more specific example of how problems with proxies could lead to an existential catastrophe here.

      For more on the specific difficulty of controlling the objectives given to deep neural networks trained using self-supervised learning and reinforcement learning, we recommend OpenAI governance researcher Richard Ngo’s discussion of how realistic training processes lead to the development of misaligned goals.

    • Control the inputs into the AI system. AI systems will only develop plans to seek power if they have enough information about the world to realise that seeking power is indeed a way to achieve its goals.

    • Control the capabilities of the AI system. AI systems will likely only be able to carry out plans to seek power if they have sufficiently advanced capabilities in skills that grant people significant power in today’s world.

    But to make any strategy work, it will need to both:

    • Retain the usefulness of the AI systems — and so remain economically competitive with less safe systems. Controlling the inputs and capabilities of AI systems will clearly have costs, so it seems hard to ensure that these controls, even if they’re developed, are actually used. But this is also a problem for controlling a system’s objectives. For example, we may be able to prevent power-seeking behaviour by ensuring that AI systems stop to check in with humans about any decisions they make. But these systems might be significantly slower and less immediately useful to people than systems that don’t stop to carry out these checks. As a result, there might still be incentives to use a faster, more initially effective misaligned system (we’ll look at incentives more in the next section).

    • Continue to work as the planning ability and strategic awareness of systems improve over time. Some seemingly simple solutions (for example, trying to give a system a long list of things it isn’t allowed to do, like stealing money or physically harming humans) break down as the planning abilities of the systems increase. This is because, the more capable a system is at developing plans, the more likely it is to identify loopholes or failures in the safety strategy — and as a result, the more likely the system is to develop a plan that involves power-seeking.

    Ultimately, by looking at the state of the research on this topic, and speaking to experts in the field, we think that there are currently no known ways of building aligned AI systems that seem likely to fulfil both these criteria.

    So: that’s the core argument. There are many variants of this argument. Some have argued that AI systems might gradually shape our future via subtler forms of influence that nonetheless could amount to an existential catastrophe; others argue that the most likely form of disempowerment is in fact just killing everyone. We’re not sure how a catastrophe would be most likely to play out, but have tried to articulate the heart of the argument, as we see it: that AI presents an existential risk.

    There are definitely reasons this argument might not be right! We go through some of the reasons that seem strongest to us below. But overall it seems possible that, for at least some kinds of advanced planning AI systems, it will be harder to build systems that don’t seek power in this dangerous way than to build systems that do.

    At this point, you may have questions like:

    We think there are good responses to all these questions, so we’ve added a long list of arguments against working on AI risk — and our responses — for these (and other) questions below.

    Disempowerment by AI systems would be an existential catastrophe

    When we say we’re concerned about existential catastrophes, we’re not just concerned about risks of extinction. This is because the source of our concern is rooted in longtermism: the idea that the lives of all future generations matter, and so it’s extremely important to protect their interests.

    This means that any event that could prevent all future generations from living lives full of whatever you think makes life valuable (whether that’s happiness, justice, beauty, or general flourishing) counts as an existential catastrophe.

    It seems extremely unlikely that we’d be able to regain power over a system that successfully disempowers humanity. And as a result, the entirety of the future — everything that happens for Earth-originating life, for the rest of time — would be determined by the goals of systems that, although built by us, are not aligned with us. Perhaps those goals will create a long and flourishing future, but we see little reason for confidence.33

    This isn’t to say that we don’t think AI also poses a risk of human extinction. Indeed, we think making humans extinct is one highly plausible way in which an AI system could completely and permanently ensure that we are never able to regain power.

    People might deploy misaligned AI systems despite the risk

    Surely no one would actually build or use a misaligned AI if they knew it could have such terrible consequences, right?

    Unfortunately, there are at least two reasons people might create and then deploy misaligned AI — which we’ll go through one at a time:34

    1. People might think it’s aligned when it’s not

    Imagine there’s a group of researchers trying to tell, in a test environment, whether a system they’ve built is aligned. We’ve argued that an intelligent planning AI will want to improve its abilities to effect changes in pursuit of its objective, and it’s almost always easier to do that if it’s deployed in the real world, where a much wider range of actions are available. As a result, any misaligned AI that’s sophisticated enough will try to understand what the researchers want it to do and at least pretend to be doing that, deceiving the researchers into thinking it’s aligned. (For example, a reinforcement learning system might be rewarded for certain apparent behaviour during training, regardless of what it’s actually doing.)

    Hopefully, we’ll be aware of this sort of behaviour and be able to detect it. But catching a sufficiently advanced AI in deception seems potentially harder than catching a human in a lie, which isn’t always easy. For example, a sufficiently intelligent deceptive AI system may be able to deceive us into thinking we’ve solved the problem of AI deception, even if we haven’t.

    If AI systems are good at deception, and have sufficiently advanced capabilities, a reasonable strategy for such a system could be to deceive humans completely until the system has a way to guarantee it can overcome any resistance to its goals.

    2. There are incentives to deploy systems sooner rather than later

    We might also expect some people with the ability to deploy a misaligned AI to charge ahead despite any warning signs of misalignment that do come up, because of race dynamics — where people developing AI want to do so before anyone else.

    For example, if you’re developing an AI to improve military or political strategy, it’s much more useful if none of your rivals have a similarly powerful AI.

    These incentives apply even to people attempting to build an AI in the hopes of using it to make the world a better place.

    For example, say you’ve spent years and years researching and developing a powerful AI system, and all you want is to use it to make the world a better place. Simplifying things a lot, say there are two possibilities:

    1. This powerful AI will be aligned with your beneficent aims, and you’ll transform society in a potentially radically positive way.
    2. The AI will be sufficiently misaligned that it’ll take power and permanently end humanity’s control over the future.

    Let’s say you think there’s a 90% chance that you’ve succeeded in building an aligned AI. But technology often develops at similar speeds across society, so there’s a good chance that someone else will soon also develop a powerful AI. And you think they’re less cautious, or less altruistic, so you think their AI will only have an 80% chance of being aligned with good goals, and pose a 20% chance of existential catastrophe. And only if you get there first can your more beneficial AI be dominant. As a result, you might decide to go ahead with deploying your AI, accepting the 10% risk.
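Spelling out the arithmetic of that story can make the pull of these incentives clearer. The sketch below uses the same illustrative numbers as the example above, plus one simplifying assumption (that a rival will certainly deploy if you hold back); none of these figures are estimates of any real situation.

```python
# The same illustrative numbers as the story above, plus one simplifying
# assumption: if you hold back, a rival will certainly deploy instead. None of
# these figures are estimates of any real situation.

p_my_ai_misaligned = 0.10        # you think yours is 90% likely to be aligned
p_rival_ai_misaligned = 0.20     # you think theirs is only 80% likely to be aligned
p_rival_deploys_if_you_wait = 1.0

risk_if_you_deploy_first = p_my_ai_misaligned
risk_if_you_wait = p_rival_deploys_if_you_wait * p_rival_ai_misaligned

print(f"Chance of catastrophe if you deploy first: {risk_if_you_deploy_first:.0%}")
print(f"Chance of catastrophe if you wait:         {risk_if_you_wait:.0%}")
# Under these assumptions, deploying first looks like the lower-risk option,
# even though it means accepting a 10% chance of existential catastrophe.
```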

    This all sounds very abstract. What could an existential catastrophe caused by AI actually look like?

    The argument we’ve given so far is very general, and doesn’t really look at the specifics of how an AI that is attempting to seek power might actually do so.

    If you’d like to get a better understanding of what an existential catastrophe caused by AI might actually look like, we’ve written a short separate article on that topic. If you’re happy with the high-level abstract arguments so far, feel free to skip to the next section!

    What could an existential AI catastrophe actually look like?

    4. Even if we find a way to avoid power-seeking, there are still risks

    So far we’ve described what a large proportion of researchers in the field2 think is the major existential risk from potential advances in AI, which depends crucially on an AI seeking power to achieve its goals.

    If we can prevent power-seeking behaviour, we will have reduced existential risk substantially.

    But even if we succeed, there are still existential risks that AI could pose.

    AI could worsen war

    We’re concerned that great power conflict could also pose a substantial threat to our world, and advances in AI seem likely to change the nature of war — through lethal autonomous weapons35 or through automated decision making.36

    In some cases, great power war could pose an existential threat — for example, if the conflict is nuclear. It’s possible that AI could exacerbate risks of nuclear escalation, although there are also reasons to think AI could decrease this risk.37

    Finally, if a single actor produces particularly powerful AI systems, this could be seen as giving them a decisive strategic advantage. For example, the US may produce a planning AI that’s intelligent enough to ensure that Russia or China could never successfully launch another nuclear weapon. This could incentivise a first strike from the actor’s rivals before these AI-developed plans can ever be put into action.

    AI could be used to develop dangerous new technology

    We expect that AI systems will help increase the rate of scientific progress.38

    While there would be clear benefits to this automation — the rapid development of new medicine, for example — some forms of technological development can pose threats, including existential threats, to humanity. This could be through biotechnology39 (see our article on preventing catastrophic pandemics for more) or through some other form of currently unknown but dangerous technology.40

    AI could empower totalitarian governments

    An AI-enabled authoritarian government could completely automate the monitoring and repression of its citizens, as well as significantly influence the information people see, perhaps making it impossible to coordinate action against such a regime.41

    If this became a form of truly stable totalitarianism, this could make people’s lives far worse for extremely long periods of time, making it a particularly scary possible scenario resulting from AI.

    Other risks from AI

    We’re also concerned about the following issues, though we know less about them:

    • Existential threats that result not from the power-seeking behaviour of AI systems, but as a result of the interaction between AI systems. (In order to pose a risk, these systems would still need to be, to some extent, misaligned.)
    • Other ways we haven’t thought of in which AI systems could be misused — especially ones that might significantly affect future generations.
    • Other moral mistakes made in the design and use of AI systems, particularly if future AI systems are themselves deserving of moral consideration. For example, perhaps we will (inadvertently) create conscious AI systems, which could then suffer in huge numbers. We think this could be extremely important, so we’ve written about it in a separate problem profile.

    So how likely is an AI-related catastrophe, all things considered? This is a really difficult question to answer.

    There are no past examples we can use to determine the frequency of AI-related catastrophes.

    All we have to go off are arguments (like the ones we’ve given above), and less relevant data like the history of technological advances. And we’re definitely not certain that the arguments we’ve presented are completely correct.

    Consider the argument we gave earlier about the dangers of power-seeking AI in particular, based on Carlsmith's report. At the end of his report, Carlsmith gives some rough guesses of the chances that each stage of his argument is correct (conditional on the previous stage being correct):

    1. By 2070 it will be possible and financially feasible to build strategically aware systems that can outperform humans on many power-granting tasks, and that can successfully make and carry out plans: Carlsmith guesses there’s a 65% chance of this being true.
    2. Given this feasibility, there will be strong incentives to build such systems: 80%.
    3. Given both the feasibility and incentives to build such systems, it will be much harder to develop aligned systems that don’t seek power than to develop misaligned systems that do, but which are at least superficially attractive to deploy: 40%.
    4. Given all of this, some deployed systems will seek power in a misaligned way that causes over $1 trillion (in 2021 dollars) of damage: 65%.
    5. Given all the previous premises, misaligned power-seeking AI systems will end up disempowering basically all of humanity: 40%.
    6. Given all the previous premises, this disempowerment will constitute an existential catastrophe: 95%.

    Multiplying these numbers together, Carlsmith estimated that there’s a 5% chance that his argument is right and there will be an existential catastrophe from misaligned power-seeking AI by 2070. When we spoke to Carlsmith, he noted that in the year between the writing of his report and the publication of this article, his overall guess at the chance of an existential catastrophe from power-seeking AI by 2070 had increased to >10%.42
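
    As a check on the arithmetic, here's the multiplication behind that 5% figure, written out using the six conditional probabilities listed above:

    ```python
    import math

    # Carlsmith's rough conditional guesses, as listed above.
    conditional_probabilities = [
        0.65,  # 1. advanced, strategically aware planning systems are feasible by 2070
        0.80,  # 2. there will be strong incentives to build such systems
        0.40,  # 3. it's much harder to build aligned systems than deployable misaligned ones
        0.65,  # 4. some deployed systems seek power and cause >$1 trillion of damage
        0.40,  # 5. misaligned power-seeking AI disempowers basically all of humanity
        0.95,  # 6. that disempowerment is an existential catastrophe
    ]

    print(f"{math.prod(conditional_probabilities):.3f}")  # 0.051, i.e. roughly a 5% chance
    ```

    Because the final figure is a product of six guesses, small changes to any one of them shift the bottom line substantially, which is one reason estimates like this are so uncertain.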

    The overall probability of existential catastrophe from AI would, in Carlsmith’s view, be higher than this, because there are other routes to possible catastrophe — like those discussed in the previous section — although our guess is that these other routes are probably a lot less likely to lead to existential catastrophe.

    For another estimate, in The Precipice, philosopher and advisor to 80,000 Hours Toby Ord estimated a 1-in-6 risk of existential catastrophe by 2120 (from any cause), and that 60% of this risk comes from misaligned AI — giving a total of a 10% risk of existential catastrophe from misaligned AI by 2120.

    A 2021 survey of 44 researchers working on reducing existential risks from AI found the median risk estimate was 32.5% — the highest answer given was 98%, and the lowest was 2%.43 There’s obviously a lot of selection bias here: people choose to work on reducing risks from AI because they think this is unusually important, so we should expect estimates from this survey to be substantially higher than estimates from other sources. But there’s clearly significant uncertainty about how big this risk is, and huge variation in answers.

    All these numbers are shockingly, disturbingly high. We’re far from certain that all the arguments are correct. But these are generally the highest guesses for the level of existential risk of any of the issues we’ve examined (like engineered pandemics, great power conflict, climate change, or nuclear war).

    That said, I think there are reasons why it’s harder to make guesses about the risks from AI than other risks – and possibly reasons to think that the estimates we’ve quoted above are systematically too high.

    If I was forced to put a number on it, I’d say something like 1%. This number includes considerations both in favour and against the argument. I’m less worried than other 80,000 Hours staff — our position as an organisation is that the risk is between 3% and 50%.

    All this said, the arguments for such high estimates of the existential risk posed by AI are persuasive — making risks from AI a top contender for the most pressing problem facing humanity.

    5. We can tackle these risks

    We think one of the most important things you can do would be to help reduce the gravest risks that AI poses.

    This isn’t just because we think these risks are high — it’s also because we think there are real things we can do to reduce these risks.

    We know of two broad approaches:

    1. Technical AI safety research
    2. AI governance research and implementation

    For both of these, there are lots of ways to contribute. We’ll go through them in more detail below, but in this section we want to illustrate the point that there are things we can do to address these risks.

    Technical AI safety research

    The benefits of transformative AI could be huge, and there are many different actors involved (operating in different countries), which means it will likely be really hard to prevent its development altogether.

    (It’s also possible that it wouldn’t even be a good idea if we could — after all, that would mean forgoing the benefits as well as preventing the risks.)

    As a result, we think it makes more sense to focus on making sure that this development is safe — meaning that it has a high probability of avoiding all the catastrophic failures listed above.

    One way to do this is to try to develop technical solutions to prevent the kind of power-seeking behaviour we discussed earlier — this is generally known as working on technical AI safety, sometimes called just “AI safety” for short.

    Read more about technical AI safety research below.

    AI governance research and implementation

    A second strategy for reducing risks from AI is to shape its development through policy, norms-building, and other governance mechanisms.

    Good AI governance can help technical safety work, for example by producing safety agreements between corporations, or helping talented safety researchers from around the world move to where they can be most effective. AI governance could also help with other problems that lead to risks, like race dynamics.

    But also, as we’ve discussed, even if we successfully manage to make AI do what we want (i.e. we ‘align’ it), we might still end up choosing something bad for it to do! So we need to worry about the incentives not just of the AI systems, but of the human actors using them.

    Read more about AI governance research and implementation below.

    6. This work is neglected

    We estimate there are around 400 people around the world working directly on reducing the chances of an AI-related existential catastrophe (with a 90% confidence interval ranging between 200 and 1,000). Of these, about three quarters are working on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.44 We think there are around 800 people working in complementary roles, but we’re highly uncertain about this estimate.45

    In The Precipice, Ord estimated that there was between $10 million and $50 million spent on reducing AI risk in 2020.

    That might sound like a lot of money, but we’re spending something like 1,000 times that amount11 on speeding up the development of transformative AI via commercial capabilities research and engineering at large AI labs.

    To compare the $50 million spent on AI safety in 2020 with spending on other well-known risks: the world currently spends several hundred billion dollars per year on tackling climate change.

    Because this field is so neglected and has such high stakes, we think your impact working on risks from AI could be much higher than working on many other areas — which is why our top two recommended career paths for making a big positive difference in the world are technical AI safety and AI policy research and implementation.

    What do we think are the best arguments against this problem being pressing?

    As we said above, we’re not totally sure the arguments we’ve presented for AI representing an existential threat are right. Though we do still think that the chance of catastrophe from AI is high enough to warrant many more people pursuing careers to try to prevent such an outcome, we also want to be honest about the arguments against doing so, so you can more easily make your own call on the question.

    Here we’ll cover the strongest reasons (in our opinion) to think this problem isn’t particularly pressing. In the next section we’ll cover some common objections that (in our opinion) hold up less well, and explain why.

    The longer we have before transformative AI is developed, the less pressing it is to work now on ways to ensure that it goes well. This is because the work of others in the future could be much better or more relevant than the work we are able to do now.

    Also, if it takes us a long time to create transformative AI, we have more time to figure out how to make it safe. The risk seems much higher if AI developers will create transformative AI in the next few decades.

    It seems plausible that the first transformative AI won’t be based on current deep learning methods. (AI Impacts have documented arguments that current methods won’t be able to produce AI that has human-level intelligence.) This could mean that some of our current research might not end up being useful (and also — depending on what method ends up being used — could make the arguments for risk less worrying).

    Relatedly, we might expect progress in AI to occur in bursts. The field has previously seen AI winters: periods of significantly reduced investment, interest, and research in AI. It's unclear how likely it is that we'll see another AI winter, but the possibility should lengthen our estimates of how long it'll be before transformative AI is developed. Cotra writes about the possibility of an AI winter in part four of her report forecasting transformative AI. New constraints on the rate of growth of AI capabilities, like the availability of training data, could also mean that there's more time to work on this (Cotra discusses this here).

    Thirdly, the estimates about when we'll get transformative AI from Cotra, Karnofsky and Davidson that we looked at earlier were produced by people who already expected that working on preventing an AI-related catastrophe might be one of the world's most pressing problems. As a result, there's selection bias here: people who think transformative AI is coming relatively soon are also the people incentivised to carry out detailed investigations. (That said, if the investigations themselves seem strong, this effect could be pretty small.)

    Finally, none of the estimates we discussed earlier were trying to predict when an existential catastrophe might occur. Instead, they were looking at when AI systems might be able to automate all tasks humans can do, or when AI systems might significantly transform the economy. It’s by no means certain that the kinds of AI systems that could transform the economy would be the same advanced planning systems that are core to the argument that AI systems might seek power. Advanced planning systems do seem to be particularly useful, so there is at least some reason to think these might be the sorts of systems that end up being built. But even if the forecasted transformative AI systems are advanced planning systems, it’s unclear how capable such systems would need to be to pose a threat — it’s more than plausible that systems would need to be far more capable to pose a substantial existential threat than they would need to be to transform the economy. This would mean that all the estimates we considered above would be underestimates of how long we have to work on this problem.

    All that said, it might be extremely difficult to find technical solutions to prevent power-seeking behaviour — and if that’s the case, focusing on finding those solutions now does seem extremely valuable.

    Overall, we think that transformative AI is sufficiently likely in the next 10–80 years that it is well worth it (in expected value terms) to work on this issue now. Perhaps future generations will take care of it, and all the work we’d do now will be in vain — we hope so! But it might not be prudent to take that risk.

    If the best AI we have improves gradually over time (rather than AI capabilities remaining fairly low for a while and then suddenly increasing), we’re likely to end up with ‘warning shots’: we’ll notice forms of misaligned behaviour in fairly weak systems, and be able to correct for it before it’s too late.

    In such a gradual scenario, we’ll have a better idea about what form powerful AI might take (e.g. whether it will be built using current deep learning techniques, or something else entirely), which could significantly help with safety research. There will also be more focus on this issue by society as a whole, as the risks of AI become clearer.

    So if gradual development of AI seems more likely, the risk seems lower.

    But it’s very much not certain that AI development will be gradual, or if it is, gradual enough for the risk to be noticeably lower. And even if AI development is gradual, there could still be significant benefits to having plans and technical solutions in place well in advance. So overall we still think it’s extremely valuable to attempt to reduce the risk now.

    If you want to learn more, you can read AI Impacts’ work on arguments for and against discontinuous (i.e. non-gradual) progress in AI development, and Toby Ord and Owen Cotton-Barratt on strategic implications of slower AI development.

    Making something have goals aligned with human designers’ ultimate objectives and making something useful seem like very related problems. If so, perhaps the need to make AI useful will drive us to produce only aligned AI — in which case the alignment problem is likely to be solved by default.

    Ben Garfinkel gave a few examples of this on our podcast:

    • You can think of a thermostat as a very simple AI that attempts to keep a room at a certain temperature. The thermostat has a metal strip in it that expands as the room heats, and cuts off the current once a certain temperature has been reached. This piece of metal makes the thermostat act like it has a goal of keeping the room at a certain temperature, but also makes it capable of achieving this goal (and therefore of being actually useful).
    • Imagine you're building a cleaning robot with reinforcement learning techniques — that is, you provide some specific condition under which you give the robot positive feedback. You might say something like, "The less dust in the house, the more positive the feedback." But if you do this, the robot will end up doing things you don't want — like ripping apart a cushion to find dust on the inside (see the toy sketch just after this list). Probably instead you need to use techniques like those being developed by people working on AI safety (things like watching a human clean a house and letting the AI figure things out from there). So people building AIs will be naturally incentivised to also try to make them aligned (and so in some sense safe), so they can do their jobs.
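
    To illustrate the second example, here's a toy sketch (hypothetical numbers and a hypothetical reward function, not code from any real system) of why a reward defined only as "less dust means more reward" ends up preferring the destructive strategy:

    ```python
    # Toy illustration of the misspecified reward in the cleaning robot example.
    # All behaviours and numbers are made up for the sake of the example.

    def naive_reward(dust_removed: float, damage_caused: float) -> float:
        # The designer only said "the less dust, the better", so damage
        # doesn't appear in the reward at all.
        return dust_removed

    outcomes = {
        "vacuum normally": {"dust_removed": 10.0, "damage_caused": 0.0},
        "rip open the cushion": {"dust_removed": 12.0, "damage_caused": 50.0},
    }

    for name, o in outcomes.items():
        print(name, naive_reward(o["dust_removed"], o["damage_caused"]))

    # The naive reward scores "rip open the cushion" higher, so an agent trained
    # to maximise it would learn exactly the behaviour the designer didn't want.
    ```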

    If we need to solve the problem of alignment anyway to make useful AI systems, this significantly reduces the chances we will have misaligned but still superficially useful AI systems. So the incentive to deploy a misaligned AI would be a lot lower, reducing the risk to society.

    That said, there are still reasons to be concerned. For example, it seems like we could still be susceptible to problems of AI deception.

    And, as we’ve argued, AI alignment is only part of the overall issue. Solving the alignment problem isn’t the same thing as completely eliminating existential risk from AI, since aligned AI could also be used to bad ends — such as by authoritarian governments.

    As with many research projects in their early stages, we don't know how hard it will be to solve the alignment problem, or the other AI problems that pose risks. Someone could believe there are major risks from machine intelligence, but be pessimistic about what additional research or policy work will accomplish, and so decide not to focus on it.

    This is definitely a reason to potentially work on another issue — the solvability of an issue is a key part of how we try to compare global problems. For example, we’re also very concerned about risks from pandemics, and it may be much easier to solve that issue.

    That said, we think that given the stakes, it could make sense for many people to work on reducing AI risk, even if you think the chance of success is low. You’d have to think that it was extremely difficult to reduce risks from AI in order to conclude that it’s better just to let the risks materialise and the chance of catastrophe play out.

    At least in our own case at 80,000 Hours, we want to keep trying to help with AI safety — for example, by writing profiles like this one — even if the chance of success seems low (though in fact we’re overall pretty optimistic).

    There are some reasons to think that the core argument that any advanced, strategically aware planning system will by default seek power (which we gave here) isn’t totally right.46

    1. For a start, the argument that advanced AI systems will seek power relies on the idea that systems will produce plans to achieve goals. We’re not quite sure what this means — and as a result, we’re not sure what properties are really required for power-seeking behaviour to occur, and unsure whether the things we’ll build will have those properties.

      We’d love to see a more in-depth analysis of what aspects of planning are economically incentivised, and whether those aspects seem like they’ll be enough for the argument for power-seeking behaviour to work.

      Grace has written more about the ambiguity around "how much goal-directedness is needed to bring about disaster".

    2. It’s possible that only a few goals that AI systems could have would lead to misaligned power-seeking.

      Richard Ngo, in his analysis of what people mean by “goals”, points out that you’ll only get power-seeking behaviour if you have goals that mean the system can actually benefit from seeking power. Ngo suggests that these goals need to be “large-scale.” (Some have argued that, by default, we should expect AI systems to have “short-term” goals that won’t lead to power-seeking behaviour.)

      But whether an AI system will plan to take power also depends on how easy taking power would be: the easier it is for a system to take power, the more likely power-seeking plans are to succeed, and so the more likely a good planning system is to choose them. This suggests it will become easier to accidentally create a power-seeking AI system as systems' capabilities increase.

      So there still seems to be cause for increased concern, because the capabilities of AI systems do seem to be increasing fast. There are two considerations here: if few goals really lead to power-seeking, even for quite capable AI systems, that significantly reduces the risk and thus the importance of the problem. But it might also increase the solvability of the problem by demonstrating that solutions could be easy to find (e.g. the solution of never giving systems “large-scale” goals) — making this issue more valuable for people to work on.

    3. Earlier we argued that we can expect AI systems to do things that seem generally instrumentally useful to their overall goal, and that as a result it could be hard to prevent AI systems from doing these instrumentally useful things.

      But we can find examples where how generally instrumentally useful things would be doesn’t seem to affect how hard it is to prevent these things. Consider an autonomous car that can move around only if its engine is on. For many possible goals (other than, say, turning the car radio on), it seems like it would be useful for the car to be able to move around, so we should expect the car to turn its engine on. But despite that, we might still be able to train the car to keep its engine off: for example, we can give it some negative feedback whenever it turns the engine on, even if we also had given the car some other goals. Now imagine we improve the car so that its top speed is higher — this massively increases the number of possible action sequences that involve, as a first step, turning its engine on. In some sense, this seems to increase the instrumental usefulness of turning the engine on — there are more possible actions the car can take, once its engine is on, because the range of possible speeds it can travel at is higher. (It’s not clear if this sense of “instrumental usefulness” is the same as the one in the argument for the risk, although it does seem somewhat related.) But it doesn’t seem like this increase in the instrumental usefulness of turning on the engine makes it much harder to stop the car turning it on. Simple examples like this cast some doubt on the idea that, just because a particular action is instrumentally useful, we won’t be able to find ways to prevent it. (For more on this example, see page 25 of Garfinkel’s review of Carlsmith’s report.)

    4. Humans are clearly highly intelligent, but it’s unclear they are perfect goal-optimisers. For example, humans often face some kind of existential angst over what their true goals are. And even if we accept humans as an example of a strategically aware agent capable of planning, humans certainly aren’t always power-seeking. We obviously care about having basics like food and shelter, and many people go to great lengths for more money, status, education, or even formal power. But some humans choose not to pursue these goals, and pursuing them doesn’t seem to correlate with intelligence.

      However, this doesn’t mean that the argument that there will be an incentive to seek power is wrong. Most people do face and act on incentives to gain forms of influence via wealth, status, promotions, and so on. And we can explain the observation that humans don’t usually seek huge amounts of power by observing that we aren’t usually in circumstances that make the effort worth it.

      For example, most people don’t try to start billion-dollar companies — you probably won’t succeed, and it’ll cost you a lot of time and effort.

      But you’d still walk across the street to pick up a billion-dollar cheque.

    The absence of extreme power-seeking in many humans, along with uncertainties in what it really means to plan to achieve goals, suggests that the argument we gave above (that advanced AI systems will seek power by default) might not be completely correct. These considerations also suggest that, if there really is a problem to solve here, alignment research into preventing power-seeking in AIs could in principle succeed.

    This is good news! But for the moment — short of hoping we’re wrong about the existence of the problem — we don’t actually know how to prevent this power-seeking behaviour.

    Arguments against working on AI risk to which we think there are strong responses

    We’ve just discussed the major objections to working on AI risk that we think are most persuasive. In this section, we’ll look at objections that we think are less persuasive, and give some reasons why.

    People have been saying since the 1950s that artificial intelligence smarter than humans is just around the corner.

    But it hasn’t happened yet.

    One reason for this could be that it’ll never happen. Some have argued that producing artificial general intelligence is fundamentally impossible. Others think it’s possible, but unlikely to actually happen, especially not with current deep learning methods.

    Overall, we think the existence of human intelligence shows it's possible in principle to create artificial intelligence. And the speed of recent advances isn't something we think would have been predicted by those who believe we'll never develop powerful, general AI.

    But most importantly, the idea that you need fully general intelligent AI systems for there to be a substantial existential risk is a common misconception.

    The argument we gave earlier relied on AI systems being as good or better than humans in a subset of areas: planning, strategic awareness, and areas related to seeking and keeping power. So as long as you think all these things are possible, the risk remains.

    And even if no single AI has all of these properties, there are still ways in which we might end up with systems of ‘narrow’ AI systems that, together, can disempower humanity. For example, we might have a planning AI that develops plans for a company, a separate AI system that measures things about the company, another AI system that attempts to evaluate plans from the first AI by predicting how much profit each will make, and further AI systems that carry out those plans (for example, by automating the building and operation of factories). Considered together, this system as a whole has the capability to form and carry out plans to achieve some goal, and potentially also has advanced capabilities in areas that help it seek power.
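
    As a purely illustrative sketch (every component here is a hypothetical stand-in, not a real system), the setup described above might be wired together like this, with each narrow system doing one job while the loop as a whole forms and carries out plans:

    ```python
    # Hypothetical stand-ins for the separate narrow systems described above.

    def propose_plans(company_state):
        # Planning AI: suggests candidate plans for the company.
        return ["expand factory", "cut prices", "automate logistics"]

    def measure_company(company_state):
        # Measurement AI: reports metrics about the company.
        return {"revenue": 1.0, "costs": 0.8}

    def predicted_profit(plan, metrics):
        # Evaluation AI: scores each plan (a fixed table stands in for a learned model).
        scores = {"expand factory": 0.3, "cut prices": 0.1, "automate logistics": 0.5}
        return scores[plan]

    def execute(plan):
        # Execution AIs: carry the chosen plan out in the world.
        print(f"Carrying out: {plan}")

    state = {}
    metrics = measure_company(state)
    best_plan = max(propose_plans(state), key=lambda p: predicted_profit(p, metrics))
    execute(best_plan)

    # No single component is generally intelligent, but the loop as a whole
    # selects and carries out plans in pursuit of a goal (here, predicted profit).
    ```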

    It does seem like it will be easier to prevent these ‘narrow’ AI systems from seeking power. That could be because the skills the AIs have, even when combined, don't add up to being able to plan to achieve goals, or because the narrowness reduces the risk of systems developing power-seeking plans (e.g. if you build systems that can only produce very short-term plans). Such a setup also gives humans another point at which to intervene if necessary: the coordination between the different systems.

    Nevertheless, the risk remains, even from systems of many interacting AIs.

    Why can't we just unplug a dangerous AI if it starts misbehaving? It might just be really, really hard.

    Stopping people and computers from running software is already incredibly difficult.

    Think about how hard it would be to shut down Google's web services. Google's data centres have millions of servers across 34 different locations, many of which are running the same sets of code. And these data centres are absolutely crucial to Google's bottom line, so even though Google could decide to shut down their entire business, they probably wouldn't.

    Or think about how hard it is to get rid of computer viruses that autonomously spread between computers across the world.

    Ultimately, we think any dangerous power-seeking AI system will be looking for ways to not be turned off, which makes it more likely we’ll be in one of these situations, rather than in a case where we can just unplug a single machine.

    That said, we absolutely should try to shape the future of AI such that we can ‘unplug’ powerful AI systems.

    There may be ways we can develop systems that let us turn them off. But for the moment, we’re not sure how to do that.

    Ensuring that we can turn off potentially dangerous AI systems could be a safety measure developed by technical AI safety research, or it could be the result of careful AI governance, such as planning coordinated efforts to stop autonomous software once it’s running.

    We could (and should!) definitely try.

    If we could successfully ‘sandbox’ an advanced AI — that is, contain it to a training environment with no access to the real world until we were very confident it wouldn’t do harm — that would help our efforts to mitigate AI risks tremendously.

    But there are a few things that might make this difficult.

    For a start, it might take only one failure (one person removing the sandbox, or one security vulnerability in the sandbox we hadn't noticed) for the AI system to begin affecting the real world.

    Moreover, this solution doesn’t scale with the capabilities of the AI system. This is because:

    • More capable systems are more likely to be able to find vulnerabilities or other ways of leaving the sandbox (e.g. threatening or coercing humans).
    • Systems that are good at planning might attempt to deceive us into deploying them.

    So the more dangerous the AI system, the less likely sandboxing is to be possible. That’s the opposite of what we’d want from a good solution to the risk.

    Wouldn't a truly intelligent AI also be moral? For some definitions of "truly intelligent" (for example, if true intelligence includes a deep understanding of morality and a desire to be moral), this would probably be the case.

    But if that’s your definition of truly intelligent, then it’s not truly intelligent systems that pose a risk. As we argued earlier, it’s advanced systems that can plan and have strategic awareness that pose risks to humanity.

    With sufficiently advanced strategic awareness, an AI system’s excellent understanding of the world may well encompass an excellent understanding of people’s moral beliefs. But that’s not a strong reason to think that such a system would act morally.

    For example, when we learn about other cultures or moral systems, that doesn’t necessarily create a desire to follow their morality. A scholar of the Antebellum South might have a very good understanding of how 19th century slave owners justified themselves as moral, but would be very unlikely to defend slavery.

    AI systems with excellent understandings of human morality could be even more dangerous than AIs without such understanding: the AI system could act morally at first as a way to deceive us into thinking that it is safe.

    There are definitely dangers from current artificial intelligence.

    For example, data used to train neural networks often contains hidden biases. This means that AI systems can learn these biases — and this can lead to racist and sexist behaviour.

    There are other dangers too. Our earlier discussion on nuclear war explains a threat which doesn’t require AI systems to have particularly advanced capabilities.

    But we don’t think the fact that there are also risks from current systems is a reason not to prioritise reducing existential threats from AI, if they are sufficiently severe.

    As we’ve discussed, future systems — not necessarily superintelligence or totally general intelligence, but systems advanced in their planning and power-seeking capabilities — seem like they could pose threats to the existence of the entirety of humanity. And it also seems somewhat likely that we’ll produce such systems this century.

    What’s more, lots of technical AI safety research is also relevant to solving problems with existing AI systems. For example, some research focuses on ensuring that ML models do what we want them to, and will still do this as their size and capabilities increase; other research tries to work out how and why existing models are making the decisions and taking the actions that they do.

    As a result, at least in the case of technical research, the choice between working on current threats and future risks may look more like a choice between only ensuring that current models are safe, or instead finding ways to ensure that current models are safe that will also continue to work as AI systems become more complex and more intelligent.

    Ultimately, we have limited time in our careers, so choosing which problem to work on could be a huge way of increasing your impact. When there are such substantial threats, it seems reasonable for many people to focus on addressing these worst-case possibilities.

    Can't AI do a lot of good as well? Yes, it can.

    AI systems are already improving healthcare, putting driverless cars on the roads, and automating household chores.

    And if we’re able to automate advancements in science and technology, we could see truly incredible economic and scientific progress. AI could likely help solve many of the world’s most pressing problems.

    But, just because something can do a lot of good, that doesn’t mean it can’t also do a lot of harm. AI is an example of a dual-use technology — a technology that can be used for both dangerous and beneficial purposes. For example, researchers were able to get an AI model that was trained to develop medical drugs to instead generate designs for bioweapons.

    We are excited and hopeful about seeing large benefits from AI. But we also want to work hard to minimise the enormous risks advanced AI systems pose.

    It’s undoubtedly true that some people are drawn to thinking about AI safety because they like computers and science fiction — as with any other issue, there are people working on it not because they think it’s important, but because they think it’s cool.

    But, for many people, working on AI safety comes with huge reluctance.

    For me, and many of us at 80,000 Hours, spending our limited time and resources working on any cause that affects the long-run future — and therefore not spending that time on the terrible problems in the world today — is an incredibly emotionally difficult thing to do.

    But we’ve gradually investigated these arguments (in the course of trying to figure out how we can do the most good), and over time both gained more expertise about AI and became more concerned about the risk.

    We think scepticism is healthy, and are far from certain that these arguments completely work. So while this suspicion is definitely a reason to dig a little deeper, we hope that, ultimately, this worry won’t be treated as a reason to deprioritise what may well be the most important problem of our time.

    That something sounds like science fiction isn’t a reason in itself to dismiss it outright. There are loads of examples of things first mentioned in sci-fi that then went on to actually happen (this list of inventions in science fiction contains plenty of examples).

    There are even a few such cases involving technology that are real existential threats today:

    • In his 1914 novel The World Set Free, H. G. Wells predicted atomic energy fuelling powerful explosives — 20 years before we realised there could in theory be nuclear fission chain reactions, and 30 years before nuclear weapons were actually produced. In the 1920s and 1930s, Nobel Prize–winning physicists Millikan, Rutherford, and Einstein all predicted that we would never be able to use nuclear power. Nuclear weapons were literal science fiction before they were reality.
    • In the 1964 film Dr. Strangelove, the USSR builds a doomsday machine that would automatically trigger an extinction-level nuclear event in response to a nuclear strike, but keeps it secret. Dr Strangelove points out that keeping it secret rather reduces its deterrence effect. But we now know that in the 1980s the USSR built an extremely similar system… and kept it secret.

    Moreover, there are top academics and researchers working on preventing these risks from AI — at MIT, Cambridge, Oxford, UC Berkeley, and elsewhere. Two of the world’s top AI labs (DeepMind and OpenAI) have teams explicitly dedicated to working on technical AI safety. Researchers from these places helped us with this article.

    It’s totally possible all these people are wrong to be worried, but the fact that so many people take this threat seriously undermines the idea that this is merely science fiction.

    It’s reasonable when you hear something that sounds like science fiction to want to investigate it thoroughly before acting on it. But having investigated it, if the arguments seem solid, then simply sounding like science fiction is not a reason to dismiss them.

    We never know for sure what’s going to happen in the future. So, unfortunately for us, if we’re trying to have a positive impact on the world, that means we’re always having to deal with at least some degree of uncertainty.

    We also think there’s an important distinction between guaranteeing that you’ve achieved some amount of good and doing the very best you can. To achieve the former, you can’t take any risks at all — and that could mean missing out on the best opportunities to do good.

    When you’re dealing with uncertainty, it makes sense to roughly think about the expected value of your actions: the sum of all the good and bad potential consequences of your actions, weighted by their probability.

    Given that the stakes are so high, and that the risks from AI aren't that low, the expected value of helping with this problem is high.

    We’re sympathetic to the concern that if you work on AI safety, you might end up doing not much at all when you might have done a tremendous amount of good working on something else — simply because the problem and our current ideas about what to do about it are so uncertain.

    But we think the world will be better off if we decide that some of us should work on solving this problem, so that together we have the best chance of successfully navigating the transition to a world with advanced AI rather than risking an existential crisis.

    And it seems like an immensely valuable thing to try.

    Pascal’s mugging is a thought experiment — a riff on the famous Pascal’s wager — where someone making decisions using expected value calculations can be exploited by claims that they can get something extraordinarily good (or avoid something extraordinarily bad), with an extremely low probability of succeeding.

    The story goes like this: a random mugger stops you on the street and says, "Give me your wallet or I'll cast a spell of torture on you and everyone who has ever lived." You can't be 100% sure he won't follow through; after all, nothing's 100% for sure. And torturing everyone who's ever lived is so bad that surely even avoiding a tiny, tiny probability of that is worth the $40 in your wallet? But intuitively, it seems like you shouldn't give your wallet to someone just because they threaten you with something completely implausible.

    Analogously, you could worry that working on AI safety means giving your valuable time to avoid a tiny, tiny chance of catastrophe. Working on reducing risks from AI isn’t free — the opportunity cost is quite substantial, as it means you forgo working on other extremely important things, like reducing risks from pandemics or ending factory farming.

    Here’s the thing though: while there’s lots of value at stake — perhaps the lives of everybody alive today, and the entirety of the future of humanity — it’s not the case that the probability that you can make a difference by working on reducing risks from AI is small enough for this argument to apply.

    We wish the chance of an AI catastrophe was that vanishingly small.

    Instead, we think the probability of such a catastrophe (I think, around 1% this century) is much, much larger than things that people try to prevent all the time — such as fatal plane crashes, which happen in 0.00002% of flights.

    What really matters, though, is the extent to which your work can reduce the chance of a catastrophe.

    Let’s look at working on reducing risks from AI. For example, if:

    1. There’s a 1% chance of an AI-related existential catastrophe by 2100
    2. There’s a 30% chance that we can find a way to prevent this by technical research
    3. Five people working on technical AI safety raises the chances of solving the problem by 1% of that 30% (so 0.3 percentage points)

    Then the five people together reduce the chance of catastrophe by about 0.003 percentage points, and each person involved has roughly a 0.0006 percentage point share in preventing it.
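
    Written out as a quick calculation (a sketch using the illustrative numbers above, which are assumptions for the sake of the example):

    ```python
    p_catastrophe = 0.01        # 1% chance of an AI-related existential catastrophe by 2100
    p_preventable = 0.30        # 30% chance technical research can prevent it
    group_boost = 0.01 * p_preventable  # five people raise the chance of a solution by 0.3 percentage points
    n_people = 5

    group_risk_reduction = p_catastrophe * group_boost   # 0.00003, i.e. 0.003 percentage points
    per_person = group_risk_reduction / n_people         # 0.000006, i.e. 0.0006 percentage points

    print(f"Per person: {per_person:.6f} (about {per_person:.4%})")
    ```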

    Other ways of acting altruistically involve similarly sized probabilities.

    The chance of a volunteer campaigner swinging a US presidential election is somewhere between 0.001% and 0.00001%. But you can still justify working on a campaign because of the large impact you expect you'd have on the world if your preferred candidate won.

    You have even lower chances of wild success from things like trying to reform political institutions, or working on some very fundamental science research to build knowledge that might one day help cure cancer.

    Overall, as a society, we may be able to reduce the chance of an AI-related catastrophe all the way down from 10% (or higher) to close to zero — that’d be clearly worth it for a group of people, so it has to be worth it for the individuals, too.

    We wouldn’t want to just not do fundamental science because each researcher has a low chance of making the next big discovery, or not do any peacekeeping because any one person has a low chance of preventing World War III. As a society, we need some people working on these big issues — and maybe you can be one of them.

    What you can do concretely to help

    As we mentioned above, we know of two main ways to help reduce existential risks from AI:

    1. Technical AI safety research
    2. AI strategy/policy research and implementation

    The biggest way you could help would be to pursue a career in either one of these areas, or in a supporting area.

    The first step is learning a lot more about the technologies, problems, and possible solutions. We’ve collated some lists of our favourite resources here, and our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

    If you decide to pursue a career in this area, we’d generally recommend working at an organisation focused on specifically addressing this problem (though there are other ways to help besides working at existing organisations, as we discuss briefly below).

    Technical AI safety

    Approaches

    There are lots of different approaches to technical AI safety; see Neel Nanda's overview of the AI alignment landscape for more details.

    Key organisations

    AI labs in industry that have empirical technical safety teams, or are focused entirely on safety:

    • Anthropic is an AI safety company working on building interpretable and safe AI systems. They focus on empirical AI safety research. Anthropic cofounders Daniela and Dario Amodei gave an interview about the lab on the Future of Life Institute podcast. On our podcast, we spoke to Chris Olah, who leads Anthropic’s research into interpretability, and Nova DasSarma, who works on systems infrastructure at Anthropic.
    • ARC Evals works on assessing whether cutting-edge AI systems could pose catastrophic risks to civilization, including early-stage, experimental work to develop techniques, and evaluating systems produced by Anthropic and OpenAI.
    • The Center for AI Safety is a nonprofit that does technical research and promotion of safety in the wider machine learning community.
    • FAR AI is a research nonprofit that incubates and accelerates research agendas that are too resource-intensive for academia but not yet ready for commercialisation by industry, including research in adversarial robustness, interpretability and preference learning.
    • Google DeepMind is probably the largest and most well-known research group developing general artificial machine intelligence, and is famous for its work creating AlphaGo, AlphaZero, and AlphaFold. It is not principally focused on safety, but has two teams focused on AI safety, with the Scalable Alignment Team focusing on aligning existing state-of-the-art systems, and the Alignment Team focused on research bets for aligning future systems.
    • OpenAI, founded in 2015, is a lab that is trying to build artificial general intelligence that is safe and benefits all of humanity. OpenAI is well known for its language models like GPT-4. Like DeepMind, it is not principally focused on safety, but has a safety team and a governance team. Jan Leike (head of the alignment team) has some blog posts on how he thinks about AI alignment.
    • Ought is a machine learning lab building Elicit, an AI research assistant. Their aim is to align open-ended reasoning by learning human reasoning steps, and to direct AI progress towards helping with evaluating evidence and arguments.
    • Redwood Research is an AI safety research organisation, whose first big project attempted to make sure language models (like GPT-3) produce output following certain rules with very high probability, in order to address failure modes too rare to show up in standard training.

    Theoretical / conceptual AI safety labs:

    • The Alignment Research Center (ARC) is attempting to produce alignment strategies that could be adopted in industry today while also being able to scale to future systems. They focus on conceptual work, developing strategies that could work for alignment and which may be promising directions for empirical work, rather than doing empirical AI work themselves. Their first project was releasing a report on Eliciting Latent Knowledge, the problem of getting advanced AI systems to honestly tell you what they believe (or ‘believe’) about the world. On our podcast, we interviewed ARC founder Paul Christiano about his research (before he founded ARC).
    • The Center on Long-Term Risk works to address worst-case risks from advanced AI. They focus on conflict between AI systems.
    • The Machine Intelligence Research Institute was one of the first groups to become concerned about the risks from machine intelligence in the early 2000s, and its team has published a number of papers on safety issues and how to resolve them.
    • Some teams in commercial labs also do some more theoretical and conceptual work on alignment, such as Anthropic’s work on conditioning predictive models and the Causal Incentives Working Group at Google DeepMind.

    There is also AI safety research happening in academia. While the number of academics explicitly and publicly focused on AI safety is small, it's possible to do relevant work at a much wider set of places.

    If you’re interested in learning more about technical AI safety as an area — e.g. the different techniques, schools of thought, and threat models — our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

    We discuss this path in more detail here:

    Career review of technical AI safety research

    Alternatively, if you’re looking for something more concrete and step-by-step (with very little in the way of introduction), check out this detailed guide to pursuing a career in AI alignment.

    It’s important to note that you don’t have to be an academic or an expert in AI or AI safety to contribute to AI safety research. For example, software engineers are needed at many places conducting technical safety research, and we also highlight more roles below.

    AI governance and strategy

    Approaches

    Quite apart from the technical problems, we face a host of governance issues, which include:

    • Coordination problems that are increasing the risks from AI (e.g. there could be incentives to use AI for personal gain in ways that can cause harm, or race dynamics that reduce incentives for careful and safe AI development).
    • Risks from accidents or misuse of AI that would be dangerous even if we are able to prevent power-seeking behaviour (as discussed above).
    • A lack of clarity on how and when exactly risks from AI (particularly power-seeking AI) might play out.
    • A lack of clarity on which intermediate goals we could pursue that, if achieved, would reduce existential risk from AI.

    To tackle these, we need a combination of research and policy.48

    We are in the early stages of figuring out the shape of this problem and the most effective ways to tackle it. So it’s crucial that we do more research. This includes forecasting research into what we should expect to happen, and strategy and policy research into the best ways of acting to reduce the risks.

    But also, as AI begins to impact our society more and more, it’ll be crucial that governments and corporations have the best policies in place to shape its development. For example, governments might be able to enforce agreements not to cut corners on safety, further the work of researchers who are less likely to cause harm, or cause the benefits of AI to be distributed more evenly. So there eventually might be a key role to be played in advocacy and lobbying for appropriate AI policy — though we’re not yet at the point of knowing what policies would be useful to implement.

    Key organisations

    A number of organisations work specifically on AI strategy and policy.

    If you’re interested in learning more about AI governance, our top recommendation is to take a look at the governance curriculum from AGI safety fundamentals.

    We discuss this path in more detail here:

    Career review of AI strategy and policy careers

    Also note: it could be particularly important for people with the right personal fit to work on AI strategy and governance in China.

    Complementary (yet crucial) roles

    Even in a research organisation, around half of the staff will be doing other tasks essential for the organisation to perform at its best and have an impact. Having high-performing people in these roles is crucial.

    We think the importance of these roles is often underrated because the work is less visible. So we've written several career reviews on these areas to help more people enter these careers and succeed.

    Other ways to help

    AI safety is a big problem and it needs help from people doing a lot of different kinds of work.

    One major way to help is to work in a role that directs funding or people towards AI risk, rather than working on the problem directly. We've reviewed a few career paths along these lines.

    There are ways all of these could go wrong, so the first step is to become well-informed about the issue.

    There are also other technical roles besides safety research that could help contribute, like:

    • Working in information security to protect AI (or the results of key experiments) from misuse, theft, or tampering.
    • Becoming an expert in AI hardware as a way of steering AI progress in safer directions.

    You can read about all these careers — why we think they’re helpful, how to enter them, and how you can predict whether they’re a good fit for you — on our career reviews page.

    Want one-on-one advice on pursuing this path?

    We think that the risks posed by the development of AI may be the most pressing problem the world currently faces. If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.

    We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.

    APPLY TO SPEAK WITH OUR TEAM

    Find vacancies on our job board

    Our job board features opportunities in AI technical safety and governance:

      View all opportunities

      Top resources to learn more

      We’ve hit you with a lot of further reading throughout this article — here are a few of our favourites:

      On The 80,000 Hours Podcast, we have a number of in-depth interviews with people actively working to positively shape the development of artificial intelligence.

      If you want to go into much more depth, the AGI safety fundamentals course is a good starting point. There are two tracks to choose from: technical alignment or AI governance. If you have a more technical background, you could try Intro to ML Safety, a course from the Center for AI Safety.

      And finally, here are a few general sources (rather than specific articles) that you might want to explore:

      • The AI Alignment Forum, which is aimed at researchers working in technical AI safety.
      • AI Impacts, a project that aims to improve society’s understanding of the likely impacts of human-level artificial intelligence.
      • The Alignment Newsletter, a weekly publication with recent content relevant to AI alignment with thousands of subscribers.
      • Import AI, a weekly newsletter about artificial intelligence by Jack Clark (cofounder of Anthropic), read by more than 10,000 experts.
      • Jeff Ding’s ChinAI Newsletter, weekly translations of writings from Chinese thinkers on China’s AI landscape.

      Acknowledgements

      Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlmüller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing this article or their extremely thoughtful and helpful comments and conversations. (This isn’t to say that they would all agree with everything we’ve said here — in fact, we’ve had many spirited disagreements in the comments on this article!)

      The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.

      Robert Long on why large language models like GPT (probably) aren’t conscious https://80000hours.org/podcast/episodes/robert-long-artificial-sentience/ Tue, 14 Mar 2023 05:38:05 +0000 https://80000hours.org/?post_type=podcast&p=80986 The post Robert Long on why large language models like GPT (probably) aren’t conscious appeared first on 80,000 Hours.

      Athena Aktipis on why cancer is actually one of the fundamental phenomena in our universe https://80000hours.org/podcast/episodes/athena-aktipis-cancer-cooperation-apocalypse/ Wed, 25 Jan 2023 23:58:07 +0000 https://80000hours.org/?post_type=podcast&p=80494 The post Athena Aktipis on why cancer is actually one of the fundamental phenomena in our universe appeared first on 80,000 Hours.
