Benjamin Hilton (Author archive) - 80,000 Hours https://80000hours.org/author/benjamin-hilton/ Mon, 27 Nov 2023 16:57:45 +0000 en-US hourly 1 https://wordpress.org/?v=6.3.2 What you should know about our updated career guide https://80000hours.org/2023/09/what-you-should-know-about-our-updated-career-guide/ Tue, 19 Sep 2023 10:14:18 +0000 https://80000hours.org/?p=83759 The post What you should know about our updated career guide appeared first on 80,000 Hours.

]]>
The question this week: what are the biggest changes to our career guide since 2017?

  • Read the new and updated career guide here, by our founder Benjamin Todd and the 80,000 Hours team.

Our 2023 career guide isn’t just a fancy new design — here’s a rundown of how the content has been updated:


1. Career capital: get good at something useful

In our previous career guide, we argued that your primary focus should be on building very broadly applicable skills, credentials, and connections — what we called transferable career capital.

We also highlighted jobs like consulting as a way to get this.

However, since launching the 2017 version of the career guide, we came to think a focus on transferable career capital might lead you to neglect experience that can be very useful to enter the most impactful jobs — for example, experience working in an AI lab or studying synthetic biology.

OK, so how should you figure out the best career capital option for you?

Our new advice: get good at something useful.

In more depth — choose some valuable skills to learn, and that are a good fit for you, and then find opportunities that let you practise those skills. And then have concrete back-up plans and plan Bs in mind, rather than relying on general ‘transferability.’

This focus on skills is important because you’re much more likely to have an impact if you’re good at what you do — and research suggests it can take years of experience to reach your peak abilities. It also becomes much easier to build up other components of career capital — like gaining credentials or making connections — once you have something useful to offer.

We’ve supplemented this with an updated list of impactful role types to aim at long-term and common types of next steps that help to learn skills useful for those.

These steps still often involve learning skills that can be applied to many different global problems or sectors (since all else equal, more transferability is better), but we don’t emphasise transferability as much. We’re also less keen on consulting as a route into working on the most pressing problems (though it’s still best for some).

More:


2. How to plan your career

We have greatly expanded our content on how to plan your career.

Our chapter on career planning leads you through planning for both a longer-term vision and immediate next steps:

  • Your longer-term vision is useful for helping shape your plans, although it shouldn’t be more than a vague idea about where you’d like to end up (read more).
  • You can then work backwards from that vision to help come up with next steps — but you should also work forward from your current situation, looking at any opportunities immediately in front of you (read more).

And to help you develop your career plan, we also have a new career planning template, designed to be used alongside our career guide.

More:


3. Other changes and improvements

  • New types of impactful careers. We added sections on why government and policy and organisation-building careers could have a high impact.
  • A new chapter on which global problems are most pressing. The previous version of the printed book (although not the website) didn’t contain anything about which problems we think are most pressing and why. The new chapter tells the story of how our views have evolved, and why we focus on reducing existential risks today.
  • Avoiding doing harm with your career. In the past years, we’ve become more concerned about the risk of people potentially causing harm with their careers, despite attempts to do good. Our new career guide more carefully and explicitly warns against this, and provides advice on how to avoid causing harm. Relatedly, we suggest considering your character as part of your career capital, and so considering how any job you take will shape and form your virtues.
  • We greatly expanded the chapter on assessing personal fit and exploring your options.
  • We’ve fully updated the more empirical sections of the guide using more up-to-date papers and data.

By working together, in our lifetimes, we can prevent the next pandemic and mitigate the risks of AI, we can end extreme global poverty and factory farming — and we can do this while having interesting, fulfilling lives too.

Our hope is that this new guide will help you do exactly that.

This blog post was first released to our newsletter subscribers.

Join over 350,000 newsletter subscribers who get content like this in their inboxes every two weeks — and we’ll also mail you a free book!

Learn more:

The post What you should know about our updated career guide appeared first on 80,000 Hours.

]]>
Announcing the new 80,000 Hours career guide https://80000hours.org/2023/09/career-guide-launch/ Mon, 04 Sep 2023 15:35:36 +0000 https://80000hours.org/?p=83444 The post Announcing the new 80,000 Hours career guide appeared first on 80,000 Hours.

]]>
From 2016 to 2019, 80,000 Hours’ core content was contained in our persistently popular career guide. (You may also remember it as the 80,000 Hours book: 80,000 Hours — Find a fulfilling career that does good).

Today, we’re re-launching that guide. Among many other changes, in the new version:

You can read the guide here or start with a 2-minute summary.

It’s also available as a printed book (you can get a free copy by signing up for our newsletter or buy it on Amazon), audiobook, podcast series or ebook (available as a .pdf or .epub).

We’d appreciate you sharing the new guide with a friend! You can send them a free copy using this link. Many of the people who’ve found our advice most useful in the past have found us via a friend, so we think the time you take to share it could be really worthwhile.

What’s in the guide?

The career guide aims to cover the most important basic concepts in career planning. (If instead you’d like to see something more in-depth, see our advanced series and podcast.)

The first article is about what to look for in a fulfilling job:

The next five are about which options are most impactful for the world:

The next four cover how to find the best option for you and invest in your skills:

The last two cover how to take action and launch your dream career:

Why did we make this change?

In 2019, we deprioritised 80,000 Hours’ career guide in favour of our key ideas series.

Our key ideas series had a more serious tone and was more focused on impact. It represented our best and most up-to-date advice. We expected that this switch would reduce engagement time on our site, but that the key ideas series would better appeal to people more likely to change their careers to do good.

However, the drop in engagement time which we could attribute to this change was larger than we’d expected. In addition, data from our user survey suggested that people who changed their careers were more, not less, likely to have found and used the older, more informal career guide (which we kept up on our site).

As a result, we decided to bring the advice in our career guide in line with our latest views, while attempting to retain its structure, tone, and engagingness.

We’re retaining the content in our key ideas series: it’s been re-released as our advanced series.

Has it been successful so far?

Yes!

We’ve had positive feedback on the quality of the content in the guide, and we’ve also seen many more people reading this guide than our key ideas series. Since soft launching the guide in May, we’ve seen about a 30% increase in total weekly engagement time on our site.

How can you help?

Please take a look at the guide and, if possible, share it with a friend! You can send them a free copy using this link.

You can also give us some feedback on the guide using this form.

Here are the links to the guide again:

Thank you so much!

The post Announcing the new 80,000 Hours career guide appeared first on 80,000 Hours.

]]>
Should you work at a leading AI lab? https://80000hours.org/career-reviews/working-at-an-ai-lab/ Tue, 20 Jun 2023 11:27:49 +0000 https://80000hours.org/?post_type=career_profile&p=82309 The post Should you work at a leading AI lab? appeared first on 80,000 Hours.

]]>

In a nutshell: Working at a leading AI lab is an important career option to consider, but the impact of any given role is complex to assess. It comes with great potential for career growth, and many roles could be (or lead to) highly impactful ways of reducing the chances of an AI-related catastrophe — one of the world’s most pressing problems. However, there’s a risk of doing substantial harm in some cases. There are also roles you should probably avoid.

Pros

  • Many roles have a high potential for impact by reducing risks from AI
  • Among the best and most robust ways to gain AI-specific career capital
  • Possibility of shaping the lab’s approach to governance, security, and standards

Cons

  • Can be extremely competitive to enter
  • Risk of contributing to the development of harmful AI systems
  • Stress and frustration, especially because of a need to carefully and frequently assess whether your role is harmful

Key facts on fit

Excellent understanding of the risks posed by future AI systems, and for some roles, comfort with a lot of quick and morally ambiguous decision making. You’ll also need to be a good fit for the specific role you’re applying for, whether you’re in research, comms, policy, or something else (see our related career reviews).

Recommendation: it's complicated

We think there are people in our audience for whom this is their highest impact option — but some of these roles might also be very harmful for some people. This means it's important to take real care figuring out whether you're in a harmful role, and, if not, whether the role is a good fit for you.

Review status

Based on a medium-depth investigation

This review is informed by two surveys of people with expertise about this path — one on whether you should be open to roles that advance AI capabilities (written up here), and a second follow-up survey. We also performed an in-depth investigation into at least one of our key uncertainties concerning this path. Some of our views will be thoroughly researched, though it's likely there are still some gaps in our understanding, as many of these considerations remain highly debated.

Why might it be high-impact to work for a leading AI lab?

We think AI is likely to have transformative effects over the coming decades. We also think that reducing the chances of an AI-related catastrophe is one of the world’s most pressing problems.

So it’s natural to wonder — if you’re thinking about your career — whether it would be worth working in the labs that are doing the most to build, and shape, these future AI systems.

Working at a top AI lab, like Google DeepMind, OpenAI, or Anthropic, might be an excellent way to build career capital to work on reducing AI risk in the future. Their work is extremely relevant to solving this problem, which suggests you’ll likely gain directly useful skills, connections, and credentials (more on this later).

In fact, we suggest working at AI labs in many of our career reviews; it can be a great step in technical AI safety and AI governance and coordination careers. We’ve also looked at working in AI labs in our career reviews on information security, software engineering, data collection for AI alignment, and non-technical roles in AI labs.

What’s more, the importance of these organisations to the development of AI suggests that they could be huge forces for either good or bad (more below). If the former, they might be high-impact places to work. And if the latter, there’s still a chance that by working in a leading lab you may be able to reduce the risks.

All that said, we think it’s crucial to take an enormous amount of care before working at an organisation that might be a huge force for harm. Overall, it’s complicated to assess whether it’s good to work at a leading AI lab — and it’ll vary from person to person, and role to role. But we think this is an important option to consider for many people who want to use their careers to reduce the chances of an existential catastrophe (or other harmful outcomes) resulting from the development of AI.

What relevant considerations are there?

Labs could be a huge force for good — or harm

We think that a leading — but careful — AI project could be a huge force for good, and crucial to preventing an AI-related catastrophe. Such a project could, for example:

(Read more about what AI companies can do today to reduce risks).1

But a leading and uncareful — or just unlucky — AI project could be a huge danger to the world. It could, for example, generate hype and acceleration (which we’d guess is harmful), make it more likely (through hype, open-sourcing or other actions) that incautious players enter the field, normalise disregard for governance, standards and security, and ultimately it could even produce the very systems that cause a catastrophe.

So, in order to successfully be a force for good, a leading AI lab would need to balance continuing their development of powerful AI (and possibly even retaining a leadership position), whilst also appropriately prioritising doing things that reduce the risk overall.

This tightrope seems difficult to walk, with constant tradeoffs to make between success and caution. And it seems hard to assess from the outside which labs are doing this well. The top labs — as of 2023, OpenAI, Google DeepMind, and Anthropic — seem reasonably inclined towards safety, and it’s plausible that any or all of these could be successfully walking the tightrope, but we’re not really sure.

We don’t feel confident enough to give concrete recommendations on which of these labs people should or should not work for. We can only really recommend that you put work into forming your own views about whether a company is a force for good. But the fact that labs could be such a huge force for good is part of why we think it’s likely there are many roles at leading AI labs that are among the world’s most impactful positions.

It’s often excellent career capital

Top AI labs are high-performing, rapidly growing organisations. In general, one of the best ways to gain career capital is to go and work with any high-performing team — you can just learn a huge amount about getting stuff done. They also have excellent reputations more widely (AI is one of the world’s most sought-after fields right now, and the top labs are top for a reason). So you get the credential of saying you’ve worked in a leading lab, and you’ll also gain lots of dynamic, impressive connections. So even if we didn’t think the development of AI was a particularly pressing problem, they’d already seem good for career capital.

But you will also learn a huge amount about and make connections within AI in particular, and, in some roles, gain technical skills which could be much harder to learn elsewhere.

We think that, if you’re early in your career, this is probably the biggest effect of working for a leading AI lab, and the career capital is (generally) a more important consideration than the direct impact of the work. You’re probably not going to be having much impact at all, whether for good or for bad, when you’re just getting started.

However, your character is also shaped and built by the jobs you take, and matters a lot for your long-run impact, so is one of the components of career capital. Some experts we’ve spoken to warn against working at leading AI labs because you should always assume that you are psychologically affected by the environment you work in. That is, there’s a risk you change your mind without ever encountering an argument that you’d currently endorse (for example, you could end up thinking that it’s much less important to ensure that AI systems are safe, purely because that’s the view of people around you). Our impression is that leading labs are increasingly concerned about the risks, which makes this consideration less important — but we still think it should be taken into account in any decision you make. There are ways of mitigating this risk, which we’ll discuss later.

Of course, it’s important to compare working at an AI lab with other ways you might gain career capital. For example, to get into technical AI safety research, you may want to go do a PhD instead. Generally, the best option for career capital will depend on a number of factors, including the path you’re aiming for longer term and your personal fit for the options in front of you.

You might advance AI capabilities, which could be (really) harmful

We’d guess that, all else equal, we’d prefer that progress on AI capabilities was slower.

This is because it seems plausible that we could develop transformative AI fairly soon (potentially in the next few decades). This suggests that we could also build potentially dangerous AI systems fairly soon — and the sooner this occurs the less time society has to successfully mitigate the risks. As a broad rule of thumb, less time to mitigate risks seems likely to mean that the risks are higher overall.

But that’s not necessarily the case. There are reasons to think that advancing at least some kinds of AI capabilities could be beneficial. Here are a few:

  • This distinction between ‘capabilities’ research and ‘safety’ research is extremely fuzzy, and we have a somewhat poor track record of predicting which areas of research will be beneficial for safety work in the future. This suggests that work that advances some (and perhaps many) kinds of capabilities faster may be useful for reducing risks.
  • Moving faster could reduce the risk that AI projects that are less cautious than the existing ones can enter the field.
  • Lots of work that makes models more useful — and so could be classified as capabilities (for example, work to align existing large language models) — probably does so without increasing the risk of danger . This kind of work might allow us to use these models to reduce the risk overall, for example, through the kinds of defensive deployment discussed earlier.
  • It’s possible that the later we develop transformative AI, the faster (and therefore more dangerously) everything will play out, because other currently-constraining factors (like the amount of compute available in the world) could continue to grow independently of technical progress. Slowing down advances now could increase the rate of development in the future, when we’re much closer to being able to build transformative AI systems. This would give the world less time to conduct safety research with models that are very similar to ones we should be concerned about but which aren’t themselves dangerous. (When this is caused by a growth in the amount of compute, it’s often referred to as a hardware overhang.)

Overall, we think not all capabilities research is made equal — and that many roles advancing AI capabilities (especially more junior ones) will not be harmful, and could be beneficial. That said, our best guess is that the broad rule of thumb that there will be less time to mitigate the risks is more important than these other considerations — and as a result, broadly advancing AI capabilities should be regarded overall as probably harmful.

This raises an important question. In our article on whether it’s ever OK to take a harmful job to do more good, we ask whether it might be morally impermissible to do a job that causes serious harm, even if you think it’s a good idea on net.

It’s really unclear to us how jobs that advance AI capabilities fall into the framework proposed in that article.

This is made even more complicated by our view that a leading AI project could be crucial to preventing an AI-related catastrophe — and failing to prevent a catastrophe seems, in many value systems, similarly bad to causing one.

Ultimately, answering the question of moral permissibility is going to depend on ethical considerations about which we’re just hugely uncertain. Our guess is that it’s good for us to sometimes recommend that people work in roles that could harmfully advance AI capabilities — but we could easily change our minds on this.

For another article, we asked the 22 people we thought would be most informed about working in roles that advance AI capabilities — and who we knew had a range of views — to write a summary of their takes on the question: if you want to help prevent an AI-related catastrophe, should you be open to roles that also advance AI capabilities, or steer clear of them? There’s a range of views among the 11 responses we received, which we’ve published here.

You may be able to help labs reduce risks

As far as we can tell, there are many roles at leading AI labs where the primary effects of the roles could be to reduce risks.

Most obviously, these include research and engineering roles focused on AI safety. Labs also often don’t have enough staff in relevant teams to develop and implement good internal policies (like on evaluating and red-teaming their models and wider activity), or to figure out what they should be lobbying governments for (we’d guess that many of the top labs would lobby for things that reduce existential risks). We’re also particularly excited about people working in information security at labs to reduce risks of theft and misuse.

Beyond the direct impact of your role, you may be able to help guide internal culture in a more risk-sensitive direction. You probably won’t be able to influence many specific decisions, unless you’re very senior (or have the potential to become very senior), but if you’re a good employee you can just generally become part of the ‘conscience’ of an organisation. Just like anyone working at a powerful institution, you can also — if you see something really harmful occurring — consider organising internal complaints, whistleblowing, or even resigning. Finally, you could help foster good, cooperative working relationships with other labs as well as the public.

To do this well, you’d need the sorts of social skills that let you climb the organisational ladder and bring people round to your point of view. We’d also guess that you should spend almost all of your work time focused on doing your job well; criticism is usually far more powerful coming from a high performer.

There’s a risk that doing this badly could accidentally cause harm, for example, by making people think that arguments for caution are unconvincing.

How can you mitigate the downsides of this option?

There are a few things you can do to mitigate the downsides of taking a role in a leading AI lab:

  • Don’t work in certain positions unless you feel awesome about the lab being a force for good. This includes some technical work, like work that improves the efficiency of training very large models, whether via architectural improvements, optimiser improvements, improved reduced-precision training, or improved hardware. We’d also guess that roles in marketing, commercialisation, and fundraising tend to contribute to hype and acceleration, and so are somewhat likely to be harmful.
  • Think carefully, and take action if you need to. Take the time to think carefully about the work you’re doing, and how it’ll be disclosed outside the lab. For example, will publishing your research lead to harmful hype and acceleration? Who should have access to any models that you build? Be an employee who pays attention to the actions of the company you’re working for, and speaks up when you’re unhappy or uncomfortable.
  • Consult others. Don’t be a unilateralist. It’s worth discussing any role in advance with others. We can give you 1-1 advice, for free. If you know anyone working in the area who’s concerned about the risks, discuss your options with them. You may be able to meet people through our community, and our advisors can also help you make connections with people who can give you more nuanced and personalised advice.
  • Continue to engage with the broader safety community. To reduce the chance that your opinions or values will drift just because of the people you’re socialising with, try to find a way to spend time with people who more closely share your values. For example, if you’re a researcher or engineer, you may be able to spend some of your working time with a safety-focused research group.
  • Be ready to switch. Avoid being in a financial or psychological situation where it’s just going to be really hard for you to switch jobs into something more exclusively focused on doing good. Instead, constantly ask yourself whether you’d be able to make that switch, and whether you’re making decisions that could make it harder to do so in the future.

How to predict your fit in advance

In general, we think you’ll be a better fit for working at an AI lab if you have an excellent understanding of risks from AI. If the positive impact of your role comes from being able to persuade others to make better decisions, you’ll also need very good social skills. You’ll probably have a better time if you’re pragmatic and comfortable with making decisions that can, at times, be difficult, time-pressured, and morally ambiguous.

While a career in a leading AI lab can be rewarding and high impact for some, it’s not suitable for everyone. People who should probably not work at an AI lab include:

  • People who can’t follow tight security practices: AI labs often deal with sensitive information that needs to be handled responsibly.
  • People who aren’t able to keep their options open — that is, they aren’t (for a number of possible reasons) financially or psychologically prepared to leave if it starts to seem like the right idea. (In general, whatever your career path, we think it’s worth trying to build at least 6-12 months of financial runway.)
  • People who are more sensitive than average to incentives and social pressure: you’re just more likely to do things you wouldn’t currently endorse.

More specifically than that, predicting your fit will depend on the exact career path you’re following, and for that you can check out our other related career reviews.

How to enter

Some labs have internships (e.g. at Google DeepMind) or residency programmes (e.g. at OpenAI) — but the path to entering a leading AI lab can depend substantially on the specific role you’re interested in. So we’d suggest you look at our other career reviews for more detail, as well as plenty of practical advice.

Recommended organisations

We’re really not sure. It seems like OpenAI, Google DeepMind, and Anthropic are currently taking existential risk more seriously than other labs. Some people we spoke to have strong opinions about which of these is best, but they disagree with each other substantially.

Big tech companies like Apple, Microsoft, Meta, Amazon, and NVIDIA — which have the resources to potentially become rising stars in AI — are also worth considering, as there’s a need for more people in these companies who care about AI safety and ethics. Relatedly, plenty of startups can be good places to gain career capital, especially if they’re not advancing dangerous capabilities. However, the absence of teams focused on existential safety means that we’d guess these are worse choices for most of our readers.

Want one-on-one advice on pursuing this path?

If you think this path might be a great option for you, but you need help deciding or thinking about what to do next, our team might be able to help.

We can help you compare options, make connections, and possibly even help you find jobs or funding opportunities.

APPLY TO SPEAK WITH OUR TEAM

Find a job in this path

If you think you might be a good fit for this path and you’re ready to start looking at job opportunities that are currently accepting applications, see our list of opportunities for this path:

    View all opportunities

    Learn more about working at AI labs

    Learn more about making career decisions where there’s a risk of harm:

    Relevant career reviews (for more specific and practical advice):

    Read next:  Learn about other high-impact careers

    Want to consider more paths? See our list of the highest-impact career paths according to our research.

    Plus, join our newsletter and we’ll mail you a free book

    Join our newsletter and we’ll send you a free copy of The Precipice — a book by philosopher Toby Ord about how to tackle the greatest threats facing humanity.

    The post Should you work at a leading AI lab? appeared first on 80,000 Hours.

    ]]>
    AI safety technical research https://80000hours.org/career-reviews/ai-safety-researcher/ Mon, 19 Jun 2023 10:28:33 +0000 https://80000hours.org/?post_type=career_profile&p=74400 The post AI safety technical research appeared first on 80,000 Hours.

    ]]>
    Progress in AI — while it could be hugely beneficial — comes with significant risks. Risks that we’ve argued could be existential.

    But these risks can be tackled.

    With further progress in AI safety, we have an opportunity to develop AI for good: systems that are safe, ethical, and beneficial for everyone.

    This article explains how you can help.

    In a nutshell: Artificial intelligence will have transformative effects on society over the coming decades, and could bring huge benefits — but we also think there’s a substantial risk. One promising way to reduce the chances of an AI-related catastrophe is to find technical solutions that could allow us to prevent AI systems from carrying out dangerous behaviour.

    Pros

    • Opportunity to make a significant contribution to a hugely important area of research
    • Intellectually challenging and interesting work
    • The area has a strong need for skilled researchers and engineers, and is highly neglected overall

    Cons

    • Due to a shortage of managers, it’s difficult to get jobs and might take you some time to build the required career capital and expertise
    • You need a strong quantitative background
    • It might be very difficult to find solutions
    • There’s a real risk of doing harm

    Key facts on fit

    You’ll need a quantitative background and should probably enjoy programming. If you’ve never tried programming, you may be a good fit if you can break problems down into logical parts, generate and test hypotheses, possess a willingness to try out many different solutions, and have high attention to detail.

    If you already:

    • Are a strong software engineer, you could apply for empirical research contributor roles right now (even if you don’t have a machine learning background, although that helps)
    • Could get into a top 10 machine learning PhD, that would put you on track to become a research lead
    • Have a very strong maths or theoretical computer science background, you’ll probably be a good fit for theoretical alignment research

    Recommended

    If you are well suited to this career, it may be the best way for you to have a social impact.

    Review status

    Based on a medium-depth investigation 

    Thanks to Adam Gleave, Jacob Hilton and Rohin Shah for reviewing this article. And thanks to Charlie Rogers-Smith for his help, and his article on the topic — How to pursue a career in technical AI alignment.

    Why AI safety technical research is high impact

    As we’ve argued, in the next few decades, we might see the development of hugely powerful machine learning systems with the potential to transform society. This transformation could bring huge benefits — but only if we avoid the risks.

    We think that the worst-case risks from AI systems arise in large part because AI systems could be misaligned — that is, they will aim to do things that we don’t want them to do. In particular, we think they could be misaligned in such a way that they develop (and execute) plans that pose risks to humanity’s ability to influence the world, even when we don’t want that influence to be lost.

    We think this means that these future systems pose an existential threat to civilisation.

    Even if we find a way to avoid this power-seeking behaviour, there are still substantial risks — such as misuse by governments or other actors — which could be existential threats in themselves.

    Want to learn more about risks from AI? Read the problem profile.

    We think that technical AI safety could be the highest-impact career path we’ve identified to date. That’s because it seems like a promising way of reducing risks from AI. We’ve written an entire article about what those risks are and why they’re so important.

    Read more about preventing an AI-related catastrophe

    There are many ways in which we could go about reducing the risks that these systems might pose. But one of the most promising may be researching technical solutions that prevent unwanted behaviour — including misaligned behaviour — from AI systems. (Finding a technical way to prevent misalignment in particular is known as the alignment problem.)

    In the past few years, we’ve seen more organisations start to take these risks more seriously. Many of the leading industry labs developing AI — including Google DeepMind and OpenAI — have teams dedicated to finding these solutions, alongside academic research groups including at MIT, Oxford, Cambridge, Carnegie Mellon University, and UC Berkeley.

    That said, the field is still very new. We think there are only around 300 people working on technical approaches to reducing existential risks from AI systems,1 which makes this a highly neglected field.

    Finding technical ways to reduce this risk could be quite challenging. Any practically helpful solution must retain the usefulness of the systems (remaining economically competitive with less safe systems), and continue to work as systems improve over time (that is, it needs to be ‘scalable’). As we argued in our problem profile, it seems like it might be difficult to find viable solutions, particularly for modern ML (machine learning) systems.

    (If you don’t know anything about ML, we’ve written a very very short introduction to ML, and we’ll go into more detail on how to learn about ML later in this article. Alternatively, if you do have ML experience, talk to our team — they can give you personalised career advice, make introductions to others working on these issues, and possibly even help you find jobs or funding opportunities.)

    Although it seems hard, there are lots of avenues for more research — and the field really is very young, so there are new promising research directions cropping up all the time. So we think it’s moderately tractable, though we’re highly uncertain.

    In fact, we’re uncertain about all of this and have written extensively about reasons we might be wrong about AI risk.

    But, overall, we think that — if it’s a good fit for you — going into AI safety technical research may just be the highest-impact thing you can do with your career.

    What does this path involve?

    AI safety technical research generally involves working as a scientist or engineer at major AI labs, in academia, or in independent nonprofits.

    These roles can be very hard to get. You’ll likely need to build up career capital before you end up in a high-impact role (more on this later, in the section on how to enter). That said, you may not need to spend a long time building this career capital — we’ve seen exceptionally talented people move into AI safety from other quantitative fields, sometimes in less than a year.

    Most AI safety technical research falls on a spectrum between empirical research (experimenting with current systems as a way of learning more about what will work), and theoretical research (conceptual and mathematical research looking at ways of ensuring that future AI systems are safe).

    No matter where on this spectrum you end up working, your career path might look a bit different depending on whether you want to aim at becoming a research lead — proposing projects, managing a team and setting direction — or a contributor — focusing on carrying out the research.

    Finally, there are two slightly different roles you might aim for:

    • In academia, research is often led by professors — the key distinguishing feature of being a professor is that you’ll also teach classes and mentor grad students (and you’ll definitely need a PhD).
    • Many (but not all) contributor roles in empirical research are also engineers, often software engineers. Here, we’re focusing on software roles that directly contribute to AI safety research (and which often require some ML background) — we’ve written about software engineering more generally in a separate career review.

    4 kinds of AI safety role: empirical lead, empirical contributor, theoretical lead and theoretical contributor

    We think that research lead roles are probably higher-impact in general. But overall, the impact you could have in any of these roles is likely primarily determined by your personal fit for the role — see the section on how to predict your fit in advance.

    Next, we’ll take a look at what working in each path might involve. Later, we’ll go into how you might enter each path.

    What does work in the empirical AI safety path involve?

    Empirical AI safety tends to involve teams working directly with ML models to identify any risks and develop ways in which they might be mitigated.

    That means the work is focused on current ML techniques and techniques that might be applied in the very near future.

    Practically, working on empirical AI safety involves lots of programming and ML engineering. You might, for example, come up with ways you could test the safety of existing systems, and then carry out these empirical tests.

    You can find roles in empirical AI safety in industry and academia, as well as some in AI safety-focused nonprofits.

    Particularly in academia, lots of relevant work isn’t explicitly labelled as being focused on existential risk — but it can still be highly valuable. For example, work in interpretability, adversarial examples, diagnostics and backdoor learning, among other areas, could be highly relevant to reducing the chance of an AI-related catastrophe.

    We’re also excited by experimental work to develop safety standards that AI companies might adhere to in the future — for example, the work being carried out by ARC Evals.

    To learn more about the sorts of research taking place at labs focused on empirical AI safety, take a look at:

    While programming is central to all empirical work, generally, research lead roles will be less focused on programming; instead, they need stronger research taste and theoretical understanding. In comparison, research contributors need to be very good at programming and software engineering.

    What does work in the theoretical AI safety path involve?

    Theoretical AI safety is much more heavily conceptual and mathematical. Often it involves careful reasoning about the hypothetical behaviour of future systems.

    Generally, the aim is to come up with properties that it would be useful for safe ML algorithms to have. Once you have some useful properties, you can try to develop algorithms with these properties (bearing in mind that to be practically useful these algorithms will have to end up being adopted by industry). Alternatively, you could develop ways of checking whether systems have these properties. These checks could, for example, help hold future AI products to high safety standards.

    Many people working in theoretical AI safety will spend much of their time proving theorems or developing new mathematical frameworks. More conceptual approaches also exist, although they still tend to make heavy use of formal frameworks.

    Some examples of research in theoretical AI safety include:

    There are generally fewer roles available in theoretical AI safety work, especially as research contributors. Theoretical research contributor roles exist at nonprofits (primarily the Alignment Research Center), as well as at some labs (for example, Anthropic’s work on conditioning predictive models and the Causal Incentives Working Group at Google DeepMind). Most contributor roles in theoretical AI safety probably exist in academia (for example, PhD students in teams working on projects relevant to theoretical AI safety).

    Some exciting approaches to AI safety

    There are lots of technical approaches to AI safety currently being pursued. Here are just a few of them:

    It’s worth noting that there are many approaches to AI safety, and people in the field strongly disagree on what will or won’t work.

    This means that, once you’re working in the field, it can be worth being charitable and careful not to assume that others’ work is unhelpful just because it seemed so on a quick skim. You should probably be uncertain about your own research agenda as well.

    What’s more, as we mentioned earlier, lots of relevant work across all these areas isn’t explicitly labelled ‘safety.’

    So it’s important to think carefully about how or whether any particular research helps reduce the risks that AI systems might pose.

    What are the downsides of this career path?

    AI safety technical research is not the only way to make progress on reducing the risks that future AI systems might pose. Also, there are many other pressing problems in the world that aren’t the possibility of an AI-related catastrophe, and lots of careers that can help with them. If you’d be a better fit working on something else, you should probably do that.

    Beyond personal fit, there are a few other downsides to the career path:

    • It can be very competitive to enter (although once you’re in, the jobs are well paid, and there are lots of backup options).
    • You need quantitative skills — and probably programming skills.
    • The work is geographically concentrated in just a few places (mainly the California Bay Area and London, but there are also opportunities in places with top universities such as Oxford, New York, Pittsburgh, and Boston). That said, remote work is increasingly possible at many research labs.
    • It might not be very tractable to find good technical ways of reducing the risk. Although assessments of its difficulty vary, and while making progress is almost certainly possible, it may be quite hard to do so. This reduces the impact that you could have working in the field. That said, if you start out in technical work you might be able to transition to governance work, since that often benefits from technical training and experience with the industry, which most people do not have.)
    • Relatedly, there’s lots of disagreement in the field about what could work; you’ll probably be able to find at least some people who think what you’re working on is useless, whatever you end up doing.
    • Most importantly, there’s some risk of doing harm. While gaining career capital, and while working on the research itself, you’ll have to make difficult decisions and judgement calls about whether you’re working on something beneficial (see our anonymous advice about working in roles that advance AI capabilities). There’s huge disagreement on which technical approaches to AI safety might work — and sometimes this disagreement takes the form of thinking that a strategy will actively increase existential risks from AI.

    Finally, we’ve written more about the best arguments against AI being pressing in our problem profile on preventing an AI-related catastrophe. If those are right, maybe you could have more impact working on a different issue.

    How much do AI safety technical researchers earn?

    Many technical researchers work at companies or small startups that pay wages competitive with the Bay Area and Silicon Valley tech industry, and even smaller organisations and nonprofits will pay competitive wages to attract top talent. The median compensation for a software engineer in the San Francisco Bay area was $222,000 per year in 2020.3 (Read more about software engineering salaries).

    This $222,000 median may be an underestimate, as AI roles, especially in top AI labs that are rapidly scaling up their work in AI, often pay better than other tech jobs, and the same applies to safety researchers — even those in nonprofits.

    However, academia has lower salaries than industry in general, and we’d guess that AI safety research roles in academia pay less than commercial labs and nonprofits.

    Examples of people pursuing this path

    How to predict your fit in advance

    You’ll generally need a quantitative background (although not necessarily a background in computer science or machine learning) to enter this career path.

    There are two main approaches you can take to predict your fit, and it’s helpful to do both:

    • Try it out: try out the first few steps in the section below on learning the basics. If you haven’t yet, try learning some python, as well as taking courses in linear algebra, calculus, and probability. And if you’ve done that, try learning a bit about deep learning and AI safety. Finally, the best way to try this out for many people would be to actually get a job as a (non-safety) ML engineer (see more in the section on how to enter).
    • Talk to people about whether it would be a good fit for you: If you want to become a technical researcher, our team probably wants to talk to you. We can give you 1-1 advice, for free. If you know anyone working in the area (or something similar), discuss this career path with them and ask for their honest opinion. You may be able to meet people through our community. Our advisors can also help make connections.

    It can take some time to build expertise, and enjoyment can follow expertise — so be prepared to take some time to learn and practice before you decide to switch to something else entirely.

    If you’re not sure what roles you might aim for longer term, here are a few rough ways you could make a guess about what to aim for, and whether you might be a good fit for various roles on this path:

    • Testing your fit as an empirical research contributor: In a blog post about hiring for safety researchers, the Google DeepMind team said “as a rough test for the Research Engineer role, if you can reproduce a typical ML paper in a few hundred hours and your interests align with ours, we’re probably interested in interviewing you.”
      • Looking specifically at software engineering, one hiring manager at Anthropic said that if you could, with a few weeks’ work, write a complex new feature or fix a very serious bug in a major ML library, they’d want to interview you straight away. (Read more.)
    • Testing your fit for theoretical research: If you could have got into a top 10 maths or theoretical computer science PhD programme if you’d optimised your undergrad to do so, that’s a decent indication of your fit (and many researchers in fact have these PhDs). The Alignment Research Center (one of the few organisations that hires for theoretical research contributors, as of 2023) said that they were open to hiring people without any research background. They gave four tests of fit: creativity (e.g. you may have ideas for solving open problems in the field, like Eliciting Latent Knowledge); experience designing algorithms, proving theorems, or formalising concepts; broad knowledge of maths and computer science; and having thought a lot about the AI alignment problem in particular.
    • Testing your fit as a research lead (or for a PhD): The vast majority of research leads have a PhD. Also, many (but definitely not all) AI safety technical research roles will require a PhD — and if they don’t, having a PhD (or being the sort of person that could get one) would definitely help show that you’re a good fit for the work. To get into a top 20 machine learning PhD programme, you’d probably need to publish something like a first author workshop paper, as well as a third author conference paper at a major ML conference (like NeurIPS or ICML). (Read more about whether you should do a PhD).

    Read our article on personal fit to learn more about how to assess your fit for the career paths you want to pursue.

    How to enter

    You might be able to apply for roles right away — especially if you meet, or are near meeting, the tests we just looked at — but it also might take you some time, possibly several years, to skill up first.

    So, in this section, we’ll give you a guide to entering technical AI safety research. We’ll go through four key questions:

    1. How to learn the basics
    2. Whether you should do a PhD
    3. How to get a job in empirical research
    4. How to get a job in theoretical research

    Hopefully, by the end of the section, you’ll have everything you need to get going.

    Learning the basics

    To get anywhere in the world of AI safety technical research, you’ll likely need a background knowledge of coding, maths, and deep learning.

    You might also want to practice enough to become a decent ML engineer (although this is generally more useful for empirical research), and learn a bit about safety techniques in particular (although this is generally more useful for empirical research leads and theoretical researchers).

    We’ll go through each of these in turn.

    Learning to program

    You’ll probably want to learn to code in python, because it’s the most widely used language in ML engineering.

    The first step is probably just trying it out. As a complete beginner, you can write a Python program in less than 20 minutes that reminds you to take a break every two hours. Don’t be discouraged if your code doesn’t work the first time — that’s what normally happens when people code!

    Once you’ve done that, you have a few options:

    You can read more about learning to program — and how to get your first job in software engineering (if that’s the route you want to take) — in our career review on software engineering.

    Learning the maths

    The maths of deep learning relies heavily on calculus and linear algebra, and statistics can be useful too — although generally learning the maths is much less important than programming and basic, practical ML.

    We’d generally recommend studying a quantitative degree (like maths, computer science or engineering), most of which will cover all three areas pretty well.

    If you want to actually get good at maths, you have to be solving problems. So, generally, the most useful thing that textbooks and online courses provide isn’t their explanations — it’s a set of exercises to try to solve, in order, with some help if you get stuck.

    If you want to self-study (especially if you don’t have a quantitative degree) here are some possible resources:

    You might be able to find resources that cover all these areas, like Imperial College’s Mathematics for Machine Learning.

    Learning basic machine learning

    You’ll likely need to have a decent understanding of how AI systems are currently being developed. This will involve learning about machine learning and neural networks, before diving into any specific subfields of deep learning.

    Again, there’s the option of covering this at university. If you’re currently at college, it’s worth checking if you can take an ML course even if you’re not majoring in computer science.

    There’s one important caveat here: you’ll learn a huge amount on the job, and the amount you’ll need to know in advance for any role or course will vary hugely! Not even top academics know everything about their fields. It’s worth trying to find out how much you’ll need to know for the role you want to do before you invest hundreds of hours into learning about ML.

    With that caveat in mind, here are some suggestions of places you might start if you want to self-study the basics:

    PyTorch is a very common package used for implementing neural networks, and probably worth learning! When I was first learning about ML, my first neural network was a 3-layer convolutional neural network with L2 regularisation classifying characters from the MNIST database. This is a pretty common first challenge, and a good way to learn PyTorch.

    Learning about AI safety

    If you’re going to work as an AI safety researcher, it usually helps to know about AI safety.

    This isn’t always true — some engineering roles won’t require much knowledge of AI safety. But even then, knowing the basics will probably help land you a position, and can also help with things like making difficult judgement calls and avoiding doing harm. And if you want to be able to identify and do useful work, you’ll need to learn about the field eventually.

    Because the field is still so new, there probably aren’t (yet) university courses you can take. So you’ll need to do some self-study. Here are some places you might start:

    For more suggestions — especially when it comes to reading about the nature of the risks we might face from AI systems — take a look at the top resources to learn more from our problem profile.

    Should you do a PhD?

    Some technical research roles will require a PhD — but many won’t, and PhDs aren’t the best option for everyone.

    The main benefit of doing a PhD is probably practising setting and carrying out your own research agenda. As a result, getting a PhD is practically the default if you want to be a research lead.

    That said, you can also become a research lead without a PhD — in particular, by transitioning from a role as a research contributor. At some large labs, the boundary between being a contributor and a lead is increasingly blurry.

    Many people find PhDs very difficult. They can be isolating and frustrating, and take a very long time (4–6 years). What’s more, both your quality of life and the amount you’ll learn will depend on your supervisor — and it can be really difficult to figure out in advance whether you’re making a good choice.

    So, if you’re considering doing a PhD, here are some things to consider:

    • Your long-term vision: If you’re aiming to be a research lead, that suggests you might want to do a PhD — the vast majority of research leads have PhDs. If you mainly want to be a contributor (e.g. an ML or software engineer), that suggests you might not. If you’re unsure, you should try doing something to test your fit for each, like trying a project or internship. You might try a pre-doctoral research assistant role — if the research you do is relevant to your future career, these can be good career capital, whether or not you do a PhD.
    • The topic of your research: It’s easy to let yourself become tied down to a PhD topic you’re not confident in. If the PhD you’re considering would let you work on something that seems useful for AI safety, it’s probably — all else equal — better for your career, and the research itself might have a positive impact as well.
    • Mentorship: What are the supervisors or managers like at the opportunities open to you? You might be able to find ML engineering or research roles in industry where you could learn much more than you would in a PhD — or vice versa. When picking a supervisor, try reaching out to the current or former students of a prospective supervisor and asking them some frank questions. (Also, see this article on how to choose a PhD supervisor.)
    • Your fit for the work environment: Doing a PhD means working on your own with very little supervision or feedback for long periods of time. Some people thrive in these conditions! But some really don’t and find PhDs extremely difficult.

    Read more in our more detailed (but less up-to-date) review of machine learning PhDs.

    It’s worth remembering that most jobs don’t need a PhD. And for some jobs, especially empirical research contributor roles, even if a PhD would be helpful, there are often better ways of getting the career capital you’d need (for example, working as a software or ML engineer). We’ve interviewed two ML engineers who have had hugely successful careers without doing a PhD.

    Whether you should do a PhD doesn’t depend (much) on timelines

    We think it’s plausible that we will develop AI that could be hugely transformative for society by the end of the 2030s.

    All else equal, that possibility could argue for trying to have an impact right away, rather than spending five (or more) years doing a PhD.

    Ultimately, though, how well you, in particular, are suited to a particular PhD is probably a much more important factor than when AI will be developed.

    That is to say, we think the increase in impact caused by choosing a path that’s a good fit for you is probably larger than any decrease in impact caused by delaying your work. This is in part because the spread in impact caused by the specific roles available to you, as well as your personal fit for them, is usually very large. Some roles (especially research lead roles) will just require having a PhD, and others (especially more engineering-heavy roles) won’t — and people’s fit for these paths varies quite a bit.

    We’re also highly uncertain about estimates about when we might develop transformative AI. This uncertainty reduces the expected cost of any delay.

    Most importantly, we think PhDs shouldn’t be thought of as a pure delay to your impact. You can do useful work in a PhD, and generally, the first couple of years in any career path will involve a lot of learning the basics and getting up to speed. So if you have a good mentor, work environment, and choice of topic, your PhD work could be as good as, or possibly better than, the work you’d do if you went to work elsewhere early in your career. And if you suddenly receive evidence that we have less time than you thought, it’s relatively easy to drop out.

    There are lots of other considerations here — for a rough overview, and some discussion, see this post by 80,000 Hours advisor Alex Lawsen, as well as the comments.

    Overall, we’d suggest that instead of worrying about a delay to your impact, think instead about which longer-term path you want to pursue, and how the specific opportunities in front of you will get you there.

    How to get into a PhD

    ML PhDs can be very competitive. To get in, you’ll probably need a few publications (as we said above, something like a first author workshop paper, as well as a third author conference paper at a major ML conference (like NeurIPS or ICML), and references, probably from ML academics. (Although publications also look good whatever path you end up going down!)

    To end up at that stage, you’ll need a fair bit of luck, and you’ll also need to find ways to get some research experience.

    One option is to do a master’s degree in ML, although make sure it’s a research masters — most ML master’s degrees primarily focus on preparation for industry.

    Even better, try getting an internship in an ML research group. Opportunities include RISS at Carnegie Mellon University, UROP at Imperial College London, the Aalto Science Institute international summer research programme, the Data Science Summer Institute, the Toyota Technological Institute intern programme and MILA. You can also try doing an internship specifically in AI safety, for example at CHAI. However, there are sometimes disadvantages to doing internships specifically in AI safety directly — in general, it may be harder to publish and mentorship might be more limited.

    Another way of getting research experience is by asking whether you can work with researchers. If you’re already at a top university, it can be easiest to reach out to people working at the university you’re studying at.

    PhD students or post-docs can be more responsive than professors, but eventually, you’ll want a few professors you’ve worked with to provide references, so you’ll need to get in touch. Professors tend to get lots of cold emails, so try to get their attention! You can try:

    • Getting an introduction, for example from a professor who’s taught you
    • Mentioning things you’ve done (your grades, relevant courses you’ve taken, your GitHub, any ML research papers you’ve attempted to replicate as practice)
    • Reading some of their papers and the main papers in the field, and mention them in the email
    • Applying for funding that’s available to students who want to work in AI safety, and letting people know you’ve got funding to work with them

    Ideally, you’ll find someone who supervises you well and has time to work with you (that doesn’t necessarily mean the most famous professor — although it helps a lot if they’re regularly publishing at top conferences). That way, they’ll get to know you, you can impress them, and they’ll provide an amazing reference when you apply for PhDs.

    It’s very possible that, to get the publications and references you’ll need to get into a PhD, you’ll need to spend a year or two working as a research assistant, although these positions can also be quite competitive.

    This guide by Adam Gleave also goes into more detail on how to get a PhD, including where to apply and tips on the application process itself. We discuss ML PhDs in more detail in our career review on ML PhDs (though it’s outdated compared to this career review).

    Getting a job in empirical AI safety research

    Ultimately, the best way of learning to do empirical research — especially in contributor and engineering-focused roles — is to work somewhere that does both high-quality engineering and cutting-edge research.

    The top three labs are probably Google DeepMind (who offer internships to students), OpenAI (who have a 6-month residency programme) and Anthropic. (Working at a leading AI lab carries with it some risk of doing harm, so it’s important to think carefully about your options. We’ve written a separate article going through the major relevant considerations.)

    To end up working in an empirical research role, you’ll probably need to build some career capital.

    Whether you want to be a research lead or a contributor, it’s going to help to become a really good software engineer. The best ways of doing this usually involve getting a job as a software engineer at a big tech company or at a promising startup. (We’ve written an entire article about becoming a software engineer.)

    Many roles will require you to be a good ML engineer, which means going further than just the basics we looked at above. The best way to become a good ML engineer is to get a job doing ML engineering — and the best places for that are probably leading AI labs.

    For roles as a research lead, you’ll need relatively more research experience. You’ll either want to become a research contributor first, or enter through academia (for example by doing a PhD).

    All that said, it’s important to remember that you don’t need to know everything to start applying, as you’ll inevitably learn loads on the job — so do try to find out what you’ll need to learn to land the specific roles you’re considering.

    How much experience do you need to get a job? It’s worth reiterating the tests we looked at above for contributor roles:

    • In a blog post about hiring for safety researchers, the DeepMind team said “as a rough test for the Research Engineer role, if you can reproduce a typical ML paper in a few hundred hours and your interests align with ours, we’re probably interested in interviewing you.”
    • Looking specifically at software engineering, one hiring manager at Anthropic said that if you could, with a few weeks’ work, write a new feature or fix a serious bug in a major ML library, they’d want to interview you straight away. (Read more.)

    In the process of getting this experience, you might end up working in roles that advance AI capabilities. There are a variety of views on whether this might be harmful — so we’d suggest reading our article about working at leading AI labs and our article containing anonymous advice from experts about working in roles that advance capabilities. It’s also worth talking to our team about any specific opportunities you have.

    If you’re doing another job, or a degree, or think you need to learn some more before trying to change careers, there are a few good ways of getting more experience doing ML engineering that go beyond the basics we’ve already covered:

    • Getting some experience in software / ML engineering. For example, if you’re doing a degree, you might try an internship as a software engineer during the summer. DeepMind offer internships for students with at least two years of study in a technical subject,
    • Replicating papers. One great way of getting experience doing ML engineering, is to replicate some papers in whatever sub-field you might want to work in. Richard Ngo, an AI governance researcher at OpenAI, has written some advice on replicating papers. But bear in mind that replicating papers can be quite hard — take a look at Amid Fish’s blog on what he learned replicating a deep RL paper. Finally, Rogers-Smith has some suggestions on papers to replicate. If you do spend some time replicating papers, remember that when you get to applying for roles, it will be really useful to be able to prove you’ve done the work. So try uploading your work to GitHub, or writing a blog on your progress. And if you’re thinking about spending a long time on this (say, over 100 hours), try to get some feedback on the papers you might replicate before you start — you could even reach out to a lab you want to work for.
    • Taking or following a more in-depth course in empirical AI safety research. Redwood Research ran the MLAB bootcamp, and you can apply for access to their curriculum here. You could also take a look at this Deep Learning Curriculum by Jacob Hilton, a researcher at the Alignment Research Center — although it’s probably very challenging without mentorship.4 The Alignment Research Engineer Accelerator is a program that uses this curriculum. Some mentors on the SERI ML Alignment Theory Scholars Program focus on empirical research.
    • Learning about a sub-field of deep learning. In particular, we’d suggest natural language processing (in particular transformers — see this lecture as a starting point) and reinforcement learning (take a look at Pong from Pixels by Andrej Karpathy, and OpenAI’s Spinning up in Deep RL). Try to get to the point where you know about the most important recent advances.

    Getting a job in theoretical AI safety research

    There are fewer jobs available in theoretical AI safety research, so it’s harder to give concrete advice. Having a maths or theoretical computer science PhD isn’t always necessary, but is fairly common among researchers in industry, and is pretty much required to be an academic.

    If you do a PhD, ideally it’d be in an area at least somewhat related to theoretical AI safety research. For example, it could be in probability theory as applied to AI, or in theoretical CS (look for researchers who publish in COLT or FOCS).

    Alternatively, one path is to become an empirical research lead before moving into theoretical research.

    Compared to empirical research, you’ll need to know relatively less about engineering, and relatively more about AI safety as a field.

    Once you’ve done the basics, one possible next step you could try is reading papers from a particular researcher, or on a particular topic, and summarising what you’ve found.

    You could also try spending some time (maybe 10–100 hours) reading about a topic and then some more time (maybe another 10–100 hours) trying to come up with some new ideas on that topic. For example, you could try coming up with proposals to solve the problem of eliciting latent knowledge. Alternatively, if you wanted to focus on the more mathematical side, you could try having a go at the assignment at the end of this lecture by Michael Cohen, a grad student at the University of Oxford.

    If you want to enter academia, reading a ton of papers seems particularly important. Maybe try writing a survey paper on a certain topic in your spare time. It’s a great way to master a topic, spark new ideas, spot gaps, and come up with research ideas. When applying to grad school or jobs, your paper is a fantastic way to show you love research so much you do it for fun.

    There are some research programmes aimed at people new to the field, such as the SERI ML Alignment Theory Scholars Program, to which you could apply.

    Other ways to get more concrete experience include doing research internships, working as a research assistant, or doing a PhD, all of which we’ve written about above, in the section on whether and how you can get into a PhD programme.

    One note is that a lot of people we talk to try to learn independently. This can be a great idea for some people, but is fairly tough for many, because there’s substantially less structure and mentorship.

    AI labs in industry that have empirical technical safety teams, or are focused entirely on safety:

    • Anthropic is an AI safety company working on building interpretable and safe AI systems. They focus on empirical AI safety research. Anthropic cofounders Daniela and Dario Amodei gave an interview about the lab on the Future of Life Institute podcast. On our podcast, we spoke to Chris Olah, who leads Anthropic’s research into interpretability, and Nova DasSarma, who works on systems infrastructure at Anthropic.
    • ARC Evals works on assessing whether cutting-edge AI systems could pose catastrophic risks to civilization, including early-stage, experimental work to develop techniques, and evaluating systems produced by Anthropic and OpenAI.
    • The Center for AI Safety is a nonprofit that does technical research and promotion of safety in the wider machine learning community.
    • FAR AI is a research nonprofit that incubates and accelerates research agendas that are too resource-intensive for academia but not yet ready for commercialisation by industry, including research in adversarial robustness, interpretability and preference learning.
    • Google DeepMind is probably the largest and most well-known research group developing general artificial machine intelligence, and is famous for its work creating AlphaGo, AlphaZero, and AlphaFold. It is not principally focused on safety, but has two teams focused on AI safety, with the Scalable Alignment Team focusing on aligning existing state-of-the-art systems, and the Alignment Team focused on research bets for aligning future systems.
    • OpenAI, founded in 2015, is a lab that is trying to build artificial general intelligence that is safe and benefits all of humanity. OpenAI is well known for its language models like GPT-4. Like DeepMind, it is not principally focused on safety, but has a safety team and a governance team. Jan Leike (co-lead of the superalignment team) has some blog posts on how he thinks about AI alignment, and has spoken on our podcast about the sorts of people he’d like to hire for his team.
    • Ought is a machine learning lab building Elicit, an AI research assistant. Their aim is to align open-ended reasoning by learning human reasoning steps, and to direct AI progress towards helping with evaluating evidence and arguments.
    • Redwood Research is an AI safety research organisation, whose first big project attempted to make sure language models (like GPT-3) produce output following certain rules with very high probability, in order to address failure modes too rare to show up in standard training.

    Theoretical / conceptual AI safety labs:

    • The Alignment Research Center (ARC) is attempting to produce alignment strategies that could be adopted in industry today while also being able to scale to future systems. They focus on conceptual work, developing strategies that could work for alignment and which may be promising directions for empirical work, rather than doing empirical AI work themselves. Their first project was releasing a report on Eliciting Latent Knowledge, the problem of getting advanced AI systems to honestly tell you what they believe (or ‘believe’) about the world. On our podcast, we interviewed ARC founder Paul Christiano about his research (before he founded ARC).
    • The Center on Long-Term Risk works to address worst-case risks from advanced AI. They focus on conflict between AI systems.
    • The Machine Intelligence Research Institute was one of the first groups to become concerned about the risks from machine intelligence in the early 2000s, and its team has published a number of papers on safety issues and how to resolve them.
    • Some teams in commercial labs also do some more theoretical and conceptual work on alignment, such as Anthropic’s work on conditioning predictive models and the Causal Incentives Working Group at Google DeepMind.

    AI safety in academia (a very non-comprehensive list; while the number of academics explicitly and publicly focused on AI safety is small, it’s possible to do relevant work at a much wider set of places):

    Want one-on-one advice on pursuing this path?

    We think that the risks posed by the development of AI may be the most pressing problem the world currently faces. If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.

    We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.

    APPLY TO SPEAK WITH OUR TEAM

    Find a job in this path

    If you think you might be a good fit for this path and you’re ready to start looking at job opportunities that are currently accepting applications, see our curated list of opportunities for this path:

      View all opportunities

      Learn more about AI safety technical research

      Here are some suggestions about where you could learn more:

      If you prefer podcasts, there are some relevant episodes of the 80,000 Hours podcast you might find helpful:

      Read next:  Learn about other high-impact careers

      Want to consider more paths? See our list of the highest-impact career paths according to our research.

      Plus, join our newsletter and we’ll mail you a free book

      Join our newsletter and we’ll send you a free copy of The Precipice — a book by philosopher Toby Ord about how to tackle the greatest threats facing humanity.

      The post AI safety technical research appeared first on 80,000 Hours.

      ]]>
      Practical steps to take now that AI risk is mainstream https://80000hours.org/2023/06/practical-steps-to-take-now-that-ai-risk-is-mainstream/ Tue, 06 Jun 2023 14:47:55 +0000 https://80000hours.org/?p=82106 The post Practical steps to take now that AI risk is mainstream appeared first on 80,000 Hours.

      ]]>
      AI risk has gone mainstream. So what’s next?

      Last Tuesday’s statement on AI risk has hit headlines across the world. Hundreds of leading AI scientists and other prominent figures — including the CEOs of OpenAI, Anthropic and Google DeepMind — signed the one-sentence statement:

      Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

      This mainstreaming of concerns about the risk of extinction from AI represents a substantial shift to the strategic landscape — and should, as a result, have implications on how best to reduce the risk.

      How has the landscape shifted?

      Pictures from the White House Press Briefing. Meme from @kristjanmoore. The relevant video is here.


      So far, I think the most significant effect of the changes in the way these risks are viewed can be seen in changes in political activity.

      World leaders — including Joe Biden, Rishi Sunak, Emmanuel Macron — have all met leaders in AI in the last few months. AI regulation was a key topic of discussion at the G7. And now it’s been announced that Biden and Sunak will discuss extinction risks from AI as part of talks in DC next week.

      At the moment, it’s extremely unclear where this discussion will go. While I (tentatively) think that there are actions that governments could be taking, it’s possible that governments will act in a way that could increase the risk overall.

      But it does seem like some kind of government action is now likely to take place, in the near future.

      Looking a bit further forward, the other substantial change to the strategic landscape is that, probably, more people are going to end up working on AI risk sooner than I would previously have predicted.

      This blog post was first released to our newsletter subscribers.

      Join over 350,000 newsletter subscribers who get content like this in their inboxes every two weeks — and we’ll also mail you a free book!

      What does this mean for our actions?

      Our framework suggests thinking about the effects of all this news on the scale, neglectedness and solvability of risks from AI.

      • Scale: It’s unclear whether the risk from AI has gone up or down in recent months. All else equal, I’d guess that more attention on the issue will be beneficial — but I’m very uncertain.
      • Neglectedness: It seems likely that more people will be working on AI risk sooner. This reduces the neglectedness of the risk, making it harder for any one individual to have an impact.
      • Solvability: For the moment, at least, it appears that it’s going to be easier to convince people to take action to reduce the risk.

      Overall, I’d guess that the increase in solvability is, for now, more substantial than any decrease in neglectedness, making risks from AI overall slightly more pressing than they have been in the past. (It’s very plausible that this could change within the next few years.)

      Our recommendations for action

      There are still far too few people working on AI risk — and now might be one of the highest-impact times to get involved. We think more people should consider using their careers to reduce risks from AI, and not everyone needs a technical or quantitative background to do so.

      (People often think that we think everyone should work on AI, but this isn’t the case. The impact you have over your career depends not just on how pressing the problems you focus on are, but also the effectiveness of the particular roles you might do, and your personal fit for working in them. I’d guess that less than half of our readers should work on reducing risks from AI — there are other problems that are similarly important. But I’d also guess that many of our readers underestimate their ability to contribute.)

      There are real ways you could work to reduce this risk:

      • Consider working on technical roles in AI safety. If you might be able to open up new research directions, that seems incredibly high impact. But there are lots of other ways to help. You don’t need to be an expert in ML — it can be really useful just to be a great software engineer.
      • It’s vital that information on how to produce and run potentially dangerous AI models is kept safe. This means roles in information security could be highly impactful.
      • Other jobs at AI companies — many of which don’t require technical skills — could be good to take, if you’re careful to avoid the risks.
      • Jobs in government and policy could, if you’re well informed, position you to provide cautious and helpful advice as AI systems become more dangerous and powerful.

      We’ll be updating our career reviews of technical and governance roles soon as part of our push to keep our advice in line with recent developments.

      Getting to the point where you could secure any of these roles — and doing good with them — means becoming more informed about the issues, and then building career capital to put yourself in a better position to have an impact. Our rule of thumb would be “get good at something helpful.”

      Take a look at our (new!) article on building career capital for more.

      Finally, if you’re serious about working on reducing risks from AI, consider talking to our team.

      I don’t think human extinction is likely. In fact, in my opinion, humanity is almost certainly going to survive this century. But the risk is real, and substantial, making this probably the most pressing problem currently facing the world.

      And right now could be the best time to start helping. Should you?

      Learn more:

      The post Practical steps to take now that AI risk is mainstream appeared first on 80,000 Hours.

      ]]>
      Give feedback on the new 80,000 Hours career guide https://80000hours.org/2023/05/give-feedback-on-the-new-80000-hours-career-guide/ Thu, 25 May 2023 16:24:16 +0000 https://80000hours.org/?p=81996 The post Give feedback on the new 80,000 Hours career guide appeared first on 80,000 Hours.

      ]]>
      We’ve spent the last few months updating 80,000 Hours’ career guide (which we previously released in 2017 and which you’ve been able to get as a physical book). This week, we’ve put our new career guide live on our website. Before we formally launch and promote the guide — and republish the book — we’d like to gather feedback from our readers!

      How can you help?

      First, take a look at the new career guide.

      Note that our target audience for this career guide is approximately the ~100k young adults most likely to have high-impact careers, in the English-speaking world. Many of them may not yet be familiar with many of the ideas that are widely discussed in the effective altruism community. Also, this guide is primarily aimed at people aged 18–24.

      When you’re ready, there’s a simple form to fill in:

      Give feedback

      Thank you so much!

      Extra context: why are we making this change?

      In 2018, we deprioritised 80,000 Hours’ career guide in favour of our key ideas series.

      Our key ideas series had a more serious tone, and was more focused on impact. It represented our best and most up-to-date advice. We expected that this switch would reduce engagement time on our site, but that the key ideas series would better appeal to people more likely to change their careers to do good.

      However, the drop in engagement time which we could attribute to this change was larger than we’d expected. In addition, data from our user survey suggested that people who changed their careers were more, not less, likely to have found and used the older, more informal career guide (which we kept up on our site).

      As a result, we decided to bring the advice in our career guide in line with our latest views, while attempting to retain its structure, tone and engagingness.

      We’re retaining the content in our key ideas series: it’s been re-released as our advanced series.

      Want to give feedback on the new guide?

      First, read the new career guide:

      Read the guide

      Then, fill out our feedback form.

      Thank you for your help!

      The post Give feedback on the new 80,000 Hours career guide appeared first on 80,000 Hours.

      ]]>
      Preventing an AI-related catastrophe https://80000hours.org/problem-profiles/artificial-intelligence/ Thu, 25 Aug 2022 19:43:58 +0000 https://80000hours.org/?post_type=problem_profile&p=77853 The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.

      ]]>

      Note from the author: At its core, this problem profile tries to predict the future of technology. This is a notoriously difficult thing to do. In addition, there has been much less rigorous research into the risks from AI than into the other risks 80,000 Hours writes about (like pandemics or climate change).1 That said, there is a growing field of research into the topic, which I’ve tried to reflect. For this article I’ve leaned especially on this draft report by Joseph Carlsmith at Open Philanthropy (also available as a narration), as it’s the most rigorous overview of the risk that I could find. I’ve also had the article reviewed by over 30 people with different expertise and opinions on the topic. (Almost all are concerned about advanced AI’s potential impact.)

      If you have any feedback on this article — whether there’s something technical we’ve got wrong, some wording we could improve, or just that you did or didn’t like reading it — we’d really appreciate it if you could tell us what you think using this form.

      Why do we think that reducing risks from AI is one of the most pressing issues of our time? In short, our reasons are:

      1. Even before getting into the actual arguments, we can see some cause for concern — as many AI experts think there’s a small but non-negligible chance that AI will lead to outcomes as bad as human extinction.
      2. We’re making advances in AI extremely quickly — which suggests that AI systems could have a significant influence on society, soon.
      3. There are strong arguments that “power-seeking” AI could pose an existential threat to humanity2 — which we’ll go through below.
      4. Even if we find a way to avoid power-seeking, there are still other risks.
      5. We think we can tackle these risks.
      6. This work is neglected.

      We’re going to cover each of these in turn, then consider some of the best counterarguments, explain concrete things you can do to help, and finally outline some of the best resources for learning more about this area.

      1. Many AI experts think there’s a non-negligible chance AI will lead to outcomes as bad as extinction

      In May 2023, hundreds of AI prominent scientists — and other notable figures — signed a statement saying that mitigating the risk of extinction from AI should be a global priority.

      So it’s pretty clear that at least some experts are concerned.

      But how concerned are they? And is this just a fringe view?

      We looked at three surveys of AI researchers who published at NeurIPS and ICML (two of the most prestigious machine learning conferences) — one in 2016, one in 2019, and one in 2022.3

      It’s important to note that there could be considerable selection bias on surveys like this. For example, you might think researchers who go to the top AI conferences are more likely to be optimistic about AI, because they have been selected to think that AI research is doing good. Alternatively, you might think that researchers who are already concerned about AI are more likely to respond to a survey asking about these concerns.4

      All that said, here’s what we found:

      In all three surveys, the median researcher thought that the chances that AI would be “extremely good” was reasonably high: 20% in the 2016 survey, 20% in 2019, and 10% in 2022.5

      Indeed, AI systems are already having substantial positive effects — for example, in medical care or academic research.

      But in all three surveys, the median researcher also estimated small — and certainly not negligible — chances that AI would be “extremely bad (e.g. human extinction)”: a 5% chance of extremely bad outcomes in the 2016 survey, 2% in 2019, and 5% in 2022.6 7

      In the 2022 survey, participants were specifically asked about the chances of existential catastrophe caused by future AI advances — and again, over half of researchers thought the chances of an existential catastrophe was greater than 5%.8

      So experts disagree on the degree to which AI poses an existential risk — a kind of threat we’ve argued deserves serious moral weight.

      This fits with our understanding of the state of the research field. Three of the leading labs developing AI — DeepMind, Anthropic and OpenAI — also have teams dedicated to figuring out how to solve technical safety issues that we believe could, for reasons we discuss at length below, lead to an existential threat to humanity.9

      There are also several academic research groups (including at MIT, Oxford, Cambridge, Carnegie Mellon University, and UC Berkeley) focusing on these same technical AI safety problems.10

      It’s hard to know exactly what to take from all this, but we’re confident that it’s not a fringe position in the field to think that there is a material risk of outcomes as bad as an existential catastrophe. Some experts in the field maintain, though, that the risks are overblown.

      Still, why do we side with those who are more concerned? In short, it’s because there are arguments we’ve found persuasive that AI could pose such an existential threat — arguments we will go through step by step below.

      It’s important to recognise that the fact that many experts recognise there’s a problem doesn’t mean that everything’s OK, the experts have got it covered. Overall, we think this problem remains highly neglected, with only around 400 people working directly on the issue worldwide (more on this below).

      Meanwhile, there are billions of dollars a year going into making AI more advanced.11

      2. We’re making advances in AI extremely quickly

      Two cats dressed as computer programmers generated by different AI software.
      A cat dressed as a computer programmer” as generated by Craiyon (formerly DALL-E mini) (left) and OpenAI’s DALL-E 2. (right). DALL-E mini uses a model 27 times smaller than OpenAI’s DALL-E 1 model, released in January 2021. DALL-E 2 was released in April 2022.12

      Before we try to figure out what the future of AI might look like, it’s helpful to take a look at what AI can already do.

      Modern AI techniques involve machine learning (ML): models that improve automatically through data input. The most common form of this technique used today is known as deep learning.

      ML systems today can only perform a very small portion of tasks that humans can do, and (with a few exceptions) only within narrow specialties (like playing one particular game or generating one particular kind of image).

      That said, since the increasingly widespread use of deep learning in the mid-2010s, there has been huge progress in what can be achieved with ML. Here’s a brief timeline of only some of the advances we saw from 2019 to 2022:

      • AlphaStar, which can beat top professional players at StarCraft II (January 2019)
      • MuZero, a single system that learned to win games of chess, shogi, and Go — without ever being told the rules (November 2019)
      • GPT-3, a natural language model capable of producing high-quality text (May 2020)
      • GPT-f, which can solve some Maths Olympiad problems (September 2020)
      • AlphaFold 2, a huge step forward in solving the long-perplexing protein-folding problem (July 2021)
      • Codex, which can produce code for programs from natural language instructions (August 2021)
      • PaLM, a language model which has shown impressive capabilities to reason about things like cause and effect or explaining jokes (April 2022)
      • DALL-E 2 (April 2022) and Imagen (May 2022), which are both capable of generating high-quality images from written descriptions
      • SayCan, which takes natural language instructions and uses them to operate a robot (April 2022)
      • Gato, a single ML model capable of doing a huge number of different things (including playing Atari, captioning images, chatting, and stacking blocks with a real robot arm), deciding based on its context what it should output (May 2022)
      • Minerva can solve complex maths problems — fairly well at college level, and even better at high school maths competition level. (Minerva is far more successful than forecasters predicted in 2021.)

      If you’re anything like us, you found the complexity and breadth of the tasks these systems can carry out surprising.

      And if the technology keeps advancing at this pace, it seems clear there will be major effects on society. At the very least, automating tasks makes carrying out those tasks cheaper. As a result, we may see rapid increases in economic growth (perhaps even to the level we saw during the Industrial Revolution).

      If we’re able to partially or fully automate scientific advancement we may see more transformative changes to society and technology.13

      That could be just the beginning. We may be able to get computers to eventually automate anything humans can do. This seems like it has to be possible — at least in principle. This is because it seems that, with enough power and complexity, a computer should be able to simulate the human brain. This would itself be a way of automating anything humans can do (if not the most efficient method of doing so).

      And as we’ll see in the next section, there are some indications that extensive automation may well be possible through scaling up existing techniques.

      Current trends show rapid progress in the capabilities of ML systems

      There are three things that are crucial to building AI through machine learning:

      1. Good algorithms (e.g. more efficient algorithms are better)
      2. Data to train an algorithm
      3. Enough computational power (known as compute) to do this training

      We spoke to Danny Hernandez, who (at the time) was a research scientist on the Foresight team at OpenAI. Hernandez and his team looked at how two of these inputs (compute and algorithm efficiency) are changing over time.

      They found that, since 2012, the amount of compute used for training the largest AI models has been rising exponentially — doubling every 3.4 months.

      That is to say, since 2012, the amount of computational power used to train our largest machine learning models has grown by over 1 billion times.

      Hernandez and his team also looked at how much compute has been needed to train a neural network to have the same performance as AlexNet (an early image classification algorithm).

      They found that the amount of compute required for the same performance has been falling exponentially — halving every 16 months.

      So since 2012, the amount of compute required for the same level of performance has fallen by over 100 times. Combined with the increased compute used, that’s a lot of growth.16

      It’s hard to say whether these trends will continue, but they speak to incredible gains over the past decade in what it’s possible to do with machine learning.

      Indeed, it looks like increasing the size of models (and the amount of compute used to train them) introduces ever more sophisticated behaviour. This is how things like GPT-3 are able to perform tasks they weren’t specifically trained for.

      These observations have led to the scaling hypothesis: that we can simply build bigger and bigger neural networks, and as a result we will end up with more and more powerful artificial intelligence, and that this trend of increasing capabilities may increase to human-level AI and beyond.

      If this is true, we can attempt to predict how the capabilities of AI technology will increase over time simply by looking at how quickly we are increasing the amount of compute available to train models.

      But as we’ll see, it’s not just the scaling hypothesis that suggests we could end up with extremely powerful AI relatively soon — other methods of predicting AI progress come to similar conclusions.

      When can we expect transformative AI?

      It’s difficult to predict exactly when we will develop AI that we expect to be hugely transformative for society (for better or for worse) — for example, by automating all human work or drastically changing the structure of society.17 But here we’ll go through a few approaches.

      One option is to survey experts. Data from the 2019 survey of 300 AI experts implies that there is 20% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, 50% probability by 2060, and 85% by 2100.18 There are a lot of reasons to be suspicious of these estimates,4 but we take it as one data point.

      Ajeya Cotra (a researcher at Open Philanthropy) attempted to forecast transformative AI by comparing modern deep learning to the human brain. Deep learning involves using a huge amount of compute to train a model, before that model is able to perform some task. There’s also a relationship between the amount of compute used to train a model and the amount used by the model when it’s run. And — if the scaling hypothesis is true — we should expect the performance of a model to predictably improve as the computational power used increases. So Cotra used a variety of approaches (including, for example, estimating how much compute the human brain uses on a variety of tasks) to estimate how much compute might be needed to train a model that, when run, could carry out the hardest tasks humans can do. She then estimated when using that much compute would be affordable.

      Cotra’s 2022 update on her report’s conclusions estimates that there is a 35% probability of transformative AI by 2036, 50% by 2040, and 60% by 2050 — noting that these guesses are not stable.19

      Tom Davidson (also a researcher at Open Philanthropy) wrote a report to complement Cotra’s work. He attempted to figure out when we might expect to see transformative AI based only on looking at various types of research that transformative AI might be like (e.g. developing technology that’s the ultimate goal of a STEM field, or proving difficult mathematical conjectures), and how long it’s taken for each of these kinds of research to be completed in the past, given some quantity of research funding and effort.

      Davidson’s report estimates that, solely on this information, you’d think that there was an 8% chance of transformative AI by 2036, 13% by 2060, and 20% by 2100. However, Davidson doesn’t consider the actual ways in which AI has progressed since research started in the 1950s, and notes that it seems likely that the amount of effort we put into AI research will increase as AI becomes increasingly relevant to our economy. As a result, Davidson expects these numbers to be underestimates.

      Holden Karnofsky, co-CEO of Open Philanthropy, attempted to sum up the findings of all of the approaches above. He guesses there is more than a 10% chance we’ll see transformative AI by 2036(!), 50% by 2060, and 66% by 2100. And these guesses might be conservative, since they didn’t incorporate what we see as faster-than-expected progress since the estimates were made.

      Method Chance of transformative AI by 2036 Chance of transformative AI by 2060 Chance of transformative AI by 2100
      Expert survey (Zhang et al., 2022) 20% 50% 85%
      Biological anchors (Cotra, 2022) 35% 60% (by 2050) 80% (according to the 2020 report)
      Semi-informative priors (Davidson, 2021) 8% 13% 20%
      Overall guess (Karnofsky, 2021) 10% 50% 66%

      All in all, AI seems to be advancing rapidly. More money and talent is going into the field every year, and models are getting bigger and more efficient.

      Even if AI were advancing more slowly, we’d be concerned about it — most of the arguments about the risks from AI (that we’ll get to below) do not depend on this rapid progress.

      However, the speed of these recent advances increases the urgency of the issue.

      (It’s totally possible that these estimates are wrong – below, we discuss how the possibility that we might have a lot of time to work on this problem is one of the best arguments against this problem being pressing).

      3. Power-seeking AI could pose an existential threat to humanity

      We’ve argued so far that we expect AI to be an important — and potentially transformative — new technology.

      We’ve also seen reason to think that such transformative AI systems could be built this century.

      Now we’ll turn to the core question: why do we think this matters so much?

      There could be a lot of reasons. If advanced AI is as transformative as it seems like it’ll be, there will be many important consequences. But here we are going to explain the issue that seems most concerning to us: AI systems could pose risks by seeking and gaining power.

      We’ll argue that:

      1. It’s likely that we’ll build AI systems that can make and execute plans to achieve goals
      2. Advanced planning systems could easily be ‘misaligned’ — in a way that could lead them to make plans that involve disempowering humanity
      3. Disempowerment by AI systems would be an existential catastrophe
      4. People might deploy AI systems that are misaligned, despite this risk

      Thinking through each step, I think there’s something like a 1% chance of an existential catastrophe resulting from power-seeking AI systems this century. This is my all things considered guess at the risk incorporating considerations of the argument in favour of the risk (which is itself probabilistic), as well as reasons why this argument might be wrong (some of which I discuss below). This puts me on the less worried end of 80,000 Hours staff, whose views on our last staff survey ranged from 1–55%, with a median of 15%.

      It’s likely we’ll build advanced planning systems

      We’re going to argue that future systems with the following three properties might pose a particularly important threat to humanity:20

      1. They have goals and are good at making plans.

        Not all AI systems have goals or make plans to achieve those goals. But some systems (like some chess-playing AI systems) can be thought of in this way. When discussing power-seeking AI, we’re considering planning systems that are relatively advanced, with plans that are in pursuit of some goal(s), and that are capable of carrying out those plans.

      2. They have excellent strategic awareness.

        A particularly good planning system would have a good enough understanding of the world to notice obstacles and opportunities that may help or hinder its plans, and respond to these accordingly. Following Carlsmith, we’ll call this strategic awareness, since it allows systems to strategise in a more sophisticated way.

      3. They have highly advanced capabilities relative to today’s systems.

        For these systems to actually affect the world, we need them to not just make plans, but also be good at all the specific tasks required to execute those plans.

        Since we’re worried about systems attempting to take power from humanity, we are particularly concerned about AI systems that might be better than humans on one or more tasks that grant people significant power when carried out well in today’s world.

        For example, people who are very good at persuasion and/or manipulation are often able to gain power — so an AI being good at these things might also be able to gain power. Other examples might include hacking into other systems, tasks within scientific and engineering research, as well as business, military, or political strategy.

      These systems seem technically possible and we’ll have strong incentives to build them

      As we saw above, we’ve already produced systems that are very good at carrying out specific tasks.

      We’ve also already produced rudimentary planning systems, like AlphaStar, which skilfully plays the strategy game Starcraft, and MuZero, which plays chess, shogi, and Go.21

      We’re not sure whether these systems are producing plans in pursuit of goals per se, because we’re not sure exactly what it means to “have goals.” However, since they consistently plan in ways that achieve goals, it seems like they have goals in some sense.

      Moreover, some existing systems seem to actually represent goals as part of their neural networks.22

      That said, planning in the real world (instead of games) is much more complex, and to date we’re not aware of any unambiguous examples of goal-directed planning systems, or systems that exhibit high degrees of strategic awareness.

      But as we’ve discussed, we expect to see further advances within this century. And we think these advances are likely to produce systems with all three of the above properties.

      That’s because we think that there are particularly strong incentives (like profit) to develop these kinds of systems. In short: because being able to plan to achieve a goal, and execute that plan, seems like a particularly powerful and general way of affecting the world.

      Getting things done — whether that’s a company selling products, a person buying a house, or a government developing policy — almost always seems to require these skills. One example would be assigning a powerful system a goal and expecting the system to achieve it — rather than having to guide it every step of the way. So planning systems seem likely to be (economically and politically) extremely useful.23

      And if systems are extremely useful, there are likely to be big incentives to build them. For example, an AI that could plan the actions of a company by being given the goal to increase its profits (that is, an AI CEO) would likely provide significant wealth for the people involved — a direct incentive to produce such an AI.

      As a result, if we can build systems with these properties (and from what we know, it seems like we will be able to), it seems like we are likely to do so.24

      Advanced planning systems could easily be dangerously ‘misaligned’

      There are reasons to think that these kinds of advanced planning AI systems will be misaligned. That is, they will aim to do things that we don’t want them to do.25

      There are many reasons why systems might not be aiming to do exactly what we want them to do. For one thing, we don’t know how, using modern ML techniques, to give systems the precise goals we want (more here).26

      We’re going to focus specifically on some reasons why systems might by default be misaligned in such a way that they develop plans that pose risks to humanity’s ability to influence the world — even when we don’t want that influence to be lost.27

      What do we mean by “by default”? Essentially, unless we actively find solutions to some (potentially quite difficult) problems, then it seems like we’ll create dangerously misaligned AI. (There are reasons this might be wrong — which we discuss later.)

      Three examples of “misalignment” in a variety of systems

      It’s worth noting that misalignment isn’t a purely theoretical possibility (or specific to AI) — we see misaligned goals in humans and institutions all the time, and have also seen examples of misalignment in AI systems.28

      The democratic political framework is intended to ensure that politicians make decisions that benefit society. But what political systems actually reward is winning elections, so that’s what many politicians end up aiming for.

      This is a decent proxy goal — if you have a plan to improve people’s lives, they’re probably more likely to vote for you — but it isn’t perfect. As a result, politicians do things that aren’t clearly the best way of running a country, like raising taxes at the start of their term and cutting them right before elections.

      That is to say, the things the system does are at least a little different from what we would, in a perfect world, want it to do: the system is misaligned.

      Companies have profit-making incentives. By producing more, and therefore helping people obtain goods and services at cheaper prices, companies make more money.

      This is sometimes a decent proxy for making the world better, but profit isn’t actually the same as the good of all of humanity (bold claim, we know). As a result, there are negative externalities: for example, companies will pollute to make money despite this being worse for society overall.

      Again, we have a misaligned system, where the things the system does are at least a little different from what we would want it to do.

      DeepMind has documented examples of specification gaming: an AI doing well according to its specified reward function (which encodes our intentions for the system), but not doing what researchers intended.

      In one example, a robot arm was asked to grasp a ball. But the reward was specified in terms of whether humans thought the robot had been successful. As a result, the arm learned to hover between the ball and the camera, fooling the humans into thinking that it had grasped the ball.29

      A simulated arm hovers between a ball and a camera.
      Source: Christiano et al., 2017

      So we know it’s possible to create a misaligned AI system.

      Why these systems could (by default) be dangerously misaligned

      Here’s the core argument of this article. We’ll use all three properties from earlier: planning ability, strategic awareness, and advanced capabilities.

      To start, we should realise that a planning system that has a goal will also develop ‘instrumental goals’: things that, if they occur, will make it easier to achieve an overall goal.

      We use instrumental goals in plans all the time. For example, a high schooler planning their career might think that getting into university will be helpful for their future job prospects. In this case, “getting into university” would be an instrumental goal.

      A sufficiently advanced AI planning system would also include instrumental goals in its overall plans.

      If a planning AI system also has enough strategic awareness, it will be able to identify facts about the real world (including potential things that would be obstacles to any plans), and plan in light of them. Crucially, these facts would include that access to resources (e.g. money, compute, influence) and greater capabilities — that is, forms of power — open up new, more effective ways of achieving goals.

      This means that, by default, advanced planning AI systems would have some worrying instrumental goals:

      • Self-preservation — because a system is more likely to achieve its goals if it is still around to pursue them (in Stuart Russell’s memorable phrase, “You can’t fetch the coffee if you’re dead”).
      • Preventing any changes to the AI system’s goals — since changing its goals would lead to outcomes that are different from those it would achieve with its current goals.
      • Gaining power — for example, by getting more resources and greater capabilities.

      Crucially, one clear way in which the AI can ensure that it will continue to exist (and not be turned off), and that its objectives will never be changed, would be to gain power over the humans who might affect it (we talk here about how AI systems might actually be able to do that).

      What’s more, the AI systems we’re considering have advanced capabilities — meaning they can do one or more tasks that grant people significant power when carried out well in today’s world. With such advanced capabilities, these instrumental goals will not be out of reach, and as a result, it seems like the AI system would use its advanced capabilities to get power as part of the plan’s execution. If we don’t want the AI systems we create to take power away from us this would be a particularly dangerous form of misalignment.

      In the most extreme scenarios, a planning AI system with sufficiently advanced capabilities could successfully disempower us completely.

      As a (very non-rigorous) intuitive check on this argument, let’s try to apply it to humans.

      Humans have a variety of goals. For many of these goals, some form of power-seeking is advantageous: though not everyone seeks power, many people do (in the form of wealth or social or political status), because it’s useful for getting what they want. This is not catastrophic (usually!) because, as human beings:

      • We generally feel bound by human norms and morality (even people who really want wealth usually aren’t willing to kill to get it).
      • We aren’t that much more capable or intelligent than one another. So even in cases where people aren’t held back by morality, they’re not able to take over the world.

      (We discuss whether humans are truly power-seeking later.)

      A sufficiently advanced AI wouldn’t have those limitations.

      It might be hard to find ways to prevent this sort of misalignment

      The point of all this isn’t to say that any advanced planning AI system will necessarily attempt to seek power. Instead, it’s to point out that, unless we find a way to design systems that don’t have this flaw, we’ll face significant risk.

      It seems more than plausible that we could create an AI system that isn’t misaligned in this way, and thereby prevent any disempowerment. Here are some strategies we might take (plus, unfortunately, some reasons why they might be difficult in practice):30

      • Control the objectives of the AI system. We may be able to design systems that simply don’t have objectives to which the above argument applies — and thus don’t incentivise power-seeking behaviour. For example, we could find ways to explicitly instruct AI systems not to harm humans, or find ways to reward AI systems (in training environments) for not engaging in specific kinds of power-seeking behaviour (and also find ways to ensure that this behaviour continues outside the training environment).

        Carlsmith gives two reasons why doing this seems particularly hard.

        First, for modern ML systems, we don’t get to explicitly state a system’s objectives — instead we reward (or punish) a system in a training environment so that it learns on its own. This raises a number of difficulties, one of which is goal misgeneralisation. Researchers have uncovered real examples of systems that appear to have learned to pursue a goal in the training environment, but then fail to generalise that goal when they operate in a new environment. This raises the possibility that we could think we’ve successfully trained an AI system not to seek power — but that the system would seek power anyway when deployed in the real world.31

        Second, when we specify a goal to an AI system (or, when we can’t explicitly do that, when we find ways to reward or punish a system during training), we usually do this by giving the system a proxy by which outcomes can be measured (e.g. positive human feedback on a system’s achievement). But often those proxies don’t quite work.32 In general, we might expect that even if a proxy appears to correlate well with successful outcomes, it might not do so when that proxy is optimised for. (The examples above of politicians, companies, and the robot arm failing to grasp a ball are illustrations of this.) We’ll look at a more specific example of how problems with proxies could lead to an existential catastrophe here.

        For more on the specific difficulty of controlling the objectives given to deep neural networks trained using self-supervised learning and reinforcement learning, we recommend OpenAI governance researcher Richard Ngo’s discussion of how realistic training processes lead to the development of misaligned goals.

      • Control the inputs into the AI system. AI systems will only develop plans to seek power if they have enough information about the world to realise that seeking power is indeed a way to achieve its goals.

      • Control the capabilities of the AI system. AI systems will likely only be able to carry out plans to seek power if they have sufficiently advanced capabilities in skills that grant people significant power in today’s world.

      But to make any strategy work, it will need to both:

      • Retain the usefulness of the AI systems — and so remain economically competitive with less safe systems. Controlling the inputs and capabilities of AI systems will clearly have costs, so it seems hard to ensure that these controls, even if they’re developed, are actually used. But this is also a problem for controlling a system’s objectives. For example, we may be able to prevent power-seeking behaviour by ensuring that AI systems stop to check in with humans about any decisions they make. But these systems might be significantly slower and less immediately useful to people than systems that don’t stop to carry out these checks. As a result, there might still be incentives to use a faster, more initially effective misaligned system (we’ll look at incentives more in the next section).

      • Continue to work as the planning ability and strategic awareness of systems improve over time. Some seemingly simple solutions (for example, trying to give a system a long list of things it isn’t allowed to do, like stealing money or physically harming humans) break down as the planning abilities of the systems increase. This is because, the more capable a system is at developing plans, the more likely it is to identify loopholes or failures in the safety strategy — and as a result, the more likely the system is to develop a plan that involves power-seeking.

      Ultimately, by looking at the state of the research on this topic, and speaking to experts in the field, we think that there are currently no known ways of building aligned AI systems that seem likely to fulfil both these criteria.

      So: that’s the core argument. There are many variants of this argument. Some have argued that AI systems might gradually shape our future via subtler forms of influence that nonetheless could amount to an existential catastrophe; others argue that the most likely form of disempowerment is in fact just killing everyone. We’re not sure how a catastrophe would be most likely to play out, but have tried to articulate the heart of the argument, as we see it: that AI presents an existential risk.

      There are definitely reasons this argument might not be right! We go through some of the reasons that seem strongest to us below. But overall it seems possible that, for at least some kinds of advanced planning AI systems, it will be harder to build systems that don’t seek power in this dangerous way than to build systems that do.

      At this point, you may have questions like:

      We think there are good responses to all these questions, so we’ve added a long list of arguments against working on AI risk — and our responses — for these (and other) questions below.

      Disempowerment by AI systems would be an existential catastrophe

      When we say we’re concerned about existential catastrophes, we’re not just concerned about risks of extinction. This is because the source of our concern is rooted in longtermism: the idea that the lives of all future generations matter, and so it’s extremely important to protect their interests.

      This means that any event that could prevent all future generations from living lives full of whatever you think makes life valuable (whether that’s happiness, justice, beauty, or general flourishing) counts as an existential catastrophe.

      It seems extremely unlikely that we’d be able to regain power over a system that successfully disempowers humanity. And as a result, the entirety of the future — everything that happens for Earth-originating life, for the rest of time — would be determined by the goals of systems that, although built by us, are not aligned with us. Perhaps those goals will create a long and flourishing future, but we see little reason for confidence.33

      This isn’t to say that we don’t think AI also poses a risk of human extinction. Indeed, we think making humans extinct is one highly plausible way in which an AI system could completely and permanently ensure that we are never able to regain power.

      People might deploy misaligned AI systems despite the risk

      Surely no one would actually build or use a misaligned AI if they knew it could have such terrible consequences, right?

      Unfortunately, there are at least two reasons people might create and then deploy misaligned AI — which we’ll go through one at a time:34

      1. People might think it’s aligned when it’s not

      Imagine there’s a group of researchers trying to tell, in a test environment, whether a system they’ve built is aligned. We’ve argued that an intelligent planning AI will want to improve its abilities to effect changes in pursuit of its objective, and it’s almost always easier to do that if it’s deployed in the real world, where a much wider range of actions are available. As a result, any misaligned AI that’s sophisticated enough will try to understand what the researchers want it to do and at least pretend to be doing that, deceiving the researchers into thinking it’s aligned. (For example, a reinforcement learning system might be rewarded for certain apparent behaviour during training, regardless of what it’s actually doing.)

      Hopefully, we’ll be aware of this sort of behaviour and be able to detect it. But catching a sufficiently advanced AI in deception seems potentially harder than catching a human in a lie, which isn’t always easy. For example, a sufficiently intelligent deceptive AI system may be able to deceive us into thinking we’ve solved the problem of AI deception, even if we haven’t.

      If AI systems are good at deception, and have sufficiently advanced capabilities, a reasonable strategy for such a system could be to deceive humans completely until the system has a way to guarantee it can overcome any resistance to its goals.

      2. There are incentives to deploy systems sooner rather than later

      We might also expect some people with the ability to deploy a misaligned AI to charge ahead despite any warning signs of misalignment that do come up, because of race dynamics — where people developing AI want to do so before anyone else.

      For example, if you’re developing an AI to improve military or political strategy, it’s much more useful if none of your rivals have a similarly powerful AI.

      These incentives apply even to people attempting to build an AI in the hopes of using it to make the world a better place.

      For example, say you’ve spent years and years researching and developing a powerful AI system, and all you want is to use it to make the world a better place. Simplifying things a lot, say there are two possibilities:

      1. This powerful AI will be aligned with your beneficent aims, and you’ll transform society in a potentially radically positive way.
      2. The AI will be sufficiently misaligned that it’ll take power and permanently end humanity’s control over the future.

      Let’s say you think there’s a 90% chance that you’ve succeeded in building an aligned AI. But technology often develops at similar speeds across society, so there’s a good chance that someone else will soon also develop a powerful AI. And you think they’re less cautious, or less altruistic, so you think their AI will only have an 80% chance of being aligned with good goals, and pose a 20% chance of existential catastrophe. And only if you get there first can your more beneficial AI be dominant. As a result, you might decide to go ahead with deploying your AI, accepting the 10% risk.

      This all sounds very abstract. What could an existential catastrophe caused by AI actually look like?

      The argument we’ve given so far is very general, and doesn’t really look at the specifics of how an AI that is attempting to seek power might actually do so.

      If you’d like to get a better understanding of what an existential catastrophe caused by AI might actually look like, we’ve written a short separate article on that topic. If you’re happy with the high-level abstract arguments so far, feel free to skip to the next section!

      What could an existential AI catastrophe actually look like?

      4. Even if we find a way to avoid power-seeking, there are still risks

      So far we’ve described what a large proportion of researchers in the field2 think is the major existential risk from potential advances in AI, which depends crucially on an AI seeking power to achieve its goals.

      If we can prevent power-seeking behaviour, we will have reduced existential risk substantially.

      But even if we succeed, there are still existential risks that AI could pose.

      AI could worsen war

      We’re concerned that great power conflict could also pose a substantial threat to our world, and advances in AI seem likely to change the nature of war — through lethal autonomous weapons35 or through automated decision making.36

      In some cases, great power war could pose an existential threat — for example, if the conflict is nuclear. It’s possible that AI could exacerbate risks of nuclear escalation, although there are also reasons to think AI could decrease this risk.37

      Finally, if a single actor produces particularly powerful AI systems, this could be seen as giving them a decisive strategic advantage. For example, the US may produce a planning AI that’s intelligent enough to ensure that Russia or China could never successfully launch another nuclear weapon. This could incentivise a first strike from the actor’s rivals before these AI-developed plans can ever be put into action.

      AI could be used to develop dangerous new technology

      We expect that AI systems will help increase the rate of scientific progress.38

      While there would be clear benefits to this automation — the rapid development of new medicine, for example — some forms of technological development can pose threats, including existential threats, to humanity. This could be through biotechnology39 (see our article on preventing catastrophic pandemics for more) or through some other form of currently unknown but dangerous technology.40

      AI could empower totalitarian governments

      An AI-enabled authoritarian government could completely automate the monitoring and repression of its citizens, as well as significantly influence the information people see, perhaps making it impossible to coordinate action against such a regime.41

      If this became a form of truly stable totalitarianism, this could make people’s lives far worse for extremely long periods of time, making it a particularly scary possible scenario resulting from AI.

      Other risks from AI

      We’re also concerned about the following issues, though we know less about them:

      • Existential threats that result not from the power-seeking behaviour of AI systems, but as a result of the interaction between AI systems. (In order to pose a risk, these systems would still need to be, to some extent, misaligned.)
      • Other ways we haven’t thought of in which AI systems could be misused — especially ones that might significantly affect future generations.
      • Other moral mistakes made in the design and use of AI systems, particularly if future AI systems are themselves deserving of moral consideration. For example, perhaps we will (inadvertently) create conscious AI systems, which could then suffer in huge numbers. We think this could be extremely important, so we’ve written about it in a separate problem profile.

      This is a really difficult question to answer.

      There are no past examples we can use to determine the frequency of AI-related catastrophes.

      All we have to go off are arguments (like the ones we’ve given above), and less relevant data like the history of technological advances. And we’re definitely not certain that the arguments we’ve presented are completely correct.

      Consider the argument we gave earlier about the dangers of power-seeking AI in particular, based off Carlsmith’s report. At the end of his report, Carlsmith gives some rough guesses of the chances that each stage of his argument is correct (conditional on the previous stage being correct):

      1. By 2070 it will be possible and financially feasible to build strategically aware systems that can outperform humans on many power-granting tasks, and that can successfully make and carry out plans: Carlsmith guesses there’s a 65% chance of this being true.
      2. Given this feasibility, there will be strong incentives to build such systems: 80%.
      3. Given both the feasibility and incentives to build such systems, it will be much harder to develop aligned systems that don’t seek power than to develop misaligned systems that do, but which are at least superficially attractive to deploy: 40%.
      4. Given all of this, some deployed systems will seek power in a misaligned way that causes over $1 trillion (in 2021 dollars) of damage: 65%.
      5. Given all the previous premises, misaligned power-seeking AI systems will end up disempowering basically all of humanity: 40%.
      6. Given all the previous premises, this disempowerment will constitute an existential catastrophe: 95%.

      Multiplying these numbers together, Carlsmith estimated that there’s a 5% chance that his argument is right and there will be an existential catastrophe from misaligned power-seeking AI by 2070. When we spoke to Carlsmith, he noted that in the year between the writing of his report and the publication of this article, his overall guess at the chance of an existential catastrophe from power-seeking AI by 2070 had increased to >10%.42

      The overall probability of existential catastrophe from AI would, in Carlsmith’s view, be higher than this, because there are other routes to possible catastrophe — like those discussed in the previous section — although our guess is that these other routes are probably a lot less likely to lead to existential catastrophe.

      For another estimate, in The Precipice, philosopher and advisor to 80,000 Hours Toby Ord estimated a 1-in-6 risk of existential catastrophe by 2120 (from any cause), and that 60% of this risk comes from misaligned AI — giving a total of a 10% risk of existential catastrophe from misaligned AI by 2120.

      A 2021 survey of 44 researchers working on reducing existential risks from AI found the median risk estimate was 32.5% — the highest answer given was 98%, and the lowest was 2%.43 There’s obviously a lot of selection bias here: people choose to work on reducing risks from AI because they think this is unusually important, so we should expect estimates from this survey to be substantially higher than estimates from other sources. But there’s clearly significant uncertainty about how big this risk is, and huge variation in answers.

      All these numbers are shockingly, disturbingly high. We’re far from certain that all the arguments are correct. But these are generally the highest guesses for the level of existential risk of any of the issues we’ve examined (like engineered pandemics, great power conflict, climate change, or nuclear war).

      That said, I think there are reasons why it’s harder to make guesses about the risks from AI than other risks – and possibly reasons to think that the estimates we’ve quoted above are systematically too high.

      If I was forced to put a number on it, I’d say something like 1%. This number includes considerations both in favour and against the argument. I’m less worried than other 80,000 Hours staff — our position as an organisation is that the risk is between 3% and 50%.

      All this said, the arguments for such high estimates of the existential risk posed by AI are persuasive — making risks from AI a top contender for the most pressing problem facing humanity.

      5. We can tackle these risks

      We think one of the most important things you can do would be to help reduce the gravest risks that AI poses.

      This isn’t just because we think these risks are high — it’s also because we think there are real things we can do to reduce these risks.

      We know of two broad approaches:

      1. Technical AI safety research
      2. AI governance research and implementation

      For both of these, there are lots of ways to contribute. We’ll go through them in more detail below, but in this section we want to illustrate the point that there are things we can do to address these risks.

      Technical AI safety research

      The benefits of transformative AI could be huge, and there are many different actors involved (operating in different countries), which means it will likely be really hard to prevent its development altogether.

      (It’s also possible that it wouldn’t even be a good idea if we could — after all, that would mean forgoing the benefits as well as preventing the risks.)

      As a result, we think it makes more sense to focus on making sure that this development is safe — meaning that it has a high probability of avoiding all the catastrophic failures listed above.

      One way to do this is to try to develop technical solutions to prevent the kind of power-seeking behaviour we discussed earlier — this is generally known as working on technical AI safety, sometimes called just “AI safety” for short.

      Read more about technical AI safety research below.

      AI governance research and implementation

      A second strategy for reducing risks from AI is to shape its development through policy, norms-building, and other governance mechanisms.

      Good AI governance can help technical safety work, for example by producing safety agreements between corporations, or helping talented safety researchers from around the world move to where they can be most effective. AI governance could also help with other problems that lead to risks, like race dynamics.

      But also, as we’ve discussed, even if we successfully manage to make AI do what we want (i.e. we ‘align’ it), we might still end up choosing something bad for it to do! So we need to worry about the incentives not just of the AI systems, but of the human actors using them.

      Read more about AI governance research and implementation below.

      6. This work is neglected

      We estimate there are around 400 people around the world working directly on reducing the chances of an AI-related existential catastrophe (with a 90% confidence interval ranging between 200 and 1,000). Of these, about three quarters are working on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.44 We think there are around 800 people working in complementary roles, but we’re highly uncertain about this estimate.45

      In The Precipice, Ord estimated that there was between $10 million and $50 million spent on reducing AI risk in 2020.

      That might sound like a lot of money, but we’re spending something like 1,000 times that amount11 on speeding up the development of transformative AI via commercial capabilities research and engineering at large AI labs.

      To compare the $50 million spent on AI safety in 2020 to other well-known risks, we’re currently spending several hundreds of billions per year on tackling climate change.

      Because this field is so neglected and has such high stakes, we think your impact working on risks from AI could be much higher than working on many other areas — which is why our top two recommended career paths for making a big positive difference in the world are technical AI safety and AI policy research and implementation.

      What do we think are the best arguments against this problem being pressing?

      As we said above, we’re not totally sure the arguments we’ve presented for AI representing an existential threat are right. Though we do still think that the chance of catastrophe from AI is high enough to warrant many more people pursuing careers to try to prevent such an outcome, we also want to be honest about the arguments against doing so, so you can more easily make your own call on the question.

      Here we’ll cover the strongest reasons (in our opinion) to think this problem isn’t particularly pressing. In the next section we’ll cover some common objections that (in our opinion) hold up less well, and explain why.

      The longer we have before transformative AI is developed, the less pressing it is to work now on ways to ensure that it goes well. This is because the work of others in the future could be much better or more relevant than the work we are able to do now.

      Also, if it takes us a long time to create transformative AI, we have more time to figure out how to make it safe. The risk seems much higher if AI developers will create transformative AI in the next few decades.

      It seems plausible that the first transformative AI won’t be based on current deep learning methods. (AI Impacts have documented arguments that current methods won’t be able to produce AI that has human-level intelligence.) This could mean that some of our current research might not end up being useful (and also — depending on what method ends up being used — could make the arguments for risk less worrying).

      Relatedly, we might expect that progress in the development of AI will occur in bursts. Previously, the field has seen AI winters, periods of time with significantly reduced investment, interest and research in AI. It’s unclear how likely it is that we’ll see another AI winter — but this possibility should lengthen our guesses about how long it’ll be before we’ve developed transformative AI. Cotra writes about the possibility of an AI winter in part four of her report forecasting transformative AI. New constraints on the rate of growth of AI capabilities, like the availability of training data, could also mean that there’s more time to work on this (Cotra discusses this here.)

      Thirdly, the estimates about when we’ll get transformative AI from Cotra, Kanfosky and Davidson that we looked at earlier were produced by people who already expected that working on preventing an AI-related catastrophe might be one of the world’s most pressing problems. As a result, there’s selection bias here: people who think transformative AI is coming relatively soon are also the people incentivised to carry out detailed investigations. (That said, if the investigations themselves seem strong, this effect could be pretty small.)

      Finally, none of the estimates we discussed earlier were trying to predict when an existential catastrophe might occur. Instead, they were looking at when AI systems might be able to automate all tasks humans can do, or when AI systems might significantly transform the economy. It’s by no means certain that the kinds of AI systems that could transform the economy would be the same advanced planning systems that are core to the argument that AI systems might seek power. Advanced planning systems do seem to be particularly useful, so there is at least some reason to think these might be the sorts of systems that end up being built. But even if the forecasted transformative AI systems are advanced planning systems, it’s unclear how capable such systems would need to be to pose a threat — it’s more than plausible that systems would need to be far more capable to pose a substantial existential threat than they would need to be to transform the economy. This would mean that all the estimates we considered above would be underestimates of how long we have to work on this problem.

      All that said, it might be extremely difficult to find technical solutions to prevent power-seeking behaviour — and if that’s the case, focusing on finding those solutions now does seem extremely valuable.

      Overall, we think that transformative AI is sufficiently likely in the next 10–80 years that it is well worth it (in expected value terms) to work on this issue now. Perhaps future generations will take care of it, and all the work we’d do now will be in vain — we hope so! But it might not be prudent to take that risk.

      If the best AI we have improves gradually over time (rather than AI capabilities remaining fairly low for a while and then suddenly increasing), we’re likely to end up with ‘warning shots’: we’ll notice forms of misaligned behaviour in fairly weak systems, and be able to correct for it before it’s too late.

      In such a gradual scenario, we’ll have a better idea about what form powerful AI might take (e.g. whether it will be built using current deep learning techniques, or something else entirely), which could significantly help with safety research. There will also be more focus on this issue by society as a whole, as the risks of AI become clearer.

      So if gradual development of AI seems more likely, the risk seems lower.

      But it’s very much not certain that AI development will be gradual, or if it is, gradual enough for the risk to be noticeably lower. And even if AI development is gradual, there could still be significant benefits to having plans and technical solutions in place well in advance. So overall we still think it’s extremely valuable to attempt to reduce the risk now.

      If you want to learn more, you can read AI Impacts’ work on arguments for and against discontinuous (i.e. non-gradual) progress in AI development, and Toby Ord and Owen Cotton-Barratt on strategic implications of slower AI development.

      Making something have goals aligned with human designers’ ultimate objectives and making something useful seem like very related problems. If so, perhaps the need to make AI useful will drive us to produce only aligned AI — in which case the alignment problem is likely to be solved by default.

      Ben Garfinkel gave a few examples of this on our podcast:

      • You can think of a thermostat as a very simple AI that attempts to keep a room at a certain temperature. The thermostat has a metal strip in it that expands as the room heats, and cuts off the current once a certain temperature has been reached. This piece of metal makes the thermostat act like it has a goal of keeping the room at a certain temperature, but also makes it capable of achieving this goal (and therefore of being actually useful).
      • Imagine you’re building a cleaning robot with reinforcement learning techniques — that is, you provide some specific condition under which you give the robot positive feedback. You might say something like, “The less dust in the house, the more positive the feedback.” But if you do this, the robot will end up doing things you don’t want — like ripping apart a cushion to find dust on the inside. Probably instead you need to use techniques like those being developed by people working on AI safety (things like watching a human clean a house and letting the AI figure things out from there). So people building AIs will be naturally incentivised to also try to make them aligned (and so in some sense safe), so they can do their jobs.

      If we need to solve the problem of alignment anyway to make useful AI systems, this significantly reduces the chances we will have misaligned but still superficially useful AI systems. So the incentive to deploy a misaligned AI would be a lot lower, reducing the risk to society.

      That said, there are still reasons to be concerned. For example, it seems like we could still be susceptible to problems of AI deception.

      And, as we’ve argued, AI alignment is only part of the overall issue. Solving the alignment problem isn’t the same thing as completely eliminating existential risk from AI, since aligned AI could also be used to bad ends — such as by authoritarian governments.

      As with many research projects in their early stages, we don’t know how hard the alignment problem — or other AI problems that pose risks — are to solve. Someone could believe there are major risks from machine intelligence, but be pessimistic about what additional research or policy work will accomplish, and so decide not to focus on it.

      This is definitely a reason to potentially work on another issue — the solvability of an issue is a key part of how we try to compare global problems. For example, we’re also very concerned about risks from pandemics, and it may be much easier to solve that issue.

      That said, we think that given the stakes, it could make sense for many people to work on reducing AI risk, even if you think the chance of success is low. You’d have to think that it was extremely difficult to reduce risks from AI in order to conclude that it’s better just to let the risks materialise and the chance of catastrophe play out.

      At least in our own case at 80,000 Hours, we want to keep trying to help with AI safety — for example, by writing profiles like this one — even if the chance of success seems low (though in fact we’re overall pretty optimistic).

      There are some reasons to think that the core argument that any advanced, strategically aware planning system will by default seek power (which we gave here) isn’t totally right.46

      1. For a start, the argument that advanced AI systems will seek power relies on the idea that systems will produce plans to achieve goals. We’re not quite sure what this means — and as a result, we’re not sure what properties are really required for power-seeking behaviour to occur, and unsure whether the things we’ll build will have those properties.

        We’d love to see a more in-depth analysis of what aspects of planning are economically incentivised, and whether those aspects seem like they’ll be enough for the argument for power-seeking behaviour to work.

        Grace has written more about the ambiguity around “how much goal-directedness is needed to bring about disaster”

      2. It’s possible that only a few goals that AI systems could have would lead to misaligned power-seeking.

        Richard Ngo, in his analysis of what people mean by “goals”, points out that you’ll only get power-seeking behaviour if you have goals that mean the system can actually benefit from seeking power. Ngo suggests that these goals need to be “large-scale.” (Some have argued that, by default, we should expect AI systems to have “short-term” goals that won’t lead to power-seeking behaviour.)

        But whether an AI system would plan to take power depends on how easy it would be for the system to take power, because the easier it is for a system to take power, the more likely power-seeking plans are to be successful — so a good planning system would be more likely to choose them. This suggests it will be easier to accidentally create a power-seeking AI system as systems’ capabilities increase.

        So there still seems to be cause for increased concern, because the capabilities of AI systems do seem to be increasing fast. There are two considerations here: if few goals really lead to power-seeking, even for quite capable AI systems, that significantly reduces the risk and thus the importance of the problem. But it might also increase the solvability of the problem by demonstrating that solutions could be easy to find (e.g. the solution of never giving systems “large-scale” goals) — making this issue more valuable for people to work on.

      3. Earlier we argued that we can expect AI systems to do things that seem generally instrumentally useful to their overall goal, and that as a result it could be hard to prevent AI systems from doing these instrumentally useful things.

        But we can find examples where how generally instrumentally useful things would be doesn’t seem to affect how hard it is to prevent these things. Consider an autonomous car that can move around only if its engine is on. For many possible goals (other than, say, turning the car radio on), it seems like it would be useful for the car to be able to move around, so we should expect the car to turn its engine on. But despite that, we might still be able to train the car to keep its engine off: for example, we can give it some negative feedback whenever it turns the engine on, even if we also had given the car some other goals. Now imagine we improve the car so that its top speed is higher — this massively increases the number of possible action sequences that involve, as a first step, turning its engine on. In some sense, this seems to increase the instrumental usefulness of turning the engine on — there are more possible actions the car can take, once its engine is on, because the range of possible speeds it can travel at is higher. (It’s not clear if this sense of “instrumental usefulness” is the same as the one in the argument for the risk, although it does seem somewhat related.) But it doesn’t seem like this increase in the instrumental usefulness of turning on the engine makes it much harder to stop the car turning it on. Simple examples like this cast some doubt on the idea that, just because a particular action is instrumentally useful, we won’t be able to find ways to prevent it. (For more on this example, see page 25 of Garfinkel’s review of Carlsmith’s report.)

      4. Humans are clearly highly intelligent, but it’s unclear they are perfect goal-optimisers. For example, humans often face some kind of existential angst over what their true goals are. And even if we accept humans as an example of a strategically aware agent capable of planning, humans certainly aren’t always power-seeking. We obviously care about having basics like food and shelter, and many people go to great lengths for more money, status, education, or even formal power. But some humans choose not to pursue these goals, and pursuing them doesn’t seem to correlate with intelligence.

        However, this doesn’t mean that the argument that there will be an incentive to seek power is wrong. Most people do face and act on incentives to gain forms of influence via wealth, status, promotions, and so on. And we can explain the observation that humans don’t usually seek huge amounts of power by observing that we aren’t usually in circumstances that make the effort worth it.

        For example, most people don’t try to start billion-dollar companies — you probably won’t succeed, and it’ll cost you a lot of time and effort.

        But you’d still walk across the street to pick up a billion-dollar cheque.

      The absence of extreme power-seeking in many humans, along with uncertainties in what it really means to plan to achieve goals, does suggest that the argument we gave that advanced AI systems will seek power above might not be completely correct. And they also suggest that, if there really is a problem to solve here, in principle, alignment research into preventing power-seeking in AIs could succeed.

      This is good news! But for the moment — short of hoping we’re wrong about the existence of the problem — we don’t actually know how to prevent this power-seeking behaviour.

      Arguments against working on AI risk to which we think there are strong responses

      We’ve just discussed the major objections to working on AI risk that we think are most persuasive. In this section, we’ll look at objections that we think are less persuasive, and give some reasons why.

      People have been saying since the 1950s that artificial intelligence smarter than humans is just around the corner.

      But it hasn’t happened yet.

      One reason for this could be that it’ll never happen. Some have argued that producing artificial general intelligence is fundamentally impossible. Others think it’s possible, but unlikely to actually happen, especially not with current deep learning methods.

      Overall, we think the existence of human intelligence shows it’s possible in principle to create artificial intelligence. And the speed of current advances isn’t something we think would have been predicted by those who thought that we’ll never develop powerful, general AI.

      But most importantly, the idea that you need fully general intelligent AI systems for there to be a substantial existential risk is a common misconception.

      The argument we gave earlier relied on AI systems being as good or better than humans in a subset of areas: planning, strategic awareness, and areas related to seeking and keeping power. So as long as you think all these things are possible, the risk remains.

      And even if no single AI has all of these properties, there are still ways in which we might end up with systems of ‘narrow’ AI systems that, together, can disempower humanity. For example, we might have a planning AI that develops plans for a company, a separate AI system that measures things about the company, another AI system that attempts to evaluate plans from the first AI by predicting how much profit each will make, and further AI systems that carry out those plans (for example, by automating the building and operation of factories). Considered together, this system as a whole has the capability to form and carry out plans to achieve some goal, and potentially also has advanced capabilities in areas that help it seek power.

      It does seem like it will be easier to prevent these ‘narrow’ AI systems from seeking power. This could happen if the skills the AIs have, even when combined, don’t add up to being able to plan to achieve goals, or if the narrowness reduces the risk of systems developing power-seeking plans (e.g. if you build systems that can only produce very short-term plans). It also seems like it gives another point of weakness for humans to intervene if necessary: the coordination of the different systems.

      Nevertheless, the risk remains, even from systems of many interacting AIs.

      It might just be really, really hard.

      Stopping people and computers from running software is already incredibly difficult.

      Think about how hard it would be to shut down Google’s web services. Google’s data centres have millions of servers over 34 different locations, many of which are running the same sets of code. And these data centres are absolutely crucial to Google’s bottom line, so even if Google could decide to shut down their entire business, they probably wouldn’t.

      Or think about how hard it is to get rid of computer viruses that autonomously spread between computers across the world.

      Ultimately, we think any dangerous power-seeking AI system will be looking for ways to not be turned off, which makes it more likely we’ll be in one of these situations, rather than in a case where we can just unplug a single machine.

      That said, we absolutely should try to shape the future of AI such that we can ‘unplug’ powerful AI systems.

      There may be ways we can develop systems that let us turn them off. But for the moment, we’re not sure how to do that.

      Ensuring that we can turn off potentially dangerous AI systems could be a safety measure developed by technical AI safety research, or it could be the result of careful AI governance, such as planning coordinated efforts to stop autonomous software once it’s running.

      We could (and should!) definitely try.

      If we could successfully ‘sandbox’ an advanced AI — that is, contain it to a training environment with no access to the real world until we were very confident it wouldn’t do harm — that would help our efforts to mitigate AI risks tremendously.

      But there are a few things that might make this difficult.

      For a start, we might only need one failure — like one person to remove the sandbox, or one security vulnerability in the sandbox we hadn’t noticed — for the AI system to begin affecting the real world.

      Moreover, this solution doesn’t scale with the capabilities of the AI system. This is because:

      • More capable systems are more likely to be able to find vulnerabilities or other ways of leaving the sandbox (e.g. threatening or coercing humans).
      • Systems that are good at planning might attempt to deceive us into deploying them.

      So the more dangerous the AI system, the less likely sandboxing is to be possible. That’s the opposite of what we’d want from a good solution to the risk.

      For some definitions of “truly intelligent” — for example, if true intelligence includes a deep understanding of morality and a desire to be moral — this would probably be the case.

      But if that’s your definition of truly intelligent, then it’s not truly intelligent systems that pose a risk. As we argued earlier, it’s advanced systems that can plan and have strategic awareness that pose risks to humanity.

      With sufficiently advanced strategic awareness, an AI system’s excellent understanding of the world may well encompass an excellent understanding of people’s moral beliefs. But that’s not a strong reason to think that such a system would act morally.

      For example, when we learn about other cultures or moral systems, that doesn’t necessarily create a desire to follow their morality. A scholar of the Antebellum South might have a very good understanding of how 19th century slave owners justified themselves as moral, but would be very unlikely to defend slavery.

      AI systems with excellent understandings of human morality could be even more dangerous than AIs without such understanding: the AI system could act morally at first as a way to deceive us into thinking that it is safe.

      There are definitely dangers from current artificial intelligence.

      For example, data used to train neural networks often contains hidden biases. This means that AI systems can learn these biases — and this can lead to racist and sexist behaviour.

      There are other dangers too. Our earlier discussion on nuclear war explains a threat which doesn’t require AI systems to have particularly advanced capabilities.

      But we don’t think the fact that there are also risks from current systems is a reason not to prioritise reducing existential threats from AI, if they are sufficiently severe.

      As we’ve discussed, future systems — not necessarily superintelligence or totally general intelligence, but systems advanced in their planning and power-seeking capabilities — seem like they could pose threats to the existence of the entirety of humanity. And it also seems somewhat likely that we’ll produce such systems this century.

      What’s more, lots of technical AI safety research is also relevant to solving problems with existing AI systems. For example, some research focuses on ensuring that ML models do what we want them to, and will still do this as their size and capabilities increase; other research tries to work out how and why existing models are making the decisions and taking the actions that they do.

      As a result, at least in the case of technical research, the choice between working on current threats and future risks may look more like a choice between only ensuring that current models are safe, or instead finding ways to ensure that current models are safe that will also continue to work as AI systems become more complex and more intelligent.

      Ultimately, we have limited time in our careers, so choosing which problem to work on could be a huge way of increasing your impact. When there are such substantial threats, it seems reasonable for many people to focus on addressing these worst-case possibilities.

      Yes, it can.

      AI systems are already improving healthcare, putting driverless cars on the roads, and automating household chores.

      And if we’re able to automate advancements in science and technology, we could see truly incredible economic and scientific progress. AI could likely help solve many of the world’s most pressing problems.

      But, just because something can do a lot of good, that doesn’t mean it can’t also do a lot of harm. AI is an example of a dual-use technology — a technology that can be used for both dangerous and beneficial purposes. For example, researchers were able to get an AI model that was trained to develop medical drugs to instead generate designs for bioweapons.

      We are excited and hopeful about seeing large benefits from AI. But we also want to work hard to minimise the enormous risks advanced AI systems pose.

      It’s undoubtedly true that some people are drawn to thinking about AI safety because they like computers and science fiction — as with any other issue, there are people working on it not because they think it’s important, but because they think it’s cool.

      But, for many people, working on AI safety comes with huge reluctance.

      For me, and many of us at 80,000 Hours, spending our limited time and resources working on any cause that affects the long-run future — and therefore not spending that time on the terrible problems in the world today — is an incredibly emotionally difficult thing to do.

      But we’ve gradually investigated these arguments (in the course of trying to figure out how we can do the most good), and over time both gained more expertise about AI and became more concerned about the risk.

      We think scepticism is healthy, and are far from certain that these arguments completely work. So while this suspicion is definitely a reason to dig a little deeper, we hope that, ultimately, this worry won’t be treated as a reason to deprioritise what may well be the most important problem of our time.

      That something sounds like science fiction isn’t a reason in itself to dismiss it outright. There are loads of examples of things first mentioned in sci-fi that then went on to actually happen (this list of inventions in science fiction contains plenty of examples).

      There are even a few such cases involving technology that are real existential threats today:

      • In his 1914 novel The World Set Free, H. G. Wells predicted atomic energy fueling powerful explosives — 20 years before we realised there could in theory be nuclear fission chain reactions, and 30 years before nuclear weapons were actually produced. In the 1920s and 1930s, Nobel Prize–winning physicists Millikan, Rutherford, and Einstein all predicted that we would never be able to use nuclear power. Nuclear weapons were literal science fiction before they were reality.
      • In the 1964 film Dr. Strangelove, the USSR builds a doomsday machine that would automatically trigger an extinction-level nuclear event in response to a nuclear strike, but keeps it secret. Dr Strangelove points out that keeping it secret rather reduces its deterrence effect. But we now know that in the 1980s the USSR built an extremely similar system… and kept it secret.

      Moreover, there are top academics and researchers working on preventing these risks from AI — at MIT, Cambridge, Oxford, UC Berkeley, and elsewhere. Two of the world’s top AI labs (DeepMind and OpenAI) have teams explicitly dedicated to working on technical AI safety. Researchers from these places helped us with this article.

      It’s totally possible all these people are wrong to be worried, but the fact that so many people take this threat seriously undermines the idea that this is merely science fiction.

      It’s reasonable when you hear something that sounds like science fiction to want to investigate it thoroughly before acting on it. But having investigated it, if the arguments seem solid, then simply sounding like science fiction is not a reason to dismiss them.

      We never know for sure what’s going to happen in the future. So, unfortunately for us, if we’re trying to have a positive impact on the world, that means we’re always having to deal with at least some degree of uncertainty.

      We also think there’s an important distinction between guaranteeing that you’ve achieved some amount of good and doing the very best you can. To achieve the former, you can’t take any risks at all — and that could mean missing out on the best opportunities to do good.

      When you’re dealing with uncertainty, it makes sense to roughly think about the expected value of your actions: the sum of all the good and bad potential consequences of your actions, weighted by their probability.

      Given the stakes are so high, and the risks from AI aren’t that low, this makes the expected value of helping with this problem high.

      We’re sympathetic to the concern that if you work on AI safety, you might end up doing not much at all when you might have done a tremendous amount of good working on something else — simply because the problem and our current ideas about what to do about it are so uncertain.

      But we think the world will be better off if we decide that some of us should work on solving this problem, so that together we have the best chance of successfully navigating the transition to a world with advanced AI rather than risking an existential crisis.

      And it seems like an immensely valuable thing to try.

      Pascal’s mugging is a thought experiment — a riff on the famous Pascal’s wager — where someone making decisions using expected value calculations can be exploited by claims that they can get something extraordinarily good (or avoid something extraordinarily bad), with an extremely low probability of succeeding.

      The story goes like this: a random mugger stops you on the street and says, “Give me your wallet or I’ll cast a spell of torture on you and everyone who has ever lived.” You can’t rule out with 100% probability that he won’t — after all, nothing’s 100% for sure. And torturing everyone who’s ever lived is so bad that surely even avoiding a tiny, tiny probability of that is worth the $40 in your wallet? But intuitively, it seems like you shouldn’t give your wallet to someone just because they threaten you with something completely implausible.

      Analogously, you could worry that working on AI safety means giving your valuable time to avoid a tiny, tiny chance of catastrophe. Working on reducing risks from AI isn’t free — the opportunity cost is quite substantial, as it means you forgo working on other extremely important things, like reducing risks from pandemics or ending factory farming.

      Here’s the thing though: while there’s lots of value at stake — perhaps the lives of everybody alive today, and the entirety of the future of humanity — it’s not the case that the probability that you can make a difference by working on reducing risks from AI is small enough for this argument to apply.

      We wish the chance of an AI catastrophe was that vanishingly small.

      Instead, we think the probability of such a catastrophe (I think, around 1% this century) is much, much larger than things that people try to prevent all the time — such as fatal plane crashes, which happen in 0.00002% of flights.

      What really matters, though, is the extent to which your work can reduce the chance of a catastrophe.

      Let’s look at working on reducing risks from AI. For example, if:

      1. There’s a 1% chance of an AI-related existential catastrophe by 2100
      2. There’s a 30% chance that we can find a way to prevent this by technical research
      3. Five people working on technical AI safety raises the chances of solving the problem by 1% of that 30% (so 0.3 percentage points)

      Then each person involved has a 0.00006 percentage point share in preventing this catastrophe.

      Other ways of acting altruistically involve similarly sized probabilities.

      The chances of a volunteer campaigner swinging a US presidential election is somewhere between 0.001% and 0.00001%. But you can still justify working on a campaign because of the large impact you expect you’d have on the world if your preferred candidate won.

      You have even lower chances of wild success from things like trying to reform political institutions, or working on some very fundamental science research to build knowledge that might one day help cure cancer.

      Overall, as a society, we may be able to reduce the chance of an AI-related catastrophe all the way down from 10% (or higher) to close to zero — that’d be clearly worth it for a group of people, so it has to be worth it for the individuals, too.

      We wouldn’t want to just not do fundamental science because each researcher has a low chance of making the next big discovery, or not do any peacekeeping because any one person has a low chance of preventing World War III. As a society, we need some people working on these big issues — and maybe you can be one of them.

      What you can do concretely to help

      As we mentioned above, we know of two main ways to help reduce existential risks from AI:

      1. Technical AI safety research
      2. AI strategy/policy research and implementation

      The biggest way you could help would be to pursue a career in either one of these areas, or in a supporting area.

      The first step is learning a lot more about the technologies, problems, and possible solutions. We’ve collated some lists of our favourite resources here, and our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

      If you decide to pursue a career in this area, we’d generally recommend working at an organisation focused on specifically addressing this problem (though there are other ways to help besides working at existing organisations, as we discuss briefly below).

      Technical AI safety

      Approaches

      There are lots of approaches to technical AI safety, including:

      See Neel Nanda’s overview of the AI alignment landscape for more details.

      Key organisations

      AI labs in industry that have empirical technical safety teams, or are focused entirely on safety:

      • Anthropic is an AI safety company working on building interpretable and safe AI systems. They focus on empirical AI safety research. Anthropic cofounders Daniela and Dario Amodei gave an interview about the lab on the Future of Life Institute podcast. On our podcast, we spoke to Chris Olah, who leads Anthropic’s research into interpretability, and Nova DasSarma, who works on systems infrastructure at Anthropic.
      • ARC Evals works on assessing whether cutting-edge AI systems could pose catastrophic risks to civilization, including early-stage, experimental work to develop techniques, and evaluating systems produced by Anthropic and OpenAI.
      • The Center for AI Safety is a nonprofit that does technical research and promotion of safety in the wider machine learning community.
      • FAR AI is a research nonprofit that incubates and accelerates research agendas that are too resource-intensive for academia but not yet ready for commercialisation by industry, including research in adversarial robustness, interpretability and preference learning.
      • Google DeepMind is probably the largest and most well-known research group developing general artificial machine intelligence, and is famous for its work creating AlphaGo, AlphaZero, and AlphaFold. It is not principally focused on safety, but has two teams focused on AI safety, with the Scalable Alignment Team focusing on aligning existing state-of-the-art systems, and the Alignment Team focused on research bets for aligning future systems.
      • OpenAI, founded in 2015, is a lab that is trying to build artificial general intelligence that is safe and benefits all of humanity. OpenAI is well known for its language models like GPT-4. Like DeepMind, it is not principally focused on safety, but has a safety team and a governance team. Jan Leike (head of the alignment team) has some blog posts on how he thinks about AI alignment.
      • Ought is a machine learning lab building Elicit, an AI research assistant. Their aim is to align open-ended reasoning by learning human reasoning steps, and to direct AI progress towards helping with evaluating evidence and arguments.
      • Redwood Research is an AI safety research organisation, whose first big project attempted to make sure language models (like GPT-3) produce output following certain rules with very high probability, in order to address failure modes too rare to show up in standard training.

      Theoretical / conceptual AI safety labs:

      • The Alignment Research Center (ARC) is attempting to produce alignment strategies that could be adopted in industry today while also being able to scale to future systems. They focus on conceptual work, developing strategies that could work for alignment and which may be promising directions for empirical work, rather than doing empirical AI work themselves. Their first project was releasing a report on Eliciting Latent Knowledge, the problem of getting advanced AI systems to honestly tell you what they believe (or ‘believe’) about the world. On our podcast, we interviewed ARC founder Paul Christiano about his research (before he founded ARC).
      • The Center on Long-Term Risk works to address worst-case risks from advanced AI. They focus on conflict between AI systems.
      • The Machine Intelligence Research Institute was one of the first groups to become concerned about the risks from machine intelligence in the early 2000s, and its team has published a number of papers on safety issues and how to resolve them.
      • Some teams in commercial labs also do some more theoretical and conceptual work on alignment, such as Anthropic’s work on conditioning predictive models and the Causal Incentives Working Group at Google DeepMind.

      AI safety in academia (a very non-comprehensive list; while the number of academics explicitly and publicly focused on AI safety is small, it’s possible to do relevant work at a much wider set of places):

      If you’re interested in learning more about technical AI safety as an area — e.g. the different techniques, schools of thought, and threat models — our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

      We discuss this path in more detail here:

      Career review of technical AI safety research

      Alternatively, if you’re looking for something more concrete and step-by-step (with very little in the way of introduction), check out this detailed guide to pursuing a career in AI alignment.

      It’s important to note that you don’t have to be an academic or an expert in AI or AI safety to contribute to AI safety research. For example, software engineers are needed at many places conducting technical safety research, and we also highlight more roles below.

      AI governance and strategy

      Approaches

      Quite apart from the technical problems, we face a host of governance issues, which include:

      • Coordination problems that are increasing the risks from AI (e.g. there could be incentives to use AI for personal gain in ways that can cause harm, or race dynamics that reduce incentives for careful and safe AI development).
      • Risks from accidents or misuse of AI that would be dangerous even if we are able to prevent power-seeking behaviour (as discussed above).
      • A lack of clarity on how and when exactly risks from AI (particularly power-seeking AI) might play out.
      • A lack of clarity on which intermediate goals we could pursue that, if achieved, would reduce existential risk from AI.

      To tackle these, we need a combination of research and policy.48

      We are in the early stages of figuring out the shape of this problem and the most effective ways to tackle it. So it’s crucial that we do more research. This includes forecasting research into what we should expect to happen, and strategy and policy research into the best ways of acting to reduce the risks.

      But also, as AI begins to impact our society more and more, it’ll be crucial that governments and corporations have the best policies in place to shape its development. For example, governments might be able to enforce agreements not to cut corners on safety, further the work of researchers who are less likely to cause harm, or cause the benefits of AI to be distributed more evenly. So there eventually might be a key role to be played in advocacy and lobbying for appropriate AI policy — though we’re not yet at the point of knowing what policies would be useful to implement.

      Key organisations

      AI strategy and policy organisations:

      If you’re interested in learning more about AI governance, our top recommendation is to take a look at the governance curriculum from AGI safety fundamentals.

      We discuss this path in more detail here:

      Career review of AI strategy and policy careers

      Also note: it could be particularly important for people with the right personal fit to work on AI strategy and governance in China.

      Complementary (yet crucial) roles

      Even in a research organisation, around half of the staff will be doing other tasks essential for the organisation to perform at its best and have an impact. Having high-performing people in these roles is crucial.

      We think the importance of these roles is often underrated because the work is less visible. So we’ve written several career reviews on these areas to help more people enter these careers and succeed, including:

      Other ways to help

      AI safety is a big problem and it needs help from people doing a lot of different kinds of work.

      One major way to help is to work in a role that directs funding or people towards AI risk, rather than working on the problem directly. We’ve reviewed a few career paths along these lines, including:

      There are ways all of these could go wrong, so the first step is to become well-informed about the issue.

      There are also other technical roles besides safety research that could help contribute, like:

      • Working in information security to protect AI (or the results of key experiments) from misuse, theft, or tampering.
      • Becoming an expert in AI hardware as a way of steering AI progress in safer directions.

      You can read about all these careers — why we think they’re helpful, how to enter them, and how you can predict whether they’re a good fit for you — on our career reviews page.

      Want one-on-one advice on pursuing this path?

      We think that the risks posed by the development of AI may be the most pressing problem the world currently faces. If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.

      We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.

      APPLY TO SPEAK WITH OUR TEAM

      Find vacancies on our job board

      Our job board features opportunities in AI technical safety and governance:

        View all opportunities

        Top resources to learn more

        We’ve hit you with a lot of further reading throughout this article — here are a few of our favourites:

        On The 80,000 Hours Podcast, we have a number of in-depth interviews with people actively working to positively shape the development of artificial intelligence:

        If you want to go into much more depth, the AGI safety fundamentals course is a good starting point. There are two tracks to choose from: technical alignment or AI governance. If you have a more technical background, you could try Intro to ML Safety, a course from the Center for AI Safety.

        And finally, here are a few general sources (rather than specific articles) that you might want to explore:

        • The AI Alignment Forum, which is aimed at researchers working in technical AI safety.
        • AI Impacts, a project that aims to improve society’s understanding of the likely impacts of human-level artificial intelligence.
        • The Alignment Newsletter, a weekly publication with recent content relevant to AI alignment with thousands of subscribers.
        • Import AI, a weekly newsletter about artificial intelligence by Jack Clark (cofounder of Anthropic), read by more than 10,000 experts.
        • Jeff Ding’s ChinAI Newsletter, weekly translations of writings from Chinese thinkers on China’s AI landscape.

        Acknowledgements

        Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlmüller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing this article or their extremely thoughtful and helpful comments and conversations. (This isn’t to say that they would all agree with everything we’ve said here — in fact, we’ve had many spirited disagreements in the comments on this article!)

        The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.

        ]]>
        What Bing’s chatbot can tell us about AI risk — and what it can’t https://80000hours.org/2023/03/what-bings-chatbot-can-tell-us-about-ai-risk-and-what-it-cant/ Sun, 05 Mar 2023 11:27:06 +0000 https://80000hours.org/?p=80905 The post What Bing’s chatbot can tell us about AI risk — and what it can’t appeared first on 80,000 Hours.

        ]]>
        You may have seen the new Bing. It’s impressive — and, reportedly, unhinged: manipulating people, threatening users and even telling one reporter it loved him.

        You may also have seen me writing about the risk of an AI-related catastrophe.

        I’m not just concerned about AI going wrong in minor ways: I think that there’s a small but possible chance of an existential catastrophe caused by AI within the next 100 years.

        This blog post was first released to our newsletter subscribers.

        Join over 350,000 newsletter subscribers who get content like this in their inboxes every two weeks — and we’ll also mail you a free book!

        Here’s my view on Bing:

        Bing does tell us a little about how careful we can expect large corporations to be when deploying AI systems.

        But Bing isn’t very dangerous, and isn’t an example of the sorts of misaligned AI that we should be most worried about.

        (Before moving on, I should disclose that my brother, Jacob Hilton, used to work for OpenAI, the AI lab behind both Bing and ChatGPT.)

        How does Bing chat work?

        Bing chat (like ChatGPT) is based on a large language model.

        A large language model is a machine learning algorithm that is basically trained to continue whatever text it is given as input. It writes an article from a headline or continues a poem from the first few lines.

        But a good chat assistant would do more than just continue whatever text you give it. So, somehow, engineers need to take a large language model and turn it into a kind, helpful, useful chatbot.

        With ChatGPT, engineers used reinforcement learning from human feedback.1

        Essentially, human labellers marked different outputs of the language model as ‘good’ or ‘bad,’ and the model was given this feedback.

        They gave the model positive feedback for things like acting according to a coherent chatbot personality, giving helpful and polite answers, and refusing to answer questions that might be harmful.

        And they gave the model negative feedback for things like discriminatory or racist language, or giving out dangerous information, like how to build weapons.

        Gradually — in a stochastic way we can’t really predict — the model becomes increasingly likely to produce the kinds of answers that were labelled good, and less likely to produce the answers that were labelled bad.

        What went wrong with Bing?

        We don’t really know. But, as with most products that end up full of mistakes, there could have been a combination of factors:

        • Internal pressure within Microsoft to ship Bing chat fast (especially before Google does something similar), and, as a result, before it was ready
        • Failures of communication between OpenAI and Microsoft
        • Complications resulting from whatever technical requirements Microsoft had for the project
        • Maybe other stuff we don’t know about

        And so, as a result, Bing chat was — intentionally or not — released before it was ready.

        On a technical level, it seems pretty clear that while Bing chat is based on a powerful language model, things have gone wrong at the second stage: turning the powerful language model into a helpful chat assistant.

        If Bing — like ChatGPT — was built using reinforcement learning from human feedback, then I think the most likely possibilities are that:

        • They just didn’t have (or use) enough data for the model to learn enough.
        • The data they used was too low quality for the model to learn enough.
        • They used a worse version of the reinforcement learning from human feedback algorithm.

        I’ve seen speculation that Bing didn’t use reinforcement learning from human feedback at all, but instead a much simpler process.

        But whatever the cause, what seems to have happened is that Bing chat learned parts of what it needed to learn, but not all of it:

        • Bing is often polite, and ends even its most threatening messages with smiley emojis.
        • Bing seems to have a (sort of) coherent personality as a chatbot — it doesn’t just continue text.
        • Bing often refuses to have conversations on racist or otherwise harmful topics.

        But, because of the deficiencies of its training process, Bing does all of these things badly.

        Is this an alignment failure?

        When I wrote about the risk of an AI-related catastrophe, I was chiefly concerned about alignment failures: that is, AI systems aiming to do things we don’t want them to do.

        Clearly, Bing is an example of a system doing things that its designers didn’t want it to do.

        But for misalignment to lead to the kinds of catastrophe I describe here, the system needs to aim to do these things: that is, the system needs to have some kind of coherent concept of a ‘goal’ and make plans to achieve that goal.

        This goal-seeking behaviour is important because of the instrumental convergence argument. That is, if a system truly ‘has a goal,’2 it will also have instrumental goals such as self-preservation and seeking power. (See a longer explanation of this argument.)

        It’s this last quality — seeking power — which concerns me most, because one particularly effective way of taking power permanently would be by causing an existential catastrophe (Sceptical? See a longer explanation of this part of the argument too.)

        I don’t think it’s very likely that Bing ‘has goals’ in the sense relevant to this argument; as I explained earlier, I think there are much better explanations for why Bing went wrong. So I don’t think Bing threatening users is an example of the kind of behaviour we should be most concerned about.

        Is Bing an example of ‘letting the AI out of the box‘?

        One big reason to be less worried about AI-related catastrophe is that generally there are incentives not to deploy particularly dangerous systems.

        People more concerned about the risk of AI-related catastrophe tend to argue that it’ll be difficult to ‘sandbox’ AI systems (i.e. contain them in a restricted environment) or that companies would, for a variety of reasons, be reckless (more on sandboxing dangerous AI systems).

        Bing does look like a good example of a large corporation failing to adequately test an AI product before its release.

        Ultimately, though, Bing isn’t a particularly dangerous system. It can be mean and cruel, and it can grant access to some harmful information — but it won’t tell you things you can’t easily find on the internet anyway.

        My best guess is that the incentives for companies to be careful increase as the potential danger of systems increases. Bing’s failures seem much more like shipping software full of bugs (not unlike the famously buggy 2014 release of Assassin’s Creed Unity) than recklessly letting loose an AI system that could kill people, or worse.

        As a result, I don’t think Bing tells us much about whether corporations will “let AI systems out of the box,” if we start making systems that could be more dangerous.

        So is there anything we can learn about dangerous AI from Bing?

        I’ve argued that:

        • Bing isn’t a misalignment failure.
        • Bing doesn’t tell us much about how we’ll manage possibly highly dangerous AI systems, because it’s not a possibly highly dangerous AI system.

        So what does Bing tell us?

        I think Bing is evidence that the AI race is heating up, and that corporations — including OpenAI, Microsoft and others — may, in an attempt to win that race, deploy systems before they’re ready.

        That seems likely to be true of all AI systems, whether they’re dangerous or not.

        This is especially worrying if we think people might accidentally create a dangerous system without realising it. I don’t know how likely that is. My best guess is that it’s unlikely that we’ll accidentally build systems dangerous enough to cause large catastrophes.

        But it’s certainly possible — and a possibility worth taking seriously.

        More about the problem, and how you might be able to use your career to help:

        Learn more

        The post What Bing’s chatbot can tell us about AI risk — and what it can’t appeared first on 80,000 Hours.

        ]]>
        Is the world getting better or worse? https://80000hours.org/2023/02/is-the-world-getting-better-or-worse/ Thu, 09 Feb 2023 20:32:55 +0000 https://80000hours.org/?p=80568 The post Is the world getting better or worse? appeared first on 80,000 Hours.

        ]]>

        The question this week: is the world getting better or worse?

        In a nutshell: while there are some positive global trends, there are important ways the world is getting worse. We’ve looked at how your career can help solve these, in particular:

        This blog post was first released to our newsletter subscribers.

        Join over 350,000 newsletter subscribers who get content like this in their inboxes every two weeks — and we’ll also mail you a free book!

        Three ways the world’s getting better

        1. Poverty has decreased.

        Lots of stats about trends in the world – even ones that seem good to some people – are complicated to evaluate overall.1

        But here’s a long-term trend, based on solid data, that seems uncontroversially good:

        Living in extreme poverty is exceedingly difficult. And it’s not just the share of the population in extreme poverty that’s fallen. Since 1990, the absolute number has fallen too.

        2. We’re healthier than ever before.

        For a start, child mortality rates have fallen steeply in the last 100 years, as has the absolute number of children dying before reaching the age of five.

        We’re close to eradicating polio and guinea worm disease, and we’re gradually getting a grip on malaria.

        Overall, life expectancy in every continent is at its highest point ever and is increasing.

        3. Renewable energy generation is rising.

        While global temperatures are soaring and carbon dioxide emissions continue to increase, we’re also producing more renewable energy than ever before:

        The share of energy produced by renewables has been increasing since around 2005. And the fact that renewable energy has rapidly become so cheap relative to fossil fuels suggests this trend is likely to continue.


        Three ways the world’s getting worse

        1. Not everything’s getting better for humans. For example, the number of people living in autocracies is increasing.

        This one’s slightly more controversial, because exactly how you classify a country as an autocracy rather than a democracy is tricky (see here for more on how researchers measure democracy). And the share of people living in autocracies is also decreasing.

        But the absolute number of people living in autocracies has been rising since around 1950.

        It’s also possible that autocracies don’t treat people as badly as they did in the past – but there again, maybe they treat people worse.

        2. Non-human animals exploited by humans suffer worse conditions and in greater numbers than ever.

        At this moment there are likely over 100 billion animals living in captivity, largely living in the abhorrent conditions of factory farms.

        The numbers are even more shocking if you look at deaths rather than the number alive at any one moment: we’re slaughtering many hundreds of billions – and according to higher estimates, possibly trillions – of animals a year in our farms.

        Meat consumption tends to rise as we get richer. So it looks like global meat consumption is going to continue to rise – although there are some ways we might work to prevent this, which we discuss in our article on factory farming.

        3. The risk of catastrophes is probably higher than in any previous century.

        Since 1945, we’ve possessed the capacity to destroy ourselves. The number of countries with nuclear weapons rose rapidly until the 1960s and hasn’t decreased (the absolute number of nuclear warheads has fallen since the 1980s, although there are still more than enough to cause a major catastrophe).

        And we face new threats as we develop more dangerous technology. We’re particularly concerned about engineered pandemics and the possibility of an AI-related catastrophe. It’s possible new developments will create threats we haven’t even imagined.

        We think that the problem you work on in your career is the biggest driver of your impact. And we think that these existential risks are the biggest problems we currently face.


        Overall, it’s really unclear whether the world is getting better or worse — any conclusion is going to be based on some difficult subjective judgements, like how much we should care about the lives of non-human animals.

        Personally, my best guess is that the negative effects of factory farming alone make the world worse than it’s ever been. That’s before we consider things like pandemic risks or the risk of nuclear war.

        That said, this also provides an opportunity. There are ways we can work to fix these problems — and ways in which you could use your career to help.

        Learn more:

        The post Is the world getting better or worse? appeared first on 80,000 Hours.

        ]]>
        Open position: Content associate https://80000hours.org/2023/01/open-position-content-associate/ Mon, 23 Jan 2023 11:15:07 +0000 https://80000hours.org/?p=80457 The post Open position: Content associate appeared first on 80,000 Hours.

        ]]>

        80,000 Hours is looking for a content associate to help us improve and grow the impact of the 80,000 Hours website, which provides free research and advice to help people use their careers to address the world’s most pressing problems.

        As a content associate, you would review 80,000 Hours articles to ensure they are up to date, well presented, and error free, help us collect user feedback, provide analytical support, and generate ideas for the team (more on responsibilities below).

        We’d like to support the person in this role to take on more responsibility over time; one possibility would be to move to become the primary author / researcher of new articles.

        About the 80,000 Hours web team

        80,000 Hours provides free research and support to help people find careers tackling the world’s most pressing problems.

        We’ve had over 10 million visitors to our website (with over 100,000 hours of reading time per year), and more than 3,000 people have told us that they’ve significantly changed their career plans due to our work. We’re also the largest single source of people getting involved in the effective altruism community, according to the most recent EA Community Survey.

        Our articles are read by thousands, and are among the most important ways we help people shift their careers towards higher-impact options.

        The role

        As a content associate, you would:

        • Support the 80,000 Hours web team flexibly across a range of articles and projects.
        • Proofread 80,000 Hours articles before release, suggest style improvements, and check for errors.
        • Upload new articles and make changes to the site.
        • Ensure that our newsletters are sent out error-free and on time to the over 250,000 people on our mailing list.
        • Provide analytical support for the team, improving our ability to use data to measure and increase our impact.
        • Manage the gathering of feedback on our website from both readers and subject matter experts.
        • Generate ideas for new pieces.
        • Generally help grow the impact of the site.

        Some of the types of pieces you could work on include:

        As some indication of what success in the role might look like, over the next year, you might do things like:

        • Provide substantive comments and improvements for new research on various under-explored problem areas — e.g. whole brain emulation, wild animal suffering, or potential existential risks from totalitarianism.
        • Restructure our systems for measuring the impact of the 80,000 Hours website, including both data-driven analysis and interviews of users and experts.
        • Help us streamline our writing and editing workflows.
        • Work on additional projects. We’re excited to help you double down on your strengths, based on 80,000 Hours’ needs and your personal fit. Examples of additional projects might include writing a blog post or short career review in an area of interest to you, analysing the results of our user survey, or supporting the annual review, impact evaluation, or planning processes for the web team.

        Who we’re looking for

        We’re looking for someone who has:

        • An interest in effective altruism and/or longtermism.
        • An aptitude for editing and writing.
        • The ability to learn quickly and independently.
        • Excitement to work on whatever kinds of projects are highest priority.
        • The ability to be organised, detail-oriented, and able to keep track of a large number of tasks at once.
        • A good fit with our cultural values.

        We’re aware that factors like gender, race, and socioeconomic background can affect people’s willingness to apply for roles for which they meet many but not all the suggested attributes. We’d especially like to encourage people from underrepresented backgrounds to apply!

        You don’t need any previous experience to apply. In fact, we’d encourage you to apply even if you’re not sure you meet all of the above criteria – we’d much prefer to hear from you than not!

        Details of the role

        This is a full-time role. The salary will vary based on experience, but to give a rough sense, the starting salary for someone with no relevant experience would be approximately £58,000 per year.

        We generally prefer people to work in-person in our London office if possible, but are open to remote work in this case. We can sponsor visas.

        The start date of the role is flexible, but we would expect you to start during the first half of 2023 and prefer you to start as soon as you’re available.

        Our benefits include:

        • The option to use 10% of your time for self development
        • 25 days of paid holiday, plus bank holidays
        • Standard UK pension with 3% contribution from employer
        • Private medical insurance
        • Generous parental leave
        • Long-term disability insurance
        • Flexible work hours
        • Gym, shower facilities, and free food provided at our London office

        How to apply

        To apply, please fill out this application form by 9am GMT on Tuesday, 14 February, 2023.

        We expect it’ll take most people under 45 minutes. If you have any problems submitting the form, please send your CV to jess@80000hours.org.

        The application process will vary depending on the candidate, but is likely to include a short work test, an interview, and a multi-day in-person trial. The work test and trial will be paid.

        Click here to apply

        The post Open position: Content associate appeared first on 80,000 Hours.

        ]]>