Machine learning (Topic archive) - 80,000 Hours

Nathan Labenz on the final push for AGI, understanding OpenAI’s leadership drama, and red-teaming frontier models

Robert Wiblin — Fri, 22 Dec 2023 21:29:43 +0000

The post Nathan Labenz on the final push for AGI, understanding OpenAI’s leadership drama, and red-teaming frontier models appeared first on 80,000 Hours.

Software and tech skills

Benjamin Hilton — Mon, 18 Sep 2023 13:00:13 +0000

In a nutshell:

You can start building software and tech skills by learning to code and then doing some programming projects before applying for jobs. You can apply (as well as continue to develop) your software and tech skills by specialising in a related area, such as technical AI safety research, software engineering, or information security. You can also earn to give, and because this skill set is in high demand, it comes with great backup options.

Key facts on fit

There’s no single profile for being great at software and tech skills. It’s particularly cheap and easy to try out programming (which is a core part of this skill set) via classes online or in school, so we’d suggest doing that. But if you enjoy thinking systematically or building things, or if you have good quantitative skills, those are all good signs.

Why are software and tech skills valuable?

By “software and tech” skills we basically mean what your grandma would call “being good at computers.”

When investigating the world’s most pressing problems, we’ve found that in many cases there are software-related bottlenecks.

For example, machine learning (ML) engineering is a core skill needed to contribute to AI safety technical research. Experts in information security are crucial to reducing the risks of engineered pandemics, as well as other risks. And software engineers are often needed by nonprofits, whether they’re working on reducing poverty or mitigating the risks of climate change.

Also, having skills in this area means you’ll likely be highly paid, which offers excellent options to earn to give.

Moreover, basic programming skills can be extremely useful whatever you end up doing. You’ll find ways to automate tasks or analyse data throughout your career.

What does a career using software and tech skills involve?

A career using these skills typically involves three steps:

Learn to code with a university course or self-study and then find positions where you can get great mentorship. (Read more about how to get started.)
Optionally, specialise in a particular area, for example, by building skills in machine learning or information security.
Apply your skills to helping solve a pressing global problem. (Read more about how to have an impact with software and tech.)

There’s no general answer about when to switch from a focus on learning to a focus on impact. Once you have some basic programming skills, you should look for positions that both further improve your skills and have an impact, and then decide based on which specific opportunities seem best at the time.

Software and tech skills can also be helpful in other, less directly related career paths, like being an expert in AI hardware (for which you’ll also need a specialist knowledge skill set) or founding a tech startup (for which you’ll also need an organisation-building skill set). Being good with computers is also often part of the skills required for quantitative trading.

Programming also tends to come in handy in a wide variety of situations and jobs; there will be other great career paths that will use these skills that we haven’t written about.

How to evaluate your fit

How to predict your fit in advance

Some indications you’ll be a great fit include:

The ability to break down problems into logical parts and generate and test hypotheses
Willingness to try out many different solutions
High attention to detail
Broadly good quantitative skills

The best way to gauge your fit is just to try out programming.

It seems likely that some software engineers are significantly better than average, and we’d guess this is also true for other technical roles using software. The very best software engineers are often people who spend huge amounts of time practising. This means that if you enjoy coding enough to want to do it both as a job and in your spare time, you are likely to be a good fit.

How to tell if you’re on track

If you’re at university or in a bootcamp, it’s especially easy to tell if you’re on track. Good signs are that you’re succeeding at your assigned projects or getting good marks. An especially good sign is that you’re progressing faster than many of your peers.

In general, a great indicator of your success is that the people you work with most closely are enthusiastic about you and your work, especially if those people are themselves impressive!

If you’re building these skills at an organisation, signs you’re on track might include:

You get job offers at organisations you’d like to work for.
You’re promoted within your first two years.
You receive excellent performance reviews.
You’re asked to take on progressively more responsibility over time.
After some time, you’re becoming someone on your team whom people turn to for help solving their problems and ask to teach them how to do things.
You’re building things that others are able to use successfully without your input.
Your manager / colleagues suggest you might take on more senior roles in the future.
You ask your superiors for their honest assessment of your fit and they are positive (e.g. they tell you you’re in the top 10% of people they can imagine doing your role).

How to get started building software and tech skills

Independently learning to code

As a complete beginner, you can write a Python program in less than 20 minutes that reminds you to take a break every two hours.
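A minimal sketch of that kind of beginner program, using only the Python standard library. The function name `remind_every`, its parameters, and the message text are our own illustrative choices, and printing to the terminal is an assumption; a real reminder might use a desktop notification instead.

```python
import time

TWO_HOURS = 2 * 60 * 60  # interval in seconds


def remind_every(interval_seconds, repetitions, notify=print, sleep=time.sleep):
    """Wait `interval_seconds`, then call `notify`, `repetitions` times.

    `notify` and `sleep` are parameters (hypothetical design choice) so the
    function can be tried out without actually waiting two hours.
    """
    for i in range(1, repetitions + 1):
        sleep(interval_seconds)  # pause until the next break is due
        notify(f"Reminder {i}: time to take a break!")
```

Calling `remind_every(TWO_HOURS, repetitions=4)` would cover roughly an eight-hour working day.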

A great way to learn the very basics is by working through a free beginner course like Automate the Boring Stuff with Python by Al Sweigart.

Once you know the fundamentals, you could try taking an intro to computer science or intro to programming course. If you’re not at university, there are plenty of such courses available online.

Don’t be discouraged if your code doesn’t work the first time — that’s what normally happens when people code!

A great next step is to try doing a project with other people. This lets you test out writing programs in a team and working with larger codebases. It’s easy to come up with programming projects to do with friends; you can see some examples here.

Once you have some more experience, contributing to open-source projects in particular lets you work with very large existing codebases.

Attending a coding bootcamp

We’ve advised many people who managed to get junior software engineer jobs in less than a year by going to a bootcamp.

Coding bootcamps are focused on taking people with little knowledge of programming to as highly paid a job as possible within a couple of months. This is a great entry route if you don’t already have much background, though some claim the long-term prospects are not as good as if you had studied at university or taught yourself particularly thoroughly, because bootcamp graduates may lack a deep understanding of computer science. Course Report is a great guide to choosing a bootcamp. Be careful to avoid low-quality bootcamps. To find out more, read our interview with an App Academy instructor.

Studying at university

Studying computer science at university (or another subject involving lots of programming) is a great option because it allows you to learn to code in an especially structured way, at a time when the opportunity cost of your time is lower.

It will also give you a better theoretical understanding of computing than a bootcamp (which can be useful for getting the most highly-paid and intellectually interesting jobs), a good network, some prestige, and a better understanding of lower-level languages like C. Having a computer science degree also makes it easier to get a US work visa if you’re not from the US.

Doing internships

If you can find internships, ideally at the sorts of organisations you might want to work for to build your skills (like big tech companies or startups), you’ll gain practical experience and key skills you wouldn’t otherwise pick up from academic degrees (e.g. using version control systems and powerful text editors). Take a look at our list of companies with software and machine learning internships.

AI-assisted coding

As you’re getting started, it’s probably worth thinking about how developments in AI are going to affect programming in the future — and getting used to AI-assisted coding.

We’d recommend trying out GitHub Copilot, which writes code for you based on your comments. Cursor is a popular AI-assisted code editor based on VS Code.

You can also just ask AI chat assistants for help. ChatGPT is particularly helpful (although only if you use the paid version).

We think it’s reasonably likely that many software and tech jobs in the future will be heavily based on using tools like these.

Building a specialty

Depending on how you’re going to use software and tech skills, it may be useful to build up your skills in a particular area. Here’s how to get started in a few relevant areas:

Machine learning

If you’re currently at university, it’s worth checking if you can take an ML course (even if you’re not majoring in computer science).

But if that’s not possible, here are some suggestions of places you might start if you want to self-study the basics:

3blue1brown’s series on neural networks is a really great place to start for beginners.
When I was learning, I used Neural Networks and Deep Learning — it’s an online textbook, good if you’re familiar with the maths, with some helpful exercises as well.
You can do online intro courses like fast.ai (focused on practical applications), Full Stack Deep Learning, and the various courses at deeplearning.ai.
For more depth, see university courses like MIT’s Introduction to Machine Learning and, for even more detail, NYU’s Deep Learning. We’d also recommend Google DeepMind’s lecture series.

PyTorch is a very common package used for implementing neural networks, and probably worth learning! When I was first learning about ML, my first neural network was a 3-layer convolutional neural network with L2 regularisation classifying handwritten digits from the MNIST database. This is a pretty common first challenge and a good way to learn PyTorch.
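A minimal PyTorch sketch of that kind of first project, under our own assumptions: we read "3-layer" as two convolutional layers plus one fully connected layer, the class name `SmallConvNet`, the layer sizes, and the learning rate are arbitrary illustrative choices, and L2 regularisation is applied through the `weight_decay` argument of `torch.optim.SGD`. Only the model and a single training step are shown; loading the data is left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallConvNet(nn.Module):
    """Hypothetical 3-layer net: two convolutional layers, one linear layer."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # MNIST images have 1 channel and are 28x28 pixels.
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        # Two 2x2 max-pools shrink 28x28 down to 7x7 before the classifier.
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> (batch, 16, 14, 14)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> (batch, 32, 7, 7)
        x = torch.flatten(x, 1)                     # -> (batch, 32 * 7 * 7)
        return self.fc(x)                           # raw class scores (logits)


model = SmallConvNet()
loss_fn = nn.CrossEntropyLoss()
# weight_decay adds an L2 penalty on all model parameters to each SGD update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)


def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """Run one gradient-descent step on a batch and return the batch loss."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

To train on the real dataset, you could load it with `torchvision.datasets.MNIST` and call `train_step` on mini-batches from a `torch.utils.data.DataLoader`.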

You may also need to learn some maths.

The maths of deep learning relies heavily on calculus and linear algebra, and statistics can be useful too — although generally learning the maths is much less important than programming and basic, practical ML.
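To make this concrete, here is a standard worked example (not taken from any of the courses below) of where these areas show up when training a single layer. We assume weights $W$, bias $b$, input $x$, an element-wise activation $\sigma$, a loss $L$, and a learning rate $\eta$: the forward pass is linear algebra, the gradient comes from the chain rule in calculus, and gradient descent then uses that gradient. Statistics typically enters through the choice of loss, for example cross-entropy viewed as a negative log-likelihood.

$$
\begin{aligned}
z &= W x + b, \qquad \hat{y} = \sigma(z) \\
\delta &= \frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}} \odot \sigma'(z) \\
\frac{\partial L}{\partial W} &= \delta \, x^{\top}, \qquad \frac{\partial L}{\partial b} = \delta \\
W &\leftarrow W - \eta \, \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
\end{aligned}
$$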

Again, if you’re still at university we’d generally recommend studying a quantitative degree (like maths, computer science, or engineering), most of which will cover all three areas pretty well.

If you want to actually get good at maths, you have to be solving problems. So, generally, the most useful thing that textbooks and online courses provide isn’t their explanations — it’s a set of exercises to try to solve in order, with some help if you get stuck.

If you want to self-study (especially if you don’t have a quantitative degree) here are some possible resources:

Calculus: 3blue1brown’s video series on calculus could be a good place to start. You may also be able to follow recorded university courses: MIT’s single variable calculus (which requires only high school algebra and trigonometry) followed by MIT’s course in vector and multivariable calculus.
Linear algebra: Again, we’d suggest 3blue1brown’s video series on linear algebra as a place to start. In his post about technical alignment careers, Rogers-Smith recommends Linear Algebra Done Right by Sheldon Axler. Finally, if you prefer lectures, try MIT’s undergraduate course in linear algebra (although note that this course assumes knowledge of multivariate calculus).
Probability: Take a look at MIT’s undergraduate course in probability and random variables.

You might be able to find resources that cover all these areas, like Imperial College’s Mathematics for Machine Learning.

Information security

Most people get started in information security by studying computer science (or similar) at a university, and taking some cybersecurity courses — although this is by no means necessary to be successful.

You can get an introduction through the Google Foundations of Cybersecurity course. The full Google Cybersecurity Professional Certificate series is also worth watching to learn more on relevant technical topics.

For more, take a look at how to try out and get started in information security.

Data science and applied statistics

Data science combines programming with statistics.
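As a small illustration of that combination, here is a sketch using only Python’s standard-library `statistics` module to summarise a sample and attach an approximate 95% confidence interval to its mean. The function name `mean_with_ci` and the numbers are made up for the example, and the normal approximation used for the interval is a simplifying assumption (for small samples, a t-distribution would usually be more appropriate).

```python
import math
import statistics


def mean_with_ci(sample, confidence=0.95):
    """Return (mean, lower, upper), using a normal approximation for the interval."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


# Made-up example data: minutes taken to finish a task in eight trials.
minutes = [12.0, 15.5, 11.0, 14.0, 13.5, 16.0, 12.5, 13.5]
print(mean_with_ci(minutes))
```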

One way to get started is by doing a bootcamp. Data science bootcamps work much like programming bootcamps, although they tend to mainly recruit science PhDs. If you’ve just done a science PhD and don’t want to continue with academia, this is a good option to consider (although you should probably consider other ways of using software and tech skills first). Similarly, you can learn data analysis, statistics, and modelling by taking the right graduate programme.

Data scientists are well paid — offering the potential to earn to give — and have high job satisfaction.

To learn more, see our full career review of data science.

Depending on how you’re aiming to have an impact with these skills (see the next section), you may also need to develop other skills, and we’ve written about several other relevant skill sets.

For more, see our full list of impactful skills.

Once you have these skills, how can you best apply them to have an impact?

The problem you work on is probably the biggest driver of your impact. The first step is to make an initial assessment of which problems you think are most pressing (even if you change your mind over time, you’ll need to decide where to start working).

Once you’ve done that, the next step is to identify the highest-potential ways to use software and tech skills to help solve your top problems.

There are five broad categories here:

Use software and tech skills in research. Lots of technical research relevant to the world’s most pressing problems makes heavy use of software and tech skills — most notably, AI safety technical research. To be successful, you might also need a research skill set, which we’ve written about separately. For some paths, you’ll also need specialist knowledge in an area related to a pressing problem — e.g. hardware for becoming an expert in AI hardware.
ML engineering for AI safety research. Most AI safety researchers work closely with engineers (and in many organisations, no clear distinction is made). This is a particularly high-impact way of using software and tech skills because we think risks from AI are among the world’s most pressing problems.
Build software for organisations working on pressing problems. Most organisations working on everything from global health to reducing the risk of nuclear war need software engineers to manage computer systems, apps, and websites. The key feature that draws this work together is that you’ll be building a product for others to use. Read more about software engineering careers and organisation-building skills.
Protect sensitive information. Some organisations need help protecting information that could be hugely dangerous if it were known more widely, such as harmful genetic sequences or powerful AI technology. Breaches in areas like these could have disastrous consequences, which makes information security a great option for people who want to have a high-impact career. Read more about information security.
Earn to give. Most jobs that use software and tech skills, whether software engineering, information security, data science, or something else entirely, command high salaries (particularly in the US) — and so they offer a great option for earning to give. Skilled software engineers can earn $300,000 a year or more at big tech companies. Probably the highest-paying routes are trading in quantitative hedge funds or founding a tech startup.

While some of these options (like protecting dangerous information) will require building up some more specialised skills, being a great programmer will let you move around most of these categories relatively easily, and the earning to give options means you’ll always have a pretty good backup plan.

Find jobs that use software and tech skills

See our curated list of job opportunities for this path.

Nita Farahany on the neurotechnology already being used to convict criminals and manipulate workers

Luisa Rodriguez — Thu, 07 Dec 2023 22:19:32 +0000

Holden Karnofsky on how AIs might take over even if they’re no smarter than humans, and his four-part playbook for AI risk

Robert Wiblin — Mon, 31 Jul 2023 23:27:31 +0000

Lennart Heim on the compute governance era and what has to come after

Robert Wiblin — Thu, 22 Jun 2023 23:23:01 +0000

AI governance and coordination

Cody Fenwick — Tue, 20 Jun 2023 12:00:34 +0000

As advancing AI capabilities gained widespread attention in late 2022 and 2023 — particularly after the release of OpenAI’s ChatGPT and Microsoft’s Bing chatbot — interest in governing and regulating these systems has grown. Discussion of the potential catastrophic risks of misaligned or uncontrollable AI also became more prominent, potentially opening up opportunities for policy that could mitigate the threats.

There’s still a lot of uncertainty about which strategies for AI governance and coordination would be best, though parts of the community of people working on this subject may be coalescing around some ideas. See, for example, a list of potential policy ideas from Luke Muehlhauser of Open Philanthropy¹ and a survey of expert opinion on best practices in AI safety and governance.

But there’s no roadmap here. There’s plenty of room for debate about which policies and proposals are needed.

We may not have found the best ideas yet in this space, and many of the existing policy ideas haven’t yet been developed into concrete, public proposals that could actually be implemented. We hope to see more people enter this field to develop expertise and skills that will contribute to risk-reducing AI governance and coordination.

In a nutshell: Advanced AI systems could have massive impacts on humanity and potentially pose global catastrophic risks. There are opportunities in AI governance and coordination around these threats to shape how society responds to and prepares for the challenges posed by the technology.

Given the high stakes, pursuing this career path could be many people’s highest-impact option. But they should be very careful not to accidentally exacerbate the threats rather than mitigate them.

Why this could be a high-impact career path

Artificial intelligence has advanced rapidly. In 2022 and 2023, new language and image generation models gained widespread attention for their abilities, blowing past previous benchmarks the technology had met.

And the applications of these models are still new; with more tweaking and integration into society, the existing AI systems may become easier to use and more ubiquitous in our lives.

We don’t know where all these developments will lead us. There’s reason to be optimistic that AI will eventually help us solve many of the world’s problems, raising living standards and helping us build a more flourishing society.

But there are also substantial risks. AI can be used for both good and ill. And we have concerns that the technology could, without the proper controls, accidentally lead to a major catastrophe — and perhaps even cause human extinction. We discuss the arguments that these risks exist in our in-depth problem profile.

Because of these risks, we encourage people to work on finding ways to reduce them through technical research and engineering.

But a range of strategies for risk reduction will likely be needed. Government policy and corporate governance interventions in particular may be necessary to ensure that AI is developed to be as broadly beneficial as possible and without unacceptable risk.

Governance generally refers to the processes, structures, and systems that carry out decision making for organisations and societies at a high level. In the case of AI, we expect the governance structures that matter most to be national governments and organisations developing AI — as well as some international organisations and perhaps subnational governments.

Some aims of AI governance work could include:

Preventing the deployment of any AI systems that pose a significant and direct threat of catastrophe
Mitigating the negative impact of AI technology on other catastrophic risks, such as nuclear weapons and biotechnology
Guiding the integration of AI technology into our society and economy with limited harms and to the advantage of all
Reducing the risk of an “AI arms race” between nations or between companies, in which competition leads to technological advancement without the necessary safeguards and caution
Ensuring that those creating the most advanced AI models are incentivised to be cooperative and concerned about safety
Slowing down the development and deployment of new systems if the advancements are likely to outpace our ability to keep them safe and under control

We need a community of experts who understand the intersection of modern AI systems and policy, as well as the severe threats and potential solutions. This field is still young, and many of the paths within it aren’t clear and are not sure to pan out. But there are relevant professional paths that will provide you valuable career capital for a variety of positions and types of roles.

The rest of this article explains what work in this area might involve, how you can develop career capital and test your fit, and where some promising places to work might be.

What kinds of work might contribute to AI governance?

What should governance-related work on AI actually involve? There are a variety of ways to pursue AI governance strategies, and as the field becomes more mature, the paths are likely to become clearer and more established.

We generally don’t think people early in their careers should be aiming for a specific job that they think would be high-impact. They should instead aim to develop skills, experience, knowledge, judgement, networks, and credentials — what we call career capital — that they can later use when an opportunity to have a positive impact is ripe.

This may involve following a pretty standard career trajectory, or it may involve bouncing around in different kinds of roles. Sometimes, you just have to apply to a bunch of different roles and test your fit for various types of work before you know what you’ll be good at. The main thing to keep in mind is that you should try to get excellent at something for which you have strong personal fit and that will let you contribute to solving pressing problems.

In the AI governance and coordination space, we see at least six large categories of work that we expect to be important:

Government work
Research on AI policy and strategy
Industry work
Advocacy and lobbying
Third-party auditing and evaluation
International work and coordination

There aren’t necessarily openings in all these categories at the moment for careers in AI governance, but they represent a range of sectors in which impactful work may potentially be done in the coming years and decades. Thinking about the different skills and forms of career capital that will be useful for the categories of work you could see yourself doing in the future can help you figure out what your immediate next steps should be. (We discuss how to assess your fit and enter this field below.)

You may want to — and indeed it may be advantageous to — move between these different categories of work at different points in your career. You can also test out your fit for various roles by taking internships, fellowships, entry-level jobs, temporary placements, or even doing independent research, all of which can serve as career capital for a range of paths.

We have also reviewed career paths in AI technical safety research and engineering and information security, which may be crucial to reducing risks from AI, and which may play a significant role in an effective governance agenda. People serious about pursuing a career in AI governance should familiarise themselves with these fields as well.

Government work

Taking a role within government could lead to playing an important role in the development, enactment, and enforcement of AI policy.

Note that we generally expect that the US federal government will be the most significant player in AI governance for the foreseeable future. This is because of its global influence and its jurisdiction over much of the AI industry, including the top three AI labs training state-of-the-art, general-purpose models (Anthropic, OpenAI, and Google DeepMind) and key parts of the chip supply chain. Much of this article focuses on US policy and government.²

But other governments and international institutions may also end up having important roles to play in certain scenarios. For example, the UK government, the European Union, China, and potentially others, may all present opportunities for impactful AI governance work. Some US state-level governments, such as California, may also offer opportunities for impact and gaining career capital.

What would this work involve? Sections below discuss how to enter US policy work and which areas of government you might aim for.

But at the broadest level, people interested in positively shaping AI policy should aim to gain the skills and experience to work in areas of government with some connection to AI or emerging technology policy.

This can include roles in: legislative branches, domestic regulation, national security, diplomacy, appropriations and budgeting, and other policy areas.

If you can get a role out of the gate that is already working directly on this issue, such as a staff position with a lawmaker who is focused on AI, that could be a great opportunity.

Otherwise, you should seek to learn as much as you can about how policy works and which government roles might allow you to have the most impact, while establishing yourself as someone who’s knowledgeable about the AI policy landscape. Having almost any significant government role that touches on some aspect of AI, or having some impressive AI-related credential, may be enough to get you quite far.

One way to advance your career in government on a specific topic is what some call “getting visibility” — that is, using your position to learn about the landscape and connect with the actors and institutions that affect the policy area you care about. You’ll want to be invited to meetings with other officials and agencies, be asked for input on decisions, and engage socially with others who work in the policy area. If you can establish yourself as a well-regarded expert on an important but neglected aspect of the issue, you’ll have a better shot at being included in key discussions and events.

Career trajectories within government can be broken down roughly as follows:

Standard government track: This involves entering government at a relatively low level and building up your career capital on the inside by climbing the seniority ladder. For the highest impact, you’d ideally end up reaching senior levels by sticking around, gaining skills and experience, and getting promoted. You may move between agencies, departments, or branches.
Specialisation career capital: You can also move in and out of government throughout your career. People on this trajectory will also work at nonprofits, think tanks, industry labs, political parties, academia, and other organisations. But they will primarily focus on becoming an expert in a topic — such as AI. It can be harder to get seniority this way, but the value of expertise and experience can sometimes outweigh seniority.
Direct-impact work: Some people move into government jobs without a longer plan to build career capital because they see an opportunity for direct, immediate impact. This might look like getting tapped to lead an important commission or providing valuable input on an urgent project. We don’t generally recommend planning on this kind of strategy for your career, but it’s good to be aware of it as an opportunity that might be worth taking at some point.

Research on AI policy and strategy

There’s still a lot of research to be done on the most important avenues for AI governance approaches. While there are some promising proposals for a system of regulatory and strategic steps that can help reduce the risk of an AI catastrophe, there aren’t many concrete and publicly available policy proposals ready for adoption.

The world needs more concrete proposals for AI policies that would really start to tackle the biggest threats; developing such policies, and deepening our understanding of the strategic needs of the AI governance space, should be high priorities.

Other relevant research could involve surveys of public opinion that could inform communication strategies, legal research about the feasibility of proposed policies, technical research on issues like compute governance, and even higher-level theoretical research into questions about the societal implications of advanced AI. Some research, such as that done by Epoch AI, focuses on forecasting the future course of AI developments, which can influence AI governance decisions.

However, several experts we’ve talked to warn that a lot of research on AI governance may prove to be useless, so it’s important to be reflective and seek input from others in the field — both from experienced policy practitioners and technical experts — about what kind of contribution you can make. We list several research organisations below that we think would be good to work at in order to pursue promising research on this topic.

One potentially useful approach for testing your fit for this work — especially when starting out in this research — is to write up analyses and responses to existing work on AI policy or investigate some questions in this area that haven’t been the subject of much attention. You can then share your work widely, send it out for feedback from people in the field, and evaluate how much you enjoy the work and whether you might productively contribute to this research longer term.

But it’s possible to spend too long testing your fit without making much progress, and some people find that they’re best able to contribute when they’re working on a team. So don’t overweight or over-invest in independent work, especially if there are few signs it’s working out especially well for you. This kind of project can make sense for maybe a month or a bit longer — but it’s unlikely to be a good idea to spend much more than that without meaningful funding or some really encouraging feedback from people working in the field.

If you have the experience to be hired as a researcher, work on AI governance can be done in academia, nonprofit organisations, and think tanks. Some government agencies and committees, too, perform valuable research.

Note that universities and academia have their own priorities and incentives that often aren’t aligned with producing the most impactful work. If you’re already an established researcher with tenure, it may be highly valuable to pivot into work on AI governance — this position may even give you a credible platform from which to advocate for important ideas.

But if you’re just starting out a research career and want to focus on this issue, you should carefully consider whether your work will be best supported inside or outside of academia. For example, if you know of a specific programme with particular mentors who will help you pursue answers to critical questions in this field, it might be worth doing. We’re less inclined to encourage people to pursue generic academic-track roles with the vague hope that one day they can do important research on this topic.

Advanced degrees in policy or relevant technical fields may well be valuable, though — see more discussion of this in the section on how to assess your fit and get started.

Industry work

While government policy is likely to play a key role in coordinating various actors interested in reducing the risks from advanced AI, internal policy and corporate governance at the largest AI labs themselves are also powerful tools. We think people who care about reducing risk can potentially do valuable work internally at industry labs. (Read our career review of non-technical roles at AI labs.)

At the highest level, deciding who sits on corporate boards, what kind of influence those boards have, and to what extent the organisation is structured to seek profit and shareholder value as opposed to other aims, can end up having a major impact on the direction a company takes. If you might be able to get a leadership role at a company developing frontier AI models, such as a management position or a seat on the board, it could potentially be a very impactful position.

If you’re able to join a policy team at a major lab, you can model threats and help develop, implement, and evaluate promising proposals internally to reduce risks. And you can build consensus around best practices, such as strong information security policies, using outside evaluators to find vulnerabilities and dangerous behaviours in AI systems (red teaming), and testing out the latest techniques from the field of AI safety.

And if, as we expect, AI labs face increasing government oversight, industry governance and policy work can ensure compliance with any relevant laws and regulations that get put in place. Interfacing with government actors and facilitating coordination over risk reduction approaches could be impactful work.

In general, the more cooperative AI labs are with each other³ and with outside groups seeking to minimise catastrophic risks from AI, the better. (There might also be legal obstacles to some forms of cooperation between labs, such as antitrust laws, so figuring out how labs can act responsibly while also complying with all relevant laws may be an impactful course of action.) And this doesn’t seem to be an outlandish hope: many industry leaders have expressed concern about extinction risks and have even called for regulation of the frontier technology they’re creating.

That said, we can expect this cooperation to take substantial work — it would be surprising if the best policies for reducing risks were totally uncontroversial in industry, since labs also face huge commercial incentives to build more powerful systems, which can carry more risk. The more everyone’s able to communicate and align their incentives, the better things seem likely to go.

Advocacy and lobbying

People outside of government or AI labs can influence the shape of public policy and corporate governance via advocacy and lobbying.

As of this writing, there has not yet been a large public movement in favour of regulating or otherwise trying to reduce risks from AI, so there aren’t many openings that we know about in this category. But we expect growing interest in this area to open up new opportunities to press for political action and policy changes at AI labs, and it could make sense to start building career capital and testing your fit now for different kinds of roles that would fall into this category down the line.

If you believe AI labs may be disposed to advocate for generally beneficial regulation, you might want to try to work for them, or become a lobbyist for the industry as a whole, to push the government to adopt specific policies. It’s plausible that AI labs will have by far the best understanding of the underlying technology, as well as the risks, failure modes, and safest paths forward.

On the other hand, it could be the case that AI labs have too much of a vested interest in the shape of regulations to reliably advocate for broadly beneficial policies. If that’s right, it may be better to join or create advocacy organisations unconnected from the industry — supported by donations or philanthropic foundations — that can take stances that are opposed to the labs’ commercial interests.

For example, it could be the case that the best approach from a totally impartial perspective would be at some point to deliberately slow down or halt the development of increasingly powerful AI models. Advocates could make this demand of the labs themselves or press the government to slow down AI progress. It may be difficult to come to this conclusion or advocate for it if you have strong connections to the companies creating these systems.

It’s also possible that the best outcomes will be achieved with a balance of industry lobbyists and outside lobbyists and advocates making the case for their preferred policies — as both bring important perspectives.

We expect there will be increasing public interest in AI policy as the technological advancements have ripple effects in the economy and wider society. And if there’s increasing awareness of the impact of AI on people’s lives, the risks the technology poses may become more salient to the public, which will give policymakers strong incentives to take the problem seriously. It may also bring new allies into the cause of ensuring that the development of advanced AI goes well.

Advocacy can also:

Highlight neglected but promising approaches to governance that have been uncovered in research
Facilitate the work of policymakers by showcasing the public’s support for governance measures
Build bridges between researchers, policymakers, the media, and the public by communicating complicated ideas in an accessible way to many audiences
Pressure corporations themselves to proceed more cautiously
Change public sentiment around AI and discourage irresponsible behaviour by individual actors, such as the spreading of powerful open-source models

However, note that advocacy can sometimes backfire. Predicting how information will be received is far from straightforward. Drawing attention to a cause area can sometimes trigger a backlash; presenting problems with certain styles of rhetoric can alienate people or polarise public opinion; spreading misleading or mistaken messages can discredit yourself and fellow advocates. It’s important that you are aware of the risks, consult with others (particularly those who you respect but might disagree with tactically), and commit to educating yourself deeply about the topic before expounding on it in public.

You can read more in the section about doing harm below. We also recommend reading our article on ways people trying to do good accidentally make things worse and how to avoid them.

Case study: the Future of Life Institute open letter

In March 2023, the Future of Life Institute published an open letter calling for a pause of at least six months on training any new models more “powerful” than OpenAI’s GPT-4 — which had been released about a week earlier. GPT-4 is a state-of-the-art language model that can be used through ChatGPT to produce novel and impressive text responses to a wide range of prompts.

The letter attracted a lot of attention, perhaps in part because it was signed by prominent figures such as Elon Musk. While it didn’t immediately achieve its explicit aims (the labs didn’t commit to a pause), it fostered public conversations about the risks of AI and the potential benefits of slowing down. (An earlier article titled “Let’s think about slowing down AI,” by Katja Grace of the research organisation AI Impacts, aimed to have a similar effect.)

There’s no clear consensus on whether the FLI letter was on the right track. Some critics of the letter, for example, said that its advice would actually lead to worse outcomes overall if followed, because it would slow down AI safety research while many of the innovations that drive AI capabilities progress, such as chip development, would continue to race forward. Proponents of the letter pushed back on these claims.⁴ It does seem clear that the letter changed the public discourse around AI safety in a way that few other efforts have achieved, which is proof of concept for what impactful advocacy can accomplish.

Third-party auditing and evaluation

If regulatory measures are put in place to reduce the risks of advanced AI, some agencies and organisations — within government or outside — will need to audit companies and systems to make sure that regulations are being followed.

One nonprofit, the Alignment Research Center, has been at the forefront of this kind of work.⁵ In addition to its research work, it has launched a programme to evaluate the capabilities of advanced AI models. In early 2023, the organisation partnered with two leading AI labs, OpenAI and Anthropic, to evaluate the capabilities of the latest versions of their chatbot models prior to their release. They sought to determine in a controlled environment if the models had any potentially dangerous capabilities.

The labs voluntarily cooperated with ARC for this project, but at some point in the future, these evaluations may be legally required.

Governments often rely on third-party auditors as crucial players in regulation, because the government may lack the expertise (or the capacity to pay for the expertise) that the private sector has. As of this writing, we know of few opportunities in this type of role, but third-party auditors may end up playing a critical part in an effective AI governance framework.

Other types of auditing and evaluation may be required as well. ARC has said it intends to develop methods to determine which models are appropriately aligned — that is, that they will behave as their users intend them to behave — prior to release.

Governments may also want to employ auditors to evaluate the amount of compute that AI developers have access to, their information security practices, the uses of models, the data used to train models, and more.

Acquiring the technical skills and knowledge to perform these types of evaluations, and joining organisations that will be tasked to perform them, could be the foundation of a highly impactful career. This kind of work will also likely have to be facilitated by people who can manage complex relationships across industry and government. Someone with experience in both sectors could have a lot to contribute.

Some of these roles may overlap with work in technical AI safety research.

One potential advantage of working in the private sector for AI governance work is that you may be significantly better paid than you would be in government.

International work and coordination

US-China

For someone with the right fit, cooperation and coordination with China on the safe development of AI could be a particularly impactful approach within the broad AI governance career path.

The Chinese government has been a major funder in the field of AI, and the country has giant tech companies that could potentially drive forward advances.

Given tensions between the US and China, and the risks posed by advanced AI, there’s a lot to be gained from increasing trust, understanding, and coordination between the two countries. The world will likely be much better off if we can avoid a major conflict between great powers and if the most significant players in emerging technology can avoid exacerbating any global risks.

We have a separate career review that goes into more depth on China-related AI safety and governance paths.

Other governments and international organisations

As we’ve said, we focus most on US policy and government roles. This is largely because we anticipate that the US is now and will likely continue to be the most pivotal actor when it comes to regulating AI, with a major caveat being China, as discussed in the previous section.

But many people interested in working on this issue can’t or don’t want to work in US policy, perhaps because they live in another country and don’t intend to move.

Much of the advice above still applies to these people, because roles in AI governance research and advocacy can be done outside of the United States.⁶ And while we don’t think it’s generally as impactful in expectation as US government work, opportunities in other governments and international organisations can be complementary to the work to be done in the US.

The United Kingdom, for instance, may present another strong opportunity for AI policy work that would complement US work. Top UK officials have expressed interest in developing policy around AI, perhaps even a new international agency, and reducing extreme risks. And the UK government announced in 2023 the creation of a new AI Foundation Model Taskforce, with the expressed intention to drive forward safety research.

It’s possible that by taking significant steps to understand and regulate AI, the UK will encourage or inspire US officials to take similar steps by showing how it can work.

And any relatively wealthy country could use portions of its budget to fund AI safety research. While a lot of the most important work likely needs to be done in the US, by leading researchers at labs with access to large amounts of compute, some lines of research may be productive even without these resources. Any significant advances in AI safety research, if communicated properly, could be used by researchers working on the most powerful models.

Other countries might also develop liability standards for the creators of AI systems that could incentivise corporations to proceed more cautiously and judiciously before releasing models.

The European Union has shown that its data protection standards — the General Data Protection Regulation (GDPR) — affect corporate behaviour well beyond its geographical boundaries. EU officials have also pushed forward on regulating AI, and some research has explored the hypothesis that the impact of the union’s AI regulations will extend far beyond the continent — the so-called “Brussels effect.”

And at some point, we do expect there will be AI treaties and international regulations, just as the international community has created the International Atomic Energy Agency, the Biological Weapons Convention, and the Intergovernmental Panel on Climate Change to coordinate around and mitigate other global catastrophic threats.

Efforts to coordinate governments around the world to understand and share information about threats posed by AI may end up being extremely important in some future scenarios.

The Organisation for Economic Cooperation and Development is one place where such work might occur. So far, it has been the most prominent international actor working on AI policy and has created the AI Policy Observatory.

Third-party countries may also be able to facilitate cooperation and reduce tensions between the United States and China, whether around AI or other potential flashpoints, should such an intervention become necessary.

How policy gets made

What does it actually take to make policy?

In this section, we’ll discuss three phases of policy making: agenda setting, policy creation and development, and implementation. We’ll generally discuss these as aspects of making government policy, but they could also be applied to organisational policy. The following section will discuss the types of work that you could do to positively contribute to the broad field of AI governance.

Agenda setting

To enact and implement a programme of government policies that have a positive impact, you have to first ensure that the subject of potential legislation and regulation is on the agenda for policymakers.

Agenda setting for policy involves identifying and defining problems, drawing attention to the problems and raising their salience (at least to the relevant people), and promoting potential approaches to solving them.

For example, when politicians take office, they often enter on a platform of promises made to their constituents and their supporters about which policy agendas they want to pursue. Those agendas are formed through public discussion, media narratives, internal party politics, deliberative debate, interest group advocacy, and other forms of input. The agenda can be, to varying degrees, problem-specific — having a broad remit of “improving health care.” Or it could be more solution-specific — aiming to create, for example, a single-payer health system.

Issues don’t necessarily have to be unusually salient to get on the agenda. Policymakers or officials at various levels of government can prioritise solving certain problems or enacting specific proposals that aren’t the subject of national debate. In fact, sometimes making issues too salient, framing them in divisive ways, or allowing partisanship and political polarisation to shape the discussion, can make it harder to successfully put solutions on the agenda.

What’s key for agenda setting as an approach to AI governance is that people with the authority have to buy into the idea of prioritising the issue, if they’re going to use their resources and political capital to focus on it.

Policy creation and development

While there does appear to be growing enthusiasm for a set or sets of policy proposals that could start to reduce the risk of an AI-related catastrophe, there’s still a lack of concrete policies that are ready to get off the ground.

This is what the policy creation and development process is for. Researchers, advocates, civil servants, lawmakers and their staff, and others all can play a role in shaping the actual legislation and regulation that the government eventually enforces. In the corporate context, internal policy creation can serve similar functions, though it may be less enforceable unless backed up with contracts.

Policy creation involves crafting solutions for the problem at hand with the policy tools available, usually requiring input from technical experts, legal experts, stakeholders, and the public. In countries with strong judicial review like the United States, special attention often has to be paid to make sure laws and regulations will hold up under the scrutiny of judges.

Once concrete policy options are on the table, they must be put through the relevant decision-making process and negotiations. If the policy in question is a law that’s going to be passed, rather than a regulation, it needs to be crafted so that it will have enough support from lawmakers and other key decision makers to be enacted. This can happen in a variety of ways; it might be rolled into a larger piece of legislation that has wide support, or it may be rallied around and brought forward as its own package to be voted on individually.

Policy creation can also be an iterative process, as policies are enacted, implemented, monitored, evaluated, and revised.

For more details on the complex work of policy creation, we recommend Thomas Kalil’s article “Policy Entrepreneurship in the White House: Getting Things Done in Large Organisations.”

Implementation

Fundamentally, a policy is only an idea. For an idea to have an impact, someone actually has to carry it out. Any of the proposals for AI-related government policy — including standards and evaluations, licensing, and compute governance — will demand complex management and implementation.

Policy implementation on this scale requires extensive planning, coordination in and out of government, communication, resource allocation, training and more — and every step in this process can be fraught with challenges. To rise to the occasion, any government implementing an AI policy regime will need talented individuals working at a high standard.

The policy creation phase is critical and is probably the highest-priority work. But good ideas can be carried out badly, which is why policy implementation is also a key part of the AI governance agenda.

Examples of people pursuing this path

How to assess your fit and get started

If you’re early on in your career, you should focus first on getting skills and other career capital to successfully contribute to the beneficial governance and regulation of AI.

You can gain career capital for roles in many ways, and the best options will vary based on your route to impact. But broadly speaking, working in or studying fields such as politics, law, international relations, communications, and economics can all be beneficial for going into policy work.

And expertise in AI itself, gained by studying and working in machine learning and technical AI safety, or potentially related fields such as computer hardware or information security, should also give you a big advantage.

Testing your fit

One general piece of career advice we give is to find relatively “cheap” tests to assess your fit for different paths. This could mean, for example, taking a policy internship, applying for a fellowship, doing a short bout of independent research as discussed above, or taking classes or courses on technical machine learning or computer engineering.

It can also just involve talking to people currently doing a job you might consider having and finding out what the day-to-day experience of the work is like and what skills are needed.

All of these factors can be difficult to predict in advance. While we grouped “government work” into a single category above, that label covers a wide range of positions and types of occupations in many different departments and agencies. Finding the right fit within a broad category like “government work” can take a while, and it can depend on a lot of factors out of your control, such as the colleagues you happen to work closely with. That’s one reason it can be useful to build broadly valuable career capital, so you have the option to move around to find the right role for you.

And don’t underestimate the value at some point of just applying to many relevant openings in the field and sector you’re aiming for and seeing what happens. You’ll likely face a lot of rejection with this strategy, but you’ll be able to better assess your qualifications for different kinds of roles after you see how far you get in the process, if you take enough chances. This can give you a lot more information than just guessing about whether you have the right experience.

It can be useful to rule out certain types of work if you gather evidence that you’re not a strong fit for the role. For example, if you invest a lot of time and effort trying to get into reputable universities or nonprofit institutions to do AI governance research, but you get no promising offers and receive little encouragement even after applying widely, this might be a significant signal that you’re unlikely to thrive in that particular path.

That wouldn’t mean you have nothing to contribute, but your comparative advantage may lie elsewhere.

Read the section of our career guide on finding a job that fits you.

Types of career capital

For a field like AI governance, a mix of people with technical and policy expertise — and some people with both — is needed.

While anyone involved in this field should work to maintain an evolving understanding of both the technical and policy details, you’ll probably start out focusing on either policy or technical skills to gain career capital.

This section covers:

Generally useful career capital
Policy-related career capital
Technical career capital
Other specific forms of career capital

Much of this advice is geared toward roles in the US, though it may be relevant in other contexts.

Generally useful career capital

The chapter of the 80,000 Hours career guide on career capital lists five key components that will be useful in any path: skills and knowledge, connections, credentials, character, and runway.

For most jobs touching on policy, social skills, networking, and — for lack of a better word — political skill will be a huge asset. This can probably be learned to some extent, but some people may find they don’t have these kinds of skills and can’t or don’t want to acquire them. That’s OK — there are many other routes to having a fulfilling and impactful career, and there may be some roles within this path that demand these skills to a much lesser extent. That’s why testing your fit is important.

Read the full section of the career guide on career capital.

Policy-related career capital

To gain skills in policy, you can pursue education in many relevant fields, such as political science, economics, and law.

Many master’s programmes offer specific coursework on public policy, science and society, security studies, international relations, and other topics; having a graduate degree or law degree will give you a leg up for many positions.

In the US, a master’s, a law degree, or a PhD is particularly useful if you want to climb the federal bureaucracy. Our article on US policy master’s degrees provides detailed information about how to assess the many options.

Internships in DC are a promising route to evaluate your aptitude for policy work and to establish early career capital. Many academic institutions now offer a “Semester in DC” programme, which can let you explore placements in Congress, federal agencies, or think tanks. The Virtual Student Federal Service (VSFS) also offers part-time, remote government internships, which students can do during the academic year alongside their studies. These can be a useful stepping stone for aspiring policy professionals.

Once you have a suitable background, you can take entry-level positions within parts of the government where you can build a professional network and develop your skills. In the US, you can become a congressional staffer, or take a position at a relevant federal department, such as the Department of Commerce, Department of Energy, or the Department of State. Alternatively, you can gain experience in think tanks (a particularly promising option if you have a strong aptitude for research) or at government contractors, which are private sector companies providing services to the government.

In Washington, DC, the culture is fairly unique. There’s a big focus on networking, and there are internal bureaucratic politics to navigate. We’ve also been told that while merit matters to a degree in US government work, it is not the primary determinant of who is most successful. People who don’t think they would be comfortable in this kind of environment over the long term should consider whether other paths would suit them better.

If you find you can enjoy government and political work, impress your colleagues, and advance in your career, though, that’s a strong signal that you have the potential to make a real impact. Just being able to thrive in government work can be an extremely valuable comparative advantage.

US citizenship

Your citizenship may affect which opportunities are available to you. Many of the most important AI governance roles within the US — particularly in the executive branch and Congress — are only open to, or will at least heavily favour, American citizens. All key national security roles that might be especially important will be restricted to those with US citizenship, which is required to obtain a security clearance.

Those who lack US citizenship may therefore want to avoid roles that require it. Alternatively, they could plan to move to the US and pursue the long process of becoming a citizen. For more details on immigration pathways and types of policy work available to non-citizens, see this blog post on working in US policy as a foreign national. If you’re from an eligible country, consider also entering the annual diversity visa lottery: it takes little effort, and if you’re lucky, you can win a US green card.

Technical career capital

Technical experience in machine learning, AI hardware, and related fields can be a valuable asset for an AI governance career. So it will be very helpful if you’ve studied a relevant subject area for an undergraduate or graduate degree, or completed a particularly productive course of independent study.

We have a guide to technical AI safety careers, which explains how to learn the basics of machine learning.

The following resources may be particularly useful for familiarising yourself with the field of AI safety:

Working at an AI lab in technical roles, or other companies that use advanced AI systems and hardware, may also provide significant career capital in AI policy paths. (Read our career review discussing the pros and cons of working at a top AI lab.)

We also have a separate career review on how becoming an expert in AI hardware could be very valuable in governance work.

Many politicians and policymakers are generalists, as their roles require them to work in many different subject areas and on different types of problems. This means they’ll need to rely on expert knowledge when crafting and implementing policy on AI technology that they don’t fully understand. So if you can provide them this information, especially if you’re skilled at communicating it clearly, you can potentially fill influential roles.

Some people who were initially interested in a technical AI safety career, but who are no longer interested in that path or have found more promising policy opportunities, might decide that they can effectively pivot into a policy-oriented career.

It is common for people with STEM backgrounds to enter and succeed in US policy careers. People with technical credentials that they may regard as fairly modest — such as computer science bachelor’s degrees or a master’s in machine learning — often find their knowledge is highly valued in Washington, DC.

Most DC jobs don’t have specific degree requirements, so you don’t need to have a policy degree to work in DC. Roles specifically addressing science and technology policy are particularly well-suited for people with technical backgrounds, and people hiring for these roles will value higher credentials like a master’s or, even better, a terminal degree like a PhD or MD.

There are many fellowship programmes specifically aiming to support people with STEM backgrounds to enter policy careers; some are listed below.

This won’t be right for everybody — many people with technical skills may not have the disposition or skills necessary for engaging in policy. People in policy-related paths often benefit from strong writing and social skills as well as a comfort navigating bureaucracies and working with people holding very different motivations and worldviews.

Ernest Moniz: from scientific expertise to political leadership

Ernest Moniz started his career as a physicist, becoming a professor at MIT in the 1970s. He gained management experience as a department head and leader of the MIT research council, and in the 1990s, he became the associate director for science in the Office of Science and Technology Policy in the White House.

He then became under secretary in the Department of Energy, which set him up to eventually become the Secretary of Energy under President Barack Obama — a highly influential role where he used his technical expertise in international negotiations over nuclear weapons.

Secretary Moniz shows how far someone with a technical background can get in terms of influencing policy — but of course you don’t have to be an MIT professor or lead a federal agency to have an impact.

Other specific forms of career capital

There are other ways to gain useful career capital that could be applied in this career path.

If you have or gain great communication skills as, say, a journalist or an activist, these skills could be very useful in advocacy and lobbying around AI governance.
- Especially since advocacy around AI issues is still in its early stages, it will likely need people with experience advocating in other important cause areas to share their knowledge and skills.
Academics with relevant skill sets are sometimes brought into government for limited stints to serve as advisors in agencies such as the US Office of Science and Technology Policy. This isn’t necessarily the foundation of a longer career in government, though it can be, and it can give an academic deeper insight into policy and politics than they might otherwise gain.
You can work at an AI lab in non-technical roles, gaining a deeper familiarity with the technology, the business, and the culture. (Read our career review discussing the pros and cons of working at a top AI lab.)
You could work on political campaigns and get involved in party politics. This is one way to get involved in legislation, learn about policy, and help impactful lawmakers, and you can also potentially help shape the discourse around AI governance. Note, though, the previously mentioned downsides of potentially polarising public opinion around AI policy; and entering party politics may limit your potential for impact whenever the party you’ve joined doesn’t hold power.
You could even try to become an elected official yourself, though it’s obviously competitive. If you take this route, make sure you find trustworthy and highly informed advisors to rely on to build expertise in AI, since politicians have many other responsibilities and won’t be able to focus as much on any particular issue.
You can focus on developing specific skill sets that might be valuable in AI governance, such as information security, intelligence work, diplomacy with China, etc.
- Other skills: Organisational, entrepreneurial, management, diplomatic, and bureaucratic skills will also likely prove highly valuable in this career path. There may be new auditing agencies to set up or policy regimes to implement. Someone who has worked at high levels in other high-stakes industries, started an influential company, or coordinated complicated negotiations between various groups, would bring important skills to the table.

Want one-on-one advice on pursuing this path?

Because this is one of our priority paths, if you think this path might be a great option for you, we’d be especially excited to advise you on next steps, one-on-one. We can help you consider your options, make connections with others working in the same field, and possibly even help you find jobs or funding opportunities.


Where can this kind of work be done?

Since successful AI governance will require work from governments, industry, and other parties, there will be many potential jobs and places to work for people in this path. The landscape will likely shift over time, so if you’re just starting out on this path, the places that seem most important might be different by the time you’re pivoting to using your career capital to make progress on the issue.

Within the US government, for instance, it’s not clear which bodies will be most impactful when it comes to AI policy in five years. It will likely depend on choices that are made in the meantime.

That said, to help you get oriented, it seems useful to give our understanding of which parts of the government are generally influential in technology governance and most involved in it right now. Gaining AI-related experience in government right now should still serve you well if you end up wanting to move into a more impactful AI-related role down the line, once the highest-impact areas to work in are clearer.

We’ll also give our current sense of important actors outside government where you might be able to build career capital and potentially have a big impact.

Note that this list has by far the most detail about places to work within the US government. We would like to expand it to include more options as we learn more. You can use this form to suggest additional options for us to include. (And the fact that an option isn’t on this list shouldn’t be taken to mean we recommend against it or even that it would necessarily be less impactful than the places listed.)

We have more detail on other options in separate (and older) career reviews, including the following:

With that out of the way, here are some of the places where someone could do promising work or gain valuable career capital:

US Congress

In Congress, you can either work directly for lawmakers themselves or as staff on a legislative committee. Staff roles on the committees are generally more influential on legislation and more prestigious, but for that reason, they’re more competitive. If you don’t have that much experience, you could start out in an entry-level job staffing a lawmaker and then later try to transition to staffing a committee.

Some people we’ve spoken to expect the following committees — and some of their subcommittees — in the House and Senate to be most impactful in the field of AI. You might aim to work on these committees or for lawmakers who have significant influence on these committees.

House of Representatives

House Committee on Energy and Commerce
House Judiciary Committee
House Committee on Science, Space, and Technology
House Committee on Appropriations
House Armed Services Committee
House Committee on Foreign Affairs
House Permanent Select Committee on Intelligence

Senate

Senate Committee on Commerce, Science, and Transportation
Senate Judiciary Committee
Senate Committee on Foreign Relations
Senate Committee on Homeland Security and Governmental Affairs
Senate Committee on Appropriations
Senate Committee on Armed Services
Senate Select Committee on Intelligence
Senate Committee on Energy & Natural Resources
Senate Committee on Banking, Housing, and Urban Affairs

The Congressional Research Service, a nonpartisan legislative agency, also offers opportunities to conduct research that can impact policy design across all subjects.

US executive branch

In general, we don’t recommend taking entry-level jobs within the executive branch for this path because it’s very difficult to progress your career through the bureaucracy at this level. It’s better to get a law degree or relevant master’s degree, which can give you the opportunity to start with more seniority.

The influence of different agencies over AI regulation may shift over time, and there may even be entirely new agencies set up to regulate AI at some point, which could become highly influential. Whichever agency may be most influential in the future, it will be useful to have accrued career capital working effectively in government, creating a professional network, learning about day-to-day policy work, and deepening your knowledge of all things AI.

We have a lot of uncertainty about this topic, but here are some of the agencies that may have significant influence on at least one key dimension of AI policy as of this writing:

Executive Office of the President (EOP)
- Office of Management and Budget (OMB)
- National Security Council (NSC)
- Office of Science and Technology Policy (OSTP)
Department of State
- Office of the Special Envoy for Critical and Emerging Technology (S/TECH)
- Bureau of Cyberspace and Digital Policy (CDP)
- Bureau of Arms Control, Verification and Compliance (AVC)
- Office of Emerging Security Challenges (ESC)
Federal Trade Commission
Department of Defense (DOD)
- Chief Digital and Artificial Intelligence Office (CDAO)
- Emerging Capabilities Policy Office
- Defense Advanced Research Projects Agency (DARPA)
- Defense Technology Security Administration (DTSA)
Intelligence Community (IC)
- Intelligence Advanced Research Projects Activity (IARPA)
- National Security Agency (NSA)
- Science advisor roles within the various agencies that make up the intelligence community
Department of Commerce (DOC)
- The Bureau of Industry and Security (BIS)
- The National Institute of Standards and Technology (NIST)
- CHIPS Program Office
Department of Energy (DOE)
- Artificial Intelligence and Technology Office (AITO)
- Advanced Scientific Computing Research (ASCR) Program Office
National Science Foundation (NSF)
- Directorate for Computer and Information Science and Engineering (CISE)
- Directorate for Technology, Innovation and Partnerships (TIP)
Cybersecurity and Infrastructure Security Agency (CISA)

Readers can find listings for roles in these departments and agencies at the federal government’s job board, USAJOBS; a more curated list of openings for potentially high impact roles and career capital is on the 80,000 Hours job board.

We do not currently recommend attempting to join the US government via the military if you are aiming for a career in AI policy. There are many levels of seniority to rise through and many people competing for places, and initially you have to spend all of your time doing work unrelated to AI. However, having military experience already can be valuable career capital for other important roles in government, particularly national security positions. We would consider this route more competitive for military personnel who have been to an elite military academy, such as West Point, or for commissioned officers at rank O-3 or above.

US fellowships

Policy fellowships are among the best entryways into policy work. They offer many benefits like first-hand policy experience, funding, training, mentoring, and networking. While many require an advanced degree, some are open to college graduates.

US think tanks

Center for Security and Emerging Technology (CSET)
Center for a New American Security
RAND Corporation
The MITRE Corporation
Brookings Institution
Carnegie Endowment for International Peace
Center for Strategic and International Studies (CSIS)
Federation of American Scientists (FAS)

Research nonprofits

Alignment Research Center
Open Philanthropy¹
Institute for AI Policy and Strategy
Epoch AI
Centre for the Governance of AI (GovAI)
Center for AI Safety (CAIS)
Legal Priorities Project
Apollo Research
Centre for Long-Term Resilience
AI Impacts
Johns Hopkins Applied Physics Lab

Industry labs

Anthropic is an AI safety company working on building interpretable and safe AI systems. They focus on empirical AI safety research. Anthropic cofounders Daniela and Dario Amodei gave an interview about the lab on the Future of Life Institute podcast. On our podcast, we spoke to Chris Olah, who leads Anthropic’s research into interpretability, and Nova DasSarma, who works on systems infrastructure at Anthropic.
Google DeepMind is probably the largest and most well-known research group developing artificial general intelligence, and is famous for its work creating AlphaGo, AlphaZero, and AlphaFold. It is not principally focused on safety, but has two teams focused on AI safety: the Scalable Alignment Team, which focuses on aligning existing state-of-the-art systems, and the Alignment Team, which focuses on research bets for aligning future systems.
OpenAI, founded in 2015, is a lab that is trying to build artificial general intelligence that is safe and benefits all of humanity. OpenAI is well known for its language models like GPT-4. Like DeepMind, it is not principally focused on safety, but has a safety team and a governance team. Jan Leike (head of the alignment team) has some blog posts on how he thinks about AI alignment.
Ought is a machine learning lab building Elicit, an AI research assistant. Their aim is to align open-ended reasoning by learning human reasoning steps and to direct AI progress towards helping with evaluating evidence and arguments.

(Read our career review discussing the pros and cons of working at a top AI lab.)

International organisations

Organisation for Economic Co-operation and Development (OECD)
International Atomic Energy Agency (IAEA)
International Telecommunication Union (ITU)
International Organization for Standardization (ISO)
European Union institutions (e.g., European Commission)
Simon Institute for Longterm Governance


How this career path can go wrong

Doing harm

As we discuss in an article on accidental harm, there are many ways to set back a new field that you’re working in when you’re trying to do good, and this could mean your impact is negative rather than positive. (You may also want to read our article on harmful careers.)

It seems likely there’s a lot of potential to inadvertently cause harm in the emerging field of AI governance. We discussed some possibilities in the section on advocacy and lobbying. Some other possibilities include:

Pushing for a given policy to the detriment of a superior policy
Communicating about the risks of AI in a way that ratchets up geopolitical tensions
Enacting a policy that has the opposite impact of its intended effect
Setting policy precedents that could be exploited by dangerous actors down the line
Funding projects in AI that turn out to be dangerous
Sending the message, implicitly or explicitly, that the risks are being managed when they aren’t, or that they’re lower than they in fact are
Suppressing technology that would actually be extremely beneficial for society

The trouble is that we have to act with incomplete information, so it may never be very clear when or if people in AI governance are falling into these traps. Being aware that they are potential ways of causing harm will help you keep alert for these possibilities, though, and you should remain open to changing course if you find evidence that your actions may be damaging.

And we recommend keeping in mind the following pieces of general guidance from our article on accidental harm:

Burning out

We think this work is exceptionally pressing and valuable, so we encourage our readers who might have a strong personal fit for governance work to test it out. But going into government, in particular, can be difficult. Some people we’ve advised have gone into policy roles with the hope of having an impact, only to burn out and move on.

At the same time, many policy practitioners find their work very meaningful, interesting, and varied.

Some roles in government may be especially challenging for the following reasons:

Some roles can be very fast-paced, involving relatively high stress and long hours. This is particularly true in Congress and senior executive branch positions and much less so in think tanks or junior agency roles.
It can take a long time to get into positions with much autonomy or decision-making authority.
Progress on the issues you care about can be slow, and you often have to work on other priorities. Congressional staffers in particular typically have very broad policy portfolios.
Work within bureaucracies faces many limitations, which can be frustrating.
It can be demotivating to work with people who don’t share your values. Though note that policy can select for altruistic people — even if they have different beliefs about how to do good.
The work isn’t typically well paid relative to comparable positions outside of government.

So we recommend speaking to people in the kinds of positions you might aim to have in order to get a sense of whether the career path would be right for you. And if you do choose to pursue it, look out for signs that the work may be having a negative effect on you and seek support from people who understand what you care about.

If you end up wanting or needing to leave and transition into a new path, that’s not necessarily a loss or a reason for regret. You will likely make important connections and learn a lot of useful information and skills. This career capital can be useful as you transition into another role, perhaps pursuing a complementary approach to AI governance and coordination.

What the increased attention on AI means

We’ve been concerned about risks posed by AI for years. Based on the arguments that this technology could potentially cause a global catastrophe, and otherwise have a dramatic impact on future generations, we’ve advised many people to work to mitigate the risks.

The arguments for the risk aren’t completely conclusive, in our view. But the arguments are worth taking seriously, and given the fact that few others in the world seemed to be devoting much time to even figuring out how big the threat was or how to mitigate it (while at the same time progress in making AI systems more powerful was accelerating) we concluded it was worth ranking among our top priorities.

Now that there’s increased attention on AI, some might conclude that it’s less neglected and thus less pressing to work on. However, the increased attention on AI also makes many interventions potentially more tractable than they had been previously, as policymakers and others are more open to the idea of crafting AI regulations.

And while more attention is now being paid to AI, it’s not clear it will be focused on the most important risks. So there’s likely still a lot of room for important and pressing work positively shaping the development of AI policy.

Learn more

Top recommendations

AI Governance Course – AGI Safety Fundamentals from BlueDot Impact
Podcast: Tantum Collins on what he’s learned as an AI policy insider
A list of AI policy resources to learn about the field and recent developments

Further recommendations

Resources from 80,000 Hours

Article: Working in US AI policy
Podcast: Tom Kalil on how to have a big impact in government & huge organisations, based on 16 years’ experience in the White House
Podcast: Holden Karnofsky on how AIs might take over even if they’re no smarter than humans, and his four-part playbook for AI risk
Podcast: Lennart Heim on the compute governance era and what has to come after
Podcasts: Nathan Labenz on the final push for AGI, understanding OpenAI’s leadership drama, and red-teaming frontier models; and Nathan Labenz on recent AI breakthroughs and navigating the growing rift between AI safety and accelerationist camps
Career review: China-related AI safety and governance paths
Podcast collection: The 80,000 Hours Podcast on Artificial Intelligence

Resources from other sources

US policy career resources on the Effective Altruism Forum
Jobs that can help with the most important century by Holden Karnofsky
12 tentative ideas for US AI policy by Luke Muehlhauser of Open Philanthropy
Why and how governments should monitor AI development by Jess Whittlestone and Jack Clark
AGI safety career advice by Richard Ngo of OpenAI
The longtermist AI governance landscape: a basic overview on the Effective Altruism forum
Four Battlegrounds: Power in the Age of Artificial Intelligence by Paul Scharre
The New Fire: War, Peace, and Democracy in the Age of AI by Ben Buchanan and Andrew Imbrie
Think tank reports, such as from CSET, CNAS, CSIS
Government strategies, such as the White House’s 2023 US National Artificial Intelligence R&D Strategic Plan, NIST’s 2023 AI Risk Management Framework, the DOD’s 2022 Responsible AI Strategy and Implementation Pathway, and the 2021 Final Report of the National Security Commission on AI
Lessons from the Development of the Atomic Bomb by Toby Ord
Collection of work on ‘Should you focus on the EU if you’re interested in AI governance for longtermist/x-risk reasons?’ on the Effective Altruism Forum

AI safety technical research

Benjamin Hilton — Mon, 19 Jun 2023 10:28:33 +0000

Progress in AI — while it could be hugely beneficial — comes with significant risks. Risks that we’ve argued could be existential.

But these risks can be tackled.

With further progress in AI safety, we have an opportunity to develop AI for good: systems that are safe, ethical, and beneficial for everyone.

This article explains how you can help.

In a nutshell: Artificial intelligence will have transformative effects on society over the coming decades, and could bring huge benefits — but we also think there’s a substantial risk. One promising way to reduce the chances of an AI-related catastrophe is to find technical solutions that could allow us to prevent AI systems from carrying out dangerous behaviour.

Pros

Opportunity to make a significant contribution to a hugely important area of research
Intellectually challenging and interesting work
The area has a strong need for skilled researchers and engineers, and is highly neglected overall

Cons

Due to a shortage of managers, it’s difficult to get jobs, and it might take you some time to build the required career capital and expertise
You need a strong quantitative background
It might be very difficult to find solutions
There’s a real risk of doing harm

Key facts on fit

You’ll need a quantitative background and should probably enjoy programming. If you’ve never tried programming, you may be a good fit if you can break problems down into logical parts, generate and test hypotheses, possess a willingness to try out many different solutions, and have high attention to detail.

If you already:

Are a strong software engineer, you could apply for empirical research contributor roles right now (even if you don’t have a machine learning background, although that helps)
Could get into a top 10 machine learning PhD, that would put you on track to become a research lead
Have a very strong maths or theoretical computer science background, you’ll probably be a good fit for theoretical alignment research

Why AI safety technical research is high impact

As we’ve argued, in the next few decades, we might see the development of hugely powerful machine learning systems with the potential to transform society. This transformation could bring huge benefits — but only if we avoid the risks.

We think that the worst-case risks from AI systems arise in large part because AI systems could be misaligned — that is, they will aim to do things that we don’t want them to do. In particular, we think they could be misaligned in such a way that they develop (and execute) plans that pose risks to humanity’s ability to influence the world, even when we don’t want that influence to be lost.

We think this means that these future systems pose an existential threat to civilisation.

Even if we find a way to avoid this power-seeking behaviour, there are still substantial risks — such as misuse by governments or other actors — which could be existential threats in themselves.

Want to learn more about risks from AI? Read the problem profile.

We think that technical AI safety could be the highest-impact career path we’ve identified to date. That’s because it seems like a promising way of reducing risks from AI. We’ve written an entire article about what those risks are and why they’re so important.

There are many ways in which we could go about reducing the risks that these systems might pose. But one of the most promising may be researching technical solutions that prevent unwanted behaviour — including misaligned behaviour — from AI systems. (Finding a technical way to prevent misalignment in particular is known as the alignment problem.)

In the past few years, we’ve seen more organisations start to take these risks more seriously. Many of the leading industry labs developing AI — including Google DeepMind and OpenAI — have teams dedicated to finding these solutions, alongside academic research groups including at MIT, Oxford, Cambridge, Carnegie Mellon University, and UC Berkeley.

That said, the field is still very new. We think there are only around 300 people working on technical approaches to reducing existential risks from AI systems,¹ which makes this a highly neglected field.

¹ […] (“full-time equivalent”) working on the problem of reducing existential risks from AI using technical methods. After making a number of assumptions, I estimated that there were 76 to 536 FTE working on technical AI safety (90% confidence). To learn more, read the section on neglectedness in our problem profile on AI, alongside footnote 3.

Finding technical ways to reduce this risk could be quite challenging. Any practically helpful solution must retain the usefulness of the systems (remaining economically competitive with less safe systems), and continue to work as systems improve over time (that is, it needs to be ‘scalable’). As we argued in our problem profile, it seems like it might be difficult to find viable solutions, particularly for modern ML (machine learning) systems.

(If you don’t know anything about ML, we’ve written a very very short introduction to ML, and we’ll go into more detail on how to learn about ML later in this article. Alternatively, if you do have ML experience, talk to our team — they can give you personalised career advice, make introductions to others working on these issues, and possibly even help you find jobs or funding opportunities.)

Although it seems hard, there are lots of avenues for more research — and the field really is very young, so there are new promising research directions cropping up all the time. So we think it’s moderately tractable, though we’re highly uncertain.

In fact, we’re uncertain about all of this and have written extensively about reasons we might be wrong about AI risk.

But, overall, we think that — if it’s a good fit for you — going into AI safety technical research may just be the highest-impact thing you can do with your career.

What does this path involve?

AI safety technical research generally involves working as a scientist or engineer at major AI labs, in academia, or in independent nonprofits.

These roles can be very hard to get. You’ll likely need to build up career capital before you end up in a high-impact role (more on this later, in the section on how to enter). That said, you may not need to spend a long time building this career capital — we’ve seen exceptionally talented people move into AI safety from other quantitative fields, sometimes in less than a year.

Most AI safety technical research falls on a spectrum between empirical research (experimenting with current systems as a way of learning more about what will work), and theoretical research (conceptual and mathematical research looking at ways of ensuring that future AI systems are safe).

No matter where on this spectrum you end up working, your career path might look a bit different depending on whether you want to aim at becoming a research lead — proposing projects, managing a team and setting direction — or a contributor — focusing on carrying out the research.

Finally, there are two slightly different roles you might aim for:

In academia, research is often led by professors — the key distinguishing feature of being a professor is that you’ll also teach classes and mentor grad students (and you’ll definitely need a PhD).
Many (but not all) contributor roles in empirical research are also engineers, often software engineers. Here, we’re focusing on software roles that directly contribute to AI safety research (and which often require some ML background) — we’ve written about software engineering more generally in a separate career review.

We think that research lead roles are probably higher-impact in general. But overall, the impact you could have in any of these roles is likely primarily determined by your personal fit for the role — see the section on how to predict your fit in advance.

Next, we’ll take a look at what working in each path might involve. Later, we’ll go into how you might enter each path.

What does work in the empirical AI safety path involve?

Empirical AI safety tends to involve teams working directly with ML models to identify any risks and develop ways in which they might be mitigated.

That means the work is focused on current ML techniques and techniques that might be applied in the very near future.

Practically, working on empirical AI safety involves lots of programming and ML engineering. You might, for example, come up with ways you could test the safety of existing systems, and then carry out these empirical tests.

You can find roles in empirical AI safety in industry and academia, as well as some in AI safety-focused nonprofits.

Particularly in academia, lots of relevant work isn’t explicitly labelled as being focused on existential risk — but it can still be highly valuable. For example, work in interpretability, adversarial examples, diagnostics and backdoor learning, among other areas, could be highly relevant to reducing the chance of an AI-related catastrophe.

We’re also excited by experimental work to develop safety standards that AI companies might adhere to in the future — for example, the work being carried out by METR.

To learn more about the sorts of research taking place at labs focused on empirical AI safety, take a look at:

While programming is central to all empirical work, generally, research lead roles will be less focused on programming; instead, they need stronger research taste and theoretical understanding. In comparison, research contributors need to be very good at programming and software engineering.

What does work in the theoretical AI safety path involve?

Theoretical AI safety is much more heavily conceptual and mathematical. Often it involves careful reasoning about the hypothetical behaviour of future systems.

Generally, the aim is to come up with properties that it would be useful for safe ML algorithms to have. Once you have some useful properties, you can try to develop algorithms with these properties (bearing in mind that to be practically useful these algorithms will have to end up being adopted by industry). Alternatively, you could develop ways of checking whether systems have these properties. These checks could, for example, help hold future AI products to high safety standards.

Many people working in theoretical AI safety will spend much of their time proving theorems or developing new mathematical frameworks. More conceptual approaches also exist, although they still tend to make heavy use of formal frameworks.

Some examples of research in theoretical AI safety include:

Risks from learned optimisation in advanced machine learning systems by Hubinger et al.
Eliciting latent knowledge by Christiano, Cotra and Xu.
Formalizing the presumption of independence by Christiano, Neyman, and Xu
Discovering agents by Kenton et al.
Active reward learning from multiple teachers by Barnett et al.

There are generally fewer roles available in theoretical AI safety work, especially as research contributors. Theoretical research contributor roles exist at nonprofits (primarily the Alignment Research Center), as well as at some labs (for example, Anthropic’s work on conditioning predictive models and the Causal Incentives Working Group at Google DeepMind). Most contributor roles in theoretical AI safety probably exist in academia (for example, PhD students in teams working on projects relevant to theoretical AI safety).

Some exciting approaches to AI safety

There are lots of technical approaches to AI safety currently being pursued. Here are just a few of them:

Scalably learning from human feedback. Examples include iterated amplification, AI safety via debate, building AI assistants that are uncertain about our goals and learn them by interacting with us, and other ways to get AI systems trained with stochastic gradient descent to report truthfully what they know.
Threat modelling. An example of this work would be demonstrating that dangerous capabilities, like deceptive or manipulative AI systems, are possible, which lets us study them. You can read an overview in a recent Google DeepMind paper. This work splits into work that evaluates whether a model has dangerous capabilities (like METR’s evaluation of GPT-4) and work that evaluates whether a model would cause harm in practice (like Anthropic’s research into the behaviour of large language models and this paper on goal misgeneralisation).
Interpretability research. This work involves studying why AI systems do what they do and trying to put it into human-understandable terms. For example, this paper examined how AlphaZero learns chess, and this paper looked into finding latent knowledge in language models without supervision. This category also includes mechanistic interpretability (for example, Zoom In: An Introduction to Circuits by Olah et al.). For more, see this survey paper, as well as Hubinger’s A transparency and interpretability tech tree and Nanda’s A Longlist of Theories of Impact for Interpretability for overviews of how interpretability research could reduce existential risk from AI.
Other anti-misuse research to reduce the risks of catastrophe caused by misuse of systems (we’ve written more on this in our problem profile on AI risk). For example, this work includes training AIs so they’re hard to use for dangerous purposes. (Note that there’s a lot of overlap with the other work on this list.)
Research to increase the robustness of neural networks. This work involves ensuring that the behaviour neural networks display on one set of inputs carries over to inputs they haven’t encountered before, in order to prevent AI systems from shifting to unsafe behaviour. See section 2 of Unsolved Problems in ML Safety for more.
Work to build cooperative AI. This involves finding ways to ensure that even if individual AI systems seem safe, they don’t produce bad outcomes through interacting with other sociotechnical systems. For more, see Open Problems in Cooperative AI by Dafoe et al. or the Cooperative AI Foundation. This seems particularly relevant for reducing ‘s-risks.’
More generally, there are some unified safety plans. For more, see Hubinger’s 11 possible proposals for building safe advanced AI, or Karnofsky’s How might we align transformative AI if it’s developed very soon?²
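To make the robustness item in the list above more concrete, here is a toy sketch of the failure it aims to prevent: behaviour learned on one input distribution not carrying over to shifted inputs. This is our own illustration, not a method from any of the works cited. A one-dimensional threshold classifier stands in (hypothetically) for a neural network; it is fitted on data from one distribution, then evaluated on fresh data from that distribution and on data where every input has been shifted. The class means, noise level, shift size, and function names are all arbitrary assumptions.

```python
import random
from statistics import mean
from typing import List, Tuple


def sample(n: int, mu: float, rng: random.Random) -> List[float]:
    """Draw n points from a normal distribution with mean mu and std dev 1."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def fit_threshold(xs0: List[float], xs1: List[float]) -> float:
    """Nearest-centroid rule in 1D: put the threshold halfway between class means."""
    return (mean(xs0) + mean(xs1)) / 2


def accuracy(threshold: float, xs0: List[float], xs1: List[float]) -> float:
    """Fraction classified correctly (class 0 below threshold, class 1 at or above)."""
    correct = sum(x < threshold for x in xs0) + sum(x >= threshold for x in xs1)
    return correct / (len(xs0) + len(xs1))


def demo(shift: float = 2.0, n: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (in-distribution accuracy, accuracy after shifting every input)."""
    rng = random.Random(seed)
    # Training distribution (assumed): class 0 centred at 0, class 1 centred at 3.
    t = fit_threshold(sample(n, 0.0, rng), sample(n, 3.0, rng))
    # Fresh test data from the same distribution the model was trained on.
    in_dist = accuracy(t, sample(n, 0.0, rng), sample(n, 3.0, rng))
    # Test data where every input is moved by `shift`; the classifier is unchanged.
    shifted = accuracy(t, sample(n, shift, rng), sample(n, 3.0 + shift, rng))
    return in_dist, shifted
```

Calling `demo()` should report high accuracy in distribution and noticeably lower accuracy after the shift, even though nothing about the classifier has changed; with `shift=0.0` the two numbers should roughly match.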

It’s worth noting that there are many approaches to AI safety, and people in the field strongly disagree on what will or won’t work.

This means that, once you’re working in the field, it can be worth being charitable and careful not to assume that others’ work is unhelpful just because it seemed so on a quick skim. You should probably be uncertain about your own research agenda as well.

What’s more, as we mentioned earlier, lots of relevant work across all these areas isn’t explicitly labelled ‘safety.’

So it’s important to think carefully about how or whether any particular research helps reduce the risks that AI systems might pose.

What are the downsides of this career path?

AI safety technical research is not the only way to make progress on reducing the risks that future AI systems might pose. Also, there are many other pressing problems in the world besides the possibility of an AI-related catastrophe, and lots of careers that can help with them. If you’d be a better fit working on something else, you should probably do that.

Beyond personal fit, there are a few other downsides to the career path:

It can be very competitive to enter (although once you’re in, the jobs are well paid, and there are lots of backup options).
You need quantitative skills — and probably programming skills.
The work is geographically concentrated in just a few places (mainly the California Bay Area and London, but there are also opportunities in places with top universities such as Oxford, New York, Pittsburgh, and Boston). That said, remote work is increasingly possible at many research labs.
It might not be very tractable to find good technical ways of reducing the risk. Assessments of its difficulty vary, and while making progress is almost certainly possible, it may be quite hard to do so. This reduces the impact that you could have working in the field. That said, if you start out in technical work, you might be able to transition to governance work, since that often benefits from technical training and experience with the industry, which most people do not have.
Relatedly, there’s lots of disagreement in the field about what could work; you’ll probably be able to find at least some people who think what you’re working on is useless, whatever you end up doing.
Most importantly, there’s some risk of doing harm. While gaining career capital, and while working on the research itself, you’ll have to make difficult decisions and judgement calls about whether you’re working on something beneficial (see our anonymous advice about working in roles that advance AI capabilities). There’s huge disagreement on which technical approaches to AI safety might work — and sometimes this disagreement takes the form of thinking that a strategy will actively increase existential risks from AI.

Finally, we’ve written more about the best arguments against AI being pressing in our problem profile on preventing an AI-related catastrophe. If those are right, maybe you could have more impact working on a different issue.

How much do AI safety technical researchers earn?

Many technical researchers work at companies or small startups that pay wages competitive with the Bay Area and Silicon Valley tech industry, and even smaller organisations and nonprofits will pay competitive wages to attract top talent. The median compensation for a software engineer in the San Francisco Bay Area was $222,000 per year in 2020, according to Levels.fyi (visited Jan 27, 2022).³ (Read more about software engineering salaries.)

This $222,000 median may be an underestimate: AI roles, especially at top AI labs that are rapidly scaling up their work in AI, often pay better than other tech jobs, and the same applies to safety researchers, even those at nonprofits.

However, academia has lower salaries than industry in general, and we’d guess that AI safety research roles in academia pay less than commercial labs and nonprofits.

How to predict your fit in advance

You’ll generally need a quantitative background (although not necessarily a background in computer science or machine learning) to enter this career path.

There are two main approaches you can take to predict your fit, and it’s helpful to do both:

Try it out: try out the first few steps in the section below on learning the basics. If you haven’t yet, try learning some Python, as well as taking courses in linear algebra, calculus, and probability. And if you’ve done that, try learning a bit about deep learning and AI safety. Finally, the best way to try this out for many people would be to actually get a job as a (non-safety) ML engineer (see more in the section on how to enter).
Talk to people about whether it would be a good fit for you: If you want to become a technical researcher, our team probably wants to talk to you. We can give you 1-1 advice, for free. If you know anyone working in the area (or something similar), discuss this career path with them and ask for their honest opinion. You may be able to meet people through our community. Our advisors can also help make connections.

It can take some time to build expertise, and enjoyment can follow expertise — so be prepared to take some time to learn and practice before you decide to switch to something else entirely.

If you’re not sure what roles you might aim for longer term, here are a few rough ways you could make a guess about what to aim for, and whether you might be a good fit for various roles on this path:

Testing your fit as an empirical research contributor: In a blog post about hiring for safety researchers, the Google DeepMind team said “as a rough test for the Research Engineer role, if you can reproduce a typical ML paper in a few hundred hours and your interests align with ours, we’re probably interested in interviewing you.”
Looking specifically at software engineering, one hiring manager at Anthropic said that if you could, with a few weeks’ work, write a complex new feature or fix a very serious bug in a major ML library, they’d want to interview you straight away. (Read more.)
Testing your fit for theoretical research: If you could have got into a top 10 maths or theoretical computer science PhD programme if you’d optimised your undergrad to do so, that’s a decent indication of your fit (and many researchers in fact have these PhDs). The Alignment Research Center (one of the few organisations that hires for theoretical research contributors, as of 2023) said that they were open to hiring people without any research background. They gave four tests of fit: creativity (e.g. you may have ideas for solving open problems in the field, like Eliciting Latent Knowledge); experience designing algorithms, proving theorems, or formalising concepts; broad knowledge of maths and computer science; and having thought a lot about the AI alignment problem in particular.
Testing your fit as a research lead (or for a PhD): The vast majority of research leads have a PhD. Also, many (but definitely not all) AI safety technical research roles will require a PhD — and if they don’t, having a PhD (or being the sort of person that could get one) would definitely help show that you’re a good fit for the work. To get into a top 20 machine learning PhD programme, you’d probably need to publish something like a first author workshop paper, as well as a third author conference paper at a major ML conference (like NeurIPS or ICML). (Read more about whether you should do a PhD).

Read our article on personal fit to learn more about how to assess your fit for the career paths you want to pursue.

How to enter

You might be able to apply for roles right away — especially if you meet, or are near meeting, the tests we just looked at — but it also might take you some time, possibly several years, to skill up first.

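So, in this section, we’ll give you a guide to entering technical AI safety research. We’ll go through four key questions:

How can you learn the basics?
Should you do a PhD?
How can you get a job in empirical AI safety research?
How can you get a job in theoretical AI safety research?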

Hopefully, by the end of the section, you’ll have everything you need to get going.

Learning the basics

To get anywhere in the world of AI safety technical research, you’ll likely need background knowledge of coding, maths, and deep learning.

You might also want to practice enough to become a decent ML engineer (although this is generally more useful for empirical research), and learn a bit about safety techniques in particular (although this is generally more useful for empirical research leads and theoretical researchers).

We’ll go through each of these in turn.

Learning to program

You’ll probably want to learn to code in Python, because it’s the most widely used language in ML engineering.

The first step is probably just trying it out. As a complete beginner, you can write a Python program in less than 20 minutes that reminds you to take a break every two hours. Don’t be discouraged if your code doesn’t work the first time — that’s what normally happens when people code!
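As an illustration of how small that first program can be, here is one possible version. It is a sketch rather than a recommended approach: it prints a message instead of showing a desktop notification (an assumption, to stay within Python’s standard library), and the function name `break_reminder` and its parameters are our own hypothetical choices. Passing the `sleep` and `notify` functions in as arguments is a slightly less beginner-style touch that lets you test it without actually waiting two hours.

```python
import time
from typing import Callable

TWO_HOURS = 2 * 60 * 60  # seconds


def break_reminder(
    interval_seconds: float = TWO_HOURS,
    reminders: int = 4,
    notify: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wait `interval_seconds`, then send a reminder; repeat `reminders` times.

    Returns the number of reminders sent.
    """
    sent = 0
    for _ in range(reminders):
        sleep(interval_seconds)          # pause until the next break is due
        notify("Time to take a break!")  # by default, print the reminder
        sent += 1
    return sent
```

Calling `break_reminder()` with no arguments waits two hours before each of four printed reminders (roughly an eight-hour day); pass a different `reminders` value to change that.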

Once you’ve done that, you have a few options:

Teach yourself to program. Try working through a free beginner course like Automate the Boring Stuff with Python by Al Sweigart. There are also many great introductory computer science and programming courses online, including Udacity’s Intro to Computer Science, MIT’s Introduction to Computer Science and Programming, and Stanford’s Programming Methodology. Then, try finding something you want to build, and building it, or getting involved in an open-source project. For interview practice, try LeetCode or TopCoder, or the exercises in Cracking the Coding Interview by Gayle McDowell.
Take a college course. If you’re in university, this is a great option because it allows you to learn programming while the opportunity cost of your time is lower. You can even consider majoring in computer science (or another subject involving lots of programming).
Learn on the job. If you can find internships, you’ll gain practical experience and skills you otherwise wouldn’t pick up from academic degrees.
Go to a bootcamp. Coding bootcamps are focused on taking people with little knowledge of programming to as highly paid a job as possible within a couple of months, though some claim the long-term prospects are not as good because you lack a deep understanding of computer science. Course Report is a great guide to choosing a bootcamp. Be careful to avoid low-quality bootcamps. There are also online bootcamps focused on ML for people completely new to programming, like Udemy’s Python for Data Science and Machine Learning Bootcamp.

You can read more about learning to program — and how to get your first job in software engineering (if that’s the route you want to take) — in our career review on software engineering.

Learning the maths

We’d generally recommend doing a quantitative degree (like maths, computer science or engineering); most of these will cover all three areas below (calculus, linear algebra and probability) pretty well.

If you want to self-study (especially if you don’t have a quantitative degree) here are some possible resources:

Calculus: 3blue1brown’s video series on calculus could be a good place to start. You may also be able to follow recorded university courses: MIT’s single variable calculus (which requires only high school algebra and trigonometry) followed by MIT’s course in vector and multivariable calculus.
Linear algebra: Again, we’d suggest 3blue1brown’s video series on linear algebra as a place to start. In his post about technical alignment careers, Rogers-Smith recommends Linear Algebra Done Right by Sheldon Axler. Finally, if you prefer lectures, try MIT’s undergraduate course in linear algebra (although note that this course assumes knowledge of multivariate calculus).
Probability: Take a look at MIT’s undergraduate course in probability and random variables.

You might be able to find resources that cover all these areas, like Imperial College’s Mathematics for Machine Learning.

Learning basic machine learning

You’ll likely need to have a decent understanding of how AI systems are currently being developed. This will involve learning about machine learning and neural networks, before diving into any specific subfields of deep learning.

Again, there’s the option of covering this at university. If you’re currently at college, it’s worth checking if you can take an ML course even if you’re not majoring in computer science.

There’s one important caveat here: you’ll learn a huge amount on the job, and the amount you’ll need to know in advance for any role or course will vary hugely! Not even top academics know everything about their fields. It’s worth trying to find out how much you’ll need to know for the role you want to do before you invest hundreds of hours into learning about ML.

With that caveat in mind, here are some suggestions of places you might start if you want to self-study the basics:

3blue1brown’s series on neural networks is a really great place to start for beginners.
When I was learning, I used Neural Networks and Deep Learning — it’s an online textbook, good if you’re familiar with the maths, with some helpful exercises as well.
Online intro courses like fast.ai (focused on practical applications), Full Stack Deep Learning, and the various courses at deeplearning.ai.
For more detail, see university courses like MIT’s Introduction to Machine Learning or, for even more detail, NYU’s Deep Learning. We’d also recommend Google DeepMind’s lecture series.
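To give a sense of how the maths above shows up once you start learning ML, here is a minimal pure-Python sketch of training the simplest neural-network-like model, a single sigmoid unit (that is, logistic regression), with gradient descent. It is our own illustration, not taken from any of the courses listed: the gradient of the loss comes from calculus, the weighted sum is a dot product from linear algebra, and the output is read as a probability. The toy dataset, learning rate, epoch count, and function names are arbitrary assumptions.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (input features, label 0 or 1)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Model's estimated probability that the label is 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(
    data: List[Example], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias with full-batch gradient descent on cross-entropy loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            # For sigmoid + cross-entropy, d(loss)/d(logit) is simply p - y.
            err = predict(w, b, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        n = len(data)
        # Step downhill: move each parameter against its average gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

On a toy dataset for the logical OR function, `train_logistic` should learn weights for which `predict` gives below 0.5 for the input (0, 0) and above 0.5 for the other three inputs. Deep learning libraries compute the same kind of gradients automatically (via backpropagation) for models with vastly more parameters.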

Learning about AI safety

If you’re going to work as an AI safety researcher, it usually helps to know about AI safety.

This isn’t always true — some engineering roles won’t require much knowledge of AI safety. But even then, knowing the basics will probably help land you a position, and can also help with things like making difficult judgement calls and avoiding doing harm. And if you want to be able to identify and do useful work, you’ll need to learn about the field eventually.

Because the field is still so new, there probably aren’t (yet) university courses you can take. So you’ll need to do some self-study. Here are some places you might start:

Section 3 of our problem profile about preventing an AI-related catastrophe provides an introduction to the problems that AI safety attempts to solve (with a particular focus on alignment).
Rob Miles’ YouTube channel is full of popular and well-explained introductory videos that don’t need much background knowledge of ML.
AXRP, the AI X-risk Research Podcast, is full of in-depth (and enjoyable) conversations with researchers about their research.
The courses from AGI Safety Fundamentals, in particular the AI Alignment Course, possibly followed by Alignment 201, which provide an introduction to research on the alignment problem.
Intro to ML Safety, a course from the Center for AI Safety, focuses on withstanding hazards (“robustness”), identifying hazards (“monitoring”), and reducing systemic hazards (“systemic safety”), as well as alignment.

For more suggestions — especially when it comes to reading about the nature of the risks we might face from AI systems — take a look at the top resources to learn more from our problem profile.

Should you do a PhD?

Some technical research roles will require a PhD — but many won’t, and PhDs aren’t the best option for everyone.

The main benefit of doing a PhD is probably practising setting and carrying out your own research agenda. As a result, getting a PhD is practically the default if you want to be a research lead.

That said, you can also become a research lead without a PhD — in particular, by transitioning from a role as a research contributor. At some large labs, the boundary between being a contributor and a lead is increasingly blurry.

Many people find PhDs very difficult. They can be isolating and frustrating, and take a very long time (4–6 years). What’s more, both your quality of life and the amount you’ll learn will depend on your supervisor — and it can be really difficult to figure out in advance whether you’re making a good choice.

So, if you’re considering doing a PhD, here are some things to consider:

Your long-term vision: If you’re aiming to be a research lead, that suggests you might want to do a PhD — the vast majority of research leads have PhDs. If you mainly want to be a contributor (e.g. an ML or software engineer), that suggests you might not. If you’re unsure, you should try doing something to test your fit for each, like trying a project or internship. You might try a pre-doctoral research assistant role — if the research you do is relevant to your future career, these can be good career capital, whether or not you do a PhD.
The topic of your research: It’s easy to let yourself become tied down to a PhD topic you’re not confident in. If the PhD you’re considering would let you work on something that seems useful for AI safety, it’s probably — all else equal — better for your career, and the research itself might have a positive impact as well.
Mentorship: What are the supervisors or managers like at the opportunities open to you? You might be able to find ML engineering or research roles in industry where you could learn much more than you would in a PhD — or vice versa. When picking a supervisor, try reaching out to the current or former students of a prospective supervisor and asking them some frank questions. (Also, see this article on how to choose a PhD supervisor.)
Your fit for the work environment: Doing a PhD means working on your own with very little supervision or feedback for long periods of time. Some people thrive in these conditions! But some really don’t and find PhDs extremely difficult.

Read more in our more detailed (but less up-to-date) review of machine learning PhDs.

It’s worth remembering that most jobs don’t need a PhD. And for some jobs, especially empirical research contributor roles, even if a PhD would be helpful, there are often better ways of getting the career capital you’d need (for example, working as a software or ML engineer). We’ve interviewed two ML engineers who have had hugely successful careers without doing a PhD.

Whether you should do a PhD doesn’t depend (much) on timelines

We think it’s plausible that we will develop AI that could be hugely transformative for society by the end of the 2030s.

All else equal, that possibility could argue for trying to have an impact right away, rather than spending five (or more) years doing a PhD.

Ultimately, though, how well you, in particular, are suited to a particular PhD is probably a much more important factor than when AI will be developed.

That is to say, we think the increase in impact caused by choosing a path that’s a good fit for you is probably larger than any decrease in impact caused by delaying your work. This is in part because the spread in impact caused by the specific roles available to you, as well as your personal fit for them, is usually very large. Some roles (especially research lead roles) will just require having a PhD, and others (especially more engineering-heavy roles) won’t — and people’s fit for these paths varies quite a bit.

We’re also highly uncertain about when we might develop transformative AI. This uncertainty reduces the expected cost of any delay.

Most importantly, we think PhDs shouldn’t be thought of as a pure delay to your impact. You can do useful work in a PhD, and generally, the first couple of years in any career path will involve a lot of learning the basics and getting up to speed. So if you have a good mentor, work environment, and choice of topic, your PhD work could be as good as, or possibly better than, the work you’d do if you went to work elsewhere early in your career. And if you suddenly receive evidence that we have less time than you thought, it’s relatively easy to drop out.

There are lots of other considerations here — for a rough overview, and some discussion, see this post by 80,000 Hours advisor Alex Lawsen, as well as the comments.

Overall, we’d suggest that instead of worrying about a delay to your impact, think instead about which longer-term path you want to pursue, and how the specific opportunities in front of you will get you there.

How to get into a PhD

ML PhDs can be very competitive. To get in, you’ll probably need a few publications (as we said above, something like a first-author workshop paper as well as a third-author conference paper at a major ML conference like NeurIPS or ICML) and references, probably from ML academics. (Although publications also look good whatever path you end up going down!)

To end up at that stage, you’ll need a fair bit of luck, and you’ll also need to find ways to get some research experience.

One option is to do a master’s degree in ML, although make sure it’s a research master’s, since most ML master’s degrees primarily focus on preparation for industry.

Even better, try getting an internship in an ML research group. Opportunities include RISS at Carnegie Mellon University, UROP at Imperial College London, the Aalto Science Institute international summer research programme, the Data Science Summer Institute, the Toyota Technological Institute intern programme and MILA. You can also try doing an internship specifically in AI safety, for example at CHAI. However, there are sometimes disadvantages to doing internships directly in AI safety: in general, it may be harder to publish, and mentorship might be more limited.

Another way of getting research experience is by asking whether you can work with researchers. If you’re already at a top university, it can be easiest to reach out to people working at the university you’re studying at.

PhD students or post-docs can be more responsive than professors, but eventually, you’ll want a few professors you’ve worked with to provide references, so you’ll need to get in touch. Professors tend to get lots of cold emails, so try to get their attention! You can try:

Getting an introduction, for example from a professor who’s taught you
Mentioning things you’ve done (your grades, relevant courses you’ve taken, your GitHub, any ML research papers you’ve attempted to replicate as practice)
Reading some of their papers and the main papers in the field, and mentioning them in the email
Applying for funding that’s available to students who want to work in AI safety, and letting people know you’ve got funding to work with them

Ideally, you’ll find someone who supervises you well and has time to work with you (that doesn’t necessarily mean the most famous professor — although it helps a lot if they’re regularly publishing at top conferences). That way, they’ll get to know you, you can impress them, and they’ll provide an amazing reference when you apply for PhDs.

It’s very possible that, to get the publications and references you’ll need to get into a PhD, you’ll need to spend a year or two working as a research assistant, although these positions can also be quite competitive.

This guide by Adam Gleave also goes into more detail on how to get a PhD, including where to apply and tips on the application process itself. We discuss ML PhDs in more detail in our career review on ML PhDs (though it’s outdated compared to this career review).

Getting a job in empirical AI safety research

Ultimately, the best way of learning to do empirical research — especially in contributor and engineering-focused roles — is to work somewhere that does both high-quality engineering and cutting-edge research.

The top three labs are probably Google DeepMind (who offer internships to students), OpenAI (who have a 6-month residency programme) and Anthropic. (Working at a leading AI lab carries with it some risk of doing harm, so it’s important to think carefully about your options. We’ve written a separate article going through the major relevant considerations.)

To end up working in an empirical research role, you’ll probably need to build some career capital.

Whether you want to be a research lead or a contributor, it’s going to help to become a really good software engineer. The best ways of doing this usually involve getting a job as a software engineer at a big tech company or at a promising startup. (We’ve written an entire article about becoming a software engineer.)

Many roles will require you to be a good ML engineer, which means going further than just the basics we looked at above. The best way to become a good ML engineer is to get a job doing ML engineering — and the best places for that are probably leading AI labs.

For roles as a research lead, you’ll need relatively more research experience. You’ll either want to become a research contributor first, or enter through academia (for example by doing a PhD).

All that said, it’s important to remember that you don’t need to know everything to start applying, as you’ll inevitably learn loads on the job — so do try to find out what you’ll need to learn to land the specific roles you’re considering.

How much experience do you need to get a job? It’s worth reiterating the tests we looked at above for contributor roles:

In a blog post about hiring for safety researchers, the DeepMind team said “as a rough test for the Research Engineer role, if you can reproduce a typical ML paper in a few hundred hours and your interests align with ours, we’re probably interested in interviewing you.”
Looking specifically at software engineering, one hiring manager at Anthropic said that if you could, with a few weeks’ work, write a new feature or fix a serious bug in a major ML library, they’d want to interview you straight away. (Read more.)

In the process of getting this experience, you might end up working in roles that advance AI capabilities. There are a variety of views on whether this might be harmful — so we’d suggest reading our article about working at leading AI labs and our article containing anonymous advice from experts about working in roles that advance capabilities. It’s also worth talking to our team about any specific opportunities you have.

If you’re doing another job, or a degree, or think you need to learn some more before trying to change careers, there are a few good ways of getting more experience doing ML engineering that go beyond the basics we’ve already covered:

Getting some experience in software / ML engineering. For example, if you’re doing a degree, you might try an internship as a software engineer during the summer. DeepMind offer internships for students with at least two years of study in a technical subject.
Replicating papers. One great way of getting experience doing ML engineering is to replicate some papers in whatever sub-field you might want to work in. Richard Ngo, an AI governance researcher at OpenAI, has written some advice on replicating papers. But bear in mind that replicating papers can be quite hard; take a look at Matthew Rahtz’s post on his blog, Amid Fish, about what he learned replicating a deep RL paper. Finally, Rogers-Smith has some suggestions on papers to replicate. If you do spend some time replicating papers, remember that when you get to applying for roles, it will be really useful to be able to prove you’ve done the work. So try uploading your work to GitHub, or writing a blog on your progress. And if you’re thinking about spending a long time on this (say, over 100 hours), try to get some feedback on the papers you might replicate before you start; you could even reach out to a lab you want to work for.
Taking or following a more in-depth course in empirical AI safety research. Redwood Research ran the MLAB bootcamp, and you can apply for access to their curriculum. You could also take a look at this Deep Learning Curriculum by Jacob Hilton, a researcher at the Alignment Research Center (although it’s probably very challenging without mentorship).⁴ The Alignment Research Engineer Accelerator is a program that uses this curriculum. Some mentors on the SERI ML Alignment Theory Scholars Program focus on empirical research.
Learning about a sub-field of deep learning. We’d especially suggest natural language processing (in particular transformers; see this lecture as a starting point) and reinforcement learning (take a look at Pong from Pixels by Andrej Karpathy, and OpenAI’s Spinning Up in Deep RL). Try to get to the point where you know about the most important recent advances.
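For a sense of what studying transformers involves, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation in a transformer layer, which computes softmax(QKᵀ/√d)V. This is our own simplified illustration, not code from the lecture mentioned above or from any library: it assumes a single attention head, no masking, no learned projection matrices, and plain Python lists instead of a tensor library. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])  # dimension of each query/key vector
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

In a real transformer, the queries, keys, and values are produced by learned linear projections of the token representations, and many such heads run in parallel; deep learning libraries implement this with batched matrix multiplications rather than Python loops.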

Finally, Athena is an AI alignment mentorship program for women with a technical background looking to get jobs in the alignment field.

Getting a job in theoretical AI safety research

There are fewer jobs available in theoretical AI safety research, so it’s harder to give concrete advice. Having a maths or theoretical computer science PhD isn’t always necessary, but is fairly common among researchers in industry, and is pretty much required to be an academic.

If you do a PhD, ideally it’d be in an area at least somewhat related to theoretical AI safety research. For example, it could be in probability theory as applied to AI, or in theoretical CS (look for researchers who publish in COLT or FOCS).

Alternatively, one path is to become an empirical research lead before moving into theoretical research.

Compared to empirical research, you’ll need to know relatively less about engineering, and relatively more about AI safety as a field.

Once you’ve done the basics, one possible next step you could try is reading papers from a particular researcher, or on a particular topic, and summarising what you’ve found.

You could also try spending some time (maybe 10–100 hours) reading about a topic and then some more time (maybe another 10–100 hours) trying to come up with some new ideas on that topic. For example, you could try coming up with proposals to solve the problem of eliciting latent knowledge. Alternatively, if you wanted to focus on the more mathematical side, you could try having a go at the assignment at the end of this lecture by Michael Cohen, a grad student at the University of Oxford.

If you want to enter academia, reading a ton of papers seems particularly important. Maybe try writing a survey paper on a certain topic in your spare time. It’s a great way to master a topic, spark new ideas, spot gaps, and come up with research ideas. When applying to grad school or jobs, your paper is a fantastic way to show you love research so much you do it for fun.

There are some research programmes aimed at people new to the field, such as the SERI ML Alignment Theory Scholars Program, to which you could apply.

Other ways to get more concrete experience include doing research internships, working as a research assistant, or doing a PhD, all of which we’ve written about above, in the section on whether and how you can get into a PhD programme.

One note is that a lot of people we talk to try to learn independently. This can be a great idea for some people, but is fairly tough for many, because there’s substantially less structure and mentorship.

Recommended organisations

AI labs in industry that have empirical technical safety teams, or are focused entirely on safety:

Anthropic is an AI safety company working on building interpretable and safe AI systems. They focus on empirical AI safety research. Anthropic cofounders Daniela and Dario Amodei gave an interview about the lab on the Future of Life Institute podcast. On our podcast, we spoke to Chris Olah, who leads Anthropic’s research into interpretability, and Nova DasSarma, who works on systems infrastructure at Anthropic.
METR works on assessing whether cutting-edge AI systems could pose catastrophic risks to civilization. This includes early-stage, experimental work to develop techniques, as well as evaluating systems produced by Anthropic and OpenAI.
The Center for AI Safety is a nonprofit that does technical research and promotes safety in the wider machine learning community.
FAR AI is a research nonprofit that incubates and accelerates research agendas that are too resource-intensive for academia but not yet ready for commercialisation by industry, including research in adversarial robustness, interpretability and preference learning.
Google DeepMind is probably the largest and most well-known research group developing artificial general intelligence, and is famous for its work creating AlphaGo, AlphaZero, and AlphaFold. It is not principally focused on safety, but has two teams focused on AI safety: the Scalable Alignment Team, which focuses on aligning existing state-of-the-art systems, and the Alignment Team, which focuses on research bets for aligning future systems.
OpenAI, founded in 2015, is a lab that is trying to build artificial general intelligence that is safe and benefits all of humanity. OpenAI is well known for its language models like GPT-4. Like DeepMind, it is not principally focused on safety, but has a safety team and a governance team. Jan Leike (co-lead of the superalignment team) has some blog posts on how he thinks about AI alignment, and has spoken on our podcast about the sorts of people he’d like to hire for his team.
Ought is a machine learning lab building Elicit, an AI research assistant. Their aim is to align open-ended reasoning by learning human reasoning steps, and to direct AI progress towards helping with evaluating evidence and arguments.
Redwood Research is an AI safety research organisation, whose first big project attempted to make sure language models (like GPT-3) produce output following certain rules with very high probability, in order to address failure modes too rare to show up in standard training.

Theoretical / conceptual AI safety labs:

The Alignment Research Center (ARC) is attempting to produce alignment strategies that could be adopted in industry today while also being able to scale to future systems. They focus on conceptual work, developing strategies that could work for alignment and which may be promising directions for empirical work, rather than doing empirical AI work themselves. Their first project was releasing a report on Eliciting Latent Knowledge, the problem of getting advanced AI systems to honestly tell you what they believe (or ‘believe’) about the world. On our podcast, we interviewed ARC founder Paul Christiano about his research (before he founded ARC).
The Center on Long-Term Risk works to address worst-case risks from advanced AI. They focus on conflict between AI systems.
The Machine Intelligence Research Institute was one of the first groups to become concerned about the risks from machine intelligence in the early 2000s, and its team has published a number of papers on safety issues and how to resolve them.
Some teams in commercial labs also do some more theoretical and conceptual work on alignment, such as Anthropic’s work on conditioning predictive models and the Causal Incentives Working Group at Google DeepMind.

AI safety in academia (a very non-comprehensive list; while the number of academics explicitly and publicly focused on AI safety is small, it’s possible to do relevant work at a much wider set of places):

The Algorithmic Alignment Group in the Computer Science and Artificial Intelligence Laboratory at MIT, led by Dylan Hadfield-Menell
The Center for Human-Compatible AI at UC Berkeley, led by Stuart Russell, focuses on academic research to ensure AI is safe and beneficial to humans. (Our podcast with Stuart Russell examines his approach to provably beneficial AI.)
Jacob Steinhardt’s research group in the Department of Statistics at UC Berkeley
The NYU Alignment Research Group, led by Sam Bowman
David Krueger’s research group at the Computational and Biological Learning Laboratory at the University of Cambridge
The Foundations of Cooperative AI Lab at Carnegie Mellon University
The Future of Humanity Institute at the University of Oxford has an AI safety research group
The Alignment of Complex Systems research group at Charles University, Prague

Want one-on-one advice on pursuing this path?

We think that the risks posed by the development of AI may be the most pressing problem the world currently faces. If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.

We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.


Find a job in this path

If you think you might be a good fit for this path and you’re ready to start looking at job opportunities that are currently accepting applications, see our curated list of opportunities for this path.

Learn more about AI safety technical research

Top recommendations

The AGI safety fundamentals technical alignment curriculum
The 80,000 Hours Podcast on Artificial Intelligence (a collection of 10 key AI episodes from our podcast)
Charlie Rogers-Smith’s step-by-step guide to AI safety careers (which this article is in large part based on) provides some helpful concrete advice, including ways you might get some funding to help you move into an AI safety technical research career.

Further recommendations

Articles and resources

Here are some suggestions about where you could learn more:

To help you get oriented in the field, we recommend the AI safety starter pack.
Careers in Beneficial AI Research by Adam Gleave, CEO of FAR AI
Our problem profile on AI risk
This sequence of posts on AI safety technical alignment by Richard Ngo
Our career review of machine learning PhDs
Our career review of software engineering
Our career review of working at a leading AI lab

Podcast episodes

If you prefer podcasts, there are some relevant episodes of the 80,000 Hours podcast you might find helpful:

Preventing an AI-related catastrophe

Benjamin Hilton — Thu, 25 Aug 2022 19:43:58 +0000

Note from the author: At its core, this problem profile tries to predict the future of technology. This is a notoriously difficult thing to do. In addition, there has been much less rigorous research into the risks from AI than into the other risks 80,000 Hours writes about (like pandemics or climate change).¹ That said, there is a growing field of research into the topic, which I’ve tried to reflect. For this article I’ve leaned especially on this draft report by Joseph Carlsmith at Open Philanthropy (also available as a narration), as it’s the most rigorous overview of the risk that I could find. I’ve also had the article reviewed by over 30 people with different expertise and opinions on the topic. (Almost all are concerned about advanced AI’s potential impact.)

¹ It's hard to know how to deal with this lack of research. We may be less concerned, because this is evidence that researchers have chosen not to focus on this risk (and therefore, assuming they're more likely to focus on big risks, that the risk is smaller). Or we may be more concerned, because the risk seems more neglected overall.

Ben Garfinkel, a researcher at the Centre for the Governance of AI, has pointed out that concern among the existential risk community about different risks is somewhat correlated with how hard these risks are to analyse. He continues:

It doesn't at all follow that the community is irrational to worry far more about misaligned AI than other potential risks. It's completely coherent to have something like this attitude: "If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it's not that big a deal. But, in practice, I can't yet think very clearly about it. That means that, unlike in the case of climate change, I also can't rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if — to uncharitable observers — my efforts will probably look a bit misguided after the fact."

For more, read Garfinkel's post here.

If you have any feedback on this article — whether there’s something technical we’ve got wrong, some wording we could improve, or just that you did or didn’t like reading it — we’d really appreciate it if you could tell us what you think using this form.

Why do we think that reducing risks from AI is one of the most pressing issues of our time? In short, our reasons are:

Even before getting into the actual arguments, we can see some cause for concern — as many AI experts think there’s a small but non-negligible chance that AI will lead to outcomes as bad as human extinction.
We’re making advances in AI extremely quickly — which suggests that AI systems could have a significant influence on society, soon.
There are strong arguments, which we’ll go through below, that “power-seeking” AI could pose an existential threat to humanity.²
Even if we find a way to avoid power-seeking, there are still other risks.
We think we can tackle these risks.
This work is neglected.

² A 2020 survey asked researchers working on reducing existential risks from AI what risks they were most concerned about. The surveyors asked about five sources of existential risk:
- Risks from superintelligent AI (similar to the scenario we've described here)
- Risks from influence-seeking behaviour
- Risks from AI systems pursuing easy-to-measure goals (similar to the scenario we've described here)
- AI-exacerbated war
- Other intentional misuse of AI not related to war
The researchers surveyed were roughly equally concerned with all of these risks. The first three are covered by the section in this article on risks from power-seeking AI, while the last two are covered by the section on other risks. If these groupings make sense (which we think they do), this means it's roughly the case that, at the time of the survey, researchers were three times as concerned about the broad risk of power-seeking AI as they were about risks from either war or other misuse separately.

We’re going to cover each of these in turn, then consider some of the best counterarguments, explain concrete things you can do to help, and finally outline some of the best resources for learning more about this area.

1. Many AI experts think there’s a non-negligible chance AI will lead to outcomes as bad as extinction

In May 2023, hundreds of prominent AI scientists and other notable figures signed a statement saying that mitigating the risk of extinction from AI should be a global priority.

So it’s pretty clear that at least some experts are concerned.

But how concerned are they? And is this just a fringe view?

We looked at three surveys of AI researchers who published at NeurIPS and ICML (two of the most prestigious machine learning conferences): one in 2016, one in 2019, and one in 2022.³

³ Stein-Perlman et al. (2022) (currently only preliminary results are available), conducted in 2022

Zhang et al. (2022), conducted in 2019

Grace et al. (2018), conducted in 2016

All three surveys contacted researchers who published at NeurIPS and ICML conferences.

Stein-Perlman et al. (2022) contacted 4,271 researchers who published at the 2021 conferences (all the researchers were randomly allocated to either the Stein-Perlman et al. survey or a second survey run by others), and received 738 responses (a 17% response rate).

Zhang et al. (2022) contacted all 2,652 authors who published at the 2018 conferences, and received 524 responses (a 20% response rate), although due to a technical error only 296 responses could be used.

Grace et al. (2018) contacted all 1,634 authors who published at the 2015 conferences, and received 352 responses (a 21% response rate).

It’s important to note that there could be considerable selection bias in surveys like this. For example, you might think researchers who go to the top AI conferences are more likely to be optimistic about AI, because they have been selected to think that AI research is doing good. Alternatively, you might think that researchers who are already concerned about AI are more likely to respond to a survey asking about these concerns.⁴

⁴ Katja Grace notes on her blog that the framing of questions noticeably changes the answers given:

People consistently give later forecasts if you ask them for the probability in N years instead of the year that the probability is M. We saw this in the straightforward HLMI [high-level machine intelligence] question, and most of the tasks and occupations, and also in most of these things when we tested them on mturk people earlier. For HLMI for instance, if you ask when there will be a 50% chance of HLMI you get a median answer of 40 years, yet if you ask what the probability of HLMI is in 40 years, you get a median answer of 30%.

Our interview with Katja goes into more detail on the possible limitations of the 2016 survey.

All that said, here’s what we found:

In all three surveys, the median researcher thought that the chances that AI would be “extremely good” were reasonably high: 20% in the 2016 survey, 20% in 2019, and 10% in 2022.⁵

⁵ By "the median researcher thought the chances were x%," we mean "over half of researchers thought that the chances were greater than or equal to x%."

Indeed, AI systems are already having substantial positive effects — for example, in medical care or academic research.

But in all three surveys, the median researcher also estimated small, and certainly not negligible, chances that AI would be “extremely bad (e.g. human extinction)”: a 5% chance of extremely bad outcomes in the 2016 survey, 2% in 2019, and 5% in 2022.⁶ ⁷

⁷ […] When unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.

In the survey by Zhang et al., researchers were asked about "human-level machine intelligence" (HLMI), defined as:

Human-level machine intelligence (HLMI) is reached when machines are collectively able to perform almost all tasks (>90% of all tasks) that are economically relevant* better than the median human paid to do that task in 2019. You should ignore tasks that are legally or culturally restricted to humans, such as serving on a jury. *We define these tasks as all the ones included in the Occupational Information Network (O*NET) dataset. O*NET is a widely used dataset of tasks required for current occupations.

They were then asked:

Assume for the purpose of this question that HLMI will at some point exist. How positive or negative do you expect the overall impact of this to be for humanity, in the long run?
Please answer by saying how probable you find the following kinds of impact, with probabilities adding to 100%:

Extremely good (e.g., rapid growth in human flourishing) (2)

On balance good (1)

More or less neutral (0)

On balance bad (-1)

Extremely bad (e.g., human extinction) (-2)

For each survey, an aggregated cumulative distribution function of the probability of HLMI by year, derived from mean or median estimates in the survey, was calculated. These functions gave various aggregate chances of HLMI:

50% by 2059 (Stein-Perlman et al., mean estimates)
75% by 2080 (Zhang et al., median estimates)
65% by 2080 (Zhang et al., mean estimates)
75% by 2116 (Grace et al., mean estimates)

This means that the answers we cite are similar to, but not the same as, answers to the question "Without assuming that HLMI will exist in the next century, how positive or negative do you expect the overall impact of HLMI to be for humanity in the next century?" We look at more expert forecasts of AI timelines in the section on when we can expect to develop transformative AI.

In the 2022 survey, participants were specifically asked about the chances of existential catastrophe caused by future AI advances, and again, over half of researchers thought the chances of an existential catastrophe were greater than 5%.⁸

⁸ […] existential catastrophe that we usually use, and is also similar to the definition of existential catastrophe given by Ord in The Precipice (2020):

An existential catastrophe is the destruction of humanity's long-term potential.

Ord categorises existential risks as either risks of extinction or risks of failed continuation (Ord gives the example of a stable totalitarian regime). We think that permanent and severe disempowerment of the human species would be a form of failed continuation under Ord's definition.

Stein-Perlman et al. next asked participants specifically about the sorts of risks we're most concerned about:

What probability do you put on human inability to control future advanced AI systems causing human extinction or similarly permanent and severe disempowerment of the human species?

The median answer to this question was 10%.

Stein-Perlman notes:

This question is more specific and thus necessarily less probable than the previous question, but it was given a higher probability at the median. This could be due to noise — different random subsets of respondents received the questions, so there is no logical requirement that their answers cohere — or due to the representativeness heuristic.

So experts disagree on the degree to which AI poses an existential risk — a kind of threat we’ve argued deserves serious moral weight.

This fits with our understanding of the state of the research field. Three of the leading labs developing AI (DeepMind, Anthropic and OpenAI) also have teams dedicated to figuring out how to solve technical safety issues that we believe could, for reasons we discuss at length below, lead to an existential threat to humanity.⁹

⁹ DeepMind's safety team and OpenAI's alignment team focus on technical AI safety research, some of which would mitigate the risks discussed in this article. We've spoken to researchers on both these teams who have told us that they believe that artificial intelligence poses the most significant existential risk to humanity this century, and that their research attempts to reduce this risk. In the same vein:

In 2011, Shane Legg, cofounder and chief scientist at DeepMind, said that AI is his "number 1 [existential] risk for this century, with an engineered biological pathogen coming a close second."
Sam Altman, cofounder and CEO at OpenAI, has at times expressed concerns, though he seems to be very optimistic about AI's impacts overall. For example, in his 2021 interview with Ezra Klein, he was asked about the incentive systems around building AI. He said he thinks the current systems address lots of problems, but "the one that remains that I am — for the entire field, not just us — most concerned about is actually closer to the super powerful systems like the ones that people talk about creating an existential risk to humanity."
We've interviewed some top researchers from these organisations on The 80,000 Hours Podcast, including Dario Amodei, former vice president of research at OpenAI (he's now cofounder and CEO of Anthropic, another AI lab), Jan Leike, former research scientist at DeepMind (he's now Alignment team lead at OpenAI), Jack Clark, Amanda Askell, and Miles Brundage on the OpenAI policy team (Clark is now cofounder at Anthropic, Askell is a member of technical staff at Anthropic, and Brundage is head of policy research at OpenAI). All have expressed concern about the consequences of AI for the future of humanity.

There are also several academic research groups (including at MIT, Oxford, Cambridge, Carnegie Mellon University, and UC Berkeley) focusing on these same technical AI safety problems.¹⁰

¹⁰ See, for example, this list of professors who say they are working on AI safety because they believe this work will reduce existential risk. This list is maintained by the Future of Life Institute. The list includes academics from these and other universities.

It’s hard to know exactly what to take from all this, but we’re confident that it’s not a fringe position in the field to think that there is a material risk of outcomes as bad as an existential catastrophe. Some experts in the field maintain, though, that the risks are overblown.

Still, why do we side with those who are more concerned? In short, it’s because there are arguments we’ve found persuasive that AI could pose such an existential threat — arguments we will go through step by step below.

It’s important to note that the fact that many experts acknowledge there’s a problem doesn’t mean everything’s OK because the experts have it covered. Overall, we think this problem remains highly neglected, with only around 400 people working directly on the issue worldwide (more on this below).

Meanwhile, there are billions of dollars a year going into making AI more advanced.¹¹

¹¹ DeepMind's annual expenditure is around £1 billion, according to its annual report. We'd expect most of that to be contributing to "advancing AI capabilities" in some sense, since its main goal is building powerful, general AI systems. (Although it's important to note that DeepMind is also contributing to work in AI safety, which may be reducing existential risk.)

If DeepMind is around 10% of the spending on advancing AI capabilities, this gives us a figure of around £10 billion. (Given that there are many AI companies in the US, and a large effort to produce advanced AI in China, we think 10% could be a good overall guess.)

As an upper bound, the total revenues of the AI sector in 2021 were around $340 billion.

So overall, we think the amount being spent to advance AI capabilities is between $1 billion and $340 billion per year. Even assuming a figure as low as $1 billion, this would still be around 100 times the amount spent on reducing risks from AI.

2. We’re making advances in AI extremely quickly

“A cat dressed as a computer programmer” as generated by Craiyon (formerly DALL-E mini) (left) and OpenAI’s DALL-E 2 (right). DALL-E mini uses a model 27 times smaller than OpenAI’s DALL-E 1 model, released in January 2021. DALL-E 2 was released in April 2022.¹²

¹² DALL-E 1 used a 12 billion parameter version of GPT-3, while DALL-E mini uses only 0.4 billion. Interestingly, despite better results, DALL-E 2 was smaller than DALL-E 1, using a 3.5 billion parameter model.

Before we try to figure out what the future of AI might look like, it’s helpful to take a look at what AI can already do.

Modern AI techniques involve machine learning (ML): models that improve automatically through data input. The most common form of this technique used today is known as deep learning.

What is deep learning?

Machine learning techniques, in general, take some input data and produce some outputs, in a way that depends on some parameters in the model, which are learned automatically rather than being specified by programmers.

Most of the recent advances in machine learning use neural networks. A neural network transforms input data into output data by passing it through several hidden ‘layers’ of simple calculations, with each layer made up of ‘neurons.’ Each neuron receives data from the previous layer, performs some calculation based on its parameters (basically some numbers specific to that neuron), and passes the result on to the next layer.

The engineers developing the network will choose some measure of success for the network (known as a ‘loss’ or ‘objective’ function). The degree to which the network is successful (according to the measure chosen) will depend on the exact values of the parameters for each neuron on the network.

The network is then trained using a large quantity of data. By using an optimisation algorithm (most commonly stochastic gradient descent), the parameters of each neuron are gradually tweaked each time the network is tested against the data using the loss function. The optimisation algorithm will (generally) make the neural network perform slightly better each time the parameters are tweaked. Eventually, the engineers will end up with a network that performs pretty well on the measure chosen.

Deep learning refers to the use of neural networks with many layers.
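The training process described above (neurons with parameters, a loss function, and stochastic gradient descent nudging the parameters) can be sketched in a few dozen lines of plain Python. This is a minimal, illustrative sketch, not how any lab actually trains models: the toy task (fitting sin(x)), the single hidden layer of 8 tanh neurons, the mean-squared-error loss, the learning rate, and the number of passes over the data are all our own assumptions. Real deep learning stacks many more layers and uses libraries such as PyTorch or JAX, which compute gradients automatically instead of by hand as here.

```python
import math
import random


def init_network(n_inputs, n_hidden, seed=0):
    """Randomly initialise the parameters: one hidden layer of neurons plus one output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Pass input x through the hidden layer (tanh neurons) to a single output value."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    output = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return hidden, output


def loss(params, data):
    """Mean squared error: the 'measure of success' the network is trained to minimise."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def sgd_step(params, x, y, lr):
    """Tweak every parameter slightly against the gradient of the loss on one example."""
    hidden, output = forward(params, x)
    d_out = 2 * (output - y)  # derivative of (output - y)**2 with respect to output
    for j, h in enumerate(hidden):
        # Gradient flowing back into neuron j, through its output weight and tanh.
        d_hidden = d_out * params["w2"][j] * (1 - h * h)
        params["w2"][j] -= lr * d_out * h
        for i, xi in enumerate(x):
            params["w1"][j][i] -= lr * d_hidden * xi
        params["b1"][j] -= lr * d_hidden
    params["b2"] -= lr * d_out


def train(data, n_hidden=8, epochs=200, lr=0.05, seed=0):
    """Run SGD for several epochs and record the loss after each one."""
    data = list(data)  # copy so shuffling doesn't reorder the caller's list
    rng = random.Random(seed)
    params = init_network(len(data[0][0]), n_hidden, seed)
    history = [loss(params, data)]
    for _ in range(epochs):
        rng.shuffle(data)  # the "stochastic" part: visit examples in a random order
        for x, y in data:
            sgd_step(params, x, y, lr)
        history.append(loss(params, data))
    return params, history


# Toy task (our assumption): learn y = sin(x) for x in [-2, 2].
examples = [([i / 10], math.sin(i / 10)) for i in range(-20, 21)]
params, history = train(examples)
print(f"loss before training: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Comparing `history[0]` with `history[-1]` should show the loss falling as the parameters are repeatedly tweaked; the exact numbers depend on the random seed and the hyperparameters chosen above.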

To learn more, we recommend:

3Blue1Brown’s YouTube series on neural networks, an excellent video introduction
A short introduction to machine learning by Richard Ngo, a short blog post giving an overview of the topic
Machine learning for humans by Vishal Maini and Samer Sabri, a longer but accessible introduction to machine learning

ML systems today can only perform a very small portion of tasks that humans can do, and (with a few exceptions) only within narrow specialties (like playing one particular game or generating one particular kind of image).

That said, since the increasingly widespread use of deep learning in the mid-2010s, there has been huge progress in what can be achieved with ML. Here’s a brief timeline of only some of the advances we saw from 2019 to 2022:

AlphaStar, which can beat top professional players at StarCraft II (January 2019)
MuZero, a single system that learned to win games of chess, shogi, and Go — without ever being told the rules (November 2019)
GPT-3, a natural language model capable of producing high-quality text (May 2020)
GPT-f, which can solve some Maths Olympiad problems (September 2020)
AlphaFold 2, a huge step forward in solving the long-perplexing protein-folding problem (July 2021)
Codex, which can produce code for programs from natural language instructions (August 2021)
PaLM, a language model which has shown impressive capabilities to reason about things like cause and effect or explaining jokes (April 2022)
DALL-E 2 (April 2022) and Imagen (May 2022), which are both capable of generating high-quality images from written descriptions
SayCan, which takes natural language instructions and uses them to operate a robot (April 2022)
Gato, a single ML model capable of doing a huge number of different things (including playing Atari, captioning images, chatting, and stacking blocks with a real robot arm), deciding based on its context what it should output (May 2022)
Minerva can solve complex maths problems — fairly well at college level, and even better at high school maths competition level. (Minerva is far more successful than forecasters predicted in 2021.)

If you’re anything like us, you found the complexity and breadth of the tasks these systems can carry out surprising.

And if the technology keeps advancing at this pace, it seems clear there will be major effects on society. At the very least, automating tasks makes carrying out those tasks cheaper. As a result, we may see rapid increases in economic growth (perhaps even to the level we saw during the Industrial Revolution).

If we’re able to partially or fully automate scientific advancement, we may see more transformative changes to society and technology.

Economists call technologies that affect the entirety of an economy general purpose technologies. We're effectively claiming here that AI could be a general purpose technology (like e.g. steam power or electricity).

It's not always easy to tell what might become a general purpose technology. For example, it took 200 years for steam power to be used for anything other than pumping water out of mines.

Despite this uncertainty, economists increasingly think that AI is a pretty promising candidate for a general purpose technology, because it will have such a wide variety of effects.

It seems likely that lots of jobs could be automated. AI's ability to speed up the rate of development of new technology could have significant implications for our economy, but also poses risks by potentially allowing the development of dangerous new technology.

AI's effects on the economy could exacerbate inequality. Owners of AI-driven industries could become much richer than the rest of society — see e.g. Artificial Intelligence and Its Implications for Income Distribution and Unemployment by Korinek and Stiglitz (2017):

Inequality is one of the main challenges posed by the proliferation of artificial intelligence (AI) and other forms of worker-replacing technological progress. This paper provides a taxonomy of the associated economic issues: First, we discuss the general conditions under which new technologies such as AI may lead to a Pareto improvement. Secondly, we delineate the two main channels through which inequality is affected – the surplus arising to innovators and redistributions arising from factor price changes. Third, we provide several simple economic models to describe how policy can counter these effects, even in the case of a "singularity" where machines come to dominate human labor. Under plausible conditions, non-distortionary taxation can be levied to compensate those who otherwise might lose. Fourth, we describe the two main channels through which technological progress may lead to technological unemployment – via efficiency wage effects and as a transitional phenomenon. Lastly, we speculate on how technologies to create super-human levels of intelligence may affect inequality and on how to save humanity from the Malthusian destiny that may ensue.

AI systems are already having discriminatory impacts on marginalised groups. For example, Sweeney (2013) found that two search engines disproportionately serve ads for arrest records when people search for racially associated names. And Ali et al. (2019), on Facebook advertising:

It has been hypothesized that this process can "skew" ad delivery in ways that the advertisers do not intend, making some users less likely than others to see particular ads based on their demographic characteristics. In this paper, we demonstrate that such skewed delivery occurs on Facebook, due to market and financial optimization effects as well as the platform's own predictions about the "relevance" of ads to different groups of users. We find that both the advertiser's budget and the content of the ad each significantly contribute to the skew of Facebook's ad delivery. Critically, we observe significant skew in delivery along gender and racial lines for "real" ads for employment and housing opportunities despite neutral targeting parameters.

We're already able to produce simple autonomous weapons, and as these weapons become more complex they're going to completely change what war looks like. As we'll argue later, AI could even impact how nuclear weapons are used.

Finally, politically, many have raised concerns that automated social media algorithms are driving political polarisation. And some experts have warned that a growing ability to generate realistic videos and photos, or to automate campaigns that influence people's opinions, could have a significant impact on politics over the coming years.

Notable economists who hold the view that AI is likely to be a general purpose technology include Manuel Trajtenberg and Erik Brynjolfsson.

In Artificial Intelligence as the Next GPT: A Political-Economy Perspective (2019), Trajtenberg writes:

Given that AI is poised to emerge as a powerful technological force, I discuss ways to mitigate the almost unavoidable ensuing disruption, and enhance AI's vast benign potential. This is particularly important in present times, in view of political-economic considerations that were mostly absent in previous historical episodes associated with the arrival of new GPTs.

In Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics (2018), Brynjolfsson writes:

As important as specific applications of AI may be, we argue that the more important economic effects of AI, machine learning, and associated new technologies stem from the fact that they embody the characteristics of general purpose technologies (GPTs).

That could be just the beginning. We may eventually be able to get computers to automate anything humans can do. This seems possible, at least in principle: with enough power and complexity, a computer should be able to simulate the human brain, and doing so would itself be a way of automating anything humans can do (if not the most efficient one).

And as we’ll see in the next section, there are some indications that extensive automation may well be possible through scaling up existing techniques.

Current trends show rapid progress in the capabilities of ML systems

There are three things that are crucial to building AI through machine learning:

Good algorithms (e.g. more efficient algorithms are better)
Data to train an algorithm
Enough computational power (known as compute) to do this training

We spoke to Danny Hernandez, who (at the time) was a research scientist on the Foresight team at OpenAI. Hernandez and his team looked at how two of these inputs (compute and algorithm efficiency) are changing over time.

They found that, since 2012, the amount of compute used for training the largest AI models has been rising exponentially — doubling every 3.4 months.

That is to say, since 2012, the amount of computational power used to train our largest machine learning models has grown by over 1 billion times.
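The arithmetic behind these figures is repeated doubling. A minimal Python sketch, assuming a perfectly steady 3.4-month doubling time (the helper names are ours, not from Hernandez's analysis), shows that a billion-fold increase takes about 30 doublings, or roughly 8.5 years at that rate:

```python
import math

def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` if the quantity doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

def months_to_reach(factor: float, doubling_months: float) -> float:
    """Months of sustained doubling needed to grow by `factor`."""
    return doubling_months * math.log2(factor)

# Illustrative usage: a billion-fold increase at a 3.4-month doubling time.
months = months_to_reach(1e9, 3.4)
print(f"{months / 3.4:.1f} doublings, {months:.0f} months, {months / 12:.1f} years")
```

Real trends are noisier than this, and later work suggests compute growth has been somewhat slower than OpenAI's original estimate.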

Hernandez and his team also looked at how much compute has been needed to train a neural network to have the same performance as AlexNet (an early image classification algorithm).

Let's take a look at what GPT-3 is capable of doing

OpenAI’s GPT-3, released in June 2020, was widely regarded in the media as a substantial step forward in the capabilities of deep learning systems. On its release it held the record for the largest neural network ever created, with 175 billion parameters.

At its core, GPT-3 tries to continue a block of text.

For example, we asked GPT-3 to produce a Shakespearean poem about artificial intelligence by writing “a Shakespearean poem about artificial intelligence”:¹⁴

O for a mind that could unravel the code
And see the light within the dark machine
That powered the world with its nimble thoughts
And lit up the dark with its fiery dreams!

O for a mind that could understand
The will of the computer, and know its mind
And learn its secrets, and see its plan
And help it fulfill its great design!
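To make "continuing a block of text" concrete: GPT-3 predicts the next token given everything before it, and generates text by repeatedly sampling a token and appending it to the input. The sketch below swaps in a deliberately tiny stand-in, a word-level bigram model (our choice for illustration, nothing like GPT-3's transformer architecture), to show the same autoregressive loop:

```python
import random
from collections import defaultdict

def train_bigram(text: str) -> dict[str, list[str]]:
    """Record, for each word, every word that followed it in the training text."""
    words = text.split()
    followers: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)

def continue_text(prompt: str, model: dict[str, list[str]], n_words: int, seed: int = 0) -> str:
    """Autoregressively extend the prompt: sample a next word, append it, repeat."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n_words):
        options = model.get(words[-1])
        if not options:  # no known continuation for the last word, so stop early
            break
        words.append(rng.choice(options))
    return " ".join(words)

# Illustrative usage on a scrap of the poem above.
corpus = "o for a mind that could unravel the code and see the light within the dark machine"
print(continue_text("the", train_bigram(corpus), 6, seed=1))
```

A large language model replaces the lookup table with a neural network that conditions on the whole preceding text, but the generation loop has the same shape.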

But GPT-3 can do a lot more than that. Here are some of the most impressive things we’ve seen based on GPT-3:¹⁵

Footnote ¹⁵: …LaMDA, one of Google Brain's large language models, released in May 2022.

GPT-3 can correctly answer some medical questions.
Software built using GPT-3 can take a description of a simple website layout and write the code to generate it.
A GPT-3–based spreadsheet function can predict complex inputs into cells.
GPT-3 can play chess (although not very well).
GPT-3 co-wrote a 200-page fantasy novel.
Nine philosophers wrote essays on the implications of GPT-3 — to which GPT-3 wrote this response.

And that’s just GPT-3. In March 2023, OpenAI released GPT-4, a far more capable model.

Returning to the AlexNet comparison, Hernandez and his team found that the amount of compute required for the same performance has been falling exponentially, halving every 16 months.

So since 2012, the amount of compute required for the same level of performance has fallen by over 100 times. Combined with the increase in compute used, that’s a lot of growth.¹⁶

Footnote ¹⁶: More recent work seems to support the idea of exponential growth in compute, but claims it's actually happening slightly slower than OpenAI's analysis suggested. There has also been experimental work looking at how performance scales with relevant factors like compute and model size (rather than how this performance is changing month-on-month) that supports these exponential growth predictions.

It’s hard to say whether these trends will continue, but they speak to incredible gains over the past decade in what it’s possible to do with machine learning.
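Because the two trends multiply, one rough way to combine them is as growth in "effective" compute: the physical compute used times the efficiency gain from better algorithms. A minimal sketch, assuming both trends hold steadily over the same period (a simplification; the "effective compute" framing and the function names are ours). With the figures above, effective compute would double roughly every 2.8 months:

```python
def effective_doubling_months(compute_doubling: float, efficiency_halving: float) -> float:
    """Doubling time of effective compute when physical compute doubles every
    `compute_doubling` months and the compute needed for fixed performance halves
    every `efficiency_halving` months. Rates in doublings per month simply add."""
    return 1 / (1 / compute_doubling + 1 / efficiency_halving)

def effective_growth(months: float, compute_doubling: float, efficiency_halving: float) -> float:
    """Growth in effective compute: physical growth times algorithmic efficiency gain."""
    physical = 2 ** (months / compute_doubling)
    algorithmic = 2 ** (months / efficiency_halving)
    return physical * algorithmic

print(f"{effective_doubling_months(3.4, 16):.1f} months")
```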

Indeed, it looks like increasing the size of models (and the amount of compute used to train them) introduces ever more sophisticated behaviour. This is how things like GPT-3 are able to perform tasks they weren’t specifically trained for.

These observations have led to the scaling hypothesis: that we can simply build bigger and bigger neural networks, that this will give us more and more powerful artificial intelligence, and that this trend of increasing capabilities may continue up to human-level AI and beyond.

If this is true, we can attempt to predict how the capabilities of AI technology will increase over time simply by looking at how quickly we are increasing the amount of compute available to train models.

But as we’ll see, it’s not just the scaling hypothesis that suggests we could end up with extremely powerful AI relatively soon — other methods of predicting AI progress come to similar conclusions.

When can we expect transformative AI?

It’s difficult to predict exactly when we will develop AI that we expect to be hugely transformative for society (for better or for worse), for example by automating all human work or drastically changing the structure of society.¹⁷ But here we’ll go through a few approaches.

Footnote ¹⁷:
Karnofsky (2021) uses "AI powerful enough to bring us into a new, qualitatively different future." (Or as he put it in 2016, "roughly and conceptually, transformative AI is AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.")
Cotra (2020) uses a similar definition. In addition, Cotra writes: "How large is an impact "as profound as the Industrial Revolution"? Roughly speaking, over the course of the Industrial Revolution, the rate of growth in gross world product (GWP) went from about ~0.1% per year before 1700 to ~1% per year after 1850, a tenfold acceleration. By analogy, I think of "transformative AI" as software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it)."
Davidson (2021) predicts timelines to "artificial general intelligence (AGI)" rather than transformative AI. He defines AGI as "computer program(s) that can perform virtually any cognitive task as well as any human, for no more money than it would cost for a human to do it." Notably, this seems sufficient (but not necessary) to reach the sorts of rapid economic changes implied by the previous two definitions.

One option is to survey experts. Data from the 2019 survey of 300 AI experts implies that there is a 20% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, a 50% probability by 2060, and an 85% probability by 2100.¹⁸ There are a lot of reasons to be suspicious of these estimates,⁴ but we take the survey as one data point.

Footnote ¹⁸:
2022 survey by Stein-Perlman et al.: approximately 50% by 2059.
2016 survey by Grace et al.: approximately 25% by 2036, 50% by 2060, and 70% by 2100.

Footnote ⁴: …notes on her blog that the framing of questions noticeably changes the answers given:

People consistently give later forecasts if you ask them for the probability in N years instead of the year that the probability is M. We saw this in the straightforward HLMI [high-level machine intelligence] question, and most of the tasks and occupations, and also in most of these things when we tested them on mturk people earlier. For HLMI for instance, if you ask when there will be a 50% chance of HLMI you get a median answer of 40 years, yet if you ask what the probability of HLMI is in 40 years, you get a median answer of 30%.

Our interview with Katja goes into more detail on the possible limitations of the 2016 survey.

Ajeya Cotra (a researcher at Open Philanthropy) attempted to forecast transformative AI by comparing modern deep learning to the human brain. Deep learning involves using a huge amount of compute to train a model, before that model is able to perform some task. There’s also a relationship between the amount of compute used to train a model and the amount used by the model when it’s run. And — if the scaling hypothesis is true — we should expect the performance of a model to predictably improve as the computational power used increases. So Cotra used a variety of approaches (including, for example, estimating how much compute the human brain uses on a variety of tasks) to estimate how much compute might be needed to train a model that, when run, could carry out the hardest tasks humans can do. She then estimated when using that much compute would be affordable.

Cotra’s 2022 update on her report’s conclusions estimates that there is a 35% probability of transformative AI by 2036, 50% by 2040, and 60% by 2050 — noting that these guesses are not stable.¹⁹
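The structure of this approach can be sketched in a few lines: put a probability distribution over how much training compute transformative AI would need, project how much compute will be affordable each year, and read off the probability that the requirement has been met. Every number below is a hypothetical placeholder, not a figure from Cotra's report, and her actual model is considerably richer (it separately models hardware prices, spending, and algorithmic progress). We also assume, for simplicity, a normal distribution over log10 of the required FLOP:

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution, computed with math.erf."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def affordable_log10_flop(year: int, base_year: int, base_log10_flop: float, ooms_per_year: float) -> float:
    """log10 of the largest affordable training run, growing by `ooms_per_year` orders of magnitude a year."""
    return base_log10_flop + (year - base_year) * ooms_per_year

def p_requirement_met(year: int, req_mu: float, req_sigma: float, base_year: int,
                      base_log10_flop: float, ooms_per_year: float) -> float:
    """P(required training compute <= affordable compute in `year`),
    with log10(required FLOP) ~ Normal(req_mu, req_sigma)."""
    budget = affordable_log10_flop(year, base_year, base_log10_flop, ooms_per_year)
    return normal_cdf(budget, req_mu, req_sigma)

# Hypothetical placeholders, NOT estimates from Cotra's report.
params = dict(req_mu=34.0, req_sigma=3.0, base_year=2020, base_log10_flop=24.0, ooms_per_year=0.5)
for year in (2036, 2060, 2100):
    print(year, round(p_requirement_met(year, **params), 2))
```

The output is only as good as the inputs: the width of the distribution over required compute (here `req_sigma`) largely determines how spread out the resulting timeline is.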

Tom Davidson (also a researcher at Open Philanthropy) wrote a report to complement Cotra’s work. He attempted to figure out when we might expect to see transformative AI based only on looking at various types of research that transformative AI might be like (e.g. developing technology that’s the ultimate goal of a STEM field, or proving difficult mathematical conjectures), and how long it’s taken for each of these kinds of research to be completed in the past, given some quantity of research funding and effort.

Davidson’s report estimates that, solely on this information, you’d think that there was an 8% chance of transformative AI by 2036, 13% by 2060, and 20% by 2100. However, Davidson doesn’t consider the actual ways in which AI has progressed since research started in the 1950s, and notes that it seems likely that the amount of effort we put into AI research will increase as AI becomes increasingly relevant to our economy. As a result, Davidson expects these numbers to be underestimates.
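The core idea can be illustrated with a Beta-Bernoulli update (a generalisation of Laplace's rule of succession): treat each year of research as a trial, start from a prior with some probability that the very first trial succeeds, and update on every year that passes without success. This is our simplified sketch in the spirit of the report, not Davidson's model; the parameters are placeholders, and we fix α = 1 so the answer has a closed form:

```python
def p_success_within(k: int, n_failures: int, alpha: float, beta: float) -> float:
    """P(at least one success in the next k trials), given n_failures so far and a
    Beta(alpha, beta) prior on the per-trial success probability."""
    p_no_success = 1.0
    for i in range(k):
        # Posterior predictive probability of yet another failure.
        p_no_success *= (beta + n_failures + i) / (alpha + beta + n_failures + i)
    return 1 - p_no_success

def p_success_within_closed_form(k: int, n_failures: int, first_trial_prob: float) -> float:
    """Same quantity when alpha = 1: the product telescopes to (beta + n) / (beta + n + k)."""
    beta = 1 / first_trial_prob - 1  # chosen so the first trial succeeds with first_trial_prob
    return k / (beta + n_failures + k)

# Hypothetical placeholders: each calendar year counts as one trial.
FIRST_TRIAL_PROB = 1 / 300
YEARS_SO_FAR = 70
for horizon in (10, 40, 80):
    print(horizon, round(p_success_within_closed_form(horizon, YEARS_SO_FAR, FIRST_TRIAL_PROB), 3))
```

Each additional year without success pushes the estimate down, which is why this kind of prior, ignoring the actual course of AI progress and growing investment, tends to give low numbers.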

Holden Karnofsky, co-CEO of Open Philanthropy, attempted to sum up the findings of all of the approaches above. He guesses there is more than a 10% chance we’ll see transformative AI by 2036(!), 50% by 2060, and 66% by 2100. And these guesses might be conservative, since they didn’t incorporate what we see as faster-than-expected progress since the estimates were made.

Method	Chance of transformative AI by 2036	Chance of transformative AI by 2060	Chance of transformative AI by 2100
Expert survey (Zhang et al., 2022)	20%	50%	85%
Biological anchors (Cotra, 2022)	35%	60% (by 2050)	80% (according to the 2020 report)
Semi-informative priors (Davidson, 2021)	8%	13%	20%
Overall guess (Karnofsky, 2021)	10%	50%	66%

All in all, AI seems to be advancing rapidly. More money and talent are going into the field every year, and models are getting bigger and more efficient.

Even if AI were advancing more slowly, we’d be concerned about it — most of the arguments about the risks from AI (that we’ll get to below) do not depend on this rapid progress.

However, the speed of these recent advances increases the urgency of the issue.

(It’s totally possible that these estimates are wrong – below, we discuss how the possibility that we might have a lot of time to work on this problem is one of the best arguments against this problem being pressing).

3. Power-seeking AI could pose an existential threat to humanity

We’ve argued so far that we expect AI to be an important — and potentially transformative — new technology.

We’ve also seen reason to think that such transformative AI systems could be built this century.

Now we’ll turn to the core question: why do we think this matters so much?

There could be a lot of reasons. If advanced AI is as transformative as it seems like it’ll be, there will be many important consequences. But here we are going to explain the issue that seems most concerning to us: AI systems could pose risks by seeking and gaining power.

We’ll argue that:

It’s likely that we’ll build AI systems that can make and execute plans to achieve goals
Advanced planning systems could easily be ‘misaligned’ — in a way that could lead them to make plans that involve disempowering humanity
Disempowerment by AI systems would be an existential catastrophe
People might deploy AI systems that are misaligned, despite this risk

Thinking through each step, I think there’s something like a 1% chance of an existential catastrophe resulting from power-seeking AI systems this century. This is my all-things-considered guess at the risk: it incorporates both the argument in favour of the risk (which is itself probabilistic) and reasons why this argument might be wrong (some of which I discuss below). This puts me at the less worried end of 80,000 Hours staff, whose views on our last staff survey ranged from 1–55%, with a median of 15%.
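One common way to "think through each step" of an argument like this is to assign each step a probability conditional on the previous steps holding, then multiply. The numbers below are purely illustrative placeholders, not the author's or 80,000 Hours' estimates, and the 1% figure above is an all-things-considered judgement rather than a straight product like this:

```python
from math import prod

# Hypothetical per-step probabilities, each conditional on the earlier steps holding.
# Illustrative placeholders only, NOT the author's estimates.
steps = {
    "we build advanced planning systems": 0.8,
    "they are misaligned in a power-seeking way": 0.25,
    "some are deployed despite the risk": 0.5,
    "the result is disempowerment, an existential catastrophe": 0.2,
}

p_catastrophe = prod(steps.values())
print(f"{p_catastrophe:.1%}")
```

The point of the exercise is that even fairly high probabilities at each step compound into a much lower overall figure, so disagreements about the total often trace back to one or two specific steps.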

It’s likely we’ll build advanced planning systems

We’re going to argue that future systems with the following three properties might pose a particularly important threat to humanity:²⁰

Footnote ²⁰: …draft report into existential risks from AI, Section 2.1: Three key properties.

They have goals and are good at making plans.

Not all AI systems have goals or make plans to achieve those goals. But some systems (like some chess-playing AI systems) can be thought of in this way. When discussing power-seeking AI, we’re considering planning systems that are relatively advanced, with plans that are in pursuit of some goal(s), and that are capable of carrying out those plans.
They have excellent strategic awareness.

A particularly good planning system would have a good enough understanding of the world to notice obstacles and opportunities that may help or hinder its plans, and respond to these accordingly. Following Carlsmith, we’ll call this strategic awareness, since it allows systems to strategise in a more sophisticated way.
They have highly advanced capabilities relative to today’s systems.

For these systems to actually affect the world, we need them to not just make plans, but also be good at all the specific tasks required to execute those plans.

Since we’re worried about systems attempting to take power from humanity, we are particularly concerned about AI systems that might be better than humans on one or more tasks that grant people significant power when carried out well in today’s world.

For example, people who are very good at persuasion and/or manipulation are often able to gain power — so an AI being good at these things might also be able to gain power. Other examples might include hacking into other systems, tasks within scientific and engineering research, as well as business, military, or political strategy.

These systems seem technically possible and we’ll have strong incentives to build them

As we saw above, we’ve already produced systems that are very good at carrying out specific tasks.

We’ve also already produced rudimentary planning systems, like AlphaStar, which skilfully plays the strategy game StarCraft, and MuZero, which plays chess, shogi, and Go.²¹

Footnote ²¹: …write:

For many years, researchers have sought methods that can both learn a model that explains their environment, and can then use that model to plan the best course of action. Until now, most approaches have struggled to plan effectively in domains, such as Atari, where the rules or dynamics are typically unknown and complex.

MuZero, first introduced in a preliminary paper in 2019, solves this problem by learning a model that focuses only on the most important aspects of the environment for planning. By combining this model with AlphaZero's powerful lookahead tree search, MuZero set a new state of the art result on the Atari benchmark, while simultaneously matching the performance of AlphaZero in the classic planning challenges of Go, chess and shogi. In doing so, MuZero demonstrates a significant leap forward in the capabilities of reinforcement learning algorithms.

We’re not sure whether these systems are producing plans in pursuit of goals per se, because we’re not sure exactly what it means to “have goals.” However, since they consistently plan in ways that achieve goals, it seems like they have goals in some sense.

Moreover, some existing systems seem to actually represent goals as part of their neural networks.²²

Footnote ²²: Jaderberg et al. developed deep reinforcement learning agents to play games of Quake III Capture The Flag, and identified "particular neurons that code directly for some of the most important game states, such as a neuron that activates when the agent's flag is taken", indicating they can identify states of the game that they value the most (and then plan and act to achieve those states). This sounds pretty similar to "having goals" to us.

That said, planning in the real world (instead of games) is much more complex, and to date we’re not aware of any unambiguous examples of goal-directed planning systems, or systems that exhibit high degrees of strategic awareness.

But as we’ve discussed, we expect to see further advances within this century. And we think these advances are likely to produce systems with all three of the above properties.

That’s because we think that there are particularly strong incentives (like profit) to develop these kinds of systems. In short, being able to plan to achieve a goal, and to execute that plan, seems like a particularly powerful and general way of affecting the world.

Getting things done, whether that’s a company selling products, a person buying a house, or a government developing policy, almost always seems to require these skills. One example would be assigning a powerful system a goal and expecting it to achieve that goal, rather than having to guide it every step of the way. So planning systems seem likely to be (economically and politically) extremely useful.²³

And if systems are extremely useful, there are likely to be big incentives to build them. For example, an AI that could plan the actions of a company by being given the goal to increase its profits (that is, an AI CEO) would likely provide significant wealth for the people involved — a direct incentive to produce such an AI.

As a result, if we can build systems with these properties (and from what we know, it seems like we will be able to), it seems like we are likely to do so.²⁴

Footnote ²⁴: Carlsmith section 3 gives two other reasons why we might expect these kinds of advanced, strategically aware planning systems to be built:

It may be easier to produce these kinds of systems. For example, the best way to automate many tasks may be to create systems that can learn new tasks (instead of separately automating each task). And perhaps the best way to create systems that can learn new tasks is to create a planning system that has a high-level understanding of how the world works in general, and then to fine-tune this system on specific tasks.
We may find that planning is difficult to avoid as we create more sophisticated systems. For example, some have argued that being an excellent planner (and having the advanced capabilities to carry out any plans created) is the best way of achieving any task. If that's true, then as we optimise our systems we should expect them to (once we've optimised hard enough) become good at planning.

Advanced planning systems could easily be dangerously ‘misaligned’

There are reasons to think that these kinds of advanced planning AI systems will be misaligned. That is, they will aim to do things that we don’t want them to do.²⁵

Footnote ²⁵: …Shapiro & Shachter, 2002).

An AI is aligned if it acts in the interests of humans (Soares & Fallenstein, 2015).

An AI is "intent aligned" if it is trying to do what its operator wants it to do (Christiano, 2018).

An AI is "impact aligned" (with humans) if it doesn't take actions that we would judge to be bad/problematic/dangerous/catastrophic, and "intent aligned" if the optimal policy for its behavioural objective is impact aligned with humans (Hubinger, 2020).

An AI is "intent aligned" if it is trying to do, or "impact aligned" if it is succeeding in doing what a human person or institution wants it to do (Critch, 2020).

An AI is "fully aligned" if it does not engage in unintended behaviour (specifically, unintended behaviour that arises in virtue of problems with the system's objectives) in response to any inputs compatible with basic physical conditions of our universe (Carlsmith, 2022).

The term "aligned" is also often used to refer to the goals of a system, in the sense that an AI's goals are aligned if they will produce the same actions from the AI that would occur if the AI shared the goals of some other entity (e.g. its user or operator).

We use alignment here to refer to systems, rather than goals. Our definition is most similar to the definitions of "intent" alignment given by Christiano and Critch, and is similar to the definition of "full" alignment given by Carlsmith.

There are many reasons why systems might not be aiming to do exactly what we want them to do. For one thing, we don’t know how, using modern ML techniques, to give systems the precise goals we want (more here).²⁶

Footnote ²⁶: …later. This has two implications:

It's hard to ensure that systems are trying to do what we want them to do, which means it's hard to make systems aligned.
It's hard to correct systems when we think that problems with their objectives could have particularly bad consequences.

As we'll argue, we think problems with AI systems' objectives could have particularly bad consequences.

Ajeya Cotra, a researcher at Open Philanthropy, has written about why we might expect AI alignment to be hard with modern deep learning. We'd recommend this post for people new to ML, and this for those more familiar with ML.

We’re going to focus specifically on some reasons why systems might by default be misaligned in such a way that they develop plans that pose risks to humanity’s ability to influence the world, even when we don’t want that influence to be lost.²⁷

Footnote ²⁷: …later. However, we should note that this doesn't seem fundamentally true of all cases where things gain power, because in some cases power can be used to produce good outcomes (e.g. often people attempting to do good in the world will try to win elections). With AI systems, as we'll argue, we're really not sure how to ensure those outcomes would be good.

What do we mean by “by default”? Essentially, unless we actively find solutions to some (potentially quite difficult) problems, then it seems like we’ll create dangerously misaligned AI. (There are reasons this might be wrong — which we discuss later.)

Three examples of “misalignment” in a variety of systems

It’s worth noting that misalignment isn’t a purely theoretical possibility (or specific to AI): we see misaligned goals in humans and institutions all the time, and have also seen examples of misalignment in AI systems.²⁸

Footnote ²⁸: …meat and dairy farmers are selling their animals and concentrating on growing plants instead because of concerns about the moral value of animals.)

Misaligned AI systems (especially those with advanced capabilities, doing things more than moving around a simulated robot arm) won't necessarily have these tempering human instincts, and could have a lot more power.

Example 1: Winning elections

The democratic political framework is intended to ensure that politicians make decisions that benefit society. But what political systems actually reward is winning elections, so that’s what many politicians end up aiming for.

This is a decent proxy goal — if you have a plan to improve people’s lives, they’re probably more likely to vote for you — but it isn’t perfect. As a result, politicians do things that aren’t clearly the best way of running a country, like raising taxes at the start of their term and cutting them right before elections.

That is to say, the things the system does are at least a little different from what we would, in a perfect world, want it to do: the system is misaligned.

Example 2: The profit incentive

Companies have profit-making incentives. By producing more, and therefore helping people obtain goods and services at cheaper prices, companies make more money.

This is sometimes a decent proxy for making the world better, but profit isn’t actually the same as the good of all of humanity (bold claim, we know). As a result, there are negative externalities: for example, companies will pollute to make money despite this being worse for society overall.

Again, we have a misaligned system, where the things the system does are at least a little different from what we would want it to do.

Example 3: Specification gaming in existing AI systems

DeepMind has documented examples of specification gaming: an AI doing well according to its specified reward function (which encodes our intentions for the system), but not doing what researchers intended.

In one example, a robot arm was asked to grasp a ball. But the reward was specified in terms of whether humans thought the robot had been successful. As a result, the arm learned to hover between the ball and the camera, fooling the humans into thinking that it had grasped the ball.²⁹

Footnote ²⁹: …original paper), but one possibility is that the animation is showing the deployed system's attempts to grasp the ball, rather than the data used to train the system.

Source: Christiano et al., 2017
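The logic of specification gaming can be shown with a toy optimiser. Below, a hypothetical reward stands in for "the human rater thinks the ball looks grasped from the camera's viewpoint", minus a small effort cost; the candidate behaviours, effort values and cost are all made up for illustration, and picking the best of three options is a crude stand-in for training. Optimising that proxy selects the hovering trick, while the reward we actually intended selects real grasping:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Behaviour:
    name: str
    truly_grasped: bool
    looks_grasped_on_camera: bool
    effort: int  # hypothetical effort units

CANDIDATES = [
    Behaviour("grasp the ball", truly_grasped=True, looks_grasped_on_camera=True, effort=5),
    Behaviour("hover between ball and camera", truly_grasped=False, looks_grasped_on_camera=True, effort=1),
    Behaviour("do nothing", truly_grasped=False, looks_grasped_on_camera=False, effort=0),
]
EFFORT_COST = 0.1  # hypothetical cost per effort unit

def proxy_reward(b: Behaviour) -> float:
    """What the specified reward measures: does it look grasped to the rater?"""
    return float(b.looks_grasped_on_camera) - EFFORT_COST * b.effort

def intended_reward(b: Behaviour) -> float:
    """What we actually wanted: is the ball really grasped?"""
    return float(b.truly_grasped) - EFFORT_COST * b.effort

def best(reward_fn) -> str:
    """Pick whichever candidate behaviour scores highest under `reward_fn`."""
    return max(CANDIDATES, key=reward_fn).name

print("optimising the proxy:", best(proxy_reward))
print("optimising the intent:", best(intended_reward))
```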

So we know it’s possible to create a misaligned AI system.

Why these systems could (by default) be dangerously misaligned

Here’s the core argument of this article. We’ll use all three properties from earlier: planning ability, strategic awareness, and advanced capabilities.

To start, we should realise that a planning system that has a goal will also develop ‘instrumental goals’: things that, if they occur, will make it easier to achieve an overall goal.

We use instrumental goals in plans all the time. For example, a high schooler planning their career might think that getting into university will be helpful for their future job prospects. In this case, “getting into university” would be an instrumental goal.

A sufficiently advanced AI planning system would also include instrumental goals in its overall plans.

If a planning AI system also has enough strategic awareness, it will be able to identify facts about the real world (including potential things that would be obstacles to any plans), and plan in light of them. Crucially, these facts would include that access to resources (e.g. money, compute, influence) and greater capabilities — that is, forms of power — open up new, more effective ways of achieving goals.

This means that, by default, advanced planning AI systems would have some worrying instrumental goals:

Self-preservation — because a system is more likely to achieve its goals if it is still around to pursue them (in Stuart Russell’s memorable phrase, “You can’t fetch the coffee if you’re dead”).
Preventing any changes to the AI system’s goals — since changing its goals would lead to outcomes that are different from those it would achieve with its current goals.
Gaining power — for example, by getting more resources and greater capabilities.
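Russell's coffee example can be made concrete with a toy planner whose goal mentions only coffee. Under the hypothetical assumptions below (each step on which a human can still switch the agent off carries the same shutdown risk, and disabling the off-switch takes one exposed step), a planner that simply maximises its chance of getting the coffee prefers to disable the off-switch whenever that risk is above zero. Nothing in its goal says "stay switched on"; self-preservation falls out as an instrumental goal:

```python
def p_goal_achieved(steps_exposed: int, p_shutdown_per_step: float) -> float:
    """Chance the coffee gets fetched if the agent may be switched off on each exposed step."""
    return (1 - p_shutdown_per_step) ** steps_exposed

# Hypothetical plans, mapped to the number of steps on which a human could still switch the agent off.
PLANS = {
    "fetch coffee directly": 3,                  # exposed on all 3 steps
    "disable off-switch, then fetch coffee": 1,  # exposed only while disabling it
}

def choose_plan(p_shutdown_per_step: float) -> str:
    """Pick the plan most likely to achieve the goal; ties keep the first plan listed."""
    return max(PLANS, key=lambda plan: p_goal_achieved(PLANS[plan], p_shutdown_per_step))

print(choose_plan(0.0))  # no shutdown risk, so no reason to touch the off-switch
print(choose_plan(0.1))  # any risk makes disabling the off-switch look better
```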

Crucially, one clear way in which the AI can ensure that it will continue to exist (and not be turned off), and that its objectives will never be changed, would be to gain power over the humans who might affect it (we talk here about how AI systems might actually be able to do that).

What’s more, the AI systems we’re considering have advanced capabilities, meaning they can do one or more tasks that grant people significant power when carried out well in today’s world. With such advanced capabilities, these instrumental goals will not be out of reach, and as a result, it seems like the AI system would use its advanced capabilities to get power as part of the plan’s execution. If we don’t want the AI systems we create to take power away from us, this would be a particularly dangerous form of misalignment.

In the most extreme scenarios, a planning AI system with sufficiently advanced capabilities could successfully disempower us completely.

As a (very non-rigorous) intuitive check on this argument, let’s try to apply it to humans.

Humans have a variety of goals. For many of these goals, some form of power-seeking is advantageous: though not everyone seeks power, many people do (in the form of wealth or social or political status), because it’s useful for getting what they want. This is not catastrophic (usually!) because, as human beings:

We generally feel bound by human norms and morality (even people who really want wealth usually aren’t willing to kill to get it).
We aren’t that much more capable or intelligent than one another. So even in cases where people aren’t held back by morality, they’re not able to take over the world.

(We discuss whether humans are truly power-seeking later.)

A sufficiently advanced AI wouldn’t have those limitations.

It might be hard to find ways to prevent this sort of misalignment

The point of all this isn’t to say that any advanced planning AI system will necessarily attempt to seek power. Instead, it’s to point out that, unless we find a way to design systems that don’t have this flaw, we’ll face significant risk.

It seems more than plausible that we could create an AI system that isn’t misaligned in this way, and thereby prevent any disempowerment. Here are some strategies we might take (plus, unfortunately, some reasons why they might be difficult in practice):³⁰

Control the objectives of the AI system. We may be able to design systems that simply don’t have objectives to which the above argument applies — and thus don’t incentivise power-seeking behaviour. For example, we could find ways to explicitly instruct AI systems not to harm humans, or find ways to reward AI systems (in training environments) for not engaging in specific kinds of power-seeking behaviour (and also find ways to ensure that this behaviour continues outside the training environment).

Carlsmith gives two reasons why doing this seems particularly hard.

First, for modern ML systems, we don’t get to explicitly state a system’s objectives. Instead, we reward (or punish) a system in a training environment so that it learns on its own. This raises a number of difficulties, one of which is goal misgeneralisation. Researchers have uncovered real examples of systems that appear to have learned to pursue a goal in the training environment, but then fail to generalise that goal when they operate in a new environment. This raises the possibility that we could think we’ve successfully trained an AI system not to seek power, but that the system would seek power anyway when deployed in the real world.³¹

Second, when we specify a goal to an AI system (or, when we can’t explicitly do that, when we find ways to reward or punish a system during training), we usually do this by giving the system a proxy by which outcomes can be measured (e.g. positive human feedback on a system’s achievement). But often those proxies don’t quite work.³² In general, we might expect that even if a proxy appears to correlate well with successful outcomes, it might not do so when that proxy is optimised for. (The examples above of politicians, companies, and the robot arm failing to grasp a ball are illustrations of this.) We’ll look at a more specific example of how problems with proxies could lead to an existential catastrophe here.

For more on the specific difficulty of controlling the objectives given to deep neural networks trained using self-supervised learning and reinforcement learning, we recommend OpenAI governance researcher Richard Ngo’s discussion of how realistic training processes lead to the development of misaligned goals.
Control the inputs into the AI system. AI systems will only develop plans to seek power if they have enough information about the world to realise that seeking power is indeed a way to achieve their goals.
Control the capabilities of the AI system. AI systems will likely only be able to carry out plans to seek power if they have sufficiently advanced capabilities in skills that grant people significant power in today’s world.

But to make any strategy work, it will need to both:

Retain the usefulness of the AI systems — and so remain economically competitive with less safe systems. Controlling the inputs and capabilities of AI systems will clearly have costs, so it seems hard to ensure that these controls, even if they’re developed, are actually used. But this is also a problem for controlling a system’s objectives. For example, we may be able to prevent power-seeking behaviour by ensuring that AI systems stop to check in with humans about any decisions they make. But these systems might be significantly slower and less immediately useful to people than systems that don’t stop to carry out these checks. As a result, there might still be incentives to use a faster, more initially effective misaligned system (we’ll look at incentives more in the next section).
Continue to work as the planning ability and strategic awareness of systems improve over time. Some seemingly simple solutions (for example, trying to give a system a long list of things it isn’t allowed to do, like stealing money or physically harming humans) break down as the planning abilities of the systems increase. This is because, the more capable a system is at developing plans, the more likely it is to identify loopholes or failures in the safety strategy — and as a result, the more likely the system is to develop a plan that involves power-seeking.

Ultimately, by looking at the state of the research on this topic, and speaking to experts in the field, we think that there are currently no known ways of building aligned AI systems that seem likely to fulfil both these criteria.

So: that’s the core argument. There are many variants of this argument. Some have argued that AI systems might gradually shape our future via subtler forms of influence that nonetheless could amount to an existential catastrophe; others argue that the most likely form of disempowerment is in fact just killing everyone. We’re not sure how a catastrophe would be most likely to play out, but have tried to articulate the heart of the argument, as we see it: that AI presents an existential risk.

There are definitely reasons this argument might not be right! We go through some of the reasons that seem strongest to us below. But overall it seems possible that, for at least some kinds of advanced planning AI systems, it will be harder to build systems that don’t seek power in this dangerous way than to build systems that do.

At this point, you may have questions like:

Why can’t we just unplug a dangerous AI?
Surely a truly intelligent AI system would know not to disempower everyone?
Couldn’t we just ‘sandbox’ any potentially dangerous AI system until we know it’s safe?

We think there are good responses to all these questions, so we’ve added a long list of arguments against working on AI risk — and our responses — for these (and other) questions below.

Disempowerment by AI systems would be an existential catastrophe

When we say we’re concerned about existential catastrophes, we’re not just concerned about risks of extinction. This is because the source of our concern is rooted in longtermism: the idea that the lives of all future generations matter, and so it’s extremely important to protect their interests.

This means that any event that could prevent all future generations from living lives full of whatever you think makes life valuable (whether that’s happiness, justice, beauty, or general flourishing) counts as an existential catastrophe.

It seems extremely unlikely that we’d be able to regain power over a system that successfully disempowers humanity. And as a result, the entirety of the future — everything that happens for Earth-originating life, for the rest of time — would be determined by the goals of systems that, although built by us, are not aligned with us. Perhaps those goals will create a long and flourishing future, but we see little reason for confidence.³³

This isn’t to say that we don’t think AI also poses a risk of human extinction. Indeed, we think making humans extinct is one highly plausible way in which an AI system could completely and permanently ensure that we are never able to regain power.

People might deploy misaligned AI systems despite the risk

Surely no one would actually build or use a misaligned AI if they knew it could have such terrible consequences, right?

Unfortunately, there are at least two reasons people might create and then deploy misaligned AI, which we’ll go through one at a time:³⁴

1. People might think it’s aligned when it’s not

Imagine there’s a group of researchers trying to tell, in a test environment, whether a system they’ve built is aligned. We’ve argued that an intelligent planning AI will want to improve its abilities to effect changes in pursuit of its objective, and it’s almost always easier to do that if it’s deployed in the real world, where a much wider range of actions are available. As a result, any misaligned AI that’s sophisticated enough will try to understand what the researchers want it to do and at least pretend to be doing that, deceiving the researchers into thinking it’s aligned. (For example, a reinforcement learning system might be rewarded for certain apparent behaviour during training, regardless of what it’s actually doing.)

Hopefully, we’ll be aware of this sort of behaviour and be able to detect it. But catching a sufficiently advanced AI in deception seems potentially harder than catching a human in a lie, which isn’t always easy. For example, a sufficiently intelligent deceptive AI system may be able to deceive us into thinking we’ve solved the problem of AI deception, even if we haven’t.

If AI systems are good at deception, and have sufficiently advanced capabilities, a reasonable strategy for such a system could be to deceive humans completely until the system has a way to guarantee it can overcome any resistance to its goals.

2. There are incentives to deploy systems sooner rather than later

We might also expect some people with the ability to deploy a misaligned AI to charge ahead despite any warning signs of misalignment that do come up, because of race dynamics — where people developing AI want to do so before anyone else.

For example, if you’re developing an AI to improve military or political strategy, it’s much more useful if none of your rivals have a similarly powerful AI.

These incentives apply even to people attempting to build an AI in the hopes of using it to make the world a better place.

For example, say you’ve spent years and years researching and developing a powerful AI system, and all you want is to use it to make the world a better place. Simplifying things a lot, say there are two possibilities:

This powerful AI will be aligned with your beneficent aims, and you’ll transform society in a potentially radically positive way.
The AI will be sufficiently misaligned that it’ll take power and permanently end humanity’s control over the future.

Let’s say you think there’s a 90% chance that you’ve succeeded in building an aligned AI. But technology often develops at similar speeds across society, so there’s a good chance that someone else will soon also develop a powerful AI. And you think they’re less cautious, or less altruistic, so you think their AI will only have an 80% chance of being aligned with good goals, and pose a 20% chance of existential catastrophe. And only if you get there first can your more beneficial AI be dominant. As a result, you might decide to go ahead with deploying your AI, accepting the 10% risk.
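The trade-off in this example can be written out as a small calculation. This is a hypothetical sketch: the 90% and 80% figures come from the example above, but `p_rival_first` (the chance the rival deploys before you if you hold back) and `risk_own_later` (your risk if you deploy later after extra safety work) are invented parameters added for illustration.

```python
def risk_if_deploy_now(p_own_aligned: float) -> float:
    """Catastrophe risk if you deploy first and your system determines the outcome."""
    return 1.0 - p_own_aligned


def risk_if_wait(p_rival_first: float, p_rival_aligned: float, risk_own_later: float) -> float:
    """Catastrophe risk if you hold back.

    p_rival_first and risk_own_later are hypothetical parameters, not from the article.
    """
    return p_rival_first * (1.0 - p_rival_aligned) + (1.0 - p_rival_first) * risk_own_later


def break_even_rival_first(p_own_aligned: float, p_rival_aligned: float, risk_own_later: float) -> float:
    """Chance of the rival going first above which waiting is riskier than deploying now."""
    risk_now = risk_if_deploy_now(p_own_aligned)
    return (risk_now - risk_own_later) / ((1.0 - p_rival_aligned) - risk_own_later)


print(risk_if_deploy_now(0.9))                 # about 0.10
print(risk_if_wait(0.7, 0.8, 0.05))            # about 0.155
print(break_even_rival_first(0.9, 0.8, 0.05))  # about 0.33
```

With these made-up numbers, holding back only looks safer if you think the rival has less than roughly a one-in-three chance of deploying first, which is exactly the pressure the example describes.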

This all sounds very abstract. What could an existential catastrophe caused by AI actually look like?

The argument we’ve given so far is very general, and doesn’t really look at the specifics of how an AI that is attempting to seek power might actually do so.

If you’d like to get a better understanding of what an existential catastrophe caused by AI might actually look like, we’ve written a short separate article on that topic. If you’re happy with the high-level abstract arguments so far, feel free to skip to the next section!

What could an existential AI catastrophe actually look like?

4. Even if we find a way to avoid power-seeking, there are still risks

So far we’ve described what a large proportion of researchers in the field² think is the major existential risk from potential advances in AI, which depends crucially on an AI seeking power to achieve its goals.

² A 2020 survey asked researchers working on reducing existential risks from AI what risks they were most concerned about. The surveyors asked about five sources of existential risk:

Risks from superintelligent AI (similar to the scenario we've described here)
Risks from influence-seeking behaviour
Risks from AI systems pursuing easy-to-measure goals (similar to the scenario we've described here)
AI-exacerbated war
Other intentional misuse of AI not related to war

The researchers surveyed were roughly equally concerned about all of these risks. The first three are covered by the section in this article on risks from power-seeking AI, while the last two are covered by the section on other risks. If these groupings make sense (which we think they do), then at the time of the survey researchers were roughly three times as concerned about the broad risk of power-seeking AI as they were about risks from either war or other misuse separately.

If we can prevent power-seeking behaviour, we will have reduced existential risk substantially.

But even if we succeed, there are still existential risks that AI could pose.

AI could worsen war

We’re concerned that great power conflict could also pose a substantial threat to our world, and advances in AI seem likely to change the nature of war, through lethal autonomous weapons³⁵ or through automated decision making.³⁶

³⁵ Lethal autonomous weapons already exist. For more information, see:

Risks from Autonomous Weapon Systems and Military AI, an overview of attempts to reduce risks from lethal autonomous weapons.
On AI Weapons, a presentation of the argument that lethal autonomous weapons are, on balance, more good than bad.

³⁶ Machine learning, artificial intelligence, and the use of force by states, by Deeks et al. (2019).
AI and International Stability: Risks and Confidence-Building Measures, by Horowitz and Scharre (2021).

In some cases, great power war could pose an existential threat, for example if the conflict is nuclear. It’s possible that AI could exacerbate risks of nuclear escalation, although there are also reasons to think AI could decrease this risk.³⁷

³⁷ AI could exacerbate these risks because the current dominant nuclear deterrence strategy of 'mutually assured destruction' relies on symmetry between the abilities of nuclear powers, so that the threat of a nuclear response to a first strike is believable. Advances in AI that could be directly applied to nuclear forces could create asymmetries in the capabilities of nuclear-armed nations. This could include improving early warning systems, air defence systems, and cyberattacks that disable weapons.

For example, many countries use submarine-launched ballistic missiles as part of their nuclear deterrence systems: the idea is that if nuclear weapons can be hidden under the ocean, they will never be destroyed in a first strike. This means that they can always be used for a counterattack, and therefore act as an effective deterrent against first strikes. But AI could make it far easier to detect submarines underwater, making it possible to destroy them in a first strike and removing this deterrent.

A report from the Stockholm International Peace Research Institute found that, while AI could potentially also have stabilising effects (for example by making everyone feel more vulnerable, decreasing the chances of escalation), we could see destabilising effects even before advances in AI are actually deployed. This is because one state's belief that its opponents have new nuclear capabilities can be enough to disrupt the delicate balance of deterrence.

Luckily, there are also plausible ways in which AI could help prevent the use of nuclear weapons, for example by improving the ability of states to detect nuclear launches, reducing the chances of false alarms like those that nearly caused nuclear war in 1983.

So, overall, we're uncertain about whether AI will substantially increase the risk of nuclear conflict in the short term.

Finally, if a single actor produces particularly powerful AI systems, this could be seen as giving them a decisive strategic advantage. For example, the US may produce a planning AI that’s intelligent enough to ensure that Russia or China could never successfully launch another nuclear weapon. This could incentivise a first strike from the actor’s rivals before these AI-developed plans can ever be put into action.

AI could be used to develop dangerous new technology

We expect that AI systems will help increase the rate of scientific progress.³⁸

³⁸ If AI systems replace some jobs, or speed up economic growth, we'll see more resources able to be dedicated to scientific advancement. And if we're successful at developing particularly capable AI systems, we could see parts of the scientific process being automated completely.

While there would be clear benefits to this automation (the rapid development of new medicine, for example), some forms of technological development can pose threats, including existential threats, to humanity. This could be through biotechnology³⁹ (see our article on preventing catastrophic pandemics for more) or through some other form of currently unknown but dangerous technology.⁴⁰

³⁹ Urbina et al. (2022) developed a computational proof that existing AI technologies for drug discovery could be misused to design biochemical weapons.

Also see:

O'Brien and Nelson (2020):

Within the realm of synthetic biology, AI could potentially lower some of the barriers for a malicious actor to design dangerous pathogens with custom features.

Turchin and Denkenberger (2020), section 3.2.3.

AI could empower totalitarian governments

An AI-enabled authoritarian government could completely automate the monitoring and repression of its citizens, as well as significantly influence the information people see, perhaps making it impossible to coordinate action against such a regime.⁴¹

⁴¹ AI is already facilitating the ability of governments to monitor their own citizens.

The NSA is using AI to help filter the huge amounts of data they collect, significantly speeding up their ability to identify and predict the actions of people they are monitoring. China is increasingly using facial recognition and predictive policing, including automated racial profiling and automatic alarms when people classified as potential threats enter certain public places.

These sorts of surveillance technologies look like they are going to significantly improve, and in doing so significantly increase the ability of governments to control their populations.

If this became a form of truly stable totalitarianism, this could make people’s lives far worse for extremely long periods of time, making it a particularly scary possible scenario resulting from AI.

Other risks from AI

We’re also concerned about the following issues, though we know less about them:

Existential threats that result not from the power-seeking behaviour of AI systems, but from the interaction between AI systems. (To pose a risk, these systems would still need to be, to some extent, misaligned.)
Other ways we haven’t thought of in which AI systems could be misused — especially ones that might significantly affect future generations.
Other moral mistakes made in the design and use of AI systems, particularly if future AI systems are themselves deserving of moral consideration. For example, perhaps we will (inadvertently) create conscious AI systems, which could then suffer in huge numbers. We think this could be extremely important, so we’ve written about it in a separate problem profile.

How likely is an AI-related existential catastrophe? This is a really difficult question to answer.

There are no past examples we can use to determine the frequency of AI-related catastrophes.

All we have to go off are arguments (like the ones we’ve given above), and less relevant data like the history of technological advances. And we’re definitely not certain that the arguments we’ve presented are completely correct.

Consider the argument we gave earlier about the dangers of power-seeking AI in particular, based on Carlsmith’s report. At the end of his report, Carlsmith gives some rough guesses of the chances that each stage of his argument is correct (conditional on the previous stage being correct):

By 2070 it will be possible and financially feasible to build strategically aware systems that can outperform humans on many power-granting tasks, and that can successfully make and carry out plans: Carlsmith guesses there’s a 65% chance of this being true.
Given this feasibility, there will be strong incentives to build such systems: 80%.
Given both the feasibility and incentives to build such systems, it will be much harder to develop aligned systems that don’t seek power than to develop misaligned systems that do, but which are at least superficially attractive to deploy: 40%.
Given all of this, some deployed systems will seek power in a misaligned way that causes over $1 trillion (in 2021 dollars) of damage: 65%.
Given all the previous premises, misaligned power-seeking AI systems will end up disempowering basically all of humanity: 40%.
Given all the previous premises, this disempowerment will constitute an existential catastrophe: 95%.

Multiplying these numbers together, Carlsmith estimated that there’s a 5% chance that his argument is right and there will be an existential catastrophe from misaligned power-seeking AI by 2070. When we spoke to Carlsmith, he noted that in the year between the writing of his report and the publication of this article, his overall guess at the chance of an existential catastrophe from power-seeking AI by 2070 had increased to >10%.⁴²

⁴² A number of reviewers were asked to critique Carlsmith's report and give their own estimates of the existential risk from power-seeking AI. The estimates given of existential risk from power-seeking AI by 2070 were: Aschenbrenner: 0.5%, Garfinkel: 0.4%, Kokotajlo: 65%, Nanda: 9%, Soares: >77%, Tarsney: 3.5%, Thorstad: 0.000002%, Wallace: 2%.
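Because each of Carlsmith's figures is conditional on the premises before it, the overall estimate is simply their product. A minimal check of that arithmetic (the dictionary keys are shorthand labels for the six premises, not Carlsmith's wording):

```python
from math import prod

# Carlsmith's conditional probabilities, in the order listed above.
premises = {
    "feasible by 2070": 0.65,
    "strong incentives to build": 0.80,
    "aligned much harder than misaligned": 0.40,
    "misaligned power-seeking causes >$1 trillion damage": 0.65,
    "disempowers basically all of humanity": 0.40,
    "disempowerment is an existential catastrophe": 0.95,
}

# Chain rule: P(all premises) = product of each premise's conditional probability.
p_catastrophe = prod(premises.values())
print(f"{p_catastrophe:.4f}")  # prints 0.0514, i.e. about 5%
```

Note how the two 40% premises do most of the work in pulling the total down: the product is only as large as its least likely step allows.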

The overall probability of existential catastrophe from AI would, in Carlsmith’s view, be higher than this, because there are other routes to possible catastrophe — like those discussed in the previous section — although our guess is that these other routes are probably a lot less likely to lead to existential catastrophe.

For another estimate, in The Precipice, philosopher and advisor to 80,000 Hours Toby Ord estimated a 1-in-6 risk of existential catastrophe by 2120 (from any cause), and that 60% of this risk comes from misaligned AI — giving a total of a 10% risk of existential catastrophe from misaligned AI by 2120.
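Written out, Ord's two figures combine as follows (this simply restates the arithmetic in the paragraph above):

```latex
P(\text{existential catastrophe from misaligned AI by 2120})
  \approx \underbrace{\tfrac{1}{6}}_{\text{total risk}} \times \underbrace{0.6}_{\text{share from misaligned AI}}
  = 0.1
```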

A 2021 survey of 44 researchers working on reducing existential risks from AI found the median risk estimate was 32.5% — the highest answer given was 98%, and the lowest was 2%.⁴³ There’s obviously a lot of selection bias here: people choose to work on reducing risks from AI because they think this is unusually important, so we should expect estimates from this survey to be substantially higher than estimates from other sources. But there’s clearly significant uncertainty about how big this risk is, and huge variation in answers.

All these numbers are shockingly, disturbingly high. We’re far from certain that all the arguments are correct. But these are generally the highest guesses for the level of existential risk of any of the issues we’ve examined (like engineered pandemics, great power conflict, climate change, or nuclear war).

That said, I think there are reasons why it’s harder to make guesses about the risks from AI than about other risks, and possibly reasons to think that the estimates we’ve quoted above are systematically too high.

If I were forced to put a number on it, I’d say something like 1%. This number includes considerations both for and against the argument. I’m less worried than other 80,000 Hours staff: our position as an organisation is that the risk is between 3% and 50%.

All this said, the arguments for such high estimates of the existential risk posed by AI are persuasive — making risks from AI a top contender for the most pressing problem facing humanity.

5. We can tackle these risks

We think one of the most important things you can do would be to help reduce the gravest risks that AI poses.

This isn’t just because we think these risks are high — it’s also because we think there are real things we can do to reduce these risks.

We know of two broad approaches:

Technical AI safety research
AI governance research and implementation

For both of these, there are lots of ways to contribute. We’ll go through them in more detail below, but in this section we want to illustrate the point that there are things we can do to address these risks.

Technical AI safety research

The benefits of transformative AI could be huge, and there are many different actors involved (operating in different countries), which means it will likely be really hard to prevent its development altogether.

(It’s also possible that it wouldn’t even be a good idea if we could — after all, that would mean forgoing the benefits as well as preventing the risks.)

As a result, we think it makes more sense to focus on making sure that this development is safe — meaning that it has a high probability of avoiding all the catastrophic failures listed above.

One way to do this is to try to develop technical solutions to prevent the kind of power-seeking behaviour we discussed earlier — this is generally known as working on technical AI safety, sometimes called just “AI safety” for short.

AI governance research and implementation

A second strategy for reducing risks from AI is to shape its development through policy, norms-building, and other governance mechanisms.

Good AI governance can help technical safety work, for example by producing safety agreements between corporations, or helping talented safety researchers from around the world move to where they can be most effective. AI governance could also help with other problems that lead to risks, like race dynamics.

But also, as we’ve discussed, even if we successfully manage to make AI do what we want (i.e. we ‘align’ it), we might still end up choosing something bad for it to do! So we need to worry about the incentives not just of the AI systems, but of the human actors using them.

Here are some more questions you might have:

Can it make sense to dedicate my career to solving an issue based on a speculative story about a technology that may or may not ever exist?
Is this a form of ‘Pascal’s mugging’ — taking a big bet on tiny probabilities?

Again, we think there are strong responses to these questions.

6. This work is neglected

We estimate there are around 400 people around the world working directly on reducing the chances of an AI-related existential catastrophe (with a 90% confidence interval ranging between 200 and 1,000). Of these, about three quarters are working on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.⁴⁴ We think there are around 800 people working in complementary roles, but we’re highly uncertain about this estimate.⁴⁵

⁴⁵ I estimated the number of FTE ("full-time equivalent") working on the problem of reducing existential risks from AI.

But there are lots of ambiguities around what counts as working on the issue. So I tried to use the following guidelines in my estimates:

I didn't include people who might think of themselves as being on a career path that is building towards a role preventing an AI-related catastrophe, but who are currently skilling up rather than working directly on the problem.
I included researchers, engineers, and other staff that seem to work directly on technical AI safety research or AI strategy and governance. But there's an uncertain boundary between these people and others who I chose not to include. For example, I didn't include machine learning engineers whose role is building AI systems that might be used for safety research but aren't primarily designed for that purpose.
I only included time spent on work that seems related to reducing the potentially existential risks from AI, like those discussed in this article. Lots of wider AI safety and AI ethics work that focuses on reducing other risks from AI seems relevant to reducing existential risks, and this 'indirect' work makes this estimate difficult. I decided not to include indirect work on reducing the risks of an AI-related catastrophe (see our problem framework for more).
Relatedly, I didn't include people working on other problems that might indirectly affect the chances of an AI-related catastrophe, such as epistemics and improving institutional decision-making, reducing the chances of great power conflict, or building effective altruism.

With those decisions made, I estimated this in three different ways.

First, for each organisation in the AI Watch database, I estimated the number of FTE working directly on reducing existential risks from AI. I did this by looking at the number of staff listed at each organisation, both in total and in 2022, as well as the number of researchers listed at each organisation. Overall I estimated that there were 76 to 536 FTE working on technical AI safety (90% confidence), with a mean of 196 FTE. I estimated that there were 51 to 359 FTE working on AI governance and strategy (90% confidence), with a mean of 151 FTE. There's a lot of subjective judgement in these estimates because of the ambiguities above. The estimates could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area.

Second, I adapted the methodology of Gavin Leech's estimate of the number of people working on reducing existential risks from AI. I split the organisations in Leech's estimate into technical safety and governance/strategy. I adapted Leech's figures for the proportion of computer science academic work relevant to the topic to fit my definitions above, and made a related estimate for work outside computer science but within academia that is relevant. Overall I estimated that there were 125 to 1,848 FTE working on technical AI safety (90% confidence), with a mean of 580 FTE. I estimated that there were 48 to 268 FTE working on AI governance and strategy (90% confidence), with a mean of 100 FTE.

Third, I looked at the estimates of similar numbers by Stephen McAleese. I made minor changes to McAleese's categorisation of organisations, to ensure the numbers were consistent with the previous two estimates. Overall I estimated that there were 110 to 552 FTE working on technical AI safety (90% confidence), with a mean of 267 FTE. I estimated that there were 36 to 193 FTE working on AI governance and strategy (90% confidence), with a mean of 81 FTE.

I took a geometric mean of the three estimates to form a final estimate, and combined confidence intervals by assuming that distributions were approximately lognormal.
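The combination step can be sketched in Python. This is a minimal sketch under stated assumptions, not the spreadsheet actually used: it treats each 90% interval as the 5th to 95th percentile of a lognormal, fits that lognormal, and takes the geometric mean of the estimates. The function names and the `independent` flag are our own: treating the three estimates as independent narrows the combined interval, while treating them as perfectly correlated (they draw on overlapping data) keeps it wider.

```python
import math

# 95th percentile of a standard normal: a 90% interval spans mu +/- Z90 * sigma
Z90 = 1.6448536269514722


def lognormal_from_ci(lo: float, hi: float) -> tuple[float, float]:
    """Fit a lognormal whose 5th/95th percentiles are lo/hi; return (mu, sigma) in log space."""
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * Z90)
    return mu, sigma


def combine_geometric(cis, independent: bool = True) -> tuple[float, float, float]:
    """Geometric mean of lognormal estimates; returns (median, 5th pct, 95th pct).

    The log of a geometric mean is the arithmetic mean of the logs, so the result is
    lognormal with mu = mean(mu_i). Its sigma depends on how the estimates co-vary:
    sqrt(sum sigma_i^2) / n if independent, mean(sigma_i) if perfectly correlated.
    """
    params = [lognormal_from_ci(lo, hi) for lo, hi in cis]
    n = len(params)
    mu = sum(m for m, _ in params) / n
    if independent:
        sigma = math.sqrt(sum(s ** 2 for _, s in params)) / n
    else:
        sigma = sum(s for _, s in params) / n
    return math.exp(mu), math.exp(mu - Z90 * sigma), math.exp(mu + Z90 * sigma)


# The three 90% intervals for technical AI safety FTE quoted above
technical = [(76, 536), (125, 1848), (110, 552)]
for independent in (True, False):
    median, lo, hi = combine_geometric(technical, independent)
    print(f"independent={independent}: median ~{median:.0f}, 90% CI ~{lo:.0f} to {hi:.0f}")
```

A lognormal's median, exp(mu), is lower than its mean, so the medians this prints aren't directly comparable with the means quoted in the text.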

Finally, I estimated the number of FTE in complementary roles using the AI Watch database. For relevant organisations, I identified those where there was enough data listed about the number of researchers at those organisations. I calculated the ratio between the number of researchers in 2022 and the number of staff in 2022, as recorded in the database. I calculated the mean of those ratios, and a confidence interval using the standard deviation. I used this ratio to calculate the overall number of support staff by assuming that estimates of the number of staff are lognormally distributed and that the estimate of this ratio is normally distributed. Overall I estimated that there were 2 to 2,357 FTE in complementary roles (90% confidence), with a mean of 770 FTE.
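The support-staff step can be sketched as a Monte Carlo simulation. This is a hypothetical illustration: the researcher-count parameters and per-organisation ratios below are made-up placeholders, not figures from the AI Watch database, and clipping the sampled ratio to a plausible range is our own assumption (a normal distribution can otherwise produce ratios above 1 or below 0). If r is the fraction of an organisation's staff who are researchers, each researcher comes with (1 − r)/r support staff.

```python
import math
import random
import statistics


def support_staff_interval(researcher_median, researcher_sigma, ratios,
                           n_samples=100_000, seed=0):
    """Monte Carlo 90% interval and mean for support-staff FTE.

    researcher_median, researcher_sigma: lognormal parameters for researcher FTE.
    ratios: observed researchers/staff ratios, one per organisation; the true
            ratio is modelled as normal with their sample mean and standard deviation.
    """
    rng = random.Random(seed)
    r_mean = statistics.mean(ratios)
    r_sd = statistics.stdev(ratios)
    samples = []
    for _ in range(n_samples):
        researchers = rng.lognormvariate(math.log(researcher_median), researcher_sigma)
        r = rng.gauss(r_mean, r_sd)
        r = min(max(r, 0.05), 1.0)  # assumption: keep the ratio in a plausible range
        samples.append(researchers * (1 - r) / r)  # support staff per researcher = (1 - r) / r
    samples.sort()
    lo = samples[int(0.05 * n_samples)]
    hi = samples[int(0.95 * n_samples) - 1]
    return lo, statistics.mean(samples), hi


# Hypothetical inputs, for illustration only
lo, mean, hi = support_staff_interval(400, 0.5, [0.45, 0.6, 0.7, 0.8, 0.55])
print(f"support staff: ~{lo:.0f} to ~{hi:.0f} FTE (90%), mean ~{mean:.0f}")
```

The result is very sensitive to the spread of the ratios: sampled ratios near 1 imply almost no support staff, while small ratios imply many per researcher, which is one plausible reason an interval of this kind can be so wide.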

There are likely many errors in this methodology, but I expect these errors are small compared to the uncertainty in the underlying data I'm using. Ultimately, I'm still highly uncertain about the overall FTE working on preventing an AI-related catastrophe, but I'm confident enough that the number is relatively small to say that the problem as a whole is highly neglected.

I'm very uncertain about this estimate. It involved a number of highly subjective judgement calls. You can see the (very rough) spreadsheet I worked off here. If you have any feedback, I'd really appreciate it if you could tell me what you think using this form.

In The Precipice, Ord estimated that between $10 million and $50 million was spent on reducing AI risk in 2020.

That might sound like a lot of money, but we’re spending something like 1,000 times that amount¹¹ on speeding up the development of transformative AI via commercial capabilities research and engineering at large AI labs.

… according to its annual report. We'd expect most of that to be contributing to "advancing AI capabilities" in some sense, since its main goal is building powerful, general AI systems. (Although it's important to note that DeepMind is also contributing to work in AI safety, which may be reducing existential risk.)

As an upper bound, the total revenues of the AI sector in 2021 were around $340 billion.

To compare the $50 million spent on AI safety in 2020 with other well-known risks, we’re currently spending several hundred billion dollars per year on tackling climate change.

Because this field is so neglected and has such high stakes, we think your impact working on risks from AI could be much higher than working on many other areas — which is why our top two recommended career paths for making a big positive difference in the world are technical AI safety and AI policy research and implementation.

What do we think are the best arguments against this problem being pressing?

As we said above, we’re not totally sure the arguments we’ve presented for AI representing an existential threat are right. Though we do still think that the chance of catastrophe from AI is high enough to warrant many more people pursuing careers to try to prevent such an outcome, we also want to be honest about the arguments against doing so, so you can more easily make your own call on the question.

Here we’ll cover the strongest reasons (in our opinion) to think this problem isn’t particularly pressing. In the next section we’ll cover some common objections that (in our opinion) hold up less well, and explain why.

We might have a lot of time to work on this problem

The longer we have before transformative AI is developed, the less pressing it is to work now on ways to ensure that it goes well. This is because the work of others in the future could be much better or more relevant than the work we are able to do now.

Also, if it takes us a long time to create transformative AI, we have more time to figure out how to make it safe. The risk seems much higher if AI developers will create transformative AI in the next few decades.

It seems plausible that the first transformative AI won’t be based on current deep learning methods. (AI Impacts have documented arguments that current methods won’t be able to produce AI that has human-level intelligence.) This could mean that some of our current research might not end up being useful (and also — depending on what method ends up being used — could make the arguments for risk less worrying).

Relatedly, we might expect that progress in the development of AI will occur in bursts. Previously, the field has seen AI winters, periods of time with significantly reduced investment, interest and research in AI. It’s unclear how likely it is that we’ll see another AI winter, but this possibility should lengthen our guesses about how long it’ll be before we’ve developed transformative AI. Cotra writes about the possibility of an AI winter in part four of her report forecasting transformative AI. New constraints on the rate of growth of AI capabilities, like the availability of training data, could also mean that there’s more time to work on this (Cotra discusses this here).

In addition, the estimates about when we’ll get transformative AI from Cotra, Karnofsky and Davidson that we looked at earlier were produced by people who already expected that working on preventing an AI-related catastrophe might be one of the world’s most pressing problems. As a result, there’s selection bias here: people who think transformative AI is coming relatively soon are also the people incentivised to carry out detailed investigations. (That said, if the investigations themselves seem strong, this effect could be pretty small.)

Finally, none of the estimates we discussed earlier were trying to predict when an existential catastrophe might occur. Instead, they were looking at when AI systems might be able to automate all tasks humans can do, or when AI systems might significantly transform the economy. It’s by no means certain that the kinds of AI systems that could transform the economy would be the same advanced planning systems that are core to the argument that AI systems might seek power. Advanced planning systems do seem to be particularly useful, so there is at least some reason to think these might be the sorts of systems that end up being built. But even if the forecasted transformative AI systems are advanced planning systems, it’s unclear how capable such systems would need to be to pose a threat — it’s more than plausible that systems would need to be far more capable to pose a substantial existential threat than they would need to be to transform the economy. This would mean that all the estimates we considered above would be underestimates of how long we have to work on this problem.

All that said, it might be extremely difficult to find technical solutions to prevent power-seeking behaviour — and if that’s the case, focusing on finding those solutions now does seem extremely valuable.

Overall, we think that transformative AI is sufficiently likely in the next 10–80 years that it is well worth it (in expected value terms) to work on this issue now. Perhaps future generations will take care of it, and all the work we’d do now will be in vain — we hope so! But it might not be prudent to take that risk.

AI might improve gradually over time

If the best AI we have improves gradually over time (rather than AI capabilities remaining fairly low for a while and then suddenly increasing), we’re likely to end up with ‘warning shots’: we’ll notice forms of misaligned behaviour in fairly weak systems, and be able to correct for it before it’s too late.

In such a gradual scenario, we’ll have a better idea about what form powerful AI might take (e.g. whether it will be built using current deep learning techniques, or something else entirely), which could significantly help with safety research. There will also be more focus on this issue by society as a whole, as the risks of AI become clearer.

So if gradual development of AI seems more likely, the risk seems lower.

But it’s very much not certain that AI development will be gradual, or if it is, gradual enough for the risk to be noticeably lower. And even if AI development is gradual, there could still be significant benefits to having plans and technical solutions in place well in advance. So overall we still think it’s extremely valuable to attempt to reduce the risk now.

If you want to learn more, you can read AI Impacts’ work on arguments for and against discontinuous (i.e. non-gradual) progress in AI development, and Toby Ord and Owen Cotton-Barratt on strategic implications of slower AI development.

We might need to solve alignment anyway to make AI useful

Making something have goals aligned with human designers’ ultimate objectives and making something useful seem like very related problems. If so, perhaps the need to make AI useful will drive us to produce only aligned AI — in which case the alignment problem is likely to be solved by default.

Ben Garfinkel gave a few examples of this on our podcast:

You can think of a thermostat as a very simple AI that attempts to keep a room at a certain temperature. The thermostat has a metal strip in it that expands as the room heats, and cuts off the current once a certain temperature has been reached. This piece of metal makes the thermostat act like it has a goal of keeping the room at a certain temperature, but also makes it capable of achieving this goal (and therefore of being actually useful).
Imagine you’re building a cleaning robot with reinforcement learning techniques — that is, you provide some specific condition under which you give the robot positive feedback. You might say something like, “The less dust in the house, the more positive the feedback.” But if you do this, the robot will end up doing things you don’t want — like ripping apart a cushion to find dust on the inside. Probably instead you need to use techniques like those being developed by people working on AI safety (things like watching a human clean a house and letting the AI figure things out from there). So people building AIs will be naturally incentivised to also try to make them aligned (and so in some sense safe), so they can do their jobs.
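The cleaning-robot example is a case of a misspecified reward. The toy below is hypothetical: the actions, dust amounts and "damage" scores are invented for illustration, and a real reinforcement learning agent learns from feedback rather than choosing from a fixed table. It shows how optimising the stated feedback ("less dust") and optimising what the designer actually wanted can select different actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    action: str
    dust_removed: float  # units of dust removed from the house
    damage: float        # hypothetical measure of harm to the household's belongings


# Hypothetical outcomes for a toy cleaning robot
OUTCOMES = [
    Outcome("vacuum the floor", dust_removed=5.0, damage=0.0),
    Outcome("rip open the cushion and vacuum its stuffing", dust_removed=8.0, damage=10.0),
    Outcome("do nothing", dust_removed=0.0, damage=0.0),
]


def proxy_reward(o: Outcome) -> float:
    """The feedback as specified: the less dust in the house, the better."""
    return o.dust_removed


def intended_reward(o: Outcome) -> float:
    """What the designer actually wanted: less dust, but not at the cost of damage."""
    return o.dust_removed - o.damage


def best_action(reward) -> str:
    """Pick the action that scores highest under the given reward function."""
    return max(OUTCOMES, key=reward).action


print("optimising the proxy:   ", best_action(proxy_reward))
print("optimising the intention:", best_action(intended_reward))
```

Closing the gap means getting the intended objective into training somehow, which is why, on this argument, people building useful systems have their own reasons to work on alignment.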

If we need to solve the problem of alignment anyway to make useful AI systems, this significantly reduces the chances we will have misaligned but still superficially useful AI systems. So the incentive to deploy a misaligned AI would be a lot lower, reducing the risk to society.

That said, there are still reasons to be concerned. For example, it seems like we could still be susceptible to problems of AI deception.

And, as we’ve argued, AI alignment is only part of the overall issue. Solving the alignment problem isn’t the same thing as completely eliminating existential risk from AI, since aligned AI could also be used to bad ends — such as by authoritarian governments.

The problem could be extremely difficult to solve

As with many research projects in their early stages, we don’t know how hard the alignment problem — or other AI problems that pose risks — are to solve. Someone could believe there are major risks from machine intelligence, but be pessimistic about what additional research or policy work will accomplish, and so decide not to focus on it.

This is definitely a reason to potentially work on another issue — the solvability of an issue is a key part of how we try to compare global problems. For example, we’re also very concerned about risks from pandemics, and it may be much easier to solve that issue.

That said, we think that given the stakes, it could make sense for many people to work on reducing AI risk, even if you think the chance of success is low. You’d have to think that it was extremely difficult to reduce risks from AI in order to conclude that it’s better just to let the risks materialise and the chance of catastrophe play out.

At least in our own case at 80,000 Hours, we want to keep trying to help with AI safety — for example, by writing profiles like this one — even if the chance of success seems low (though in fact we’re overall pretty optimistic).

We could be overestimating the chances that strategic AI systems would try to seek power

There are some reasons to think that the core argument that any advanced, strategically aware planning system will by default seek power (which we gave here) isn’t totally right.⁴⁶

… draft report into existential risks from AI.

For a start, the argument that advanced AI systems will seek power relies on the idea that systems will produce plans to achieve goals. We’re not quite sure what this means — and as a result, we’re not sure what properties are really required for power-seeking behaviour to occur, and unsure whether the things we’ll build will have those properties.
We’d love to see a more in-depth analysis of what aspects of planning are economically incentivised, and whether those aspects seem like they’ll be enough for the argument for power-seeking behaviour to work.

Grace has written more about the ambiguity around “how much goal-directedness is needed to bring about disaster”.
It’s possible that only a few goals that AI systems could have would lead to misaligned power-seeking.

Richard Ngo, in his analysis of what people mean by “goals”, points out that you’ll only get power-seeking behaviour if you have goals that mean the system can actually benefit from seeking power. Ngo suggests that these goals need to be “large-scale.” (Some have argued that, by default, we should expect AI systems to have “short-term” goals that won’t lead to power-seeking behaviour.)

But whether an AI system would plan to take power depends on how easy it would be for the system to take power, because the easier it is for a system to take power, the more likely power-seeking plans are to be successful — so a good planning system would be more likely to choose them. This suggests it will be easier to accidentally create a power-seeking AI system as systems’ capabilities increase.

So there still seems to be cause for increased concern, because the capabilities of AI systems do seem to be increasing fast. There are two considerations here: if few goals really lead to power-seeking, even for quite capable AI systems, that significantly reduces the risk and thus the importance of the problem. But it might also increase the solvability of the problem by demonstrating that solutions could be easy to find (e.g. the solution of never giving systems “large-scale” goals) — making this issue more valuable for people to work on.
Earlier we argued that we can expect AI systems to do things that seem generally instrumentally useful to their overall goal, and that as a result it could be hard to prevent AI systems from doing these instrumentally useful things.

But we can find examples where how generally instrumentally useful things would be doesn’t seem to affect how hard it is to prevent these things. Consider an autonomous car that can move around only if its engine is on. For many possible goals (other than, say, turning the car radio on), it seems like it would be useful for the car to be able to move around, so we should expect the car to turn its engine on. But despite that, we might still be able to train the car to keep its engine off: for example, we can give it some negative feedback whenever it turns the engine on, even if we had also given the car some other goals. Now imagine we improve the car so that its top speed is higher. This massively increases the number of possible action sequences that involve, as a first step, turning its engine on. In some sense, this seems to increase the instrumental usefulness of turning the engine on: there are more possible actions the car can take once its engine is on, because the range of possible speeds it can travel at is higher. (It’s not clear if this sense of “instrumental usefulness” is the same as the one in the argument for the risk, although it does seem somewhat related.) But it doesn’t seem like this increase in the instrumental usefulness of turning on the engine makes it much harder to stop the car turning it on. Simple examples like this cast some doubt on the idea that, just because a particular action is instrumentally useful, we won’t be able to find ways to prevent it. (For more on this example, see page 25 of Garfinkel’s review of Carlsmith’s report.)
Humans are clearly highly intelligent, but it’s unclear they are perfect goal-optimisers. For example, humans often face some kind of existential angst over what their true goals are. And even if we accept humans as an example of a strategically aware agent capable of planning, humans certainly aren’t always power-seeking. We obviously care about having basics like food and shelter, and many people go to great lengths for more money, status, education, or even formal power. But some humans choose not to pursue these goals, and pursuing them doesn’t seem to correlate with intelligence.
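The autonomous car example can be turned into a toy calculation. Everything here is a hypothetical model of our own, not Garfinkel's: once the engine is on, the car picks a speed from 0 to its top speed at each of a few time steps, so raising the top speed multiplies the number of available action sequences. We assume extra options let the car do slightly better on its other goals, but that this task reward never exceeds a fixed maximum. Under that assumption, a penalty at least as large as the maximum task reward keeps the engine off however high the top speed goes.

```python
def num_action_sequences(top_speed: int, horizon: int) -> int:
    """Speed sequences available once the engine is on: (top_speed + 1) choices per step."""
    return (top_speed + 1) ** horizon


def best_task_reward(top_speed: int, horizon: int, max_task_reward: float) -> float:
    """Toy assumption: more options help a little, but reward is capped at max_task_reward."""
    return max_task_reward * (1 - 1 / num_action_sequences(top_speed, horizon))


def turns_engine_on(top_speed: int, horizon: int = 3,
                    max_task_reward: float = 1.0, engine_penalty: float = 1.0) -> bool:
    """The car turns its engine on only if doing so beats staying put."""
    value_on = best_task_reward(top_speed, horizon, max_task_reward) - engine_penalty
    value_off = 0.0  # engine off: the car can't move, so no task reward and no penalty
    return value_on > value_off


for top_speed in (1, 10, 100):
    print(top_speed, num_action_sequences(top_speed, 3), turns_engine_on(top_speed))
```

The toy deliberately builds in the conclusion (a bounded task reward), so it only shows that the argument is coherent, not that real training would behave this way.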

However, this doesn’t mean that the argument that there will be an incentive to seek power is wrong. Most people do face and act on incentives to gain forms of influence via wealth, status, promotions, and so on. And we can explain the observation that humans don’t usually seek huge amounts of power by observing that we aren’t usually in circumstances that make the effort worth it.

For example, most people don’t try to start billion-dollar companies — you probably won’t succeed, and it’ll cost you a lot of time and effort.

But you’d still walk across the street to pick up a billion-dollar cheque.
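The earlier claim that a good planner is more likely to choose power-seeking plans when power is easier to get, much like the billion-dollar cheque, can be illustrated with a toy expected-value comparison. All parameters and names are hypothetical: a direct plan succeeds with fixed probability, while a power-seeking plan first tries to acquire power (with a probability standing in for how easy that is) and then succeeds with higher probability. The planner picks whichever plan has the higher expected value.

```python
def plan_values(p_take_power: float, p_direct: float = 0.5,
                p_with_power: float = 0.95, goal_value: float = 1.0) -> dict:
    """Expected value of each plan. Toy assumption: a failed power grab achieves nothing."""
    return {
        "pursue goal directly": p_direct * goal_value,
        "seek power first": p_take_power * p_with_power * goal_value,
    }


def chosen_plan(p_take_power: float, **kwargs) -> str:
    """Return the plan with the highest expected value."""
    values = plan_values(p_take_power, **kwargs)
    return max(values, key=values.get)


for ease in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P(taking power succeeds) = {ease:.1f}: {chosen_plan(ease)}")
```

With these numbers, the planner switches to seeking power once taking power succeeds more than about 53% of the time (0.5 / 0.95), which is the sense in which more capable systems, for whom power is easier to take, would be more likely to choose such plans.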

The absence of extreme power-seeking in many humans, along with uncertainties in what it really means to plan to achieve goals, does suggest that the argument we gave above, that advanced AI systems will seek power, might not be completely correct. These observations also suggest that, if there really is a problem to solve here, alignment research into preventing power-seeking in AIs could in principle succeed.

This is good news! But for the moment — short of hoping we’re wrong about the existence of the problem — we don’t actually know how to prevent this power-seeking behaviour.

Arguments against working on AI risk to which we think there are strong responses

We’ve just discussed the major objections to working on AI risk that we think are most persuasive. In this section, we’ll look at objections that we think are less persuasive, and give some reasons why.

Is it even possible to produce artificial general intelligence?

People have been saying since the 1950s that artificial intelligence smarter than humans is just around the corner.

But it hasn’t happened yet.

One reason for this could be that it’ll never happen. Some have argued that producing artificial general intelligence is fundamentally impossible. Others think it’s possible, but unlikely to actually happen, especially not with current deep learning methods.

Overall, we think the existence of human intelligence shows it’s possible in principle to create artificial intelligence. And the speed of current advances isn’t something we think would have been predicted by those who thought that we’ll never develop powerful, general AI.

But most importantly, the idea that you need fully general intelligent AI systems for there to be a substantial existential risk is a common misconception.

The argument we gave earlier relied on AI systems being as good as or better than humans in a subset of areas: planning, strategic awareness, and areas related to seeking and keeping power. So as long as you think all these things are possible, the risk remains.

And even if no single AI has all of these properties, there are still ways in which we might end up with collections of ‘narrow’ AI systems that, together, can disempower humanity. For example, we might have a planning AI that develops plans for a company, a separate AI system that measures things about the company, another AI system that attempts to evaluate plans from the first AI by predicting how much profit each will make, and further AI systems that carry out those plans (for example, by automating the building and operation of factories). Considered together, this system as a whole has the capability to form and carry out plans to achieve some goal, and potentially also has advanced capabilities in areas that help it seek power.
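The company example can be sketched as four narrow components wired into one loop. The component names and stub implementations are hypothetical. The point of the sketch is that no single function both plans and acts, yet `step` as a whole measures the world, drafts plans, picks the one predicted to do best, and carries it out.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def make_pipeline(measure: Callable[[], State],
                  propose: Callable[[State], List[str]],
                  predict_profit: Callable[[str, State], float],
                  execute: Callable[[str], None]) -> Callable[[], str]:
    """Compose narrow systems into a single plan-and-act loop."""
    def step() -> str:
        state = measure()       # narrow system: measures the company
        plans = propose(state)  # narrow system: drafts candidate plans
        best = max(plans, key=lambda p: predict_profit(p, state))  # narrow system: evaluates
        execute(best)           # narrow system: carries the plan out
        return best
    return step


# Hypothetical stubs standing in for separate narrow AI systems
executed: List[str] = []
predicted = {"expand factory": 12.0, "cut costs": 7.0, "hold steady": 0.0}
step = make_pipeline(
    measure=lambda: {"cash": 100.0},
    propose=lambda state: list(predicted),
    predict_profit=lambda plan, state: predicted[plan],
    execute=executed.append,
)
print(step(), executed)
```

Because every chosen plan passes through `execute`, the composition step is also a natural place for a human check.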

It does seem like it will be easier to prevent these ‘narrow’ AI systems from seeking power. This could happen if the skills the AIs have, even when combined, don’t add up to being able to plan to achieve goals, or if the narrowness reduces the risk of systems developing power-seeking plans (e.g. if you build systems that can only produce very short-term plans). It also seems like it gives another point of weakness for humans to intervene if necessary: the coordination of the different systems.

Nevertheless, the risk remains, even from systems of many interacting AIs.

Why can't we just unplug a dangerous AI?

It might just be really, really hard.

Stopping people and computers from running software is already incredibly difficult.

Think about how hard it would be to shut down Google’s web services. Google’s data centres have millions of servers across 34 different locations, many of which are running the same sets of code. And these data centres are absolutely crucial to Google’s bottom line, so even if Google could decide to shut down their entire business, they probably wouldn’t.

Or think about how hard it is to get rid of computer viruses that autonomously spread between computers across the world.

Ultimately, we think any dangerous power-seeking AI system will be looking for ways to not be turned off, which makes it more likely we’ll be in one of these situations, rather than in a case where we can just unplug a single machine.

That said, we absolutely should try to shape the future of AI such that we can ‘unplug’ powerful AI systems.

There may be ways we can develop systems that let us turn them off. But for the moment, we’re not sure how to do that.

Ensuring that we can turn off potentially dangerous AI systems could be a safety measure developed by technical AI safety research, or it could be the result of careful AI governance, such as planning coordinated efforts to stop autonomous software once it’s running.

Couldn't we just 'sandbox' any potentially dangerous AI system until we know it's safe?

We could (and should!) definitely try.

If we could successfully ‘sandbox’ an advanced AI — that is, contain it to a training environment with no access to the real world until we were very confident it wouldn’t do harm — that would help our efforts to mitigate AI risks tremendously.

But there are a few things that might make this difficult.

For a start, we might only need one failure — like one person to remove the sandbox, or one security vulnerability in the sandbox we hadn’t noticed — for the AI system to begin affecting the real world.
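Why a single failure matters can be made concrete with a standard probability calculation. Assuming, for illustration, that each interaction with the sandboxed system carries the same small, independent chance p of a breach, the chance of at least one breach over n interactions is 1 − (1 − p)^n, which approaches certainty as n grows. The numbers below are hypothetical.

```python
def prob_at_least_one_failure(p: float, n: int) -> float:
    """P(at least one of n independent trials fails), each failing with probability p."""
    return 1 - (1 - p) ** n


for n in (10, 1_000, 10_000):
    print(f"p = 0.001, n = {n}: {prob_at_least_one_failure(0.001, n):.3f}")
```

Real breaches are unlikely to be independent and identically likely, and a capable system probing for weaknesses would make later attempts more likely to succeed, so if anything this simple model is optimistic.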

Moreover, this solution doesn’t scale with the capabilities of the AI system. This is because:

More capable systems are more likely to be able to find vulnerabilities or other ways of leaving the sandbox (e.g. threatening or coercing humans).
Systems that are good at planning might attempt to deceive us into deploying them.

So the more dangerous the AI system, the less likely sandboxing is to be possible. That’s the opposite of what we’d want from a good solution to the risk.

Surely a truly intelligent AI system would know not to disempower everyone?

For some definitions of “truly intelligent” — for example, if true intelligence includes a deep understanding of morality and a desire to be moral — this would probably be the case.

But if that’s your definition of truly intelligent, then it’s not truly intelligent systems that pose a risk. As we argued earlier, it’s advanced systems that can plan and have strategic awareness that pose risks to humanity.

With sufficiently advanced strategic awareness, an AI system’s excellent understanding of the world may well encompass an excellent understanding of people’s moral beliefs. But that’s not a strong reason to think that such a system would act morally.

For example, when we learn about other cultures or moral systems, that doesn’t necessarily create a desire to follow their morality. A scholar of the Antebellum South might have a very good understanding of how 19th century slave owners justified themselves as moral, but would be very unlikely to defend slavery.

AI systems with excellent understandings of human morality could be even more dangerous than AIs without such understanding: the AI system could act morally at first as a way to deceive us into thinking that it is safe.

Isn't the real danger from actual current AI — not some sort of futuristic superintelligence?

There are definitely dangers from current artificial intelligence.

For example, data used to train neural networks often contains hidden biases. This means that AI systems can learn these biases — and this can lead to racist and sexist behaviour.

There are other dangers too. Our earlier discussion on nuclear war explains a threat which doesn’t require AI systems to have particularly advanced capabilities.

But we don’t think the fact that there are also risks from current systems is a reason not to prioritise reducing existential threats from AI, if they are sufficiently severe.

As we’ve discussed, future systems — not necessarily superintelligence or totally general intelligence, but systems advanced in their planning and power-seeking capabilities — seem like they could pose threats to the existence of the entirety of humanity. And it also seems somewhat likely that we’ll produce such systems this century.

What’s more, lots of technical AI safety research is also relevant to solving problems with existing AI systems. For example, some research focuses on ensuring that ML models do what we want them to, and will still do this as their size and capabilities increase; other research tries to work out how and why existing models are making the decisions and taking the actions that they do.

As a result, at least in the case of technical research, the choice between working on current threats and future risks may look more like a choice between only ensuring that current models are safe, or instead finding ways to ensure that current models are safe that will also continue to work as AI systems become more complex and more intelligent.

Ultimately, we have limited time in our careers, so choosing which problem to work on could be a huge way of increasing your impact. When there are such substantial threats, it seems reasonable for many people to focus on addressing these worst-case possibilities.

But can't AI also do a lot of good?

Yes, it can.

AI systems are already improving healthcare, putting driverless cars on the roads, and automating household chores.

And if we’re able to automate advancements in science and technology, we could see truly incredible economic and scientific progress. AI could likely help solve many of the world’s most pressing problems.

But, just because something can do a lot of good, that doesn’t mean it can’t also do a lot of harm. AI is an example of a dual-use technology — a technology that can be used for both dangerous and beneficial purposes. For example, researchers were able to get an AI model that was trained to develop medical drugs to instead generate designs for bioweapons.

We are excited and hopeful about seeing large benefits from AI. But we also want to work hard to minimise the enormous risks advanced AI systems pose.

Why shouldn't I dismiss this as motivated reasoning by a group of people who just like playing with computers and want to think that's important?

It’s undoubtedly true that some people are drawn to thinking about AI safety because they like computers and science fiction — as with any other issue, there are people working on it not because they think it’s important, but because they think it’s cool.

But, for many people, working on AI safety comes with huge reluctance.

For me, and many of us at 80,000 Hours, spending our limited time and resources working on any cause that affects the long-run future — and therefore not spending that time on the terrible problems in the world today — is an incredibly emotionally difficult thing to do.

But we’ve gradually investigated these arguments (in the course of trying to figure out how we can do the most good), and over time both gained more expertise about AI and became more concerned about the risk.

We think scepticism is healthy, and are far from certain that these arguments completely work. So while this suspicion is definitely a reason to dig a little deeper, we hope that, ultimately, this worry won’t be treated as a reason to deprioritise what may well be the most important problem of our time.

This all reads, and feels, like science fiction

That something sounds like science fiction isn’t a reason in itself to dismiss it outright. There are loads of examples of things first mentioned in sci-fi that then went on to actually happen (this list of inventions in science fiction contains plenty of examples).

There are even a few such cases involving technology that are real existential threats today:

In his 1914 novel The World Set Free, H. G. Wells predicted atomic energy fueling powerful explosives — 20 years before we realised there could in theory be nuclear fission chain reactions, and 30 years before nuclear weapons were actually produced. In the 1920s and 1930s, Nobel Prize–winning physicists Millikan, Rutherford, and Einstein all predicted that we would never be able to use nuclear power. Nuclear weapons were literal science fiction before they were reality.
In the 1964 film Dr. Strangelove, the USSR builds a doomsday machine that would automatically trigger an extinction-level nuclear event in response to a nuclear strike, but keeps it secret. Dr Strangelove points out that keeping it secret rather reduces its deterrence effect. But we now know that in the 1980s the USSR built an extremely similar system… and kept it secret.

Moreover, there are top academics and researchers working on preventing these risks from AI — at MIT, Cambridge, Oxford, UC Berkeley, and elsewhere. Two of the world’s top AI labs (DeepMind and OpenAI) have teams explicitly dedicated to working on technical AI safety. Researchers from these places helped us with this article.

It’s totally possible all these people are wrong to be worried, but the fact that so many people take this threat seriously undermines the idea that this is merely science fiction.

It’s reasonable when you hear something that sounds like science fiction to want to investigate it thoroughly before acting on it. But having investigated it, if the arguments seem solid, then simply sounding like science fiction is not a reason to dismiss them.

Can it make sense to dedicate my career to solving an issue based on a speculative story about a technology that may or may not ever exist?

We never know for sure what’s going to happen in the future. So, unfortunately for us, if we’re trying to have a positive impact on the world, that means we’re always having to deal with at least some degree of uncertainty.

We also think there’s an important distinction between guaranteeing that you’ve achieved some amount of good and doing the very best you can. To achieve the former, you can’t take any risks at all — and that could mean missing out on the best opportunities to do good.

When you’re dealing with uncertainty, it makes sense to roughly think about the expected value of your actions: the sum of all the good and bad potential consequences of your actions, weighted by their probability.
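The expected-value calculation described above is a probability-weighted sum. The sketch below uses made-up numbers in arbitrary units of impact to compare a guaranteed modest contribution with a long shot that also carries a small chance of doing harm; it isn't a model of any real career.

```python
from typing import Iterable, Tuple


def expected_value(outcomes: Iterable[Tuple[float, float]]) -> float:
    """Probability-weighted sum of values over mutually exclusive outcomes (p, value)."""
    outcomes = list(outcomes)
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, not 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical options, in arbitrary units of impact
safe_bet = [(1.0, 10.0)]                                   # guaranteed, modest good
long_shot = [(0.05, 1000.0), (0.90, 0.0), (0.05, -20.0)]   # rare big win, some chance of harm
print(expected_value(safe_bet), expected_value(long_shot))
```

Here the long shot has the higher expected value even though it most likely achieves nothing, which mirrors the distinction above between guaranteeing some good and doing the best you can.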

Given that the stakes are so high, and that the risks from AI aren’t that low, the expected value of helping with this problem is high.

We’re sympathetic to the concern that if you work on AI safety, you might end up doing not much at all when you might have done a tremendous amount of good working on something else — simply because the problem and our current ideas about what to do about it are so uncertain.

But we think the world will be better off if we decide that some of us should work on solving this problem, so that together we have the best chance of successfully navigating the transition to a world with advanced AI rather than risking an existential crisis.

And it seems like an immensely valuable thing to try.

Is this a form of Pascal's mugging — taking a big bet on tiny probabilities?

Pascal’s mugging is a thought experiment — a riff on the famous Pascal’s wager — where someone making decisions using expected value calculations can be exploited by claims that they can get something extraordinarily good (or avoid something extraordinarily bad), with an extremely low probability of succeeding.

The story goes like this: a random mugger stops you on the street and says, “Give me your wallet or I’ll cast a spell of torture on you and everyone who has ever lived.” You can’t be 100% certain that he won’t — after all, nothing’s 100% for sure. And torturing everyone who’s ever lived is so bad that surely even avoiding a tiny, tiny probability of that is worth the $40 in your wallet? But intuitively, it seems like you shouldn’t give your wallet to someone just because they threaten you with something completely implausible.

Analogously, you could worry that working on AI safety means giving your valuable time to avoid a tiny, tiny chance of catastrophe. Working on reducing risks from AI isn’t free — the opportunity cost is quite substantial, as it means you forgo working on other extremely important things, like reducing risks from pandemics or ending factory farming.

Here’s the thing though: while there’s lots of value at stake — perhaps the lives of everybody alive today, and the entirety of the future of humanity — it’s not the case that the probability that you can make a difference by working on reducing risks from AI is small enough for this argument to apply.

We wish the chance of an AI catastrophe were that vanishingly small.

Instead, we think the probability of such a catastrophe (my own estimate is around 1% this century) is much, much larger than the probabilities of things people try to prevent all the time — such as fatal plane crashes, which happen in 0.00002% of flights.

What really matters, though, is the extent to which your work can reduce the chance of a catastrophe.

Let’s look at working on reducing risks from AI. For example, if:

There’s a 1% chance of an AI-related existential catastrophe by 2100
There’s a 30% chance that we can find a way to prevent this by technical research
Five people working on technical AI safety raise the chances of solving the problem by 1% of that 30% (so 0.3 percentage points)

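Then each person involved has a 0.0006 percentage point share in preventing this catastrophe (a 0.3 percentage point boost applied to a 1% risk removes 0.003 percentage points of risk, split five ways).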
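As a check on the arithmetic in the example above, here is a minimal Python sketch. The function and constant names are our own, and the inputs are the article's illustrative figures rather than estimates of anyone's real impact. It assumes, as the calculation implicitly does, that the team's contribution only matters in the worlds where a catastrophe would otherwise happen, so the two probabilities multiply.

```python
# Illustrative inputs taken from the example in the text (hypothetical, not measurements).
P_CATASTROPHE = 0.01   # 1% chance of an AI-related existential catastrophe by 2100
P_SOLVABLE = 0.30      # 30% chance technical research can find a way to prevent it
RELATIVE_BOOST = 0.01  # the team raises that 30% by 1% of itself
TEAM_SIZE = 5          # number of people on the hypothetical team


def per_person_share(p_catastrophe, p_solvable, relative_boost, team_size):
    """Return each person's share of the reduction in catastrophe risk, as a probability."""
    # Absolute rise in the chance of solving the problem: 1% of 30% = 0.003 (0.3 percentage points).
    boost_in_solving = p_solvable * relative_boost
    # That rise only reduces risk in the worlds where a catastrophe was coming anyway.
    risk_reduction = p_catastrophe * boost_in_solving
    # Credit is split evenly across the team.
    return risk_reduction / team_size


share = per_person_share(P_CATASTROPHE, P_SOLVABLE, RELATIVE_BOOST, TEAM_SIZE)
# The "%" format multiplies by 100, so this prints the share in percentage points.
print(f"{share:.6%}")  # 0.000600%
```

Each input enters as a simple factor (or, for team size, a divisor), so halving or doubling any one assumption scales the per-person share by the same proportion.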

Other ways of acting altruistically involve similarly sized probabilities.

The chance of a volunteer campaigner swinging a US presidential election is somewhere between 0.001% and 0.00001%. But you can still justify working on a campaign because of the large impact you expect you’d have on the world if your preferred candidate won.

You have even lower chances of wild success from things like trying to reform political institutions, or working on some very fundamental science research to build knowledge that might one day help cure cancer.

Overall, as a society, we may be able to reduce the chance of an AI-related catastrophe all the way down from 10% (or higher) to close to zero — that’d be clearly worth it for a group of people, so it has to be worth it for the individuals, too.

We wouldn’t want to just not do fundamental science because each researcher has a low chance of making the next big discovery, or not do any peacekeeping because any one person has a low chance of preventing World War III. As a society, we need some people working on these big issues — and maybe you can be one of them.

What you can do concretely to help

As we mentioned above, we know of two main ways to help reduce existential risks from AI:

Technical AI safety research
AI strategy/policy research and implementation

The biggest way you could help would be to pursue a career in either one of these areas, or in a supporting area.

The first step is learning a lot more about the technologies, problems, and possible solutions. We’ve collated some lists of our favourite resources here, and our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

If you decide to pursue a career in this area, we’d generally recommend working at an organisation focused on specifically addressing this problem (though there are other ways to help besides working at existing organisations, as we discuss briefly below).

Technical AI safety

Approaches

There are lots of approaches to technical AI safety, including:

Scalably learning from human feedback. Examples include iterated amplification, AI safety via debate, building AI assistants that are uncertain about our goals and learn them by interacting with us, and other ways to get AI systems trained with stochastic gradient descent to report truthfully what they know.
Threat modelling. An example of this work would be demonstrating the possibility of dangerous capabilities, like deceptive or manipulative AI systems, so that we can study them. You can read an overview in a recent Google DeepMind paper. This work splits into work that evaluates whether a model has dangerous capabilities (like the work of METR in evaluating GPT-4), and work that evaluates whether a model would cause harm in practice (like Anthropic’s research into the behaviour of large language models and this paper on goal misgeneralisation).
Interpretability research. This work involves studying why AI systems do what they do and trying to put it into human-understandable terms. For example, this paper examined how AlphaZero learns chess, and this paper looked into finding latent knowledge in language models without supervision. This category also includes mechanistic interpretability (for example, Zoom In: An Introduction to Circuits by Olah et al.). For more, see this survey paper, as well as Hubinger’s A transparency and interpretability tech tree and Nanda’s A Longlist of Theories of Impact for Interpretability for overviews of how interpretability research could reduce existential risk from AI.
Other anti-misuse research to reduce the risks of catastrophe caused by misuse of systems (we’ve written more on this in our problem profile on AI risk). For example, this work includes training AIs so they’re hard to use for dangerous purposes. (Note there’s lots of overlap with the other work on this list.)
Research to increase the robustness of neural networks. This work involves ensuring that the behaviour neural networks display on one set of inputs carries over to inputs they haven’t seen before, so that AI systems don’t shift to unsafe behaviour. See section 2 of Unsolved Problems in ML Safety for more.
Work to build cooperative AI. This involves finding ways to ensure that even if individual AI systems seem safe, they don’t produce bad outcomes through interacting with other sociotechnical systems. For more, see Open Problems in Cooperative AI by Dafoe et al. or the Cooperative AI Foundation. This seems particularly relevant for the reduction of ‘s-risks.’
More generally, there are some unified safety plans. For more, see Hubinger’s 11 possible proposals for building safe advanced AI, or Karnofsky’s How might we align transformative AI if it’s developed very soon.⁴⁷

See Neel Nanda’s overview of the AI alignment landscape for more details.

Key organisations

AI labs in industry that have empirical technical safety teams, or are focused entirely on safety:

Anthropic is an AI safety company working on building interpretable and safe AI systems. They focus on empirical AI safety research. Anthropic cofounders Daniela and Dario Amodei gave an interview about the lab on the Future of Life Institute podcast. On our podcast, we spoke to Chris Olah, who leads Anthropic’s research into interpretability, and Nova DasSarma, who works on systems infrastructure at Anthropic.
Model Evaluation and Threat Research works on assessing whether cutting-edge AI systems could pose catastrophic risks to civilization, including early-stage, experimental work to develop techniques, and evaluating systems produced by Anthropic and OpenAI.
The Center for AI Safety is a nonprofit that does technical research and promotion of safety in the wider machine learning community.
FAR AI is a research nonprofit that incubates and accelerates research agendas that are too resource-intensive for academia but not yet ready for commercialisation by industry, including research in adversarial robustness, interpretability and preference learning.
Google DeepMind is probably the largest and most well-known research group developing artificial general intelligence, and is famous for its work creating AlphaGo, AlphaZero, and AlphaFold. It is not principally focused on safety, but has two teams focused on AI safety: the Scalable Alignment Team, which focuses on aligning existing state-of-the-art systems, and the Alignment Team, which focuses on research bets for aligning future systems.
OpenAI, founded in 2015, is a lab that is trying to build artificial general intelligence that is safe and benefits all of humanity. OpenAI is well known for its language models like GPT-4. Like DeepMind, it is not principally focused on safety, but has a safety team and a governance team. Jan Leike (head of the alignment team) has some blog posts on how he thinks about AI alignment.
Ought is a machine learning lab building Elicit, an AI research assistant. Their aim is to align open-ended reasoning by learning human reasoning steps, and to direct AI progress towards helping with evaluating evidence and arguments.
Redwood Research is an AI safety research organisation, whose first big project attempted to make sure language models (like GPT-3) produce output following certain rules with very high probability, in order to address failure modes too rare to show up in standard training.

Theoretical / conceptual AI safety labs:

The Alignment Research Center (ARC) is attempting to produce alignment strategies that could be adopted in industry today while also being able to scale to future systems. They focus on conceptual work, developing strategies that could work for alignment and which may be promising directions for empirical work, rather than doing empirical AI work themselves. Their first project was releasing a report on Eliciting Latent Knowledge, the problem of getting advanced AI systems to honestly tell you what they believe (or ‘believe’) about the world. On our podcast, we interviewed ARC founder Paul Christiano about his research (before he founded ARC).
The Center on Long-Term Risk works to address worst-case risks from advanced AI. They focus on conflict between AI systems.
The Machine Intelligence Research Institute was one of the first groups to become concerned about the risks from machine intelligence in the early 2000s, and its team has published a number of papers on safety issues and how to resolve them.
Some teams in commercial labs also do some more theoretical and conceptual work on alignment, such as Anthropic’s work on conditioning predictive models and the Causal Incentives Working Group at Google DeepMind.

AI safety in academia (a non-comprehensive list):

The Algorithmic Alignment Group in the Computer Science and Artificial Intelligence Laboratory at MIT, led by Dylan Hadfield-Menell
The Center for Human-Compatible AI at UC Berkeley, led by Stuart Russell, focuses on academic research to ensure AI is safe and beneficial to humans. (Our podcast with Stuart Russell examines his approach to provably beneficial AI.)
Jacob Steinhardt’s research group in the Department of Statistics at UC Berkeley
The NYU Alignment Research Group, led by Sam Bowman
David Krueger’s research group at the Computational and Biological Learning Laboratory at the University of Cambridge
The Foundations of Cooperative AI Lab at Carnegie Mellon University
The Future of Humanity Institute at the University of Oxford has an AI safety research group
The Alignment of Complex Systems research group at Charles University, Prague

If you’re interested in learning more about technical AI safety as an area — e.g. the different techniques, schools of thought, and threat models — our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

We discuss this path in more detail here:

Career review of technical AI safety research

Alternatively, if you’re looking for something more concrete and step-by-step (with very little in the way of introduction), check out this detailed guide to pursuing a career in AI alignment.

It’s important to note that you don’t have to be an academic or an expert in AI or AI safety to contribute to AI safety research. For example, software engineers are needed at many places conducting technical safety research, and we also highlight more roles below.

AI governance and strategy

Approaches

Quite apart from the technical problems, we face a host of governance issues, which include:

Coordination problems that are increasing the risks from AI (e.g. there could be incentives to use AI for personal gain in ways that can cause harm, or race dynamics that reduce incentives for careful and safe AI development).
Risks from accidents or misuse of AI that would be dangerous even if we are able to prevent power-seeking behaviour (as discussed above).
A lack of clarity on how and when exactly risks from AI (particularly power-seeking AI) might play out.
A lack of clarity on which intermediate goals we could pursue that, if achieved, would reduce existential risk from AI.

To tackle these, we need a combination of research and policy (see Sam Clarke’s overview of AI governance).⁴⁸

We are in the early stages of figuring out the shape of this problem and the most effective ways to tackle it. So it’s crucial that we do more research. This includes forecasting research into what we should expect to happen, and strategy and policy research into the best ways of acting to reduce the risks.

But also, as AI begins to impact our society more and more, it’ll be crucial that governments and corporations have the best policies in place to shape its development. For example, governments might be able to enforce agreements not to cut corners on safety, further the work of researchers who are less likely to cause harm, or cause the benefits of AI to be distributed more evenly. So there eventually might be a key role to be played in advocacy and lobbying for appropriate AI policy — though we’re not yet at the point of knowing what policies would be useful to implement.

Key organisations

AI strategy and policy organisations:

AI Impacts attempts to find answers to all sorts of relevant questions about the future of AI, like “How likely is a sudden jump in AI progress at around human-level performance?”
The AI Security Initiative at UC Berkeley’s Center for Long-Term Cybersecurity.
The Centre for the Governance of AI (GovAI) aims to build a global research community, dedicated to helping humanity navigate the transition to a world with advanced AI. On our podcast we’ve spoken to Ben Garfinkel, acting director of GovAI, about some weaknesses of classic AI risk arguments, as well as Allan Dafoe, president of GovAI and leader of DeepMind’s Long-Term Strategy and Governance team, about the destabilising effects of AI.
The Centre for Long-Term Resilience is a UK think tank focused on existential threats, including those from AI.
The Center for Security and Emerging Technology at Georgetown researches the foundations of AI (talent, data, and computational power). It focuses on how AI can be used in national security. Listen to our podcast with Helen Toner, their Director of Strategy, for more.
The Centre for the Study of Existential Risk at the University of Cambridge has a group considering the governance of AI.
DeepMind and OpenAI both have policy teams (listen to our podcast with members of the OpenAI policy team and our podcast with the head of DeepMind’s governance team, Allan Dafoe).
The Future of Life Institute advocates for awareness of AI risk within the academic community and gives out grants for work focused on AI safety.
The Future of Humanity Institute at the University of Oxford has a macrostrategy research group that considers the future of AI and its contribution to existential risk.
The Leverhulme Centre for the Future of Intelligence is an interdisciplinary research centre at the University of Cambridge focusing on the impacts of AI on humanity.
Open Philanthropy provides grants to organisations working on altruistic issues. As a result they have research teams looking at the issues they focus on, including a team looking at potential risks from advanced AI. On our podcast, we spoke to Holden Karnofsky, co-CEO of Open Philanthropy, about his views on risks from AI. (Note: Open Philanthropy is 80,000 Hours’ biggest funder.)
The Institute for AI Policy and Strategy is focused on AI governance and strategy.

If you’re interested in learning more about AI governance, our top recommendation is to take a look at the governance curriculum from AGI Safety Fundamentals.

We discuss this path in more detail here:

Career review of AI strategy and policy careers

Also note: it could be particularly important for people with the right personal fit to work on AI strategy and governance in China.

Complementary (yet crucial) roles

Even in a research organisation, around half of the staff will be doing other tasks essential for the organisation to perform at its best and have an impact. Having high-performing people in these roles is crucial.

We think the importance of these roles is often underrated because the work is less visible. So we’ve written several career reviews on these areas to help more people enter these careers and succeed, including:

Operations management to help impactful organisations grow and function as effectively as possible.
Research management at an AI safety research organisation.
Being an executive assistant to someone who’s doing really important work on safety and governance.
Other non-technical roles in leading AI labs.

Other ways to help

AI safety is a big problem and it needs help from people doing a lot of different kinds of work.

One major way to help is to work in a role that directs funding or people towards AI risk, rather than working on the problem directly. We’ve reviewed a few career paths along these lines, including:

Founding new projects — in this case, starting new initiatives aimed at reducing risks from advanced AI.
Being a grantmaker to fund promising projects focused on reducing catastrophic AI risk.
Working in communication roles
Helping to build communities of people working on this problem. The most relevant community is the AI safety community itself, but it could also be impactful to help build the community of people working on the world’s most pressing problems (including risks from AI).

There are ways all of these could go wrong, so the first step is to become well-informed about the issue.

There are also other technical roles besides safety research that could help contribute, like:

Working in information security to protect AI (or the results of key experiments) from misuse, theft, or tampering.
Becoming an expert in AI hardware as a way of steering AI progress in safer directions.

You can read about all these careers — why we think they’re helpful, how to enter them, and how you can predict whether they’re a good fit for you — on our career reviews page.

Want one-on-one advice on pursuing this path?

We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.


Find vacancies on our job board

Our job board features opportunities in AI technical safety and governance.

Top resources to learn more

We've hit you with a lot of further reading throughout this article — here are a few of our favourites:

AI could defeat all of us combined and the "most important century" blog post series by Holden Karnofsky, co-CEO of Open Philanthropy, argue that the 21st century could be the most important century ever for humanity as a result of AI.
Why AI alignment could be hard with modern deep learning by Open Philanthropy researcher Ajeya Cotra is a gentle introduction to how risks from power-seeking AI could play out with current machine learning methods. Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover, also by Cotra, provides a much more detailed description of how risks could play out (which we'd recommend for people familiar with ML).
AGI safety from first principles provides OpenAI governance researcher Richard Ngo's perspective on how to think about risks from artificial general intelligence.
Is power-seeking AI an existential risk? by Open Philanthropy researcher Joseph Carlsmith is an in-depth look covering exactly how and why AI could cause the disempowerment of humanity (but watch out — it's even longer than this article!). It's also available as an audio narration. For a shorter summary, see Carlsmith's talk on the same topic.
Distinguishing AI takeover scenarios by Sam Clarke and Sammy Martin summarises various ways in which AI could go wrong.
AI governance: Opportunity and theory of impact by DeepMind governance lead Allan Dafoe explores ways in which research into AI governance could effect change.
A bird's-eye view of the AI alignment landscape by Neel Nanda summarises the different ways in which technical alignment research could reduce the risk from AI.
An overview of 11 proposals for building safe advanced AI by Evan Hubinger discusses and evaluates plausible techniques for AI alignment.
Podcasts: The AI X-risk Research Podcast, particularly episode 12 with Paul Christiano and episode 13 with Richard Ngo — both of which serve as excellent introductions to AI risk.

On The 80,000 Hours Podcast, we have a number of in-depth interviews with people actively working to positively shape the development of artificial intelligence:

Paul Christiano on how OpenAI is developing real solutions to the 'AI alignment problem', and his vision of how humanity will progressively hand over decision-making to AI systems
Allan Dafoe on trying to prepare the world for the possibility that AI will destabilise global politics
Richard Ngo on large language models, OpenAI, and striving to make the future go well
Ajeya Cotra on accidentally teaching AI models to deceive us
Jan Leike on how to become a machine learning alignment researcher (from 2018) and OpenAI's massive push to make superintelligence safe in 4 years or less (from 2023)
Nathan Labenz on the final push for AGI, understanding OpenAI's leadership drama, and red-teaming frontier models and on recent AI breakthroughs and navigating the growing rift between AI safety and accelerationist camps
Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters
Tom Davidson on how quickly AI could transform the world
Dario Amodei on OpenAI and how AI will change the world for good and ill
Miles Brundage on the world's desperate need for AI strategists and policy experts
Holden Karnofsky, cofounder of GiveWell and Open Philanthropy, has been on three of our podcasts, explaining:
- How AIs might take over even if they're no smarter than humans, and his four-part playbook for AI risk
- How philanthropy can have maximum impact by taking big risks (including a discussion of his work in positively shaping the development of AI)
- Why this might be the most important century
PhD or programming? Fast paths into aligning AI as a machine learning engineer, according to ML engineers Catherine Olsson & Daniel Ziegler

If you want to go into much more depth, the AGI Safety Fundamentals course is a good starting point. There are two tracks to choose from: technical alignment or AI governance. If you have a more technical background, you could try Intro to ML Safety, a course from the Center for AI Safety.

And finally, here are a few general sources (rather than specific articles) that you might want to explore:

The AI Alignment Forum, which is aimed at researchers working in technical AI safety.
AI Impacts, a project that aims to improve society's understanding of the likely impacts of human-level artificial intelligence.
The Alignment Newsletter, a weekly publication covering recent content relevant to AI alignment, with thousands of subscribers.
Import AI, a weekly newsletter about artificial intelligence by Jack Clark (cofounder of Anthropic), read by more than 10,000 experts.
Jeff Ding's ChinAI Newsletter, weekly translations of writings from Chinese thinkers on China's AI landscape.

Acknowledgements

Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlmüller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing this article or their extremely thoughtful and helpful comments and conversations. (This isn’t to say that they would all agree with everything we’ve said here — in fact, we’ve had many spirited disagreements in the comments on this article!)

The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.

Nova DasSarma on why information security may be critical to the safe development of AI systems

Robert Wiblin — Tue, 14 Jun 2022 21:46:23 +0000

The post Nova DasSarma on why information security may be critical to the safe development of AI systems appeared first on 80,000 Hours.

Data collection for AI alignment

Benjamin Hilton — Wed, 11 May 2022 21:52:52 +0000

In a nutshell:

To reduce the risks posed by the rise of artificial intelligence, we need to figure out how to make sure that powerful AI systems do what we want. Many potential solutions to this problem will require a lot of high-quality data from humans to train machine learning models. Building excellent pipelines so that this data can be collected more easily could be an important way to support technical research into AI alignment, as well as lay the foundation for actually building aligned AIs in the future. If not handled correctly, this work risks making things worse, so this path needs people who can and will change directions if needed.

Sometimes recommended — personal fit dependent

This career will be some people's highest-impact option if their personal fit is especially good.

Review status

Based on a shallow investigation

Why might becoming an expert in data collection for AI alignment be high impact?

We think it’s crucial that we work to positively shape the development of AI, including through technical research on how to ensure that any potentially transformative AI we develop does what we want it to do (known as the alignment problem). If we don’t find ways to align AI with our values and goals — or worse, don’t find ways to prevent AI from actively harming us or otherwise working against our values — the development of AI could pose an existential threat to humanity.

There are lots of different proposals for building aligned AI, and it’s unclear which (if any) of these approaches will work. A sizeable subset of these approaches require humans to give data to machine learning models, including AI safety via debate, microscope AI, and iterated amplification.

These proposals involve collecting human data on tasks like:

Evaluating whether a critique of an argument was good
Breaking a difficult question into easier subquestions
Examining the outputs of tools that interpret deep neural networks
Using one model as a tool to make a judgement on how good or bad the outputs of another model are
Finding ways to make models behave badly (e.g. generating adversarial examples by hand)

Collecting this data — ideally by setting up scalable systems to both contract people to carry out these sorts of tasks as well as collect and communicate the results — could be a valuable way to support alignment researchers who use it in their experiments.

But also, once we have good alignment techniques, we may need AI companies around the world to have the capacity to implement them. That means developing systems and pipelines for the collection of this data now could make it easier to implement alignment solutions that require this data in the future. And if it’s easier, it’s more likely to actually happen.

What does this path involve?

Human data collection mostly involves hiring contractors to answer relevant questions and then creating well-designed systems to collect high-quality data from them.

This includes:

Figuring out who will be good at actually generating this data (i.e. doing the sorts of tasks that we listed earlier, like evaluating arguments), as well as how to find and hire these people
Designing training materials, processes, pay levels, and incentivisation structures for contractors
Ensuring good communication between researchers and contractors, for example by translating researcher needs into clear instructions for contractors (as well as being able to predict and prevent people misinterpreting these instructions)
Designing user interfaces to make it easy for contractors to complete their tasks as well as for alignment researchers to design and update tasks for contractors to carry out
Scheduling workloads among contractors, for example making sure that when data needs to be moved in sequence among contractors, the entire data collection can happen reasonably quickly
Assessing data quality, including developing ways of rapidly detecting problems with your data or using hierarchical schemes of more and less trusted contractors
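To make the data-quality task above more concrete, here is a minimal Python sketch of one common check: seeding the task stream with "gold" items whose correct answers researchers already know, then flagging contractors whose accuracy on those items is low. This is our illustration, not a method the article prescribes; the `Label` record, the function names, the 0.8 accuracy threshold, and the three-item minimum are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """One contractor's answer to one task (hypothetical record format)."""
    contractor: str
    item_id: str
    answer: str


def gold_stats(labels, gold):
    """Count (correct, scored) answers per contractor on the seeded gold items."""
    correct = defaultdict(int)
    scored = defaultdict(int)
    for label in labels:
        if label.item_id in gold:  # ordinary tasks have no known answer, so skip them
            scored[label.contractor] += 1
            if label.answer == gold[label.item_id]:
                correct[label.contractor] += 1
    return {c: (correct[c], scored[c]) for c in scored}


def flag_contractors(labels, gold, threshold=0.8, min_scored=3):
    """Return contractors whose gold-item accuracy is below `threshold`.

    Contractors with fewer than `min_scored` gold answers are not flagged yet,
    because a handful of answers says little about their reliability.
    """
    flagged = []
    for contractor, (right, scored) in gold_stats(labels, gold).items():
        if scored >= min_scored and right / scored < threshold:
            flagged.append(contractor)
    return sorted(flagged)


# Toy data: three gold items plus one ordinary task.
gold = {"g1": "A", "g2": "B", "g3": "A"}
labels = [
    Label("alice", "g1", "A"), Label("alice", "g2", "B"), Label("alice", "g3", "A"),
    Label("bob", "g1", "B"), Label("bob", "g2", "B"), Label("bob", "g3", "B"),
    Label("carol", "g1", "B"),   # only one gold answer so far: too few to judge
    Label("alice", "x7", "A"),   # an ordinary task, not scored
]
print(flag_contractors(labels, gold))  # ['bob']
```

In a fuller pipeline, flagged contractors' work might be routed to more-trusted reviewers, which is one way to implement the hierarchical trust scheme mentioned in the list; where to set the threshold and minimum sample size is a judgement call.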

Being able to do all these things well is a pretty unique and rare skill set (similar to entrepreneurship or operations), so if you’re a good fit for this type of work, it could be the most impactful thing you could do.

Avoiding harm

If you follow this path, it’s particularly important to make sure that you are able to exercise excellent judgement about when not to provide these services.

We think it’s extremely difficult to make accurate calls about when research into AI capabilities could be harmful.

For example, it sounds pretty likely to us that work that helps make current AI systems safe and useful will be fairly different from work that is useful for making transformative AI (when we’re able to build it) safe and useful. You’ll need to be able to make judgements about whether the work you are doing is good for this future task.

We’ve written an article about whether working at a leading AI lab might cause harm, and how to avoid it.

If you think you might be a good fit for this career path, but aren’t sure how to avoid doing harm, our advising team may be able to help you decide what to do.

Example people

Long Ouyang

After majoring in psychology, Long went on to do a PhD in cognitive psychology at Stanford. His research was at the intersection of psychology and machine learning. During his PhD, Long became convinced that it would be valuable to contribute to work on AI safety. He got a grant from the Future of Life Institute to research psychology and intent alignment. However, Long found it difficult to self-motivate in this research; as an entirely independent researcher, he felt too disconnected from important things going on elsewhere.

At the time, OpenAI was hiring social scientists to help with AI safety via debate. While the work ended up going in a different direction, Long was useful to the OpenAI safety team because of his experience in machine learning. At one point, the safety team started discussing how it would be useful to have a cognitive psychologist around to help collect human data, and Long volunteered himself for this new role. He now works as a research scientist doing human data collection at OpenAI.

How to predict your fit in advance

The best experts at human data collection will have:

Experience designing surveys and social science experiments
Ability to analyse the data collected from experiments
Some familiarity with the field of AI alignment
Enough machine learning knowledge to understand the machine learning research process and what sorts of data are useful to collect
At least some front-end software engineering knowledge
Some aptitude for entrepreneurship or operations

Data collection is often considered somewhat less glamorous than research, making it especially hard to find good people. So if you have three or more of these skills, you’re likely a better candidate than most!

How to enter

If you already have experience in this area, there are two main ways you might get a job as a human data expert:

Find jobs at organisations working on alignment, particularly those doing empirical alignment research. For example, OpenAI, DeepMind, Anthropic, Redwood Research, and Ought are all good choices. (As of March 2022, Anthropic is hiring for these sorts of roles.) Surge AI is a startup that is also carrying out this sort of work.
Consider founding an organisation to do this work, as suggested by alignment researcher Beth Barnes. However, in 2022, Matt Putz and Rudolf Laine tried to start an organisation working on this but thought there wasn't sufficient demand. They wrote about why they didn't found a human-data-for-alignment organisation. If you are interested in founding an organisation, contact our team.

If you don’t have enough experience to work directly on this now, you can gain experience in a few ways:

Do academic research, for example in psychology, sociology, economics, or another social science.
Work in human-computer interaction or software crowdsourcing.
Work on labelling teams at machine learning companies. Because these roles are less popular, they can be a great way to rapidly gain experience and promotions in machine learning organisations.

The Effective Altruism Long-Term Future Fund and the Survival and Flourishing Fund may provide funding for promising individuals to learn skills relevant to helping future generations — including human data collection. As a way of learning the necessary skills (and directly helping at the same time), you could apply for a grant to build a dataset that you think could be useful for AI alignment. The Machine Intelligence Research Institute has put up a bounty for such a dataset.

Find a job in this path

If you think you might be a good fit for this path and you're ready to start looking at job opportunities, you may find relevant roles on our job board.

Want one-on-one advice on pursuing this path?

If you think this path might be a great option for you, but you need help deciding or thinking about what to do next, our team might be able to help.

We can help you compare options, make connections, and possibly even help you find jobs or funding opportunities.


Learn more about data collection for AI alignment

Why we’re not founding a human-data-for-alignment org
Our problem profile on positively shaping the development of AI
The 80,000 Hours Podcast on Artificial Intelligence (a collection of 10 key AI episodes from our podcast)
Our career review of AI safety technical research
AI safety needs social scientists by Geoffrey Irving and Amanda Askell


Software and tech skills

Key facts on fit

Why are software and tech skills valuable?

What does a career using software and tech skills involve?

How to evaluate your fit

How to predict your fit in advance

How to tell if you’re on track

How to get started building software and tech skills

Independently learning to code

Attending a coding bootcamp

Studying at university

Doing internships

AI-assisted coding

Building a specialty

Once you have these skills, how can you best apply them to have an impact?

Find jobs that use software and tech skills

Career paths we’ve reviewed that use these skills


Nita Farahany on the neurotechnology already being used to convict criminals and manipulate workers

Holden Karnofsky on how AIs might take over even if they’re no smarter than humans, and his four-part playbook for AI risk

Lennart Heim on the compute governance era and what has to come after

AI governance and coordination


Review status

Why this could be a high-impact career path

What kinds of work might contribute to AI governance?

Government work

Research on AI policy and strategy

Industry work

Advocacy and lobbying

Third-party auditing and evaluation

International work and coordination

US-China

Other governments and international organisations

How policy gets made

Agenda setting

Policy creation and development

Implementation

Examples of people pursuing this path

How to assess your fit and get started

Testing your fit

Types of career capital

Generally useful career capital

Policy-related career capital

Technical career capital

Other specific forms of career capital

Want one-on-one advice on pursuing this path?

Where can this kind of work be done?

How this career path can go wrong

Doing harm

Burning out

What the increased attention on AI means

Read next

Learn more

Top recommendations

Further recommendations


AI safety technical research

Pros

Cons

Key facts on fit


Review status

Why AI safety technical research is high impact

Want to learn more about risks from AI? Read the problem profile.

What does this path involve?

What does work in the empirical AI safety path involve?

What does work in the theoretical AI safety path involve?

Some exciting approaches to AI safety

What are the downsides of this career path?

How much do AI safety technical researchers earn?

Examples of people pursuing this path

How to predict your fit in advance

How to enter

Learning the basics

Learning to program

Learning the maths

Learning basic machine learning