Our mistakes
We continuously strive to learn from our actions and improve our practices. This page presents a selection of lessons learned from our self-evaluations, but it is not exhaustive. We may update it further in light of new reflections or developments.
Table of Contents
- 1 Two-year review 2021–2022
- 2 Annual review Nov 2020
- 2.1 We previously caused our most engaged readers to believe we were much more focused on AI — to the exclusion of other priorities — than we actually are
- 2.2 Several teams didn’t sufficiently prioritise capacity building through hiring and we missed our goal of a 2.5 net increase in staff
- 2.3 We had forms asking for user information on our website that were confusing and sometimes were not monitored even though they implied we might speak one-on-one with people who filled them out
- 2.4 We allowed people to find our private podcast feed
- 2.5 We removed the newsletter call to action (CTA) on our homepage
- 3 Annual review Dec 2019
- 3.1 We were overly credulous about how easy it is to cause career changes, and our investigations into these claims were insufficiently skeptical and thorough
- 3.2 We haven’t focused enough on our effects on community culture
- 3.3 We didn’t make it clear enough that there are high-impact jobs outside of organisations in the effective altruism community
- 3.4 We should have been more proactive in communicating about which career-related services we don’t intend to provide
- 3.5 Our CEO led on too much of the office set up
- 4 Annual Review 2018
- 5 Annual review Dec 2017
- 5.1 People misunderstand our views on career capital
- 5.2 Not prioritising diversity highly enough
- 5.3 Set an unrealistic IASPC target
- 5.4 Not increasing salaries earlier
- 5.5 Rated-1 plan changes from online content not growing
- 5.6 Accounting behind
- 5.7 Not being careful enough in communication with the community
- 5.8 Poor forecasting of our coaching backlog
- 6 Annual review Dec 2016
- 6.1 Challenges hiring
- 6.2 The book launch was delayed over a month, and might have had a smaller reach
- 6.3 Produced fewer high-value career reviews and problem profiles than planned
- 6.4 Too many competing priorities
- 6.5 Growth of high-value plan changes slower than medium-value
- 6.6 Many people didn’t get responses from [email protected] for 6 months, affecting about 70 emails.
- 7 Annual review June 2014 – April 2015
- 8 Annual review April 2013 – May 2014
- 9 Six Month Review Dec 2012 – March 2013
- 10 Six Month Review June – Nov 2012
Two-year review 2021–2022
Our content about FTX and Sam Bankman-Fried
Prior to the collapse of FTX in November 2022 and the subsequent indictment of its CEO, Sam Bankman-Fried, we had held Sam up as a positive example of earning to give on the 80,000 Hours website. We also interviewed Sam on our podcast.
To say the least, we no longer believe Sam is a positive example for our readers. We have removed most references to him on our site, but we have preserved the ‘reader story’ we wrote about him and the podcast episode page with notes about the collapse of FTX for the public record.
We believe we shouldn’t have prominently featured Sam on the website. Featuring him in this way is a mistake we deeply regret and feel humbled by.
We have now updated our vetting procedures for these types of stories. We also regret some elements of the way we handled the podcast episode, which you can read more about here.
More generally, the collapse of FTX has led us to ask many challenging questions that we are continuing to wrestle with about best practices in effective altruism, moral philosophy, and the potential for harm in careers. We have begun the process of updating some of our articles and advice in light of these events.
Here’s a message our Interim CEO Brenton Mayer wrote about FTX in February 2023:
We’ve been appalled to learn about what went on at FTX and Alameda Research.
Previously, we had pointed to Sam Bankman-Fried as a prominent and positive example of someone who was earning to give. We now deeply regret putting our trust in him. Personally, I found it jarring and disorienting to read the media reports on the FTX bankruptcy, and to hear the allegations that they had exposed customers to such high levels of risk, intentionally, for as long as they had. We have updated the site to reflect the change in our view of Sam, and we’re continuing to grapple with the lessons we should learn as an organisation from these events.
While 80k’s views on the implications are still in flux, you can find some of them in our statement, our podcast, and our newsletter. You can read early thoughts from Rob on Twitter and Michelle on the EA Forum.
An issue with an advising call on advancing capabilities in biology
We have become aware that we gave advice in a one-on-one call prior to 2019 that was too sanguine about the risks associated with some research in biology. This led the advisee to publish research to advance their career that they later regretted, because they think it may contribute to the groundwork for dangerous technologies that could be developed in the future.
Multiple abandoned office searches
Over the past five years, we spent hundreds of hours of staff time searching for office space, time we think was poorly spent. On three occasions, we came close to signing a lease but then backed out.
These were when:
- We decided to move countries.
- We got a second opinion on how long it would take to refurbish an office and decided it was unacceptably long.
- A more careful accounting of the full costs of office space made us realise that the extra space we were considering wasn’t worth it. (In this case, we responded by deciding to delay the office move, so that we’d go through a period of the office being more crowded.)
The errors we made here included being insufficiently confident in high-level strategic assumptions and leaving interrelated areas of responsibility underspecified.
Mistakes in estimating our financial needs
At the end of 2020, we made projections of our expected spend and fundraised for an amount which we thought would allow us to end 2022 with 12 months of runway.
In these calculations, we underestimated our spending in some places without sufficient compensating overestimates elsewhere:
- We calculated our 2023 spend by applying a percentage increase to our 2022 spend that was lower than our historical standard (and lower than what we now expect).
- We didn’t predict an increase in office spending. (In fact, office spending did increase, driven mostly by modifications we made to accommodate more people in our office.)
- We underestimated the 2021 → 2022 salary increases (estimated 7%, but the changes were 15%).
As a result, we decided to raise $1.25m for general support in mid-2022.
In 2021, the accounting software that produced an estimate of our financial reserves failed. We didn’t realise the error for some months, in part due to poorly clarified responsibilities.
Lack of clarity about house views
The web team delayed settling on a policy to determine what counts as an 80,000 Hours ‘house’ view, distinct from the views of individual authors of articles, and how to communicate this to our audience. As the organisation has grown, we have had more heterogeneity among the views of our staff, which has made this issue more important to resolve. The director of the web team has made it a project in 2023 to develop more clarity on the idea of 80,000 Hours’ house views and how, if at all, they should be determined.
Mistakes in quality assurance
As we ramped up marketing efforts in 2022, we made several technical errors in our outreach, including:
- We offered a free book as an incentive to participate in a survey, but we provided insufficient information for the recipients to properly claim the book.
- One email, which ~3,250 people clicked on, provided incorrect instructions on how to claim a book in our free book giveaway; when we discovered the error, we emailed them the correct details.
- It also appears that an error allowed 1,700 people to get free books through our book giveaway without signing up for the newsletter. This wasn’t supposed to happen, but it’s not clear whether it was actually a bad outcome, because the orders appear to have been legitimate requests from individuals who wanted books, and we think reading the books is a good use of people’s time.
We do expect some error rate when handling large amounts of outreach. Some of these errors may be attributable to the expected learning curve of having a relatively new staffer running a major programme.
Additionally, as part of our user survey, we made a coding error that caused a significant segment of the audience to miss out on 14 key questions. We sent a follow-up email apologising for the error and directing those users to the questions they had missed.
Annual review Nov 2020
We previously caused our most engaged readers to believe we were much more focused on AI — to the exclusion of other priorities — than we actually are
Note: our content no longer gives the impression that we are more focused on AI than we actually are. In fact, as of July 2022, our recent content seems more likely to have underemphasised our view on how high priority this is. We both think other issues are important for people to work on and think AI safety is probably the most pressing problem of our time. You can read our views in further detail on our problem profiles page and our AI problem profile.
Over the course of ~2017–2019, we gave the impression that we were very heavily focused on AI, to the exclusion of other promising longtermist areas. This showed up in the way many of our readers talked and wrote about what we believe. As one among many examples, a November 2019 EA Forum post described us as having “a single, narrow focus on recruiting people to AI safety.”
In early 2020, Brenton did an informal poll of several highly engaged members of the EA community on what they thought 80,000 Hours’ view was on the importance of AI relative to other longtermist priorities, and then asked staff what their views actually were.
We found that the community members thought we valued AI careers 2–5x more highly relative to other longtermist priorities than we did. We also found that the community members expected us to prefer the EA community to allocate around twice the fraction of its labour towards AI than we actually did.
(Our current views line up reasonably well with the mean response from the 2019 EA Leaders Forum).
How we’ve aimed to fix this problem
We’ve done a combination of the following (though note several of these were already being considered at the point of the above investigation):
- Airing concerns about EA’s focus on AI, especially through our podcast episode with Ben Garfinkel.
- Increasing the prominence of other paths and priorities. The most important example was releasing new, broader lists of career paths and problems we’d be excited for our readers to explore. Rob’s Facebook and EA Forum post on considering a wider range of jobs/problems promoted these paths, as did our lists of other promising problem areas and career paths.
We generally find it difficult to communicate about and fix this kind of mistake because it is so easy to create a new set of problems by causing people to overcorrect. For example, the most straightforward way to address this issue would have been to write a blog post saying something like “80,000 Hours believes the majority of EAs should not be working on AI.” However, a similar post we wrote in the past about earning to give created a much stronger meme than we intended. Such an overreaction would be costly in the case of AI safety, which remains our top priority problem, so we’ve taken the more moderate steps mentioned above.
Several teams didn’t sufficiently prioritise capacity building through hiring and we missed our goal of a 2.5 net increase in staff
We only made one hire in our 2020 metric year (Luisa Rodriguez, who is planning to start in summer 2021). This meant that we missed our goal of a 2.5 net increase in staff. We think this miss was partly caused by insufficiently prioritising hiring and by making some mistakes in our hiring processes.
Internal systems
The internal systems team ran a hiring round in late 2019 and another in early 2020. Neither round led to a hire. We now think we targeted the wrong candidates by focusing on people who had experience either in office management or as personal assistants.
Instead, we should have deprioritised prior experience and looked for people who were both unusually capable and strongly aligned with our mission, which would allow us to give them large amounts of responsibility over time. We made these changes to our hiring process and then hired Sashika in September 2020.
Tech team
Our tech lead Peter Hartree was spread thin over 2019–2020, and our rate of output on product development was reduced because of his split focus. We considered hiring to increase tech and web product design capacity several times during this period. We met several candidates in 2019 and trialled two contractors in autumn 2019. We trialled a contractor in November 2020, and ran a hiring round in October 2020 which led to an offer that was not accepted in January 2021.
In retrospect, it seems like we should have run a developer hiring round in the first half of 2019, and/or run a hiring round in autumn 2019 instead of trialling contractors. We now expect our tech capacity in 2021 to be lower than we’d like, especially because Peter is planning to move on from 80,000 Hours this year.
Note that while we think we made mistakes in this case, we do not believe it’s always a mistake to miss a hiring target, and we would rather miss one than make a mediocre hire.
We had forms asking for user information on our website that were confusing and sometimes were not monitored even though they implied we might speak one-on-one with people who filled them out
What happened?
Around 2017 we put a form on our website so that users interested in a role as a China Specialist could apply to get one-on-one advising from our contractor who is an expert in the area. Unfortunately, the relevant text said “speak to us,” which did not make it clear that the application was to speak with our contractor and not somebody on staff. Moreover, when we onboarded our contractor in 2017, we did not have a policy of responding to applicants we chose not to speak to, and we were not monitoring this arrangement closely enough to remember to update him when our policy changed. In mid-2020, we learned that this led at least one user (and probably more) to incorrectly believe they had applied to and been rejected from our main one-on-one advising program.
While investigating this mistake, we discovered around five forms on our website whose results we weren’t carefully monitoring, though these were on less prominent pages and did less to indicate we would get back to the person and talk to them (for example, one simply said we would record their interest and get back to them if we came up with a role suited to them).
How were our users affected?
We received around 200 responses to the China Specialist form before we noticed this problem. Some proportion of these users probably believed they had applied and been rejected for our main advising program without receiving a response. This may have had a negative effect on their morale and could even have discouraged some from getting involved with effective altruism. It probably reflected poorly on 80,000 Hours, and it probably prevented some otherwise excellent applicants from applying for advising.
In total, several hundred users filled out these other forms. We’re not sure how many people were affected because we don’t know the exact dates during which we were not monitoring them.
What did we do about it?
We think that it’s important to respect our audience’s time. Leaving forms on our site which we weren’t monitoring regularly was a failure to do so.
We changed the China Specialist form so that it links to our main advising application. If our contractor seems like a more appropriate advisor for a particular person, we forward the application to him. We’ve also asked our contractor to start sending rejection emails and to make clear in those emails that he’s the only person who has looked at the application. He’s also going to go through the responses to the form and flag anyone he thinks we should talk to.
We removed the other forms from our site and replaced them with links to apply directly for our advising program.
We allowed people to find our private podcast feed
To make guests more comfortable on the podcast, we promise them a chance to review their episode before it’s released, and we agree to remove any parts of their episode that they request we cut.
Prior to getting the guests’ approval, we post rough cuts on a private feed so that staff can listen, provide feedback, and suggest edits. In mid-2020 it was temporarily possible to find this private feed by searching for 80,000 Hours on the podcast app Podcast Addict, and some people found and downloaded a rough cut of one episode, which included some material we were asked to cut from the public version.
How did this happen?
Libsyn, our podcast hosting platform at the time, told us that it would not be possible for listeners to search for our private feed, and the only way to access it was to know the exact URL. This turned out to be false. We’re not sure exactly what we should have done differently given what we knew at the time, but some things we may have gotten wrong include: 1) trusting Libsyn, which we thought we could rely on because it’s the biggest podcast platform; 2) being insufficiently concerned about people typing in the exact URL; and 3) failing to conduct enough research to find a platform that could provide better privacy measures.
What did we do about it?
Libsyn could not offer us a truly private option, so we moved to Transistor, a subscription-based hosting service that supports private feeds. We also notified and apologised to the affected podcast guest.
We removed the newsletter call to action (CTA) on our homepage
We removed the newsletter CTA from our homepage in April 2019, thinking it was probably mainly attracting subscribers who weren’t very engaged while having negative effects on our brand and on user experience. We knew this change would lead to a big drop in newsletter subscriptions and were monitoring this, but we did not check for an effect on newsletter engagement until ~8 months later.
In November 2019 we found evidence that, contrary to our expectations, people who subscribed through the splash CTA actually did open our emails at a similar rate to other subscribers. We also learned that appeals to complete our impact survey delivered via the podcast were much less effective than appeals sent via the newsletter. We now think removing the CTA was probably a mistake, though we’re still uncertain about the size of the brand and UX effects.
We think we would have made better decisions here if we had spent more time analysing our website analytics. In a postmortem, Peter Hartree concluded that he was overconfident that this ‘growth hack’ was only generating relatively low-value signups, and that he made an ex ante mistake by not doing more thorough analysis before or soon after removing it. We had previously noted that Hartree was spread too thin over many responsibilities, and we perhaps should have addressed this sooner.
This mistake was potentially costly, as removing the CTA reduced our newsletter signup rate by about 40% for ~8 months. This also may have had some negative effects on the quality of our impact evaluation because people who make plan changes due to our work are much more likely to fill out our impact survey if they’re subscribed to the newsletter.
How we’ve aimed to fix this problem
We restored the newsletter appeal to our home page in November 2019. In late 2020, we decided to invest more in marketing, including making Peter McIntyre our head of growth. More marketing capacity will partly be spent on monitoring and optimising web analytics, so we expect that investing more in it will reduce the chance of similar mistakes in future.
Annual review Dec 2019
See also our 2019 annual review for a discussion of some of our biggest problems, mistakes identified in 2019, weaknesses and bottlenecks, strategic uncertainties, and arguments against funding 80,000 Hours.
We were overly credulous about how easy it is to cause career changes, and our investigations into these claims were insufficiently skeptical and thorough
This is the most important mistake we found out about in 2019.
The problems with our impact evaluation
Ajeya Cotra – a senior research analyst at Open Philanthropy – followed up with some people who made some of the top plan changes mentioned in our 2018 review, and found that when asked more detailed questions about the counterfactual (what would have happened without 80,000 Hours), some of them reported a significantly smaller role for 80,000 Hours than what we claimed in our evaluation.
This prompted us to review our impact evaluation system and investigate our previously claimed impact-adjusted significant plan changes (IASPC). We now think that on average we were too optimistic about how easy it is to cause a ‘trajectory change’ (i.e. a change to the long-term direction of someone’s career that wouldn’t have happened otherwise), especially for people who already had some ties to the EA community (since the EA community often helps people shift careers separately from our work). As a result, five former rated-100+ plan changes did not meet the new standard. We feel unsure about how big the update should be, though we believe our current evaluation uses a standard similar to Ajeya’s; she estimated that our impact on the tracked plan changes was overstated by a factor of two in the 2018 evaluation.
There were also multiple other problems with our impact evaluation:
- When evaluating speed-ups (the extent to which we accelerated a change that would have happened otherwise), we didn’t properly account for the career capital people would have gained in the meantime.
- Our estimates of opportunity cost were conceptually confused in a way that probably biased our cost-effectiveness estimates upwards.
- We were often unclear in our case studies. Often, our claims seemed more optimistic than we intended.
- IASPC ratings (the metric of our old system) were sometimes inconsistent, since our standards for assessing changes weren’t clear enough. (For instance, as mentioned last year, we think the standard for a rated-10 plan change increased over time.)
- Our investigation of top plan changes was not as thorough in general as it should have been.
- We invested significant effort in the IASPC system even though many of the donors didn’t find it useful, and it often was not that useful for internal prioritisation either.
How we’ve aimed to improve the impact evaluation system
We’ve aimed to make progress in addressing the above by investing ~1 FTE over 2019 in creating two new plan change evaluation systems.
The ‘criteria-based’ system aims to cover a wider range of plan changes in a way that’s quicker to rate, has greater interrater reliability, is more transparent to donors, and is less ‘laggy’. (Unfortunately, we haven’t finished implementing this system.)
In the ‘top plan change’ system we do a deeper evaluation of the changes that account for the majority of the expected value of plan changes.
In both systems, we aim to be more sceptical about how easy it is to cause a trajectory change, though this is most important for the top plan changes. Partly, this involves requiring a higher standard of evidence to conclude that a trajectory change is likely; for instance, if someone says they made a trajectory change due to 80k, we’re now very unlikely to make the mistake of taking it at face value. We’ve also aimed to improve the process to remind us of this. For instance, we’re working on a new case study template that asks in more detail about what other influences were important in the plan change and what would have happened otherwise.
There are still many challenges in applying the new system, and we’ll keep working on it in 2020. For instance:
- Rating top plan changes requires many judgement calls, so it’s difficult to ensure consistency across the team. (Though we think the clearer framework and greater level of investigation and discussion are an improvement on our old system.)
- Criteria-based plan changes are still laggy, and we’re not sure how well they proxy top plan changes, so we might still not have a good metric for prioritisation.
- It seems like the new system is still not tracking a substantial fraction of our impact, which could lead us to prioritise poorly.
- Implementing the system takes significant time, and it’s unclear how useful it is compared to other methods of evaluation.
We haven’t focused enough on our effects on community culture
80,000 Hours has significant effects on the overall culture of effective altruism, in part because it seems to be one of the largest ways new people get involved. However, we’ve mainly focused on addressing skill bottlenecks and getting plan changes and not on these community-wide effects.
We thought it would be higher impact to heavily focus on the people currently best placed to fill the highest priority skill bottlenecks (some of whom are in the effective altruism community but many of whom are not) and that other organisations were in a better position to guide the community. We also had a potential concern that if we focused heavily on the community itself, we’d create a perception that only EAs were in our target audience and our brand might start to feel unwelcoming to people currently outside the community who could make big contributions to priority problems. But in retrospect it seems like we may be among the best placed groups to help address certain problems that the community faces.
Moreover, some of our external advisors have concerns that some of our effects on culture are negative. We’ve made a list of such concerns in the appendix to our 2019 annual review.
It seems plausible that, in the long term, effects on EA culture (although hard to measure) will be more important than short-term plan changes.
For this reason, we’ve started to focus somewhat more on cultural effects when evaluating which projects to work on.
We’re still unsure which cultural values are most important to uphold or promote at the margin, and would like to discuss it more with other community members.
Some ideas raised at the last leaders forum include (not comprehensive): more warmth and appreciation; promoting more ‘balance,’ i.e. emphasising having a rich life outside of doing good; and being more ‘outward facing,’ which could have many facets, such as exploring more problem areas and paths currently outside the mainstream of effective altruism, or making more use of external expertise. We’ve started working on content that helps with some of these, and are considering other ideas.
We didn’t make it clear enough that there are high-impact jobs outside of organisations in the effective altruism community
We noticed last year that people in the effective altruism community seemed to be focused too much on jobs at effective altruism organisations, given the number of positions available, which may have been partly due to miscommunications we’d made. We wrote this post making that point. However, this was not enough to head off the problem, leading to threads such as this one and this one on the EA Forum.
Instead, we should have written more short posts about the topic and talked about it on the podcast. We have now recorded some episodes that address this issue. We’d also like to have more coverage of other career paths in general, and have been working on this, though it will take much longer.
We also expanded the number of jobs on the job board, almost entirely by listing more jobs outside of EA organisations.
We should have been more proactive in communicating about which career-related services we don’t intend to provide
We’ve noticed that many people in the community are uncertain about which activities 80,000 Hours intends to pursue (e.g. whether we’ll advise undergraduates, how much we’ll cover global health & factory farming etc.). Sometimes 80,000 Hours gets seen as the org that will cover all EA career advice.
We think we’ve been clear in our annual reviews about what we have scope to cover, but we should have also discussed the topic on our blog and elsewhere. This might have encouraged other groups to fill these gaps sooner.
We’ve drafted a blog post making our focus clearer, but we failed to adequately prioritise releasing it. We released it in April 2020.
Our CEO led on too much of the office set up
Ben ended up overseeing the office search, lease negotiation, and management of the fit-out team.
It seems likely Ben could have delegated at least some of these steps more heavily to other team members. This would have likely resulted in the process going more slowly (we moved into the office in London within ~3 months of when we started searching, which seems fast), and perhaps in a somewhat worse result. We also didn’t have a natural candidate to lead the process at the time.
However, it cost perhaps 200 hours of Ben’s time, and was often distracting and somewhat demotivating since it reduced time for writing. If it had been more delegated, we could have made faster progress on research, writing, impact evaluation, or strategy. Overall, this seems like it was the wrong call, and Ben will try harder to delegate similar tasks in the future.
Annual Review 2018
See also our 2018 discussion of weaknesses of 80,000 Hours and risks of expansion.
We still haven’t updated our writing on career capital
As we noted at the end of 2017, we haven’t done a good enough job of communicating our views on career capital, and especially how they have changed.
We intended to write more about this issue, then update the career guide. We have drafted two major articles on career capital, but didn’t publish either of them due to a shortage of writing time from Ben. In the meantime, our old article was still misrepresenting our views, and this was rightly criticised on the EA Forum.
We’ve added a note to the top of the career capital page in the career guide, but we should have done this a year earlier. We’ll continue to prioritise writing about this topic in 2019.
This mistake also points towards a broader issue. We have hundreds of pages of old content, but currently only the equivalent of 2 full-time staff working on content (about 1 of which is focused on the podcast, leaving just 1 for written content), so it’s easy for our old advice to get out of sync with our views. For instance, the career quiz from 2015 often doesn’t return useful results, so we’ve added a disclaimer to it and removed most links to it. [Edit: As of 2020, the quiz has been taken down.]
Going forward, we plan to add an automatic warning that will appear on old articles saying they might not represent our views. The warning will have to be actively overridden if we think the old article is still accurate.
There are also some other areas where our core advice is not presented clearly enough on the site, and we intend to fix that with the key ideas series.
Not writing a summary of the key ideas series earlier
One of our main aims over 2018 was to make the site appeal better to our core audience of talented graduates focused on having a large social impact. In particular, we think the career guide written in 2016 was not at the right level for much of this audience. To fix this, we’ve been writing a ‘key ideas’ series to replace the career guide.
However, in the autumn we realised we could start by writing a summary instead, and that this, combined with existing articles, would be good enough to replace the career guide right away. This format might also act as a better introduction to and crisp communication of our most important advice than a longer series.
We could have probably realised this earlier in the year if we had spent more time thinking about how to minimise new writing, which we have very little capacity for right now. This could have meant addressing the main problem with our site a year earlier.
Moving from the UK to the San Francisco Bay Area in 2016
In 2019 we moved from the SF Bay Area to London.
Although we received many benefits from being based in the Bay Area, our return to the UK suggests that it may have been a mistake to leave it in 2016.
Hiring in the USA wasn’t obviously better (contrary to our expectations), and we probably should have put more weight on the long-term preferences of the senior staff in deciding where to be based.
One major element in our decision to move to London was concluding that having offices in both cities would be a bad idea. If we had looked into this question earlier in 2018, we could have saved a lot of senior staff time looking for an office in SF before deciding not to stay there.
Smaller mistakes and issues
(Not in order)
- We should have put additional effort into figuring out what level of computer science ability is needed to be a productive AI safety researcher. In early 2018, we spoke to some key people in AI safety who think that people should be able to get into a top ~3 world-ranked machine learning PhD programme in order to pursue this option. Our AI safety career review now suggests that people should be able to get into a top 10 programme, but we’re still unsure of the ideal recommendation for us to make. This means there were some people we spoke to in 2017 who are following old recommendations, but we’re insufficiently sure of our new recommendations to suggest they switch. Fortunately, ML graduate study (what most of these people are doing) is good preparation for AI policy, ML engineering and earning to give, so they still have good back-up options. We could have avoided this if we’d done more to understand the right profile of person for this path ahead of our 2017 advising. We intend to prioritise this kind of information within the research we do over 2019.
- We contributed to confusion in the community by using the term ‘talent gaps’. See our article explaining the misunderstanding and our proposed solution (talk about specific skill bottlenecks instead).
- It may have been inefficient to invest so heavily in content about AI policy before that career path had been clearly framed by its pioneers. Whenever we wanted to write about AI policy publicly, we ran into difficult and controversial issues about how it should be framed in the community. Although these discussions resulted in progress on framing, it meant we had less output in this area than expected.
- Typeform, a third party service that hosts some of our web forms, suffered a data breach in July 2018 which affected some of our users. In our July 2017 security audit, we flagged the fact that we were storing sensitive data with Typeform even though we did not highly trust their security team. We considered migrating to another service at the time, but after investigating the options decided it was not worth the cost. We do not think this was a mistake ex ante, but others might disagree. Since the incident we have further tightened our selection criteria for third-party software, continued conducting regular security audits, and implemented dozens of further measures to protect user data.
- We should have been more sceptical about the relevance of the results in our talent survey, especially concerning the value of recent hires and discount rates. We don’t think this changes our core advice, but we should rely less on the figures in justifying our positions, and make it clearer that others shouldn’t rely on them either. We’ve updated the survey results blog post to be clearer about their weaknesses. For instance, we gave too little thought to the scenario in which hiring and managing new recruits absorbs a lot of senior staff time, which significantly offsets its benefits.
- We think we let our standard for what qualifies as a rated-10 plan change drift up this year, which led to an artificial reduction in our number of plan changes reported from this group in 2018. After we realised this error we re-scored several plan changes, but expect that the number reported in this review is too low by around 20%. We’re working on writing up a detailed guide on how to score plan changes with lots of examples, which should increase our consistency in future.
- On two occasions we said we’d review advising applications more quickly than we were able to. We’ll aim to be more conservative with these commitments in future.
Annual review Dec 2017
People misunderstand our views on career capital
In the main career guide, we promote the idea of gaining “career capital” early in your career. This has led some engaged users to focus on options like consulting, software engineering, and tech entrepreneurship, when actually we think these are rarely the best early career options if you’re focused on our top problem areas. Instead, it seems like most people should focus on entering a priority path directly, or perhaps go to graduate school.
We think there are several misunderstandings going on:
- There’s a difference between narrow and flexible career capital. Narrow career capital is useful for a small number of paths, while flexible career capital is useful in a large number. If you’re focused on our top problem areas, narrow career capital in those areas is usually more useful than flexible career capital. Consulting provides flexible career capital, which means it’s not top overall unless you’re very uncertain about what to aim for.
- You can get good career capital in positions with high immediate impact (especially problem-area specific career capital), including most of those we recommend.
- Discount rates on aligned talent are quite high in some of the priority paths, and seem to have increased, making career capital less valuable.
However, from our career guide article, some people get the impression that they should focus on consulting and similar options early in their careers. This is because we put too much emphasis on flexibility, and not enough on building the career capital that’s needed in the most pressing problem areas.
We also reinforced this impression by listing consulting and tech entrepreneurship at the top of our ranking of careers on this page (now changed), and they still come up highly in the quiz. People also seem to think that tech entrepreneurship is a better option for direct impact than we do.
To address this problem, we plan to write an article in January clarifying our position, and then rewrite the main guide article later in the year. We’d also like to update the quiz, but it’s lower priority.
We’ve had similar problems in the past with people misunderstanding our views on earning to give and replaceability. To some extent we think being misunderstood is an unavoidable negative consequence of trying to spread complex ideas in a mass format – we list it in our risks section below. This risk also makes us more keen on “high-fidelity” in-person engagement and long format content, rather than sharable but simplified articles.
Not prioritising diversity highly enough
Diversity is important to 80,000 Hours because we want to be able to appeal to a wide range of people in our hiring, among users of our advice, and in our community. We want as many talented people as possible working on solving the world’s problems. A lack of diversity can easily become self-reinforcing, and if we get stuck in a narrow demographic, we’ll miss lots of great people.
Our community has a significant tilt towards white men. Our team started with only white men, and has remained even more imbalanced than our community.
We first flagged lack of team diversity as a problem in our 2014 annual review, and since then we’ve taken some steps to improve diversity, such as to:
- Make a greater effort to source candidates from underrepresented groups, and to use trial work to evaluate candidates, rather than interviews, which are more biased.
- Ask for advice from experts and community members.
- Add examples from underrepresented groups to our online advice.
- Get feedback on and reflect on ways to make our team culture more welcoming, and give each other feedback on the effect of our actions in this area.
- Put additional priority on writing about career areas which are over 45% female among our target age ranges, such as biomedical research, psychology research, nursing, allied health, executive search, marketing, non-profits, and policy careers.
- Find a highly qualified woman to join our board as part of our next round of board reform (we have identified one and asked her to join).
- Do standardised performance reviews, make salaries transparent within the team, and set them using a formula to reduce bias and barriers.
- Have “any time” work hours and make it easy to work remotely.
- Implement standard HR policies to protect against discrimination and harassment. We adopted CEA’s paid maternity/paternity leave policy, which is generous by US standards.
Our parent organisation, CEA, has two staff members who work on diversity and other community issues. We’ve asked for their advice, supported their efforts to exclude bad actors, and signed up to their statement of community values.
However, in this time, we’ve made little progress on results. In 2014, the full-time core team contained 3 white men, and now we have 7. The diversity of our freelancers, however, has improved. We now have about 9 freelancers, of whom about half are women, and two are from minority backgrounds.
So, we intend to make diversity a greater priority over 2018.
In particular, we intend to make hiring at least one candidate from an underrepresented group to the core team a top priority for the next year. To do this, we’ll put more effort (up to about 5-10% of resources) into improving our culture and finding candidates. We hope to make progress with less investment, but we’re willing to make a serious commitment because it could enable us to hire a much better team over the long-term, and talent is one of our main constraints.
Set an unrealistic IASPC target
As explained earlier, we set ourselves the target of tripling impact-adjusted significant plan changes (IASPC) over the year while also focusing on rated-10 plan changes. However, the IASPC metric wasn’t set up to properly capture the value of these changes, the projects we listed were more suited to rated-1 plan changes, and we didn’t properly account for a 1-3 year lead time on generating rated-10 plan changes. This meant we had to drop the target half-way through the year.
We could have anticipated some of these problems earlier if we had spent more time thinking about our plans and metrics, which would have made us more effective for several months. In particular, we could have focused earlier on specialist content that is better suited to attracting people who might make rated-10 plan changes, as opposed to improving the career guide and general interest articles.
Going forward, we’ll adjust the IASPC metric to contain a 100 and 1000 category, and we’ll think more carefully about how easy it is to get different types of plan change.
Not increasing salaries earlier
This year, we had in-depth discussions with five people about joining the team full-time. Four of them were initially concerned by our salaries, but were reassured after they heard about the raise we implemented in early 2017. This suggests we might have missed out on other staff in earlier years. Given that talent is a greater bottleneck than funding, this could have been a significant cost.
It’s not obvious this was a mistake, since we weren’t aware of as many specific cases in previous years, but we were encouraged by several advisors to raise salaries, so it’s possible we could have corrected this earlier.
Looking forward, we expect there are further gains from raising salaries. After living in the Bay Area for about a year, we have a better sense of the cost of living, and our current salaries don’t easily cover a high-productivity lifestyle in the area (e.g. living close to a downtown office). Rent costs have also increased at around 10% per year, which means that comparables we’ve used in earlier years (such as GiveWell in 2012) are out of date. Our salaries are also arguably in the bottom 30% compared to other US non-profits of our scale, depending on how you make the comparison.
Rated-1 plan changes from online content not growing
Even though web traffic is up 80%, the ongoing number of people reporting rated-1 and rated-0.1 plan changes from the career guide didn’t increase over the year, and the same is true of newsletter subscribers. This is because traffic to key conversion pages (e.g. the decision tool and the article about the GWWC pledge) has not increased.
This is not surprising given that we haven’t focused on driving more traffic to these pages. Instead, we’ve recently focused on driving people to the coaching applications. However, we had hoped that new traffic would spill over to a greater extent, driving extra growth in rated-1 plan changes.
Accounting behind
The CEA ops team (which we share) was short of staff over the year. This meant that our financial figures were often delayed by 3-6 months.
One problem this caused is that we only had delayed information on what our reserves were through the year, though this didn’t cause any issues this year since we maintained plenty of reserves.
Another problem was that it was hard to track spending, which meant that we didn’t catch overspending on AWS until we had incurred $5,000 of unneeded expenses (though we received a partial refund), which was about 0.7% of our budget. We also mispaid a staff member by about the same amount due to confusion about their salary over several months, which we decided not to recoup (in part because we wanted to raise their salary anyway).
To address this, CEA has made several operations hires over the year, increasing capacity (though illness on the team has temporarily reduced capacity again). In addition, all our accounts have been transferred to new software (Xero); they are now up to date within 1-2 months and easier to check, and we can continue to improve these systems. We also intend to allocate more time to checking our spending.
Not being careful enough in communication with the community
Quick comments on the EA Forum or Facebook by staff members can be taken as representing the organisation, creating problems if they are mistaken or misconstrued. Some comments by Ben and Rob this year ended up causing controversy. Even though the criticism was largely based on a misunderstanding, and many people defended our comments, others didn’t see the defences, so our reputation was still likely harmed.
The most obvious solution is to raise our bar for commenting publicly, and submit this commenting to more checking, moving in the direction of Holden Karnofsky’s policies. The downside is reduced communication between us and our community, so we don’t intend to go as far as Karnofsky, but we’ve taken a step in that direction. As part of this, we updated our team communications policy and reviewed it with the team.
Poor forecasting of our coaching backlog
In November, our coaching backlog suddenly spiked to over 6 weeks, as we received a large number of applications and had reduced coaching capacity. This meant that in Sept-Oct we spent more time soliciting coaching applications than was needed, that recent applicants had to wait a long time before starting, that we committed to coach people using different criteria from what we’d now use, and that we had to temporarily close applications.
We could have predicted this if we had made more thorough forecasts of how many hours of coaching time we’d have, and how much time it takes to coach each person.
Going forward, we’re making more conservative and detailed estimates of capacity.
Annual review Dec 2016
Challenges hiring
We had several disruptions to the team in 2015, but we put these behind us in 2016.
Over 2016, the team has worked really well together, morale has been high, and everyone has been highly productive. New hires have all said the quality of the team is a major reason for joining.
Besides hiring Peter and Brenton, who are mentioned in the section on workshops above, we hired Jesse Avshalomov. He was previously the Director of Growth and Product at Teespring, one of the most successful Y Combinator startups, where he led a team of 20, conducted hundreds of marketing & product experiments, and oversaw the growth of the company to 19 million products sold. Before that he ran SEO for the North American Apple Online Store… and did professional opera.
Our main challenge has been finding a freelance web engineer. We did a recruitment round and a trial, but didn’t end up finding someone long-term. Fortunately Peter Hartree, our former developer, is still able to give us 1-2 days per week of support. We intend to try again in 2017, taking advantage of our larger audience, advertising a higher salary, and aiming for someone full-time rather than freelance. We also learned that if we hire a part-time engineer, we should (i) make sure they spend several days talking to our existing engineer to on-board, and (ii) look for someone with experience in remote work and WordPress.
It also took longer to on-board Jesse than we expected, in part due to his working remotely and spending 20% of his time on other projects. This again suggests spending longer on-boarding right at the start, and making it a priority that new staff work in the same place as everyone else for the first month. We’ll also make further efforts to avoid hiring part-time staff in the future.
The book launch was delayed over a month, and might have had a smaller reach
The delay was due to it taking longer to on-board Jesse than we expected (as covered above), spending more time improving marketing for the workshops (which paid off, as covered above), and doing too many things at once (covered below).
Produced fewer high-value career reviews and problem profiles than planned
Roman spent more time on the plan change tracking systems than planned, while Rob spent more time on outreach. We also switched priorities several times (as covered below), which probably hurt research output.
Too many competing priorities
We make a lot of effort to create focused plans, and are much more focused than we used to be, but we still probably switched priorities too many times over 2016, and there were also times when we had too many priorities at once. For instance, in spring, the research team switched from career reviews, to supporting articles for the guide, to problem profiles. In October, we were doing campus sign-ups, workshops, and book promotion at the same time. All this creates switching costs, is less motivating, and leads to unfinished work, which contributed to many of the other problems. Some of this was made worse by being in different offices. We’ll continue to emphasise focus when creating our priorities. The plan is also for everyone full-time to be based in the Bay Area, in the same office.
Growth of high-value plan changes slower than medium-value
High-value plan changes grew only 50% compared to 360% growth in medium-value plan changes. This wasn’t a mistake but could be a worrying trend. However, the fact that it’s happening isn’t too surprising since high-value plan changes take several years, so our growth over 2016 depends on efforts made in 2014 when we were much smaller. We expect that a significant fraction (perhaps 10%) of the medium-value plan changes will eventually become high-value plan changes. At the start of 2017, we also intend to especially focus on getting high-value plan changes.
Many people didn’t get responses from [email protected] for 6 months, affecting about 70 emails.
This was due to an error with a new inbox client, which was hard to notice, though it could have been found earlier with better testing.
Annual review June 2014 – April 2015
Mistakes concerning our research and ideas
We let ourselves become too closely associated with earning to give.
This became especially obvious in August 2014 when we attended Effective Altruism Global in San Francisco, and found that many of the attendees – supposedly the people who know us best – saw us primarily as the people who advocate for earning to give. We’ve always believed, however, that earning to give is just one strategy among many, and think that only a minority of people should pursue it. The cost is that we’ve put off people who would have been interested in us otherwise.
It was hard to avoid being tightly associated with earning to give, because it was our most memorable idea and the press loved to focus on it. However, we think there’s a lot we could have done to make it clearer that earning to give isn’t the only thing we care about. Read more.
We presented an overly simple view of replaceability, and didn’t correct common misconceptions about it
We think many of our previous applications of the replaceability argument were correct, but we don’t think it means that you shouldn’t take jobs with direct impact (e.g. working at a nonprofit) or that it’s okay to take harmful jobs for indirect benefits.
Unfortunately some of our early content suggested this might be the case, and we didn’t vigorously correct the misconception once it got out (although we never made replaceability a significant part of our career guide). We’re concerned that we may have encouraged some people to turn down jobs at high-impact organisations when it would have been better to accept them. Read more.
Not emphasising the importance of personal fit enough
We always thought personal fit – how likely you are to excel in a job – was important, but (i) over the last few years we’ve come to appreciate that it’s more important than we originally thought (most significantly due to conversations with Holden Karnofsky) and (ii) because we didn’t talk about it very often, we may have given the impression we thought it was less important than we in fact did. We’ve now made it a major part of our framework and career profiles.
Released an interview about multi-level marketing
We asked users to send us interviews about careers they knew about. One sent us a favourable interview about multi-level marketing, which we released. We were quickly told by a reader that multi-level marketing is highly ethically dubious, and took the post down within an hour. We should have better vetted user-submitted content before release.
Operational mistakes
Allowing a coaching backlog to build up in late 2014
We allowed a large backlog of over 100 coaching applicants to build up at the end of 2014, with the result that many had to wait several months for a response. This happened because our head of coaching was repeatedly on sick leave over the last half of 2014, and I didn’t step in quickly enough to close applications. To make it right, we apologised to everyone and gave email advice to about 50 of the applicants. When we set up our new coaching process in early 2015, we closely monitored the number of applicants and response times, closing applications when our capacity became stretched.
Not improving the online guide earlier
In September 2013 we took down our old career guide, going for a year without a summary of our key advice outside of the blog. I was aware of the problems this caused – most readers don’t visit useful old posts, and it was hard to find our most up-to-date views on a topic. I could have made a minimal replacement (e.g. a page listing our key articles) back in September 2013, which would have resulted in thousands of extra views to our best old content. Instead, we focused on coaching and new research, but in retrospect I think that was lower priority.
We should also have added a newsletter pop-up earlier. We were wary of annoying readers, but it dramatically increased our conversion rate from 0.2% to over 1%. In the end, we added a more complex appeal that just slides down under the header rather than popping up, and is only shown to engaged readers, with the aim of making it less annoying. However, we could easily have added a simpler pop-up a year ago, which would have resulted in 1000-2000 extra newsletter subscribers.
Simultaneously splitting our focus between the online guide and coaching
Perhaps an underlying cause of the previous two mistakes was that we attempted to push forward with both coaching and improving the online guide at the same time, despite only having the equivalent of two full-time staff working on them. We did this despite knowing the importance of being highly focused.
With more focus, we could have had clearer and shorter development cycles, better metrics, and generally better management, which would have helped the team to be more productive.
The reason we didn’t focus more was that we were reluctant to close the coaching, even temporarily, but in hindsight this wasn’t a strong consideration compared to the benefits of focus.
Not focusing more on hiring people we’ve already worked with
It’s widely seen as best practice in startups for the first couple of team members to be people who have worked together before and can work together really closely. We were aware of this advice but pushed forward with trying to hire new people. This mostly didn’t work out, costing significant time and straining relationships.
I think it would have been better either not to hire, or to focus on doing short, intense, in-person trials, since that’s the best way to test fit quickly. Instead, we did longer but less intense trials that were often remote and spread out over the year.
See the full review of progress
Annual review April 2013 – May 2014
Mistake: Team too large and not sufficiently focused on strategic progress. If the team had been smaller, more permanent, higher quality and more focused, we probably would have had less immediate impact, in the form of changed careers, research and outreach, but would have probably had more fundamental strategic progress, such as developing product plans or prototypes, testing the impact of our programmes, recruiting staff and raising funding. Ultimately, it’s strategic progress that’s important for our chances of becoming much bigger in the future. In particular, a key cause was having too many interns. Interns allow us to have more immediate impact, but take up core staff time, reducing long-run progress, especially in the face of relatively complicated team plans. We analysed the issue of how many interns to hire in our last six month review, concluding that we should aim to have fewer in the future. In hindsight, we should have been even more aggressive in reducing the number.
What we did: First, we reduced the number of interns to one working on tech (Ozzie Gooen) and one on central CEA. After Ozzie left, we replaced them with a part-time paid web developer. Ozzie also significantly simplified the website, making it easier to maintain, and taught us more about how to edit it ourselves (more detail in the website review). We replaced the most useful functions of non-tech interns with long-term professional freelancers, including an oDesk editor, a volunteer editor, a virtual assistant, and a contract researcher. We decided to only aim to have one or two interns over the next year, and to restrict these to people we are strongly considering hiring, or who can help with our strategic priorities. Besides reducing the number of interns, we raised the bar on hiring, and decided to focus on building a team of staff who are around to stay, and can take 80,000 Hours to scale. We decided to aim to make the team plans even more focused, by working on fewer activities at once and always having a clear top priority.
See our full list of mistakes.
Six Month Review Dec 2012 – March 2013
Mistake: We ran out of capacity in the operations team around March. This resulted in delays of over a month to the arrival of two interns, since we were unable to complete their visa applications in time, and meant the Executive Director of 80,000 Hours had to spend a significant amount of time on operations (around 50% at peak). This happened because (i) we didn’t make a sufficiently detailed plan for operations at the beginning of the period, so we didn’t recruit enough interns to meet demand, (ii) we were overoptimistic about how much time operations required, and (iii) the Director of Operations wasn’t given enough authority to make the decisions himself.
What we did: We decided to create a new role – the Executive Director of CEA – which was filled by Rob Wiblin around August 2013. The role was to (i) oversee the operations team, (ii) take over fundraising from Will MacAskill, and (iii) act as a single point of contact for issues that affect the whole of CEA. It was given equal status to the Executive Directors of 80,000 Hours and Giving What We Can, who decided to meet weekly as a three. The aim was to increase decision-making capacity covering central issues, like the office, legal risks, the relationship with other organisations, and recruitment, while also freeing up Will’s time from fundraising. We also decided to: (i) ask the Director of Operations to draw up more detailed plans, (ii) hire additional interns for the central team, and (iii) consider hiring a second staff member for the central team (though we ended up deciding against this). By our next annual review in May 2014, we were happy with the operation of the central team and Rob Wiblin’s performance in the new role.
See our full list of mistakes.
Six Month Review June – Nov 2012
Mistake: We could have been ahead of schedule if we had focused more on testing, product research and strategy from the start, rather than working as much as we did on outreach (although outreach had substantial benefits).
What we did: We addressed this by having two major strategy reassessments – one near the end of the summer and one in November – in which we assessed our competitive niche and analysed our success to date. Going forward, we’re making sure to include more time for strategy in our plans. We’ve designed an iterative product development process where we collect feedback early and use it to constantly adjust our content.
Mistake: CEA Central Operations (shared with Giving What We Can) had success in registering CEA as a charity and dealing with all the admin required to take on staff – sharing the department saved us months of staff time. However, it also had a number of failures, which wasted management attention.
What we did: We changed the role of the Director of Operations to officially answer to the Executive Director of 80,000 Hours and removed their responsibilities to work on Giving What We Can (which was dividing attention). The Executive Directors of Giving What We Can and 80,000 Hours started to meet weekly to discuss central issues, and we reviewed our allocation of interns to the central team.
See our full list of mistakes in this review.
Read our full evaluations