James Routley

Blue Dot is a “talent accelerator for beneficial AI and societal resilience,” and they lead a number of free courses. In preparation for their upcoming Technical AI Safety course, I am reading through all the materials of their more general AGI Strategy course, which aims to be “an in-depth introduction to what’s going on with AI development, what the good and bad outcomes could be, and what could be done to steer AI towards better futures.” The course has five sections:

…and consists of ~25 hours of readings from a variety of external publications, interspersed with writing exercises. As someone who has been circling these topics for years, I am excited to deepen my understanding by engaging thoughtfully with a representative overview of the field, as opposed to my normal strategy of passively scanning my blog feeds. Here are my reflections on the first section of the course:

The course opens with this hook:

You’re the product of 8,000 generations of humans. You were born into the civilization they built, with all its magic, beauty and flaws. Their decisions shape every aspect of our world.
We’re only the latest generation of humans, but we’re living through one of the most significant technological transformations in the history of the universe.
Our decisions today will have an immense impact on our own future, and the future of all our descendants.

I was reminded of a quote from Wilfred M. McClay’s American history book Land of Hope, which I’m currently reading as part of a national book group:

History is only very rarely the story of inevitabilities, and it almost never appears in that form to its participants. It is more often a story of contingencies and possibilities, of things that could have gone either way, or even a multitude of other ways. Very little about the life of nations is certain, and even what we think of as destiny is something quite different from inevitability. Every attempt to render history into a science has come up empty-handed. The fact of human freedom always manages to confound the effort to do so.

Human action can and does shape the course of history! And Blue Dot wants you to shape the trajectory of AI.

The course then goes straight into a source from the Institute for Progress (IFP), a non-partisan think tank focused on U.S. innovation policy. The piece Preparing for Launch lays out a compelling case for the United States’ unique responsibility in determining key aspects of our global future:

AI progress is path-dependent: The sequencing of AI progress matters — where and in what order new capabilities are developed may be just as important as which new capabilities are developed.
Given its position in the AI supply chain, and as the world’s most powerful democracy, the United States has the responsibility to shape AI development towards a path that enables — rather than smothers — human flourishing.
This proactive shaping is not without precedent. From nuclear fission to spaceflight to mRNA, the US has repeatedly changed the trajectory of emerging technologies. In the age of AI, we argue for four guiding principles:
We should take advantage of the “jagged frontier” of AI capabilities
We shouldn’t neglect the costs of stalled progress
We should redesign how many of our scientific institutions work
We should adapt to deep uncertainty while working to reduce it

…

Along the way we learn of some remarkable trends that I already knew about, but that are good to see laid out in sequence and with nice graphs:

AI capabilities are (arguably) increasing rapidly and exponentially (2x increase every 7 months in time-horizon of autonomous software-engineering task completion, per METR)
The US is making massive investments in AI infrastructure (“construction spending on data centers to train and deploy AI models will likely soon overtake spending on offices for human workers.”)
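
To get a feel for what a 7-month doubling time implies, here is a minimal Python sketch. It assumes the trend continues as a clean exponential, which is an extrapolation and not something the source guarantees. The function name `growth_factor` and its parameters are illustrative choices, not anything taken from METR or IFP.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth after `months`, assuming a fixed doubling time.

    The 7-month default is the doubling time quoted above; treating it as
    constant going forward is an assumption made only for illustration.
    """
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Print the implied growth in task time-horizon over a few spans.
    for years in (1, 2, 5):
        factor = growth_factor(12 * years)
        print(f"{years} year(s): about {factor:,.1f}x the starting time horizon")
```

Under that assumption, a 7-month doubling time works out to roughly a 3.3x increase per year, which is why the trend compounds so quickly.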

Basically, AI is a big deal. And how does IFP wish to help?

We’re trying to understand the right sequence of technologies that the United States needs to build to realize the promise of an AI-enabled golden age sooner, and to ensure that we have the defensive technologies built in time to navigate the transition safely.

I am glad that the Blue Dot course opens with this largely techno-optimist (or “defensive accelerationist”) source, as I was at first unsure how radical their agenda would be. For someone interested in getting involved, the source points to many potentially high-impact areas of contribution:

On the defensive side:
- Accelerating defensive cybersecurity
- Working on public health tech (to counter bio-risks), such as:
  - far-UVC air purification
  - quicker/cheaper vaccine development technologies
  - wastewater pathogen surveillance
And on the “general acceleration of progress” side:
- Working to unlock useful science datasets that are currently held by government agencies and universities, such that we can create beneficial AI applications from them
- Making fundraising less costly to scientists (seems nice to me, unclear how directly this relates to AI)
And on the “meta” side:
- Monitoring AI progress via benchmarks, to inform prioritization of the above
- (Presumably) working at IFP-style orgs that think about all of these things and make policy recommendations

Ultimately the IFP piece seems predicated on the premise that rapid AI progress will continue to happen, and we can’t stop it, so we might as well invest in defensive tech now. That approach seems reasonable to me, though I notice it’s not very different from saying “let’s generally work on tech that’s good for humanity” - I suppose the nuance they are adding is to say “let’s pay attention to AI, and understand where it’s heading, and use that to inform our defensive-tech bets right now, because this stuff is really urgent.” Notably absent are discussions of the technical alignment of general-purpose AI models themselves, which I am hoping to develop a perspective on in the following “Technical AI Safety” course.

The next source was the introduction of the book Utopia for Realists, which I enjoyed and recommend. The introduction makes the case that:

Technology is miraculous and has turned the world into a kind of paradise that our ancestors could only dream of (rising life expectancies, falling infant mortality, rising wealth, access to bountiful food, water, internet, etc.)
And yet, people are kind of miserable. As the author describes, people today are disconnected from the grand project of bettering their lives and communities, and instead end up ceding more and more control to ever more powerful and uncaring commercial interests.
The solution is to be utopian, or rather, to have greater visions of the future of society and to work towards building them.

Reading this helped me articulate an important aspect of my own belief system and motivations. Indeed I am not often inspired when I look out on society’s offerings to my peers and myself. For that reason it’s difficult for me to find meaning in work that is dedicated purely towards marginal improvements in economic efficiency (and a lot of the work available to me seems to fit this category) - such work just helps the world keep doing what it’s already doing. For my own sake and for the sake of others it feels vital that I be engaged in the form of utopianism described here. This doesn’t preclude me from contributing to the overall efficiency of the economy, but if I am not also engaged in developing and manifesting improved cultures and ways of living, I expect not to be satisfied.

I also appreciated the next module in the course, which asked participants to imagine a positive future for themselves and for the world in 20 years’ time. Once again, I felt that this exercise was aligned with the kind of approach that I’d most like to take to my career - one that is grounded in specific positive visions of the world I wish to create for myself and others. Perhaps I’ll soon share some of those thoughts on the blog.

The next section of the Blue Dot course was less inspiring, or at least less hopeful. It consisted of three readings, all on the topic of “how do we best control and constrain the development of AI.” Helen Toner’s In search of a dynamist vision for safe superhuman AI cautioned against overregulation and concentration of power, but offered no concrete solutions to AI risks. Vox’s It’s practically impossible to run a big AI company ethically points out a predicament: Anthropic’s founders understood the competitive dynamics that push the big labs towards recklessly advancing the frontier, claimed that they wouldn’t do that, and then succumbed to those dynamics anyway, regularly pushing forward the frontier in AI capabilities (and notable news this week is that they’ve hired superstar researcher Andrej Karpathy to focus on AI-automated AI training). The article does point towards important areas of policy (e.g. better accountability for data usage), but it does not signal any clear paths out of the current competitive regime. Finally, Seeking Stability in the Competition for AI Advantage is a critique by RAND of another paper, which presented the “MAIM” concept for preventing state development of unsafe AI, modeled on the idea that mutually assured destruction (MAD) keeps nuclear war at bay. Among other objections, the RAND authors point out that MAD only works because there’s a clear threshold at which to counterstrike (missiles in the air), which would not exist in an AI race.

I have no doubt that AI policy is important to get right and is a worthwhile endeavor, but so far my takeaways are that many potential policies would be bad, and perhaps roughly speaking the best we can do is elect competent people who can deal with issues as they arise, and meanwhile focus efforts on defensive technologies, as mentioned in the IFP piece.

There are many ways to go about planning a career. One can optimize for enjoyment, or money, or positive impact, and realistically one must optimize for all of the above in various ways as dictated by one’s values. AI safety is a tempting field because the potential for positive impact is extremely high, but at the same time, it’s unclear which approaches would be tractable. So far in my reading I am drawn towards the development of defensive technology, but this is distinct from working to keep models and AI products from being used for harm, which is usually what I think of when I think of “AI safety.”

I also continue to mull the question of “what kinds of work are most likely to excite me,” since excitement, beyond being a close cousin of enjoyment, is generally an enabler of positive impact. I don’t have the clearest answers yet, but am happy to continue this thread of learning, as doing so may help me understand what parts (if any) of the AI safety landscape could draw my excitement and leverage my existing talents. I hope to post more on the blog as I go through these courses and as my thinking evolves.

Blue Dot AGI Strategy Course, Part 1 - My Impressions
