How AI could destroy the world by accident

AI could be our biggest existential threat this century. If you enjoyed this video, here are some places to find out more about these ideas:

Human Compatible US: https://amzn.to/3Pdi0qS UK: https://amzn.to/463vawM by Stuart ‘You can’t fetch the coffee if you’re dead’ Russell
The Alignment Problem: Machine Learning and Human Values US: https://amzn.to/3N7cLpV UK: https://amzn.to/45YMZgq by Brian Christian
@eightythousandhours’s problem profile on ‘Preventing an AI-related catastrophe’: https://80000hours.org/problemprofil...
@RobertMilesAI’s channel:    / robertmilesai

Read the worst cat/sat/mat-based short story ever written here: https://andrewsteele.co.uk/blog/2023/...

Amazon links are affiliates and I will receive a small payment if you choose to purchase through them. Thanks!

Chapters

00:00 Introduction
02:04 How does ChatGPT work?
06:48 Problem 0: AI misuse
08:01 Problem 1: AI is an alien mind
11:18 Problem 2: Defining goals is hard
17:05 Problem 3: ‘Instrumental convergence’
19:17 Problem 4: Exponential progress
22:32 What can we do?

Sources and further reading

On AI being an alien mind, I really enjoyed this video from @kylehill on a hilarious flaw in DeepMind’s Go-playing AI, which handily beat world champion Go players…but, knowing this flaw, was easily beaten by an amateur    • ChatGPT's HUGE Problem

This is a Twitter thread from me digesting 2022 results in AI in (then-)real time, and speculating about whether these capabilities indicate that AI could ‘do science’   / 1511722732257480711

Introduction
ChatGPT’s user growth https://www.reuters.com/technology/ch...
Try hilariously bad 2020 text-to-image generator X-LXMERT here: https://visionexplorer.allenai.org/t...
Run Stable Diffusion locally using its web UI: https://github.com/AUTOMATIC1111/stab...
‘Sony World Photography Award 2023: Winner refuses award after revealing AI creation’ – BBC News https://www.bbc.com/news/entertainmen...

How does ChatGPT work?
An absolutely humungous list of papers about LLMs https://github.com/Hannibal046/Awesom...
GPT and other LLMs don’t usually work at the word level; they normally work on ‘tokens’, many of which are whole words, but not all. You can get a sense for the difference by trying out OpenAI’s Tokenizer here: https://platform.openai.com/tokenizer
Emergent abilities of large language models https://openreview.net/pdf?id=yzkSU5zdwD
ChatGPT playing chess https://www.lesswrong.com/posts/xyjhF...
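
To make the word/token distinction above a bit more concrete, here is a toy sketch in Python. It is not how GPT's tokenizer actually works: the tiny vocabulary and the function name `toy_tokenize` are made up for illustration, and real tokenizers learn a much larger vocabulary from data. It only shows the general idea that common words can be a single token while rarer words get split into pieces.

```python
# Toy illustration only: this vocabulary is invented for the example.
# Real LLM tokenizers learn tens of thousands of entries from data.
TOY_VOCAB = {"the", "cat", "sat", "on", "mat", "un", "believ", "able", " "}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into tokens by greedy longest match against vocab.

    Characters not covered by any vocabulary entry become
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matches here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(toy_tokenize("the cat sat"))    # common words stay whole
    print(toy_tokenize("unbelievable"))   # a longer word splits into pieces
```

In this sketch ‘the cat sat’ comes out as whole words (plus spaces), while ‘unbelievable’ is broken into three sub-word pieces, which is roughly the behaviour you can see for yourself in OpenAI’s Tokenizer.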

Problem 1: AI is an alien mind
Paper on using psychedelic specs to fool facial recognition AI https://users.ece.cmu.edu/~lbauer/pap...
‘Psychedelic toasters fool image recognition tech’ – BBC News https://www.bbc.com/news/technology4...
Thread on how little we know about how ChatGPT works—including an absolutely baffling algorithm it uses internally to add numbers together!   / 1663534255249453056

Problem 2: Defining goals is hard
More about OpenAI’s CoastRunners-mashing reinforcement learning algorithm https://openai.com/research/faultyre...
Astrophysicist Grant Tremblay correcting Bard on Twitter   / 1623091683603918849

Problem 3: Instrumental convergence
Great video with Rob Miles about how hard it is to build an off switch for an AI    • AI "Stop Button" Problem  Computerphile

Problem 4: Exponential progress
Article on how ChatGPT can help with code (and its limitations) https://www.nature.com/articles/d4158...
GPT-4 cost over $100m to train https://www.wired.com/story/openaice...

What can we do?
AI governance is a huge field, and a good overview of resources can be found at https://80000hours.org/problemprofil... (link should take you straight to the ‘AI governance and strategy’ heading)

Errata

I should probably have said GPT-4 ‘may’ have 1 trillion parameters, because this hasn’t actually been made public. In the absence of a definitive source, this comment thread discusses the issue:    • How AI could destroy the world by acc...

Credits

Milla Jovovich image CC BY-SA Georges Biard https://upload.wikimedia.org/wikipedi...

And finally…

Follow me on Twitter   / statto
Follow me on Instagram   / andrewjsteele
Like my page on Facebook   / drandrewsteele
Follow me on Mastodon https://mas.to/@statto
Read my book, Ageless: The new science of getting older without getting old https://ageless.link/