I Read A Book
I recently finished Peter Thiel’s “Zero To One”, after a couple of thoughtful people recommended it to me. Between them and the thousands of un-thoughtful people who seem to recommend it daily, I figured I should give it a go. I am glad to report that it is roughly an 8 out of 10 on my personal scale, in that if you’re interested in the subject area, I would probably recommend it. It’s short, punchy, often funny, and at least moderately educational.
Thiel is probably the most intriguing and confounding figure in tech business over the last 20 years, and I generally don’t regret reading anything he puts out, even if I disagree with him on many important topics. I’ll be posting my abridged notes from it sometime soon, mostly as a personal log in case I lose my notebook.
The book oscillates between practical (but possibly pedestrian) advice about starting a Silicon Valley-style business, and more grand predictions of the future. While I find Thiel’s thoughts on both of these topics interesting, I think the book would have benefited from better editing to clearly delineate the advice portion of the book from the predictions.
A passage in one of the prediction sections really jumped off the page for me. Towards the end of the book, in the process of asserting that computers will not replace human labor, but will rather augment it in new and amazing ways, Thiel writes the following:
“[…] computers are far more different from people than any two people are different from each other: men and machines are good at fundamentally different things. People have intentionality–we form plans and make decisions in complicated situations. We’re less good at making sense of enormous amounts of data. Computers are exactly the opposite: they excel at efficient data processing, but they struggle to make basic judgments that would be simple for any human.”
“Zero To One”, Peter Thiel, p. 143
In a vacuum, this is a pretty anodyne truism you hear fairly often when discussing computing.
“Computers aren’t smart, they’re just fast, and they can do basic stuff like addition super quickly,” to quote my high school digital imaging teacher, with a statement that has been oddly seared into my mind.
A Disturbance In The Force (Of My Brain)
The above passage definitely has been a truism, but I am not sure how long that will last.
As we’ve seen with the rapid take-off of tools like ChatGPT, non-deterministic[1] computing is becoming more and more powerful. We’ve already seen success in applications from biomedical research to physics, and it’s been over a decade since at least one respected programmer asserted that we should be using AI for type systems. ChatGPT is just the most recent and accessible development in the syncopated but relentless march of “Machine Learning”.
These new, non-deterministic machines are not analyzing the entire corpus of their “knowledge” ad-hoc according to some set of complex rules. Instead,[2] they are tokenizing the query, and then returning outputs based on what most probably sounds right in response to the question. The key issue here, in relation to Thiel’s statement, is that the machine is no longer analyzing some relevant piece of data. It’s not, as many joke in JPEG-marred Facebook memes, a bunch of if statements stacked on top of each other.
It’s worse! The computer is just free-associating! It’s just vibing!
if statements, critically, no matter how many you stack on top of each other[3], are debuggable. They are, in lay terms, explainable.
Machine Learning, on the other hand, is very difficult to explain, at least at the time of writing.
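To make “debuggable” concrete, here is a minimal sketch of a stack of if statements that can show its work. Everything in it is made up for illustration: the function name, the loan-flavored inputs, and the thresholds are assumptions, not anyone’s real underwriting rules. The point is only that every decision comes with the rule that produced it.

```python
def underwrite(income, debt, missed_payments):
    """Toy rule-based decision that records which rule fired.

    All rules and thresholds here are hypothetical.
    """
    trace = []
    ratio = debt / income
    trace.append(f"debt-to-income ratio = {ratio:.2f}")

    if missed_payments > 2:
        trace.append("rule 1: more than 2 missed payments -> deny")
        return "deny", trace
    if ratio > 0.4:
        trace.append("rule 2: ratio above 0.40 -> deny")
        return "deny", trace

    trace.append("no deny rule fired -> approve")
    return "approve", trace


decision, trace = underwrite(income=50_000, debt=30_000, missed_payments=0)
print(decision)
for step in trace:
    print(" ", step)
```

However tall the stack gets, the trace always names the branch that was taken. A trained model has no equivalent line to point to.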
What jumped out at me about Thiel’s statement above was that, compared to these “AIs”, humans sound much more like computers.
Sure, we can’t necessarily crunch rows and rows of data as quickly as a Python script, but at least we could explain our logic if we were forced to perform the task.
Imagine trying to ask ChatGPT to “show its work”, reproducing every transformation it did step by step. If it were even possible, representing each step would likely take as much energy as the overall answer itself.
On the other hand, maybe that’s not too different from humans, since we famously don’t really reason very often in our daily lives. If you’ve ever asked me to justify my behavior, you’ve seen just how much energy explaining a person’s logic can take.
Then I Listened To A Podcast
On a recent episode of “Cartoon Avatars,” the founder of Stability AI, Emad Mostaque, offered the following analogy when asked to explain the massive leaps in “Machine Learning” correctness we’ve seen over the last two years:
Mostaque: Yea so like machine learning was kind of under a classical paradigm. Actually one way to think about it as well is that you’ve got two parts of your brain, the part of your brain that jumps to conclusions and the logical part. So, the conclusions part is the world as it is, and that’s “holy crap there’s a tiger in the bush.” Right, so classical AI was the more logical kind of way and it was based on more and more and more data, again big data, so when Deep Blue beat Garry Kasparov it’s because it could think more moves ahead of him it just did pure crunching of the numbers it looked at every chess match and then it outperformed him.
“Cartoon Avatars”, Episode 46
It’s a great episode, but the transcript is too long and cumbersome to include here[4].
Mostaque eventually made the point that in a system like Go, which is less constrained than Chess, purely crunching logical possibilities based on previous games and a hardcoded set of rules becomes untenable. Each additional prediction on these massive datasets requires exponentially more computation.
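A quick back-of-the-envelope sketch of why brute force runs out of road. The branching factors below are assumptions: roughly 35 legal moves per turn in chess and roughly 250 in Go are the figures usually quoted, and real positions vary a lot. The function name is mine, and this ignores every pruning trick a real engine would use.

```python
# Assumed, commonly quoted rough averages of legal moves per turn.
CHESS_BRANCHING = 35
GO_BRANCHING = 250


def positions_to_search(branching, depth):
    """Number of move sequences when looking `depth` moves ahead, with no pruning."""
    return branching ** depth


for depth in (2, 4, 6, 8):
    chess = positions_to_search(CHESS_BRANCHING, depth)
    go = positions_to_search(GO_BRANCHING, depth)
    print(f"{depth} moves ahead: chess ~{chess:.1e}, Go ~{go:.1e}")
```

Each extra move of lookahead multiplies the work by the branching factor, so the gap between the two games widens with every step.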
The new approach[5] instead takes relatively limited datasets and maps connections between parameters. In this case “parameters” can mean anything from sentences to video frames, and “relatively limited” can mean “the size of the whole internet.”
The machine, critically, does not “know” any rules of what it is doing, in the same way that I don’t really “know” why I like a specific sweater. It does have some rough idea of correctness downstream of the fact that its training and weighting are guided by classical programs and people. This is akin to me liking certain sweaters because I’ve seen other guys complimented on them. When you go to ChatGPT and ask a question, the inference steps can then essentially be thought of as billions of “Puppies are to dogs as ____ are to ____” SAT questions being asked about your input. Incidentally, these Large Language Models have proven to be pretty decent at the SAT, and I’ve been inexplicably wearing Donegal sweaters for ten years.
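For a feel of how an analogy question can be answered without any rules, here is a toy sketch of the classic word-vector trick. To be loud about the assumptions: this is not how ChatGPT works internally, the vectors are hand-written by me rather than learned, and the three axes (roughly dog-ness, cat-ness, youth) are invented for the example.

```python
import math

# Hypothetical hand-written vectors: [dog-ness, cat-ness, youth].
# A real model would learn thousands of unlabeled dimensions from data.
VECTORS = {
    "puppy":   [1.0, 0.0, 1.0],
    "dog":     [1.0, 0.0, 0.0],
    "kitten":  [0.0, 1.0, 1.0],
    "cat":     [0.0, 1.0, 0.0],
    "sweater": [0.2, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def complete_analogy(a, b, c):
    """Answer 'a is to b as c is to ?' using vector arithmetic: b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    # Pick the remaining word whose vector points most nearly the same way.
    return max(candidates, key=lambda word: cosine(target, VECTORS[word]))


answer = complete_analogy("puppy", "dog", "kitten")
print(answer)
```

Nothing in the code states that kittens grow into cats. The answer falls out of where the words happen to sit relative to each other, which is about as close to “vibes” as arithmetic gets.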
But At What Cost Function?
It seems like, in order to become better at satisfying complex questions, the machines have been forced to shed their logic and become more like us post-rationalizing humans. Is that good? Who knows. I am not even sure why I wrote this: I am a person, and therefore bad at logic.
I imagine one outcome will likely be that the adoption of AI will be limited in industries where decision auditing is a concern, like in loan underwriting. If the government takes a strong stance on self-driving automobile safety, it might mean that none of this “General AI” progress will be generalized to driving cars.
In “Zero To One”, Thiel argues that humans and computers will complement each other, rather than substituting for each other. It seems like in today’s phase, the computers are being built to be more like us, less logical and more intuitive, in order to complement our natural-language-oriented way of feeling around in our mental environment. I wonder if in turn we won’t make ourselves more like the computers in order to better approximate and surpass their performance?
I also wonder what our world, increasingly reliant on technology, loses if that technology starts becoming less logical? We already have plenty of intuition out there among us.
Companies like Mostaque’s are already starting to train text-to-image machines based on the outputs of image-to-text machines. Like much of the field, it seems like their confidence that these outputs will get better without converging to some unexpected local maximum is based more on empirical intuitions than deductive proofs.
It’s worth mentioning that later on in the above episode Mostaque states that ever since the 2016 victories of AlphaGo over world champion Go master Lee Sedol, the average skill of world-class Go players seems to have exploded at an exponential rate. In a world of less logical machines and less intuitive people, with humans mimicking computers and computers mimicking humans, I wonder if we’ll be able to explain why that might be?
1. I like to call it “vibes-based”.
2. Simplifying quite a bit.
3. Nor how many you chain together in your accursed “data pipelines”, which are more like leaky Rube Goldberg machines than any piece of useful hose I’ve ever seen.
4. The bit above starts at around 00:05:30.
5. People have generically branded this “deep learning”, and Mostaque refers to it as “transformer-based attention learning”. I am still unclear on the levels of reference here, how Deep Learning relates to LLMs, etcetera. Hopefully I can find some time to learn.