What machines can and can't do

We investigate the limitations of statistical methods, a subclass of which is called "machine learning", and take the opportunity to touch on several aspects of the industry along the way, including the future of employment, various industry practices, and privacy. This piece is written as a wide-audience article and assumes only a passing familiarity with software and statistics. We start from a seemingly hardline stance, then follow a line of flawed reasoning up to the point where humans become near-obsolete, an event commonly known as the "singularity". Finally, we step back to see where we stumbled.

We use the following starting point: using any combination of statistical methods, of any complexity, it is impossible to develop any sort of "understanding" that resembles human intelligence. Instead of trying to tackle questions about the nature of intelligence, or entering a deep philosophical rabbit-hole about whether "intelligence is computable", we start with the modest goal of understanding what statistical methods are.

To begin, one might argue that the world is made up of a collection of "facts" that need to be "memorized" after having seen them stated correctly multiple times. A variation of this is the "monkey brain", in which you train the monkey by asking it to perform a certain task, rewarding it every time it gets it right, and penalizing it every time it gets it wrong. This is the basic idea behind "reinforcement learning", the kind used to get machines to play "games". Speed up the task-to-[reward or penalty] cycle exponentially, and you have a "smart monkey". While the "smart monkey" might make silly mistakes on a bad day for irrational reasons, a "bad day" for a statistical model is simply when it is presented with data that it doesn't know how to fit. The monkey's emotional engine can arguably be emulated by weighting the penalty that the machine receives, based on how fundamental the mistake was, and how many hours of training it has received. The procedure we've described of "training" a machine like this falls under the class of "supervised learning", which, as one might argue, is the same way a human learns from an assignment or exam.
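
As a concrete illustration of the reward-and-penalty cycle, here is a minimal sketch (a toy "one-armed bandit" setup with made-up payout numbers, not any production system): the agent learns which of three levers pays off most often, purely from the feedback it receives.

```python
# A minimal sketch of the reward/penalty loop: an epsilon-greedy agent
# learning which of three "levers" pays off most often.
import random

true_payout = [0.2, 0.5, 0.8]      # hidden reward probabilities (made up for the demo)
estimates = [0.0, 0.0, 0.0]        # the agent's running estimate of each lever's value
counts = [0, 0, 0]
epsilon = 0.1                      # how often the agent explores at random

for step in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(3)              # explore
    else:
        arm = estimates.index(max(estimates))  # exploit the current best guess
    reward = 1.0 if random.random() < true_payout[arm] else -1.0  # reward or penalty
    counts[arm] += 1
    # nudge the estimate toward the observed outcome
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(estimates)  # the agent ends up preferring the last lever, purely from feedback
```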

Let us assume that a machine can successfully memorize trillions of "facts", and look into how these might be represented. In most commonly-used applications today, they are encoded in the form of matrices of floating-point numbers, with some weighted arithmetic operations connecting them, as chosen by the person building the model. The model is fed a matrix of floating-point numbers, and outputs a matrix of floating-point numbers. We might argue that the inability to interpret these individual numbers is akin to not being able to interpret the voltages of the individual neurons in the brain, and this doesn't seem to be a problem in practice. The said "facts", now lost in a soup of numbers, can be generalized into a broader class of "things that can be inferred from data". With every attempt at handling a new situation, the model simply does arithmetic on the soup (much of which is just matrix multiplication), and the numbers in the soup adjust themselves with every piece of "feedback" we give it. This is the basic idea behind "neural networks". While it's easy to get lost in the details, note that we're still doing statistical analysis. The additional "frills" haven't changed the essential nature of the method: the machine is "learning by example", and attempting to "generalize" from it.
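
To make the "number soup" concrete, here is a minimal sketch (with arbitrary sizes and a made-up target, not any real architecture): two weight matrices, some fixed arithmetic connecting them, and a feedback step that nudges the numbers.

```python
# A minimal sketch of the "number soup": matrices of floats, weighted
# arithmetic connecting them, and a feedback step that adjusts the numbers.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))        # the "soup": matrices of floating-point numbers
W2 = rng.normal(size=(8, 2))

x = rng.normal(size=(1, 4))         # input: a matrix of floats
target = np.array([[1.0, 0.0]])     # desired output: also a matrix of floats

for _ in range(200):
    hidden = np.tanh(x @ W1)        # arithmetic chosen by the person building the model
    out = hidden @ W2
    error = out - target            # the "feedback"
    grad_W2 = hidden.T @ error
    grad_W1 = x.T @ ((error @ W2.T) * (1 - hidden ** 2))
    W2 -= 0.1 * grad_W2             # the numbers in the soup adjust themselves a little
    W1 -= 0.1 * grad_W1

print(np.tanh(x @ W1) @ W2)         # after enough feedback, the output drifts toward the target
```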

Among human beings, learning by example is a very effective way to learn when starting out. Most programmers would attest to the fact that they started out by copying examples, tweaking them, and checking them with a program (these programs are known as "compilers"). Some would argue that this approach doesn't scale for human beings, because they don't have the computational power to keep learning by example, so they "cheat" by reading some kind of specification as they become more experienced. Does it mean that classical programmers (i.e. those not engaged in building statistical models) could be replaced by machines in the future?

The key to assessing the viability of this argument is to consider two things: (1) the structural complexity of the task, and (2) the number of examples available to concretely reinforce the patterns, and nullify the anti-patterns. It's no secret that there are several good tools (otherwise called IDEs) for aiding with "simple" programming languages, which a reductionist would phrase as "machines are good at dumb tasks". Using simple frequency analysis, the IDE can order its auto-complete suggestions and corrections intelligently. To date, there is no IDE that assists by drawing on the vast repositories of open-source code, but it doesn't require a big leap of imagination to assume that this will be possible in the future. In fact, we should be open to the possibility of compilers building repositories of error-messages, running in the background, and suggesting even more intelligent fixes, based on how others who encountered a similar error fixed it. A philosophically-minded reader would be interested in whether the "intent" of the programmer can be dissected, and whether, by labelling various segments of publicly available code with some kind of abstract "function", the machine can write entire programs from a few prompts. Assuming for a minute that this might be possible, can classical programmers stay relevant by simply moving to newer languages or technologies, for which there are few examples?
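
Here is a minimal sketch of the frequency-analysis idea (a made-up token corpus and a hypothetical `suggest` helper, not any real IDE's machinery):

```python
# Ordering auto-complete suggestions by simple frequency analysis of seen code.
from collections import Counter

# tokens harvested from code the "IDE" has already seen (made-up corpus)
seen_tokens = ["print", "print", "println", "printf", "print", "parse", "partial"]
frequency = Counter(seen_tokens)

def suggest(prefix, limit=3):
    """Return known identifiers starting with `prefix`, most frequent first."""
    matches = [tok for tok in frequency if tok.startswith(prefix)]
    return sorted(matches, key=lambda tok: -frequency[tok])[:limit]

print(suggest("pr"))   # ['print', 'println', 'printf'] -- the common one comes first
```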

This poses a quandary: are we playing some kind of primal game, where the prey constantly has to come up with new strategies to outrun the predator? Some would argue that, by refining our statistical methods, one can emulate the process of generalization effectively enough to beat the prey's creativity across the entire "search space". What if this is the nature of every human specialization? An event, commonly referred to as the "singularity", whereby humans become near-obsolete, has been predicted for 2040. Why is this claim so ludicrous?


To better understand where statistical models are successful, let us enumerate some recent accomplishments in the field.

Beating the best human players at Go, by a program referred to as AlphaGo. The problem is well-defined, and there are very few starting rules. How does a human become good at Go? Just like in Chess, the rules play an insignificant role in the player's learning, and most of the training is about analyzing previous games. There are huge databases of expertly-played games available, and human players usually try to follow a semi-systematic approach. It could be argued that the machine is inefficient, because it doesn't follow a systematic approach, but the raw computational power makes this handicap look irrelevant. It's a little like arguing that a machine doesn't know the decimal system, or the tables of multiplication, and simply flips 0s and 1s to perform arithmetic. Humans are the disadvantaged class in both cases, and it is completely unsurprising that, despite certain "inefficiencies", AlphaGo beats a human player by doing essentially what the human does: analyzing lots of games.

A language engine, referred to as GPT-3. How do humans master a natural language? They do it semi-systematically, starting from the association of simple words to real-world objects, then moving on to phrases describing something, learning some construction rules and absorbing culture along the way; and finally by reading a lot of good literature. Of course, the grammar rules play a very small part in the overall learning, so it is unsurprising that GPT-3 is able to produce grammatically correct responses to human-supplied prompts. However, literature talks about things in the real world, and for it to make sense, sufficient interaction with the real world is a prerequisite. It's not a bounded game, where all the necessary information is contained on a map or a board. It is, therefore, entirely unsurprising that GPT-3 can't "know" whether dropping a metallic plate would cause it to melt. It cannot infer the relationship between real-world objects, nor can it differentiate between factual and fictional human experiences. Producing good literature requires a whole lot more than that. This landmark achievement, despite having access to terabytes of text and an unimaginable amount of compute power, has failed to infer much more than grammar, sentence construction, and context.

The translation engine behind Google Translate. Someone who has just started using the product, with no knowledge of the underlying technology, would be puzzled about why this is chalked up as an achievement. Indeed, it's unlikely that purely statistical methods will ever give us even a half-decent translation engine. Translation engines have been around for a long time, and the latest iteration might be a significant improvement over the previous ones, but on an absolute scale, something seems to be off about relying on a purely statistical method to learn natural language.

Medical diagnostics. In this field, it is absolutely imperative to organize every little piece of data about every patient in a highly systematic manner: a missing entry is a life-or-death situation. With millions of these remarkably well-organized records to train on, it should come as no surprise that a machine can serve as a valuable aid to a doctor. Statistical models have been a resounding success in this area. The achievement here, perhaps, should be attributed to the subfield of "computer vision", which studies the best way to encode medical images and videos in the aforementioned number soup.

Proof-search on a well-defined problem phrased in a computer language, using well-defined operations to go from one step of the mathematical proof to the next (in other words, aiding proof-search in programs known as "proof assistants"). These are classic combinatorial problems, with huge search spaces, lots of rules at each node, and clear end goals. Sort of like a video game on steroids. Some progress has been made, and the IDEs are expected to get better. At first glance, this might seem a bit alarming to those who don't realize how little mathematics can be written down in this form, or how painful it is to do so.
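
To see what "mathematics written down in this form" looks like, here is a toy example for the Lean proof assistant (our own illustration, not drawn from any of the work above): a single statement about natural numbers, proved by a handful of well-defined steps that the machine checks.

```lean
-- A toy Lean 4 proof (illustrative only): each step is a well-defined operation
-- that the machine can verify, and that a search procedure could in principle find.
theorem succ_add_sketch (a b : Nat) : Nat.succ a + b = Nat.succ (a + b) := by
  induction b with
  | zero => rfl  -- base case: both sides reduce to `Nat.succ a`
  | succ k ih => rw [Nat.add_succ, ih, Nat.add_succ]  -- rewrite using the induction hypothesis
```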

Finding protein-folding structures, by a program referred to as AlphaFold. Another classic combinatorial problem, with well-defined rules. The landmark here is that, despite an enormous search space, a solution to an important problem in computational biology has been found, which is sure to accelerate research in the area. Statistical models are an unabashed success here, and they have done what humans could only dream of doing using other methods.

Facial recognition. A classic pattern-recognition problem, which can only be learnt by example. It suffices to say that this technology is a resounding success, with the unfortunate side-effect of opening the door to various abusive ways in which it can be used.

Automated support over voice and text. The little "domain knowledge" needed here is all encapsulated within a very narrow context. There are a few simple rules of the type "if the human says this, then say that" (the models used to make such decisions are called "decision trees"), and, much like humans, the machine learns to recognize speech by example. The landmark achievement here is in "speech synthesis", or making the machine produce human-like speech from text. Despite that, it should be noted that both speech-to-text and speech synthesis are still unsolved problems for non-standard accents. We could argue that there isn't enough data on non-standard accents, and yes, that's exactly what this particular problem boils down to.
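
A minimal sketch of the "if the human says this, then say that" logic, with made-up rules that don't correspond to any real support system:

```python
# A toy decision tree for automated support (made-up rules, illustrative only).
def support_reply(message: str) -> str:
    text = message.lower()
    if "refund" in text:
        if "damaged" in text:
            return "Sorry about that! A replacement is on its way."
        return "Refunds take 3-5 business days to process."
    if "password" in text:
        return "You can reset your password from the account settings page."
    return "Let me connect you to a human agent."

print(support_reply("I want a refund, the item arrived damaged"))
```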

More classical problems. Statistical methods have been used to improve upon classical data structures and algorithms in computer science (a recent one beat quicksort). Analyzing the numbers for patterns, then building a data structure best suited to those patterns before deciding how to handle them, sounds pretty sensible. The landmark achievement here is that such a model can be trained and executed faster than the corresponding classical algorithm can run.
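
The flavor of the idea, in a heavily simplified sketch (a crude two-parameter "model" of the data, far simpler than the published learned-sorting work alluded to above):

```python
# Analyze the numbers for a pattern, build a structure suited to that pattern,
# then handle the leftovers classically. (Illustrative sketch only.)
import random

data = [random.uniform(0, 1000) for _ in range(10_000)]

# "Analyze the numbers": a crude model of the distribution (just its range here).
lo, hi = min(data), max(data)
n_buckets = 100
buckets = [[] for _ in range(n_buckets)]

# "Build a data structure suited to the pattern": place each value where the
# model predicts it should roughly end up.
for x in data:
    predicted = int((x - lo) / (hi - lo + 1e-9) * n_buckets)
    buckets[min(predicted, n_buckets - 1)].append(x)

# Tidy up each bucket with a classical sort, then concatenate.
result = [x for bucket in buckets for x in sorted(bucket)]
assert result == sorted(data)
```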


Now, we're ready to see machines fall flat on their faces when presented with problems that are effortless for humans.

  1. You're given a sequence of groups of stones, where each stone is represented by a unique symbol. What can you say about the following sequence?
[.*, %\!, &^#?]

A kindergartener would simply say that there are more stones on the right. Open-ended problems like this present a formidable challenge to machines. How is a machine supposed to even begin approaching the problem? It is stuck with the concrete, which it is looking at much too closely.

  2. Two identical twins jump from a height of 10ft simultaneously. The first time, they jump without touching each other; the second time, they hold hands. What is the difference between the two flight times?

This thought experiment, if not already familiar, should be a simple one for any human to simulate in their head. It is often used to illustrate the elegance of a simple physical law. Machines have no idea what to do, other than to try some kind of correlation on existing data; they can't even tell whether the height of 10ft is relevant.

  3. I've drawn a certain kind of matrix here. Do you see what I'm trying to show you?
*
**
***
****
*****

Anyone with minimal exposure to linear algebra would immediately say that it's a lower-triangular matrix, even if they've never seen it presented that way. Yes, it's vague, but this is a classic problem of abstraction, one that cuts across domains. The machine, in this case, might identify some kind of right-triangle in the textual picture, but what's the next step? How does this connect to a rectangle filled with numbers? Where are the numbers?
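
For concreteness, the abstraction a human reader jumps to is something like the following (one possible 5×5 lower-triangular matrix, with placeholder entries of our choosing):

```latex
L =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 1
\end{pmatrix}
% every entry above the main diagonal is zero: the triangle of stars, reread as
% the pattern of (possibly) non-zero entries in a rectangle filled with numbers
```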

Notice that very little knowledge is needed to solve the above problems, and collecting more data isn't going to change a thing.


Solving these seemingly childish problems might not be of much consequence in itself, but doing so is a prerequisite for tackling more complicated problems:

  1. Given that a hedgehog is a small, spiny animal that some people keep as a pet, which of these six pictures is the picture of a hedgehog?
  2. Derive the equation of motion for a simple pendulum from the axioms of classical mechanics.
  3. Watch a film, and critically comment on the acting, screenplay, direction, and storyline.
  4. Deliver a talk on black holes, and gauge the level of understanding of the audience from the questions you receive.
  5. Check the correctness of the proof of the abc conjecture, by directly reading the document uploaded on arXiv.

To claim that there even is such a thing as a "singularity" requires, at the very least, some kind of strategy for tackling the last problem.

At this point, the data-zealots would interject. If we collected data on every second of every human's life for a year, they'd argue, then given enough storage space and compute power, we'd be able to tackle at least some of these problems using current methods, and our methods are only going to get better over time. Even if this were true, it is a ridiculous claim, akin to saying "given enough time, I can write a graphical web browser that runs on bare metal, from scratch, by writing a sequence of 0s and 1s on a piece of paper". Actually, it's even more ridiculous than that, because you'd need entire galaxies of low-wage unskilled workers to "clean" and "annotate" the collected data to spoon-feed the machine. Is this the "singularity" you talk about, where humans are doing the equivalent of cleaning ditches to keep the omniscient being happy? If you can't do kindergartener-level problems today without human assistance, with so much compute power at your disposal, what hope is there of claiming any "intelligence"? But, they'd argue, what if that machine has an intelligence that's "different", but somehow "subsumes" human intelligence? This is the kind of pseudoscientific rubbish that's spewed by nutty cult leaders. Statistical methods have a firm place in modern society, but as it currently stands, machines are the ones with the dunce hats in play-school.


We're currently in a data-warzone era, where all the big players are racing to collect more data on their users. What kind of data? Data about the boring, inconsequential, everyday lives of human beings. To put it uncharitably: go through everyone's garbage with computing clusters, and you're bound to find some half-eaten chocolates. The models behind AlphaGo, GPT-3, and AlphaFold have the same essential nature as the ones used to keep us endlessly hooked on social platforms, and to sell us unwanted products via sweet-talking voice-assistants. A lot of industrial AI research is enormously expensive, but the dirty secret is that it is paid for with the privacy of decent, unassuming folk.

Let strangers sift through your garbage if you please, but don't do it while drunk on the "singularity" kool-aid.