2019-09-08

Review: Structure and Interpretation of Computer Programs

Book: Structure and Interpretation of Computer Programs,
by Harold Abelson, Gerald Jay Sussman, and Julie Sussman (1996, 2nd ed.)
2.7k words (≈11 minutes)

Spend enough time searching for good programming books on the internet, and you are bound to run into Structure and Interpretation of Computer Programs (SICP). Many regard it as the bible of programming. For good reason, as it turns out.


Beware the wizards


The Wizard Book. (Credit: MIT Press)

SICP is sometimes called the “Wizard Book”, because there’s a wizard on the cover (if your job were making an interesting cover for a programming book, what would you do?). However, this does not mean that the book has anything to do with magic. As the authors themselves put it:
“[L]earning to program is considerably less dangerous than learning sorcery, because the spirits we deal with are conveniently contained in a secure way.”
Um. Okay, I rest my case. Proceed with caution.


Contrarian SICP

Most subjects have a standard way of being presented, which most books, lectures, and courses follow.

For programming, the standard way seems to be to take some “mainstream” language, show how to print “Hello, World!” onto the screen, then start introducing things like assigning values to variables, conditionals, and so on. Soon you can be doing some pretty impressive things.

SICP does not follow this route.


Why Lisp?

The first thing that might strike you about SICP is that the programming language of choice is Scheme, a dialect of Lisp (short for “LISt Processor”), which is commonly known as that obscure language invented in 1958 that wears down the parentheses keys on your keyboard.

Comic by Randall Munroe of xkcd.
However, the authors are not just being contrarian here; there are many good arguments for using Lisp in a book like this.

First, Lisp is the closest a programming language can get to having no syntax. You don’t have to learn where curly brackets are used, or which operators/functions follow which type of syntax, or a multitude of special characters that perform arcane pointer logic (I’m looking at you, C++).
If you have an expression in parentheses, the first thing inside the parentheses is the name of the function that is being called. Everything after it is an argument to be passed to that function. Something not in parentheses represents either just itself (e.g. a string, number, or boolean), or is the name of a variable that in turn represents something.

For example: (+ 1 (* 2 3) var) evaluates to the sum of the number 1, the product of 2 and 3, and whatever value the variable var has been set to.

Now you know approximately 90% of Lisp syntax (there are also a few other things, like a special syntax that stands in for an unnamed function, and some shortcuts for things you’d otherwise have to type out repeatedly).

If you follow along with SICP, Lisp is self-explanatory.

The second point in favour of Lisp follows immediately from the first: the near-absence of syntax means you don’t have to think about it. Once you get used to it, writing in Lisp feels almost like transcribing pure thought into code.

When a language implements various special syntaxes, it generally privileges certain design patterns and ways of thinking; if for-loops are unavoidable, the programmer will think in for-loops. A near-absence of syntax means neutrality. Some might call it blandness; fair enough, but Lisp’s blandness is very powerful when used right. It makes it a very useful language for a book like SICP, which tries to teach you (for example) many different ways of abstracting data, rather than the one that is made most convenient by a language’s syntax.

The third point in favour of Lisp is that what little syntax it has was chosen carefully, namely in such a way that Lisp code is also Lisp data. The example function call (+ 1 (* 2 3) var) given above is just a list of the elements +, 1, the list of the elements 2 and 3, and var. This means that it’s very easy to write Lisp code that operates on Lisp code, something that comes in handy when SICP walks through the operation of a Lisp interpreter (in more practical situations, it also enables Lisp’s powerful macro system). To put it another way, introspection is easier in Lisp than other languages.
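To make this concrete, here is a loose analogy in Python (mine, not the book’s): representing the expression as nested lists makes the code itself data that a program can pick apart like any other list.

```python
# The Lisp expression (+ 1 (* 2 3) var), modeled as a nested Python list.
# The first element of each list names the function being applied.
expr = ["+", 1, ["*", 2, 3], "var"]

# Because code is data, a program can inspect or rewrite it:
operator_name = expr[0]   # the outer function, "+"
sub_expression = expr[2]  # the nested call ["*", 2, 3]
print(operator_name, sub_expression)
```

In Lisp itself no such translation step is needed: the printed form of the code already is the list.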

Finally, as the (perhaps biased) authors write: “Above and beyond these considerations, programming in Lisp is great fun.”


Executable math

Once you’ve gotten over all the parentheses, the second thing you’ll notice about SICP is the order in which topics are presented.

The first chapter is entirely devoted to creating abstractions by defining functions. Only function (and variable) definition and function calling are used – no mention is made of data structures or changing the values of variables.

If you think it’s impossible to do anything interesting by just calling functions, you are wrong, and SICP will prove it.

The chapter runs through the very basics of function application, variable definitions, and the substitution model of how to apply functions (this last point will later be amended). It discusses iterative and recursive processes, and how iterative processes can be described by recursive functions.

A lot of the things you can do by just calling functions are quite math-y. SICP does not shy away from this: Newton’s method for square roots, numerical integration, and finding fixed points of (mathematical) functions are prominent examples. No prior knowledge of the math is assumed, but this may still put off many readers because it’s abstract and not directly relevant to most real-world problems. “Executable math” is a pretty good summary of what most of this chapter is about.

However, the chapter really is striking. Using just one type of abstraction (defining functions) and not too many pages, SICP scales from the very basics to fairly involved problems, using techniques – like extensive use of higher-order functions – that a more conventional work would leave for much later.


Finally: data!

Only in the second chapter does SICP turn to data structures. Once again the format is the same: introduce exactly one type of abstraction, and systematically introduce examples of how it’s useful and what can be done with it.

The basic Lisp data structure is the cons cell, which links together two values. The primitive function that creates one is cons. If we want to chain together many values – for instance, to create a list of the elements 1, 2, and 3 – we can do this with (cons 1 (cons 2 (cons 3 null))) (of course, there’s also a function – list – that creates such lists automatically).

Additionally, Lisp provides primitive functions for accessing the first and second element in a cons cell. For historical reasons, these functions are called car (returns the first element) and cdr (returns the second element). This means that the cdr of a list defined in the same way as above would be all but the first element of the list.

But what is data? Or do we even care? After all, all that interests us about cons, car, and cdr is that if we define, say, x as (cons 1 2), then (car x) should be 1 and (cdr x) should be 2.

One clever way of implementing this – and one that will likely seem both weird and ingenious the first time you see it – is the following:

(define (cons x1 x2) ; define cons as a function on two inputs
  (define (dispatch n) ; define a function inside cons
    (if (= n 1)
      x1  ; return x1 if n = 1
      x2)) ; else, return x2
  dispatch) ; the cons function returns the dispatch function

(define (car x)
  (x 1))

(define (cdr x)
  (x 2))

What’s happening is this: cons returns the function dispatch. Let’s say x is a cons cell that we have made with cons, consisting of the elements x1 and x2.

Now we’ve defined the car of x to be whatever you get when you call the function x with 1 as the argument. x is what the cons function returned, in other words the dispatch function, and when we call that with 1 as the argument, it will return x1. Likewise, when we call x with the argument 2, the dispatch function that x represents will return x2. We have satisfied all the properties that we wanted cons, car, and cdr to have.

Is this how any reasonable Lisp implementation actually works? No.

(If you’re confused about the previous example: note that we’ve snuck in an assumption about how variable scoping works in functions. When the dispatch function is created inside the cons function, the variables x1 and x2 are obviously bound to whatever values we passed into cons. What’s not obvious is that dispatch can still access these values when it’s called later – after all, x1 and x2 were local variables of a function call that will have ended by then (and the cons function might have been called many times, meaning many x1s and x2s). However, in Lisp the environment in which a function is created is bound to that function. When the function is called later, any local variables like x1 and x2 that were present in the enclosing function where it was defined remain accessible. A function bundled with its defining environment like this is called a closure.)
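The same trick can be sketched in Python, whose nested functions are also closures (my translation, not code from the book):

```python
def cons(x1, x2):
    # dispatch closes over x1 and x2; they remain accessible
    # even after this call to cons has returned.
    def dispatch(n):
        return x1 if n == 1 else x2
    return dispatch

def car(x):
    return x(1)  # ask the closure for its first element

def cdr(x):
    return x(2)  # ask the closure for its second element

pair = cons(1, 2)
print(car(pair), cdr(pair))  # → 1 2
```

Each call to cons creates a fresh dispatch function with its own captured x1 and x2, which is why many cons cells can coexist.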

Mutability (changing variable values after they’ve been defined) is only introduced in the third chapter; up until then, the book focuses purely on functional programming.

The third chapter is the culmination of the first half of the book: now that functions, data abstraction, and mutability have all been discussed, the authors introduce many examples of the structures that are now possible.


The what evaluator?

SICP walks the reader through the process of writing a Lisp evaluator in Lisp, something that is called a “metacircular evaluator”.

Writing a Lisp evaluator in Lisp might seem pointless, but remember that a programming language, especially one like Lisp, is just as much a language for setting down our thoughts about procedures as it is something to be executed by computers. A Lisp-to-Lisp interpreter has the advantage that it is one of the simplest interpreters it is possible to write. Interpreters for Lisp benefit greatly from the simplicity of Lisp’s syntax, while interpreters written in Lisp benefit from the expressiveness and flexibility of the language. Thus, with our Lisp-to-Lisp interpreter, the essence of an evaluator is laid about as bare before our eyes as it can be.
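To give a flavour of how little an evaluator needs, here is a toy sketch in Python (not the book’s Scheme code) that evaluates arithmetic expressions written as nested lists; the operator table and the variable name are my own invention.

```python
import operator
from functools import reduce

# Minimal operator table; a real Lisp looks these up in the environment too.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env):
    if isinstance(expr, (int, float)):  # numbers evaluate to themselves
        return expr
    if isinstance(expr, str):           # symbols are looked up in the environment
        return env[expr]
    op, *args = expr                    # a list is a function application
    return reduce(OPS[op], (evaluate(a, env) for a in args))

# (+ 1 (* 2 3) var) with var set to 4:
print(evaluate(["+", 1, ["*", 2, 3], "var"], {"var": 4}))  # → 11
```

The real metacircular evaluator additionally handles special forms like define, if, and lambda, but the shape – dispatch on the type of expression, then recurse – is the same.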

If you're willing to forget about code readability and leave out some syntactic sugar like cond expressions, you can literally behold the metacircular evaluator at a glance. (Note the #lang sicp line – DrRacket has a package that implements the exact version of Scheme used in SICP.) 

The authors write:
“It is no exaggeration to regard this as the most fundamental idea in programming: 

‘The evaluator, which determines the meaning of expressions in a programming language, is just another program.’ 

To appreciate this point is to change our images of ourselves as programmers. We come to see ourselves as designers of languages, rather than only users of languages designed by others.”
After presenting the metacircular evaluator (and an optimisation), the authors go on to discuss three “variations on a Scheme” (haha …):
  1. Making the evaluator lazier. More precisely, delaying the evaluation of an expression until it is needed (“lazy evaluation”). This allows, for example, the convenient representation of infinite lists (“streams”), and more flexibility in creating new conditionals.
  2. Non-deterministic computing, in which the language has built-in capabilities to handle statements like “pick one of these three items”, or “search through these options until some permutation matches this condition”. With such a language, some logic puzzles can be solved by simply stating the requirements and pressing enter.
  3. A logic programming language, which can process queries about data.
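As a taste of the first variation (my illustration, using Python generators rather than Scheme streams): a conceptually infinite list costs nothing until you actually pull elements from it.

```python
from itertools import count, islice

# A lazy, infinite "stream" of square numbers.
# No square is computed until something asks for it.
squares = (n * n for n in count(1))

print(list(islice(squares, 5)))  # → [1, 4, 9, 16, 25]
```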
Programming often involves wanting to do something, and then taking that task and “building down” to the demands of whatever programming language is used. A powerful alternative method is to also build up the language to customise it for the needs of the task at hand. The boundary between language and program blurs.

It’s almost as if …
“The evaluator, which determines the meaning of expressions in a programming language, is just another program.”

What do we say to compilers? Not today

There’s a fifth chapter to SICP, in which a register machine simulator is constructed, and then used to implement – surprise surprise – a Lisp compiler.

In a way, this completes the loop: the first three chapters show what kinds of things various programming abstractions allow, the fourth shows how these abstractions can be used to implement themselves, and the fifth looks “under the hood” of Lisp itself to consider how it can be implemented with elements simpler than itself. Of course, the question of how the simpler register machine itself can be implemented is left unanswered, but this is already starting to bring us into the realm of hardware, for which another book might be better suited.

For the first four chapters I did perhaps half of the exercises; for the last, I just read the main text. The chapter feels more theoretical than the previous ones. Even though the Lisp-to-Lisp evaluator of the fourth chapter is purely academic, I found it more interesting (and also more practical, since I recently wrote an interpreter for a project) than the construction of a compiler from simulated versions of very restrictive components. Hopefully I will return to the chapter at a later point, but for now a more thorough reading will have to wait.


First Principles of Computer Programming

SICP is a rather unconventional programming book. I think this is largely because the authors seem to have started from first principles and asked “what should a good book on deep principles in high-level programming languages look like?”, rather than making all the safest choices.

Therefore, Lisp.

Therefore, presenting one element at a time (functions, data abstraction, mutability) with care and depth, rather than the (admittedly faster and more practical) approach of introducing all the simplest things first.

Therefore, spending a lot of time hammering in the point that what evaluates/compiles your program is just another program.

SICP is not about showing you the fastest route to making an app. Unless you’re of a theoretical bent, it might not even be a particularly good introduction to programming in general (on the other hand, on several occasions I was slowed down by prior misconceptions; those with a fresher perspective may avoid some difficulties).

However, it excels as a deep dive into the principles of programming. Especially if you have experience with programming but haven't yet read a systematic treatment of the topic, SICP will be invaluable in straightening out and unifying many concepts.


Links & resources
I’m not aware of an official SICP solution set, but you will find many on the internet. This one seems to be the most complete, often featuring many solutions to a given exercise.


How to Design Programs: an alternative book

A similar, first-principles-driven, Lisp-based book on programming called How to Design Programs (HTDP) also exists (I have not read it). This book was consciously designed to emulate what is good in SICP while fixing the bad, particularly in the context of being used as an introduction to programming (the authors of HTDP have written an article called The Structure and Interpretation of the Computer Science Curriculum in which they summarise their case).

Incredibly, HTDP is also available for free online. Either MIT Press has been overrun by communists, or the people who write good programming books are far more charitable than the average textbook writer.

2019-08-20

Review: The Pleasures of Counting

Book: The Pleasures of Counting, by Thomas William Körner (1996)
1.8k words (≈6 minutes)


On his website, T. W. Körner introduces The Pleasures of Counting as follows:
Longer than “With Rod and Line Through the Gobi Desert”, funnier than “The Wit and Wisdom of the German General Staff” and with more formulae than “A Brief History of Time” [The Pleasures of Counting] was voted Book of the Year by a panel consisting of Mrs E. Körner, Mrs W. Körner, Miss K. Körner and Dr A. Altman (née Körner).
The Pleasures of Counting is hard to categorise. On one hand, its flow and lucidity match the best works of general non-fiction, even though the book features more tangents than an intro to derivatives course. On the other hand, in contrast to books that merely tell about math, Körner has the gall to make the reader do exercises.

The result is 500 pages of insights, proofs, exercises, and real-world applications about topics ranging from cholera to submarine warfare to weather prediction, all delivered in Körner’s personable style and with a generous heaping of witty anecdotes and the occasional bit of verse. And it is glorious.


Warning: may contain math – but don’t worry

A first question about any book involving math is how much you need to know beforehand for it to be comprehensible.

Most mathematical arguments in The Pleasures of Counting can be followed with straightforward algebra. This does not mean the results themselves are straightforward – some, for instance the derivation of the Lorentz transformation or an outline of Shannon’s theorem, require careful thought and are easy to get lost in. Some exercises also either require or benefit greatly from prior exposure to calculus. However, in general The Pleasures of Counting manages to be both accessible and fairly deep. A casual reader can skip exercises and tricky arguments while still getting the gist, while other readers will find much to dig into in the more intricate proofs and exercises. All notation used is explained in an appendix.

Most people can gain something from this book, and given the breadth of the material, I expect very few will encounter nothing new.


The pleasures of everything under the sun

Körner discusses many common examples of mathematical reasoning and results, including special relativity, Galileo’s arguments about motion, Enigma machines, Turing’s work, fractals, sorting algorithms, and the effects of scaling on biology, though always with his own spin on each topic.

I particularly enjoyed Körner’s discussion of dimensional analysis in physics – a fancy way of saying that you figure out what variables some quantity should depend on, fiddle with them until you get an equation where the units (mass, length, time) check out, and then go design bridges with it.
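As a toy illustration of the method (my example, not one of Körner’s): suppose a pendulum’s period depends only on its length L and the gravitational acceleration g. Writing each quantity’s dimensions as (mass, length, time) exponents, the only combination of L and g with the dimensions of time is sqrt(L/g).

```python
# Dimensions as (mass, length, time) exponents.
L = (0, 1, 0)    # length: metres
g = (0, 1, -2)   # acceleration: metres per second squared

# L/g: subtract exponents; taking a square root halves them.
ratio = tuple(a - b for a, b in zip(L, g))  # (0, 0, 2) = time squared
period = tuple(e / 2 for e in ratio)        # (0, 0, 1) = pure time

print(period)  # → (0.0, 0.0, 1.0): sqrt(L/g) has the dimensions of a period
```

Dimensional analysis cannot find the dimensionless constant in front (for small swings it happens to be 2π), but it pins down the functional form with no physics beyond unit bookkeeping.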

This is an example of the “dangerous but fascinating pastime” of what Körner calls science “in a darkened room” – trying to derive scientific facts from pure thought alone. Science requires both reason and observation; relying on one alone is like trying to walk with one leg. That’s not to say it’s impossible to go places by hopping on one leg: Körner relishes showing how you can start from small, abstract assumptions and hop over to interesting conclusions, such as why helicopters have long blades or how spacetime works.
 
The most distinctive and refreshing parts of The Pleasures of Counting are Körner’s presentations of the work of several somewhat less well-known scientific figures, such as G. I. Taylor, Lewis Fry Richardson, and Patrick Blackett. I have a feeling Körner’s pick of figures is not random – all three are British mathematicians, physicists, or mathematical physicists who lived from the late 1800s to the mid/late 1900s, worked on war-related issues (Blackett was a major advisor on military strategy and operational research in World War II, Taylor participated in the Manhattan Project, and Richardson was an ardent pacifist who was a conscientious objector during World War I and later attempted a mathematical analysis of the causes of war), and studied or taught at Cambridge, like Körner. The timelines make it possible that Taylor and Blackett worked in Cambridge at the same time as Körner studied there, though I cannot recall Körner mentioning any personal acquaintance with them in The Pleasures of Counting.


The pleasures of digression

Körner does not restrict himself to purely mathematical matters. At one point a Socratic dialogue on the axioms of number theory segues into a discussion on the purpose of university:
TEACHER: […] When Mill wrote On The Subjection of Women [alright, the dialogue may have been going on a slight tangent even before the university stuff], he was consciously following Plato in this, and, still more importantly, in his view that everything is open to question and that positive good may come from rational discussion.
STUART [a student]: And that is what university is all about.
TEACHER: Not really.
STUART: But that is what university ought to be all about.
TEACHER: So you think the taxpayer is parting with large sums of money so that young ladies and gentlemen can sit around discussing life, the universe and everything. You are here to learn mathematics and more mathematics – not to row, play bridge, act or even to find yourselves – and that is what I am going to teach you.
STUART: But, even if that is what the taxpayers want, is it what they ought to get? A university which just trains technicians is not a university; it is a technical college.
TEACHER: Better a good technical college than a corrupt university. What ought you to learn at university besides mathematics?
STUART: Students learn to question received opinions.
TEACHER: So, after I have made you write out 100 times: ‘I must not accept authority’, what do we do next?
ELEANOR [another student]: That’s simple. You make us write out: ‘I really, really must not accept authority.’
TEACHER: Besides which, asking questions is the easy bit. It’s finding good answers which is hard. A university is at least as much a repository for the accumulation of human experience and an instrument for passing it on as it is a device for adding to it.
STUART: But just teaching mathematics is not enough. A lot of us will go on to be engineers and managers and will have to take moral decisions. So why don’t you teach us ethics?
TEACHER: But would you actually go to lectures on ethics?
STUART: If the lecturer was good, yes.
TEACHER: But anybody would go to hear Sir Isaiah Berlin lecturing on how to watch paint dry. The question is, would you go listen to your ordinary lecturers talking about ethics?
ELEANOR: Not unless it was for examinations.
STUART: So why not examine it?
TEACHER: What would the examination questions look like? ‘Is it wrong to steal from widows and orphans? Answer yes or no and give brief reasons.’
STUART: There are lots of difficult and interesting moral problems.
TEACHER: Yes, but the problems of the human race are not those of finding the answer to moral problems in hard cases but of acting on the answer in simple ones. American law schools now include courses on ethics, but the only observable result is that the defence in cases of fraud now begins ‘My client’s behaviour has throughout been not merely legal but ethical.’ […] If wisdom were teachable it would surely be our duty to teach it. Since it is not, we simply try to teach mathematics.
Opinionated? Yes. Controversial? Perhaps. Does he have a point? Definitely.

Körner also discusses how to persuade bureaucratic committees (and when to know how to give up), the principles of successful smalltalk, and the philosophical issue of whether and how we should discount future values.

And then, after you’ve been nodding along at one of these digressions for a while, you snap out of a Körner-induced trance, realise you’re halfway through a proof, and that you’ve been enjoying it all the way.


The idle mathematician of an empty day

Ultimately, The Pleasures of Counting is not about the usefulness or applicability of mathematics, but the joy of it. Deriving truths from other truths, or looking at the messiness of the real world and capturing its broad strokes with a few symbols is not just a means to an end but also an art form, a way of thinking, and a purpose in itself.

Körner closes The Pleasures of Counting with the prologue of William Morris’s The Earthly Paradise. This poem is perhaps the best (and certainly most poetic) argument for the importance of “useless” endeavours like math or poetry. My idle blogging cannot beat Morris’s verse, so here is the poem in full:
Of Heaven and Hell I have no power to sing,
I cannot ease the burdens of your fears,
Or make quick-coming death a little thing,
Or bring again the pleasures of past years,
Nor for my words shall ye forget your tears,
Or hope again for aught that I can say,
The idle singer of an empty day.
But rather, when aweary of your mirth,
From full hearts still unsatisfied ye sigh,
And, feeling kindly unto all the earth,
Grudge every minute as it passes by,
Made the more mindful that the sweet days die –
– Remember me a little then I pray,
The idle singer of an empty day.
The heavy trouble, the bewildering care
That weighs us down who live and earn our bread,
These idle verses have no power to bear;
So let me sing of names remembered,
Because they, living not, can ne’er be dead,
Or long time take their memory quite away
From us poor singers of an empty day.
Dreamer of dreams, born out of my due time,
Why should I strive to set the crooked straight?
Let it suffice me that my murmuring rhyme
Beats with light wings against the ivory gate,
Telling a tale not too importunate
To those who in the sleepy region stay,
Lulled by the singer of an empty day.
Folk say, the wizard to a northern king,
At Christmas-tide such wondrous things did show,
That through one window men beheld the spring,
And through another saw the summer glow,
And through a third the fruited vines a-row,
While still unheard, but in its wonted way,
Piped the drear wind of that December day.
So with this Earthly Paradise it is,
If ye read aright and pardon me,
Who strives to build a shadowy isle of bliss
Midmost the beatings of the steely sea,
Where tossed about all hearts of men must be,
Whose ravenous monsters mighty men shall slay,
Not the poor singer of an empty day.

2019-08-15

Review: From Nand to Tetris

Book: The Elements of Computing Systems, by Noam Nisan and Shimon Schocken (2005)
5.9k words (≈30 minutes) 


A brief rant about the title

“From Nand to Tetris” (Nand2Tetris for short) is the name of the course, website, and overall project that the book The Elements of Computing Systems is part of. It’s an excellent name – catchy, concise, and expertly captures the content.

However, apparently it’s a law that a textbook must have a stodgy title consisting of a reference to the subject matter (bonus points for ostentatious circumlocution, like writing “computing system” instead of “computer”), perhaps attached to a generic word like “concepts” or “elements” that doesn’t mean much.

For the rest of this post, I will pointedly ignore the name “The Elements of Computing Systems” and refer to the book as “From Nand to Tetris” or “Nand2Tetris” instead.


You’re a wizard

At first glance, computers are basically magic.

Science fiction author Arthur C. Clarke once said:
Any sufficiently advanced technology is indistinguishable from magic.
A more accurate phrase might be “any sufficiently advanced technology appears to be indistinguishable from magic”. Of magic, all you can say is it just works. With technology (or, for that matter, anything in our world), there is always a reason.

The goal of Nand2Tetris is to take computers from the realm of magic into the realm of understanding.

This is a difficult task, since the technological stack connecting the physical world to a desktop operating system is perhaps the highest and most abstract technological stack humans have created. The functioning of computers is also a topic split into many layers, each of them its own broad field, from chip design to compilers to programming languages to operating systems.

What Nand2Tetris does is present one example of a path from logic gates to chips to a machine language to virtual machines to high-level languages to an operating system. The aim is not so much to explore every nook and cranny of the computational jungle, or even to provide a map, but instead to demonstrate that such a path is even possible.

A path through the jungle

Logic gates

Boolean logic and basic gates

Most of the function of a computer can be constructed from just two pieces of hardware. The first is the NAND gate.

The only thing we need to know about the NAND gate is that it takes two inputs, each of which takes a binary value (we call the values 0 and 1), and produces a 1 as an output except when both inputs are 1, in which case the output is a 0.

We will not discuss the implementation of the NAND gate, but instead assume that such a device can be implemented by electrical engineers who are clever enough (we have to start somewhere).

In fact, it is barely relevant that the NAND gate is a physical device. We can instead think of it – and other logic gates – as a mathematical function, which maps some set of 0s and 1s to an output value.

In the case of a NAND gate, it takes two inputs and maps them to one output in the manner specified by the following table:

0, 0 -> 1
0, 1 -> 1
1, 0 -> 1
1, 1 -> 0

(The name “NAND” is an abbreviation of “not and”, since the NAND of A and B is true except when the AND of A and B is true.)

Such tables are called truth tables. From Nand to Tetris provides a handy list of all 2^4 = 16 two-argument boolean functions:



If you have studied boolean logic before, you have doubtlessly spent time manipulating sets of 0s and 1s (or truths and falsities) linked together by AND, OR, and NOT operators. Why these three? In addition to them being straightforward to understand, it turns out that it is possible to specify any boolean function with AND, OR, and NOT operators.

(How? If we have a truth table of the function (if we don’t or can’t make one, then we haven’t properly specified the function!), we can simply take every row for which the value of a function is a 1, build an expression for identifying that row out of ANDs, and then chain together all of these expressions with some ORs. For example, if we want the input sequence a=1, b=0, and c=1 map onto a 1, the expression (a AND ((NOT b) AND c)) will be true for this and only this sequence. If we have a bunch of such expressions, say expressions w, x, y, and z, and we want to figure out if at least one of them is true, we can do so with the expression (w OR (x OR (y OR z))). If we have AND and OR functions/gates that can take more than two arguments, this becomes simpler, since we don’t have to nest ANDs and ORs and can instead write something like OR(AND(a, (NOT b), c), expression2, …).)
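The sum-of-products construction in the parenthetical above can be sketched directly (a Python sketch of the idea; the truth table used here, XOR, is my example):

```python
from itertools import product

def sum_of_products(table):
    """Build a boolean function from its truth table:
    OR together an AND-clause for every row that outputs 1."""
    one_rows = [row for row, out in table.items() if out == 1]
    def f(*inputs):
        # A clause matches when every input equals that row's value
        # (variables negated where the row has a 0, as in the text).
        return int(any(all(i == r for i, r in zip(inputs, row))
                       for row in one_rows))
    return f

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor = sum_of_products(xor_table)
print([xor(a, b) for a, b in product((0, 1), repeat=2)])  # → [0, 1, 1, 0]
```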

It turns out that the NAND function itself is sufficient for defining AND, OR, and NOT (NOR – the negation of OR, in the same way that NAND is the negation of AND – has the same property). This implies that if we have a logic gate that implements the NAND function, we can use it – and it alone – to build chips that implement AND, OR, and NOT functions, and hence any boolean function.

How? (NOT x) can be implemented as (x NAND x). Using this definition, we can write (x AND y) as (NOT (x NAND y)), and (x OR y) as ((NOT x) NAND (NOT y)). Using logic gate symbols, we can represent these gates and some others as follows:



(Warning: the book makes you design all of these.)
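The NAND-only definitions just given can be checked in a few lines (a Python sketch, simulating each gate as a function):

```python
def nand(a, b):
    # 1 unless both inputs are 1.
    return 0 if a == 1 and b == 1 else 1

def not_(a):          # (NOT x) = (x NAND x)
    return nand(a, a)

def and_(a, b):       # (x AND y) = (NOT (x NAND y))
    return not_(nand(a, b))

def or_(a, b):        # (x OR y) = ((NOT x) NAND (NOT y))
    return nand(not_(a), not_(b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", and_(a, b), or_(a, b))
```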

Note that the demultiplexor has two outputs. Hence “demultiplexing” is not a function (a function has only one output), and we call it a chip rather than a logic gate.

Note that the concerns of designing a chip out of NAND gates are not precisely the same as that of specifying the boolean function out of NAND operations. For instance, since we defined (NOT x) as (x NAND x) and (x AND y) as (NOT (x NAND y)), the NAND representation of (x AND y) is ((x NAND y) NAND (x NAND y)). There are three NAND operations, so it looks like we need 3 NAND gates – but no, we can split wires as in the above diagram and do it with two. Similar concerns apply to the implementation of the XOR gate in the above diagram.

There are many subtleties in optimising chips, which we (following the example of Nand2Tetris) will skip in order to get on with our journey.


Multi-bit and multi-way chips

A multi-way version of a basic gate allows for applying the function of the gate to many inputs at once. A multi-way AND outputs true if and only if every input is a 1; a multi-way OR outputs true if at least one input is a 1. The implementation is simple: for an 8-way AND, for instance, take the AND of the first two inputs (call it A), then the AND of A and the third (call it B), then the AND of B and the fourth, and so on.

A multi-bit version of a chip is basically many of those chips in parallel, applying their function to every piece of the input.

For example, a 4-bit AND chip fed 1101 and 0100 as its inputs will output 0100 – the first output is the AND of the first digit of the two inputs, and so on. The implementation is even simpler: send bit 1 of inputs A and B through one AND gate, bit 2 of both through another, and so on.
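As a Python sketch (modelling bits as lists, most significant bit first, is my own choice), a multi-bit AND is just the single-bit gate repeated once per position:

```python
def and_gate(a, b):
    return a & b

def multibit_and(a_bits, b_bits):
    # one AND gate per bit position, all working in parallel
    return [and_gate(a, b) for a, b in zip(a_bits, b_bits)]

# The example from the text: 1101 AND 0100 = 0100
result = multibit_and([1, 1, 0, 1], [0, 1, 0, 0])
```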

It gets a bit more complicated when dealing with multiplexors that are both multi-way and multi-bit, but the basic principle is the same: we have a bunch of binary values that we want to group together (perhaps they represent a number), and so we build chips that allow us to deal with them together.

In addition, we don’t want to deal with the wires representing each binary digit of a number individually, so we group them into “buses” that transfer many bits at once from component to component (basically just a clump of wires, as far as I understand). On diagrams, a bus looks like a wire, except with a slash through it.


Arithmetic

Now that we have constructed our basic gates, we can begin doing something interesting – at least if addition counts as interesting.

Since our chips are built of gates that deal with binary values – true or false, 1 and 0, whatever – any reasonable low-level implementation of arithmetic will be confined to base-2 rather than our standard base-10 number system.

(To convert binary to decimal, just remember that the value of each digit goes up by a factor of 2 rather than 10 as you move from right to left; for example, 1011 (base 2) = 1 x 1 + 2 x 1 + 4 x 0 + 8 x 1 = 11 (base 10).)
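In code, the right-to-left powers-of-two rule is a one-liner (a Python sketch):

```python
def to_decimal(bits):
    # weight each digit by a power of two, starting from 2**0 at the rightmost digit
    return sum(int(bit) * 2**i for i, bit in enumerate(reversed(bits)))
```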

This turns out to make things much simpler. The algorithm for addition is the same (add corresponding digits, output a result and a carry, take the carry into account when adding the next two corresponding digits), but we have far fewer cases:
  • 0 + 0 --> 0, carry 0
  • 0 + 1 --> 1, carry 0
  • 1 + 0 --> 1, carry 0
  • 1 + 1 --> 0, carry 1
We can see that the result bit has the same truth table as the XOR function, and the carry bit has the same truth table as the AND function. Hence, for the simple purpose of determining the result digit and carry bit of two binary digits, the following chip (called a half-adder) is sufficient:
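Behaviourally, then, the half-adder is one XOR and one AND (a Python sketch; the actual chip is of course wired from gates):

```python
def half_adder(a, b):
    # sum bit = XOR of the inputs, carry bit = AND of the inputs
    return a ^ b, a & b
```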



Now we have to figure out how to chain such chips to create a multi-bit adder that can deal with carry bits. Observing that, at most, we will have two input bits and one carry bit to deal with to determine the resulting bit, let’s construct a chip that takes three bits as input and outputs the result and the carry bit. If we add a 0, a 1, and a 1, the corresponding result digit is a 0 and the carry is a 1; if we add a 1, a 1, and a 1, the result bit and the carry bit are both a 1.

The result, called a full-adder, can be constructed like so:



Now that we have a full-adder, we can construct a multi-bit addition chip by simply chaining them together, feeding the carry bit from the previous full-adder as one of the three inputs into the next one.

The only complication is that we have to connect a constant-0 input to the first full-adder to fill its third input, and the final carry bit goes nowhere.

A 4-bit adder looks like this:



This is pretty cool – after all this messing around with logic and logic gates, we have finally built a piece of hardware that does something real.
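The chaining can be sketched in Python like this (the full-adder here is written from its boolean equations rather than from the gate diagram, and the bit-list modelling is my own):

```python
def full_adder(a, b, c_in):
    # three input bits in, a sum bit and a carry bit out
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out

def ripple_adder(a_bits, b_bits):
    """Add two equal-length bit lists (most significant bit first)."""
    carry = 0                      # the constant-0 input to the first full-adder
    result = []
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return list(reversed(result))  # the final carry bit goes nowhere
```

For instance, adding 0110 (6) and 0011 (3) this way yields 1001 (9).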

We still have to consider some issues. The most important is how we represent negative numbers. If we are smart enough about it, this also comes with a bonus: we don’t have to construct a separate chip to handle subtraction, but can instead compute A - B by converting B to the representation of -B and then adding it to A.

The standard method is called the two’s complement method, and it says: in an n-bit system, represent -x as the binary representation of $$2^n - x$$.

For example, let’s say our system is 8-bit. 0000 0000 is 0 (space inserted for readability), 0000 0001 is 1, 0000 0010 is 2, 0000 0011 is 3, and so on. -0 is $$2^8$$ - 0 = 1 0000 0000, but we keep only 8 bits so this turns into 0000 0000 as well. -1 is $$2^8 - 1$$ = 255 = 1111 1111. -2 is $$2^8 - 2$$ = 254 = 1111 1110. And so on.
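The rule is easy to sketch in code (the modulo handles the -0 case, which wraps back to all zeroes):

```python
def twos_complement(x, n=8):
    # represent -x in an n-bit system as 2**n - x, keeping only the lowest n bits
    return format((2**n - x) % 2**n, f'0{n}b')
```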

Another example: From Nand to Tetris presents the complete table for a 4-bit system:
 

The most important consequence of this is that our addition circuit can properly add negative and positive numbers together. -1 + 1 would be represented as inputting two buses, one containing 1111 1111 and the other 0000 0001, to our addition circuit. Adding these up gives 0000 0000 (since our adder only deals with the first 8 bits), or 0, just as intended. -1 + -1 becomes 1111 1110, which is -2. The reader can verify other examples if they so wish.

Another consequence of this is that the largest number we can represent in our n-bit system is $$2^{n-1} - 1$$, and the most negative number is $$-2^{n-1}$$. Our 8-bit system only gives us the integers -128 to 127 to play with, but the range grows exponentially with the number of bits. In a 16-bit system, we can represent the numbers -32768 to 32767; in a 32-bit system, -2 147 483 648 to 2 147 483 647; in a 64-bit system, -9 223 372 036 854 775 808 to 9 223 372 036 854 775 807.

(Of course, when necessary we can implement logic for handling even larger numbers at a higher level in the computer, just as we can implement logic for handling any other data type).

Yet another consequence is that if we add together positive numbers whose sum exceeds the limit, the result will be a negative number (large enough negative numbers will likewise add to a positive). This feature has led to countless incidents, with some of the more notable ones (ranging from exploding rockets to nuke-obsessed Gandhis in the Civilization game series) listed in this Wikipedia article. As always: beware leaky abstractions.
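A quick sketch of the wraparound (the helper is my own, not from the book):

```python
def to_signed_8bit(x):
    # interpret the low 8 bits of x as a two's-complement value,
    # exactly as an 8-bit adder would
    x &= 0xFF                    # the hardware keeps only 8 bits
    return x - 256 if x >= 128 else x

overflowed = to_signed_8bit(100 + 100)   # the true sum, 200, doesn't fit in 8 bits
```

Here `overflowed` comes out as -56 rather than 200: the adder happily produced the bit pattern of 200, but under the two’s complement reading that pattern is a negative number.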

The Arithmetic Logic Unit

The final piece of our journey that relies on logic gates alone is the construction of an arithmetic logic unit (ALU). Though it has a fancy name, all it does is implement some basic operations we will need: adding, ANDing, negating, and zeroing input buses.

The ALU constructed in Nand2Tetris operates on two 16-bit buses, and also takes 4 bits to configure the inputs (to determine whether to negate and/or zero the x and y inputs), 1 bit to change the function (switches between ANDing and adding the 16-bit input buses), and 1 bit to determine whether or not to negate the output. In addition to outputting a 16-bit bus with the result of the computation, it also outputs 1 bit saying whether or not the output is zero, and another saying whether or not it’s a negative number.

(Note that “negating” a binary bus means flipping each bit (e.g. 0101 --> 1010), and is a different process from switching the sign of a number (e.g. 0101 (5) --> 1011 (-5) with the two’s complement method))

My implementation of such an ALU looks like this (for clarity, I have first defined a separate “ALUPreP” ALU pre-processor chip to perform the negating/zeroing of the x/y inputs):


Slashes on lines indicate a 16-bit bus, not a single wire. The 16s on various chips indicate that they are 16-bit versions of the chip (the AND-gate marked MW 16 is instead a multiway chip; see above discussion on multibit vs multiway chips). The splitting of the first bit from the result bus is a bit questionable as a chip design element, but it works since in the two’s complement method all negative numbers begin with a 1.

 
Having a bunch of bits to zero and negate outputs and inputs and whatever may seem pointless. However, such a design allows us to compute many different functions in only one chip, including x + y, x AND y, x OR y, -x, x+1, y-x, and so on. From Nand to Tetris provides a table (where the notation & = AND, | = OR, and ! = NOT is used):




Remember that the only piece of hardware needed to implement all of this is the humble NAND gate.

(To give a sense of scale: by my count, my ALU design above requires 768 NAND gates (768 happens to be 2^9 + 2^8, but this is just a coincidence).)

Flip-flops

I mentioned earlier that only two pieces of hardware are required to implement most of our computer. In the previous section, we examined what we can do with NAND gates; now, we will turn to flip-flops (no, not that type of flip-flop).

NAND gates allow us to perform any feat of (binary) logic that we wish, but they do not allow for memory.

The way in which the Nand2Tetris computer implements memory is with a data flip-flop (DFF). The principle of a DFF, like a NAND gate, is simple: its output at one “tick” of the computer’s central clock is its input at the previous tick.

Thus, to add DFFs to our computer, we need to assume the existence of some type of clock, which broadcasts a signal to all DFFs. This allows us to divide time into discrete chunks.

Real electronics always involves a delay in passing the signal from one component to another. Thus, when we pass inputs to our ALU, there’s a brief moment before the ALU stabilises to the true result. Inputs arriving from different parts of the computer also do not arrive simultaneously. Dividing time into ticks allows us to abstract away all such concerns (as long as the ticks are long enough for everything to stabilise); all we care about is the state of the computer at each tick, not what happens in between two ticks while the state is transitioning.

A DFF and a multiplexor (a logic gate with two inputs and one selector bit, outputting the first input if the selector bit is 0 and the second if the selector bit is 1) can be combined to create a 1-bit register:



The operation of this register is as follows:
  • If the selector bit is 1, the DFF’s output at time t is the input value at time t-1.
  • If the selector bit is 0, the DFF’s output at time t is its output at time t-1.
Hence, we can set a value (either a 0 or a 1) into the 1-bit register, and it will keep outputting that value until we tell it to change to a new value (by sending it the new value and sending a “1” as the input to the multiplexor’s selector bit).
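A behavioural sketch of this register in Python (the class and its names are mine; the real thing is just the multiplexor-plus-DFF loop described above):

```python
def mux(a, b, sel):
    # the multiplexor: output a if sel is 0, b if sel is 1
    return b if sel else a

class BitRegister:
    def __init__(self):
        self.state = 0                 # the DFF's stored value

    def tick(self, value, load):
        out = self.state               # output this tick = DFF input last tick
        # load=1 routes the new value in; load=0 loops the old output back
        self.state = mux(self.state, value, load)
        return out
```

Once a 1 is loaded, calling `tick(0, 0)` forever keeps outputting 1, until a tick with `load=1` sets a new value.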

Of course, a storage of 1 bit doesn’t allow for very many funny cat GIFs, so clearly there’s still some work to be done.

The first thing we do is make the registers bigger, simply by adding many 1-bit registers in parallel. Most useful elements on which we do computations (numbers, letters (which are stored as numbers), etc.) take more than one bit to specify, so it’s useful to split memory into chunks – 16-bit chunks in the case of the Nand2Tetris computer.

Next, let’s take many of these bigger registers and put them in parallel with each other. The problem now is accessing and setting registers in the memory independently of each other. We can add a series of address bits as inputs to our memory chip and build some circuitry so that the output is the contents of the register at the address specified by the address bits, and so that any new input we load goes into the register at that same address.

A simple memory unit of four 16-bit registers, each uniquely identified by 2 address bits (the addresses are 00, 01, 10, and 11), and its control logic can be implemented as follows:

Less complicated than it looks. The input is split so that it reaches every register. If load=1, the three multiplexors on the left route it to one of the registers, and the input is loaded into that register (if load=0, nothing is meant to be loaded this tick and all load?-inputs into the registers are 0, so nothing happens). The register output buses are passed through a series of multiplexors to select which one's output is finally sent out of the memory chip.

To construct larger memory chips, all we need to do is add registers and expand our address access logic. If we want a memory chip with, say, 64 registers, we need log2(64) = 6 bits to specify which address we are talking about, and hence 6 address bits (n address bits give you 2^n unique addresses, which is why computer memory sizes are usually powers of 2: 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, etc.).
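A behavioural sketch of such a memory unit (the class and its names are mine; the real control logic is built from multiplexors, as in the diagram):

```python
class RAM:
    def __init__(self, n_registers):
        self.registers = [0] * n_registers

    def access(self, address, value=0, load=0):
        # the address bits select which register is read...
        out = self.registers[address]
        # ...and, when load=1, which register receives the input
        if load:
            self.registers[address] = value
        return out
```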

Since we can access and set any part of the memory at the same speed and in any order, we call this “random access memory” (RAM). RAM is not permanent memory – maintaining state requires DFFs constantly passing looping signals to each other, which in turn requires power. Turn off power, and you lose memory contents.

(RAM based on DFFs is called static RAM or SRAM, and is faster but more expensive and power-hungry than the alternative capacitor-based DRAM design. Hence SRAM is mainly used for small CPU caches (in the kilobyte or single-digit megabyte range), while the main system memory – what people generally think of when they hear “RAM” – uses DRAM, with capacities in the one or two-digit gigabytes.)

Nand2Tetris does not examine the functioning of hard disk drives (HDDs) or solid state drives (SSDs) or other more permanent data storage devices.

Instruction sets & machine language

So far, using nothing but NAND gates and DFFs (and, well, wires and buses and clocks and so on), we have figured out how to:
  • perform arbitrary logic operations on binary data (and hence also perform basic arithmetic in base-2), and
  • store arbitrary binary data in memory of arbitrary size.
All this arbitrariness gives us a lot of power. Let’s put it to use.

The next thing we want to implement is the ability to give our computer instructions. We have already shown that it is possible to build chips that carry out arithmetic on binary numbers, so if we encode instructions as binary strings, we can identify and handle them just fine (though the control logic is complex).

In Nand2Tetris, two basic types of instructions are implemented, each 16 bits long. I list them here to give you an impression of what they’re like:
  • If the first bit is a 0, the next 15 bits are interpreted as a memory address (in a memory of 2^15 = 32768 addressable 16-bit words), and the contents in memory at that point (a 16-bit value in our 16-bit system) are loaded into a special address register inside our CPU.
  • If the first bit is a 1, then:
    • Bits 2 and 3 do nothing.
    • Bit 4 determines whether the second input we will pass to the ALU is the contents of the address register, or the contents of the memory location that the address register points to (the first input to the ALU is always the contents of a special data register in our CPU).
    • Bits 5-10 are the 6 bits passed to the ALU to determine which function it will compute on its inputs (see ALU table above).
    • Bits 11, 12, and 13 determine whether or not to send the output of the ALU to, respectively: i) the address register, ii) the data register, and/or iii) the memory location that the address register points to.
    • Bits 14, 15, and 16 determine which of 8 (=2^3) possible jump conditions apply. If all three bits are zero, do not jump (the next instruction executed is the next one in the program); the remaining 7 possibilities encode things like “jump if ALU output is negative”, “jump if ALU output is zero”, “jump regardless of ALU output”, and so on. The destination of the jump is always to the instruction in instruction memory* with the address currently in the address register.
(*For simplicity, the computer design in Nand2Tetris features a separate memory space for the instructions that make up the program and for the data the program operates on)

The Nand2Tetris CPU instruction set is a rather minimalist one, but even so it allows for real computation.

If you need convincing that this is true, consider a program that adds all numbers from 1 to 100. An annotated instruction sequence which achieves this is given below, interspersed with what the code might look like in a more readable language:

D refers to the data register, A to the address register, and M[A] to the memory contents that the address register points to. Before each machine language segment are one or two lines of higher-level code which may translate into the machine code underneath. The choice of M[0] and M[1] as the places where we store the two variables is arbitrary.
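In a modern high-level language, the same logic is only a few lines; something like this Python sketch is what the machine-language listing amounts to:

```python
# Sum the numbers from 1 to 100 with a running total and a counter,
# mirroring the two memory locations used in the machine-language version.
total = 0     # stored in M[0] in the machine version
i = 1         # stored in M[1]
while i <= 100:               # the loop condition becomes a conditional jump
    total += i
    i += 1
```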

Such a list of instructions is called machine language.

With machine language, we have finally risen from the abyss of hardware to the surface world of software. Having achieved this, all that remains (to misquote Winston Churchill) is the proper application of overwhelming abstraction.

Assemblers & virtual machines

Machine language, though powerful, suffers from a significant flaw: no one wants to write machine language.

Thankfully (these days), practically no one has to.

The first layer we can add on machine language is ditching the ones and zeroes for something marginally more readable, but keeping a one-to-one mapping between statements in our programming language and machine instructions. For example, instead of “0000 0000 0010 1011” to load the contents of memory location 43 into the address register, we write “LOAD 43”, and use a program that converts such statements to the machine language equivalents (if such a program does not exist yet, we have to do it manually).
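A toy assembler for just this one statement type might look like the following sketch (the `LOAD` mnemonic is this text’s example; the actual Nand2Tetris assembly syntax differs):

```python
def assemble(statement):
    # Translate "LOAD n" into a 16-bit instruction: a leading 0 bit marks
    # an address instruction, followed by the 15-bit binary address.
    mnemonic, operand = statement.split()
    if mnemonic == "LOAD":
        return format(int(operand), '016b')
    raise ValueError(f"unknown mnemonic: {mnemonic}")
```

Running `assemble("LOAD 43")` reproduces the bit pattern from the example above.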

We can also write programs that let us define variables as stand-ins for memory addresses or data values, and then convert these to the corresponding memory locations for us. Another massive benefit for the programmer comes from ditching the insistence on a one-to-one correspondence between lines and machine instructions: a single statement in a high-level language can translate into many machine language instructions.

Programming languages that retain a strong correspondence between statements and the computer’s machine language are termed assembly languages. The program that performs the work of converting an assembly language into machine language is called an assembler.

In general, a program that converts another program written in language A into a version that runs in language B is called a compiler. The process of running any program eventually ends with it being compiled, often through many intermediate steps, into machine language.

Virtual machines

Often, we want our programs to work not just on one processor type and one computer, but on many computers with different processors and hence possibly different underlying instruction sets.

One way to achieve this is to have a specification for a virtual machine (VM). For any language that we want to implement, we write a compiler that converts that language into instructions that the VM can understand. For all computer platforms we need to support, we write a compiler that converts the VM instruction language into the machine language of that computer platform.

This means that whenever we want to add a new programming language, we need to write only one compiler (language -> VM), rather than having to write separate compilers for every platform (assuming VM -> platform machine language compilers exist for every platform). The standardised VM specification serves as a station through which all programs pass on their journey to execution.

(The Java Virtual Machine (JVM) is one famous – or infamous – example of a VM.)

There are two main models for a VM: register-based and stack-based. Nand2Tetris implements a stack-based VM.

A stack is a data structure on which we have two operations: push, which adds an element to the top of the stack, and pop, which returns and removes the topmost element. We also assume we can implement functions that pop two elements off the stack, add them together, and push the result onto the stack.

The first surprising fact about stacks is that it is possible to compute any arithmetic expression with stacks. From Nand To Tetris gives an example:


Likewise, we can determine the truth or falsity of logical expressions by implementing, for example, an equal-to function that pops two elements off the stack, pushes “true” to the stack if they’re equal, and pushes “false” otherwise (boolean values may be implemented as false being 0 and true being any other value, for instance).
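A minimal stack machine covering both cases (arithmetic and a comparison) can be sketched like this (the operation names are mine, not the book's VM commands):

```python
# A tiny stack machine: push operands, then each operation pops its
# arguments off the stack and pushes its result back on.
stack = []

def push(x):
    stack.append(x)

def add():
    b, a = stack.pop(), stack.pop()
    push(a + b)

def mul():
    b, a = stack.pop(), stack.pop()
    push(a * b)

def eq():
    b, a = stack.pop(), stack.pop()
    push(1 if a == b else 0)   # true as a non-zero value, false as 0

# Evaluate (2 + 3) * 4 == 20 in postfix order:
push(2); push(3); add()        # stack: [5]
push(4); mul()                 # stack: [20]
push(20); eq()                 # stack: [1]
```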

The second surprising fact about stacks is that they also lend themselves naturally to modelling program flow.

Consider what flow control must do. Most simply, the flow must split if we have an if-statement. Even our machine language can handle this with its jump commands.

The trickier part is calling functions. If a program encounters a function call while executing, the program should now switch to running the commands that make up that function, but not before figuring out what the arguments to the function are supposed to be. After the function has finished executing, its value should be returned to the expression that was executing when it was called, and execution should continue where it left off before the function call.

All of this gets more complicated when we consider that the function we call may itself call many different functions and these functions may call the original function yet again, or that some functions may call themselves, and so on. How can we possibly implement such complicated logic with a stack machine?

Consider – what is the natural representation of a nested series of function calls? A stack. We want the ability to add things to the top of the called-function stack when new functions are called (pushing), and removing them from the stack when they finish (popping).

The details of implementing function calling are rather complex. The Nand2Tetris implementation involves maintaining various special memory segments, used for things like function arguments, local variables, and so on, which I will not discuss here.

It should be noted that though VMs are a useful abstraction, they are not strictly necessary and many compilers still work by compiling directly to machine language.

High-level languages & operating systems

Now that we have our VM, we can start implementing high-level programming languages (ones that are made for humans to write, not computers to execute) on top of it.

The implementation process typically looks like writing a specification for a language and then writing a compiler that translates the new language into VM or machine code that does what the language specification says it should do. Alternatively, we can have an interpreted language, where a lower-level programming language determines the values and effects of statements in the new language on the fly, though this is typically far slower.

Armed with a high-level language, we can now implement an operating system in a reasonable amount of time and without driving the developers crazy (imagine writing 100 000 lines of code for a stack machine, let alone in machine language).

(However, to implement an operating system, it is also important that the programming language is sufficiently low-level that we can deal with hardware with the required finesse.)

The task of an operating system is to provide commonly-needed functions like mathematical operations (in the Nand2Tetris platform, only addition is implemented in hardware, so even multiplication must be implemented in software), string handling, parsing of input from other devices, sending instructions to other devices, and allocating time on the processor. Imagine if the programmer had to specify exactly which pixels on the screen should be black to render the letter ‘a’ – not fun.

We’ve come a long way – and skipped steps here and there for brevity – but now we’re finally ready to implement Tetris.

Credit to Randall Munroe of xkcd. Original can be found here.

 

Tetris not included

The book’s website has free software for implementing (a software emulation of) everything discussed in the book. In theory, this allows the reader to build (an emulation of) the computer hierarchy specified in the book, all the way from logic gates to a simple Pong game (in an example of patently criminal false advertising, Tetris is never actually implemented in the book).

In practice, I found the software to be rather slow and finicky to use (it might run better in Windows). Also, if you follow along with the book, you won’t actually build a self-contained computer emulation yourself, but rather go through the implementation level by level, implementing each level yourself but using the pre-built implementation for all levels below. This is a rather trivial difference, but I find it does remove some of the motivation from the project (though I am sure that doing so would still be of enormous educational value).

After deciding to ditch the companion software, I used Racket (a language in the Lisp family that comes with a very friendly IDE) to write code that emulates logic gates. Once this was no longer sufficient, I followed the book with circuit diagrams for a few more chapters, after which I resorted to just reading (often many times, until things sunk in).

I found the first part of the book to be the most interesting (you may have noticed that my discussion of chips and ALUs was much more in-depth than that of later topics; rest assured that this is my own personal bias rather than the book’s).

The primary value of From Nand to Tetris is that it dispels any sense of magical mystery surrounding the functioning of a computer. After you’ve designed the hardware and implemented a machine language, all that follows is the implementation of ever fancier languages and programs on top of those. If you have decent programming experience, though you may have no idea about the details, it does not require you to suspend much disbelief that such a thing is possible. I read From Nand to Tetris after having already spent a lot of time programming, but having only the most rudimentary understanding of logic gates and no clue what DFFs were; hence the first chapters were eye-opening, and while the later chapters contained a wealth of new information, they weren’t captivating in the same way.

I wholeheartedly recommend From Nand to Tetris for people who have done some programming and want to see how such a miracle can be implemented in hardware in the first place. The first part of the book is remarkably self-contained, assuming no knowledge of electronics (logic gates and DFFs are assumed as primitive objects and the details of their construction not discussed), and though prior exposure to boolean logic is helpful, it is not necessary.

The key point to understand about From Nand to Tetris is that its goal is to get you from NAND-gates and DFFs to operating systems and high-level languages as quickly as possible. It almost entirely omits discussion of alternative design decisions, though entirely different architecture possibilities exist at virtually every step of the journey. Optimisation is not discussed. If you want an in-depth understanding of some part of the journey, you are better off reading something else.

If you want to see how some steps are even possible, or to see one complete example of what the entire computer technology stack looks like at a (300-page) glance, then From Nand to Tetris is an impressively complete and concise guide.

If you’re ever stuck on a desert island with nothing but a large supply of NAND-gates and DFFs, electrical wiring, a copy of From Nand to Tetris, and a lot of patience, don’t worry – just build a computer, program it, and play Tetris to pass the time.

2019-08-01

Review: Amusing Ourselves to Death

Book: Amusing Ourselves to Death: Public Discourse in the Age of Show Business, by Neil Postman (1985)
4.8k words (≈16 minutes) 

At first glance, Amusing Ourselves to Death seems neither interesting nor relevant. It is a short book, published in 1985, about the effect of television on public discourse in the United States. However, take a second glance.

In his oft-quoted foreword, Postman writes:
Contrary to common belief even among the educated, Huxley and Orwell did not prophesy the same thing. Orwell warns that we will be overcome by an externally imposed oppression. But in Huxley’s vision, no Big Brother is required to deprive people of their autonomy, maturity and history. As he saw it, people will come to love their oppression, to adore the technologies that undo their capacities to think.
What Orwell feared were those who would ban books. What Huxley feared was that there would be no reason to ban a book, for there would be no one who wanted to read one. Orwell feared those who would deprive us of information. Huxley feared those who would give us so much that we would be reduced to passivity and egoism. Orwell feared that the truth would be concealed from us. Huxley feared that the truth would be drowned in a sea of irrelevance. Orwell feared we would become a captive culture. Huxley feared we would become a trivial culture […].
[…]
This book is about the possibility that Huxley, not Orwell, was right.
I have not read Brave New World, so I cannot vouch for the accuracy of this comparison. But, at least in Postman’s telling, Huxley’s fears seem more relevant than ever.

Countless bookshelves, newspaper stands, and – these days – AWS servers groan under the weight of critiques about the effects of various information technologies on society. Ironically, these critiques themselves are often shallow, hackneyed, and repetitive, and consequently they drown each other out in a sea of irrelevance much like the one they so doggedly warn us of.

This is not to say that there is no problem. Television, YouTube, and social media, while all useful technologies, have also distorted parts of daily life and public discourse for the worse. However, thoughtful discussion of the effects of new mediums on society is rare (some of Cal Newport’s writings come to mind, though they focus exclusively on the internet’s impact on individuals’ lives).

Enter Neil Postman, with his topical yet three-and-a-half-decade-old Amusing Ourselves to Death.


Mediums at large

Let’s say you have a revolutionary idea and you want to get it out to people. You write a book, make a TV documentary, and do a TED talk about your brilliant insight that, say, we should eat more apples and less oranges.

What does the book look like? First, it has to be long, otherwise it isn’t a real book, so if it falls much below 250 pages you pad it out with some redundant repetition, more case studies, and an extra-extended notes section if you’re really desperate (Postman thankfully does not fall prey to this temptation). You put in a lot of effort into structuring your argument, all the way from the sentence-to-sentence to the chapter-to-chapter level. Maybe the first third is on the foundations of fruitology going back to ancient times, complete with Plato quotes, the second third involves meticulously compiled statistics together with countless case studies of apples and oranges in the wild and quotes from leading economists, and the third third wraps up with a look at future trends, an eloquent plea for action, and a closing quote from some pretentious nineteenth century poem. You cannot lean much on images, let alone video or music, so all the way through you have to make sure that the words alone carry the argument (as well as the reader’s attention).

What does the TV documentary look like? It opens with a dramatic time-lapse shot of an apple tree against a setting sun. “Apples”, intones the narrator in a silky smooth voice, while shots of anything even tangentially apple-related, all the way from the Macintosh computer to Isaac Newton, flash across the screen. Cut to a slightly crazed professor talking about the importance of apples. Continue this way for fifteen minutes, at which point your program is interrupted for advertisements. You spend the next five minutes explaining everything again, using re-cut versions of the exact same shots (both to establish familiarity and because footage is expensive). Then, cut to a foreboding image of an orange tree blotting out the sun, as ominous music begins to play. Cut to another professor who says a few words (literally) about the dangers of oranges. Continue cutting back and forth between talking heads and other footage for the duration of the program. Make sure that a half-asleep viewer tuning in midway through still manages to catch all points.

What does the TED talk look like? You stand up on stage and recite precisely memorized lines summarizing your argument. No, you don’t recite – you perform. Wait, no, you don’t perform, you hypnotize the audience with your perfect elocution and poise. Talk as if you were casually summarizing the topic to a friend, except you had coincidentally spent the past six months devising the wittiest argument against oranges the world had ever seen, and you know your friend will be stationary and mute for fifteen minutes. Keep precisely to the time limit. Close with a one-liner worthy of Winston Churchill, for instance: “it’s not like comparing apples to oranges, is it?”

It’s obvious that different mediums differ. Postman’s argument is that such differences are deeper than they appear.

Written text (what Postman somewhat clumsily terms the “typographic medium”) is not very visually engaging, unless you use a font that kills any chance of anyone reading what you write. Text cannot be written in real-time, so an interaction with a text is always also an interaction with the past. Text is impersonal. We have evolved to pick up on a lot of non-verbal information when talking with someone; text provides none. Finally, in any exclusively verbal medium it is very hard not to say something. A grammatical sentence generally makes some sort of proposition. These features of the written word make it well-suited for abstract ideas and rational debate.

Television is different. Most obviously, it is less impersonal because it conveys appearance and voice. Postman further argues that by enabling near-instantaneous information transfer, live-streaming of footage, mood-setting music, and easy joining of segments, it changes our model of information to a shallower and more disjointed one.

The natural endpoint of such a medium, Postman claims, is something like the following description of televised news (by a co-anchor and executive editor of an American news show):
[The idea] is to keep everything brief, not to strain the attention of anyone but instead to provide constant stimulation through variety, novelty, action, and movement. You are required […] to pay attention to no concept, no character, and no problem for more than a few seconds at a time.
Though Postman is correct in pointing out differences in the nature of mediums, he largely ignores the impact of business and economics on how a medium is used.

I think this might be because in the 1980s it was difficult to imagine a decentralized multimedia medium, and hence it was reasonable to assume that any high-tech medium would be subject to the same pressures to pander to the lowest common denominator and sacrifice everything for the sake of entertainment in order to maximize viewer retention. The internet changed this. Much of it is governed by big companies that seek “user engagement” with greater zeal than ever, but its cheapness also allows for much else to exist in the cracks. This makes the difference between the inherent worth of a medium and its equilibrium business model clearer.


Justice and the medium

Here’s Postman’s example of justice in an oral culture:
When a dispute arises, the complainants come before the chief of the tribe and state their grievances. With no written law to guide him, the task of the chief is to search through his vast repertoire of proverbs and sayings to find one that suits the situation and is equally satisfying to both complainants. That accomplished, all parties are agreed that justice has been done, that the truth has been served.
Just as premodern ideas of justice leaned heavily on what works in an oral medium, the modern western justice system leans heavily on the typographic medium. Laws are quintessential prose: they are detail-heavy written texts unadorned by poetic meter or other rhetorical devices and hence difficult to memorize (as many a law student no doubt knows). As a result, building a legal system with laws (of the modern kind) as its basis is virtually impossible in an oral medium.

The reliance of the rule of law on the typographic medium goes deeper. Because written sources cannot be created on the fly in the same way that speech can, typography is an inherently backwards-looking medium. Likewise the legal system is all about analyzing previously-written laws and precedents. Capturing any thought in the unchanging written medium also allows for a level of analysis that is not possible in an oral medium. Likewise legal work relies on close textual analysis of a type that people in an oral culture would probably regard as obsessive.

Of course, typography is not the only medium used in courts. Postman notes that testimony is still given orally, and a part of oral culture survives in the belief that spoken testimony is more reliable than written testimony. Even so, the laws themselves are written, and the process of their interpretation is as typographic as culture can get.

In the second part of Amusing Ourselves to Death, Postman talks about the influence of television on religion, politics, advertising, and education, but not justice. I think this is because television is not a natural medium for justice. As Postman emphasizes, the strength of television is amusement, and amusement has little to do with justice.

What about the internet?

Postman points out that Huxley’s dystopia has come closer to fruition: if our culture is in danger, it is because of distraction and amusement, not control and hate. However, the internet has turned out to be a uniquely powerful conduit for hate. Television’s capacity for hate is limited by the fact that it must optimize for mass appeal, but social media algorithms can target individuals. Online echo-chambers dedicated to enforcing an in-group identity and hating on a perceived out-group have much more to do with Orwell’s Two Minutes Hate than anything in a Postman/Huxley-esque amusement dystopia. And hate – or, perhaps more accurately, outrage – often leads to demands for justice.

If oral justice looks little like our modern typographic justice, then we should not be surprised that the court of Twitter is very different from any current court.

No one, of course, is suggesting that the judicial process should relocate to the internet. But the danger with a new medium is never so overt. The danger Postman warns of is not so much that we decide to replace education and politics with entertaining TV programs, but that the customs and metaphors of a TV program will leak into the institutions of any culture that makes television its predominant medium. Likewise, the danger for justice is not that real courts will be overrun by the Twitter Supreme Court™, but that the culture of online outrage mobs becomes the way in which we talk about justice, and hence eventually the way we carry out justice.

Real justice requires painstakingly detailed interpretation of the law, slow and careful collection of evidence, time for everyone to be heard, reasoned debate, and careful, precedent- and evidence-backed decision-making. Each of these elements is well-served by the current mix of typographic and oral mediums. Each of these elements is opposed to the short timescales, groupthink, and vanity of social media.

Postman describes the trial of Socrates:
[Socrates] tells his Athenian brothers [at the beginning of his trial] that he will falter, begs that they not interrupt him on that account, asks that they regard him as they would a stranger from another city, and promises that he will tell them the truth, without adornment or eloquence. Beginning this way was, of course, characteristic of Socrates, but it was not characteristic of the age in which he lived. For, as Socrates knew full well, his Athenian brothers did not regard the principles of rhetoric and the expression of truth to be independent of each other. People like ourselves find great appeal in Socrates’ plea because we are accustomed to thinking of rhetoric as an ornament of speech – most often pretentious, superficial and unnecessary. But to the people who invented it, the Sophists of fifth-century B.C. Greece and their heirs, rhetoric was not merely an opportunity for dramatic performance but a near indispensable means of organizing evidence and proofs, and therefore of communicating the truth.
To disdain rhetorical rules, to speak one’s thoughts in a random manner, without proper emphasis or appropriate passion, was considered demeaning to the audience’s intelligence and suggestive of falsehood. Thus, we can assume that many of the 280 jurors who cast a guilty ballot against Socrates did so because his manner was not consistent with truthful matter, as they understood the connection.
What if the customs of social media become the new rhetoric? Some will no doubt benefit. Socrates wouldn’t.


Who are we to say what is better?

Today we believe ourselves superior to those who lived in oral cultures before us – closer to the truth and closer to what is just. At the same time, we look down on the next medium shift. This is suspiciously similar to the general (and generally incorrect) tendency to believe that all change up to our day was good and that any future change is bad. Can we argue in any absolute sense why a typographic medium is better than alternatives? More broadly, Postman asks, can we say whether any particular medium is better than another? Might our typographic laws or quantitative worldview be fundamentally no different from someone else’s oral laws or creation myths?

Postman devotes some space to arguing for the virtues of typography; I have summarized some key points in a previous section. However, in the general case Postman leaves the question unanswered, pausing only to reassure the reader that he’s not a total cultural relativist.

I think we can do a bit better.

Humans are flawed. If we wish to aspire towards ideals like truth or justice, we need to be very careful not to be led astray by our baser tendencies: we must avoid falling prey to groupthink or charisma, avoid resorting to moral outrage before careful analysis, and resist the temptation of a good story over the literal truth.

Postman writes: “Television’s strongest point is that it brings personalities into our hearts, not abstractions into our heads.” In a way, this makes television very human: we desire to have personalities in our hearts more than to have abstractions in our heads. Social media, carefully optimized to cater to our every desire, is even more human.

However, these mediums are all too human. Look carefully at a television program, social media feed, or an oral performance, and what stands out is not the shape of any abstract ideal, but the shape of the human desires into which the content has been bent.

Therefore, if it’s truth or justice or some other ideal that you’re after, you should expect to have to work in an impersonal medium.

Ancient creation myths filled the universe with human figures – jealous lovers and loving fathers and whatnot. Later it turned out these myths were more like mirrors than telescopes, and only by ditching them in favor of real telescopes and the cold abstractions of mathematics could we make progress.

Ever more human mediums give us much in the way of connection and happiness. At the same time, they also privilege our intuitions and preconceptions. It is worth keeping in mind that our intuitions are not truth and our preconceptions are not justice.


Consistently inconsistent
Indeed, the President continues to make debatable assertions of fact but news accounts do not deal with them as extensively as they once did. In the view of White House officials, the declining news coverage mirrors a decline in interest by the general public.
This is an excerpt from a New York Times article which Postman quotes. The year is 1983 rather than 2019, and the president is Reagan rather than Trump, but the parallels are uncanny.

Another quote about Reagan:
If a movie […] could be made from his misleading accounts of government policy, if there were a break-in of some sort or sinister characters laundering money, attention would likely be paid. We do well to remember that President Nixon did not begin to come undone until his lies were given a theatrical setting at the Watergate hearings. But we do not have anything like that here. Apparently, all President Reagan does is say things that are not entirely true. And there is nothing entertaining about that.
Likewise, the greatest Trump scandal (at the time of writing) is his potential dealings with nefarious Russians, not his routine disregard for truth or consistency.

Consistency, in Postman’s view, is a value of a typographic culture. An unchanging record of your words makes inconsistency stand out. Though today Twitter provides a public backlog of anyone’s tweets, it seems the inherent disjointedness of the medium nevertheless de-emphasizes consistency.


Government of the medium, by the medium, for the medium

The use of video biases the political process in several ways, most notably towards appearances. A political candidate benefits from being fit, tall, and good-looking. And god forbid you seem anything less than maximally relatable when you talk.

Medium-driven bias in leader selection is not at all a modern phenomenon, though Postman often gives this impression. Postman examines debates between Abraham Lincoln and his opponents as a paragon of rational public discourse. However, the selection process that chose Lincoln was biased towards selecting people who could rouse crowds with immortal speeches.

Of Churchill, it has been said that “[he] lived by phrase-making. He thought rhetorically, and was constantly in danger of his policy being made by his phrases rather than vice versa” (the quote is from Roy Jenkins, as cited here).

Churchill was certainly a capable leader, but his predilection towards what makes for a good speech (or witty anecdote) likely led to at least a few mistakes he wouldn’t otherwise have made. What else do you expect of a great leader chosen during the age of radio?

The overlap between good leadership and good speechmaking is no doubt much larger than the one between good leadership and good looks, but it is not a perfect one. We should expect the biases of our selection process to be echoed in the mistakes of our leaders.

Where Postman is certainly correct is in arguing that rational public debate is the best tool we have, and that anything that attacks the foundations of such debate is sure to make things worse.


What is wrong with the buyer

Postman’s brilliant analysis of television commercials is worth quoting at length:
Indeed, we may go this far: The television commercial is not at all about the character of the products to be consumed. It is about the character of the consumers of the products. Images of movie stars and famous athletes, of serene lakes and macho fishing trips, of elegant dinners and romantic interludes, of happy families packing their station wagons for a picnic in the country – these tell nothing about the products being sold. But they tell everything about the fears, fancies and dreams of those who might buy them. What the advertiser needs to know is not what is right about the product, but what is wrong about the buyer. And so, the balance of business expenditures shifts from product research to market research.
In short: “One can like or dislike a television commercial, […] [b]ut one cannot refute it.”
This is in marked contrast to any written piece. To set something down in words is to set it up for refutation. In contrast, to present something in a video laden with catchy music, emotional footage, and charismatic people is the surest way yet discovered of obscuring the propositions being made.

Postman provides an example of a late-1700s newspaper advertisement:
Whereas many persons are so unfortunate as to lose their Foreteeth by Accident, and otherways, to their great Detriment, not only in Looks, but Speaking both in Public and Private:—This is to inform all such, that they may have them re-placed with false Ones, that look as well as the Natural, and Answers the End of Speaking to all Intents, by PAUL REVERE, Goldsmith, near the Head of Dr. Clarke’s Wharf, Boston.
If you’re used to online video advertisements, this is a breath of fresh air: non-intrusive (and silent!) text, stating a target audience and how they may benefit. Of course, exaggeration, emotional appeals, appeals to authority, or any number of fallacies are possible in a typographic medium (Postman gives slogans as an example of an anti-rational yet pervasive element of written advertisements). But when the words are there, abstract and silent and unchanging, it is far easier to subject them to scrutiny.


Knowledge is strength, ignorance is knowledge

Postman also worries about the influence of television on education. I think this part of the book has not aged as well as the rest. Worrying about Sesame Street as the leading example of the future of education is rather quaint.

Earlier I mentioned that a major difference between television and the internet is that the former must appeal to a large audience, while the latter is perhaps the most niche-appealing medium humans have ever devised. This results in splintering and, in many cases, divergence.

Many parts of the internet do promote a shallow view of learning, and the pervasiveness of such sites likely has a negative effect on our culture. At the same time, the internet gives us Wikipedia and Khan Academy, just to name two prominent examples. Previously any number of barriers could prevent you from learning something. Today the only barriers are willpower and time.

Some online learning resources are also great examples of innovative medium use. Red Blob Games provides interactive visualizations of common algorithms. YouTube channels like Mathologer and 3Blue1Brown make good use of the visual capabilities of video to explain math. Michael Nielsen writes about even more ambitious new mediums.

However, it is easy to make excessively bold claims. Every new medium and communication technology has been prophesied as a revolutionizer of education. Yet education still consists of reading, listening, and taking notes. Sure, the reading is easier than ever to find, the listening is occasionally to a video, and the note-taking occasionally on a tablet, but the process itself has changed little. It’s almost as if what really matters is what goes on in the student’s head.

To summarize:
  1. New mediums can make learning more accessible, and even introduce new mental tools or modes of thought.
  2. With the possible exception of brain-machine interfaces, nothing will ever supplant the hard work of actually thinking.

Amusing ourselves to the future
There is a strong argument to be made that ensuring that the long-term future goes well is our most important priority. A critical sub-problem is protecting civilization against irreversible decline.

Most of the attention concerning this problem is focused on big splashy disasters, like nuclear war, pandemics, or climate turmoil. These are by no means unimportant, but Postman’s analysis raises the specter of a more subtle form of decline.

The danger, as Postman repeatedly emphasizes, is not some technology in itself, or even its use. Idle entertainment is generally harmless, at least in terms of the epistemological problems Postman worries about (though its effects on an individual’s use of time may be more worrisome). As Postman writes: “we do not measure a culture by its output of undisguised trivialities but by what it claims as significant” – the problem is that the customs of any medium that becomes central in a culture tend to leak through to all parts of that culture.

Postman is not a fan of modern televised debates, pointing out that they have been bent to the breaking point by the emphasis on visuals and short time-scales that the medium imposes. I found myself largely agreeing with Postman’s analysis. However, on the (few) occasions when I have chanced on a televised political debate, I automatically assumed that it was a serious, prestigious, and reasonable form of discourse, without considering how it’s shaped by the constraints of television. I doubt I’m alone in this. Concerns about a medium are abstract, meta-level issues and hence difficult to realize on your own.

Lincoln (a lawyer) was a victor in the age of live oral debate, Churchill (a writer) in the age of radio, Reagan (a Hollywood actor) in the age of television; today, Trump (a television personality) sits in the White House, and Ukraine’s new president is a comedian (if you want to go back further, Genghis Khan was a victor in the age when public discourse consisted of hacking people’s heads off from horseback). It is hard not to see the influence of the medium.

However, it is important not to overstate this trend. While the choice of medium shapes the political process, it does not determine it. The transition from one medium to another is never absolute, either; long-form text will survive, even if its role in public discourse is diminished.

What is the solution? Television and the internet cannot be beaten back in the realm of entertainment. Nor should they; they are the best at it.

What would be useful are norms of public discourse that grant special prestige for content that adopts the good practices of the typographic medium: content that is long-form, structured, makes its propositions clearly, highlights the ideas rather than the author, and abstains from bells and whistles that do not add propositional content. Generally, such content would look like lots of plain text, but because of the greater configurability of the internet compared to television, I’m more optimistic than Postman that thoughtful variations can be successful.

Such norms would be difficult to maintain because upholding them would require not just resisting the appeal of more emotionally desirable content, but also maintaining a broad understanding of why such things matter. Currently such norms exist in places like academia, but extending them to (for example) the arena of political debate seems difficult.

What does the future hold? The worst-case scenario is that a large chunk of public discourse is increasingly captured by a culture of entertainment and outrage, leading to mob-driven justice and gimmick-driven education, while political power trades hands from clown to clown until someone’s antics go too far and we stumble into nuclear war.

A more realistic scenario is that anti-rational new mediums take a recurring but non-fatal toll. Societal problems are tackled in ways that are slightly worse than they would have been had entertainment-oriented mediums not exerted their biases on the debate. The need to pander to the demands of such mediums slows the spread of many good ideas, though occasionally reason coincides with entertainment and a good idea will spread much faster than it otherwise would have.

The best-case scenario is that a counter-reaction to the ills of the modern internet leads to a widespread shift to more thoughtful mediums, as well as more thought about mediums. The latter, combined with the possibilities of what can then rightfully be called information technology, allows for the creation of mediums specifically designed to promote, for example, understanding algorithms, careful argumentation, or other positive goals.