How teaching vi to secretaries brought LLMs to humanity

Story time... I'm going to connect vim to the current LLM megatrend :)

Time: around 1976

Place: Bell Labs

Unix was new, and vi made people think that computers could be used by non-programmers, for example secretaries (who used to do a lot of typing and dictation, on typewriters!). If you could get secretaries to use vi, they would save so much time!

They liked this idea enough to bring psychologists on board.

One of them was Tom Landauer:

You can say they were thinking about UX before the mouse even existed, not to mention GUIs.

They placed secretaries in front of vi and looked at what they did.

They realized that yes, while vi commands are plain English words, people would still misremember them. Was it 'delete word' (dw) or 'remove word' (rw)? This was a core obstacle to making vi more usable.

The research group decided that synonyms in natural language were a problem; more than synonyms, words that partially overlap in meaning with other words, but not completely.

Computers would be much easier to use if they represented meaning. Rather than representing the 'delete' command as a single character (d), they should represent different commands in ways that overlap with each other. And they should learn how to do this from human language usage (i.e., it's not good enough to have a pre-programmed lookup table with d -> delete, r -> remove, etc.). There are too many possibilities.

The result of this research was the idea that you can represent semantics as a multidimensional space. A word is a vector. Two words with similar meanings are vectors with high cosine similarity (sounds familiar? :) ).
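To make the idea concrete, here is a minimal sketch of cosine similarity between word vectors. The vectors and their values are made up for illustration; real LSA spaces used ~300 dimensions learned from text, not hand-picked numbers.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional "semantic" vectors (purely illustrative values)
delete = np.array([0.9, 0.1, 0.2])
remove = np.array([0.8, 0.2, 0.3])
banana = np.array([0.1, 0.9, 0.1])

print(cosine_similarity(delete, remove))  # high: near-synonyms
print(cosine_similarity(delete, banana))  # low: unrelated words
```

Near-synonyms like 'delete' and 'remove' end up pointing in almost the same direction, while unrelated words do not.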

They created LSA:

The paper was revolutionary because it demonstrated that, given enough text, a computer could learn the meanings of words. It could even score well enough on the TOEFL test to be admitted to a university.

You can see this has impressive implications. Linguists such as Chomsky argued that we humans come equipped from birth with a system of rules that allows us to learn language. And here you had a machine that learned language from scratch (tabula rasa, just a very general learning mechanism based on word-context co-occurrence).

Plus, you can see how the computer gets better as it gets more text. Train it only on text of grade-3 size and complexity, and it makes the same mistakes a child of that age does.

The following are the specifics for each space:

name      grade    maxDRP   # docs    # terms   # dims
tasa03    3        51       6,974     29,315    300
tasa06    6        59       17,949    55,105    300
tasa09    9        62       22,211    63,582    300
tasa12    12       67       28,882    76,132    300
tasaALL   college  73       37,651    92,409    300

This was the 90s. Computers were tiny by modern standards. The model in that 1997 paper was not a neural network (too computationally expensive). LSA uses truncated singular value decomposition (SVD), which approximates what a NN would do during learning, but much more efficiently. There was code from the University of Tennessee that allowed us to do sparse-matrix SVD on a matrix of... 37,651 x 92,409 (docs by unique terms).
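Today the same truncated sparse SVD fits in a few lines. This is a toy sketch of the LSA recipe, not the original Tennessee code: the term-document counts below are invented, and real spaces used tens of thousands of terms and k = 300 dimensions, not 4 terms and k = 2.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy term-by-document count matrix (illustrative values only).
# Rows: "delete", "remove", "banana", "apple"; columns: 4 tiny "documents".
counts = np.array([
    [2, 1, 0, 0],   # "delete"
    [1, 2, 0, 0],   # "remove"
    [0, 0, 3, 1],   # "banana"
    [0, 0, 1, 3],   # "apple"
], dtype=float)

k = 2  # truncated rank (real LSA spaces used 300)
U, s, Vt = svds(csr_matrix(counts), k=k)

# Rows of U * s are the k-dimensional term vectors
term_vectors = U * s

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(term_vectors[0], term_vectors[1]))  # delete vs remove: high
print(cos(term_vectors[0], term_vectors[2]))  # delete vs banana: low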

The computer that ran it had the most memory on the entire campus: an Alpha machine, running Unix, not Linux. It had a whopping... 2 GB of RAM.

They even sold us an Itanium later, which we barely got any use out of, because that platform was full of problems.

LSA was the granddaddy of topic models, word2vec, BERT, etc., all the way to modern-day transformers and LLMs.

And all of this... because secretaries couldn't learn vi commands :)

Bell Labs was a wonderful place. Tom had lots of stories about corridor conversations between scientists from different disciplines. Bell Labs and the Royal Society of London (Newton, Locke, etc.) are my perfect places in history, where humans were doing what we were supposed to do.

Comment on Hacker News: