# How teaching vi to secretaries brought LLMs to humanity
Story time... I'm going to connect vim to the current LLM megatrend :)
Time: around 1976
Place: Bell Labs
Unix was new, and vi made people think that computers could be used by non-programmers, for example secretaries (who used to do a lot of typing and dictation on typewriters!). If you could get secretaries to use vi, they would save so much time!
They liked this idea enough to bring psychologists on board.
One of them was Tom Landauer:
=> https://en.wikipedia.org/wiki/Thomas_Landauer
You can say they were thinking about UX before the mouse even existed, not to mention GUIs.
They placed secretaries in front of vi and looked at what they did.
They realized that, while vi commands are plain English mnemonics, people still misremember them. Was it 'delete word' (dw) or 'remove word' (rw)? This was a core obstacle to making vi more usable.
The research group decided that synonyms in natural language were the problem; and not just synonyms, but words that partially overlap in meaning with other words without being interchangeable.
Computers would be much easier to use if they represented meaning. Rather than representing the 'delete' command as a single character (d), they should represent different commands in ways that can overlap with each other. And they should learn how to do this from human language usage; a pre-programmed lookup table (d -> delete, r -> remove, and so on) is not good enough, because there are too many possibilities.
The result of this research was the idea that you can represent semantics as a multidimensional space. A word is a vector. Two words with similar meaning are vectors with a high cosine similarity (sounds familiar? :) ).
They created LSA:
=> https://en.wikipedia.org/wiki/Latent_semantic_analysis
The paper was revolutionary because it demonstrated that, given enough text, a computer could learn the meaning of words; well enough, in fact, to score high enough on the TOEFL test to be admitted to university.
=> http://wordvec.colorado.edu/papers/Landauer_Dumais_1997.pdf
You can see this has impressive implications. Linguists like Chomsky said that we humans come equipped from birth with a system of rules that allows us to learn language. And here you had a machine that learned language from scratch (tabula rasa, with just a very general learning mechanism based on word-context co-occurrence).
Plus, you can see how the computer gets better as it gets more text. Train it on only grade-3 text (in size and complexity) and it makes the same mistakes a child of that age does.
The following are the specifics for each space:
```
name     grade    maxDRP  # docs  # terms  # dims
tasa03   3        51       6,974  29,315   300
tasa06   6        59      17,949  55,105   300
tasa09   9        62      22,211  63,582   300
tasa12   12       67      28,882  76,132   300
tasaALL  college  73      37,651  92,409   300
```
=> http://wordvec.colorado.edu/word_embeddings.html
This was the '90s. Computers were tiny by modern standards. The model in that 1997 paper was not a NN (too computationally expensive). LSA uses truncated singular value decomposition (SVD), which approximates what a NN would do during learning, but much more efficiently. There was code from the University of Tennessee that allowed us to do sparse-matrix SVD on a matrix of... 37,651 x 92,409 (docs by unique terms).
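To make the mechanics concrete, here is a minimal sketch of the LSA pipeline in Python. The toy corpus and the choice of k are mine, purely for illustration; real LSA also applies a log-entropy weighting to the counts before the SVD.
```python
# Toy LSA: term-document counts -> truncated SVD -> cosine similarity.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

docs = ["delete the word", "remove the word", "insert a character"]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Build the term-document count matrix (terms x docs).
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1

# Truncated SVD: keep k dimensions (the TASA spaces used 300; this corpus is tiny).
k = 2
U, S, Vt = svds(csr_matrix(counts), k=k)
word_vectors = U * S  # each row is a k-dimensional word vector

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# 'delete' and 'remove' never co-occur, but they share the context 'the word',
# so their vectors should end up close in the reduced space.
print(cosine(word_vectors[index["delete"]], word_vectors[index["remove"]]))
```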
The computer that ran that SVD was the one with the most memory on the entire campus. An Alpha machine running Unix, not Linux. It had a whopping... 2 GB of RAM.
They even sold us an Itanium later, which we barely got any use out of, because that platform was full of problems.
LSA was the granddaddy of topic models, word2vec, BERT, and so on, all the way to modern-day transformers and LLMs.
And all of this... because secretaries couldn't learn vi commands :)
Bell Labs was a wonderful place. Tom had lots of stories about corridor conversations between scientists from different disciplines. Bell Labs and the Royal Society of London (Newton, Locke, etc.) are my perfect places in history, where humans were doing what we are supposed to do.
Comment on Hacker News:
=> https://news.ycombinator.com/item?id=38925459
# Is the Firefox team too small to do serious security tests?
With the year coming to an end, I was thinking about what would make me switch away from FF. There's only one thing: security. With a dwindling market share, and a C-suite that seems distracted at best, the risk is that the team is too small.
What do you think?
Looking at raw numbers of CVE reports, FF is doing better than Chrome:
=> https://www.cvedetails.com/product/3264/Mozilla-Firefox.html?vendor_id=452
=> https://www.cvedetails.com/product/15031/Google-Chrome.html?vendor_id=1224
But the severity of those reports is another matter. A single report could make a night-and-day difference.
Comments on HN:
=> https://news.ycombinator.com/item?id=38749467
# On C popularity vs python, size of stdlib, package managers in langs
I have a friend who is a ridiculously advanced programmer, and he half-jokingly said that C has an implicit slogan: 'You want a function? Write it yourself!'. This is very much in the spirit of C; there's very little in the standard library, and that's by design (compared with other languages).
=> https://en.cppreference.com/w/c/11 Although C11 comes with libraries now
C++ went overboard with the STL later on. Still, it was nothing compared to what would come later with interpreted languages.
This is how I experienced this evolution (I'm old).
From now on, I'm talking about Unixy workflows on terminal processing large amounts of text (not gui, not servers). This is what I was doing in the 90's.
There was a time when only 'things that had to be fast' were written in C. Hardcore data cleaning, vector multiplications, singular value decomposition... all in C. The rest was crappy bash scripts calling the C programs through pipes.
=> https://news.ycombinator.com/item?id=38550867 Comments on Hacker news
This was brittle as hell. When a tool gives you an error, it doesn't propagate 'up' if you called it through a pipe from a different language. Let alone that bash is not a real programming language, and its error handling is pretty bad.
Then Perl 5 came. Perl was wonderful for processing text: regexes were as fast as C, and it had humor built into the language (the keyword to make something an object was 'bless'; the OO was terribly tacked-on and not worth using). With Perl (and perhaps with the web becoming mainstream beyond dialup modems), programmers started sharing code. There was a community, a very active mailing list. C didn't really have that; C was like programming alone, Perl was like programming with friends over beer. Compared to C + bash, Perl was a relief, although Perl was still calling that C code through pipes, which I now consider a very bad practice. My friend (the very advanced programmer the conversation started with) showed me bash's pipefail option, which I had completely missed back then.
=> https://www.howtogeek.com/782514/how-to-use-set-and-pipefail-in-bash-scripts-on-linux/#dealing-with-failures-in-pipes set -o pipefail in bash scripts
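To show what 'errors don't propagate up' means in practice, here is a sketch in Python terms (the 'clean_text' tool is made up): calling the external program with check=True turns its failure into an exception, which is exactly what a naive bash pipeline does not do.
```python
# Calling an external tool and actually propagating its failure.
# 'clean_text' is a hypothetical C program; any command behaves the same way.
import subprocess

try:
    result = subprocess.run(
        ["clean_text", "corpus.txt"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    print(result.stdout)
except subprocess.CalledProcessError as e:
    # In a bash pipeline without 'set -o pipefail', this exit code would be
    # swallowed and the next stage would happily run on garbage.
    print(f"clean_text failed with exit code {e.returncode}: {e.stderr}")
```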
With code sharing, the first attempt at a per-language package manager came about: CPAN, a website that collected Perl packages (with no install or update mechanism initially). It was very primitive, but better than nothing. People experimented there. There were ordered hashes, a hybrid between a hash (a dictionary, in Python terms) and an array, whose man page started with 'it's alive! it's alive!' :) There was PDL, the Perl Data Language, a very old ancestor of pandas that astronomers used. Packages grew in complexity and acquired dependencies (Fortran, C) that you needed to install with the OS package manager.
=> https://www.cpan.org/ CPAN
This was the first time that I, as a programmer, experienced being able to reuse someone else's solution rather than writing it from scratch (or copying it out of a book :) ).
Then python came, with a much better OO story that made it possible to write bigger programs without going crazy. Perl 5 tried to become Perl 6 and failed, going extinct.
Python brought the slogan 'batteries included'. In modern times we have pushed that to an extreme; I think it was iOS that had the idea 'there's an app for that'. Which means: you don't need to code anything to solve your problems, just install something from the App Store. Or from the language's package manager.
'Batteries included' became wildly successful. Now you had to decide: C's offering was still 'want a function? write it yourself!', while python's was 'here's a bazaar of anything you could possibly need, just import it'. Programmers grumbled when they had to implement something in C.
R had tons of packages that did extremely niche, specific things in stats. People published papers with an R library. But R lived in a tiny corner of the world, whereas python was everywhere. Unixes started shipping python (the system python) to do the gluing. It used to be perl, and before perl, bash. Python won. Python's interface with C was good enough that anything computationally intensive was done by calling the C library that the package shipped.
Code in those repositories was of varying quality, like everything on the web, but popularity and word of mouth usually helped you filter quickly.
Nowadays, even text editors have package managers. Gone are the days of downloading .vim files by hand into your home directory and sourcing them from .vimrc. When I started with vim in the 90s, I had 2,000 lines of .vimrc. Now I can replicate that with 20 (calling libraries).
But the instinct of looking into a package manager (or Stack Overflow!) first became ingrained in programmers everywhere.
Not to mention that everyone seemed lenient about trusting 3rd-party code (more than what you could whip up yourself). 'There's an app for that' thinking, but for programmers, who are supposed to build things.
## The reckoning
I'm skipping steps in the History of package managers for programming languages here, but bear with me.
At this point, most popular, mainstream programming languages had a package manager (go, rust, javascript, python, etc). People relied on dozens of packages even for apps that don't really do all that much. Packages got updates, package managers would bring them into your app, and you would find that things break. It's by no means a solved problem.
Then the cracks started to show. A security vulnerability here. A package author there turning his key package into 'protestware' (non-working code carrying a message, often social or political, to capitalize on the attention gained), breaking the build for everyone using it. You would worry that updating your packages would break your app. You would have no idea how many LOC your app was clocking, because counting the included packages would make the number embarrassingly large. Not to mention authors abandoning a package, or a new, fancier one coming along and forcing you to rewrite your code to use it. Not fun.
Or you would have a company taking ownership of the servers hosting the packages, or somehow of the process running the package manager (JavaScript's npm and M$ come to mind, but there are others).
Then we all looked at each other: do we need all this complexity?
The value prop of C ('want a function? write it yourself!') suddenly became tempting again.
## The modern day: packages are less useful
C is like a shark. Still alive and kicking ass, surrounded by creatures that are far more evolved and modern, that have added a lot of complex adaptations (mammals, birds). Still, sharks are extremely successful in their ecosystem.
What if the ecosystem has changed, favoring the shark's adaptations? What if 'want a function? write it yourself!' has become not much harder than using a library with a ton of dependencies?
Enter 2023. Code generators (chatGPT, copilot) actually produce code from a text description. They can write tests for it, and even debug it themselves (these are early days). It's another tool for a programmer who wants to implement things from scratch. It will be abused by junior programmers, and the average code quality of humanity probably WILL go down. But that doesn't mean that you (someone who has written code for decades) cannot use it as an extra pair of hands, knowing full well its limitations.
There's now a clear alternative to 'import this', 'import that' and having > 2000 LOC before you've written a single line. And this is new.
That is, code generators make it possible for lazy people to stop depending on libs. Although, in languages with a package manager, the code chatGPT generates will very likely suggest libraries anyway :)
But C? Thanks to those code generators, you can now perhaps write a solution in C as fast as one in 'batteries included' python. And there are advantages to doing that (beyond the result being faster):
1. It compiles on lots of architectures. Behold what Justine Tunney is doing :) Portable C is actually real!
=> https://justine.lol/cosmopolitan/functions.html Justine Tunney's Cosmopolitan
2. You don't have to audit code from a lib, nor trust it by default
3. You don't have to fear that updating packages in your system (or virtual env, or container) would break your 'batteries included' code
4. You can do formal verification (I know nothing about this; I'm just learning about it now, at your prompting!)
5. It's far easier to deploy
This is a very attractive value proposition!
# ChatGPT4 spam will help the internet because companies will have to trade growth for trust
> The really interesting thing is that the massive social networks like Twitter and Reddit are going to have to adapt in the same ways that smaller communities like Indie Hackers are adapting: they're going to have to get smaller by sacrificing growth for trust.
> IDs, subscriptions, invite-only sign-ups, etc. I don't really see a way around it. And of course Twitter and the other platforms are already rolling these kinds of features out.
> If these assumptions are correct, it might lead to a slightly more decentralized internet, since platforms like Twitter will be a little less powerful at amplifying people's messages. It's like a second stab at the world promised by the Web 3 enthusiasts.
> — Channing Allen
> ⚡️ Co-Founder of Indie Hackers
=> https://www.indiehackers.com/post/courtland-allen-on-ai-spam-bots-and-the-future-of-social-platforms-like-ih-2b979782b0
# attention and engagement as incentives to misbehave
I've been gone for a while, but kept thinking about Gemini and what it may mean to write here compared to write somewhere else.
There's an influence war out there. Everyone (brands, influencers, even individuals with an opinion) want a piece of your attention.
You have to infer the incentives for every statement you read online.
The more polished, well-produced a piece is, the more likely there's some interest behind it.
The more money someone has spent in pushing an idea, the more you have to question it.
In fact, ideas now propagate on social networks in a way that is proportional to engagement. So everyone is optimizing for engagement.
And what is engagement? Well, it's the force that keeps users glued to the screen, and it's correlated with... get this... frustration!
Game designers don't optimize for enjoyment; they optimize for frustration, because that drives engagement. Social media too. A tweet with lots of retweets is likely incendiary, spiking strong negative emotions in people.
This is not exclusive to the digital world: pubs are designed to keep people frustrated (they cannot talk to each other, cannot get the attention of girls, etc.) so that they consume more alcohol.
Enter gemspace. Nothing here is designed to capture and retain attention. Which means conversations are real, low-pressure. Nobody is faking and flexing (like on instagram lol).
This is extremely valuable. It's a bit sad that we humans can only maintain this valuable environment in circumstances like gemspace's: there's not much attention (no eyeballs), so no attention whoring. I wonder if anyone is working on a recipe that would produce the same benefits while still getting more eyeballs on what we write. That would be revolutionary.
# Firefox CPU management
As many of you, I'm struggling to keep my browsing experience as CPU efficient as possible. The modern web makes this near impossible.
It's silly, but with both chromium and FF I need to keep the 'task manager' open on a second monitor at all times, because there's usually a site or two eating the CPU for breakfast. Modern computing simply drains your attention reserves, because you are dual-tasking between the task you want to do and babysitting the task manager.
If you have a powerful laptop, the fans will let you know when a site is misbehaving. That's the sign to go check the task manager. Such is life. (If anyone has any solutions for this, do let me know! Disabling JS??)
Now the problem I have: I can't trust the task manager. htop tells me FF is using 100% of a CPU (I have 12 cores), but the task manager tells me all sites are 'low' in energy impact.
I suspect this has gotten worse with newer versions of FF. I'm on Manjaro Linux, so I'm pretty sure I'm running the latest Firefox.
Any solutions?
Comments on HN:
=> https://news.ycombinator.com/item?id=31592972
# On the shortcomings of gemini protocol
Marek, a friend who is a scientist, has this to say about gemini:
>Not allowing images is not great, because in technical reports/science/...
> we still need diagrams, plots, ...
>
> But def there should be a requirement that all images must be relevant to the text; not like some generic random meaningless stock photos
>
>Plus I don't think Gemini supports tables? They're important too
FWIW, images are supported. They are not rendered by default; you have to click on them, and then the browser renders them (if it can, that is; there are terminal-based browsers, though even terminals can display images nowadays! Even xterm, which I use!).
But there's one criticism I've seen that might be more damaging:
> I don't believe Gemini has a marker to distinguish the end of a file, which is good because I imagine theoretically any document can contain such a marker. However, what it also doesn't have is a way to say when the file has ended.
>
> If you don't implement heartbeats in your connection (which is also not a reliable indicator that the server is done sending the file -- see streamed generated bodies) then AFAICT there's no way to tell that the server is done sending the file.
-- ryansquared, a 'friend' from Matrix
This is harder to solve. And they are probably adamant about not adding stuff to the protocol, for fear it will grow features and make server and client implementations harder.
Simplicity is valuable.
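To see what that simplicity looks like from the client side, here is a minimal sketch of a Gemini request in Python. The capsule URL is just an example, and a real client should do TOFU certificate pinning rather than disabling verification as this sketch does.
```python
# Minimal Gemini request: send the URL + CRLF over TLS, read until the server
# closes the connection. There is no length header and no end-of-file marker.
import socket
import ssl

def gemini_fetch(url: str, host: str, port: int = 1965):
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # Gemini uses TOFU, not CA chains
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall((url + "\r\n").encode("utf-8"))
            chunks = []
            while True:
                chunk = tls.recv(4096)
                if not chunk:  # EOF: the close IS the end-of-body signal
                    break
                chunks.append(chunk)
    # Response: a single '<status> <meta>' header line, then the body.
    header, _, body = b"".join(chunks).partition(b"\r\n")
    return header.decode("utf-8"), body

header, body = gemini_fetch("gemini://lof.flounder.online/", "lof.flounder.online")
print(header)
```
Note that if the server died mid-transfer, this client would see exactly the same thing as a complete response; that is ryansquared's point.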
In a way, gemini is saying: "We fucked up the web big time, and now we all rely on a big piece of code we don't control (Chrome) for all our interactions. We can do better." Part of doing better means closing down the design, so that no future feature can take it in the direction the web has gone:
As per LWN:
=> https://lwn.net/Articles/845446/
>“Another interesting piece of the protocol is that it is explicitly designed to be non-extensible. There is no version number in the protocol, and the response layout was carefully constructed to make extending it hard:
>
>To minimise the risk of Gemini slowly mutating into something more web-like, it was decided to [include] one and exactly one piece of information in the response header for successful requests. Including two pieces of information with a specified delimiter would provide a very obvious path for later adding a third piece - just use the same delimiter again. There is basically no stable position between one piece of information and arbitrarily many pieces of information, so Gemini sticks hard to the former option, even if it means having to sacrifice some nice and seemingly harmless functionality”
Comments on HN:
=> https://news.ycombinator.com/item?id=31560509
# How a key-infra open source project can get compromised
I was checking how to remove background noise on calls, so that everyone can stay unmuted all the time. There's a tool, krisp.ai, that offers that for win and mac. The closest thing to it on linux is NoiseTorch:
=> https://github.com/lawl/NoiseTorch
Checking its website, I read this:
> At least one of my systems has probably been compromised, don't use the source either. One could hide things in a large diff.
>
> If the community can help review ALL of the code, maybe we can trust the code again and work from there.
How sad. It's a product that interfaces with pulseaudio, giving it access to every audio stream in your system. A wet dream for a spy agency. And of course, it got compromised.
What's interesting is that in a large codebase, it's hard to review the code even if you are the author. This is why he asks for community support.
What this tells me:
- If you are going to install something security sensitive on your machine, check the repo first, plus any RSS integrators that cover recent vulnerabilities
- Don't assume for a second that because you are using linux you are not the target of major attacks
- What if the author hadn't realized he was compromised? The changes would have gone unnoticed, for how long? Open source code seems more secure because there are more eyes on it. But are there, really? Who donates their time to find a compromise in a large codebase?
Comments on HN:
=> https://news.ycombinator.com/item?id=31444895
# The not so open web
Reasons to degoogle and go self-hosted:
>
>Beyond market control, the algorithms powering these platforms can wade into murky waters. According to a recent study from the American Institute for Behavioral Research and Technology, information displayed in Google could shift voting preferences for undecided voters by 20 percent or more — all without their knowledge. Considering how narrow the results of many elections can become, this margin is significant. In many ways, Google controls what information people see, and any bias, intentional or not, has a potential impact on society.
>
=> https://dri.es/can-we-save-the-open-web
# The value you get by being part of a community, part 2
You are reading this in gemspace (or through an html proxy), so you have probably understood that the 'deal' you get on mainstream social media is not great. You wonder what value you get out of it, and what cost to your mental clarity you accept when using it.
But what is the value of community membership? What is the 'why'? Do we think about this? Or do we just react to a dopamine-inducing, carefully designed stimulus?
Here's a FAANG dev (note the handle: unplugnow) not wanting to work on tracking (from the HN thread on my first post):
> I am a 20+ year veteran of FAANGs. I studied compsci at the best schools and personally had a hand in shaping a lot of the tech that powers features used by billions of people. I've refused to work on projects that track people just to use them as the product. I refuse to work on products that are purposely designed to be addictive. I won't let my kids near this stuff. Let that sink in for a moment.
=> https://news.ycombinator.com/item?id=30998250
In her book Working in Public, Nadia Eghbal describes the four kinds of communities that exist, based on her research into open source communities:
* Federations: larger groups where a significant number of contributors create content
* Stadiums: larger groups that follow and engage around a single creator
* Clubs: small groups where most or all of the members contribute significantly
* Toys: a simple idea for a new community or platform that's still very small and experimental
The challenge for gemspace is to go from Toy to Club. That probably requires federation (atom feeds are a good step), and this federation needs to be decentralized, like Mastodon.
One beautiful thing about this gemspace being so small is that search engines can show you backlinks. This is good to track the origin of ideas. Gemspace could be a way to share zettelkastens, but that's a post for another day!
=> https://news.ycombinator.com/item?id=31037564 Hacker news comments
# Four types of online communities and value you get from being a participant
I see four ways to look at a community: as a mastermind, as marketing, as an outcome-oriented community (OOC), or as just-for-fun.
## Mastermind
This is usually a paid community. It's built on the audience of the community leader.
Examples: Write of Passage, altMBA, Second Brain
## Marketing
The community supports sales or retention for another product, for example a CBC.
Example: any big brand's forum (say, Valve's), or something like Coursera's communities
## Outcome oriented community OOC
The goal is to get a certain outcome. You can tell whether the community is helping or not by looking at the outcomes people get after being involved.
Example: a community of nonfiction book writers who want to get a book finished
The 'man' here is RobFitz.com; he's writing a book on it.
## 'Just for fun' communities
There's no ulterior motive here. People write about their passion projects, which are often 'useless'. There's no karma whoring, no likes, no follows. That makes people stop 'performing' and write and talk the way they really are.
Example: gemspace
Books I've read:
* 'Business of Belonging' (David Spinks)
* 'Community Masters' (Varum Maryya)
* Outcome oriented community OOC (work in progress, Rob Fitz)
Who is the best person to read/watch with actionable advice about launching a community?
=> https://news.ycombinator.com/item?id=31027480 Hacker News comments
# How I decided to move away from bigTech for my children and myself
I have two children, 7 and 9, and they have friends who have phones. I've read 'Digital Vegan' and I'm convinced social media has a net negative effect (even if you disregard the terrible effects of being monitored). So they are starting to ask for a phone, and while we won't give them one anytime soon, they are at risk of becoming social pariahs. Not being able to make plans to hang out with friends will affect their social life, and you don't want that for a pre-teen.
I've been blocking ads, avoiding google for search, and being judicious with tech use for most of my adult life. Now I have to lean towards the more radical end of the continuum. I may go without a smartphone completely AND tell my children they cannot have one.
First realization: even if you block ads, prevent children from signing up for google/MS/FB etc accounts, and don't let them install apps, trackers/spyware can bypass even third-party blocking and the same-origin policy by using CNAME cloaking. There's almost no hope of having privacy on the web, other than resorting to other protocols like Gemini (which is lots of fun, in case you haven't tried it; you might be reading this on a gemini capsule).
=> https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/ CNAME cloaking so that you cannot hide on the web
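To make the trick concrete, here is a rough sketch of how you could spot a cloaked tracker from the resolver side, using the dnspython library. The subdomain and the tracker suffixes below are made-up examples; real blocklists are much longer.
```python
# A first-party-looking subdomain that is a CNAME for a tracker domain
# defeats third-party blocking: the browser only ever sees the first party.
import dns.resolver

KNOWN_TRACKER_SUFFIXES = {"tracker-a.example", "tracker-b.example"}  # hypothetical

def cname_target(name: str):
    try:
        answer = dns.resolver.resolve(name, "CNAME")
        return str(answer[0].target).rstrip(".")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return None  # no CNAME: the name resolves directly

def is_cloaked_tracker(name: str) -> bool:
    target = cname_target(name)
    if target is None:
        return False
    # DNS reveals the third party hiding behind the first-party name.
    return any(target == s or target.endswith("." + s)
               for s in KNOWN_TRACKER_SUFFIXES)

print(is_cloaked_tracker("metrics.example.com"))  # made-up subdomain
```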
## A story that made me rethink everything we do with our smartphones
I was scrolling TikTok on my phone. I don't think I had registered as a user; I just wanted to know what the fuss was about. I might have opened it 3-4 times, scrolled for a minute or two, and gone on with my day. Well, I see a video of a boy doing a trick on a scooter, and I think "he looks like L" (my son). He WAS L. Who doesn't have a phone, nor an account on TikTok. His friend T made the video; hence the dangers of children having access to tech they don't understand or control.
How did TikTok know to show that video to me, someone with no interest in scooters?
T might have been at our home, and just by proximity (of his phone to our home wifi) TikTok hit me with that eerie recommendation.
But of course, other more sinister explanations came to mind. I uninstalled TikTok, and all social media apps, from my phone. Before the end of the month I want to be completely degoogled.
=> https://datascienceretreat.com/ My company, Data Science Retreat, will rely only on self-hosted libre software by May 2022
=> https://news.ycombinator.com/item?id=30985684 Comments on Hacker news