From the Alphabet to Blockchain: Encoding the Author’s Voice

A recent thread on fidelity and compression explores human stories, alphabets and language models. What can we do to preserve originality in the digital world?

Feb 16, 2023

Hello! Welcome back to Cloud Vertigo, a weekly newsletter on deep stuff that matters. Today’s topic is both recent and relevant.

I really loved yesterday’s Seth Godin piece 🔥Fidelity, compression and culture (Feb 14). It builds on Jeff Jarvis’ 📚Journalism is lossy compression (Buzz Machine, Feb 12), a clever rebuttal to Ted Chiang’s viral 🤖 ChatGPT Is a Blurry JPEG of the Web (The New Yorker, Feb 6), which provocatively compares ChatGPT to Xerox machines and asks: OpenAI’s chatbot offers paraphrases, whereas Google offers quotes. Which do we prefer?

This thread offers us a sharp lens to look at the world and peek into the future: information compression and fidelity of communications. If you think about it, NFTs became the Lo-Fi records of the web, a contrasting symbol of a world where high-fidelity copies of high-compression digital goods are ubiquitous.

💡 What is web3 if not a web where there is some “analog” non-reproducible information?

I feel today there is much to unpack, so let’s get to it.

David

PS: All credits for the images go to my buddy StableDiffusion. If you’d like to read more, do send me a note and share it with a friend who may enjoy it.

It means the world <3.

From the Alphabet to Blockchain: Encoding the Author’s Voice

In everything we do together we repeat and amplify each other’s thoughts and feelings. In human communication we reproduce information on every scale. From one-to-one gossiping to meetings and organisations, we thrive on summarising each other. References are almost as frequent in social media as in academic journals. We do not simply copy: we reframe and compress information into stories, because compressed forms give us a chance for higher fidelity and cheaper networking, for higher reliability and wider distribution. Our communication has enabled collaboration on an unprecedented scale, which improved our evolutionary resilience as a species. In human content, high fidelity, which is to say how accurately a copy reproduces its source, was undisputedly recognisable, until AI arrived.

The Debate Around the Lossy Compression of ChatGPT

What is a large language model, such as ChatGPT, if not a lossy text-compression algorithm that has been fed the entire internet? It is hard to disagree with the premise of Ted Chiang’s argument that ChatGPT Is a Blurry JPEG of the Web. The subtext of his article is that lossy compression of information is pretty dangerous. It makes us forget sources and sometimes, as with Xerox copies, unreadable details become deceitful. Jeff Jarvis’ reply “Journalism is lossy compression” takes issue with how un-self-aware the media are when covering technology. He argues back that journalism itself is a lossy compression of the world.

While the argument may at first sound ad hominem, it brings up valid points. Jarvis compares the discussion around ChatGPT to a debate between book historians about the early days of print. On one hand, some valued print since they understood its key property of high fidelity (“typographical fixity”) as capable of bringing authority and culture. Others countered that printed books, often sloppy and wrong, were not fixed and authoritative, and dismissed print culture. What may have been true for early books now sounds like a moot point. The problem with the argument is that it makes assumptions and sets expectations about the new based on the presumptions of the old.

Over time, and after many technological improvements, new institutions of publishers emerged that granted authority and stability to print. The Church’s central authority was challenged and new decentralised power structures emerged. The consequences of print, from the first essays to the novel, all the way to the newspapers, unfolded over centuries. From print culture to the web another great leap has occurred, but the story is analogous: the high fidelity of a new communication medium allowed for mass reproduction of information and unprecedented knowledge sharing. Jarvis summarises from his forthcoming book The Gutenberg Parenthesis and quotes from David Weinberger’s Everyday Chaos:

Why have we so insisted on turning complex histories into simple stories? Marshall McLuhan was right: the medium is the message. We shrank our ideas to fit on pages sewn in a sequence that we then glued between cardboard stops. Books are good at telling stories and bad at guiding us through knowledge that bursts out in every conceivable direction, as all knowledge does when we let it. But now the medium of our daily experiences — the internet — has the capacity, the connections, and the engine needed to express the richly chaotic nature of the world.

To come back to Jarvis’ and Chiang’s broader points, it is useful to consider ChatGPT as a lossy compression of the web rather than an “agent”, but it’s worth remembering that the lossiest algorithm of them all is the form of the story itself.
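For readers who like to see ideas in code, here is a minimal sketch of the lossless versus lossy distinction that runs through this debate. It is only an illustration: the sample numbers and the helper names lossy_compress and lossy_decompress are made up for this example, and simple rounding stands in for what JPEG or a language model does in a far more sophisticated way.

```python
import zlib

# Lossless: zlib gives back exactly the bytes it was given.
text = ("the medium is the message. " * 8).encode("utf-8")
packed = zlib.compress(text)
restored = zlib.decompress(packed)

# Lossy (toy example): keep only the nearest multiple of `step`.
# The fine detail is thrown away and cannot be recovered later.
def lossy_compress(samples, step):
    return [round(s / step) for s in samples]

def lossy_decompress(codes, step):
    return [c * step for c in codes]

samples = [3, 14, 27, 92, 66, 38, 89, 71]  # made-up "signal"
codes = lossy_compress(samples, step=10)
approx = lossy_decompress(codes, step=10)

print("lossless round trip is exact:", restored == text)
print("original:", samples)
print("after lossy round trip:", approx)
```

The lossy version still looks plausible, since every value lands close to its source, but nothing in the output tells you which details were invented by the rounding. That is roughly the worry behind the blurry JPEG metaphor.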

The Cunning Power of the Written Word

Why stop here? Printed books compound the power of the alphabet, which is the first and greatest high-fidelity and low-compression miracle. Writing encodes the voice of the author in its 26 characters, preserving it immutably through time. The intuition in Seth Godin’s Fidelity, compression and culture brings us one step deeper into the topic. In a dense and fascinating piece, he explores fidelity and compression in meetings, politics and packaged chocolate bars.

The focus shifts not just to the compression of a message but also to the opposite phenomenon: interpolation. Examples of interpolation include an actor’s dramatic performance of a script, the musician’s interpretation of a score, and the reader’s engagement with a text. Are these faculties so unlike the rendering of a JPEG? If stories compress the truth, interpretation “decompresses” it in a way that is understandable to the listener. It does not just reconstruct the original, it enriches it with new meanings, new links and new semantic content.

As an aside, Socrates would probably have disagreed with Seth about the alphabet. The compression of the written word is not as low as one may think. Warnings about the dangers of compression are as old as time. In Plato's Phaedrus, Socrates surprisingly argues against writing, ironically in a written dialogue. The alphabet is not a “medicine for remembering” so much as a recipe for forgetting. Writing helps you only to appear wise and intelligent. Reading and hearing from a book does not amount to knowing. What is particularly baffling for Socrates is that a written page cannot be further questioned and leaves the author’s words unable to further argue their worth, similar to how a statue stands still in front of the passers-by.

The little literature widely available in Plato’s time may have further made the arguments contained in books feel quite unrelated, disconnected. The network, the web, of written culture was still very sparse.

Now more than ever, we realise that culture is made up of what remains after everything else has been forgotten (Umberto Eco)

Similarly, before the invention of print, it was hard to imagine books and libraries that would forge dialogues across centuries. Yet, some things did not change. The debate between the dangers of compression (information degradation and loss of competency) and its benefits (resiliency and distribution) is one of the oldest in philosophy.

The Digital Paradox: the Search for Low-Fidelity Information

Today, most of our collective cultural production and consumption is digital. It’s encoded so that it can be reproduced with maximum fidelity, countless times. This would have been impossible with analog media, where each reproduction of a tape increases the noise-to-signal ratio. Copy it enough times and you end up with a void of static. But what if there were some digital information that similarly could not be replicated?

This is precisely the problem that blockchains set out to solve. Money is hard to forge. High-fidelity copies of banknotes are hard to achieve, so the cost of forging a single note is higher than the note’s value. Bitcoin forgery is similarly impracticable.

In the high-compression digital world, blockchains introduce zero-fidelity/low-compression by design. They encode a "world state" with a series of irrevocably timestamped transactions, so that some digital patterns are attributable. Tokens are therefore impossible to copy with high fidelity, or any fidelity at all. This programmatically brings about digital scarcity for fungible tokens (such as Bitcoin) and uniqueness for non-fungible ones. This is achieved by preserving the whole history of the digital universe in which the token has a meaning and making it auditable.
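To give a feel for how a timestamped, auditable history can work, here is a toy sketch of a hash chain. It is an assumption-laden illustration, not how Bitcoin or any real blockchain is implemented: there is no consensus, no proof of work and no signatures, and the function names (block_hash, append_block, is_valid), the timestamps and the ledger entries are all invented for this example.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 fingerprint of a block's full contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, timestamp, data):
    """Add an entry that commits to the hash of the entry before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"timestamp": timestamp, "data": data, "prev_hash": prev})

def is_valid(chain):
    """Check that every entry still points at an unmodified predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

# Illustrative Unix timestamps and made-up transactions.
ledger = []
append_block(ledger, 1676505600, "David mints token #1")
append_block(ledger, 1676505660, "token #1 -> Alice")
append_block(ledger, 1676505720, "token #1 -> Bob")

valid_before = is_valid(ledger)

# Rewriting history in the middle breaks the link stored in the next entry.
ledger[1]["data"] = "token #1 -> Mallory"
valid_after = is_valid(ledger)

print("valid before tampering:", valid_before)
print("valid after tampering:", valid_after)
```

In this toy version the most recent entry is protected only once another entry is stacked on top of it. Real networks add consensus rules so that many independent parties agree on which history counts, which is what makes the ordering hard to rewrite.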

Trustless time-stamping introduces an inalterable time ordering, which, by analogy with thermodynamics, is almost like adding a notion of entropy to the system. As it turns out, entropy is the very same quality that makes a special-quality chocolate hard to replicate with high fidelity in the physical world.

Compression is the process of reducing entropy. Cheap chocolate bars achieve consistency precisely by lowering the recipe’s entropy. Intuitively, we understand this relationship between time and entropy when we say a good painting or any high-fidelity record defies time. Even the alphabet's high-fidelity and low-compression wonder is that it makes the flow of time stop!

High-fidelity copies are indistinguishable from the high-compression originals. An analog reproduction, on the other hand, preserves a higher-entropy state, even though it inevitably degrades it a bit. Evaluating how lossy ChatGPT’s compression is remains a hard task, because the stories the web is made of, unlike the alphabet we use to represent them in high fidelity, are a high-entropy mess. As Seth Godin points out:

AI redefines fidelity altogether, sometimes embellishing what was there before and presenting something that might mistakenly be seen as a high fidelity original.

The logical consequence is that time-stamping our digital work is the only way we have to preserve claims of human originality, which becomes ever more urgent now that fidelity is being redefined.

I’d love to hear your feedback and engage in conversations around these themes. Remember you can get this newsletter delivered right in your inbox for free.
