In Empty Labels and then Replace the Symbol with the Substance, we saw the technique of replacing a word with its definition – the example being given:
All [mortal, ~feathers, bipedal] are mortal. Socrates is a [mortal, ~feathers, bipedal]. Therefore, Socrates is mortal.
Why, then, would you even want to have a word for “human”? Why not just say “Socrates is a mortal featherless biped”?
Because it’s helpful to have shorter words for things that you encounter often. If your code for describing single properties is already efficient, then there will not be an advantage to having a special word for a conjunction – like “human” for “mortal featherless biped” – unless things that are mortal and featherless and bipedal are found more often than the marginal probabilities would lead you to expect.
In efficient codes, word length corresponds to probability—so the code for Z₁Y₂ will be just as long as the code for Z₁ plus the code for Y₂, unless P(Z₁Y₂) > P(Z₁)P(Y₂), in which case the code for the word can be shorter than the codes for its parts.
And this in turn corresponds exactly to the case where we can infer some of the properties of the thing, from seeing its other properties. It must be more likely than the default that featherless bipedal things will also be mortal.
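To make the code-length claim concrete, here is a minimal sketch assuming ideal Shannon code lengths (an event of probability p gets a code of −log₂ p bits); the probabilities for Z₁, Y₂, and their conjunction are invented purely for illustration:

```python
# A minimal sketch of the code-length comparison above.
# The probabilities are made up; only the comparison matters.
from math import log2

def code_length(p: float) -> float:
    """Ideal Shannon code length, in bits, for an event of probability p."""
    return -log2(p)

p_z1 = 0.20        # P(Z1), e.g. "mortal"
p_y2 = 0.10        # P(Y2), e.g. "featherless biped"
p_joint = 0.09     # P(Z1, Y2): well above the 0.02 expected under independence

separate = code_length(p_z1) + code_length(p_y2)  # spell out both properties
together = code_length(p_joint)                   # use one word for the conjunction

print(f"two codes: {separate:.2f} bits, one word: {together:.2f} bits")
# The word pays its way exactly when P(Z1, Y2) > P(Z1) * P(Y2) ...
print("word is shorter:", p_joint > p_z1 * p_y2)
# ... which is the same condition as being able to infer one property from the other:
print("P(Z1 | Y2) =", p_joint / p_y2, "> P(Z1) =", p_z1)
```

The gap between the two lengths, log₂(P(Z₁Y₂) / (P(Z₁)P(Y₂))) ≈ 2.17 bits with these numbers, is exactly the number of bits the shared word saves.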
Of course the word “human” really describes many, many more properties – when you see a human-shaped entity that talks and wears clothes, you can infer whole hosts of biochemical and anatomical and cognitive facts about it. To replace the word “human” with a description of everything we know about humans would require us to spend an inordinate amount of time talking. But this is true only because a featherless talking biped is far more likely than default to be poisonable by hemlock, or have broad nails, or be overconfident.
Having a word for a thing, rather than just listing its properties, is a more compact code precisely in those cases where we can infer some of those properties from the other properties. (With the exception perhaps of very primitive words, like “red”, that we would use to send an entirely uncompressed description of our sensory experiences. But by the time you encounter a bug, or even a rock, you’re dealing with nonsimple property collections, far above the primitive level.)
So having a word “wiggin” for green-eyed black-haired people is more useful than just saying “green-eyed black-haired person” precisely when (see the sketch after this list):
- Green-eyed people are more likely than average to be black-haired (and vice versa), meaning that we can probabilistically infer green eyes from black hair or vice versa; or
- Wiggins share other properties that can be inferred at greater-than-default probability. In this case we have to separately observe the green eyes and black hair; but then, after observing both these properties independently, we can probabilistically infer other properties (like a taste for ketchup).
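Here is a toy numeric check of both conditions, using an invented joint distribution over green eyes, black hair, and a taste for ketchup (none of the numbers come from the post):

```python
# Toy joint distribution over a made-up population.
# Keys are (green_eyes, black_hair, likes_ketchup); values sum to 1.
joint = {
    (True,  True,  True ): 0.06,
    (True,  True,  False): 0.02,
    (True,  False, True ): 0.01,
    (True,  False, False): 0.03,
    (False, True,  True ): 0.03,
    (False, True,  False): 0.09,
    (False, False, True ): 0.10,
    (False, False, False): 0.66,
}

def prob(pred):
    """Probability of the event picked out by pred(outcome)."""
    return sum(p for outcome, p in joint.items() if pred(outcome))

p_black             = prob(lambda o: o[1])                       # P(black hair)
p_black_given_green = prob(lambda o: o[0] and o[1]) / prob(lambda o: o[0])

p_ketchup              = prob(lambda o: o[2])                    # P(likes ketchup)
p_ketchup_given_wiggin = (prob(lambda o: o[0] and o[1] and o[2])
                          / prob(lambda o: o[0] and o[1]))

# Condition 1: the two defining properties are correlated.
print(f"P(black hair | green eyes) = {p_black_given_green:.2f}  vs  P(black hair) = {p_black:.2f}")
# Condition 2: observing both lets us infer a third property above its base rate.
print(f"P(ketchup | wiggin) = {p_ketchup_given_wiggin:.2f}  vs  P(ketchup) = {p_ketchup:.2f}")
```

With these invented numbers, condition 1 holds because 0.67 > 0.20, and condition 2 because 0.75 > 0.20: once you have paid to observe green eyes and black hair, the label “wiggin” buys you the inference about ketchup for free.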
One may even consider the act of defining a word as a promise to this effect. Telling someone, “I define the word ‘wiggin’ to mean a person with green eyes and black hair”, by Gricean implication, asserts that the word “wiggin” will somehow help you make inferences / shorten your messages.
If green eyes and black hair have no greater than default probability to be found together, nor does any other property occur at greater than default probability along with them, then the word “wiggin” is a lie: The word claims that certain people are worth distinguishing as a group, but they’re not.
In this case the word “wiggin” does not help describe reality more compactly—it is not defined by someone sending the shortest message—it has no role in the simplest explanation. Equivalently, the word “wiggin” will be of no help to you in doing any Bayesian inference. Even if you do not call the word a lie, it is surely an error.
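One way to make “no help to you in doing any Bayesian inference” concrete is to compute the mutual information between the two defining properties; it is zero exactly when the joint distribution factorizes into the marginals. A small sketch, with invented numbers for a world in which green eyes and black hair really are independent:

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits, for a dict {(x, y): p} whose values sum to 1."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical world: 12.5% green eyes, 25% black hair, and the two
# properties are independent, so the joint is exactly the product.
independent = {
    (True,  True ): 0.125 * 0.25,
    (True,  False): 0.125 * 0.75,
    (False, True ): 0.875 * 0.25,
    (False, False): 0.875 * 0.75,
}

print(f"I(eyes; hair) = {mutual_information(independent):.4f} bits")  # 0.0000 bits
# Seeing the hair tells you nothing new about the eyes:
print(f"P(green | black) = {independent[(True, True)] / 0.25:.3f}")   # 0.125 = P(green)
```

And since zero mutual information means zero bits saved, in that world an efficient code for “wiggin” is exactly as long as spelling out “green-eyed black-haired person”.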
And the way to carve reality at its joints, is to draw your boundaries around concentrations of unusually high probability density in Thingspace.
Eliezer Yudkowsky, Mutual Information, and Density in Thingspace, 23 February 2008
And so even the labels that we use for words are not quite arbitrary. The sounds that we attach to our concepts can be better or worse, wiser or more foolish. Even apart from considerations of common usage!
Eliezer Yudkowsky, Entropy, and Short Codes, 23 February 2008
Added to diary 15 January 2018