Since my imagination was first captured by the idea of a non-arbitrary ideal language, twenty-five years ago, this idea has gone through many variations, some with quite interesting properties.
Oddly enough I think the original idea of an acronymic language with its summarization and expansion properties has held up reasonably well. I think. The summarization property was simply the ability to summarize any phrase, clause, or sentence in a single word by simply taking the initial letters of each word and concatenating them. The idea that such an acronym would in fact be an accurate summary of the phrase or sentence is really the defining property of a whole class of languages, of which I imagined there would be a single best language, The Acronymic Language, which might best be referred to as TAL, though for complicated reasons I usually used the unpronounceable acronym TGW instead. The expansion property is obviously the inverse operation, though the ability of context to control expansion makes it at once less fundamental and more interesting.
So of the other variants of the idea have not fared so well, mostly because they run afoul of big numbers, the combinatorial explosion. When I first started actively working on the acronymic language– trying to actually create it, all by myself , I did not realize how many combinatorial difficulties would stand in my way.
Of all variants of the acronymic language, the one that is easiest to explain is one of the hardest ones to actually do — and is the site of one of my more egregious errors. I found myself try to explain the acronymic language to my niece Marika a few weeks back — without mentioning the project’s appellation. Most of what I what I said made no impact, but I caught a wee glimmer of understanding with the old “dictionary-order = thesaurus-order” idea.
I have somewhere on my shelves a spelling dictionary — just an alphabetical list of words, no definitions — a lot more words than there could be in a small paperback if definitions were included, and so quite useful if you know what the word means but have forgotten how to spell it.
I also have on my shelves several synonym-books or thesauri, none entirely satisfactory.
One of these is the huge Rodales Synonym-Finder which is like many of the pocket saurians school kids seem to prefer: no separate index, no rational scheme like Peter Mark Roget invented for his thesaurus, just plain old alphabetical order. The trouble with that scheme is obvious: instead of putting ‘cut’, ‘slice’, and ‘trim’ in one category where all the other synonyms can be found as well, there has to be a lot of repetition, in which almost identical lists of synonyms are provided for each word. And so, for all its size the huge Rodales book contains fewer synonyms per word than the Roget dinosaur it sits next to.
For years I dragged out the phrase ‘thesaurus order’ to describe something much like the large main list of Roget, ignoring the index.
Remembering that this is part of a description for an ideal language, not a scheme for use with English, it seems possible to claim that every word in the language has just 1 basic meaning, putting aside metaphorical uses, and that words that look similar, containing many of the same letters in the same order, would be approximate synonyms.
And so, finally, it seems possible to imagine a useful thesaurus in which each word is listed just once, next to various synonyms — between it’s two closest synonyms, in fact. No index would be necessary since this list of words would be in alphabetical order (for whatever alphabetical order is used in the language). Since the words would be in alphabetical order, the list would be identical with a spelling dictionary as described above: just a list of words in alphabetical order.
Well now. That’s quite an idea, really, but is it entirely coherent? Does it make sense? Is it logically consistent? And if it is a coherent, consistent idea, is it actually possible to create such a list? Surely I must answer yes to at least one of these questions or there’s no point in all this.
I think there is some point in all this, because each of those questions can “almost” be answered yes. All the questions about consistency and coherence must technically be answered “no”, because I have conflated two consistent ideas, but I can salvage something from them.
The question about the possibility of creating such a list can technically be answered “yes” for either of the two consistent ideas, which I’ll describe below, but only technically — some omnipotent deity might indeed create either list, but no human or computer is likely to because here’s where we run smack into one of those large numbers that I spend so much time trying to evade.
Well, my first task is to take cap in hand and apologize to you and the rest of humanity for conflating two ideas, each of which has some merit alone, but cannot logically be combined.
Briefly, it seems that if we are designing an artificial language, we can consistently envision a useful thesaurus-sort-of-list in which each word is listed just once, and that can be sorted in some alphabetical order, like a spelling dictionary, so that spelling will also indicate the meaning of each word. But it does not follow that each word in that list will be sandwiched between its two nearest synonyms.
On the other hand, we can consistently envision a somewhat similar list of words, in which again each word occurs exactly once, which does have the interesting property that each word does occur adjacent to its two closest synonyms.
The easier task to explain is the latter one: can we create such a list of words for our hypothetical language, in which each word is listed exactly once, between its two closest synonyms? As far as I can see this is an entirely consistent and coherent idea, and in principle it could be done.
But nobody is going to make such a list any time soon, because it is an NP-complete problem like the travelling salesman problem, which require a drastically increasing amount of processing as the number of nodes or points increases.
Indeed, if you imagine each word as a town on the salesman’s itinerary, with neighbouring towns as close synonyms, you should be able to see that it really is the travelling salesman problem in disguise.
In fact all NP-complete problems seem to be the travelling salesman problem in disguise, since there is a profound similarity amongst all problems of that type, such that the discovery of a method for solving any one of them in a reasonable amount of time would make it possible to solve all of them in a similar amount of time. As far as I know nobody has actually proven such a discovery impossible, but most mathematicians and computer science people seem to believe it is impossible.
I mention above that I had (to my shame) conflated two formulations of the problem. What is the other one? Suppose you have a alphabetical list of words like this sequence:
dib dic did dif … dix diz dob doc dod dof
Rather than thinking of ‘diz’ being a word intermediate in meaning between ‘dix’ and ‘dob’, it would be entirely consistent to suggest that all the words beginning ‘di’ are parallel in meaning to words beginning ‘do’ so that ‘dob’ is not similar meaning to the nearby word ‘diz’ but to the more distant word ‘dib’. Indeed, the sequence of words from ‘dib’ to ‘dif’ would then be seen as parallel to the sequence of words from ‘dob’ to ‘dof’.
It turns out that this problem is no easier to implement than the previous one. The relationship between them is the same as the relationship between two equivalent was of encoding numbers with a binary code, the ordinary binary code and the Gray code in which only one bit changes between adjacent numerical values. Had we solved either problem we could immediately use this relationship to solve the other.
Indeed, as I have worked with various versions of this problem over the years I have become quite attached to binary representations, instead of real or complex numerical representations. More about this at another time.
For now, all I want to say is that the NP-completeness of these problems is not a death sentence for the project. There are approximation schemes which could approximately solve either problem, and I’ve had some limited success in doing just that.
I want to replace the artificial example (the words from ‘dib’ to ‘dof’) with a real one for a small data set. How long with that take? Well, probably forever, if you are going to wait for an example that meets my impossibly high standards. But if you want a woefully inadequate example that just barely gets the point across, perhaps a few days.
Copyright © 1998 Douglas P. Wilson