For many years I have been obsessed with languages, including natural human languages, constructed alternatives to natural languages, computer programming languages, mathematics, and music. All of these can be called languages, in a sense, and to me they all seem related.

But they all seem rather imperfect or incomplete, as well.

As someone who studied linguistics, I see computer programming languages and mathematics as deficient in a way — they lack any phonetic, phonological, or phonemic component. I’ve always had trouble explaining this lack to engineering colleagues with no linguistic background, but I feel that in some way the sound of a language is an utterly vital part of it. Linguists have almost always held that writing is merely transcription, that the real language is the part spoken and listened to.

But as someone who spent years writing computer programs for a living, I also sense a deficiency in natural languages, which seem much too vague, arbitrary, and perhaps even contradictory, in comparison with mathematics and programming languages. Imagine if you can a computer with only a natural language interface. How easy would it be to tell it to perform some very detailed calculation or elaborate sequence of steps?

And then there is music. Clearly music is essentially sound, and as with natural languages the printed transcription is clearly just that, just a notation for music, not the music itself. But music seems to lack a lexical or semantic component. It is true that some pieces of “program music” like Smetana’s Moldau have an intended mapping into the real world, but most of the great pieces of music are rather abstract.

If you tried to imagine programming a computer with only a natural language interface, at my suggestion above, now try to imagine programming a computer by whistling to it.

As I’ve mused on these thoughts over the years I’ve come around to the belief that mathematics and music are rather complementary, one lacking sound and the other lacking a real semantics. But they are both rather pure or ideal extremes.

And so for a long time I have thought of mathematics and music as parts of one whole language, as components of one ideal language.

Stimulated by this idea, I have studied the history of artificial languages, including the many ideal language schemes intended to produce an ideal philosophical or international language for human use. At one time I studied formal logic, but my clearest memories of that phase were of cutting classes in logic to read up on the descriptor languages used by librarians, which seemed much closer to reality.

Perhaps the most interesting idea I encountered during years of reading about artificial languages was Wilhelm von Humboldt’s notion that there is in fact some underlying ideal language, and that what we call natural languages (or national languages) are merely the continual and unconscious expressions of national culture in this underlying medium.

Attempting to formalize Humboldt’s idea led me to propose a linear model of human language use, based on a psychology somewhat like associationism. My graduate work involved exploring that model, using computer methods.

Years before encountering Humboldt’s ideas I had a moment of insight myself in which I suddenly found a way of defining an ideal language that could be entirely non-arbitrary. There were actually two moments of insight separated by a couple of hours in the course of a very long night.

The first insight was the sudden realization that it would be possible to define a language that consisted entirely of acronyms — a language in which each word was an acronym made from other words in the language. The second insight, after a couple of hours deep thought about the first, was that the definition of an acronymic language could be satisfied by any one of an infinite number of languages, all possessing the properties I had defined — but that there could be (must be!) one distinguished member of this class of languages, an optimum or best language.

When I first came up with the idea for an acronymic language, some twenty years ago or so, I thought of it as an ideal language for humans to use, a replacement for natural languages — something like Esperanto. This is still a possibility that interests me, but I have been downplaying it somewhat lately, preferring instead to discuss it as a convenient descriptor language for information retrieval.

Let’s do a little requirements analysis on the idea of a descriptor language for information retrieval.

The most important requirement for a descriptor language is a close relationship between the form and meaning of terms — close enough that machines can easily arrange things according to similarity of meaning, by simply arranging them according to similarity of form.

An example of arrangement according to similarity of form is a simple alphabetical sort. If we sort the words in English in alphabetical order, do we at the same time sort them by meaning? No! English is clearly not a good descriptor language. By contrast, try sorting terms in the Dewey Decimal System according to form by using an ordinary ASCII alphanumeric sort, like the Sort command in MS-DOS. The result does indeed arrange the terms according to their meaning, or very nearly so.
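This contrast is easy to demonstrate with a few lines of code. The sample words and Dewey-style codes below are rough illustrations, not drawn from any actual catalogue:

```python
# Sorting by form: a plain lexicographic sort scatters English meanings,
# but groups Dewey-style codes by subject, because the leading digits of
# each code encode the subject hierarchy.  (Sample codes are approximate.)

english = ["astronomy", "asparagus", "astrology", "aspirin"]
dewey = ["523.8 stars", "520 astronomy", "635.3 asparagus", "615.1 drugs"]

print(sorted(english))  # a vegetable, a drug, and two sky topics interleaved
print(sorted(dewey))    # the 5xx sciences and the 6xx applied topics group together
```

Sorting the English words puts “asparagus” next to “aspirin”; sorting the codes keeps the two astronomy entries adjacent.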

It is important to be more specific about this relationship. The ability to sort meanings by sorting forms comes from the implication

   Similarity of Form Implies Similarity of Meaning.

What about the converse? Can we say that

   Similarity of Meaning Implies Similarity of Form?

As will be shown below, this is a much more difficult property to realize. If we do have the implications going in both ways, then we can say that there exists an isomorphism between form and meaning. This is clearly desirable, but hard to achieve. But we do need at least the one-way implication, a homomorphism from form to meaning.

Another important requirement for a descriptor language is a mechanical process or algorithm for assigning a descriptor to a particular text. It must be possible for a machine to read a text file and produce a descriptor that correctly describes that text, without human intervention. Without this, the human labour involved in using the descriptor language would be far too great for the language to be anything more than a tool for librarians.

If this automatic processing of texts is essentially a linear operation, then it follows from the well-known properties of linear operators that similar texts will always be mapped to similar descriptors, and the fact that this is so would prove that we do indeed have an isomorphism between text and meaning. I think that the converse probably holds true as well: if we do have an isomorphism between meaning and form, (and not just the one-way homomorphism), then we could perform this automatic processing of text into descriptors by a linear operation.
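As a sketch of what such a linear operation might look like, suppose every word already had a content vector. The three-dimensional word_vectors table below is a made-up toy, not a real semantic space; the point is only that summing word vectors is linear, so texts sharing most of their words necessarily receive nearby descriptors:

```python
# A minimal sketch of a linear text-to-descriptor map.  The word_vectors
# table is a hypothetical, 3-dimensional stand-in for a real vector space.

word_vectors = {
    "star":   [1.0, 0.2, 0.0],
    "planet": [0.9, 0.3, 0.1],
    "bread":  [0.0, 0.1, 1.0],
}

def descriptor(text):
    """Sum the vectors of the known words in a text.  Summation is linear,
    so similar texts are mapped to similar descriptor vectors."""
    total = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        for i, x in enumerate(word_vectors.get(word, [0.0, 0.0, 0.0])):
            total[i] += x
    return total

print(descriptor("star planet"))  # same result as descriptor("planet star")
```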

I will note here without further elaboration that the processing of text written in an acronymic language to form a descriptor which is essentially an acronym can be considered a linear operation.

The idea that the ideal descriptor language be realized through acronyms is clearly not a requirement but a design scheme.

Nevertheless, the idea of an acronymic language quickly leads to a real requirement, which I called the Summary Property. Imagine a sequence of words in an acronymic language. To summarize this sequence of words, we need only take the initial letter of each word and form these letters into a word. In an acronymic language the resulting word is a good approximation in meaning to the original sequence, and therefore can serve as a summary of it.
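The mechanics of the Summary Property are trivial to state in code; the hard part is designing the language so that the resulting acronym really does approximate the meaning. The word sequence below is hypothetical:

```python
def summarize(words):
    """Summarize a word sequence by its acronym: one initial letter per word."""
    return "".join(word[0] for word in words)

print(summarize(["TRF", "BLH", "FRT"]))  # -> "TBF"
```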

A very brief summary may indeed serve as a descriptor, but the Summary Property is a much stronger requirement than the ones we have previously given for descriptors. A descriptor need only indicate what some text says. A summary also indicates how the text says it.

Whether or not we use an acronymic language for our ideal descriptor language, I believe a Summary Property is essential. The Internet and World Wide Web are giving us access to more and more text. Even now we cannot read and understand all of the text that is relevant to our needs, and the future will make the problem worse, unless it provides tools for automatically creating brief summaries for us.

We also will need some mechanical means, or algorithm, for dealing with large numbers of descriptors, by summarizing groups of them. So a very general Summary Property is definitely a requirement.

For now, let us suspend our requirements analysis with just the three basic requirements:

1. An isomorphism between meaning and form, or at the very least a homomorphism from form to meaning.

2. A mechanical process for processing text and automatically generating a descriptor for it.

3. A mechanical process for summarizing text, including texts which consist simply of lists of descriptors.

In articles posted on the Internet a few years ago, I described a process that used vectors to indicate content. I suggested that texts be mapped into a vector space, and that the searching and storage of articles be related to that vector space. The only difference is that I am now speaking of a descriptor language, where before I spoke of vectors. In fact, nothing has changed, since at the innermost heart of the process I still envision a vector space. The space of possible descriptors is no more than an intermediate layer between the human being and the vector space.

This is rather similar to the use of assembly language to program a computer. The human being writes expressions like JSR GETINPUT, or ANDX count,y and each of these expressions stands for a simple machine language instruction. Human beings can and have programmed directly in machine code by writing hexadecimal numbers instead of these mnemonics, but it is much easier to write in the more easily understood assembly language.

What I have been doing is working towards such a goal by experimenting with vector spaces that encode meaning or content, and by trying to develop a descriptor language that would provide an easy way for humans to work with such a vector space.

Let us assume that we have already defined a suitable vector space for information retrieval — that is to say, let us assume that someone has achieved what I have been trying so hard to do. Given a well-defined vector space, we are now faced with the problem of talking about individual vectors — individual points in that space. The normal mathematical way of talking about a vector is to list its coordinates relative to some set of coordinate axes. That would mean describing the vector as a list of numbers, e.g.

{1.6 7.3 9.2 -0.5 0.0 -0.9 6.5 6.3 -5.3 1.6 6.6 -3.1 1.7 0.0 0.2}.

This is just not a practical means of communication. Lists of numbers are just too hard to deal with.

But suppose we could accurately specify a vector by using a sequence of letters that form a pronounceable word, like any of the words in this sentence. Surely that would be an excellent way to talk about the vectors in this vector space. But is it possible? I think it is, and I have worked out several ways to do it. What I am going to do here is explain one of those methods — not the best one, perhaps, but certainly the easiest to explain.

Before I begin I must make an observation about precision. The list of numbers given above seems to precisely pick out one vector from a 15-dimensional vector space, but anyone in the physical sciences would object that it uses only two significant figures for each coordinate, and therefore only delimits a small volume or neighbourhood within the vector space. That is correct — and I think that is all we can ever hope to do. We can increase the precision and therefore pick out a smaller volume of space, but we can never hope to specify single vectors.

Let me contrast this lack of precision with an achievable goal, which is a lack of ambiguity. As I use these terms, precision refers to the size of the volume of space described, but ambiguity means that a given descriptor actually describes two or more distinct volumes of space.

Many words in natural languages like English are ambiguous in that they have two or more distinct meanings. The word “table”, for example, may mean a flat board with several legs that we eat off of, or it may mean a multi-column list on a computer, or it may mean the act of laying aside a proposed bill in the legislature. Each of these meanings would correspond to some separate volume or neighbourhood in the vector space. A good descriptor language would have separate descriptors for each of these volumes of space, and would therefore be unambiguous, as I use the term. But each descriptor would still have some imprecision, since it picks out a volume of space rather than a single vector.

With this distinction noted, I will now describe the simplest scheme for describing a vector by means of a word or sequence of words.

The simplest scheme bears a family resemblance to Hebrew in that it ignores vowels in writing words, but adds them for speaking purposes. Let us suppose we ignore A, E, I, O, U, and Y in writing words, and thus have 20 consonants. The basic idea is to operate in a 10-dimensional space, so that each consonant used in a word represents a significant component of the vector along the corresponding coordinate axis in one direction or the other.

The acronymic property comes from a decrease in weighting as we move from leftmost to rightmost letter in the word.

As an example, suppose we consider trying to specify a point on the 2-dimensional page or computer screen by using an alphabet of four letters: T (short for Top), B (short for Bottom), L (short for Left), and R (short for Right). We take as the origin the center of the screen or page.

The letter T, by itself, means “somewhere in the top half of the screen”; B means somewhere in the bottom half; L, somewhere on the left half; and R, somewhere on the right. TR means somewhere in the top right quadrant, LB or BL, somewhere in the bottom left quadrant, and so on.

But now look at each of the 4 quadrants and imagine dividing them in exactly the same way as you divided the whole screen. Therefore TRBL means “the bottom left subquadrant of the top right quadrant”, and TRBLTL means “the top left subsubquadrant of the bottom left subquadrant of the top right quadrant”.

It should be obvious that we can specify any point on the screen this way, to any finite level of precision.
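The halving scheme just described can be sketched in a few lines, assuming coordinates that run from -1 to 1 with the origin at the center of the screen:

```python
def region(code):
    """Return (left, bottom, right, top) of the region named by a string of
    T/B/L/R letters, halving the relevant dimension at each letter."""
    left, bottom, right, top = -1.0, -1.0, 1.0, 1.0
    for letter in code:
        if letter == "T":
            bottom = (bottom + top) / 2    # keep the top half
        elif letter == "B":
            top = (bottom + top) / 2       # keep the bottom half
        elif letter == "L":
            right = (left + right) / 2     # keep the left half
        elif letter == "R":
            left = (left + right) / 2      # keep the right half
    return (left, bottom, right, top)

print(region("TR"))    # the top right quadrant
print(region("TRBL"))  # its bottom left subquadrant
```

Each additional letter halves one dimension of the region, so longer codes name smaller regions, to any finite level of precision.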

Suppose we add the letters `F’ for front or foremost, and `H’ for hindmost (B and R being already in use); then we can specify any point in a 3-dimensional cube by using strings of these six letters. FRTHLB would be the “hindmost left bottom subcube of the foremost right top cube”.

As it happens, talking about square quadrants and subcubes is misleading, since the most efficient use of this form of notation is to use non-square rectangles and non-cubic rectilinear solids. If we use cubes for the three-dimensional case, then FRT, RTF, TFR, and TRF all mean the same thing. It is much more efficient to use a uniformly decreasing set of weights in which each successive letter represents a slightly smaller portion of the original space. If we do that, then RFT and TFR represent quite different volumes of space. RFT is a rectilinear solid that is short and wide, while TFR is one that is tall and narrow.

By having each successive letter weighted less than the previous one, we can be sure that letters to the right of a sequence are less important than those to the left. To obtain an approximation of a sequence by a shorter one, it suffices to truncate the sequence on the right, leaving letters to the left intact. The most extreme case is one in which the sequence of letters is truncated to the single initial letter. The initial letter of a word in this language is therefore the best single-letter approximation to that word — a property that is fundamental to an acronymic language.
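One simple way to model the decreasing weights is to let each successive letter move a point estimate by a geometrically shrinking step along its own axis. This is only a sketch of the idea (the decay factor of one half is an arbitrary choice), but it shows both that letter order matters and that truncating on the right yields a coarser approximation:

```python
# Each letter names an axis (0, 1, or 2) and a direction along it.
AXES = {"R": (0, +1), "L": (0, -1), "T": (1, +1), "B": (1, -1),
        "F": (2, +1), "H": (2, -1)}

def locate(code, decay=0.5):
    """Estimate a point in the cube: each successive letter takes a
    geometrically smaller step along its own axis."""
    point = [0.0, 0.0, 0.0]
    step = 1.0
    for letter in code:
        axis, sign = AXES[letter]
        point[axis] += sign * step
        step *= decay  # each letter counts for less than the one before
    return point

print(locate("RFT"))  # differs from locate("TFR"): order matters
print(locate("TFR"))
print(locate("RF"))   # truncation: a coarser approximation of "RFT"
```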

Early descriptor languages used by librarians, like the Dewey Decimal system, might be better described as descriptor vocabularies, since they used only single word descriptors. More recent descriptor languages used pairs of single word descriptors, but none that I know of has actually used long sequences of descriptor words resembling the sequences of words in a natural language. But as will be seen, this is very important.

By simply adding letters to a single-word descriptor, we decrease the volume of space and increase the precision at the same time. The word TRF specifies a large volume of space, and TRFFRBTHL specifies a much smaller one. We can also consider that TRF is not very precise and TRFFRBTHL is much more precise. But what if we want to specify a large volume of space very precisely, or a small volume of space with less precision? We need to somehow decouple volume and precision, so we can specify any volume with any degree of precision.

The answer is to use sequences of words instead of single words. Without going into details here, it should suffice to say that a long sequence of short words can specify a small volume without much precision, and a short sequence of long words can specify a large volume very precisely. And it is worth noting that the acronymic properties hold: a single word approximation to a sequence of words is the acronym of that sequence.

There are many more details that could be given at this point, but what I have said here should be enough to suggest that an acronymic descriptor language can easily be made once a suitable vector space is defined. That is the hard part.

Briefly, what I have been doing is linking together meanings according to the synonymity of words that express them.

First, I needed to find a good notation for meaning. As the “table” example mentioned above indicates, words in English are ambiguous, so they do not provide a good way of indicating meaning. But pairs of English words are much better. The word “table” is ambiguous, and so is the word “list”, which may mean a kind of table, or the verb describing a boat that is tilting to one side. But the pairs “table, list”, “table, desk”, and “table, delay” all have different meanings, as do the pairs “list, table” and “list, tilt”.

Pairs of words form an adequate notation for meanings, and where they fail, triples of words will succeed. Using pairs of words it is possible to define a form of similarity between meanings based on synonymity of words as follows: if both of the words in one pair are listed as synonyms of both of the words in the other pair, then the two pairs of words describe similar meanings. This holds true with very few exceptions, and can be strengthened by using triples instead of pairs.

What I have done is to define several thousand meanings by using pairs of one-syllable words. The restriction to one-syllable words was made in part to limit storage requirements and in part to allow simpler analysis of word forms. I was able to create a network structure (what mathematicians call a connected graph), by using the definition of similarity given above. In this graph, meanings (defined by pairs of words) are the nodes, and nodes representing similar meanings are joined by edges (or arcs).
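The similarity test and the resulting graph can be sketched as follows. The synonym table here is a tiny hand-made illustration; the real lists run to thousands of one-syllable words:

```python
# A toy synonym table: each word maps to the set of words listed as its
# synonyms.  (Hand-made for illustration only.)
synonyms = {
    "table": {"list", "chart", "grid", "desk"},
    "list":  {"table", "chart", "grid"},
    "chart": {"table", "list", "grid"},
    "grid":  {"table", "list", "chart"},
    "desk":  {"table", "bench"},
    "bench": {"desk"},
}

def similar(pair1, pair2):
    """Two meanings are similar if both words of one pair are listed as
    synonyms of both words of the other pair."""
    return all(b in synonyms.get(a, set()) for a in pair1 for b in pair2)

# Meanings are the nodes; similar meanings are joined by edges.
meanings = [("table", "list"), ("chart", "grid"), ("desk", "bench")]
edges = [(m, n) for i, m in enumerate(meanings)
         for n in meanings[i + 1:] if similar(m, n)]
print(edges)  # the two "multi-column list" meanings join; the furniture pair does not
```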

This graph can be considered as embedded in a vector space of many dimensions. The problem is to find a definition of that vector space, and then to find a 10-dimensional approximation to it. There are well-known mathematical ways of doing this, but I have not quite succeeded in accomplishing it, partially because of the very large size of the graph, which defeats any straightforward approach. Nevertheless, I am quite confident it can be done, and hope to have it available soon.

Once the vector space is well-defined, I should be able to make a simple acronymic descriptor language from it, and this would be of very great value in organizing the data that is available on the Internet. A lot of work will remain to be done, but I feel that I am working on something very fundamental.

Copyright © 1998 Douglas P. Wilson
