Interpedia.html

This page seems terribly out of date because of the successes of the Wikipedia and search engines like Google.  But neither of these fulfills the goals of the Interpedia Project.  That project was to index all of the pages of the Internet, as Google does, then link them together in encyclopedia format, something like the Wikipedia, but these pages would usually not be Wiki pages.  Instead they would be ordinary web pages like this one, linked together by topic, automatically, using some form of clustering algorithm, then given an indexing framework (or more than one, probably), also by automatic means.

New:   see the new Interpedia Development page.

These web pages have been put up largely to explain and promote the idea of applying combinatorial optimization to society.  But that particular nutty idea has a curious relationship to a couple of other nutty ideas, the much older idea of an acronymic language, and the ideas behind the late lamented Interpedia project.  Here is some more information about the latter:

The Interpedia project was born as an attempt to write a public domain encyclopedia for the Internet. It was then hijacked by a couple of disreputable characters including myself, who were interested in the “Internet as Encyclopedia”: an encyclopedia-like interface to the whole internet.

The Interpedia project appeared for a while as a discussion in a mailing list, later spawning a Usenet newsgroup, comp.infosystems.interpedia which still existed when last I looked, but has had nothing but spam in it for quite a while.

Though many people wrote messages about the article-writing side of the project, in my own postings to the mailing list I wrote about it more as a software project than as a writing project.

There is now a vast amount of text freely available on the WWW, and this could indeed be organized into a more systematic form by the kind of software I described, but my own project to write such software soon disappeared into the rather strange bit of lexicography that I describe in these pages as the creation of a naturalistic descriptor language.

When I was most interested in the idea of the Interpedia, I wrote an account of its possible impact on society. This account started by noting how very difficult it was to extract information from the Internet. This was before search engines were readily available, but even now with a multitude of search engines, it is still far from easy to find what you want, so my comments still apply. Here is that text, which dates from late in 1993:

I keep thinking, “Why is this so hard?” and “Surely, it has to get easier!”

To me, what we are seeing and participating in is a revolution like the personal computer revolution, and we will see the Internet become common, readily accessible, and much easier to use.

But how, exactly, could it be easier to use?  The Interpedia offers an answer, maybe the answer. Some people don’t think it could be or will be, but let me make two substantial assertions:

  1. The Interpedia could be a solution, an easy to use general interface to the Internet.
  2. Regardless of what happens to the Interpedia project, there will be an easy to use general interface to the Internet. It is only a matter of time.

The time will come when schoolchildren will find it easy and convenient to access the Internet, both for information and to communicate with other schoolchildren.

The time will come when an elderly Oxford scholar who is more at home in dusty libraries and thought he was too old to play with anything new like a computer, will happily browse electronic copies of ancient manuscripts from around the world and argue with colleagues in distant countries.

The time will come when barely literate villagers in the African countryside will find it easy to explore the world of ideas by spending a few minutes in front of a satellite-linked terminal at the village schoolhouse or the tent belonging to some aid-worker.

The Interpedia project can be a part of all this, at least, and maybe more, since the encyclopedia metaphor is such a natural one. Stuart Spivack suggested a library metaphor, and I have thought long and hard about that, but libraries are not nearly as well integrated as encyclopedias, and we want the Interpedia to have at least the appearance of a well-integrated whole. Also, card catalogues and their online equivalents offer only listings to books — in contrast, an encyclopedia offers instant information and references to books, often an annotated bibliography.

Once the Interpedia starts, there will be no stopping it, and it will become the size of a large library, but it will still be more like an encyclopedia, partly because its first and best supported function will be as a quick reference. Jeff says it won’t be more, and perhaps shouldn’t be more. With all due respect, I think this is somewhat shortsighted. It will grow, and it will grow, and it will grow. Why won’t it? What’s to stop it? Why shouldn’t it be more than a first resort? If people want things to be added to it, who are we to say no? If people want it to be an access point for the very latest scholarly articles, and for all manner of discussions, who will dare to stand in their way?

So I am sure that the Interpedia could indeed become an easy to use general interface to the Internet, and that is indeed what more and more people want. So, this little project of ours could be something big.

And it could have a great impact on society. It could bring unimaginable educational resources to schoolchildren everywhere, benefitting most the underprivileged, whose ghetto and village worlds will suddenly have a window on the universe. People in the various K12 projects are already demanding that the Internet serve this role, though they don’t seem to know quite how it can do so. But they recognize a need for an interface a child can use. We can meet that need. If we don’t, somebody else will, and I think they will end up with something like an Interpedia, no matter where they start. But we already have that vision, so we can streamline the process and get them what they already want with much less trial and error.

People in underdeveloped countries or developing countries, (or whatever the PC term is this month) are already looking to the Internet as a way of bypassing all the obstacles to higher-learning that developed countries faced. They cannot wait the many years it will take to create modern universities with good facilities, including a good library and a good faculty. They want and need all the educational advantages that the Internet can provide, and are already hoping it will be their salvation — but the Internet as it stands is too hard for most of them to use. They need what the Interpedia can provide, a powerful tool that does not require highly skilled people to use it.

So these are my reasons for being wildly enthusiastic about the Interpedia project: it can fill a recognized need, one that will someday be filled one way or another; it can fill that need sooner than any other approach, since we already have such a fine vision; and it will help children and the underprivileged, including people in developing countries.

Those seem like pretty good reasons to me, and I’d hope others agree. If any significant fraction of what I have said here is true, then the Interpedia project deserves your consideration. As I said in a previous message, we must depend on people who are willing to take the initiative and publicize this project, encouraging others to join our mailing list. I hope you will all do so.

Several years ago I posted on the Internet a message suggesting that solving the communications and organizational problems involved in producing the Interpedia would also suffice to solve the optimization problems of society.

This argument, which I will reproduce below, amounts to a reduction of one set of problems to another. Instead of saying that we need to solve the combinatorial problems of society, and saying that we also need to solve the organizational problems of creating an Interpedia, this argument states that solving the Interpedia problems would result in a solution to the optimization problems.

The most fundamental idea behind the Interpedia was the idea of universality, the idea that the Interpedia would provide easy access to all the information and other resources on the net, including text, tables, music, video, newsgroups, online discussions, and people.

With so much to choose from, how could any user find what he or she wants? The answer was to say that everything that is available would have an associated descriptor, and that people could search for what they want by using these descriptors.

Did this mean that people would have to know the descriptor language?  No, not at all. Some people, those involved in operating and maintaining the Interpedia, would surely know the descriptor language, but most people need have no knowledge of it.

People using the Interpedia were to have terminals or workstations that would be smart enough to translate human requests into the descriptor language.

Let us look briefly at some of the intended modes of interaction with the Interpedia. (I’ll switch to the present tense here. In fact this article was first written in the present tense for a project that was still alive, but I have been translating it into some sort of nostalgic subjunctive up to this point.)

To begin with, assume you are a regular Interpedia user. When you sit down to your terminal and activate it, what you see first is probably what you last looked at, at the end of your previous session.

I believe that the most natural way of interacting with the system is through a number of knobs and switches. An example of a knob would be the Reading Level knob: at the low end of its rotation it indicates the reading level of a child in the first grade, and at the high end it indicates the reading level of an expert in the field, from academia.

My notion is that this knob is a virtual knob, which can be called into existence if you want to look for an article at a different reading level, but is not normally displayed. In fact, all the knobs and switches which control searches are virtual knobs and virtual switches: available when needed, but not normally displayed.

But if you do call up the Reading Level knob, and assign it to one of the actual physical knobs on your workstation, then you can use it to ask for articles at different reading levels. Suppose you are looking at an article on elephants, written at (let us say) college level. Your daughter in grade one sees you reading it and wants to know about elephants. You need then to instruct the workstation to search for a similar article written at the first grade reading level. So you call up the Reading Level knob and twist it counter-clockwise to the extreme position. If such an article exists, anywhere in the world, it will soon appear. If not, the very closest match will be shown. Perhaps the closest matching article is at a grade 2 reading level. That article will be displayed, and its actual reading level noted somewhere on your screen.

Inside your workstation, what has happened is this: the article you were reading had a descriptor. One field of that descriptor may indicate reading level. When you call and twist the knob, your workstation generates a new descriptor, with that field altered, and sends out on the Internet a request for the best match to this new descriptor.
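The knob scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Descriptor` fields, the 1 to 16 grade scale, the `mismatch` scoring, and the tiny `library` are all hypothetical, since the text does not specify what a real descriptor would contain.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Descriptor:
    topic: str          # hypothetical field: what the article is about
    reading_level: int  # hypothetical field: 1 = first grade ... 16 = expert


def mismatch(wanted, candidate):
    """Smaller is better; a wrong topic counts far more than a wrong level."""
    topic_penalty = 0 if wanted.topic == candidate.topic else 100
    return topic_penalty + abs(wanted.reading_level - candidate.reading_level)


def turn_reading_level_knob(current, new_level, available):
    """Copy the current descriptor, alter one field, return the best match."""
    wanted = replace(current, reading_level=new_level)
    return min(available, key=lambda d: mismatch(wanted, d))


# A stand-in for "everything on the Internet".
library = [
    Descriptor("elephants", 14),
    Descriptor("elephants", 2),
    Descriptor("giraffes", 1),
]

current = library[0]                                   # the college-level article
closest = turn_reading_level_knob(current, 1, library)  # knob fully counter-clockwise
print(closest)
```

No grade-one elephant article exists in this toy library, so the grade-two one is returned, and its `reading_level` field is what the workstation would note on the screen.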

In articles posted on the Internet several months ago, I described a somewhat similar process that used vectors to indicate content. I suggested that Interpedia articles be mapped into a vector space, and that the searching and storage of articles be related to that vector space. The only difference is that I am now speaking of a descriptor language, where before I spoke of vectors. In fact, nothing has changed, since at the innermost heart of the process, I still envision a vector space. The space of possible descriptors is no more than an intermediate layer between the human being and the vector space.

This is rather similar to the use of assembly language to program a computer. The human being writes expressions like JSR GETINPUT, or ANDX count,y and each of these expressions stands for a simple machine language instruction. Human beings can and have programmed directly in machine code by writing hexadecimal numbers instead of these mnemonics, but it is much easier to write in the more easily understood assembly language.

So in terms of the vector space, here is what is happening during the above scenario:

When you call up the knob for reading level, you are defining a hyperplane in the vector space that contains all articles at the same reading level as the one you are reading. And you are defining a line, which is orthogonal (perpendicular) to that hyperplane, intersecting it at the point which represents the article you are reading. This line joins articles with exactly the same content, but different reading levels. Movement up and down this line is governed by the reading-level knob you have called into existence.

You may have found that article by searching the whole space, without limiting your search to that reading-level hyperplane, but when you twist the knob to the extreme counter-clockwise position, you are definitely asking for an article on a specific reading-level hyperplane: the one that contains all articles written at a grade one level. And you will be given the closest matching article, whichever that is.
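The geometry can be made concrete with a small sketch. It assumes, purely for illustration, a real three-dimensional space whose last coordinate is the reading level, so that every reading-level hyperplane has the same normal, `LEVEL_AXIS`. The article names and coordinates are invented.

```python
import math

# Assumption: three real coordinates, the last one being reading level (grade).
LEVEL_AXIS = (0.0, 0.0, 1.0)  # unit normal shared by every reading-level hyperplane


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slide_to_level(point, level):
    """Move along the line through `point` that is orthogonal to the
    reading-level hyperplanes until the level coordinate equals `level`.
    The content coordinates are left unchanged."""
    shift = level - dot(point, LEVEL_AXIS)
    return tuple(p + shift * n for p, n in zip(point, LEVEL_AXIS))


def nearest(target, articles):
    """Name of the stored article closest to `target` (Euclidean distance)."""
    return min(articles, key=lambda name: math.dist(target, articles[name]))


articles = {
    "elephants, college level": (0.9, 0.1, 13.0),
    "elephants, grade two": (0.8, 0.2, 2.0),
    "volcanoes, grade one": (0.1, 0.9, 1.0),
}

# Twist the knob fully counter-clockwise: same content, grade-one hyperplane.
target = slide_to_level(articles["elephants, college level"], 1.0)
print(target, nearest(target, articles))
```

The target point lies on the grade-one hyperplane, but nothing is stored exactly there, so the closest stored article wins. Which article counts as closest depends entirely on the choice of distance measure.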

We may note that the vector space is infinite, but the number of actual articles is very finite. So finding the closest match implies some metric (distance measure) for the underlying vector space. This implies some way of doing the trade-off between what you asked for and what is available. Suppose that the article you were reading was about the dietary requirements of elephants. The only available articles may happen to be a very general discussion of elephants written at the grade one level, and a more specific one about what elephants eat, written at the grade three level. Which one do you get when you twist the knob?

It should be obvious that there needs to be some default choice, and that there should be some other knob for changing that choice. In the articles I posted about the vector space, I suggested that it needs to be a complex vector space.  Complex numbers need to be described by two ordinary real numbers, which can be either real and imaginary components or magnitude and phase components.

This suggests the obvious conclusion, that all knobs must come in pairs. I consider that a pair of knobs represents the magnitude and phase components of a complex number. In the example above, the phase component would represent the actual reading level, from grade one to the heights of academia. The magnitude component indicates the importance of reading level. Twisting it counter-clockwise reduces the importance of reading level, and would therefore force the selection of the article on what elephants eat written at the grade three level. Twisting it clockwise would emphasize the importance of reading level, thus forcing the selection of the more general article on elephants written for grade one students.
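A minimal sketch of the paired knobs follows, using Python's `cmath` module. The mapping from grade to phase, the 16-grade scale, the topic distances, and the `cost` formula are all assumptions made for illustration; the text says only that phase carries the reading level and magnitude carries its importance.

```python
import cmath
import math

MAX_GRADE = 16  # assumed top of the scale ("expert in the field")


def grade_to_phase(grade):
    """Spread grades 1..MAX_GRADE over phases 0..pi (an arbitrary choice)."""
    return math.pi * (grade - 1) / (MAX_GRADE - 1)


def knob_pair(grade, importance):
    """Pack both knobs into one complex number: phase = level, magnitude = weight."""
    return cmath.rect(importance, grade_to_phase(grade))


def cost(request, topic_distance, article_grade):
    """Topic mismatch plus reading-level mismatch weighted by the magnitude knob."""
    importance, wanted_phase = cmath.polar(request)
    return topic_distance + importance * abs(wanted_phase - grade_to_phase(article_grade))


# (topic distance from "dietary requirements of elephants", grade) for each article.
candidates = {
    "general elephants, grade 1": (1.0, 1),
    "what elephants eat, grade 3": (0.0, 3),
}


def pick(request):
    return min(candidates, key=lambda name: cost(request, *candidates[name]))


print(pick(knob_pair(1, 0.5)))  # magnitude knob turned down
print(pick(knob_pair(1, 5.0)))  # magnitude knob turned up
```

With the magnitude knob turned down the grade-three article on diet is selected, and with it turned up the grade-one general article is selected, which is the behaviour described above. One limitation of this encoding is that a magnitude of exactly zero discards the phase, which is harmless here because the reading level then has no weight anyway.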

It may seem at first glance that this is no more than a powerful educational tool, which would have no more effect on society than an improvement in the ease with which people acquire information. But it will be more so, much more so, because of the Universality Property of the Interpedia, the property which guarantees that the Interpedia provides easy access to all resources on the Internet.

By far the most important resources on the Internet are not databases but people, and the Interpedia must make it easy to find and communicate with exactly the right people.

In a message posted several months ago, I tried to define several important properties that any network interface should have. One pair of properties from that list is particularly important:

  1. You should be able to quickly access precisely the messages that are most relevant to you, without having to scan lots of irrelevant messages.
  2. You should be confident that any message you send out is reaching precisely the people most likely to find it relevant to them, to your benefit.

These two properties are clearly a complementary pair: one delimits the messages reaching you, ensuring that you neither miss important messages, nor receive irrelevant ones; the other channels the messages you send out so that they are seen by the people who you would want to see them, and only those people.

It should be obvious that this type of delimitation and channelling of messages necessitates the use of descriptors for people, as well as for text. This can only be done if we have very good privacy established — we do not want strangers to have access to our personal descriptors. I feel certain that various cryptographic methods can be used to ensure that our personal descriptors are available for use in channelling messages, but not available for anyone to examine or manipulate.

It is clear to me that by making sure that messages flow between people who share some common interest or need, the Interpedia or other network interface will establish links between people. If the software works well, these links will be desirable ones, and therefore the combinatorial problems of society will begin to dissolve.

For example, we may consider the needs of employers to contact potential employees and the converse need of people wanting a job to contact potential employers. Neither employer nor job-seeker need broadcast their messages to hundreds or thousands of people who will find them irrelevant. Instead the employer’s messages would reach precisely the most likely candidates for the job opening, and the job-seeker’s messages would reach precisely the most likely employers.
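As a rough illustration of channelling a message by descriptor, the sketch below delivers a job opening only to people whose personal vector is similar enough to the message vector. Cosine similarity, the 0.8 threshold, the three made-up axes, and the names are assumptions, and the privacy machinery is left out entirely.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two real vectors (0.0 if either is zero)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def recipients(message_vector, people, threshold=0.8):
    """Names of the people whose descriptor is close enough to the message."""
    return sorted(
        name for name, vector in people.items()
        if cosine(message_vector, vector) >= threshold
    )


# Hypothetical axes: (software, carpentry, teaching)
job_seekers = {
    "alice": (0.9, 0.1, 0.0),
    "bob": (0.1, 0.9, 0.1),
    "carol": (0.7, 0.0, 0.6),
}
opening = (1.0, 0.0, 0.2)  # a software job with a little teaching

print(recipients(opening, job_seekers))
```

The same function serves the converse direction: a job-seeker's message vector can be matched against a table of employer descriptors.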

Is this kind of precision channelling of messages possible? I believe it is. And given time and energy, I think I could describe exactly how to do it, or even write the software to do it. What I have been doing is working towards such a goal by experimenting with vector spaces that encode meaning or content, and by trying to develop a descriptor language that would provide an easy way for humans to work with such a vector space.

Interpedia software will have to make it possible to navigate quickly and easily through a very large number of articles. This is not possible with lo-tech methods such as pointer chasing, so it is important to take advantage of the algebraic structure of articles.

In particular, it will be important to use powerful methods from mathematical disciplines such as linear algebra and group theory. All of the operations available to the Interpedia user can be considered as elements of a group of operators. For any group of operators it is possible to define a representation of that group in terms of an underlying vector space.

In such a representation, each operator is represented as a matrix, and composition of operators is represented by matrix multiplication.

A specific article in the Interpedia can be represented by a vector, which can be thought of as representing a point in the underlying vector space over which the group of operators is represented. To apply a specific operator to get a related article, you multiply the vector representing the current article by the matrix representing the operator, and the result is a vector representing the related article.
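The matrix picture can be checked with a toy example. The two operators here are arbitrary permutation matrices chosen only because they are invertible and easy to verify by hand. They are not actual Interpedia operators, and the three-component `article` vector is likewise invented.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply(op, vec):
    """Multiply the matrix `op` by the column vector `vec`."""
    return [sum(row[k] * vec[k] for k in range(len(vec))) for row in op]


# Hypothetical operators on a three-dimensional article space.
SWAP = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # exchanges the first two coordinates
CYCLE = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]  # rotates the coordinates

article = [1, 2, 3]

# Applying SWAP and then CYCLE, one step at a time...
step_by_step = apply(CYCLE, apply(SWAP, article))
# ...gives the same vector as applying the single composed matrix CYCLE * SWAP.
composed = apply(matmul(CYCLE, SWAP), article)

print(step_by_step, composed)
```

Because composition is just matrix multiplication, a chain of user operations can be collapsed into one matrix before any article vector is touched.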

Although hypertext will be a key aspect of the Interpedia, hypertext is theoretically ugly, so I talk mostly in terms of relations, functions, and operations. Given an article, the user may want to access a number of related articles:

  • a more general overview of this and related topics
  • an article on the academic discipline associated with this topic
  • an article on the history of this topic

These relations are part of the underlying algebraic structure of Interpedia articles.

Here are a few more design concepts, in point form:

  1. Text and data files will have unique identifiers, but will also have an associated vector representation, which is probably unique, but need not be so. This vector representation is made up from concatenating two vectors with fewer components:
    • Associated with each topic will be a complex vector that defines this topic as a point in an abstract topic space. (Yes, math-people, I mean complex, not complicated.)
    • Associated with each article is a complex vector which defines the quality and style (etc.) as a point in an abstract attribute space.
    • Each user will have an associated preference vector, which defines the attributes of quality and style (etc.) that the user desires, in the same abstract attribute space as the article attribute vectors described above.
    • For keyword access, the keyword supplied by the user is looked up in a lookup-table which will return a vector representation of that keyword as a point in the same abstract topic space as the topic vectors described above. This vector is concatenated with the user’s preference vector to form a vector which can be matched against the vector representations of text and data files.
    • By using a vector-matching algorithm, the available text and data files are examined and the best matches found. This will normally retrieve at least three distinct types of files:
      • text files, for articles
      • link files, which describe hypertext-links for text files
      • structure files, which describe the position of a text file in a hierarchy
      • Users may contribute new articles, and may also create new link files and structure files. Often these will be slightly modified copies of existing files — that will be especially true for link and structure files, which are small non-ascii files containing a few bytes of information: if a user adds a new hypertext link to a text file, this actually makes a modified copy of the existing hypertext link file.
      • The attribute vector representing the style and quality of an article (and those for its associated link and structure files) is arrived at on the basis of reviews by other people, with some consideration given to its originator’s views, but is not the same as the preference vector of its originator. A person may prefer to read authoritative articles, but that does not mean that what he or she writes is an authoritative article.
      • Anyone may add articles, add links to an article, and place an article in a hierarchy, but the only effect of this is to create new files. The attribute vector of newly created files is such that they will not be confused with well-reviewed files. Anyone may review a text file, an assignment of hypertext links for a text file, or the positioning of a text file in a hierarchy, and the effect of reviews is to change the attribute vector of these files, increasing or decreasing the chance of the file being selected in the vector matching process of the search engine.

        (The way this is done is rather technical, having to do with the way attributes are encoded as complex vectors, and will be the subject of a further design note.)

      • A mean vector may be created by a complex-arithmetical mean process from a set of vectors representing a number of articles. If this is done for all of the articles stored on a particular host, that mean vector can serve as representation of the subject area served by this host. As well as the lookup-table which associates vectors with articles, the more advanced prototypes will have a lookup table associating vectors with hosts.   If a sufficiently good match for a search vector is not found on the article lookup-table, the search engine will move on to the host lookup-table, and find the host which is the best match to the search vector. A search request can then be sent to that host, which will compare the search vector with the articles it contains (and possibly also search its host lookup-table, to continue the search.)
      • If a text, link, or structure file contributed by a user is not a sufficiently good match to the mean vector of the host on which it originates, either in topic, quality, or style, (etc.), then this host may search its host lookup-table to find a better location for it, and send it there. In this way, text and associated files can be clustered on hosts where they are most likely to be used or appreciated or welcome, regardless of where they originate. (If unwelcome anywhere else, presumably the person who contributed them will be willing to take them back on his or her own personal machine.)

        (Some of you may recognize this as an attempt to solve the optimization problem underlying any attempt to create an Interpedia — a given item has one best location, even if copies exist elsewhere, and if we store it at that location (or near it), then overall network traffic and access time will be reduced.)
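The keyword lookup, the concatenation with a preference vector, and the fall-back to a host lookup-table can be sketched together. Everything concrete below is an assumption made for illustration: the two-component topic vectors, the one-component attribute vectors, the `good_enough` cut-off, the distance measure, and the host names.

```python
def distance(u, v):
    """Euclidean distance between two complex vectors of equal length."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def mean_vector(vectors):
    """Component-wise complex arithmetic mean of a set of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def search(keyword, preference, keyword_table, articles, hosts, good_enough=0.5):
    """Look up the keyword, append the user's preference vector, and match.
    If no local article is close enough, name the best host to forward to."""
    query = keyword_table[keyword] + preference  # tuple concatenation
    best = min(articles, key=lambda name: distance(query, articles[name]))
    if distance(query, articles[best]) <= good_enough:
        return ("article", best)
    host = min(hosts, key=lambda name: distance(query, hosts[name]))
    return ("forward to host", host)


# Topic part (two components) followed by attribute part (one component).
keyword_table = {"elephants": (1 + 0j, 0j), "volcanoes": (0j, 1 + 0j)}
preference = (0.5 + 0.5j,)
local_articles = {"elephants overview": (1 + 0j, 0j, 0.5 + 0.4j)}
host_table = {
    "zoology.example": mean_vector(
        [(1 + 0j, 0j, 0.5 + 0.4j), (0.8 + 0j, 0.2 + 0j, 0.5 + 0.6j)]
    ),
    "geology.example": mean_vector(
        [(0j, 1 + 0j, 0.6 + 0.5j), (0.2 + 0j, 0.8 + 0j, 0.4 + 0.5j)]
    ),
}

print(search("elephants", preference, keyword_table, local_articles, host_table))
print(search("volcanoes", preference, keyword_table, local_articles, host_table))
```

The same `mean_vector` and `distance` functions would let a host decide whether a newly contributed file belongs with it or should be sent on to a better-matching host.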

I hope this shows how we can let anybody add anything to the Interpedia, without disturbing people who only want to see quality stuff.

Most people will set preferences which demand articles (etc.) with some good (and no bad) reviews from those they consider to be experts.

If other people add articles or screw around with the hyperlinks to an existing article, all they succeed in doing is creating files whose associated attribute vectors indicate they are unreviewed. But people who seek out articles (etc.) to review will find those files, and can make subtle, limited changes in those vectors, until eventually these new files are better matches than the old ones.
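The effect of reviews can be illustrated with a stand-in update rule. The actual encoding of attributes as complex vectors is not given here, so the sketch assumes that an unreviewed file starts at the origin of attribute space and that each review moves the vector a fixed fraction (`step`) of the way toward the reviewer's assessment.

```python
def distance(u, v):
    """Euclidean distance between two complex vectors of equal length."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def apply_review(attribute, assessment, step=0.25):
    """Make a limited change: move part of the way toward the assessment."""
    return tuple(a + step * (r - a) for a, r in zip(attribute, assessment))


UNREVIEWED = (0j, 0j)              # assumed starting point for any new file
preference = (1 + 0j, 1 + 0j)      # what a demanding reader asks for
old_file = (0.6 + 0j, 0.6 + 0j)    # an existing, moderately well-reviewed file
expert_assessment = (1 + 0j, 1 + 0j)

new_file = UNREVIEWED
reviews_needed = 0
# Keep reviewing until the new file is a strictly better match than the old one.
while distance(preference, new_file) >= distance(preference, old_file):
    new_file = apply_review(new_file, expert_assessment)
    reviews_needed += 1

print(reviews_needed, new_file)
```

Until enough favourable reviews accumulate, the old file remains the better match for this reader, so an unreviewed edit never displaces it by itself.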


Copyright © 1993 and 1998 Douglas P. Wilson    




Related Web Pages are:

The main Social Technology page.

FindCompatibles, the key page, with the real solution to all other problems explained

Technological Fantasies, a page about future technology

Practical Immortality, not the immortality of the body, nor making a copy of the mind in a machine, but actual transfer of a person, personality, memory and consciousness into a supercomputer

Social Tech a page about Social Technology, technology for social purposes.  I think I was the first person to use this phrase on the Internet, quite a long time ago.


Roughly corresponding to these web pages are the following blogs:

FindCompatibles devoted to matching people with friends, lovers, jobs, places to live and so on, but doing so in ways that will actually work, using good math, good algorithms, good analysis.

Technological Fantasies devoted to future stuff, new ideas, things that might be invented or might happen, such as what is listed above and below.

Practical Immortality yes, practical immortality.   Don’t write this off as insanity, please.  See the first entry in the blog first.

Sex-Politics-Religion is a blog about these important topics, which I have been told should never be mentioned in polite conversation.  All right, that advice does seem a bit dated, but many people are still told not to bring up these subjects around the dinner table.

I believe I was the first person on the Internet to use the phrase Social Technology — years before the Web existed.

Those were the good old days, when the number of people using the net exceeded the amount of content on it, so that it was easy to start a discussion about such an unpopular topic.  Now things are different.  There are so many web pages that the chances of anyone finding this page are low, even with good search engines like Google.   Oh, well.

By Social Technology I mean the technology for organizing and maintaining human society.  The example I had most firmly in mind is the subject of FindCompatibles, what I consider to be the key page, the one with the real solution to all other problems explained.

As I explained on my early mailing lists and later webpages, I find that social technology has hardly improved at all over the years.   We still use representative democracy, exactly the same as it was used in the 18th century.  By contrast, horse and buggy transportation has been replaced by automobiles and airplanes, enormous changes.

In the picture below you will see some 18th century technology, such as the ox-plow in the middle of the picture.  How things have changed since then in agricultural technology.  But we still use chance encounters, engagements and marriages to organize our home life and the raising of children.  

I claim that great advances in social technology are not only possible but inevitable.  I have written three novels about this, one preposterously long, 5000 pages, another merely very very long, 1500 pages.  The third is short enough at 340 pages to be published some day.  Maybe.  The topic is still not interesting to most people.   I will excerpt small parts of these novels on the web sometime, maybe even post the raw text for the larger two.


This site includes many pages dating from 1997 to 2008 which are quite out of date.  They are included here partly to show the development of these ideas and partly to cover things the newer pages do not.  There will be broken links where these pages referenced external sites.  I’ve tried to fix up or maintain all internal links, but some will probably have been missed.   One may wish to look at an earlier version of this page, rather longer, and at an overview of most parts of what can be called a bigger project.

Type in this address to e-mail me.  The image is interesting.  See Status of Social Technology

Copyright © 2007, 2008, 2009, Douglas Pardoe Wilson

I have used a series of e-mail addresses over the years, each of which eventually became out of date because of a change of Internet services or became almost useless because of spam.  Eventually I stuck with a Yahoo address, but my inbox still fills up with spam and their spam filter still removes messages I wanted to see.  So I have switched to a new e-mail service.  Web spiders should not be able to find it, since it is hidden in a jpeg picture.   I have also made it difficult to reach me.  The picture is not a clickable link.  To send me e-mail you must want to do so badly enough to type this address in.  That is a nuisance, for which I do apologize, but I just don’t want a lot of mail from people who do not care about what I have to say.
