Can It Work?

Some problems are easy, some far to difficult.  Can we actually create software which will do what is necessary?

The most important distinction goes under the name “Inverse Problems”, which will be the title of a web page soon. This is shorthand for a well understood mathematical phenomenon that has been at the root of modern mathematics since Newton, at least. It has been very difficult to explain this to people until very recently when technology based on inverse problems has become commonplace.

When it’s ready I’ll refer you immediately to that Inverse Problems page for a better explanation but for now you should simply consider the nature of the public key cryptography now used all over the net. Public key cryptography is used in Pretty Good Privacy, PGP, and similar cryptographic systems based on the well-known mathematical phenomenon wherein two problems which are inverses of one another are known, but one of them is easy and one hard.

Usually the problems involved are multiplying two large numbers to get their even larger product and the inverse problem of factoring a very large number to demonstrate that it is the product of two smaller ones. When the numbers are integers, (sometimes called whole numbers), and the multiplication must be exact, the second problem, factoring, can be very very hard, while the first one remains easy.

Mathematicians and computer scientists often use the shorthand phrase “inverse problems” to describe this situation. It is a misleading term, but a common one. Strictly speaking multiplication and division are inverses, as are addition and subtraction, but the term “inverse problem” is usually reserved for the case when the extreme difficult of one of these inverses causes a big problem.

Again, strictly speaking, encryption using a public key and decryption using a private key are inverses, like the multiplication and division they are based on, but the shorthand term “inverse problem” is used here to refer to the much harder problem of cracking the code so as to decrypt the encrypted text without knowing the private key beforehand. Another way to describe this is to say that PGP uses a “trapdoor algorithm” — if you push on the trapdoor from one side it opens easily, but to make it open from pushing on the other side is hard, you have to actually break it down.

I’ll hope all the publicity given to cryptography in recent years means you understand the basic concept, so I can just move on to other examples and to the bigger picture they suggest.

Another inverse problem comes up in data compression, especially in “lossy” data compression. Compressing files with zip or gzip is carefully done to lose no information at all so that the expanded (or inflated, or exploded) text recovered by decompressing the compressed file later reproduces the original uncompressed file exactly, with nothing lost in the translation.

If you have zipped up large directories full of big files as I do so often you may have noticed that it takes longer to compress the files than it does later on to unzip them. Compression and decompression are obviously inverse operations, that goes without saying, but they are not equally easy to do, and so we start to use the term “inverse problems” to say in shorthand that the inverse of the easy operation is a hard problem.

But in lossy compression such as is used in making jpeg and mp3 files, some data is lost — the file that results from decompression is not exactly the same as the file that was compressed. The trick is to make sure the data that was lost wasn’t important anyway, mostly just noise, while preserving the important data that we can think of as the signal. Another, even more powerful form of lossy data compression is fractal compression where very high compression ratios are available through mathematical magic that almost nobody understands.

In all these lossy compression systems it is much easier to decompress the compressed file (see the jpeg, listen to the mp3 file) than it is to create those files by compressing the originals. In fractal compress the situation is even more extreme and doing the original compression operation is very difficult and time consuming, much much harder than decompressing later.

The mathematics behind CASA and its companion projects is almost entirely based on inverse problems, actually on methods very similar to those used in jpeg and mp3 compression. The results are then used in combinatorial optimization operations that are also mathematically challenging, but none of this would be practical without the lossy compression step.

I’ll leave the discussion of technical details to the “How It Works” and “Does It Work?” pages. This page here is for a different purpose, the explanation of a very important concept.

On these pages I am trying to find a middle ground where there is none, in the middle of the vast crevass between C.P. Snow’s “Two Cultures”. I’ve spent a lot of time over the years in various kinds of mathematical activity that would interest the scientists and engineers on one side of the chasm but are mostly incomprehensible to the arts and humanities on the other side. But I’ve also spend a lot of time writing text in plain English (more or less plain English) which scientists and engineers often find to vague or imprecise — not technical enough. I am (somewhat) comfortable on both side of the gap, so I do both things.

As a result I (think I can) produce pages and programs so that others from either camp will find something of value in what I’ve done. Computer programs, especially the rather mathematical ones I usually write, are quite technial and formal objects — if written in a real language like ML they can even be proven correct. Pages of text like this one are very much informal objects that have no precise definition and aren’t meant to. They are meant to be roughly understood, not proven correct or used to calculate anything.

But one thing I don’t do very much is write formal mathematical equations or formal proofs. I’ve obtained a lot of symbolic mathematics software which makes manipulating equations quite easy and a lot of theorem proving software which makes proving theorems quite easy, but I still don’t do it very much. It has taken me many long years to figure out why that is.

For a long time I have found my own disinterest in equations puzzling. It’s not that I lack interest in equations found in books. I have often used a math text to find an important equation so I could translate it into a computer program. Especially in numerical analysis software it is often a quite mechanical task to translate the equation into a program, and that’s something I’ve often done. But I’ve never spent much time manipulating equations on my own — and have often felt at fault for not doing so, as if I am too stupid to do it creativly.

But in the past few years I’ve come to see this as a situation involving an inverse problem. Turning an equation from a book into a piece of software is quite mechanical, and in some programming languages like APL the resulting program looks a lot like the original equation. But the inverse operation is not so easy. It is not so easy to make a computer program (not derived from equations) that works, then turn it into a set of equations or a formal proof.

Similarly it is not very difficult to take an existing computer program that works and write an explanation of the underlying concepts and tell people how it works. But it is not so easy to go from concepts and understanding in the other (inverse) direction and make a piece of software out of them.

So, as you might guess, I have come to suspect an inverse problem at the heart of this. In recent years I’ve tried to describe this problem many times to many people without much success but maybe it will be easier now. My own code name for this is “SCIVTECH”, which I pronounce in a way that better reflects its linguistic roots as if it was spelled “skivtech”, and which abbreviates the distinction of “science versus technology”, “sci vs. tech”.

What Isaac Newton did was science, and his equations describing planetary motion have been held up as an example that has inspired many generations of scientists all over the world. But actually applying Newton’s equations and formulae is simply a matter of technology. One can download from the web some nice little computer programs that accurately predict the positions of the planets for hundreds of years in the future. Such programs are not at all intelligent — none of them does what Newton did when he figured out his scientific theory — they are just tools, applied science, techniques, or piece of technology.

It can be very hard to come up with mathematical equations modelling the real world and the history of modern physics illustrates just how hard it can be. But applying these equations is just a matter of technology. Not always easy, but just a matter of following where the equations lead.

So I see the inverse problem here as one of Science vs. Technology. Science is hard — often too hard. I think that’s why the so-called Social Sciences have been such a disappointment. They are trying to do the science, to come up with generalizations of human being and human society that might play a role analogous to Newton’s physical laws. That’s too hard. That’s the hard side of the inverse problem. The easier side is “Social Technology” and I intend to demonstrate just how easy that is by doing it.

But to talk of capital-S Science and captital-T Technology is to open philosophical beds of worms and breed arguments where what I really want to do is use tools and techniques to accomplish something useful, to make society work better.

It is much easier to talk of data compression and expansion, and to use them in situations involving other inverse problems that can be managed successfully only if we understand this vital distinction and avoid tilting at windmills. It is much easier to turn around and do the inverse, tilting away from windmills.

This exact distinction can be clearly seen in the work of psychologists who study the human personality, especially that of R.B. Cattell and his students. They asked people large numbers of questions and then used factor analysis to study the answers. The results were expressed as vectors (sequences of numbers) representing positions in a 20 dimensional space — more commonly reduced to 16 dimensions.

Apparently Cattell thought he was doing Science, but factor analysis is a mathematical technique based on the same methods of generalized fourier analysis as the cosine transforms in the software that makes jpeg compressed images. Just as we don’t think of these programs as little scientists that develop theories of pictures, just technology — tools, we shouldn’t really consider Cattell’s work as science. He was just prototyping various techniques and tools. He was doing technology.

CASA and the other projects described on these pages are also based on this technology. People will be asked to fill out questionaires and the results will be compressed into short descriptive vectors which compress or summarize the answers to the questions and thus the personalities of the people who supplied those answers. But the results are not going to be published as scientific discoveries. They will just be used to do something useful. In the CASA technology they will be used as inputs to other programs that will solve combinatorial optimization problems then turn the results into suggested matchings between people and people, people and jobs, people and other social activities, and so on.

No real science there, just technology. Social technology. The easy side of an inverse problem. Human beings are such complicated creatures and human society so much more complicated that only the easy side may ever be done. A general theory of people and human society has been proposed for centuries but we still seem a long way from it. I say that it is, like other kinds of science, the hard side of the inverse problem, and perhaps too hard to ever happen. But I’m sure we can manage the easy side and use what little we do know for something useful.

I’ve been talking and writing about this for many years but have somehow failed to get through to people. So instead of writing more long messages like this one and posting it to the SocialTechnology mailing list, I have been writing software to demonstrate and further develop the technology. It’s not all ready to go, but I do have a prototype — held together with duct tape, bailing wire, and epoxy cement, but more or less workable.

I’m now in the process of tidying up this prototype and installing it on the web for anyone to use. This is not going to be easy, so I hope I can get some help with it. As one way of getting help I am posting a request for it on every page I put up. Another way of making progress by myself is to put up as much as I can of what exists, with only minor changes, then invite people to play with it. This will take a while because there is a lot of junk to sort through.

I’ve made a firm committment to incremental progress — putting up a little bit each day and doing a little bit of testing of cleanup, testing, and proofreading as I go along. I’ve also make a firm committment to online progress reports, updated daily, at www.SocialTechnology.ca/progress.html (or one of two other pages, www.SocialTechnology.ca/status.html and the included, or framed file at www.SocialTechnology.ca/progplan.html which is better for those on slow modems because it doesn’t need a frameset).

After years of trying I have managed to interest a very small number of people who might be willing to help with these projects. So far they fall into two groups: those only willing to work on this if it is a non-commercial open-source project and those only willing to work on this if it is a commercial venture to make money. There are so few people interested at all that I don’t want to risk losing half of them by opting for the other. So there will be pages about both.

The non-commerial open-source stuff is written up in various places, especially on the old SocialTechnology.ca index page, now at www.SocialTechnology.ca/oldindex.html while the commercial venture is still described at www.SocialTechnology.ca/casa.html (and various other places not linked from it). An overview of the obscenely large overall project of which these two are but components can be found at www.SocialTechnology.ca/overalls.html — it is not for the faint of heart.

I am posting this on the SocialTechnology mailing list and will soon turn it into a webpage. I don’t know if this should count as progress and be recorded on my progress page (listed above), but trying to put all this in words on the web has been an ongoing of mine for many years now so I’ll mention it there anyway.

Your comments and question will always be welcome, and soon there will be a guestbook on the website to make that easier.

Copyright (c) 1998 and 2000  Douglas P. Wilson

This entry was posted in Old Pages. Bookmark the permalink.

Leave a Reply