Note: Many humanities and social sciences people, especially those interested in politics, and many ordinary people as well, have an apparent inability to believe that anything containing the word ” optimization ” could involve low-level interpersonal networking and be strictly consensual — they insist in looking for the controllilng hand of the bureaucrat or capitalist. That is a complete misinterpretation, and so other names or expressions for SNO are being sought, such as the term “computer assisted social activity”, as adopted temporarily for the CASA Proposal .
This is a quick sketch of a prototype personality and skills matching system. It was intended only as one part of a system that would be extended to use combinatorial and graph theoretic methods.
There was a secondary title with information that this was a Social Network Optimization (SNO) Prototype. That makes sense only after the following explanatory paragraphs lifted wholesale from the SocialTechnology.ca homepage :
SocialTechnology.ca is an organization devoted to finding and promoting genuine solutions to social problems. Older solutions that are genuine enough, but too limited, include those discussed on the brief Introduction to Social Algorithms but such solutions are surprisingly restricted and ineffective in comparison to solutions based on combinatorial and graph theoretic methods. So currently the solutions receiving the most attention involve the creation of new social structure through the matching of individuals, organizations, and social contexts .
This approach to social problems is based on the following observations:
- the poorest people are those who lack social connections (jobs, family, friends) and the route away from poverty almost always involves making such connections
- the most dangerous people are those whose social connections are poor or inappropriate (loners, gang members) and the only way such people stop being a threat to themselves and others is through better social contacts
- the saddest and loneliest people are those with poor social connections (unmarried, few and poor friendships, etc.) and these are also the people most likely to commit suicide unless they gain such connections
Poverty, crime, and social disorder are all symptoms of social mismatch or the lack of social integration. For human society to survive and prosper almost everybody should be well integrated into society with suitable work to do and strong social relationships. To make that possible it should be easy to find productive work and to maintain viable communities based on strong interpersonal relationships.
- It should be easy to find a good job, but it isn’t. It is even harder to find a truly suitable job which matches your abilities.
- It should be easy to find a compatible spouse or sexual partner, but again, it isn’t. And it is much harder to find someone truly compatible where the attraction is mutual.
- It should also be easy find close friends and make other important social connections, but even this is hard, and it is especially hard to make long-lasting friends.
All of this may soon change with the introduction of new social technology.
This sketch, with other analysis and design documentation will be written up as (perhaps a small) part of the (yet unnamed) knowledge product for the Social Technology Pavillion at Frontiers, and I also have serious intentions of implementing and testing what is described here.
There are several goals or targets, which will eventually be written more formally as a requirements analysis document. Here are some of these goals:
- Privacy and Security — people may be reluctant to participate if their personality profile or other personal data is not keep secret. This information must also be kept secure against outside meddling — we can’t have people changing other people’s data.
- Theory-proof and Theory-correcting. This system should not depend on any particular theory of personality, and although some theories may be used for making initial estimates, the use of this system should bring in new information which can be used to correct the theoretical models.
- Easy to use. The system should be user-friendly and not require a lot of work from users up front. It should also be easy to upgrade personal information.
- Aid to self-awareness. Whether or not a particular user wants matching done, the data should be available as an aid to self-awareness.
- Fun. The system should be fun to play with.
- Any and all use of this must be entirely optional, with lots of chances to back away from involvment with the system, and in particular it should be very easy to ignore or back away from suggestions made by the system.
- The system must confine itself to suggestions, and these suggestions must be keep a secret from all but the interested people themselves.
More goals will be added. Now here are some design ideas and the further goals generated by and associated with them:
- Initially the user will be asked to fill out a questionaire.
- Questions should be interesting.
- It should be possible to play with the system, answering the questionaire several times with different answers, to see what happens.
- Feedback should be immediate, or at least quick.
- If possible an interactive system that shows personality and skill estimates as they change with each question answered
- The user should be able save answer sets, load them, modify them, and to submit them as an experiment or as real submission for use.
- The system should support several standard personality measures, such as Myers-Briggs, Cattell, Sheldon, and some skills or aptitude measures as well.
- For security users will be given a randomly chosen number or an alias they choose, or both, so that whoever is processing the data (me for the time being) will not know what data set represents what person.
- This is not good enough for true privacy and security, so provisions to upgrade to a safer system should be made in advance.
- Test questions should be chosen to reduce the possibility of abuse of data — e.g., though it would be nice to include birthdate information, that is too easy to misuse, so it can’t be used. No data that identifies the user should be collected.
- Matching suggestions should be made in a careful way using aliases or numbers (or both), with some kind of handshaking protocol to let people mutually identify each other after some preliminary exchanges of information. E.g. both user 57809 and 12744 may be told that the system has suggested a match for some purpose, then if that is acceptable, an anonymous meeting can be set up, and if that is successful the two can reveal themselves to each other.
- matching for working-together-on-some-task is a high priority since we all have so much work to do
- matching for the buddy system should be a high priority, because it is needed in Frontiers
- matching for simple friendship is easy enough, I think
- matching for more intimate connections is something many people want, but we have to be very careful about it — to be discussed elsewhere
- matching for real world jobs is a future possibility, may need to be developed in cooperation with some employment agency
I have written about all this elsewhere, and will be writing more about it. This is just a draft, no, just the sketch of a draft.
The key thing that is different about this proposal is that I want to actually implement it, using CGI and Java with HTML, and I want people to start using it as soon as possible.
Implementation notes, with further goals and design ideas generated from and associated with them:
- The user interface will be a combination of HTML text and forms with a Java applet that will let the user answer questions and will show personality estimates in some graphical form — perhaps several forms, maybe a pie chart for Sheldon stuff, a table for Myers-Briggs, and a histogram for Cattell stuff.
- To make it interactive, the estimates should be based on Bayesian estimation with the estimates changing as each question is answered.
- For fun, the user should be able to play and fantasize, entering ficticious names like “Adolph” and trying to guess how Hitler would have answered the questions. It should also be possible to do matching in small ficticious populations, such as imagining an island with all sorts of notorious villans and heroes, and seeing who should have done what with whom.
- For real matching the system should keep for each user a best set of estimates — or for security it may be best to for each user to keep his or her own best estimates.
- Real matching suggestions for real people should be as tactful and carefully done as possible, which may involve making suggestion forms available for discussion and community design.
- As an initial theory we can (at least) use my own theory which matches people who have interests, some aesthetic tastes, and some knowledge in common but are otherwise as different as possible. Actually use of the system will fix problems with this or any other theory.
- For the actual combinatorial optimization steps there are several algorithms already implemented in C, Pascal, and Fortran that can easily be translated if need be. It is probably best to use at least:
- a bipartite matching (each person gets one match) algorithm
- non-bipartite (Gabor’s algorithm) in which each person gets at most one match, but some people may be left unmatched
- a minimum-spanning-tree algorithm (I use Kruskal’s) in which people are connected in a(n unrooted) tree structure, with some people getting several matches (links), but at least one good match or link is provided for everybody.
- Two types of data should be collected after suggestions are made:
- was the suggestion acceptable, was it acted upon
- if acted upon, was it any good?
- The key idea is to use feedback to maximize the last two measures, so that the suggestions are more and more likely to be acted upon, and so that when acted upon they are better and better.
That’s all for now. This is obviously a very big project, but the first step is just to get started, and writing this page is me getting started. More and better write-up to follow, and then I’ll start prototyping code.
What have I said here that you disagree with or don’t like? Must be something!
Graph Partitioning for Creating Teams and Discussion Groups
This is about a related idea for doing something with people-matching, in this case as part of the Frontiers presence at Club Connect.
I’ve written mostly about matching individuals, but that has a lot of complications in a very interactive environment. So for Club Connect I’d like to try something else first. The Club Connect room limitations immediately suggest a combinatorial optimization problem known as graph partitioning, which is harder technically but easier on the people involved.
I understand that on a typical evening Club Connect may have somewhere around 100 people scattered around the site. Let’s suppose for example that we are able to get 60 people to participate in our event — more about that later, this is just for sake of argument, as an example. At six non-ghosts per room that would mean we’d have to have 10 rooms available, but counting each of our personal apartments that’s probably a realistic number.
The problem then is to partition the 60 people into 10 groups of 6 so that the people in each room are as compatible as possible, with as few personality conflicts as possible. Each roomful of people should then have a lot in common, will enjoy meeting or being with each other, and should emerge from the experience as friends.
In order to encourage this bonding it is probably useful to have each group of people work together on some problem — that will take more thought! This might well be the place where the role playing part will come in.
The rest of this message is somewhat technical, so I’d like to suggest you stop here for a moment and think about this basic idea: people wanting to participate provide us with some information about their preferences, interests, and personalities, and then we divide a large group of participants into much smaller groups that can have private conversations and perhaps work on some joint problem or game.
If all the data collection and analysis works well, each small group will have a very positive experience and will emerge from the exercise with 5 friends.
I would propose trying to make this a fairly frequent event, so that it might become the standard way for people to make new friends at CC. It might even be done weekly, though I’d suggest monthly is a more reasonable frequency to start with. It could then be made more or less frequent depending on how it works out.
I think it’s worthwhile writing this up a bit more formally for the Club Connect newletter, and it may amuse people to read just how many different ways there are of dividing 60 people into groups of 6, a number in the tens of millions, I think — I’ll work it out. The number will be slightly reduced by the need to have the room owner in each room, but it is still big.
I refer to this as a graph partition problem because the first step is to visualize (the possible) social connections between people as links or edges joining the people, and the resulting structure is called a graph — I’ll post a picture when I put this up on the STP round table. I refer to people as nodes or vertices sometimes because those are the usual terms in graph theory.
Graph partition is actually an NP-complete problem, by the way, but there are good approximation algorithms.
In order to make this work we need to get participants to fill out a questionaire — there are some privacy and security problems in that, which I will only hint at right now, (but I am working on them).
There are people who don’t like filling in questionaires, but for now let’s suppose we can get enough people willing to cooperate. I think the results will be good enough to encourage others to join in, next time we run the event.
The hard part is finding questions to ask, and that partly depends on how good are the privacy guarantees: at one end of the scale, maximum privacy, we can ask anything and people will feel free to answer honestly, but at the other end, with less privacy, we might only be able to ask the most general questions about interests.
For sake of discussion I’ll assume what I think is the most useful and easily handled level of privacy — the one I prefer — where whoever is processing the data (me) can read the answers given to each questionaire, but does not know who filled them in. This is not very good privacy, in general, but should be adequate if we restict the questions somewhat. I’m quite willing to try for more privacy, but for now let’s assume this level.
As well as general questions on personality and interests, I’d like to ask for known avatar likes and dislikes — disguising the replies with a serial number. So if I mentioned that my avatar would rather not be in a group with one named, oh, say, Rat Fink, to choose an example at random, the person compiling the data will see only that person 1432 would rather not be put in with 7893, and have no way of identifying either individual.
These questions about known likes and dislikes are very important for getting started, though I’m sure some people will feel uncomfortable about them. I think we could eventually phase them out, if too many people are unhappy about them.
At some point it may be useful to have some external monitor verify that the collected data has been run through a numbering program to replace all avatar names with numbers.
Other than those specific questions about known likes and dislikes amongst the avatar population, I’d like to collect a set of personality and interest questions into a database, where they can be sorted and priorized by the program. This database can then be changed as a result of feedback after each event.
As well as giving people a questionaire before the events, I’d like to encourage them to supply some information afterwards, so that the database can be updated and the usefulness of various questions can be evaluated.
I would also like to collect all responses from before and after each event in a database, and that poses more serious privacy problems. I’d like to try and get an external monitor for that, too, eventually, but perhaps not right away. The basic idea would be to limit access to the database to trusted programs that are available in source code form for monitoring.
In one of her comments Rowena mentioned that my concern for privacy seemed more than necessary, but I expect the reverse is more likely the case. The more guarantees of privacy we can provide the better this will all operate.
I haven’t said enough about the questionaire questions, but those will be very similar to questions on the more common short personality tests, like Myers-Briggs, supplemented with questions about group behaviour (aggressiveness, shyness, impatience, and so on). In one posting on the STP round table I mentioned plans for something like this to run from my own web site, and in that I mentioned preferences for entertainment such as movies and music. It may well be a good idea to include that as well, but I don’t want to start out with too many questions.
The key idea is to collect many questions, but priorize them by effectiveness and only ask a dozen or perhaps 20 of the most effective questions. Actually it’s not that simple, since effectiveness in context should be considered, but there are ways of dealing with that, and I’ll leave the details for later.
In one of the earlier posts on the STP round table I mentioned trying to do this in a “theory-proof” way that does not depend too much on any psychological theory but is more of a technology — a set of general tools and techniques rather than an application of some theory.
I think you can see that asking people about avatars they know they like or dislike is rather theory-proof and the answers provided can simply be used to establish or veto links between nodes (edges between vertices, or possible social connections between people, in other words).
There is also a sense in which almost any set of answers about personality and interests can be used immediately to establish or veto links, without reference to any psychological theory. Basically if two people are clearly similar, giving very similar answers to the same questions, then people who have stated they like one of these people will probably also like the other — and people who disliked one will probably also dislike the other.
This lets us extend the stated likes and dislike to other people, including newcomers that nobody knows yet, but it doesn’t help us with people who are not sufficiently similar to any others.
That does gives us two “theory-proof” methods, however, one based on stated preferences for other avatars, another deduced from that by similarity analysis. The collected results from both methods can then be used by an estimator-generator, which also does not depend on any given psychological theory, although the estimator generated is very much like a theory.
There are two estimator generation scheme that should be used. Both take personality and interest questions as a source and the results collected above as a target, and generates an estimator that maps the source data into the target data and generalizes the mapping to guess what the target data would have been for cases where it is missing. This is the part that lets us produce estimates for people who are not very similar to any others.
The simplest estimator-generator generates a Best Linear Unbiased Estimator (BLUE) using standard statistical techniques. We can go further to generate a non-linear estimator by using neural network techniqes — using the paired source and target data to teach a neural network to produce the target data from the source, then applying the trained network to new data. The abilities of a neural network to generalize beyond the training set depend on the kind of data and the implementation, but can be significantly better than a BLUE.
As well as all these methods that are more or less theory-proof we can apply a few methods based on various psychological theories. I’m rather suspicious of any psychological theories except (of course) my own — and I’m hesitant even to recommend that one! But I think we can use them in a limited way to check the results of groupings suggested by applying the graph-partitioning algorithm.
I have written programs of my own for graph-partitioning, but there are at least two packages written by the experts that are available over the internet. These packages, Metis and Smooth, were originally written for dividing up large sets of mathematical data amongst the several independent processors in a supercomputer system, and are quite reliable. I’ve often made of use of Metis for partitioning semantic networks — yet another entirely different application!
I will be working more on this and a few other people-matching ideas, but I’d like to encourage you all to comment on this one — even if you don’t understand all of it. Technical details aside, what do you think? Please e-mail me with comments or post a message on the online forum at the Frontiers Social Technology Pavilion. For more information on social technology in general please see The Social Technology Page .
Copyright © 1998 Douglas P. Wilson