Approximation algorithms for huge instances of the
Traveling Salesman Problem
This page is now obsolete, but may be of some interest. What it says is not wrong, it is just out of context, missing the key information about why this problem is part of the greater project being undertaken (see links below).
(No-Backtracking, Minimal Look-Ahead, Fast Enough for Huge Problems, Not Too Bad an Approximation)
Motivation: far from being an academic exercise, the travelling salesman problem has several real-world applications and shows up in research work in many areas. The algorithms discussed below grew out of work on the Acronymic Language project. For that project a need arose to (approximately) solve a TSP for a graph with millions of vertices.
The key notion for the first algorithm is maintaining a priority queue of edges, where the highest priority is given to edges whose unavailability would cause the most grief later on.
Repeatedly take the edge with the highest priority and add it to the tour. Each time this done, the two vertices the edge joins will have one fewer possible edge, from a maximum of two (one incoming and one outgoing edge per vertex). When the number of edges joining a vertex to the tour reaches two, no more can be added, and therefore all edges incident on that vertex become unavailable (and are therefore removed from the queue). An unpleasant side-effect of declaring an edge unavailable is that it may force the vertex at the other end of it to have longer connections to the tour. So we priorize edges to minimize that possibility. A better algorithm might be quite clever at guessing the future side effects of using up edges, looking many steps ahead like a good computer chess program, but the one presented here just looks one step ahead. Good enough for now.
The edges incident on a given vertex can be sorted in terms of length. There may be many of them — one graph studied recently had 638 edges meeting at one busy vertex. If these are sorted, then the two shortest edges represent the best possible way of connecting this vertex to the tour — ignoring the entire rest of the graph. If the shortest edge becomes unavailable because the vertex at the other end of it acquires both of the edges it is allowed in any tour, then the best way of connecting the current vertex will be through the second and third shortest edges. That will not improve anything — at best the sum of the lengths of the second and third shortest edges will be no more than the sum of the first and second was, but in a great many cases, most cases, the sum of second and third will be longer and so the best possible tour (from the point of view of this vertex) will have gotten longer. That increase in length is the penalty paid for losing the use of the shortest edge. From the point of view of the vertex at the other end of this edge there is also (probably) an increase in overall tour length if the edge becomes unavailable. The sum of these two numbers represents a first crude estimate of the damage done by prematurely losing the use of that edge, so it is also a good priority figure to apply to the edge, associating therefore a higher priority with a larger penalty (larger estimated increase in tour length if edge not used).
If all vertices have lots of edges of nearly the same lengths, then there are many almost equivalent solutions. This algorithm deals with the other kind of vertex, one with a few edges of dramatically different lengths. In that case it is vital to avoid the long edges. So the shorter edges of those vertices get a higher priority. Priorities are adjusted as we go along.
Note: almost the same algorithm is also used for huge instances of a bipartite matching problem . In both of these the problem is to select a special subgraph of the orginal graph. Like the TSP, the Minimal Spanning Tree problem involves creating a connected subgraph with all the original vertices. The Bipartite Matching problem involves creating a graph with half as many connected (2-vertex) components as the original graph has vertices. But all three involve a minimization over the length of edges in the result. The MST problem is easy because there are no side-effects and so the algorithm can be greedy.
This algorithm does no backtracking — once an edge is added to the tour it is there to stay, we never exchange it for some other edge. There is some moderate look ahead : when an edge is added to the tour (because it had the highest priority) we must re-examine the two vertices it connected, and if either or both of them still have one edge to be added, the other edges incident on them must be examined and possibly re-prioritized. But we don’t go any further than that. And so the result might not be very good. It will be (roughly) as good as it can be without backtracking and deeper lookahead, but that’s pretty good in a lot of cases.
A data structure:
List the vertices in (some) order.
For each vertex list the incident edges, sorted in ascending order by length.
Pack this into an array, vertex after vertex and within each vertex the edges. (array has as many entries as edges * 2)
Index into this array the start and end of each vertex’s edgelist in another array (array has as many entries as vertices)
(use tool to create problem instance then turn into table formatted like this:)
Number 1 1 1 2 2 3 3 3 3 3
(this page is an example of Work In HTML Directly)
Some kind of Code-And-Go where I could write Java (or some other language) right here and test it as I go would be nice. — Is that possible with what I have? I suppose I could set up everything so this page points to class files with a prespecified name, click into ms-dos prompt to write code and compile it, then click back here to run the class files. Try it! For this problem what would be nice would be having simultaneously the visible graph and the tabular-format arrays, both changing in real time as the program runs — algorithm animation on graph and array at once.
Settle for less! (but plan for more later — graph and array could be kept logically separate, windowless mirrors, etc.) Note: you are using html for memo and idea stuff. Could this be made better and integrated with memoware?
(My priorities, though, must be text first, then diagrams (static ones), then java stuff that moves).
(in part this and other graph theory pages are being included to teach the non-mathematical the stuff they need to learn to understand the rest of the pages, hence all the english)
Copyright (c) 1998 Douglas P. Wilson