Towards a Simulation of the World Economy
I’d like to write a program to run a simulation of the world economy, to see if any of the arguments from people like Jay Hanson follow from their own data, and then I’d like to do a sensitivity analysis to find out which variables matter most.
I propose to put this program up on the web in source code form as well as compiled binaries, so that everyone can check to see how plausible the model is, and can play with it a bit.
I do have some expertise in this area, having worked on a simulation of part of the Canadian air traffic control system in Ottawa a while back, but by putting actual source code up on the web I hope to minimize the dependence of the model on my own personal expertise.
To do this, I will need some data, and I hope Jay Hanson and others will be willing to provide the data underlying his own conclusions.
I see this as a worthwhile project, consistent with my often stated preference for algorithms over arguments, and it is also a good example of social technology in action.
I want to do this myself, but I am willing to accept help, and I am most interested in help collecting data. If I get enough help I am quite willing to surrender all claims to this project to some interested group of people such as the members of the futurework mailing list, and will gladly let someone else take over and run the project if the other people involved have no serious objections at the time.
I would like someone more reliable than me to keep an archive of whatever is produced, in case I get run over by a bus, and I’d prefer it to be made available by FTP to anyone who wants it.
This is not a project that I’d undertake for my personal amusement. I like to think we can do this well enough that some people will have to take notice, at which point it may have some small effect on decision makers. But who knows? It’s worth a try, and I want to try it. If anyone wants to help, please do!
A Freely Available Simulation
I’d like to use only public domain or freely redistributable math libraries so the whole project can be freely redistributed under the GNU General Public License. I think it important to have a good simulation that is freely available to anyone who wants it, with full source code and all available data.
Use of Very Common Programming Language
I think the target language should be plain-vanilla ANSI C with as few preprocessor directives as possible, and it should compile with the GNU gcc compiler (djgpp on DOS or Windows systems) using -Wall to enable all warnings, but without producing any. Having just said that, I will probably do some rapid prototyping in Pascal, which is easier and supports array bounds checking, then translate it into C using the p2c translator at some point. I’d like to use a real programming language, a good one, but unfortunately none has ever been written!
(Though I could say a few nice things about ML or Python, which most people have never heard of.)
I believe in the methodology of systems engineering, and so I think the first step in this project must be requirements analysis, followed by design, and only then can code be written — except for a small amount of rapid-prototyping as proof of concept. I think this methodology is important because it helps catch mistakes early on, and because mistakes caught early are much easier and cheaper to fix.
I’m still working on the requirements analysis, but I have a few preliminary design ideas in mind. I’d like to make it entirely database driven, for several reasons, not least of which is my feeling that this should really be a big project and will need to grow a lot once implemented.
As I see it, the programs which actually run the simulation are in some ways the least important part of it, and should take the least work. The hard part is collecting good data and making it available in some standard form. I envision the collection of data being expandable almost endlessly, so that the simulation that might be up and running in a few months could be expanded with several man-years more work by simply collecting more data.
It is my intention that the simulation will model arbitrarily small units, which could eventually be individuals, but to start out with the units will probably be quite large, such as “third-world teenagers” or “literate women of childbearing age”. Exactly what units are used will be defined in the database.
I’d like to make it possible to run simulations that are equivalent to some well-known models, so we can take apart the existing ones and see just where and why they don’t work.
As I see it now, the database should include a number of named simulations, each one of which could be loaded and run, modified and saved under a new name, and printed out with graphs of results.
I would also like to try defining an algebra or space of models (both, really, as I think both algebraic and topological structure need to be represented), so that one could do various algebraic operations on models, such as combining two somewhat inadequate but significantly different models into a single model that is better than either of them.
Modelling Political and Economic Views
I would definitely like to model all real world processes to some extent, and much as it may seem like ivory-tower poppycock, various political and economic views are part of the real world — they influence decision makers and can be reflected in interest rates and other factors.
We should therefore incorporate a number of data records reflecting political and economic views. The influence of a particular viewpoint is questionable and will have to be guessed at, to start with, and some models will simply have zeroes to indicate no influence whatsoever. To choose an example at random, we might have a “Jay” entry whose fields represent the influence of Jay Hanson’s views on the world economy as a whole. You are at liberty to provide your own estimates as to the values of those fields!
And so, for completeness we ought to include an entry for the effect of this simulation on the world’s economy as a whole. You are also at liberty to provide your own estimates as to the values of the fields in that entry. I like to think that more of the world’s decision makers would trust the results of our simulations than would trust the speculations of any single person, but then again, many of those people are politicians, not a group known for their enlightenment.
It may seem that the simulation would have difficulty capturing its own effects, but this is simply a matter of recursion (or iteration), and not really a problem. One version of the simulation could be run with the fields representing its own effects on the world economy set to zero — most certainly the right initial conditions. Another version may have some higher values in those fields, and a more sophisticated version may have second and higher order effects represented so that the effects start out at zero and change with time.
I wouldn’t be doing this at all if I thought the simulations would have no effect on anybody, ever.
Dealing with Unpredictable Influences
There are sources of unpredictable values that will cause problems for any simulation. We cannot yet predict the weather more than a few days in advance, and yet weather can have dramatic effects on the world economy. Political changes are also hard to predict, and so are the results of human innovation.
But the usual way to handle values that cannot be predicted is to supply them by random number generation and then rerun the model many times, and I think that should work OK. The basic idea is to use what are sometimes called Monte Carlo methods to deal with anything that cannot be accurately modelled because of unpredictables like the weather, and thereby to discover the common features that are independent of the unpredictables by rerunning the model with different random numbers and averaging the results.
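The rerun-and-average idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy model run_once, its growth rate, and the uniformly distributed "shock" standing in for the weather are all hypothetical, not part of any real economic model.

```python
import random
import statistics

def run_once(seed, years=50, start=100.0, growth=0.02):
    """Run one toy simulation; a random shock stands in for weather and the like."""
    rng = random.Random(seed)               # separate, reproducible generator per run
    output = start
    for _ in range(years):
        shock = rng.uniform(-0.05, 0.05)    # hypothetical unpredictable influence
        output *= 1.0 + growth + shock
    return output

def monte_carlo(runs=1000, **kwargs):
    """Rerun the model with a different seed each time and summarise the results."""
    results = [run_once(seed, **kwargs) for seed in range(runs)]
    return statistics.mean(results), statistics.stdev(results)

if __name__ == "__main__":
    mean, spread = monte_carlo()
    print(f"mean final output: {mean:.1f}, standard deviation: {spread:.1f}")
```

The mean shows the feature common to all runs, while the standard deviation gives a rough idea of how much the unpredictables matter.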
Design Ideas and Requirements Analysis
I’ve now done some tentative impure requirements analysis for a simulation of the world economy. I say impure because I can’t get design ideas out of my head. It is probably best to admit this up front and also to hint broadly at what these design ideas are.
Currently I envision this project as implemented by a relatively simple program and a much more elaborate database containing a lot of carefully selected data. By ‘database’ I mean a collection of named files, probably bound together by a common base name and distinguished by distinctive extensions.
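As a rough sketch of this notion of a database, the following Python fragment groups the files in a directory by base name. The extensions used here (.rec for records, .rul for rules) and the dataset names are purely hypothetical; nothing has been decided about the actual file layout.

```python
from pathlib import Path
import tempfile

def list_datasets(directory):
    """Group files by base name, e.g. {'world1': ['.rec', '.rul']}."""
    datasets = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix:
            # path.stem is the common base name, path.suffix the extension
            datasets.setdefault(path.stem, []).append(path.suffix)
    return datasets

def demo():
    """Create a few empty, hypothetical dataset files and list them."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("world1.rec", "world1.rul", "limits.rec"):
            (Path(tmp) / name).write_text("")
        return list_datasets(tmp)

if __name__ == "__main__":
    print(demo())
```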
The system that I envision is something like a spreadsheet that loops through a series of records making changes by applying certain rules which are also stored in these records, and indeed could be implemented as a spreadsheet, though I probably won’t implement it as one. If you are in doubt about what I have in mind, try thinking of this as a spreadsheet and you probably won’t be too far wrong.
Indeed, use of a spreadsheet for rapid-prototyping may be wise. People comfortable with spreadsheets may want to play around with this a bit, to see what is possible.
Some people who use spreadsheets for ordinary business applications may be unused to the idea of using a spreadsheet as a programming language for writing programs that can simply go on and run by themselves.
Usually people set up something on a spreadsheet and then change various numbers to see what will happen — the result being a change from one static state based on the old numbers to another static state based on the new ones. That is, of course, their most common use.
But it is quite possible to use a spreadsheet as a programming language for dynamical programs that have a continuous loop, which runs until interrupted (or until a stable equilibrium state is reached).
When working this way, a spreadsheet loops through all fields (columns) of all records (rows) and does whatever updates are necessary as they are encountered, but no more than once per loop. That is not behaviour that a large scale economic model should use, because it would involve updating some records too often.
What I have in mind could be thought of as a spreadsheet in which each record has a first field which is a countdown timer, changed once per iteration, and all other data fields in the record have an implicit “and timer=0” condition, so they are only updated when the timer counts down to zero. The timer itself resets by copying its initial value from another field on the iteration after it reaches zero.
As I envision it, the program loops through a series of records in a database, reading and updating them according to “instructions” also contained in that database. By ‘instructions’ I don’t mean computer programs but just important data values and limits, or symbolic expressions, such as could be represented in any cell of a spreadsheet.
As I see it, a few of these instructions are hardly ever changed, and in fact cannot be changed without going through several steps of supplying appropriate passwords. Included in this list of special instructions are flags or values stating which instructions fall into this class, and which may be more easily modified. Some other instructions are (therefore) more easily modified but can only be changed by the operator, and a few more instructions can be modified by a running program.
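The countdown-timer scheme might be prototyped roughly as follows. This is a sketch of one possible reading of the description, not a settled design: the Record fields and the simple growth rule are hypothetical, and under this reading a record with period P is updated once every P + 1 iterations.

```python
from dataclasses import dataclass

@dataclass
class Record:
    name: str
    period: int     # field holding the timer's initial value
    timer: int      # countdown timer, changed once per iteration
    value: float    # an ordinary data field
    growth: float   # hypothetical update rule: value *= 1 + growth

def step(records):
    """Advance every record by one iteration of the main loop."""
    for rec in records:
        if rec.timer == 0:
            rec.timer = rec.period      # reset on the iteration after reaching zero
        else:
            rec.timer -= 1              # otherwise count down by one
        if rec.timer == 0:
            # implicit "and timer=0" condition on all other fields
            rec.value *= 1.0 + rec.growth

if __name__ == "__main__":
    fast = Record("fast", period=0, timer=0, value=100.0, growth=0.1)
    slow = Record("slow", period=2, timer=2, value=100.0, growth=0.1)
    for _ in range(6):
        step([fast, slow])
    print(fast, slow)
```

Here the fast record is updated on every iteration and the slow one only on every third, so records with different time scales can share one loop.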
Broadly speaking I think there will be two main classes of records:
- Population-geographic units, representing a region of the earth or a population-cohort (or, ultimately, a single individual)
- Information-estimation units, which represent a fact or supposed fact, piece of information or estimate. What I have called “instructions” (values or symbolic expressions central to the operation of this model) fall into this class.
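To make the two classes of records concrete, here is a minimal Python sketch in which the rules are stored as data (symbolic expressions in strings) and applied to population-geographic units. Everything named here is an assumption made for illustration: the field names, the birth and death rates, and the tiny expression evaluator, which handles only numbers, names, and the four arithmetic operators.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expr, env):
    """Evaluate a small arithmetic expression, looking names up in env."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

# Information-estimation units: estimates and rules, all plain data (hypothetical values).
facts = {"birth_rate": 0.03, "death_rate": 0.01}
rules = {"population": "population * (1 + birth_rate - death_rate)"}

# Population-geographic units: one dict per region or cohort (hypothetical values).
units = [{"name": "region A", "population": 1000.0},
         {"name": "region B", "population": 250.0}]

def apply_rules(units, rules, facts):
    """Update each unit in place according to the rules stored as data."""
    for unit in units:
        env = {**facts, **unit}     # a unit's own fields override shared facts
        # evaluate every rule against the old values, then write the results back
        updates = {field: evaluate(expr, env) for field, expr in rules.items()}
        unit.update(updates)

if __name__ == "__main__":
    apply_rules(units, rules, facts)
    print(units)
```

Changing the model then means editing the entries in facts and rules, not the program itself.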
Tentative List of Requirements
Below is a list of preliminary requirements. After some of them I include, in parentheses, a design or implementation hint based on this quasi-spreadsheet model.
- The proposed simulator shall be an open system, available to anyone who wants it at no cost, in both source code and in binary format, with documentation, and written in a very popular programming language so it can easily be installed on almost any system.
- The simulator shall be data-driven without any hardwired or hardcoded algorithms other than those of a fundamental mechanism for reading and updating entries in a database according to rules or instructions also represented as entries in a database. The term database in this requirement shall not be interpreted as implying any specific type of storage or retrieval system, but shall not include anything that involves changes to the source code of the simulator. (Symbolic expressions in cells of the (quasi-)spreadsheet would be considered data, and could be modified, but in general would be modified rarely and not normally by the operations of the program.)
- The simulator shall be capable of generating, saving, copying, renaming, deleting, and reloading and running any one of several simulation-datasets and shall have a convenient mechanism for allowing the operator to choose from amongst a collection of named simulation-datasets kept in mass storage.
- The simulator shall be capable of simulating the economic activity of arbitrarily large geographic regions up to the entire planet (solar system?!) and arbitrarily small regions down to the size of a human being, except as limited by disk space or other mass storage limitations. For the purposes of this requirement economic activity shall be interpreted as including all exchanges of commodities, money, and other legal instruments of exchange including electronic transactions; also all human activities that directly and significantly affect such exchanges, including physical and mental labour; communication by request, pronouncements of opinion, or exchange of information; also human sexual activity and reproduction, and also all similar activities carried out by machines. (A record or spreadsheet row could represent a large geographic region, and in simple models the whole earth, or could represent a much smaller region like a village. It could represent a large group of people, like all males, or all young males, or young males of an ethnic group, or ultimately a single person.)
- The simulator shall be capable of simulating lengths of time up to the approximate age of the human species, and down to a lower limit dependent on the processing speed of the host machine, but at least as small as one second. (How long a length of time is represented by the values in one row of the quasi-spreadsheet would be encoded in the initial values for the countdown timer, and would be updated from all other relevant records each time the counter reached zero.)
- The simulator shall have no built-in limitations on the granularity or level-of-detail of a simulation other than imposed by machine disk space or other mass storage limitations.
- The same granularity or level-of-detail shall not be imposed on all subsets of a simulation-dataset, some of which may be more or less detailed than others. (Region or person granularity would depend only on how many rows and how much space or number of persons each row represents. Time granularity is represented by the initial values placed in the countdown timers and can be changed by changing the values in the (semi-constant) fields used to reinitialize the timers.)
- The simulator shall be capable of cooperative sharing of data and processing with other simulations running on the same machine or on other machines linked to it through a network or internet. (It should be possible to include in some “cells” of the “spreadsheet” a URL representing a file to be obtained by FTP, or a procedure to be run by remote procedure call.)
- Cooperation with other simulations shall be coordinated or mediated by very popular protocols, including at least the FTP protocol for file sharing, and the details of this cooperative activity shall be specified in a publicly available protocol document that shall not restrict cooperation to simulations of the same type nor using the same simulator program.
- The simulator shall include capabilities for monitoring using video display or printout, or both, with numerical and graphical representation or both.
- The simulator shall include capabilities of halting a running simulation, manually entering changes, and resuming.
- The simulator shall include capabilities for rolling back a simulation to an earlier state and restarting from that point.
- The simulator shall include a capability for causing a simulation to fork or divide into two simulations that differ only in certain fields or entries modified by the operator and can be saved separately under different names.
- The simulator shall include a capability for changing the granularity or level of detail of a simulation on the individual database record level. (Space or person-count granularity could be changed by splitting rows: duplicating them, then making the necessary changes to reduce region size or person counts in each new row.)
- The simulator shall include a capability for changing the granularity or level of detail of a simulation globally to increase or decrease the granularity in space or time or both of each database record, up or down to the next unit, except where disk space or processing speed limitations make this impossible. (It should be possible to perform a single operation to make large changes in time-scale over a wide set of rows by simultaneously changing from daily update to weekly update, or annual update to monthly. Similarly it should be possible to make large scale changes in regional or cohort data by splitting country records into province records or province records into township records, and so on.)
- The simulator shall include capabilities for printing out all data in a simulation or summaries of this data, or both, and shall also include capabilities for plotting data or summaries in graphical form.
- The simulator shall include capabilities for logging all activity and printing these logs on operator request.
- The simulator shall include capabilities for entering, saving, appending to, and printing bug and problem reports, but shall not allow such reports to be edited or deleted by anyone except the root or supervisor responsible for the system.
- The simulator shall include capabilities for downloading simulation data, logs, and problem reports from another machine over the network, either on operator request or according to an operator-prepared schedule, and shall also include the corresponding capabilities to upload this information to another machine over the network if authorized to do so.
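The requirements for halting, rolling back, and forking a simulation could be prototyped along these lines. This is a hedged sketch only: the Simulation class, its method names, and the "value" and "rate" fields are hypothetical, and keeping whole deep copies in memory is just the simplest way to show the idea, not a recommendation for large datasets.

```python
import copy

class Simulation:
    def __init__(self, name, records):
        self.name = name
        self.records = records      # list of dicts, one per database record
        self.tick = 0               # number of iterations run so far
        self.history = {}           # tick -> deep copy of the records

    def checkpoint(self):
        """Save the current state so it can be restored later."""
        self.history[self.tick] = copy.deepcopy(self.records)

    def step(self):
        """Run one iteration (hypothetical rule: grow by the record's own rate)."""
        for rec in self.records:
            rec["value"] *= 1.0 + rec["rate"]
        self.tick += 1

    def rollback(self, tick):
        """Return to a previously checkpointed tick and restart from there."""
        self.records = copy.deepcopy(self.history[tick])
        self.tick = tick
        # discard checkpoints taken after the restored point
        self.history = {t: s for t, s in self.history.items() if t <= tick}

    def fork(self, new_name, changes):
        """Return a copy that differs only in the operator's changes.

        changes maps a record index to a dict of {field: new value}.
        """
        twin = Simulation(new_name, copy.deepcopy(self.records))
        twin.tick = self.tick
        for index, fields in changes.items():
            twin.records[index].update(fields)
        return twin

if __name__ == "__main__":
    sim = Simulation("base", [{"value": 100.0, "rate": 0.1}])
    sim.checkpoint()
    sim.step()
    sim.rollback(0)
    high = sim.fork("high-growth", {0: {"rate": 0.2}})
    sim.step()
    high.step()
    print(sim.name, sim.records, high.name, high.records)
```

Since the loop only advances when step is called, halting and manually entering changes amount to editing the records between calls.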
This is only a first draft, incomplete and very crude in places, but it should suggest the general idea. I’d like anyone interested in this simulation project to look it over and make comments, preferably to the mailing list as a whole — so that it will stimulate more discussion, or to me personally if you prefer. I will try to save all comments so that at some point they can be archived together.
Quick Summary of Requirements
The requirements given above are just rough drafts, and have probably already proven just how rough they are by confusing people.
Essentially, they boil down to this: I don’t want to write some oversimplification; I want to write something that can be used for state-of-the-art models, something without built-in limitations that can be incrementally improved by adding better data. And I’d like such models to incorporate better data automatically by obtaining it from remote sites, so that the resulting simulation can be very large and powerful without requiring any of us to be experts.
This should make it possible for a simulation based on this software to be a distributed simulation, making use of results obtained over the net. If someone were to use this software (or some other) to create a good simulation of the Canadian, U.S., or British economy, rather than doing it over again or manually downloading data from that other simulation, it should be possible for the simulator running the whole-world simulation to acquire or update it over the net on a regular basis.
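One way to picture this is a record in which some cells hold a URL instead of a number, to be resolved when the record is refreshed. The sketch below is illustrative only: the function names, the one-number-per-file format, and the example.org address are all assumptions, and no protocol for cooperation between simulations has been specified yet. The download function is passed in as a parameter so that the logic can be tried without a network connection.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

def fetch_text(url):
    """Default fetcher: download the resource and decode it as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")

def resolve_cell(cell, fetch=fetch_text):
    """Return a number for a cell that holds either a value or a URL."""
    if isinstance(cell, str) and urlparse(cell).scheme in ("ftp", "http", "https"):
        # assumed format: the remote file contains a single number
        return float(fetch(cell).strip())
    return float(cell)

def refresh(record, fetch=fetch_text):
    """Resolve every cell, returning a new record and leaving the original unchanged."""
    return {field: resolve_cell(cell, fetch) for field, cell in record.items()}

if __name__ == "__main__":
    # A stand-in fetcher, so this demonstration needs no network access.
    def fake_fetch(url):
        return " 42.5\n"

    record = {"gdp": "ftp://example.org/canada/gdp.txt",   # hypothetical URL
              "population": 30.0}
    print(refresh(record, fetch=fake_fetch))
```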
The Importance of a Good Free Simulation
Some people have expressed concern that simulations could help some people make a lot of money. Obviously a good simulation could be used by people to make money, by suggesting advantageous transactions, but this was one reason that I wrote about coding the simulator in the most widely available programming language and making it freely available under the GNU General Public License.
I want a good simulation widely used for the public good, and I suspect that other good models that have been kept for the personal gain of the few have been a destabilizing influence behind some of that massive flow of cash around the planet: currency speculation, and related ways of profiting from good guesses.
I want to produce a good enough simulation that it will be widely used, and I want to make it freely available in source code format so that it will not only be used but will be maintained and improved.
The really good free open-source software products like Linux, GNU Emacs, and the GNU gcc compiler are much, much better than they would be if they were only the product of one person (or even one development team) — they are the result of large numbers of people cooperating to collectively produce something useful to all of them.
So, that’s the project. My thanks to all those people on the futurework mailing list who have joined in the discussion and contributed ideas to this nascent project, and I look forward to more.
Please give me your comments!
Copyright © 1998 Douglas P. Wilson