Web Site and Code Management Tools

This is a brief analysis and a somewhat longer design for a tool which will help organize text, software, and web page components such as HTML code, scripts, downloadable source and binary archives, images, audio, video, databases, and anything else that could be put together as a modern website. It owes something to the familiar make utility used for software, and something to the website content management systems used to create and maintain large websites.

A prototype is already under development and should be available soon, since it makes use of a lot of existing code. This code and whatever is added to it will be made available as open source software on the web, and it seems like a very good idea to get this project written up and posted even before the prototype is finished.

It is worth mentioning that the motive for this project was the rapidly growing website that is being put together by Doug Wilson. He also has a personal website that is just as badly in need of organization, along with large amounts of software source code and ordinary plain text, written over a couple of decades as part of a vast project that is only now becoming visible on the web.

The tool is intended to:

  • collect and organize text, to save what is still of value from work done over many years, segregating the rest, and backing everything up, then setting up a system so that new text will be integrated with the old
  • interact with a mailer, so messages written to various people in the past can be made more accessible in the future, and so the text in them can be extracted for use elsewhere
  • collect and organize software, again to save what is still of value from work done over many years, again segregating the junk and backing everything up, then setting up a system to reorganize this material
  • collect the best and earlier versions of programs, removing old or inferior subroutines and arranging for new ones to be used in their place, and so on, so that the best use is made of work done in the past and work done in the future will be easier
  • (the original motive for this software project) serve as a website creation and maintenance system for putting all old material that anybody might be interested in out on the web for all to access, making it as presentable as possible, making it easier to maintain, and providing an easy way to add new material as it is developed

The key design idea is a website creation and maintenance system that is object oriented, with a class hierarchy. Pages similar to one another can all be created in a uniform way by including material from a parent or superclass. As is often the case, this idea arrived early on, before the requirements analysis was properly done. As the design went on, some requirements that should be considered emerged. It is probably better to address those here, first, before going into the design details.

This program or system has been suddenly envisioned as a way to simplify the creation and maintenance of web pages, but it should be observed that these web pages include a lot of material that was originally written for a book. It should still be possible to have it as a book, and this piece of software should encourage keeping that option open.

Several scenarios are envisioned for the working system. The most basic one is a system for putting a lot of text up on the web with very little HTML work. The text will simply be collected from various places then dispersed into new locations appropriate for this piece of software. The possibility of keeping it where it is and having it pointed at instead of moved should be kept open.

A bit of design intrudes here. The eventual organization of material is intended to be a tree of nested directories. It should be possible to simulate this with pointers or links to existing material in existing directories, which can, however, be moved into the newer structure when necessary.

further thoughts …

This system makes a lot of directories, since its object-oriented features depend on the existence of a directory hierarchy with one directory for each web page to be created and maintained. In addition, each node of the directory tree above the leaves will require a separate directory. A website containing 100 web pages may therefore require a directory tree containing somewhere from 110 to 150 nested directories.

However, these directories will contain few files and will not take up much space on the disk. By eliminating the need for multiple copies of common components of the pages, the total amount of disk space used may well be considerably less than if the pages were stored individually in their final form.

To build a page, the program goes into the directory dedicated to that page and looks for a makefile. If there is a makefile, it is used, and as a result the contents of that directory may grow by one or more files. Most directories will not contain makefiles, since this system is not based on such an approach and doesn’t need it. But for extra flexibility it seems a good idea to allow for a file creation step just before the assembly of components into a web page.

If there is a file creation step, in many cases that will simply be the execution of a text-to-html program to make the body of an HTML page from a text file. If the directory contains no HTML body file, nor a makefile to create one, but does contain an ordinary text file with some appropriate extension, that file will be transformed into HTML by applying whatever text-to-html program is available or is specified somewhere.
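The decision just described can be sketched roughly as follows. This is only an illustration, not the prototype's code: the makefile names, the body file name `body.html`, the `.txt` extension, and the tiny `text_to_html` stand-in are all assumptions made for the example.

```python
import html

# All three of these are assumptions for illustration only.
MAKEFILE_NAMES = {"makefile", "Makefile"}
BODY_NAME = "body.html"          # hypothetical name for the HTML body file
TEXT_EXTENSIONS = (".txt",)      # stands in for "some appropriate extension"


def choose_creation_step(filenames):
    """Decide what the file creation step should do for one page directory."""
    names = set(filenames)
    if names & MAKEFILE_NAMES:
        return "run-make"        # a makefile, if present, is always used
    if BODY_NAME in names:
        return "nothing"         # the body already exists
    if any(name.endswith(TEXT_EXTENSIONS) for name in names):
        return "text-to-html"    # fall back to converting plain text
    return "nothing"             # nothing to create; rely on inheritance


def text_to_html(text):
    """Tiny stand-in for a text-to-html program.

    Paragraphs separated by blank lines become <p> elements.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


print(choose_creation_step(["notes.txt"]))
print(text_to_html("First paragraph.\n\nSecond paragraph."))
```

A real text-to-html program would do far more than wrap paragraphs. The point here is only the order of the checks: makefile first, then an existing body, then a plain text file.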

After the file creation step has been performed or bypassed, depending on the presence of a makefile, the program then goes into inheritance mode, where missing files are replaced with a reference (not a copy) to the corresponding file in the parent directory. Since a page directory is probably intended to contain some real content, the file most likely to be present is the one containing the actual body of the (normally HTML) document, but the directory may not contain anything else. Many leaves of the directory tree will inherit the other components (header, footer, etc.) from the parent node.

The first step in building a page from its directory is to find the components file which lists the components of the page and the order in which to assemble them. A components file might contain simply this:

  header
  body
  footer

in which case the program will concatenate a header file, a body file, and a footer file to produce the page. If any of the files specified in the components file does not exist, the program will seek a file of that name in the parent directory. If it is not found in the parent directory, it will seek it higher up the directory tree. There should be at least one file of each name in the root directory for the website, but if not the program can create one — but only after warning and prompting the user.

The components file itself might be missing from a webpage’s directory, but again that can be provided by searching up the directory tree.
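The search up the directory tree and the final concatenation might look something like this minimal sketch. The file name `components`, the function names, and the choice to raise an error when a file cannot be found anywhere are assumptions. The design above would instead warn and prompt the user before creating a missing file.

```python
from pathlib import Path

COMPONENTS_FILE = "components"   # hypothetical name for the components file


def find_inherited(directory, name, root):
    """Return the nearest file called `name`, searching from `directory`
    upward through its parents and stopping at `root`. Returns None if
    no such file exists along the way."""
    current = Path(directory).resolve()
    root = Path(root).resolve()
    while True:
        candidate = current / name
        if candidate.is_file():
            return candidate
        if current == root or current == current.parent:
            return None
        current = current.parent


def build_page(page_dir, root):
    """Concatenate the components of one page, inheriting missing ones."""
    listing = find_inherited(page_dir, COMPONENTS_FILE, root)
    if listing is None:
        raise FileNotFoundError(f"no components file found for {page_dir}")
    parts = []
    for name in listing.read_text().split():
        source = find_inherited(page_dir, name, root)
        if source is None:
            raise FileNotFoundError(f"no '{name}' file found for {page_dir}")
        parts.append(source.read_text())
    return "".join(parts)
```

Because the files are read from wherever they are found, nothing is copied into the page directory, which matches the idea of inheriting by reference.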

The system comes with an initialization program that creates a directory tree from an indented plain text file. Typically this file is created by the user by hand using a text editor, and it might look something like this:

root
  navigate
    index
    sitemap
  content
    forms
    text
    overview
    links

Of the nine directories listed here, three of them (root, navigate, and content) probably will not correspond to web pages, but are simply classes. Under navigate, index and sitemap are probably directories for actual pages, but under content the three directories forms, text, and links are probably classes, with only overview corresponding to an actual web page.
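An initialization step of this kind could be sketched as below. It assumes that indentation is done with spaces only and that each line holds exactly one directory name. The function names are made up for the example, and the sample outline uses only the part of the tree whose nesting is spelled out above.

```python
from pathlib import Path


def outline_to_paths(outline):
    """Turn an indented outline into a list of relative directory paths."""
    paths = []
    stack = []   # (indent, name) pairs for the current chain of ancestors
    for line in outline.splitlines():
        if not line.strip():
            continue                              # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        # Drop entries that are not ancestors of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, line.strip()))
        paths.append("/".join(name for _, name in stack))
    return paths


def create_tree(outline, base):
    """Create one directory under `base` for every line of the outline."""
    for relative in outline_to_paths(outline):
        Path(base, relative).mkdir(parents=True, exist_ok=True)


sample = "root\n  navigate\n    index\n    sitemap\n  content\n"
for path in outline_to_paths(sample):
    print(path)
```

Keeping the parsing separate from the directory creation makes it easy to check the outline before anything is written to disk.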

The system could generate the pages in any order, but to include the possibility of having a simple linear order from first to last, like the chapters of a book, an ordering system will be used. Since this program is much more general than it need be, it could actually be used for assembling the chapters of a book. Therefore it should use a depth-first, left-to-right traversal of the directory tree. The possibility of starting each node of the tree with a breadth-first collection of summaries, to make a hierarchical table of contents, should be considered as well.
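The two orderings can be compared with a small sketch. The tree is represented here as a dictionary mapping each node to its list of children, which is an assumption made for the example, and the whole-tree breadth-first walk is only a simplified stand-in for the per-node collection of summaries mentioned above.

```python
from collections import deque


def depth_first(node, children):
    """Yield a node, then each subtree in left-to-right order (book order)."""
    yield node
    for child in children.get(node, []):
        yield from depth_first(child, children)


def breadth_first(node, children):
    """Yield nodes level by level, as a table of contents might list them."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        yield current
        queue.extend(children.get(current, []))


# Children are listed in the order they should appear, left to right.
site = {"root": ["navigate", "content"], "navigate": ["index", "sitemap"]}

print(list(depth_first("root", site)))
print(list(breadth_first("root", site)))
```

In a real directory tree, "left to right" needs some agreed ordering of the subdirectories, for instance the order in which they appear in the outline file.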

It should be possible to run the program on the same directory with different options selected and generate a website in one case and a book in another.

and still further thoughts …

The prototype will probably just put things in an actual directory hierarchy, but really the actual directory structure, as known to the operating system, should not be a necessary part of this, and should not necessarily be visible to the higher level parts of the software.

In particular, it is well known that the hierarchy is not a very natural form of organization and that the same material could just as well be organized in many different hierarchies. That should be kept as an open possibility. As a design note, perhaps the descent from a parent into a child node should be accompanied by some kind of key which tells the program or component working at the child level which particular hierarchy this particular assembling operation is working from.

Linear ordering as a traveling salesman problem (TSP) should be noted and provided for, as well as CASA-like numerical analysis and combinatorial optimization of many kinds. Graphics for visualization should be added.

But this extra functionality should be added, not built into the very first proto-proto-prototype. So write a bit of crud-ugly crude code and try it out.

… and that’s it — more work on this to come soon

Copyright © 2000 Douglas P. Wilson
