DISTRIBUTED SYSTEM CODE

The design of the outer layers of CAL TSS (disk/directory and command processor) depended heavily on distributed system code. I feel this was a mistake, and that there was a better alternative.
In this thesis I have used the phrase "distributed system code" to describe the following idea: global data (which represents the state of the virtual machine) is manipulated by code which runs in protected domains (subprocesses) within a user process; several instances of this code may, in principle, run simultaneously. This idea has been around for a long time, and a detailed account of its use may be seen in Saltzer [S1].
It is difficult to argue persuasively for distributed system code. Saltzer gives the argument that distributed system code makes it easier to provide a different-appearing system for each user: an individual user's process would contain whichever version of the system code he desired. (I would view this as providing different sets of virtual instructions.) In practice it is difficult to take advantage of this idea. If this portion of the system code is in fact part of a distributed system and manipulates global system data, then it is sensitive system code. Hence it must be checked out by whatever painstaking methods are used for all system code. Consequently it is unlikely that two versions of a portion of the system would ever be completed.
A major problem introduced by constructing a distributed system is that of interlocks. Since there are many representatives of the system attempting to read and modify the same global data, they must avoid interfering with each other. (For example, while one process is reading, adding one to, and rewriting a count, another process may attempt the same action.) At worst, if the interlocks are designed carelessly, one may be confronted with a "deadly embrace". Examples of the difficulties one can get into are provided by the disk system, our major attempt at a distributed system.
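The read, add one, rewrite hazard mentioned in the parenthetical example can be made concrete with a small sketch. It is written in Python purely for illustration and is not CAL TSS code; the names Counter, unlocked_interleaving, locked_increment, and run_locked are hypothetical, and a threading.Lock stands in for whatever interlock mechanism the real system used. The first function replays one bad interleaving by hand, so the lost update is deterministic; the second shows the interlocked form.

```python
import threading


class Counter:
    """A shared count standing in for one item of global system data (hypothetical)."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()  # assumed stand-in for an interlock


def unlocked_interleaving(counter):
    """Replay, step by step, two representatives updating the count without an interlock."""
    a = counter.value      # representative A reads the count
    b = counter.value      # representative B reads the same value before A rewrites
    counter.value = a + 1  # A rewrites its result
    counter.value = b + 1  # B rewrites its result, overwriting A's update


def locked_increment(counter):
    """Read, add one, and rewrite as one indivisible action."""
    with counter.lock:
        counter.value = counter.value + 1


def run_locked(n_threads=4, per_thread=1000):
    """Run several representatives concurrently, each using the interlock."""
    counter = Counter()

    def work():
        for _ in range(per_thread):
            locked_increment(counter)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

After unlocked_interleaving runs on a fresh Counter, the count is 1 although two increments were attempted. With the interlock, run_locked() returns n_threads * per_thread, at the cost of one lock acquisition per update; that cost is the kind of overhead discussed next.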
The majority of the CPU time consumed by the disk system was in calls on the ECS system. We have evidence that about one half of this time was spent on ECS file read and write actions, with the other half going to event channel actions. At least half of these event channel actions must have been involved with interlocks. Thus on the order of 25% of the CPU time spent by the disk system was involved with interlocks. As an extreme example, at one point in the development of the system we discovered that, while one disk system representative had an item of data locked, two others could get into a loop asking each other for permission to use the locked data. This was, of course, fixed, but it demonstrates that much care must be taken.
I believe that we would have been better off avoiding distributed system code, at least in as complicated a form as the disk system. (The proposal made in the preceding chapter would have relieved us from providing the disk system.) Necessary global data bases could be manipulated by dedicated processes, which receive coded instructions from event channels. Any interlocking of modifications to the data base would then be purely internal to the process, and thus simpler to achieve.
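The dedicated-process alternative can be sketched in the same illustrative Python. This is a minimal sketch under stated assumptions, not a description of any CAL TSS mechanism: a queue.Queue stands in for an event channel, a thread stands in for the dedicated process, and the names DataBaseServer, call, and demo, along with the "add", "get", and "stop" instruction codes, are all hypothetical. Only the server thread ever touches the table, so no interlock on the data itself is needed.

```python
import queue
import threading


class DataBaseServer:
    """Hypothetical dedicated process that owns one global data base."""

    def __init__(self):
        self._requests = queue.Queue()  # assumed stand-in for an event channel
        self._table = {}                # touched only by the server thread
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Handle one coded instruction at a time, so updates never interleave.
        while True:
            op, key, arg, reply = self._requests.get()
            if op == "stop":
                reply.put(None)
                return
            if op == "add":
                self._table[key] = self._table.get(key, 0) + arg
                reply.put(self._table[key])
            elif op == "get":
                reply.put(self._table.get(key, 0))
            else:
                reply.put(None)  # unknown instruction code

    def call(self, op, key=None, arg=0):
        """Send a coded instruction and wait for the server's reply."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, key, arg, reply))
        return reply.get()


def demo(n_clients=4, per_client=500):
    """Several client threads update one count through the server."""
    server = DataBaseServer()

    def client():
        for _ in range(per_client):
            server.call("add", "count", 1)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = server.call("get", "count")
    server.call("stop")
    return total
```

Because requests are handled strictly one at a time, demo() returns n_clients * per_client without any client ever holding a lock on the table. The trade-off is that every access becomes a message exchange with the server.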

Paul McJones
1998-06-22