The design of the outer layers of CAL TSS (disk/directory and command
processor) depended heavily on distributed system code. I feel this
was a mistake, and that there was a better alternative.
In this thesis I have used the phrase ``distributed system code'' to
describe the following idea: global data (which represents the state
of the virtual machine) that will be manipulated by code which runs in
protected domains (subprocesses) within a user process; several
instances of this code may, in principle, run simultaneously. This
idea has been around for a long time, and a detailed account of its
use may be seen in Saltzer [S1].
It is difficult to argue persuasively for distributed system code.
Saltzer gives the argument that distributed system code makes it
easier to provide a different-appearing system for each user. An
individual user's process would contain that version of the system
code which he desired. (I would view this as providing different sets
of virtual instructions.)
In practice it is difficult to take advantage of this idea. If this
portion of the system code is in fact part of a distributed system and
manipulates global system data, then it is sensitive system code.
Hence it must be checked out by whatever painstaking methods are used
for all system code. Consequently it is unlikely that two versions of
a portion of the system would be completed.
A major problem introduced by constructing a distributed system is
that of interlocks. Since there are many representatives of the system
attempting to read and modify some global data, they must avoid
interfering with each other. (For example, while one process is
reading, adding one, and rewriting a count, another process may
attempt the same action.) At worst, if the interlocks are designed
carelessly, one may be confronted with a ``deadly embrace'', in which
each process waits for data that another has locked.
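The counting example can be made concrete with a minimal sketch. It is written in Python, not the language of CAL TSS, and it assumes threads standing in for system representatives and a `threading.Lock` standing in for the interlock; the names `Counter`, `increment_unsafe`, `increment`, and `run` are hypothetical.

```python
import threading


class Counter:
    """A shared count, standing in for an item of global system data."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # the interlock (an assumption of this sketch)

    def increment_unsafe(self):
        # Read, add one, rewrite. Another thread may run between the read
        # and the rewrite, in which case one of the two updates is lost.
        current = self.value
        self.value = current + 1

    def increment(self):
        # Holding the interlock makes the read-add-rewrite sequence
        # indivisible with respect to other callers of increment().
        with self._lock:
            current = self.value
            self.value = current + 1


def run(workers, per_worker):
    """Start `workers` threads, each incrementing the count `per_worker` times."""
    counter = Counter()

    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the interlock held, `run(4, 1000)` returns 4000. Substituting `increment_unsafe` removes that guarantee, and every caller pays for acquiring and releasing the lock, which is the kind of overhead discussed in the text.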
Examples of the difficulties one can get into are provided by the disk
system, our major attempt at a distributed system. The majority of the
CPU time consumed by the disk system was in calls on the ECS system.
We have evidence that about one half of this time was spent on ECS
file read and write actions, with the other half going to event
channel actions. At least half of these event channel actions must
have been involved with interlocks. Thus on the order of 25% of the
CPU time spent by the disk system was involved with interlocks.
As an extreme example, at one point in the development of the system,
we discovered that while one disk system representative had an item of
data locked, two others could get into a loop asking each other for
permission to use the locked data. This was, of course, fixed, but
demonstrates that much care must be taken.
I believe that we would have been better off to avoid distributed
system code, at least in as complicated a form as the disk system.
(The proposal made in the preceding chapter would have relieved us
of providing the disk system.) Necessary global data bases could be
manipulated by dedicated processes, which receive coded instructions
from event channels. Any interlocking of modifications to the data
base would then be purely internal to the process, and thus simpler to
achieve.
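The dedicated-process arrangement can be sketched as follows. This is only an illustration under stated assumptions: a Python thread stands in for the dedicated process, a `queue.Queue` stands in for an event channel, and the instruction codes `"increment"`, `"read"`, and `"stop"`, along with the names `data_base_server`, `call`, and `demo`, are hypothetical.

```python
import queue
import threading


def data_base_server(requests):
    """Sole owner of the data base; handles one coded instruction at a time."""
    counts = {}  # touched only by this thread, so it needs no interlock
    while True:
        op, key, reply = requests.get()
        if op == "stop":
            reply.put(None)
            return
        if op == "increment":
            counts[key] = counts.get(key, 0) + 1
            reply.put(counts[key])
        elif op == "read":
            reply.put(counts.get(key, 0))
        else:
            reply.put(None)  # unknown instruction code


def call(requests, op, key=None):
    """Send one coded instruction and wait for the server's reply."""
    reply = queue.Queue(maxsize=1)
    requests.put((op, key, reply))
    return reply.get()


def demo(workers=4, per_worker=500):
    """Have several clients increment one count through the server."""
    requests = queue.Queue()
    server = threading.Thread(target=data_base_server, args=(requests,))
    server.start()

    def client():
        for _ in range(per_worker):
            call(requests, "increment", "open_files")

    clients = [threading.Thread(target=client) for _ in range(workers)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()

    total = call(requests, "read", "open_files")
    call(requests, "stop")
    server.join()
    return total
```

Because instructions are handled strictly one at a time by a single owner, the clients never lock the data themselves. The only shared object is the request queue, whose synchronization is internal to the channel mechanism.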
Paul McJones
1998-06-22