More on audit and "media"
This started as notes to myself in preparation for a review tcon tomorrow about adding clipboard manager export/import events to the DICOM Audit log. It may be worth sharing.
The use of the term "media" has become a problem for the audit trail. It was a fairly clear concept when this work started (in the late 1990s). At that time the use of media was extremely common in radiology, and the meaning was clear.
Media was a physical device that held images in a semi-permanent storage medium. The earliest examples were "paper" and "film". It's true that these media could be copied or destroyed, but they were to a large degree permanent in a medical records context. They also usually had some sort of label identifying the images or report. Unlabeled media was usually a violation of medical records policy. A site that gathered an audit log but didn't follow normal policy might exist, but it would be a fringe case that violated normal rules for the audit log.
New media, initially defined by DICOM, were the various forms of removable disks. Magneto-optical disks were extremely popular in ultrasound, and CD-Rs (later DVDs) were very popular for shipping medical records between sites. Millions of disks were generated and shipped annually.
As new media types were added and technology changed, the definitions got stretched. Initially the extensions were modest variations. A network virtual disk does not have a physical instantiation, but it could still be described as if it were a physical disk. That was the basic concept behind calling it a "virtual disk".
This got stretched further with the addition of ZIP files and email messages. These lack physical instantiation and they are only vaguely disk-like. Still, they were identifiable by name or label.
There was the issue of what to do with network services. The initial dominant network service was the DICOM Archive. This was not considered media. It was described as a network service, with a network address, etc. DICOM didn't explicitly tackle network services like database servers, but the same basic rules used for a DICOM server applied to the other kinds of servers. The network address, application service description, etc. can be used to identify these servers.
Most recently these stretched definitions reached a breaking point. The proximate cause is the addition of:
- clipboard services
- web email services
It started with clipboard services. Shortly thereafter it became clear that the existing email description was not appropriate for web email services. A review of all the other existing extended and stretched media types revealed increasing confusion about definitions. The wide variety of web resources didn't fit easily into the original or stretched concepts.
Feedback and exploration about what "media" means is ongoing. How should we define and describe the concept?
This will be discussed and reviewed again tomorrow to see if we have arrived at a suitable description. Where we are at the moment is:
- The "export" message is still used to describe data transfers across a security boundary. The old assumption that security boundaries correspond to device boundaries no longer applies. In the example of a clipboard manager the security boundary is between the secured application and the window/display manager complex. The window/display manager may be secured, but it is a different security domain. The sending/importing medical application knows that information went across the boundary to a clipboard manager. It does not have any virtual analog to physical media. The information is not attached to any persistent identification or contents. The clipboard manager is like an application service, but it lacks all the identification that would be available for a database server or DICOM server.
- Web email does not (necessarily) comply with the Internet Message standard. The Internet Message is the standard format used for email exchange by SMTP, and for a variety of other less common protocols. The most important difference is that web email might not provide a Message-ID. Instead of having a Message-ID, all you get is the possible destination or source address. There is no way to distinguish one message from another to or from the same address. So the virtual mapping of concepts from physical mail to web email is impossible. With the old email message there was the virtual mapping from the email with Message-ID to a pile of paper with the text contents of the email and headers, identified by the Message-ID on the paper.
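To make the Message-ID point concrete, here is a minimal Python sketch using the standard library email parser. The function name media_identifier and the sample messages are hypothetical, made up for illustration. It only shows the difference an audit source would see: a conventional Internet Message yields an identifier, while a message without a Message-ID header leaves nothing but the addresses.

```python
from email import message_from_string


def media_identifier(raw_message: str):
    """Return the Message-ID header of a raw Internet Message, or None.

    Hypothetical helper for illustration only; it is not part of any
    DICOM or audit library.
    """
    msg = message_from_string(raw_message)
    message_id = msg.get("Message-ID")  # header lookup is case-insensitive
    return message_id.strip() if message_id else None


# A conventional message: the Message-ID identifies this specific export.
with_id = (
    "From: sender@example.org\n"
    "To: receiver@example.org\n"
    "Message-ID: <1234@example.org>\n"
    "\n"
    "report text\n"
)

# What a web email service might expose: addresses, but no Message-ID.
without_id = (
    "From: sender@example.org\n"
    "To: receiver@example.org\n"
    "\n"
    "report text\n"
)

print(media_identifier(with_id))
print(media_identifier(without_id))
```

In the second case two exports to the same address cannot be told apart from the log alone.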
The current suggestion under discussion is to split media types into two categories:
- "Identifiable" media that has a label, message-id, or something that identifies the specific contents of this one export/import. This does not mean that there is only one copy of the media, or that it is immutable. That's especially significant for email messages, where the transmission and modification history of the messages on their path from the sending user mail agent to the receiving user mail agent often involves making many copies and having intermediate copies in a variety of caches, mailboxes, and mail queues. When examining an email-related situation this tracking by message-ids is crucial.
- "Non-identified" media, which does not map onto any clear virtual physical device. The clipboard manager is the proximate example of this. Most clipboard managers do not generate an identifier for the information cut or the information pasted. From the perspective of an investigator the audit message says "this information was given to that service." There is no further information about this particular data export/import.
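The two categories could be sketched as a small data structure. This is only an illustration under my own assumptions: the class MediaDescription, its fields, and describe_export are hypothetical names, and none of this is the actual DICOM audit message schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MediaDescription:
    """Hypothetical description of the media involved in an export/import."""
    kind: str                         # e.g. "CD-R", "email", "clipboard"
    identifier: Optional[str] = None  # label, Message-ID, or nothing at all

    @property
    def category(self) -> str:
        # Identifiable media carries something naming this one export/import.
        return "identifiable" if self.identifier else "non-identified"


def describe_export(media: MediaDescription) -> str:
    """Summarize what an investigator could learn from the audit entry."""
    if media.identifier:
        return f"information was exported to {media.kind} identified by {media.identifier}"
    # For non-identified media the log can only say which service received it.
    return f"information was given to the {media.kind} service"


disk = MediaDescription(kind="CD-R", identifier="DISK-LABEL-0001")
clipboard = MediaDescription(kind="clipboard")

print(disk.category, "-", describe_export(disk))
print(clipboard.category, "-", describe_export(clipboard))
```

The identifier here says nothing about how many copies exist or whether the contents can change; it only names this particular export/import.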
We'll see whether those words survive. There is another review tomorrow and we may find a better way to describe this situation. I will be stubborn about recognizing and not solving the problem of other new forms of "non-identified" media. There are obvious examples of "what about Facebook? what about SMS/RMS? what about other chat systems? ..."
My answer for those at present is "you figure out what information can be captured into the log and we'll see what changes are needed." A very quick look at Facebook, SMS/RMS, and other chat services indicates that each one will require some serious investigation and effort. We shouldn't start until there are some participants with a genuine need who can invest the time needed to examine the details of that system. For now, they're all just more kinds of "non-identified" media.
Maybe we will come up with a new term that makes more sense than non-identified media, and rewrite the descriptions to fit.