Why does DICOM audit consider email a media type? (Part 1)

How should this change (20 years later)? (Part 2)

Jun 26, 2023

The decision to call email “media” was made to meet the needs of security audits and the structure of email systems in 2004. The goal has not changed dramatically, but the context has. The structure of email systems has changed along with the technology. In 2004 Web 2.0 had just gotten its name, and the first Web 2.0 conference was held that year. In 2023, we’re past Web 3.0 and into the universality of Internet messaging systems. (For example, Google’s Gmail launched as an invitation-only beta in 2004 and did not leave beta until 2009. It’s now radically different from that initial email client.)

The goal itself has not changed significantly over time. The intention is to capture a description of the import and export of medical information by email. Import means that the information is being brought inside the security domain that is subject to audit. Export means the information is leaving the security domain. Back then, identifying the boundary of a security domain was fairly easy: it was usually the same as the enterprise’s administrative boundary. Now it’s less obvious.

At that time there were three major models of email architecture:

1. The large internal enterprise email, where each internal user had an email client that communicated with an internal email server. When an email was destined for an external address, the internal email server would communicate with external email servers using SMTP. External email would be delivered to the internal email server using SMTP. In this model the logging application is on the email client, and it has limited access to the details of the SMTP transfers.
2. The small enterprise email, where each user had an email client that communicated with an external email server, usually provided by the Internet Service Provider. When the client sent an email, it communicated directly with the external email server using SMTP (or equivalent). When the client wanted to receive email, it communicated with the external email server using POP, IMAP, or equivalent. These email clients log the activity and have access to more information than the clients of large internal enterprise email servers.
3. The AOL model (shared by its competitors), which provided a webmail client built on proprietary JavaScript and proprietary transfer mechanisms. Some AOL customers instead used a local email client with the same SMTP, POP, and IMAP interfaces as small enterprise email. Those could also be modeled as type 2 above.

The question then becomes “what is an email?” Fortunately, all three models agreed on the logical model that “an email is a message”, composed of:

- A header, typically dynamic and mostly used to record information related to email transfer.
- A body, static, which would contain:
  - A plain text message (required), and
  - Attachments (optional), which could be structured text or binary file contents

They also agreed on terminology in which the message would have:

- A message-ID, a unique identifier for this message, and
- A destination. The destination is dynamic. A client could send a message to list@host. Intermediate machines might make copies of the message and send them to list-member1@host1, list-member2@host2, and so forth.
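
The logical model above can be sketched with Python’s standard `email` package. This is only an illustration, not part of DICOM or of any mail system described here: the addresses, the attachment contents, and the `Received` line are made-up values. It shows that an intermediate machine can add header lines while the message-ID and the body stay the same.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import make_msgid


def build_message() -> EmailMessage:
    """Build a message with a header, a plain-text body, and one attachment."""
    msg = EmailMessage()
    # All addresses and contents below are hypothetical example values.
    msg["From"] = "sender@clinic.example"
    msg["To"] = "list@host.example"
    msg["Subject"] = "Study report"
    msg["Message-ID"] = make_msgid(domain="clinic.example")
    msg.set_content("Report attached.")
    msg.add_attachment(
        b"\x00\x01binary contents",
        maintype="application",
        subtype="octet-stream",
        filename="report.bin",
    )
    return msg


def relay(msg: EmailMessage) -> EmailMessage:
    """Simulate an intermediate server: copy the message, add a trace header."""
    copy = message_from_bytes(msg.as_bytes(), policy=policy.default)
    # Made-up trace line; it only stands in for transfer-related header data.
    copy["Received"] = "from mail.clinic.example by mx.host.example"
    return copy


def plain_text(msg: EmailMessage) -> str:
    """Return the required plain-text part of the body."""
    return msg.get_body(preferencelist=("plain",)).get_content()


original = build_message()
relayed = relay(original)

same_id = str(relayed["Message-ID"]) == str(original["Message-ID"])
same_body = plain_text(relayed) == plain_text(original)
header_changed = (
    original.get_all("Received") is None
    and relayed.get_all("Received") is not None
)
print(same_id, same_body, header_changed)
```

The header is the dynamic part: only the copy carries the `Received` line. The message-ID and the body are what a log analyst can rely on to recognize the same message in two places.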

So in one sense the message is like virtual media, in that it has a unique message-ID and contents that should not change. It is like media in that it can be copied, but unlike media in that the copies all have the same message-ID. It is both like and unlike media in that, in the small enterprise model, there are probably four copies of the message “at rest” in storage somewhere:

- One copy in the “sent” mailbox on the client
- One copy in the “sent” mailbox on the sending server
- One copy in the “incoming” mailbox on the receiving server
- One copy in the “incoming” mailbox on the receiving client
- Other copies might be kept in other mailboxes on either client
- There may be short-lived copies while the email is in transit on intermediate systems, but these are usually removed after successful delivery

There will also be copies “in motion” while being transferred between clients and servers.

From the perspective of post-facto analysis of logs and correlating the DICOM log with other logs generated by email clients and servers, it makes sense to identify the message and not the endpoints. The message-ID is unique and can be used to identify messages found in mailboxes all through the system. The message-ID is usually used to identify messages within the internal logs of various servers and clients. The sending client probably knows only the endpoint of the sending email server and does not know the subsequent endpoints. The sending client knows the message-ID. Similarly, the receiving client probably knows only the endpoint of the receiving server. It will know the message-ID of the message and the message-ID will be the same unique message-ID that was known by the sending email client.
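
As a rough sketch of that correlation, the snippet below groups records from three logs by message-ID. The record layout, the field names (`source`, `time`, `message_id`, `peer`), and all of the values are hypothetical. Real DICOM audit messages and mail server logs have their own formats and would first have to be parsed into something like this.

```python
from collections import defaultdict

# Hypothetical, already-parsed log records. Each log only knows its own
# neighboring endpoint ("peer"), but every log records the message-ID.
dicom_audit_log = [
    {"source": "dicom_audit", "time": "2023-06-26T10:00:00Z",
     "message_id": "<m1@clinic.example>", "peer": "mail.clinic.example"},
]
sending_server_log = [
    {"source": "sending_server", "time": "2023-06-26T10:00:02Z",
     "message_id": "<m1@clinic.example>", "peer": "mx.host.example"},
    {"source": "sending_server", "time": "2023-06-26T10:05:00Z",
     "message_id": "<m2@clinic.example>", "peer": "mx.other.example"},
]
receiving_client_log = [
    {"source": "receiving_client", "time": "2023-06-26T10:00:09Z",
     "message_id": "<m1@clinic.example>", "peer": "imap.host.example"},
]


def correlate_by_message_id(*logs):
    """Group records from any number of logs by message-ID, oldest first."""
    by_id = defaultdict(list)
    for log in logs:
        for record in log:
            by_id[record["message_id"]].append(record)
    for records in by_id.values():
        # ISO 8601 UTC timestamps in one format sort correctly as strings.
        records.sort(key=lambda r: r["time"])
    return dict(by_id)


timeline = correlate_by_message_id(
    receiving_client_log, sending_server_log, dicom_audit_log
)
for record in timeline["<m1@clinic.example>"]:
    print(record["time"], record["source"], record["peer"])
```

In this made-up data no two records share a peer, so matching on endpoints would find nothing. Only the message-ID ties the export event in the audit log to the later server and client records.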

So, although it sometimes has the characteristics of data in flight, the email message also has characteristics of permanent media. It was lumped into media because most of the characteristics of data in flight would be endpoint characteristics of the local email servers and would not be useful to match sender logs with receiver logs. The message-ID can be like the other media IDs, and the mailboxes are similar to media structures. (In practice the mailboxes were either databases, structured files, or files and directory structures.)

Next: how the world has changed, and how audit should change to reflect that.

rjh’s Substack
