This section will be updated and edited based on feedback and future decisions.
The DICOM standard doesn’t have a good place to save the rationale behind decisions. This is because a standard must remain usable for the long term and primarily needs to capture the decisions made. The process of making those decisions is important but not part of the standard. So where should this go? Should it remain as part of my substack? Is there somewhere officially associated with DICOM to capture it? Should it be a really big rationale section on the CP? That remains to be seen.
The decisions made will be captured as changes to the standard. Those changes remain to be written, and they will be written as part of the DICOM process. These sections are part of the rationale.
Desired uses for the export and import audit messages for email, i.e., goals
Describe an email export event from the point of view (POV) of the exporter
Describe an email import event from the POV of the importer
Find or recognize copies of the email message that are in mailboxes or stored in files.
Correlate DICOM application logs with the system logs from email and other system services
Correlate DICOM import and export events on different systems
Track copies of emails in other system logs, e.g., backups, file transfers, etc. This might be:
Authorized activities like backups
Unauthorized activities like data breach exfiltration
Track back from email messages found in breach data to the original source of the disclosure.
Information available about the email message
The information about an email message that remains uniformly understood and defined is:
Message-ID, the unique ID of the email message. This should be the same for all copies of the message, but in the real world there are systems that replace or remove the Message-ID. That reduces the value of the Message-ID, but it remains the primary key into email system logs.
Destination identifier (e.g., mailto:). This corresponds to the mailto: identifier. It may be multiple email addresses. It may include aliases or distribution lists. It may include To:, CC:, and BCC: elements. It may be changed while a message is in transit.
Source identifier (e.g., from). This is provided to the recipient and identifies some aspect of the source. It is not necessarily the exporter. Mailing lists routinely replace the source. Some senders use aliases. Some recipients use aliases. Malware often falsifies the source information. Intermediate email systems may modify the source. Nonetheless, it may be useful.
This information is not always available to the audit source, but when available it can be captured and have the same meaning for any of the different models of email system.
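As a minimal sketch of how an audit source might pull these three identifiers out of a message it has access to, the following uses Python's standard `email` package. The function name `email_identifiers` and the shape of the returned dictionary are assumptions made for illustration; they are not part of DICOM or of any RFC. The sample headers are taken from the example message shown later in this section.

```python
from email import message_from_string
from email.utils import getaddresses


def email_identifiers(raw_message: str) -> dict:
    """Return the Message-ID, source and destination addresses found in the headers.

    Any of these may be missing, so callers must handle None and empty lists.
    """
    msg = message_from_string(raw_message)
    message_id = msg.get("Message-ID")  # header lookup is case-insensitive
    # Collect every destination header that is present. BCC is normally
    # stripped before delivery, so it is rarely visible to an importer.
    destination_headers = (
        msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    )
    return {
        "message_id": message_id.strip() if message_id else None,
        "from": [addr for _, addr in getaddresses(msg.get_all("From", []))],
        "to": [addr for _, addr in getaddresses(destination_headers)],
    }


sample = (
    "From: Zeek <zeek@proxy>\n"
    "Subject: [Zeek] Connection summary from 22:00:00-00:00:00\n"
    "To: root@localhost\n"
    "Message-Id: <20230621040005.D625F733B5E@localhost>\n"
    "\n"
    "body\n"
)
ids = email_identifiers(sample)
print(ids)
```

The values returned are only what the headers claim. As noted above, the source and destination may have been rewritten in transit, so they should be recorded as reported rather than treated as verified.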
Proposed changes
The DICOM code (110031, DCM, Email) is currently defined as
Email and email attachments used as a media for data transport.
This was intended to describe what the RFCs call a “message”. Consider changing the name from “Email” to “Email message”, clarifying the wording to capture the equivalence to the RFCs’ use of “message”, and referring to the RFCs.
Clarify in the import and export log event reports that when the media is “Email message”, one of the identifiers in the report must be the Message-ID, and that “Email message” should not be used if the Message-ID is not available.
Add an “Application” code (xxxxx, DCM, Mailto:) to be used when Message-ID is not available. If this is an export, identify the mailto: destination. If this is an import, identify the from address. This is used to describe sending a message into a system that does not provide Message-ID tracking information. This needs a good description. It should probably encompass all sorts of messaging applications, not just email. Systems like Twitter, Short Message Service (SMS), Rich Communication Services (RCS), and Signal are also messaging systems that provide a destination but not individual message identification or tracking.
Write a section explaining, or referring to the RFCs for, messages and Message-ID. When Message-ID is available, use the media type “Email message”. Then explain that Message-ID information is not always available. When it’s missing, use the Mailto: information on the export log, and use the “from” information on the import log.
Explain that there may well be an asymmetry between export and import audit log messages. Many exporting systems do not provide a Message-ID. Most importing systems do have the Message-ID. It is highly desirable that the import record both the Message-ID and the from, even though there may be no mapping to the mailto: at the exporter. Should we extend the import description to also include parsing the message header to extract a best guess at the mailto: used by the exporter?
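The selection rule proposed above can be sketched roughly as follows. This is an illustration under stated assumptions, not wording for the standard: the labels `EMAIL_MESSAGE` and `MAILTO`, the `AuditIdentifiers` record, and the function `choose_identifiers` are all hypothetical, and the actual code value for Mailto: is still unassigned. Recording the address alongside the Message-ID on export is also an assumption; the text above only states that it is desirable on import.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for illustration only. The real code values would be
# assigned through the DICOM process.
EMAIL_MESSAGE = "Email message"
MAILTO = "Mailto:"


@dataclass
class AuditIdentifiers:
    kind: str                  # EMAIL_MESSAGE or MAILTO
    message_id: Optional[str]  # present only when kind is EMAIL_MESSAGE
    address: Optional[str]     # mailto: destination on export, "from" on import


def choose_identifiers(direction: str,
                       message_id: Optional[str],
                       destination: Optional[str] = None,
                       source: Optional[str] = None) -> AuditIdentifiers:
    """Pick the identifiers for an export or import audit record."""
    if direction not in ("export", "import"):
        raise ValueError("direction must be 'export' or 'import'")
    # Exporters know where they sent the message; importers know the claimed source.
    address = destination if direction == "export" else source
    if message_id:
        # Message-ID is available, so the media can be described as an email message.
        return AuditIdentifiers(EMAIL_MESSAGE, message_id, address)
    # No Message-ID: fall back to describing the messaging application.
    return AuditIdentifiers(MAILTO, None, address)


# An exporter that hands the message to a system without Message-ID tracking:
exported = choose_identifiers("export", None, destination="root@localhost")
# An importer that can read the delivered headers:
imported = choose_identifiers(
    "import", "<20230621040005.D625F733B5E@localhost>", source="zeek@proxy"
)
print(exported)
print(imported)
```

The two records produced here show the asymmetry: the export record has only a destination, while the import record has a Message-ID and a from address that do not map directly onto it.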
Are these examples of SMTP logs and email message header contents useful?
Example SMTP logs:
These are the log entries and search results related to Message-ID: <20230621040005.D625F733B5E@localhost>
Example of the export SMTP log
2023-06-21T00:00:05.881934-04:00 proxy postfix/cleanup[26668]: D625F733B5E: message-id=<20230621040005.D625F733B5E@localhost>
2023-06-21T00:00:05.890894-04:00 proxy postfix/qmgr[2380]: D625F733B5E: from=<root@localhost>, size=8034, nrcpt=1 (queue active)
2023-06-21T00:00:05.896418-04:00 proxy postfix/cleanup[26668]: DAC09733B5F: message-id=<20230621040005.D625F733B5E@localhost>
2023-06-21T00:00:05.906090-04:00 proxy postfix/local[26695]: D625F733B5E: to=<root@localhost>, relay=local, delay=0.04, delays=0.03/0/0/0.01, dsn=2.0.0, status=sent (forwarded as DAC09733B5F)
2023-06-21T00:00:05.906160-04:00 proxy postfix/qmgr[2380]: DAC09733B5F: from=<root@localhost>, size=8154, nrcpt=1 (queue active)
2023-06-21T00:00:05.906412-04:00 proxy postfix/qmgr[2380]: D625F733B5E: removed
Example of the import SMTP log
2023-06-21T00:00:05.927526-04:00 quad postfix/smtpd[24590]: connect from proxy[192.168.3.10]
2023-06-21T00:00:05.928898-04:00 quad postfix/smtpd[24590]: E2C0E59B50: client=proxy[192.168.3.10]
2023-06-21T00:00:05.929388-04:00 quad postfix/cleanup[24579]: E2C0E59B50: message-id=<20230621040005.D625F733B5E@localhost>
2023-06-21T00:00:05.935059-04:00 quad postfix/smtpd[24590]: disconnect from proxy[192.168.3.10] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
2023-06-21T00:00:05.935335-04:00 quad postfix/qmgr[3248]: E2C0E59B50: from=<root@localhost>, size=8307, nrcpt=1 (queue active)
2023-06-21T00:00:05.940228-04:00 quad postfix/local[24581]: E2C0E59B50: to=<hornrj@quad>, relay=local, delay=0.01, delays=0.01/0/0/0, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail)
2023-06-21T00:00:05.940329-04:00 quad postfix/qmgr[3248]: E2C0E59B50: removed
Things to note:
The message-id is preserved, so the activity in the two mail logs can be matched.
The destination on the export side (root@localhost) does not match the destination on the import side (hornrj@quad). That’s because the email address used by the sender is an alias on the destination side.
In this case the “from” on the export side matches the “from” on the import side, but they are actually two different users on two different machines. “localhost” is the local machine in both cases.
The “to” address fields do not match between exporting side and importing side. The destination uses an alias that rewrote the “to” field while in transit.
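The matching described in these notes can be sketched as a small script that groups Postfix log lines by queue ID and then joins the groups on message-id. It is a sketch, not a general log parser: the regular expressions assume the line layout and the short uppercase hexadecimal queue IDs seen in the logs above, and the names `LINE`, `MESSAGE_ID`, and `correlate_by_message_id` are made up for this example. The input lines are copied from the example logs.

```python
import re
from collections import defaultdict

# Matches lines of the form:
#   <timestamp> <host> postfix/<daemon>[<pid>]: <queue id>: <detail>
LINE = re.compile(
    r"^(?P<time>\S+) (?P<host>\S+) postfix/(?P<daemon>\w+)\[\d+\]: "
    r"(?P<queue_id>[0-9A-F]+): (?P<detail>.*)$"
)
MESSAGE_ID = re.compile(r"message-id=(<[^>]+>)")


def correlate_by_message_id(lines):
    """Map each Message-ID to the (host, queue ID, details) groups that carry it."""
    details_by_queue = defaultdict(list)
    message_id_by_queue = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # e.g. connect/disconnect lines have no queue ID
        key = (match["host"], match["queue_id"])
        details_by_queue[key].append(match["detail"])
        found = MESSAGE_ID.search(match["detail"])
        if found:
            message_id_by_queue[key] = found.group(1)
    correlated = defaultdict(list)
    for (host, queue_id), message_id in message_id_by_queue.items():
        correlated[message_id].append(
            (host, queue_id, details_by_queue[(host, queue_id)])
        )
    return dict(correlated)


export_log = [
    "2023-06-21T00:00:05.881934-04:00 proxy postfix/cleanup[26668]: D625F733B5E: message-id=<20230621040005.D625F733B5E@localhost>",
    "2023-06-21T00:00:05.896418-04:00 proxy postfix/cleanup[26668]: DAC09733B5F: message-id=<20230621040005.D625F733B5E@localhost>",
    "2023-06-21T00:00:05.906090-04:00 proxy postfix/local[26695]: D625F733B5E: to=<root@localhost>, relay=local, delay=0.04, delays=0.03/0/0/0.01, dsn=2.0.0, status=sent (forwarded as DAC09733B5F)",
    "2023-06-21T00:00:05.906412-04:00 proxy postfix/qmgr[2380]: D625F733B5E: removed",
]
import_log = [
    "2023-06-21T00:00:05.927526-04:00 quad postfix/smtpd[24590]: connect from proxy[192.168.3.10]",
    "2023-06-21T00:00:05.929388-04:00 quad postfix/cleanup[24579]: E2C0E59B50: message-id=<20230621040005.D625F733B5E@localhost>",
    "2023-06-21T00:00:05.940228-04:00 quad postfix/local[24581]: E2C0E59B50: to=<hornrj@quad>, relay=local, delay=0.01, delays=0.01/0/0/0, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail)",
]

events = correlate_by_message_id(export_log + import_log)
for host, queue_id, details in events["<20230621040005.D625F733B5E@localhost>"]:
    print(host, queue_id, details)
```

Joining on message-id rather than on the to or from fields is the point of the exercise: the queue IDs differ on every hop and the addresses are rewritten by aliases, but the message-id ties the three queue entries together.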
That email is still in a mailbox. It is no longer in the sent or inbox mailboxes. Now it’s in another mailbox, and a search of mailboxes for the Message-ID found it. The part of the header that holds the Message-ID is shown below:
Received: by localhost (Postfix, from userid 0)
id D625F733B5E; Wed, 21 Jun 2023 00:00:05 -0400 (EDT)
From: Zeek <zeek@proxy>
Subject: [Zeek] Connection summary from 22:00:00-00:00:00
To: root@localhost
User-Agent: ZeekControl 2.3.0-5
Message-Id: <20230621040005.D625F733B5E@localhost>
Date: Wed, 21 Jun 2023 00:00:05 -0400 (EDT)
A Message-ID search will find the copies of this message in both the sending mailbox and the receiving mailbox until they are removed. Note that the From: in the message is the From field provided by the sender, and is not the from field used by the mail transfer systems.
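A search like this can be sketched with Python's standard `mailbox` module. The sketch assumes the mailboxes are stored as mbox files; Maildir or a mail server would need a different reader. The function name `find_by_message_id` and the paths in the usage comment are hypothetical.

```python
import mailbox


def find_by_message_id(mbox_paths, message_id):
    """Return (path, key) for every stored message whose Message-ID matches."""
    hits = []
    for path in mbox_paths:
        # create=False makes a missing file an error instead of creating it.
        box = mailbox.mbox(path, create=False)
        try:
            for key, msg in box.iteritems():
                # Header lookup is case-insensitive, so "Message-Id" also matches.
                if (msg.get("Message-ID") or "").strip() == message_id:
                    hits.append((path, key))
        finally:
            box.close()
    return hits


# Example call with hypothetical paths:
# find_by_message_id(
#     ["/var/mail/root", "/home/user/mail/archive"],
#     "<20230621040005.D625F733B5E@localhost>",
# )
```

The same comparison applies to messages found in backups or in breach data: if the Message-ID survived, it leads back to the mail system log entries and from there to the export event.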