The new email audit world

Life gets harder for good auditing

Jun 28, 2023

Three more models of email interaction have become common in the past 20 years. I’ll call them:

RESTful API - which I’ll describe using Google’s Gmail RESTful API as an example.
Library based - which I’ll describe using Google’s Python library API and a couple others as examples.
Mailto: handler - where a browser invokes an app or external application to perform a mailto: function. The browser is configured at the user endpoint to use the application that they prefer.

All of these make it harder, or impossible, to use the Message-ID to identify the email being sent. Without the Message-ID it is harder to correlate the DICOM audit log with other logs on the various systems involved. Those other logs have more detail about the locations and transfers of the email message. So far, the newer methods leave it reasonably easy to obtain the original intended destination address, i.e., the mailto: address.

Communication between independent email servers is still SMTP (with quite a few additions to header and processing requirements since 2004). So in theory there should be no need to change the audit message. In practice, it has become much harder to obtain the Message-ID, and the information from SMTP logs is often lost. This also means that file scans and other data exchange scans that detect email messages being exfiltrated cannot be correlated with the DICOM audit logs.

Sending email (Data Export)

The RESTful API

The Gmail API does expose the Message-ID and the API is structured around the message as the unit of email. So it is feasible to use the original audit message structure and capture both Message-ID and destination information. The problem is with implementation difficulty and documentation difficulty.

A Python snippet for the simplest use of the Gmail RESTful API is:

def send_message(service, destination, obj, body, attachments=None):
    # build_message() is a helper (not shown here) that assembles the MIME message
    return service.users().messages().send(
      userId="me",
      body=build_message(destination, obj, body, attachments or [])
    ).execute()

# test send email
send_message(service, "destination@domain.com", "This is a subject", 
            "This is the body of the email", ["test.txt", "anyfile.png"])

This makes use of one of the RESTful endpoints for sending messages and generates one of these HTTP transactions:

POST /gmail/v1/users/{userId}/messages/send 
POST /upload/gmail/v1/users/{userId}/messages/send

These send the message to the recipients in the To, Cc, and Bcc headers.

Note that the Message-ID is not mentioned anywhere. It will be filled in automatically by the Gmail server as part of the send. It is not returned to the application to be put into an audit message.

It is possible to get the Message-ID by using two different methods, which in combination also send an email message:

users.drafts.create, then
users.drafts.send

You can get the Message-ID from the draft message that is created by the first POST. But, this is extra work and it is not the easy way to send an email.
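The two-step path can be sketched roughly as follows. This is a minimal sketch, not taken from the Google documentation: the function names `find_header` and `send_via_draft` are hypothetical, `service` is assumed to be an already authorized Gmail service object from the Google Python client, and the presence of a Message-ID header on the draft is an assumption that should be checked against the sent message before relying on it for audit purposes.

```python
import base64
from email.message import EmailMessage


def find_header(headers, name):
    """Return the value of the first header whose name matches (case-insensitive)."""
    for header in headers:
        if header.get("name", "").lower() == name.lower():
            return header.get("value")
    return None


def send_via_draft(service, sender, recipient, subject, body):
    """Hypothetical helper: create a draft, read its Message-ID, then send it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")

    drafts = service.users().drafts()
    # Step 1: users.drafts.create
    draft = drafts.create(userId="me", body={"message": {"raw": raw}}).execute()
    # Read the draft's headers back (assumption: a Message-ID header is present)
    detail = drafts.get(userId="me", id=draft["id"], format="metadata").execute()
    message_id = find_header(detail["message"]["payload"]["headers"], "Message-ID")
    # Step 2: users.drafts.send
    sent = drafts.send(userId="me", body={"id": draft["id"]}).execute()
    return message_id, sent
```

The extra round trip to read the headers is what makes this path less attractive than the one-call send shown earlier.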

Similarly, it is hard for an interactive user to get the Message-ID from the Gmail web page interface. It’s a JavaScript app that is using the same resources. To get the Message-ID you must send the message, then go to the “sent” mailbox, then select the correct message, then select “show original”. The SMTP headers are shown and the Message-ID is one of them.

There is no “show original” for messages that are being edited or for drafts by the interactive user application.

Other APIs will differ from the Gmail API. Most of them make it difficult or impossible to obtain the Message-ID for a message that was just sent.

Without the Message-ID it will be hard to locate copies of the message on various systems, and it will be very hard to identify actual destinations for any messages where the destination email address is an alias, list, or otherwise subject to rewriting.

Library based

Library functions are generally harder to audit than direct uses of RESTful APIs. They typically provide a very easy path to send an email, and only sometimes provide the Message-ID. The Google Python library supports both getting a Message-ID and simply sending a message. The code examples are in Google Python documentation. The first example shows creating a draft message, getting the Message-ID from the draft message, and printing the draft’s ID. If the application programmer is motivated and informed they may be willing to use this more complex code path to obtain the Message-ID.

Another Python library “Mailtrap” will send an email with the code

import mailtrap as mt
# create mail object
mail = mt.Mail(
   sender=mt.Address(email="mailtrap@example.com", name="Mailtrap Test"),
   to=[mt.Address(email="your@email.com")],
   subject="You are awesome!",
   text="Congrats for sending test email with Mailtrap!",
)
# create client and send
client = mt.MailtrapClient(token="your-api-key")
client.send(mail)

No Message-ID is provided by that library. All you have is the destination address. But the code is really simple, so this is appealing to developers.

Sending with the standard Python “smtplib” (with all the extraneous details needed to make it really work removed) looks like this:

import smtplib

host = "server.smtp.com"
server = smtplib.SMTP(host)
FROM = "testpython@test.com"
TO = "bla@test.com"
MSG = "Subject: Test email python\n\nBody of your message!"
server.sendmail(FROM, TO, MSG)

server.quit()
print("Email Sent")

Again, no Message-ID is available.
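One possible workaround, which is not part of the examples above, is for the application to compose the message itself and assign a Message-ID before handing it to smtplib. The sketch below uses the standard library’s `email.utils.make_msgid`; the function name `build_with_message_id` and the example domain are hypothetical, and whether a given relay keeps the client-supplied Message-ID unchanged is an assumption that needs to be verified for each mail system.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_with_message_id(sender, recipient, subject, body, domain=None):
    """Hypothetical helper: build a message and return it with its Message-ID."""
    message_id = make_msgid(domain=domain)  # e.g. "<...@example.org>"
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Message-ID"] = message_id
    msg.set_content(body)
    return msg, message_id


msg, message_id = build_with_message_id(
    "testpython@test.com", "bla@test.com", "Test", "Body of your message!",
    domain="example.org",
)
# message_id can now be written to the audit log, and the message sent with
# smtplib.SMTP(host).send_message(msg)
```

This costs a few more lines than the bare `sendmail` call, which is the same trade-off between convenience and auditability seen with the RESTful API.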

mailto: handler

The situation with the HTML mailto: URI is simpler and bleaker. There is usually no returned information. The information provided in the mailto: element is sent to the program that was configured to handle mailto: URIs. In a browser the full mailto: URI is provided as a command argument to the mailto: handler. Unless the mailto: handler is specially programmed for DICOM, it will not generate anything in a DICOM audit log. The generic mail handling applications do not generate DICOM audit log messages. They might generate messages in other logs.

The browsing app or program (e.g., a DICOM viewer app running in a browser) could be programmed to generate an audit message into the DICOM log to capture the mailto: URI and the fact that it was sent to the mailto: handler. This is not very informative, but at least provides a time tag and a destination. That might be enough to find the other logs that indicate what happened with this message.
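As a rough illustration of what such an application could capture, the sketch below parses a mailto: URI with the standard library and builds a small record holding the time and the destinations. The function name `mailto_audit_record` and the record layout are hypothetical; this is a plain dictionary, not the DICOM audit message format.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, unquote, urlsplit


def mailto_audit_record(uri, when=None):
    """Hypothetical helper: record the hand-off of a mailto: URI to its handler."""
    parts = urlsplit(uri)
    if parts.scheme.lower() != "mailto":
        raise ValueError("not a mailto: URI")
    # Addresses before the "?" are the primary recipients, comma separated
    recipients = [unquote(a).strip() for a in parts.path.split(",") if a.strip()]
    # Additional recipients may appear as to=, cc= or bcc= header fields
    query = {k.lower(): v for k, v in parse_qs(parts.query).items()}
    for field in ("to", "cc", "bcc"):
        for value in query.get(field, []):
            recipients.extend(a.strip() for a in value.split(",") if a.strip())
    when = when or datetime.now(timezone.utc)
    return {
        "event": "mailto-handoff",
        "time": when.isoformat(),
        "uri": uri,
        "recipients": recipients,
    }
```

The record contains no Message-ID, because none exists at the point where the URI is handed to the handler.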

Receiving email (Data import)

The data import is in much better shape than data export. The message that is delivered does contain the Message-ID. The issues that arise are mostly those related to programmer convenience:

RESTful API - The message header, where the Message-ID is located, might be retrieved from a different resource than the message body. The header is usually easy to obtain because it is where information like To: and Cc: is found.
Library based - As with the RESTful API, the Message-ID will be in the message. Retrieving it may take a different library call than the one that retrieves the body, but access is usually easy because email recipients need some of the header information.
mailto: handler - Not relevant. mailto: handlers only deal with sending mail. Mail reception will be done using one of the other five methods.

This means that the existing audit message for Data Import can be used if the developers of applications are aware of the need to obtain the Message-ID from whatever tools they are using.
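For a received message that is available as raw text, pulling out the Message-ID takes only a few lines with the standard library `email` package. This is a minimal sketch; the function name `message_id_of` and the sample message are hypothetical.

```python
from email import message_from_string


def message_id_of(raw_message):
    """Hypothetical helper: return the Message-ID header value, or None if absent."""
    msg = message_from_string(raw_message)
    value = msg["Message-ID"]  # header lookup is case-insensitive
    return value.strip() if value is not None else None


sample = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Message-ID: <abc123@example.com>\n"
    "Subject: Test\n"
    "\n"
    "Body of the message\n"
)
received_id = message_id_of(sample)
```

The returned value is what would go into the Data Import audit message.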

Next, what should be changed in the DICOM Audit messages to react to the changing environment and still meet our goals?

rjh’s Substack
