Thursday, March 12, 2015

Parsing exported mailboxes using Python

Google Apps domain administrators can use the Email Audit API to download mailbox accounts for audit purposes in accordance with the Customer Agreement. To improve the security of the data retrieved, the service creates a PGP-encrypted copy of the mailbox which can only be decrypted by providing the corresponding RSA key.

When decrypted, the exported mailbox will be in mbox format, a standard file format used to represent collections of email messages. The mbox format is supported by many email clients, including Mozilla Thunderbird and Eudora.

If you don’t want to install a specific email client to check the content of exported mailboxes, or if you are interested in automating this process and integrating it with your business logic, you can also programmatically access mbox files.

You could fairly easily write a parser for the simple, text-based mbox format. However, some programming languages have native mbox support or libraries which provide a higher-level interface. For example, Python has a module called mailbox that exposes such functionality, and parsing a mailbox with it only takes a few lines of code:

import mailbox

def print_payload(message):
# if the message is multipart, its payload is a list of messages
if message.is_multipart():
for part in message.get_payload():
print_payload(part)
else:
print message.get_payload(decode=True)

mbox = mailbox.mbox(export.mbox)
for message in mbox:
print message[subject]
print_payload(message)

Let me know your favorite way to parse mbox-formatted files by commenting on Google+.

For any questions related to the Email Audit API, please get in touch with us on the Google Apps Domain Info and Management APIs forum.

Claudio Cherubino   profile | twitter | blog

Claudio is a Developer Programs Engineer working on Google Apps APIs and the Google Apps Marketplace. Prior to Google, he worked as software developer, technology evangelist, community manager, consultant, technical translator and has contributed to many open-source projects, including MySQL, PHP, Wordpress, Songbird and Project Voldemort.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.