---
title: Email importer
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/importers.EmailImporter.ipynb"
---

This importer fetches your emails and accounts over IMAP. It uses Python's built-in IMAP client (imaplib) plus some convenience functions for easier usage, batching, and importing to the pod. The importer requires you to log in with your email address and an app password. It is tested on Gmail, but should work with other IMAP servers.

{% include note.html content='The recommended setup for Gmail is to enable two-factor authentication. In this case, make sure IMAP access is enabled and create an application password (explained in the same link).' %}

IMAPClient

The EmailImporter communicates with email providers over IMAP. We created a convenience class around Python's imaplib, called IMAPClient, that lets you list your mailboxes, retrieve your mails, and get their content.
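A minimal sketch of how such a wrapper could sit on top of `imaplib`, assuming it opens a TLS connection, logs in, and selects the mailbox read-only when constructed. The defaults mirror the documented IMAPClient signature; the class name `MinimalIMAPClient` and the `connect` flag are hypothetical (the flag only exists so the object can be built without a network connection). This is not the actual IMAPClient source.

```python
import imaplib


class MinimalIMAPClient:
    """Hypothetical thin wrapper around imaplib, mirroring the documented IMAPClient defaults."""

    def __init__(self, username, app_pw, host="imap.gmail.com", port=993,
                 inbox='"[Gmail]/All Mail"', connect=True):
        self.username = username
        self.host = host
        self.port = port
        self.inbox = inbox
        self.conn = None
        if connect:
            # Open a TLS connection (993 is the standard IMAPS port)
            self.conn = imaplib.IMAP4_SSL(host, port)
            # Authenticate with the app password instead of the account password
            self.conn.login(username, app_pw)
            # Select the mailbox read-only so fetching does not set the \Seen flag
            self.conn.select(inbox, readonly=True)

    def close(self):
        """Log out and drop the connection, if one was opened."""
        if self.conn is not None:
            self.conn.logout()
            self.conn = None


# Build the object without touching the network (connect=False is the hypothetical escape hatch)
client = MinimalIMAPClient("me@example.com", "app-password", connect=False)
print(client.host, client.port, client.inbox)  # imap.gmail.com 993 "[Gmail]/All Mail"
```

Keeping the connection on the instance lets the listing, searching, and fetching helpers reuse one authenticated session.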

{% raw %}

class IMAPClient

IMAPClient(username, app_pw, host='imap.gmail.com', port=993, inbox='"[Gmail]/All Mail"')

{% endraw %} {% raw %}

IMAPClient.list_mailboxes

IMAPClient.list_mailboxes()

Lists all available mailboxes
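`imaplib`'s `IMAP4.list()` returns a status plus raw LIST lines such as `(\HasNoChildren) "/" "INBOX"`. A sketch of turning those lines into plain mailbox names, assuming `list_mailboxes` does something similar; `parse_list_response` is a hypothetical helper, and modified UTF-7 encoded mailbox names are left undecoded.

```python
import re

# One LIST response line: (flags) "delimiter" name
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_response(data):
    """Turn the raw lines from imaplib's IMAP4.list() into mailbox names."""
    names = []
    for line in data:
        if not line:
            continue
        match = LIST_LINE.match(line.decode("utf-8", errors="replace"))
        if match:
            # Names containing spaces are quoted by the server; strip the quotes
            names.append(match.group("name").strip('"'))
    return names


# Sample lines in the shape Gmail returns (made up for illustration)
sample = [
    b'(\\HasNoChildren) "/" "INBOX"',
    b'(\\All \\HasNoChildren) "/" "[Gmail]/All Mail"',
]
print(parse_list_response(sample))  # ['INBOX', '[Gmail]/All Mail']
```

Against a live connection this would be `typ, data = conn.list()` followed by `parse_list_response(data)`.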

{% endraw %} {% raw %}

IMAPClient.get_all_mail_uids

IMAPClient.get_all_mail_uids()

Retrieves all mail UIDs from the selected mailbox.
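UIDs are typically obtained with an IMAP `UID SEARCH ALL` command; `imaplib` returns the matches as one space-separated bytes string inside a list. A minimal sketch of that parsing step, with `parse_uid_search` as a hypothetical helper:

```python
def parse_uid_search(data):
    """Split imaplib's UID SEARCH result (e.g. [b'3 7 12']) into a list of UID strings."""
    if not data or not data[0]:
        # An empty mailbox yields [b''] (or nothing at all)
        return []
    return [uid.decode("ascii") for uid in data[0].split()]


# With a live connection (not run here):
#   typ, data = conn.uid("search", None, "ALL")
#   uids = parse_uid_search(data)
print(parse_uid_search([b"3 7 12"]))  # ['3', '7', '12']
print(parse_uid_search([b""]))        # []
```

Working with UIDs rather than sequence numbers keeps identifiers stable while other clients modify the mailbox.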

{% endraw %} {% raw %}

IMAPClient.get_mail

IMAPClient.get_mail(uid)

Fetches a mail given its UID and returns (raw_mail, thread_id).
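On Gmail, one way to get both the raw message and a thread id in a single round trip is the `X-GM-THRID` IMAP extension, which Gmail returns alongside the message when requested. Whether `get_mail` uses exactly this is an assumption. The sketch only parses an imaplib-style `UID FETCH` result, with `parse_fetch_response` as a hypothetical helper; servers without the extension yield `None` as the thread id.

```python
import re

THREAD_ID = re.compile(rb"X-GM-THRID (\d+)")


def parse_fetch_response(data):
    """Return (raw_mail, thread_id) from imaplib's UID FETCH result."""
    for part in data:
        # Literal payloads arrive as (response_prefix, literal_bytes) tuples
        if isinstance(part, tuple):
            prefix, raw_mail = part
            match = THREAD_ID.search(prefix)
            thread_id = match.group(1).decode("ascii") if match else None
            return raw_mail, thread_id
    return None, None


# With a live Gmail connection (not run here):
#   typ, data = conn.uid("fetch", uid, "(X-GM-THRID RFC822)")
sample = [
    (b"1 (X-GM-THRID 1278455344230334865 UID 5 RFC822 {13}", b"Subject: hi\r\n"),
    b")",
]
raw_mail, thread_id = parse_fetch_response(sample)
print(thread_id)  # 1278455344230334865
```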

{% endraw %}

EmailImporter

{% raw %}

class EmailImporter

EmailImporter(*args, **kwargs) :: ImporterBase

Imports emails over IMAP.

{% endraw %}

The email importer has the following parameters:

  • username: Your email address.
  • password: Your email password. If you use Gmail, use your application password.
  • generic attributes
  • host: The hostname of the IMAP server (defaults to imap.gmail.com).
  • port: The port of the server (defaults to 993 for Gmail).
  • max_number: Maximum number of emails to download. Leave unset for no limit.
{% raw %}

EmailImporter.get_content

EmailImporter.get_content(message)

Extracts content from a Python email message.
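A sketch of content extraction with only the standard library, assuming "content" means the first `text/plain` part that is not an attachment. `get_text_content` is a hypothetical name; it works on both legacy `Message` objects (from `email.message_from_bytes`) and `EmailMessage` objects.

```python
import email
from email.message import EmailMessage


def get_text_content(message):
    """Return the first non-attachment text/plain part of a message as str, or None."""
    for part in message.walk():
        # Skip multipart containers and anything that is not plain text
        if part.is_multipart() or part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


msg = EmailMessage()
msg.set_content("aa")
msg.add_attachment(b"bb", maintype="image", subtype="jpeg", filename="sample.jpg")
print(repr(get_text_content(msg)))  # 'aa\n'

legacy = email.message_from_bytes(b"Subject: x\r\n\r\nThis is content")
print(get_text_content(legacy))  # This is content
```

Walking the whole tree, instead of calling `EmailMessage.get_body()`, keeps the helper usable on the legacy `Message` objects that `message_from_bytes` returns by default.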

{% endraw %} {% raw %}

EmailImporter.create_item_from_mail

EmailImporter.create_item_from_mail(mail, thread_id=None)

Creates a schema-item from an existing mail
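The header fields checked in the Parse emails test can be pulled out with `email.utils`. This sketch, with the hypothetical `parse_mail_headers`, returns a plain dict rather than a schema item, and assumes `dateSent` is a POSIX timestamp in milliseconds; the unit the pod expects is not stated here.

```python
import email
from email.utils import getaddresses, parsedate_to_datetime


def parse_mail_headers(raw_mail):
    """Extract the identifying header fields from a raw RFC 822 message (bytes)."""
    message = email.message_from_bytes(raw_mail)

    def addresses(header):
        # getaddresses yields (display_name, address) pairs; keep only the address
        return [addr for _, addr in getaddresses(message.get_all(header, []))]

    date = message["Date"]
    return {
        "externalId": message["Message-ID"],  # header lookup is case-insensitive
        "sender": addresses("From"),
        "receiver": addresses("To"),
        "replyTo": addresses("Reply-To"),
        "subject": message["Subject"],
        # Assumed unit: milliseconds since the epoch
        "dateSent": int(parsedate_to_datetime(date).timestamp() * 1000) if date else None,
    }


raw = (b"Message-id: 1234\r\n"
       b"From: user1 <a@gmail.com>\r\n"
       b"To: user1 <b@gmail.com>\r\n"
       b"Reply-to: user1 <c@gmail.com>\r\n"
       b"Subject: the subject\r\n"
       b"Date: Mon, 04 May 2020 00:37:44 -0700\r\n"
       b"\r\n"
       b"This is content")
fields = parse_mail_headers(raw)
print(fields["sender"], fields["dateSent"])  # ['a@gmail.com'] 1588577864000
```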

{% endraw %} {% raw %}

EmailImporter.run

EmailImporter.run(importer_run, pod_client=None, verbose=True)

This is the main function of the Email importer. It runs the importer using the information provided in the importer run. If you pass a pod client, it adds the new items to the graph.
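The progress lines in the Usage output (`Importing 50.0% of 10`, then `100.0%`) suggest the UIDs are processed in batches. A sketch of such a batching loop, assuming `max_number` simply keeps the first N UIDs; `iter_batches` and `batch_size` are hypothetical names, and the real batch size is not documented here.

```python
def iter_batches(uids, batch_size, max_number=None):
    """Yield (batch, progress) pairs, where progress is the fraction of UIDs handled so far."""
    if max_number is not None:
        # Assumption: max_number keeps the first N UIDs
        uids = uids[:max_number]
    total = len(uids)
    for start in range(0, total, batch_size):
        batch = uids[start:start + batch_size]
        yield batch, (start + len(batch)) / total


uids = [str(i) for i in range(1, 11)]
for batch, progress in iter_batches(uids, batch_size=5):
    # A real importer would fetch, parse and upload this batch before reporting progress
    print(f"Importing {progress * 100:.1f}% of {len(uids)}")
# Importing 50.0% of 10
# Importing 100.0% of 10
```

Reporting progress per batch rather than per email keeps the number of progress updates small for large mailboxes.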

{% endraw %}

Usage

Download all mails from your account

{% raw %}
pod_client = PodClient()
{% endraw %} {% raw %}
# This cell is meant to be able to test the importer locally
def get_gmail_creds():
    return read_file(HOME_DIR / '.memri' / 'credentials_gmail.txt').split("\n")[:2]

imap_user, imap_pw = get_gmail_creds()
importer           = EmailImporter.from_data()
importer_run       = get_importer_run(imap_user, imap_pw)
importer_run.add_edge('importer', importer)
pod_client.create(importer_run)

importer.run(importer_run=importer_run, pod_client=pod_client)

assert importer_run.progress == 1.0
assert importer_run.runStatus == "done"
pod_client.delete_all()

# Output:
# Using, HOST: imap.gmail.com, PORT: 993
# RUN STATUS: running
# PROGRESS MESSAGE: downloading emails
# PROGRESS: Importing 50.0% of 10
# PROGRESS: Importing 100.0% of 10
# PROGRESS MESSAGE: merging duplicate items
# PROGRESS MESSAGE: creating accounts
# PROGRESS MESSAGE: creating threads
# Finished running EmailImporter (#None)
# RUN STATUS: done
{% endraw %}

Parse emails

{% raw %}
import email

test = b"""\
Message-id: 1234\r
From: user1 <a@gmail.com>\r
To: user1 <b@gmail.com>\r
Reply-to: user1 <c@gmail.com>\r
Subject: the subject\r
Date: Mon, 04 May 2020 00:37:44 -0700\r

This is content"""

email_importer = EmailImporter()
mail_item = email_importer.create_item_from_mail(test, 'message_channel_id')

assert mail_item.externalId == '1234'
assert mail_item.sender[0].externalId == 'a@gmail.com'
assert mail_item.receiver[0].externalId == 'b@gmail.com'
assert mail_item.replyTo[0].externalId == 'c@gmail.com'
assert mail_item.subject == 'the subject'
assert mail_item.content == 'This is content'
assert mail_item.dateSent == email_importer.get_timestamp_from_message(email.message_from_bytes(test))
assert mail_item.messageChannel[0].externalId == 'message_channel_id'
{% endraw %}

Attachments

{% raw %}
import email.message

email_importer = EmailImporter()
message = email.message.EmailMessage()
message.set_content('aa')
message.add_attachment(b'bb', maintype='image', subtype='jpeg', filename='sample.jpg')
message.add_attachment(b'cc', maintype='image', subtype='jpeg', filename='sample2.jpg')
content = email_importer.get_content(message)
attachments = email_importer.get_attachments(message)

assert content == 'aa\n'
assert attachments[0].get_content() == b'bb'
assert attachments[1].get_content() == b'cc'
{% endraw %}
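`get_attachments` returns parts whose `get_content()` gives back the attachment bytes. A standard-library sketch of collecting such parts, assuming an attachment is any part with a `Content-Disposition: attachment` header; `get_attachment_parts` is a hypothetical name. On `EmailMessage` objects, the built-in `iter_attachments()` is an alternative, but it does not exist on legacy `Message` objects.

```python
from email.message import EmailMessage


def get_attachment_parts(message):
    """Collect every MIME part marked as an attachment, in message order."""
    return [part for part in message.walk()
            if part.get_content_disposition() == "attachment"]


message = EmailMessage()
message.set_content("aa")
message.add_attachment(b"bb", maintype="image", subtype="jpeg", filename="sample.jpg")
message.add_attachment(b"cc", maintype="image", subtype="jpeg", filename="sample2.jpg")

for part in get_attachment_parts(message):
    # decode=True undoes the base64 transfer encoding used for binary attachments
    print(part.get_filename(), part.get_payload(decode=True))
# sample.jpg b'bb'
# sample2.jpg b'cc'
```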