---
title: Email importer
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/importers.EmailImporter.ipynb"
---

This importer fetches your emails and accounts over IMAP. It uses Python's built-in IMAP client (imaplib) together with some convenience functions for easier usage, batching, and importing to the pod. The importer requires you to log in with your email address and an app password. It is tested on Gmail, but should work with other IMAP servers.

{% include note.html content='The recommended setup for Gmail is to enable two-factor authentication. In that case, make sure you allow IMAP connections and set an application password (explained in the same link).' %}

IMAPClient

The EmailImporter communicates with email providers over IMAP. We created a convenience class around Python's imaplib, called IMAPClient, that lets you list your mailboxes, retrieve your mails, and get their content.

{% raw %}

class IMAPClient[source]

IMAPClient(username, app_pw, host='imap.gmail.com', port=993, inbox='"[Gmail]/All Mail"')

{% endraw %} {% raw %}

IMAPClient.list_mailboxes[source]

IMAPClient.list_mailboxes()

Lists all available mailboxes

{% endraw %} {% raw %}

IMAPClient.get_all_mail_ids[source]

IMAPClient.get_all_mail_ids()

Retrieves all mail ids from the selected mailbox

{% endraw %} {% raw %}

IMAPClient.get_mail[source]

IMAPClient.get_mail(id)

Fetches a mail given an id, returns (raw_mail, thread_id)

{% endraw %}
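
The three operations listed above map closely onto calls in Python's imaplib. The following is a minimal sketch of how they could be written against a plain imaplib connection; it is not the IMAPClient source. The function names `connect` and `get_raw_mail` and the error handling are assumptions made for illustration, and the sketch does not attempt to recover the thread id that `get_mail` returns.

```python
import imaplib


def connect(username, app_pw, host="imap.gmail.com", port=993):
    """Open an SSL-protected IMAP connection and log in."""
    conn = imaplib.IMAP4_SSL(host, port)
    conn.login(username, app_pw)
    return conn


def list_mailboxes(conn):
    """Return the raw mailbox lines reported by the server."""
    status, mailboxes = conn.list()
    if status != "OK":
        raise RuntimeError("could not list mailboxes")
    return mailboxes


def get_all_mail_ids(conn, inbox='"[Gmail]/All Mail"'):
    """Select a mailbox read-only and return the ids of all its messages."""
    status, _ = conn.select(inbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not select {inbox}")
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    # data is a one-element list of space-separated ids, e.g. [b"1 2 3"]
    return data[0].split()


def get_raw_mail(conn, mail_id):
    """Fetch the full RFC 822 source of one message as bytes."""
    status, data = conn.fetch(mail_id, "(RFC822)")
    if status != "OK":
        raise RuntimeError(f"could not fetch mail {mail_id!r}")
    # data[0] is an (envelope, payload) tuple; the payload is the raw mail
    return data[0][1]
```

With real credentials, `conn = connect(user, app_pw)` followed by `get_all_mail_ids(conn)` would hit the server; nothing in the block connects on import.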

EmailImporter

{% raw %}

class ImporterBase[source]

ImporterBase()

{% endraw %} {% raw %}

class EmailImporter[source]

EmailImporter(*args, **kwargs) :: ImporterBase

Imports emails over imap.

{% endraw %}

The email importer has the following parameters:

  • username Your email address
  • password Your email password. If you use Gmail, use your application password
  • generic attributes
  • host The hostname of the IMAP server (defaults to imap.gmail.com)
  • port The port of the server (defaults to 993)
  • max_number The maximum number of emails to download. Leave unset for unlimited
{% raw %}

EmailImporter.get_content[source]

EmailImporter.get_content(message)

Extracts content from a python email message

{% endraw %} {% raw %}

EmailImporter.create_item_from_mail[source]

EmailImporter.create_item_from_mail(mail, thread_id=None)

Creates a schema-item from an existing mail

{% endraw %} {% raw %}

EmailImporter.run[source]

EmailImporter.run(importer_run, pod_client=None, verbose=True)

This is the main function of the email importer. It runs the importer using the information provided in the importer run. If you pass a pod client, it adds the new items to the graph.

{% endraw %}
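
To make the flow of a run more concrete, here is a rough sketch of a batched import loop. It is not the actual `run` implementation. It only assumes a client with the `get_all_mail_ids()` and `get_mail(id)` methods documented above; `run_import`, `batches`, `store_item`, `batch_size`, and the way `max_number` is applied are hypothetical names and choices made for illustration.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_import(imap_client, store_item, max_number=None, batch_size=50):
    """Fetch mails in batches and hand each one to store_item.

    store_item is a hypothetical callback standing in for whatever
    writes items to the pod. Returns the final progress (0.0 to 1.0).
    """
    mail_ids = imap_client.get_all_mail_ids()
    if max_number is not None:
        # Assumption: the limit simply truncates the list of ids
        mail_ids = mail_ids[:max_number]

    done, total = 0, len(mail_ids)
    for batch in batches(mail_ids, batch_size):
        for mail_id in batch:
            raw_mail, thread_id = imap_client.get_mail(mail_id)
            store_item(raw_mail, thread_id)
        done += len(batch)
    # An empty mailbox counts as a finished run
    return done / total if total else 1.0
```

Processing in batches keeps a natural point after each batch at which progress could be reported or partial results written.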

Usage

Download all mails from your account

{% raw %}
pod_client = PodClient()
# Output: Could no connect to backend
{% endraw %} {% raw %}
# # This cell is meant to be able to test the importer locally
# def get_gmail_creds():
#     return read_file(HOME_DIR / '.memri' / 'credentials_gmail.txt').split("\n")[:2]

# imap_user, imap_pw = get_gmail_creds()
# importer           = EmailImporter.from_data()
# importer_run       = get_importer_run(imap_user, imap_pw)
# importer_run.add_edge('importer', importer)
# pod_client.create(importer_run)

# importer.run(importer_run=importer_run, pod_client=pod_client)

# assert importer_run.progress == 1.0
# assert importer_run.runStatus == "done"
# pod_client.delete_all()
{% endraw %}

Parse emails

{% raw %}
# test = b"""\
# Message-id: 1234\r
# From: user1 <a@gmail.com>\r
# To: user1 <b@gmail.com>\r
# Reply-to: user1 <c@gmail.com>\r
# Subject: the subject\r
# Date: Mon, 04 May 2020 00:37:44 -0700\r

# This is content"""

# email_importer = EmailImporter()
# mail_item = email_importer.create_item_from_mail(test, 'message_channel_id')

# assert mail_item.externalId == '1234'
# assert mail_item.sender[0].externalId == 'a@gmail.com'
# assert mail_item.receiver[0].externalId == 'b@gmail.com'
# assert mail_item.replyTo[0].externalId == 'c@gmail.com'
# assert mail_item.subject == 'the subject'
# assert mail_item.content == 'This is content'
# assert mail_item.dateSent == email_importer.get_timestamp_from_message(email.message_from_bytes(test))
# assert mail_item.messageChannel[0].externalId == 'message_channel_id'
{% endraw %}
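
The fields checked in the test above can all be read from a raw mail with Python's standard email package. The sketch below shows one way to do that, producing a plain dict instead of a schema item. `parse_mail`, `addresses`, and `RAW_MAIL` are hypothetical names, the dict keys merely mirror the attribute names used in the assertions, and the date is kept as a datetime instead of the importer's timestamp.

```python
import email
from email.utils import getaddresses, parsedate_to_datetime

RAW_MAIL = (
    b"Message-id: 1234\r\n"
    b"From: user1 <a@gmail.com>\r\n"
    b"To: user1 <b@gmail.com>\r\n"
    b"Reply-to: user1 <c@gmail.com>\r\n"
    b"Subject: the subject\r\n"
    b"Date: Mon, 04 May 2020 00:37:44 -0700\r\n"
    b"\r\n"
    b"This is content"
)


def addresses(message, header):
    """Return the bare email addresses found in one header."""
    return [addr for _, addr in getaddresses(message.get_all(header, []))]


def parse_mail(raw_mail):
    """Turn raw mail bytes into a plain dict of header and body fields."""
    message = email.message_from_bytes(raw_mail)
    return {
        "externalId": message["Message-id"],
        "sender": addresses(message, "From"),
        "receiver": addresses(message, "To"),
        "replyTo": addresses(message, "Reply-to"),
        "subject": message["Subject"],
        "dateSent": parsedate_to_datetime(message["Date"]),
        # Only valid for single-part mails; multipart needs a body lookup
        "content": message.get_payload(),
    }


fields = parse_mail(RAW_MAIL)
```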

Attachments

{% raw %}
import email.message

# Build a message with a plain-text body and two JPEG attachments
email_importer = EmailImporter()
message = email.message.EmailMessage()
message.set_content('aa')
message.add_attachment(b'bb', maintype='image', subtype='jpeg', filename='sample.jpg')
message.add_attachment(b'cc', maintype='image', subtype='jpeg', filename='sample2.jpg')

# Extract the body text and the attachment parts
content = email_importer.get_content(message)
attachments = email_importer.get_attachments(message)

# set_content appends a trailing newline to the body
assert content == 'aa\n'
assert attachments[0].get_content() == b'bb'
assert attachments[1].get_content() == b'cc'
{% endraw %}
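
For comparison, the same separation of body text and attachments can be done with the standard library alone. This is a sketch under the assumption that the message is an `email.message.EmailMessage`; `get_text_content` and `get_attachment_parts` are hypothetical helper names, and the importer's own `get_content` and `get_attachments` may work differently.

```python
from email.message import EmailMessage


def get_text_content(message):
    """Return the text of the preferred body part, or '' if there is none."""
    body = message.get_body(preferencelist=("plain", "html"))
    return body.get_content() if body is not None else ""


def get_attachment_parts(message):
    """Return the attachment parts of a message as a list."""
    return list(message.iter_attachments())


message = EmailMessage()
message.set_content("aa")
message.add_attachment(b"bb", maintype="image", subtype="jpeg", filename="sample.jpg")

parts = get_attachment_parts(message)
```

Each attachment part exposes its bytes through `get_content()` and its name through `get_filename()`, which is usually enough to store it as a separate file item.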