---
title: Email importer
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/importers.GmailImporter.ipynb"
---

This importer fetches your emails and accounts over IMAP. It uses Python's built-in IMAP client together with some convenience functions for easier usage, batching, and importing to the pod. The importer requires you to log in with your email address and an app password. It is tested on Gmail, but should work with other IMAP servers.

{% include note.html content='The recommended usage for Gmail is to enable two-factor authentication. In this case, make sure you allow IMAP connections and set an application password (explained in the same link).' %}

IMAPClient

The EmailImporter communicates with email providers over IMAP. We created a convenience class around Python's imaplib, called IMAPClient, that lets you list your mailboxes, retrieve your mails, and get their content.

{% raw %}

class IMAPClient[source]

IMAPClient(username, app_pw, host='imap.gmail.com', port=993, inbox='"[Gmail]/All Mail"')

{% endraw %} {% raw %}

IMAPClient.list_mailboxes[source]

IMAPClient.list_mailboxes()

Lists all available mailboxes

{% endraw %} {% raw %}

IMAPClient.get_all_mail_ids[source]

IMAPClient.get_all_mail_ids()

Retrieves all mail ids from the selected mailbox

{% endraw %} {% raw %}

IMAPClient.get_mail[source]

IMAPClient.get_mail(id)

Fetches a mail given an id, returns (raw_mail, thread_id)

{% endraw %}
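
To make the wrapper less abstract, the following is a minimal sketch of how these three operations can be expressed with the standard library's imaplib. It is not the actual IMAPClient implementation: the helper names `connect`, `list_mailboxes`, `get_all_mail_ids` and `get_raw_mail` are assumptions made for illustration, and the sketch returns only the raw mail, not a thread id.

```python
import imaplib


def connect(username, app_pw, host="imap.gmail.com", port=993):
    """Open an SSL connection and log in (this performs network I/O)."""
    conn = imaplib.IMAP4_SSL(host, port)
    conn.login(username, app_pw)
    return conn


def list_mailboxes(conn):
    """Return the raw LIST response lines, one entry per mailbox."""
    status, data = conn.list()
    if status != "OK":
        raise RuntimeError(f"LIST failed with status {status}")
    return data


def get_all_mail_ids(conn, mailbox='"[Gmail]/All Mail"'):
    """Select a mailbox read-only and return all message sequence numbers."""
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"SELECT failed with status {status}")
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError(f"SEARCH failed with status {status}")
    # The server answers with one space-separated bytes string of ids.
    return data[0].split()


def get_raw_mail(conn, mail_id):
    """Fetch the full RFC 822 source of one message as bytes."""
    status, data = conn.fetch(mail_id, "(RFC822)")
    if status != "OK":
        raise RuntimeError(f"FETCH failed with status {status}")
    # data[0] is a (response header, raw message) tuple.
    return data[0][1]
```

Because every helper takes the connection as an argument, the functions can be exercised against a stub object in tests without contacting a real server.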

EmailImporter

{% raw %}
class Account(Item):
    def __init__(self, dateAccessed=None, dateCreated=None, dateModified=None, deleted=None,
                 externalId=None, itemDescription=None, starred=None, version=None, id=None, importJson=None,
                 handle=None, displayName=None, service=None, itemType=None, avatarUrl=None, changelog=None,
                 label=None, genericAttribute=None, measure=None, sharedWith=None, belongsTo=None, price=None,
                 location=None, organization=None, owner=None):
        super().__init__(dateAccessed=dateAccessed, dateCreated=dateCreated, dateModified=dateModified,
                         deleted=deleted, externalId=externalId, itemDescription=itemDescription, starred=starred,
                         version=version, id=id, importJson=importJson, changelog=changelog, label=label,
                         genericAttribute=genericAttribute, measure=measure, sharedWith=sharedWith)
        self.handle = handle
        self.displayName = displayName
        self.service = service
        self.itemType = itemType
        self.avatarUrl = avatarUrl
        self.belongsTo = belongsTo if belongsTo is not None else []
        self.price = price if price is not None else []
        self.location = location if location is not None else []
        self.organization = organization if organization is not None else []
        self.owner = owner if owner is not None else []
{% endraw %} {% raw %}

class EmailImporter[source]

EmailImporter(*args, **kwargs) :: ImporterBase

Imports emails over IMAP.

{% endraw %}

The email importer has the following parameters:

  • username Your email address
  • password Your email password. If you are using Gmail, use your application password
  • generic attributes
      • host The hostname of the IMAP server (defaults to imap.gmail.com)
      • port The port of the server (defaults to 993, as used by Gmail)
      • max_number The maximum number of emails to download. Leave unset for no limit
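
As an illustration of how these parameters and their defaults fit together, here is a small sketch. The `ImapSettings` class and its `limit` method are hypothetical and not part of the importer; the sketch only assumes the defaults listed above and that an unset `max_number` means no limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImapSettings:
    """Hypothetical container for the importer parameters (illustration only)."""

    username: str
    password: str
    host: str = "imap.gmail.com"
    port: int = 993
    max_number: Optional[int] = None  # None means "download everything"

    def limit(self, mail_ids: List[bytes]) -> List[bytes]:
        """Return at most max_number ids, or all of them when it is unset."""
        if self.max_number is None:
            return list(mail_ids)
        return list(mail_ids[: self.max_number])


# Example: only the credentials are required, everything else falls back to defaults.
settings = ImapSettings(username="me@example.com", password="app-password")
capped = ImapSettings(username="me@example.com", password="app-password", max_number=2)
```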

Methods

{% raw %}

EmailImporter.get_content[source]

EmailImporter.get_content(message)

Extracts content from a Python email message

{% endraw %} {% raw %}

EmailImporter.create_item_from_mail[source]

EmailImporter.create_item_from_mail(mail, thread_id=None)

Creates a schema-item from an existing mail

{% endraw %} {% raw %}

EmailImporter.run[source]

EmailImporter.run(importer_run=None, pod_client=None, verbose=True)

This is the main function of the email importer. It runs the importer using the information provided in the importer run. If you pass a pod client, it will add the new items to the graph.

{% endraw %}
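
The parsing steps behind `get_content` and `create_item_from_mail` can be sketched with the standard library's email package. This is an assumption-laden sketch, not the importer's code: the function name `parse_raw_mail` and the returned dict layout are hypothetical, and the real method builds schema items rather than a dict.

```python
import email
from email import policy
from email.utils import parseaddr


def parse_raw_mail(raw: bytes) -> dict:
    """Parse raw message bytes into a plain dict (hypothetical layout)."""
    # policy.default yields the modern EmailMessage API (get_body, get_content).
    msg = email.message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body and fall back to HTML if there is none.
    body = msg.get_body(preferencelist=("plain", "html"))
    content = body.get_content() if body is not None else ""

    return {
        # parseaddr splits "Name <addr>" into (name, addr); we keep the address.
        "sender": parseaddr(str(msg.get("From", "")))[1],
        "receiver": parseaddr(str(msg.get("To", "")))[1],
        "subject": str(msg.get("Subject", "")),
        "content": content,
    }
```

Keeping the parsing in one pure function makes it easy to test against literal message bytes, in the same way the "Parse emails" test further down does.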

Usage

Download all mails from your account

{% raw %}
pod_client = PodClient()
assert pod_client.add_to_schema(EmailMessage(externalId="x", subject="x", dateSent=2, content="x"))
assert pod_client.add_to_schema(MessageChannel(externalId="x"))
assert pod_client.add_to_schema(Person(externalId="x", firstName="x"))
assert pod_client.add_to_schema(Account(externalId="x"))
{% endraw %} {% raw %}
def get_gmail_creds():
    return read_file(HOME_DIR / '.memri' / 'credentials_gmail.txt').split("\n")[:2]
{% endraw %} {% raw %}
imap_user, imap_pw = get_gmail_creds()
importer           = EmailImporter.from_data()
importer_run       = ImporterRun(username=imap_user, password=imap_pw)
{% endraw %} {% raw %}
importer.run(importer_run=importer_run, pod_client=pod_client)
Using, HOST: imap.gmail.com, PORT: 993
Finished running EmailImporter (#None)
{% endraw %}

Test individual methods

{% raw %}
importer.set_imap_client(None, imap_user, imap_pw)
Using, HOST: imap.gmail.com, PORT: 993
{% endraw %} {% raw %}
mail_ids = importer.imap_client.get_all_mail_ids()
assert len(mail_ids) > 0
{% endraw %} {% raw %}
all_mails = importer.get_mails(mail_ids[:10],
                               importer_run=None,
                               pod_client=pod_client)

assert len(all_mails) > 0
assert all(isinstance(x, EmailMessage) for x in all_mails)
{% endraw %} {% raw %}
all_accounts = get_unique_accounts(all_mails)
assert len(all_accounts) > 0
{% endraw %}

Parse emails

{% raw %}
import email

test = b"""\
Message-id: 1234\r
From: user1 <a@gmail.com>\r
To: user1 <b@gmail.com>\r
Reply-to: user1 <c@gmail.com>\r
Subject: the subject\r
Date: Mon, 04 May 2020 00:37:44 -0700\r

This is content"""

email_importer = EmailImporter()
mail_item = email_importer.create_item_from_mail(test, 'message_channel_id')

assert mail_item.externalId == '1234'
assert mail_item.sender[0].externalId == 'a@gmail.com'
assert mail_item.receiver[0].externalId == 'b@gmail.com'
assert mail_item.replyTo[0].externalId == 'c@gmail.com'
assert mail_item.subject == 'the subject'
assert mail_item.content == 'This is content'
assert mail_item.dateSent == email_importer.get_timestamp_from_message(email.message_from_bytes(test))
assert mail_item.messageChannel[0].externalId == 'message_channel_id'
{% endraw %}
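
The test above compares `dateSent` with `get_timestamp_from_message`, whose implementation is not shown here. One possible way to turn a Date header into a timestamp is sketched below. The function name `timestamp_from_date_header` is hypothetical, and returning milliseconds since the Unix epoch is an assumption; the importer may use a different unit.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def timestamp_from_date_header(date_header: str) -> int:
    """Convert an RFC 5322 Date header to Unix time in milliseconds (assumed unit)."""
    dt = parsedate_to_datetime(date_header)
    if dt.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


# The Date header from the test message, expressed as an aware UTC datetime.
expected = datetime(2020, 5, 4, 7, 37, 44, tzinfo=timezone.utc)
actual = timestamp_from_date_header("Mon, 04 May 2020 00:37:44 -0700")
```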

Attachments

{% raw %}
import email.message

email_importer = EmailImporter()
message = email.message.EmailMessage()
message.set_content('aa')
message.add_attachment(b'bb', maintype='image', subtype='jpeg', filename='sample.jpg')
message.add_attachment(b'cc', maintype='image', subtype='jpeg', filename='sample2.jpg')
content = email_importer.get_content(message)
attachments = email_importer.get_attachments(message)

assert content == 'aa\n'
assert attachments[0].get_content() == b'bb'
assert attachments[1].get_content() == b'cc'
{% endraw %}
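
For comparison, the same split between body and attachments can be done directly with the standard library. This is a sketch under assumptions, not the importer's `get_attachments`: the helper name `split_body_and_attachments` is hypothetical, and `EmailMessage` here refers to `email.message.EmailMessage`, not the schema class of the same name.

```python
import email.message


def split_body_and_attachments(message: email.message.EmailMessage):
    """Return (text body, [(filename, payload), ...]) for a message."""
    body = message.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    # iter_attachments skips the body part and yields only the attachments.
    attachments = [
        (part.get_filename(), part.get_content())
        for part in message.iter_attachments()
    ]
    return text, attachments


# Build a message with one text body and two binary attachments.
demo = email.message.EmailMessage()
demo.set_content("aa")
demo.add_attachment(b"bb", maintype="image", subtype="jpeg", filename="sample.jpg")
demo.add_attachment(b"cc", maintype="image", subtype="jpeg", filename="sample2.jpg")
demo_text, demo_attachments = split_body_and_attachments(demo)
```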

Run Local

Not implemented