
Twitter importer plugin

About

This plugin is built on top of the Twitter API. The Twitter API supports OAuth 1.0a and OAuth 2.0 Bearer Token authentication. This plugin uses OAuth 1.0a, which is documented here. The plugin accesses the Twitter API through tweepy, an easy-to-use Python library for the Twitter API.

Data imported

In this plugin, the data fetched for a given user handle includes:

  • User
  • Contacts: Following and Followers
  • User timeline, i.e. tweets of the user
  • Followings' timelines, i.e. tweets of followings
  • Tweets, which include:
      • Text
      • Media: photos, videos (URL)
      • Links

Data structure of a user, a contact (follower or friend) and a tweet

    user = {
    created_at: string,
    id: integer,
    id_str: string,
    name: string,
    screen_name: string,
    location: string,
    profile_location: string,
    description: string,
    url: string,
    entities: dictionary,
    protected: boolean,
    followers_count: integer,
    friends_count: integer,
    listed_count: integer,
    favourites_count: integer,
    utc_offset: string,
    time_zone: string,
    geo_enabled: boolean,
    verified: boolean,
    statuses_count: integer,
    lang: string,
    status: dictionary,
    contributors_enabled: boolean,
    is_translator: boolean,
    is_translation_enabled: boolean,
    profile_background_color: string,
    profile_background_image_url: string,
    profile_background_image_url_https: string,
    profile_background_tile: boolean,
    profile_image_url: string,
    profile_image_url_https: string,
    profile_link_color: string,
    profile_sidebar_border_color: string,
    profile_sidebar_fill_color: string,
    profile_text_color: string,
    profile_use_background_image: boolean,
    has_extended_profile: boolean,
    default_profile: boolean,
    default_profile_image: boolean,
    following: boolean,
    follow_request_sent: boolean,
    notifications: boolean,
    translator_type: string,
    withheld_in_countries: list
   }
   
    followings = {
       "id": integer,
       "id_str": string,
       "name": string,
       "screen_name": string,
       "location": string,
       "description": string,
       "url": string,
       "entities": dictionary,
       "protected": boolean,
       "followers_count": integer,
       "friends_count": integer,
       "listed_count": integer,
       "created_at": string,
       "favourites_count": integer,
       "utc_offset": string,
       "time_zone": string,
       "geo_enabled": boolean,
       "verified": boolean,
       "statuses_count": integer,
       "lang": string,
       "status": dictionary,
       "contributors_enabled": boolean,
       "is_translator": boolean,
       "is_translation_enabled": boolean,
       "profile_background_color": string,
       "profile_background_image_url": string,
       "profile_background_image_url_https": string,
       "profile_background_tile": boolean,
       "profile_image_url": string,
       "profile_image_url_https": string,
       "profile_banner_url": string,
       "profile_link_color": string,
       "profile_sidebar_border_color": string,
       "profile_sidebar_fill_color": string,
       "profile_text_color": string,
       "profile_use_background_image": boolean,
       "has_extended_profile": boolean,
       "default_profile": boolean,
       "default_profile_image": boolean,
       "following": boolean,
       "live_following": boolean,
       "follow_request_sent": boolean,
       "notifications": boolean,
       "muting": boolean,
       "blocking": boolean,
       "blocked_by": boolean,
       "translator_type": string,
       "withheld_in_countries": list
       }
       
    tweet = {
        created_at: string,
        id: integer,
        id_str: string,
        full_text: string,
        truncated: boolean,
        display_text_range: list,
        entities: dictionary,
        source: string,
        in_reply_to_status_id: integer,
        in_reply_to_status_id_str: string,
        in_reply_to_user_id: integer,
        in_reply_to_user_id_str: string,
        in_reply_to_screen_name: string,
        user: dictionary,
        geo: dictionary,
        coordinates: dictionary,
        place: dictionary,
        contributors: string,
        retweeted_status: dictionary,
        is_quote_status: boolean,
        retweet_count: integer,
        favorite_count: integer,
        favorited: boolean,
        retweeted: boolean
       }
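
As a rough illustration of how the text, media and link parts of a tweet dictionary might be read, the sketch below pulls them out of the `full_text` and `entities` fields listed above. The helper name `summarize_tweet` and the sample tweet are hypothetical. The nested keys (`urls`, `expanded_url`, `media`, `type`, `media_url_https`) follow the Twitter API v1.1 entities object and are assumed to be present in the data the plugin receives.

```python
from typing import Any, Dict, List


def summarize_tweet(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the text, link URLs and media URLs of one tweet dictionary."""
    entities = tweet.get("entities") or {}
    # Links shared in the tweet (assumed v1.1 layout: entities.urls[].expanded_url)
    links: List[str] = [
        u["expanded_url"] for u in entities.get("urls", []) if u.get("expanded_url")
    ]
    # Media attached to the tweet (assumed v1.1 layout: entities.media[])
    media = [
        {"type": m.get("type"), "url": m.get("media_url_https")}
        for m in entities.get("media", [])
    ]
    return {"text": tweet.get("full_text", ""), "links": links, "media": media}


# Hypothetical sample data, not real API output.
sample_tweet = {
    "id": 3,
    "full_text": "Hello pod",
    "entities": {
        "urls": [{"expanded_url": "https://example.com/post"}],
        "media": [{"type": "photo", "media_url_https": "https://example.com/a.jpg"}],
    },
}
summary = summarize_tweet(sample_tweet)
```

A tweet without links or media simply yields empty lists, so callers do not need to special-case it.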

How data is imported

Using tweepy, all data is fetched page by page, from the most recently created items to the oldest available items.
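
The newest-to-oldest paging described above can be sketched without any network access. This is only an illustration of the idea, not the plugin's code: `paginate_newest_first` and `fake_fetch_page` are hypothetical names, and the in-memory list stands in for the Twitter API. In the plugin itself this role is presumably played by tweepy's own pagination helpers.

```python
from typing import Callable, Dict, Iterator, List, Optional

Item = Dict[str, int]


def paginate_newest_first(
    fetch_page: Callable[[Optional[int], int], List[Item]],
    page_size: int = 200,
) -> Iterator[Item]:
    """Yield items from newest to oldest, one page at a time."""
    max_id: Optional[int] = None  # None means "start from the newest item"
    while True:
        page = fetch_page(max_id, page_size)
        if not page:
            return  # nothing older is available
        yield from page
        # Ask next for items strictly older than the oldest one seen so far.
        max_id = min(item["id"] for item in page) - 1


# Hypothetical stand-in for the remote API: ids 10 down to 1, newest first.
_FAKE_TIMELINE = [{"id": i} for i in range(10, 0, -1)]


def fake_fetch_page(max_id: Optional[int], count: int) -> List[Item]:
    candidates = [t for t in _FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return candidates[:count]


ids = [t["id"] for t in paginate_newest_first(fake_fetch_page, page_size=4)]
```

Here `ids` ends up holding every id exactly once, in descending order, even though each call returned at most four items.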

User data

User data is fetched using the provided user_name, which on Twitter is the handle or screen_name. It is fetched once for a given username.

Followers and followings

These are fetched every time the username is given. However, if a follower or friend already exists among the accounts in the pod, only its edges are updated, and only if the follow relationship to the related account was not already recorded.
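
The rule above (create unknown contacts, but only add the missing edge for known ones) might look roughly like the following. This is a sketch under assumptions: a plain dictionary and a set of `(follower_id, followed_id)` pairs stand in for the pod, and `upsert_contact` is a hypothetical helper, not a pymemri function.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (follower_id, followed_id)


def upsert_contact(
    accounts: Dict[int, dict],
    edges: Set[Edge],
    follower: dict,
    followed_id: int,
) -> str:
    """Record that `follower` follows `followed_id`; report what changed."""
    edge = (follower["id"], followed_id)
    if follower["id"] not in accounts:
        # Unknown account: store it together with its follow edge.
        accounts[follower["id"]] = follower
        edges.add(edge)
        return "created"
    if edge not in edges:
        # Known account that was not yet recorded as following this account.
        edges.add(edge)
        return "edge_added"
    return "unchanged"


# Hypothetical in-memory "pod" with one existing account and no edges.
accounts = {1: {"id": 1, "screen_name": "alice"}}
edges: Set[Edge] = set()

first = upsert_contact(accounts, edges, {"id": 1, "screen_name": "alice"}, 99)
second = upsert_contact(accounts, edges, {"id": 1, "screen_name": "alice"}, 99)
third = upsert_contact(accounts, edges, {"id": 2, "screen_name": "bob"}, 99)
```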

Tweets

Tweets are fetched starting from the last tweet id saved for the given user. For example, if a following's latest tweet in the pod has id 3, only tweets with an id greater than 3 are fetched. The same applies to the user's own tweets.
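
A minimal sketch of this incremental behaviour, assuming tweets are plain dictionaries with an `id` key. The helper names `latest_saved_id` and `select_new_tweets` are hypothetical and only mirror the rule described above.

```python
from typing import Dict, List, Optional

Tweet = Dict[str, int]


def latest_saved_id(saved: List[Tweet]) -> Optional[int]:
    """Return the highest tweet id already stored, or None if nothing is stored."""
    return max((t["id"] for t in saved), default=None)


def select_new_tweets(fetched: List[Tweet], since_id: Optional[int]) -> List[Tweet]:
    """Keep only tweets whose id is greater than the last saved id."""
    if since_id is None:
        return list(fetched)  # first import: everything is new
    return [t for t in fetched if t["id"] > since_id]


saved_tweets = [{"id": 1}, {"id": 2}, {"id": 3}]
fetched_tweets = [{"id": 5}, {"id": 4}, {"id": 3}, {"id": 2}]

since_id = latest_saved_id(saved_tweets)
new_tweets = select_new_tweets(fetched_tweets, since_id)
```

With the latest saved id being 3, only the tweets with ids 5 and 4 are kept.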

Duration for running the plugin

The time needed to import the data depends on the amount of data to fetch and on the internet speed. Typically it takes between 30 seconds (for fewer than 10 records) and 20 minutes (for 200 or more records), provided no rate limit interrupts the run. A record is a single item to fetch. For example: 200 followers, 200 followings, 200 tweets of each following, 200 tweets of the user.

Unsupported functionality

  • The plugin does not fetch user contact information such as phone numbers and emails.
  • The plugin does not fetch user direct messages.
  • The plugin does not support tweeting, commenting or retweeting on behalf of the specified Twitter handle.

Future improvements

With the support of pymemri, the plugin will save videos associated with imported tweets.

Limitations

Depending on the subscription category of your Twitter developer account, there is a limit on how many tweets you can fetch per month. Read more about it here. Log into your developer account to check how much you can fetch.

For each call made to the Twitter API, at most 200 items can be fetched. In this plugin, you can fetch fewer than 200 items by specifying the number of items in the run function. This applies to followers, followings and timeline items. Tweets of followings are currently set to 5 items per following, but this may change.
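
One way to picture these limits is a small helper that caps a requested item count at the per-call maximum. This is only a sketch: `effective_count` and the two constants are hypothetical names chosen for illustration, and the actual parameters of the plugin's run function are not shown here.

```python
from typing import Optional

MAX_ITEMS_PER_CALL = 200       # per-call maximum stated above
DEFAULT_FOLLOWING_TWEETS = 5   # current setting per following; may change


def effective_count(requested: Optional[int] = None,
                    maximum: int = MAX_ITEMS_PER_CALL) -> int:
    """Return how many items to ask for, never more than `maximum`."""
    if requested is None:
        return maximum
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return min(requested, maximum)


counts = {
    "followers": effective_count(50),
    "followings": effective_count(500),   # capped at the per-call maximum
    "timeline": effective_count(),        # nothing specified: use the maximum
    "following_tweets": effective_count(DEFAULT_FOLLOWING_TWEETS),
}
```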

Twitter also applies per-minute rate limits, which depend on the subscription level, for a given user handle and developer account. This means that for each item type, e.g. tweets, there is a limit on how much you can fetch per minute for the specified username with a given developer account. Read more about it here. Currently, when the importer detects that a rate limit error is about to occur, it waits until the limit period elapses so that the account is not blacklisted. A rate limit error lasts an hour, and during this time no data can be imported. Note that how much data is limited depends on the account subscription level, so a subscription with fewer limits is advantageous.
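
The waiting behaviour can be sketched as follows. This is not the importer's actual code: `wait_if_exhausted` is a hypothetical helper, and it assumes the caller knows how many calls remain and when the limit resets (as a Unix timestamp). The clock and sleep functions are injectable so the example runs instantly. tweepy's `API` class also accepts a `wait_on_rate_limit` option. Whether the plugin relies on it is not stated here.

```python
import time
from typing import Callable


def seconds_until_reset(reset_epoch: float, now: float) -> float:
    """Seconds left until the rate limit resets (never negative)."""
    return max(0.0, reset_epoch - now)


def wait_if_exhausted(
    remaining: int,
    reset_epoch: float,
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pause until the reset time when no calls remain; return the delay used."""
    if remaining > 0:
        return 0.0  # still within the limit, no need to wait
    delay = seconds_until_reset(reset_epoch, now())
    sleep(delay)
    return delay


# Demonstration with a fake clock and a fake sleep that only records the delay.
slept = []
delay = wait_if_exhausted(0, reset_epoch=1060.0, now=lambda: 1000.0, sleep=slept.append)
no_delay = wait_if_exhausted(3, reset_epoch=1060.0, now=lambda: 1000.0, sleep=slept.append)
```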

How to run the project

  • Create a Twitter developer account. You can create one here.

  • Clone this repository

  • Once you have a developer account, set your application callback URL in your developer account settings

  • Get your API credentials listed below. Go to /config and create a new settings.py file. Add your Twitter developer account credentials as illustrated below.

    path /config/settings.py

    tokens = { "ACCESS_TOKEN": "YOUR_ACCESS_TOKEN", "ACCESS_SECRET": "YOUR_ACCESS_SECRET", "CONSUMER_KEY": "YOUR_CONSUMER_KEY", "CONSUMER_SECRET": "YOUR_CONSUMER_SECRET" }

Note: These may be named differently depending on how recent your account is. Use the following as a guide to the naming:

Client credentials:
App Key === API Key === Consumer API Key === Consumer Key === Customer Key === oauth_consumer_key
App Key Secret === API Secret Key === Consumer Secret === Consumer Key === Customer Key === oauth_consumer_secret
Callback URL === oauth_callback
Token credentials:
Access token === Token === resulting oauth_token
Access token secret === Token Secret === resulting oauth_token_secret
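
Before running the importer it can help to check that the `tokens` dictionary in settings.py is complete. The sketch below is a hypothetical helper (`missing_credentials` is not part of the plugin) that reports keys which are absent, empty, or still set to a `YOUR_...` placeholder. With tweepy, the consumer key and secret would typically be passed to `tweepy.OAuthHandler` and the access token and secret to its `set_access_token` method. How the plugin wires this up is not shown here.

```python
from typing import Dict, List

# The four keys expected in config/settings.py, as listed above.
REQUIRED_KEYS = ("ACCESS_TOKEN", "ACCESS_SECRET", "CONSUMER_KEY", "CONSUMER_SECRET")


def missing_credentials(tokens: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent, empty or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not tokens.get(key) or tokens[key].startswith("YOUR_")
    ]


# Hypothetical, deliberately incomplete example values.
example_tokens = {
    "ACCESS_TOKEN": "abc",
    "ACCESS_SECRET": "YOUR_ACCESS_SECRET",  # placeholder left unchanged
    "CONSUMER_KEY": "def",
    # CONSUMER_SECRET is missing entirely
}
problems = missing_credentials(example_tokens)
```

An empty result means all four credentials are filled in. It does not prove that Twitter will accept them.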

Run in Docker

The importer can be invoked by the Pod by launching a Docker container. To build the image for this container, run:

docker build -t twitter_importer .

Locally

To run the plugin, after starting the Pod (on the dev branch) and building this Docker image, run the following commands:

# (optional) set up a virtual environment and activate it
pip install .
python scripts/client_simulator.py {POD_URL} twitter_importer 

To run tests

pytest ./tests/*

Tools

Tweepy

The plugin accesses the Twitter API using tweepy, an easy-to-use Python library for accessing the Twitter API.

Pymemri

Pymemri gives the plugin access to the schema and the pod, which are used to save data to the pod. More details can be found here.

To contribute

Simply reach out to Memri with any requests or recommendations.