RSS Importer
Memri RSS Importer takes an RSS feed, processes it, and extracts the contents of the articles using Postlight. The articles are then sent to the Semantic Search and ZeroShot plugins for tagging and to the Summary Plugin for summarization. The results are stored in your Memri Pod and can be accessed at any time.
Install
To install as a Python package, run:
pip install -e .
Dependencies
RSS Importer relies on other Memri plugins and third-party services to provide full functionality. Currently, the following are required:
- Postlight parser API (for scraping the full text of articles)
- Memri Semantic Search Plugin (for tagging the imported articles)
- Memri ZeroShot Plugin (for tagging the imported articles)
Memri hosts a public instance of the Postlight parser API, which is used by default. If you want to run the parser API yourself, see the section on scraping. Memri also hosts a public instance of the Semantic Search and ZeroShot plugins, which are used by default. If you want to run these plugins yourself, see their respective repositories.
How to scrape
To run the plugin with full-text scraping enabled, set RSS_SETUP_ON_START = 1 in the plugin configuration. The plugin will then attempt to scrape every RSS entry it imports.
NOTE: Not every RSS entry can be scraped. Many websites have some protection against scraping, and many RSS content types (like media feeds) are not supported.
Run
Make sure you have a pod running. Then, you can run the plugin locally with:
python rss_importer/app.py
To enable scraping at start, set the RSS_SETUP_ON_START environment variable to 1.
You can set RSS_MAX_ENTRIES_ON_START to limit the number of entries that are scraped on start.
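The two environment variables can also be set from a small Python launcher. The following is a minimal sketch, not part of the plugin: the helper names scraping_env and run_plugin are hypothetical, and only the variable names and the rss_importer/app.py entry point come from this README.

```python
import os
import subprocess
import sys


def scraping_env(max_entries=None, base_env=None):
    """Return a copy of the environment with scraping on start enabled.

    Hypothetical helper for illustration; the plugin itself only reads
    the environment variables named in this README.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RSS_SETUP_ON_START"] = "1"
    if max_entries is not None:
        # Environment values must be strings.
        env["RSS_MAX_ENTRIES_ON_START"] = str(int(max_entries))
    return env


def run_plugin(env):
    """Start the plugin with the given environment.

    Assumes a pod is already running and that the current working
    directory is the repository root.
    """
    return subprocess.run(
        [sys.executable, "rss_importer/app.py"], env=env, check=True
    )
```

Calling run_plugin(scraping_env(max_entries=20)) would start the plugin with scraping enabled and the number of entries scraped on start limited to 20; the value 20 is only an example.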
Default feeds
By default, several feeds listed in plugin.py are imported to the Pod.
These feeds can be removed by interacting with the plugin endpoints, and more feeds can be added.
See the next section.
Using the plugin
Note: The best way to inspect the plugin API is through the Swagger documentation, hosted on http://0.0.0.0:8010/docs or https://rss.dev.backend.memri.io/docs. The port might differ depending on your setup; it is printed on the command line when the plugin starts.
- PUT /v1/feed?url=$feed_url adds feed_url to your active feeds, or returns an HTTP error on failure.
- GET /v1/feed returns a list of all active feeds in your pod, as a JSON string.
- GET /v1/entry/{feed_id} returns a list of all entries in your pod for the feed with id $feed_id, as a JSON string.
- GET /v1/categories returns a list of all available categories for tagging, as a JSON map.
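The endpoints above can be called from Python using only the standard library. This is a minimal sketch under stated assumptions: BASE_URL assumes a locally running plugin on port 8010, the helper names are hypothetical, and the exact shape of the responses is not documented here, so the sketch simply returns whatever the JSON body decodes to.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of a locally running plugin; the port may differ on your setup.
BASE_URL = "http://0.0.0.0:8010"


def add_feed_url(feed_url, base_url=BASE_URL):
    """Build the PUT /v1/feed URL, percent-encoding the feed address."""
    query = urllib.parse.urlencode({"url": feed_url})
    return f"{base_url}/v1/feed?{query}"


def entries_url(feed_id, base_url=BASE_URL):
    """Build the GET /v1/entry/{feed_id} URL."""
    return f"{base_url}/v1/entry/{urllib.parse.quote(str(feed_id), safe='')}"


def call(method, url, timeout=30):
    """Send a request and decode the JSON body, if there is one.

    Failures surface as urllib.error.HTTPError (HTTP error status)
    or urllib.error.URLError (connection problems).
    """
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None
```

For example, call("PUT", add_feed_url("https://example.com/rss.xml")) would add a feed, and call("GET", BASE_URL + "/v1/feed") would list the active feeds. The feed address is a placeholder. Encoding the url parameter matters because feed addresses often contain their own query strings.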
Building the Docker image
To build the Docker image:
make build
To run the Docker image:
make run
How to Use & Integration
You can interact with the plugin through the API endpoints described in the 'Using the plugin' section. They let you add or remove feeds, fetch entries, and list the categories available for tagging.
Remember that you can always refer to the Swagger documentation for a more detailed understanding of the plugin API.