Commit 4d747974 authored by Bijun Li

Merge branch 'dev' into whatsapp-importer

Parents: d077ff83, cc7d7c5d
Showing with 177 additions and 85 deletions
Pod is the open-source backend for the [Memri](https://memri.io/) project.
It's written in Rust and provides an HTTP interface for use by the clients.
See documentation on:
* Pod's [HTTP API](./docs/HTTP_API.md)
* Running [Integrators](./docs/Integrators.md)
* What is a [Shared Server](./docs/SharedServer.md)
* How are data types defined in [Schema](./docs/Schema.md)
* How to run Pod (this document)
## Run in docker
To run Pod inside docker:
If you want to fill the database with some example data, execute
`res/example_data.sql` inside the database.
# About
This documentation is part of [Pod](../README.md).
HTTP API is the interface that Pod provides to store and access user data.
There are various components that communicate with the Pod:
* Clients like the iOS app and the web app;
* Indexers that enrich data/photos/other content;
* Importers/Downloaders that import data from other systems, e.g. from Evernote.
All of that data goes through the Pod HTTP API.
This document explains the data types that Pod can store and the current API provided to store and retrieve them.
# Items
Insert a tree with edges (of arbitrary depth) in one batch.
Each item should either be a "reference", i.e. an object with only the `uid` and `_edges` fields,
or a full item, which will then be created.
A "reference" item looks like this
(the `uid` property is mandatory; no other properties are present):
```json5
{
"uid": 123456789 /* uid of the item to create edge with */,
"_edges": [ /* see below edges definition*/ ]
}
```
To insert a new item, specify all of its properties (`uid` is optional in this case):
```json5
{
"_type": "SomeItemType",
  /* ... other item properties ... */
}
```
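Combining the two forms, a complete `insert_tree` payload could look like the following sketch (the `Note` type, `title` property, and `wroteBy` edge name are hypothetical placeholders, not part of the real schema):

```json5
{
    "_type": "Note",          /* full item: will be created, `uid` omitted */
    "title": "Shopping list",
    "_edges": [
        {
            "_type": "wroteBy",
            "_target": {
                "uid": 123456789, /* reference to an already existing item */
                "_edges": []
            }
        }
    ]
}
```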
Run a downloader on an item with the given uid.
See [Integrators](./Integrators.md).
⚠️ UNSTABLE: Downloaders might be merged with importers soon.
Run an importer on an item with the given uid.
See [Integrators](./Integrators.md).
### POST /v2/$owner_key/run_indexer
Run an indexer on an item with the given uid.
See [Integrators](./Integrators.md).
# File API
# About
This documentation is part of [Pod](../README.md).
Integrators are various components that can enrich your data,
help you import your data from external services, push some data outside if you want, etc.
This page explains how Pod runs various integrators.
# Running integrators from Pod
### How to trigger
First, the Pod needs to receive a request to run an integrator.
This is done via [HTTP API](./HTTP_API.md).
In the future, it is planned to also support database triggers to execute various integrators.
### What is triggered
Upon receiving an integrator request, Pod will extract the `uid` from the request
and check that item with this uid exists in the database.
Pod will then determine the relevant **docker image**,
and run it with specific environment variables set (see below).
* For Importers, docker image `memri-importers:latest` will be run
* For Indexers, docker image `memri-indexers:latest` will be run
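The dispatch step above (pick an image, pass the environment, run the container) can be sketched roughly as follows; `docker_run_args` is a hypothetical helper for illustration, not Pod's actual code:

```rust
/// Hypothetical sketch of how Pod could assemble a `docker run`
/// invocation for an integrator, passing the environment variables
/// documented in this section.
fn docker_run_args(image: &str, uid: i64, pod_full_address: &str) -> Vec<String> {
    vec![
        "run".to_string(),
        "--rm".to_string(),
        format!("--env=POD_FULL_ADDRESS={}", pod_full_address),
        format!("--env=RUN_UID={}", uid),
        image.to_string(),
    ]
}

fn main() {
    let args = docker_run_args("memri-importers:latest", 123, "http://localhost:3030");
    // To actually launch the container, Pod could do something like:
    // std::process::Command::new("docker").args(&args).spawn()
    println!("docker {}", args.join(" "));
}
```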
### How are integrators started
Integrators are started via **docker**.
Pod will set the following environment variables for integrators running in docker:
* `POD_FULL_ADDRESS`, the address of Pod to call back,
e.g. `https://x.x.x.x:80` or `http://localhost:3030`.
You can call the endpoints via a URL like `$POD_FULL_ADDRESS/v2/version`.
* `POD_ADDRESS`, same as the above, but without the scheme and port.
* `RUN_UID`, the item `uid` that the integrator needs to run against.
This item is commonly the first thing that the integrator requests from the Pod in order
to understand the task and continue going forward.
* `POD_SERVICE_PAYLOAD`, a JSON value taken from `servicePayload` in Pod's HTTP request body
and passed through to the integrator. The JSON is not escaped in any way and can be parsed directly.
Additionally, Downloaders and Importers will have a volume `/usr/src/importers/data`
shared with them, so that files can be stored
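An integrator written in Rust might read these variables at startup roughly as below; this is a std-only sketch (the `IntegratorConfig` struct and `read_config` helper are assumptions for illustration, not part of Pod):

```rust
use std::env;

/// Configuration an integrator receives from Pod via environment variables.
struct IntegratorConfig {
    pod_full_address: String, // e.g. "http://localhost:3030"
    run_uid: i64,             // the item `uid` to run against
    service_payload: String,  // raw JSON passed through from the request
}

fn read_config() -> Result<IntegratorConfig, String> {
    let pod_full_address = env::var("POD_FULL_ADDRESS")
        .map_err(|_| "POD_FULL_ADDRESS is not set".to_string())?;
    let run_uid = env::var("RUN_UID")
        .map_err(|_| "RUN_UID is not set".to_string())?
        .parse::<i64>()
        .map_err(|e| format!("RUN_UID is not an integer: {}", e))?;
    // The payload is pass-through JSON; default to an empty object if absent.
    let service_payload = env::var("POD_SERVICE_PAYLOAD").unwrap_or_else(|_| "{}".to_string());
    Ok(IntegratorConfig { pod_full_address, run_uid, service_payload })
}

fn main() {
    match read_config() {
        Ok(cfg) => println!(
            "running against uid {} on {} (payload: {})",
            cfg.run_uid, cfg.pod_full_address, cfg.service_payload
        ),
        Err(e) => eprintln!("misconfigured: {}", e),
    }
}
```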
# About
This documentation is part of [Pod](../README.md).
In order to store items in the database, Pod needs to be aware of their types in advance.
This information is stored in a "schema".
### Understanding the schema
The schema is located in `/res/autogenerated_database_schema.json`.
It lists all types that can be stored on Pod, and their properties.
Valid types for properties are, at the moment:
* `Text` UTF-8 string.
* `Integer` Signed 8-byte integer.
* `Real` 8-byte IEEE floating-point number.
* `Bool` Boolean. Internally, booleans are stored as Integers 0 and 1. This is never exposed
to the clients, however, and clients should only ever receive/send `true` and `false`.
* `DateTime` The number of non-leap-milliseconds since 00:00 UTC on January 1, 1970.
Use this database type to denote DateTime.
Internally stored as Integer and should be passed as Integer.
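For illustration, a client could produce such a `DateTime` value in Rust using only the standard library (the helper name here is an assumption, not Pod API):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Produce a `DateTime` value as Pod expects it: non-leap-milliseconds
/// since 00:00 UTC on January 1, 1970, passed as a signed integer.
fn now_as_pod_datetime() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_millis() as i64
}

fn main() {
    let ms = now_as_pod_datetime();
    // 2020-01-01T00:00:00Z is 1_577_836_800_000 ms; any current clock is later.
    assert!(ms > 1_577_836_800_000);
    println!("dateCreated = {}", ms);
}
```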
All column definitions of the same case-insensitive name MUST have the same type and indexing.
All column names MUST consist of `a-zA-Z0-9_` characters only, and start with `a-zA-Z`.
All type names MUST consist of `a-zA-Z0-9_` characters only, and start with `a-zA-Z`
(same as column names).
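These naming rules can be encoded in a few lines; `is_valid_name` below is a hypothetical helper for illustration, not part of Pod's codebase:

```rust
/// Check the naming rules above: a name must be non-empty, consist only
/// of `a-zA-Z0-9_` characters, and start with `a-zA-Z`.
fn is_valid_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

fn main() {
    assert!(is_valid_name("dateCreated"));
    assert!(!is_valid_name("1abc"));     // must not start with a digit
    assert!(!is_valid_name("bad-name")); // `-` is not allowed
    println!("all naming checks passed");
}
```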
### Changing the schema locally
If you want to make local changes to the schema while developing
new functionality, you can edit the schema directly.
It's located in `/res/autogenerated_database_schema.json`.
Simply re-start the Pod to apply the changes.
### Contributing your schema
The schema is also used in iOS and other projects.
To make it available universally, please submit your schema to the "schema" repository:
[https://gitlab.memri.io/memri/schema](https://gitlab.memri.io/memri/schema).
Changes made to the "schema" repository will allow you to generate new definitions
for other projects, and for Pod.
You can copy the newly generated JSON to Pod during development.
You can contribute to the schemas by making a Merge Request in the "schema" repository.
Please refer to that repository's documentation for how best to work with it.
# About
This documentation is part of [Pod](../README.md).
A Shared Pod is a type of Pod that multiple people can write to. For example:
* communities, e.g. people interested in plants, food, etc
* family
* data that you can contribute to help building community Machine Learning tools ("datasets")
* teams in companies
* wikipedia-like articles
* etc
# Front-ends
In order for front-ends to send information to Shared Pods,
they need to support Shared Pod configuration.
Each Shared Pod has:
* `database_key`, which must be filled in by the user as it is a shared secret for all Shared Pod participants
* URL of the Shared Pod (similar to the one of Pod itself)
It is the front-end's decision which data to send to a particular Shared Pod.
It always needs to be done with user confirmation.
# Implementation
Shared Pods are implemented as a variation of Pod. Currently,
it is limited to `insert_tree` and `version`, which are basically write-only endpoints.
This makes sure that you can submit even your sensitive data if you trust the Shared Pod maintainer.
In the future we expect users to be able to share data with more fine-grained permissions,
e.g. by allowing reads but not edits, etc.
All information stored by a Shared Pod is stored in a single database,
and in order to write to a Shared Pod, you need to know its `database_key`.
The Shared Pod maintainer accesses the database directly from the filesystem.
(To do so, they also need to have the `database_key` of course.)
Run Pod with `--help` to see CLI help on setting up a Shared Pod.
#[structopt(long)]
pub insecure_http_headers: bool,
/// Run server as a "SharedServer". See `/docs/SharedServer.md` documentation
/// for details on what it is, and how it works.
#[structopt(long)]
pub shared_server: bool,
/// Validate a schema file, and exit.
/// This allows testing whether a given schema is suitable for use in Pod.
/// See README.md#schema on the general definition of a valid schema.
pub fn insert_tree(tx: &Transaction, item: InsertTreeItem, shared_server: bool) -> Result<i64> {
let source_uid: i64 = if item.fields.len() > 1 {
create_item_tx(tx, item.fields)?
} else if shared_server {
return Err(Error {
code: StatusCode::BAD_REQUEST,
msg: format!(
"Cannot create edges to already existing items in a shared server {:?}",
item
),
});
} else if let Some(uid) = item.fields.get("uid").map(|v| v.as_i64()).flatten() {
if !item._edges.is_empty() {
update_item_tx(tx, uid, HashMap::new())?;
// ...
});
};
for edge in item._edges {
let target_item = insert_tree(tx, edge._target, shared_server)?;
create_edge(tx, &edge._type, source_uid, target_item, edge.fields)?;
}
Ok(source_uid)
let init_db = initialized_databases_arc.clone();
let cli_options_arc = Arc::new(cli_options.clone());
let insert_tree = items_api
.and(warp::path!(String / "insert_tree"))
.and(warp::path::end())
.and(warp::body::json())
.map(move |owner: String, body: PayloadWrapper<InsertTreeItem>| {
let shared_server = cli_options_arc.shared_server;
let result = warp_endpoints::insert_tree(owner, init_db.deref(), body, shared_server);
let result = result.map(|result| warp::reply::json(&result));
respond_with_result(result)
});
let shared_pod_filters = version
.with(&headers)
.or(create_item.with(&headers))
.or(insert_tree.with(&headers));
let owned_pod_filters = get_item
.with(&headers)
.or(get_all_items.with(&headers))
.or(bulk_action.with(&headers))
.or(update_item.with(&headers))
.or(delete_item.with(&headers))
.or(search_by_fields.with(&headers))
.or(get_items_with_edges.with(&headers))
.or(run_downloader.with(&headers))
IpAddr::from([127, 0, 0, 1])
};
let socket = SocketAddr::new(ip, cli_options.port);
if cli_options.shared_server {
warp::serve(shared_pod_filters).run(socket).await
} else {
warp::serve(shared_pod_filters.or(owned_pod_filters))
.run(socket)
.await
}
} else {
let cert_path = &cli_options.tls_pub_crt;
let key_path = &cli_options.tls_priv_key;
std::process::exit(1)
};
let socket = SocketAddr::new(IpAddr::from([0, 0, 0, 0]), cli_options.port);
if cli_options.shared_server {
warp::serve(shared_pod_filters)
.tls()
.cert_path(cert_path)
.key_path(key_path)
.run(socket)
.await;
} else {
warp::serve(shared_pod_filters.or(owned_pod_filters))
.tls()
.cert_path(cert_path)
.key_path(key_path)
.run(socket)
.await;
}
}
}
pub fn insert_tree(
owner: String,
init_db: &RwLock<HashSet<String>>,
body: PayloadWrapper<InsertTreeItem>,
shared_server: bool,
) -> Result<i64> {
let mut conn: Connection = check_owner_and_initialize_db(&owner, &init_db, &body.database_key)?;
in_transaction(&mut conn, |tx| {
    internal_api::insert_tree(&tx, body.payload, shared_server)
})
}
},
]
});
let result = internal_api::insert_tree(&tx, serde_json::from_value(json).unwrap(), true);
result.expect("request failed");
tx.commit().unwrap();
}