Commit 4d747974 authored by Bijun Li

Merge branch 'dev' into whatsapp-importer

Parents: d077ff83, cc7d7c5d
Showing with 177 additions and 85 deletions
Pod is the open-source backend for the [Memri](https://memri.io/) project.
It's written in Rust and provides an HTTP interface for use by the clients.
See documentation on:
* Pod's [HTTP API](./docs/HTTP_API.md)
* Running [Integrators](./docs/Integrators.md)
* What is a [Shared Server](./docs/SharedServer.md)
* How are data types defined in [Schema](./docs/Schema.md)
* How to run Pod (this document)
## Run in docker
To run Pod inside docker:
If you want to fill the database with some example data, execute
`res/example_data.sql` inside the database.
# About
This documentation is part of [Pod](../README.md).
HTTP API is the interface that Pod provides to store and access user data.
There are various components that communicate with the Pod:
* Clients like the iOS app and the web app;
* Indexers that enrich data/photos/other content;
* Importers/Downloaders that import data from other systems, e.g. from Evernote.
All of that data goes through the Pod HTTP API.
This document explains the data types that Pod can store and the current API provided to store and retrieve them.
# Items
Insert a tree with edges (of arbitrary depth) in one batch.
Each item should either be a "reference", i.e. an object with only the `uid` and `_edges` fields,
or a full item, which will then be created.
A "reference" item looks like this
(the `uid` property is mandatory; no other properties are present):
```json5
{
"uid": 123456789 /* uid of the item to create edge with */,
"_edges": [ /* see below edges definition*/ ]
}
```
To insert a new item, specify all of its properties (`uid` is optional in this case):
```json5
{
"_type": "SomeItemType",
  /* ... other item properties ... */
}
```
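Combining the two forms, a complete `insert_tree` payload could look like the following sketch (the `Note` type, `title` property, and `wroteBy` edge name are hypothetical placeholders, not part of the real schema):

```json5
{
    "_type": "Note",          /* full item: will be created, `uid` omitted */
    "title": "Shopping list",
    "_edges": [
        {
            "_type": "wroteBy",
            "_target": {
                "uid": 123456789, /* reference to an already existing item */
                "_edges": []
            }
        }
    ]
}
```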
Run a downloader on an item with the given uid.
See [Integrators](./Integrators.md).
⚠️ UNSTABLE: Downloaders might be merged with importers soon.
Run an importer on an item with the given uid.
See [Integrators](./Integrators.md).
### POST /v2/$owner_key/run_indexer
Run an indexer on an item with the given uid.
See [Integrators](./Integrators.md).
# File API
# About
This documentation is part of [Pod](../README.md).
Integrators are various components that can enrich your data,
help you import your data from external services, push some data outside if you want, etc.
This page explains how Pod runs various integrators.
# Running integrators from Pod
### How to trigger
First, the Pod needs to receive a request to run an integrator.
This is done via [HTTP API](./HTTP_API.md).
In the future, it is planned to also support database triggers to execute various integrators.
### What is triggered
Upon receiving an integrator request, Pod will extract the `uid` from the request
and check that item with this uid exists in the database.
Pod will then determine the relevant **docker image**,
and run it with specific environment variables set (see below).
* For Importers, docker image `memri-importers:latest` will be run
* For Indexers, docker image `memri-indexers:latest` will be run
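The dispatch step above (pick an image, pass the environment, run the container) can be sketched roughly as follows; `docker_run_args` is a hypothetical helper for illustration, not Pod's actual code:

```rust
/// Hypothetical sketch of how Pod could assemble a `docker run`
/// invocation for an integrator, passing the environment variables
/// documented in this section.
fn docker_run_args(image: &str, uid: i64, pod_full_address: &str) -> Vec<String> {
    vec![
        "run".to_string(),
        "--rm".to_string(),
        format!("--env=POD_FULL_ADDRESS={}", pod_full_address),
        format!("--env=RUN_UID={}", uid),
        image.to_string(),
    ]
}

fn main() {
    let args = docker_run_args("memri-importers:latest", 123, "http://localhost:3030");
    // To actually launch the container, Pod could do something like:
    // std::process::Command::new("docker").args(&args).spawn()
    println!("docker {}", args.join(" "));
}
```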
### How are integrators started
Integrators are started via **docker**.
Pod will set the following environment variables for integrators running in docker:
* `POD_FULL_ADDRESS`, the address of Pod to call back,
e.g. `https://x.x.x.x:80` or `http://localhost:3030`.
You can call the endpoints via a URL like `$POD_FULL_ADDRESS/v2/version`.
* `POD_ADDRESS`, same as the above, but without the scheme and port.
* `RUN_UID`, the item `uid` that the integrator needs to run against.
This item is commonly the first thing that the integrator requests from the Pod in order
to understand the task and continue going forward.
* `POD_SERVICE_PAYLOAD`, a JSON value taken from `servicePayload` in Pod's HTTP request body
and passed through to the integrator. The JSON is not escaped in any way and can be parsed directly.
Additionally, Downloaders and Importers will have a volume `/usr/src/importers/data`
shared with them, so that files can be stored
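An integrator written in Rust might read these variables at startup roughly as below; this is a std-only sketch (the `IntegratorConfig` struct and `read_config` helper are assumptions for illustration, not part of Pod):

```rust
use std::env;

/// Configuration an integrator receives from Pod via environment variables.
struct IntegratorConfig {
    pod_full_address: String, // e.g. "http://localhost:3030"
    run_uid: i64,             // the item `uid` to run against
    service_payload: String,  // raw JSON passed through from the request
}

fn read_config() -> Result<IntegratorConfig, String> {
    let pod_full_address = env::var("POD_FULL_ADDRESS")
        .map_err(|_| "POD_FULL_ADDRESS is not set".to_string())?;
    let run_uid = env::var("RUN_UID")
        .map_err(|_| "RUN_UID is not set".to_string())?
        .parse::<i64>()
        .map_err(|e| format!("RUN_UID is not an integer: {}", e))?;
    // The payload is pass-through JSON; default to an empty object if absent.
    let service_payload = env::var("POD_SERVICE_PAYLOAD").unwrap_or_else(|_| "{}".to_string());
    Ok(IntegratorConfig { pod_full_address, run_uid, service_payload })
}

fn main() {
    match read_config() {
        Ok(cfg) => println!(
            "running against uid {} on {} (payload: {})",
            cfg.run_uid, cfg.pod_full_address, cfg.service_payload
        ),
        Err(e) => eprintln!("misconfigured: {}", e),
    }
}
```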
# About
This documentation is part of [Pod](../README.md).
In order to store items in the database, Pod needs to be aware of their types in advance.
This information is stored in a "schema".
### Understanding the schema
The schema is located in `/res/autogenerated_database_schema.json`.
It lists all types that can be stored on Pod, and their properties.
Valid types for properties are, at the moment:
* `Text` UTF-8 string.
* `Integer` Signed 8-byte integer.
* `Real` 8-byte IEEE floating-point number.
* `Bool` Boolean. Internally, booleans are stored as Integers 0 and 1. This is never exposed
to the clients, however, and clients should only ever receive/send `true` and `false`.
* `DateTime` The number of non-leap-milliseconds since 00:00 UTC on January 1, 1970.
Use this database type to denote DateTime.
Internally stored as Integer and should be passed as Integer.
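For illustration, a client could produce such a `DateTime` value in Rust using only the standard library (the helper name here is an assumption, not Pod API):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Produce a `DateTime` value as Pod expects it: non-leap-milliseconds
/// since 00:00 UTC on January 1, 1970, passed as a signed integer.
fn now_as_pod_datetime() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_millis() as i64
}

fn main() {
    let ms = now_as_pod_datetime();
    // 2020-01-01T00:00:00Z is 1_577_836_800_000 ms; any current clock is later.
    assert!(ms > 1_577_836_800_000);
    println!("dateCreated = {}", ms);
}
```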
All column definitions of the same case-insensitive name MUST have the same type and indexing.
All column names MUST consist of `a-zA-Z0-9_` characters only, and start with `a-zA-Z`.
All type names MUST consist of `a-zA-Z0-9_` characters only, and start with `a-zA-Z`
(same as column names).
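These naming rules can be encoded in a few lines; `is_valid_name` below is a hypothetical helper for illustration, not part of Pod's codebase:

```rust
/// Check the naming rules above: a name must be non-empty, consist only
/// of `a-zA-Z0-9_` characters, and start with `a-zA-Z`.
fn is_valid_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

fn main() {
    assert!(is_valid_name("dateCreated"));
    assert!(!is_valid_name("1abc"));     // must not start with a digit
    assert!(!is_valid_name("bad-name")); // `-` is not allowed
    println!("all naming checks passed");
}
```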
### Changing the schema locally
If you want to make local changes to the schema while developing
new functionality, you can edit the schema directly.
It's located in `/res/autogenerated_database_schema.json`.
Simply re-start the Pod to apply the changes.
### Contributing your schema
The schema is also used in iOS and other projects.
To make it available universally, please submit your schema to the "schema" repository:
[https://gitlab.memri.io/memri/schema](https://gitlab.memri.io/memri/schema).
Changes made to the "schema" repository will allow you to generate new definitions
for other projects, and for Pod.
You can copy the newly generated JSON to Pod during development.
You can contribute to the schemas by making a Merge Request in the "schema" repository.
Please refer to that repository's documentation for how best to work with it.
# About
This documentation is part of [Pod](../README.md).
A Shared Pod is a type of Pod that multiple people can write to. For example:
* communities, e.g. people interested in plants, food, etc
* family
* data that you can contribute to help building community Machine Learning tools ("datasets")
* teams in companies
* wikipedia-like articles
* etc
# Front-ends
In order for front-ends to send information to Shared Pods,
they need to support Shared Pod configuration.
Each Shared Pod has:
* `database_key`, which must be filled in by the user as it is a shared secret for all Shared Pod participants
* URL of the Shared Pod (similar to the one of Pod itself)
It is the front-end's decision which data to send to a particular Shared Pod.
It always needs to be done with user confirmation.
# Implementation
Shared Pods are implemented as a variation of Pod. Currently,
it is limited to `insert_tree` and `version`, which are basically write-only endpoints.
This makes sure that you can submit even your sensitive data if you trust the Shared Pod maintainer.
In the future we expect users to be able to share data with more fine-grained permissions,
e.g. by allowing reads but not edits, etc.
All information stored by a Shared Pod is stored in a single database,
and in order to write to a Shared Pod, you need to know its `database_key`.
The Shared Pod maintainer accesses the database directly from the filesystem.
(To do so, they also need to have the `database_key` of course.)
Run Pod with `--help` to see CLI help on setting up a Shared Pod.
#[structopt(long)]
pub insecure_http_headers: bool,
/// Run server as a "SharedServer". See `/docs/SharedServer.md` documentation
/// for details on what it is, and how it works.
#[structopt(long)]
pub shared_server: bool,
/// Validate a schema file, and exit.
/// This allows testing whether a given schema is suitable for use in Pod.
/// See README.md#schema on the general definition of a valid schema.
pub fn insert_tree(tx: &Transaction, item: InsertTreeItem, shared_server: bool) -> Result<i64> {
let source_uid: i64 = if item.fields.len() > 1 {
create_item_tx(tx, item.fields)?
} else if shared_server {
return Err(Error {
code: StatusCode::BAD_REQUEST,
msg: format!(
"Cannot create edges to already existing items in a shared server {:?}",
item
),
});
} else if let Some(uid) = item.fields.get("uid").map(|v| v.as_i64()).flatten() {
if !item._edges.is_empty() {
update_item_tx(tx, uid, HashMap::new())?;
// ...
});
};
for edge in item._edges {
let target_item = insert_tree(tx, edge._target, shared_server)?;
create_edge(tx, &edge._type, source_uid, target_item, edge.fields)?;
}
Ok(source_uid)
let init_db = initialized_databases_arc.clone();
let cli_options_arc = Arc::new(cli_options.clone());
let insert_tree = items_api
.and(warp::path!(String / "insert_tree"))
.and(warp::path::end())
.and(warp::body::json())
.map(move |owner: String, body: PayloadWrapper<InsertTreeItem>| {
let shared_server = cli_options_arc.shared_server;
let result = warp_endpoints::insert_tree(owner, init_db.deref(), body, shared_server);
let result = result.map(|result| warp::reply::json(&result));
respond_with_result(result)
});
let shared_pod_filters = version
.with(&headers)
.or(create_item.with(&headers))
.or(insert_tree.with(&headers));
let owned_pod_filters = get_item
.with(&headers)
.or(get_all_items.with(&headers))
.or(bulk_action.with(&headers))
.or(update_item.with(&headers))
.or(delete_item.with(&headers))
.or(search_by_fields.with(&headers))
.or(get_items_with_edges.with(&headers))
.or(run_downloader.with(&headers))
IpAddr::from([127, 0, 0, 1])
};
let socket = SocketAddr::new(ip, cli_options.port);
if cli_options.shared_server {
warp::serve(shared_pod_filters).run(socket).await
} else {
warp::serve(shared_pod_filters.or(owned_pod_filters))
.run(socket)
.await
}
} else {
let cert_path = &cli_options.tls_pub_crt;
let key_path = &cli_options.tls_priv_key;
std::process::exit(1)
};
let socket = SocketAddr::new(IpAddr::from([0, 0, 0, 0]), cli_options.port);
if cli_options.shared_server {
warp::serve(shared_pod_filters)
.tls()
.cert_path(cert_path)
.key_path(key_path)
.run(socket)
.await;
} else {
warp::serve(shared_pod_filters.or(owned_pod_filters))
.tls()
.cert_path(cert_path)
.key_path(key_path)
.run(socket)
.await;
}
}
}
pub fn insert_tree(
owner: String,
init_db: &RwLock<HashSet<String>>,
body: PayloadWrapper<InsertTreeItem>,
shared_server: bool,
) -> Result<i64> {
let mut conn: Connection = check_owner_and_initialize_db(&owner, &init_db, &body.database_key)?;
in_transaction(&mut conn, |tx| {
    internal_api::insert_tree(&tx, body.payload, shared_server)
})
}
},
]
});
let result = internal_api::insert_tree(&tx, serde_json::from_value(json).unwrap(), true);
result.expect("request failed");
tx.commit().unwrap();
}