Commit f655c048 authored by Youp

Merge branch 'dev' of https://gitlab.memri.io/memri/pyintegrators into gmail

parents 72c36a19 dccd10c3
Showing with 1644 additions and 70 deletions
......@@ -9,6 +9,7 @@
_tmp*
tmp*
tags
pod_docker
# Byte-compiled / optimized / DLL files
__pycache__/
......
default:
image: python:3.6
before_script:
- apt-get update && apt-get install -y libsqlcipher-dev
- pip install nbdev jupyter
- pip install -e .
......@@ -25,7 +26,7 @@ check if there is diff library/nbs:
run tests:
stage: test
script:
- nbdev_test_nbs
- ./tools/test_in_ci.sh
pages:
inherit:
......
......@@ -2,7 +2,7 @@
> Integrators integrate your information in the pod. They import your data from external services (gmail, whatsapp, icloud, facebook etc.), enrich your data with indexers (face recognition, spam detection, duplicate photo detection), and execute actions (sending mails, automatically share selected photo's with your family).
Integrators for memri have a single repo per language, this repo the one for python, but other repo's exist for [node](https://gitlab.memri.io/memri/nodeintegrators) and in the future for rust. This repo is build with [nbdev](https://github.com/fastai/nbdev) and therefore all code/documentation/tests are written in one place as jupyter notebooks and exported to a python-package/jekyll-website/unit-tests.
Integrators for memri have a single repo per language, this repo the one for python, but other repo's exist for [node](https://gitlab.memri.io/memri/nodeintegrators) and in the future for rust. This repo is built with [nbdev](https://github.com/fastai/nbdev) and therefore all code/documentation/tests are written in one place as jupyter notebooks and exported to a python-package/jekyll-website/unit-tests.
## Install
......@@ -10,7 +10,7 @@ Integrators for memri have a single repo per language, this repo the one for pyt
`nbdev_install_git_hooks`
This last command clears your notebooks of unnecessary metadata
This last command clears your notebooks of unnecessary metadata when making a commit.
## How to develop with nbdev
......
......@@ -9,12 +9,12 @@ entries:
- output: web,pdf
title: Overview
url: /
- output: web,pdf
title: ItemBase
url: itembase.html
- output: web,pdf
title: Pod Client
url: pod.client.html
- output: web,pdf
title: ItemBase
url: itembase.html
output: web
title: Getting Started
- folderitems:
......@@ -27,6 +27,19 @@ entries:
- output: web,pdf
title: Overview
url: indexers.indexer.html
- output: web,pdf
title: GeoIndexer
url: indexers.GeoIndexer.html
subfolders:
- output: web
subfolderitems:
- output: web,pdf
title: Parser
url: indexers.NoteListIndexer.Parser.html
- output: web,pdf
title: Data
url: indexers.NoteListIndexer.NoteList.html
title: NoteListIndexer
output: web
title: Indexers
- folderitems:
......
......@@ -54,7 +54,7 @@ nb_path: "nbs/basic.ipynb"
<div class="output_markdown rendered_html output_subarea ">
<h4 id="read_file" class="doc_header"><code>read_file</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/basic.py#L11" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>read_file</code>(<strong><code>path</code></strong>)</p>
<h4 id="read_file" class="doc_header"><code>read_file</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/basic.py#L13" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>read_file</code>(<strong><code>path</code></strong>)</p>
</blockquote>
</div>
......@@ -78,7 +78,7 @@ nb_path: "nbs/basic.ipynb"
<div class="output_markdown rendered_html output_subarea ">
<h4 id="read_json" class="doc_header"><code>read_json</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/basic.py#L14" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>read_json</code>(<strong><code>path</code></strong>)</p>
<h4 id="read_json" class="doc_header"><code>read_json</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/basic.py#L16" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>read_json</code>(<strong><code>path</code></strong>)</p>
</blockquote>
</div>
......@@ -102,7 +102,7 @@ nb_path: "nbs/basic.ipynb"
<div class="output_markdown rendered_html output_subarea ">
<h4 id="write_json" class="doc_header"><code>write_json</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/basic.py#L18" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>write_json</code>(<strong><code>obj</code></strong>, <strong><code>fname</code></strong>)</p>
<h4 id="write_json" class="doc_header"><code>write_json</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/basic.py#L20" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>write_json</code>(<strong><code>obj</code></strong>, <strong><code>fname</code></strong>)</p>
</blockquote>
</div>
......
repository: koenvanderveen/integrators
output: web
topnav_title: integrators
site_title: integrators
company_name: Memri
description: Integrating information for the memri pod
# Set to false to disable KaTeX math
use_math: true
# Add Google analytics id if you have one and want to use it here
google_analytics:
# See http://nbdev.fast.ai/search for help with adding Search
google_search:
host: 127.0.0.1
# the preview server used. Leave as is.
port: 4000
# the port where the preview is rendered.
exclude:
- .idea/
- .gitignore
- vendor
exclude: [vendor]
highlighter: rouge
markdown: kramdown
kramdown:
input: GFM
auto_ids: true
hard_wrap: false
syntax_highlighter: rouge
collections:
tooltips:
output: false
defaults:
-
scope:
path: ""
type: "pages"
values:
layout: "page"
comments: true
search: true
sidebar: home_sidebar
topnav: topnav
-
scope:
path: ""
type: "tooltips"
values:
layout: "page"
comments: true
search: true
tooltip: true
sidebars:
- home_sidebar
permalink: pretty
theme: jekyll-theme-cayman
\ No newline at end of file
......@@ -31,7 +31,7 @@ nb_path: "nbs/index.ipynb"
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Integrators for memri have a single repo per language, this repo the one for python, but other repo's exist for <a href="https://gitlab.memri.io/memri/nodeintegrators">node</a> and in the future for rust. This repo is build with <a href="https://github.com/fastai/nbdev">nbdev</a> and therefore all code/documentation/tests are written in one place as jupyter notebooks and exported to a python-package/jekyll-website/unit-tests.</p>
<p>Integrators for memri have a single repo per language, this repo the one for python, but other repo's exist for <a href="https://gitlab.memri.io/memri/nodeintegrators">node</a> and in the future for rust. This repo is built with <a href="https://github.com/fastai/nbdev">nbdev</a> and therefore all code/documentation/tests are written in one place as jupyter notebooks and exported to a python-package/jekyll-website/unit-tests.</p>
</div>
</div>
......@@ -52,7 +52,7 @@ nb_path: "nbs/index.ipynb"
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This last command clears your notebooks of unnecessary metadata</p>
<p>This last command clears your notebooks of unnecessary metadata when making a commit.</p>
</div>
</div>
......
......@@ -56,6 +56,8 @@ nb_path: "nbs/indexers.GeoIndexer.ipynb"
<div class="output_markdown rendered_html output_subarea ">
<h2 id="GeoIndexer" class="doc_header"><code>class</code> <code>GeoIndexer</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/geo/geo_indexer.py#L18" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>GeoIndexer</code>(<strong>*<code>args</code></strong>, <strong>**<code>kwargs</code></strong>) :: <a href="/integrators/indexers.indexer.html#IndexerBase"><code>IndexerBase</code></a></p>
</blockquote>
<p>Provides a base class for all items. All items in the schema inherit from this class, and it provides some
basic functionality for consistency and to enable easier usage.</p>
</div>
......@@ -69,7 +71,7 @@ nb_path: "nbs/indexers.GeoIndexer.ipynb"
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="create-a-toy-dataset">create a toy dataset<a class="anchor-link" href="#create-a-toy-dataset"> </a></h1>
<h1 id="Create-a-toy-dataset">Create a toy dataset<a class="anchor-link" href="#Create-a-toy-dataset"> </a></h1>
</div>
</div>
</div>
......@@ -81,6 +83,14 @@ nb_path: "nbs/indexers.GeoIndexer.ipynb"
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">client</span> <span class="o">=</span> <span class="n">PodClient</span><span class="p">()</span>
<span class="n">location</span> <span class="o">=</span> <span class="n">Location</span><span class="o">.</span><span class="n">from_data</span><span class="p">(</span><span class="n">latitude</span><span class="o">=-</span><span class="mf">37.81</span><span class="p">,</span> <span class="n">longitude</span><span class="o">=</span><span class="mf">144.96</span><span class="p">)</span>
<span class="n">address</span> <span class="o">=</span> <span class="n">Address</span><span class="o">.</span><span class="n">from_data</span><span class="p">()</span>
<span class="n">indexer</span> <span class="o">=</span> <span class="n">GeoIndexer</span><span class="o">.</span><span class="n">from_data</span><span class="p">()</span>
<span class="n">indexer_run</span> <span class="o">=</span> <span class="n">IndexerRun</span><span class="o">.</span><span class="n">from_data</span><span class="p">(</span><span class="n">progress</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">targetDataType</span><span class="o">=</span><span class="s2">&quot;Address&quot;</span><span class="p">)</span>
<span class="n">indexer_run</span><span class="o">.</span><span class="n">add_edge</span><span class="p">(</span><span class="s2">&quot;indexer&quot;</span><span class="p">,</span> <span class="n">indexer</span><span class="p">)</span>
<span class="n">address</span><span class="o">.</span><span class="n">add_edge</span><span class="p">(</span><span class="s2">&quot;location&quot;</span><span class="p">,</span> <span class="n">location</span><span class="p">)</span>
</pre></div>
</div>
......@@ -97,10 +107,24 @@ nb_path: "nbs/indexers.GeoIndexer.ipynb"
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">node1</span> <span class="o">=</span> <span class="n">Location</span><span class="o">.</span><span class="n">from_data</span><span class="p">(</span><span class="n">latitude</span><span class="o">=-</span><span class="mf">37.81</span><span class="p">,</span> <span class="n">longitude</span><span class="o">=</span><span class="mf">144.96</span><span class="p">)</span>
<span class="n">node2</span> <span class="o">=</span> <span class="n">Address</span><span class="o">.</span><span class="n">from_data</span><span class="p">()</span>
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">test_registration</span><span class="p">(</span><span class="n">GeoIndexer</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<span class="n">node2</span><span class="o">.</span><span class="n">add_edge</span><span class="p">(</span><span class="s2">&quot;location&quot;</span><span class="p">,</span> <span class="n">node1</span><span class="p">)</span>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">IndexerData</span><span class="p">(</span><span class="n">items_with_location</span><span class="o">=</span> <span class="p">[</span><span class="n">address</span><span class="p">])</span>
</pre></div>
</div>
......@@ -117,15 +141,28 @@ nb_path: "nbs/indexers.GeoIndexer.ipynb"
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">indexer</span> <span class="o">=</span> <span class="n">Indexer</span><span class="o">.</span><span class="n">from_data</span><span class="p">(</span><span class="n">indexerClass</span><span class="o">=</span><span class="s2">&quot;GeoIndexer&quot;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s2">&quot;GeoIndexer&quot;</span><span class="p">)</span>
<span class="n">indexer_run</span> <span class="o">=</span> <span class="n">IndexerRun</span><span class="p">(</span><span class="n">progress</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">targetDataType</span><span class="o">=</span><span class="s2">&quot;Address&quot;</span><span class="p">)</span>
<span class="n">indexer_run</span><span class="o">.</span><span class="n">add_edge</span><span class="p">(</span><span class="s2">&quot;indexer&quot;</span><span class="p">,</span> <span class="n">indexer</span><span class="p">)</span>
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">updated_items</span><span class="p">,</span> <span class="n">new_items</span> <span class="o">=</span> <span class="n">indexer</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">indexer_run</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">new_items</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s2">&quot;Australia&quot;</span> <span class="ow">and</span> <span class="n">updated_items</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">city</span> <span class="o">==</span> <span class="s2">&quot;Melbourne&quot;</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>indexing 1 items
</pre>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
......
---
title: Note
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/indexers.NoteListIndexer.NoteList.ipynb"
---
<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: nbs/indexers.NoteListIndexer.NoteList.ipynb
# command to build the docs after a change: nbdev_build_docs
-->
<div class="container" id="notebook-container">
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="INote" class="doc_header"><code>class</code> <code>INote</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/notelist.py#L14" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>INote</code>(<strong><code>dateAccessed</code></strong>=<em><code>None</code></em>, <strong><code>dateCreated</code></strong>=<em><code>None</code></em>, <strong><code>dateModified</code></strong>=<em><code>None</code></em>, <strong><code>deleted</code></strong>=<em><code>None</code></em>, <strong><code>externalId</code></strong>=<em><code>None</code></em>, <strong><code>itemDescription</code></strong>=<em><code>None</code></em>, <strong><code>starred</code></strong>=<em><code>None</code></em>, <strong><code>version</code></strong>=<em><code>None</code></em>, <strong><code>uid</code></strong>=<em><code>None</code></em>, <strong><code>importJson</code></strong>=<em><code>None</code></em>, <strong><code>title</code></strong>=<em><code>None</code></em>, <strong><code>abstract</code></strong>=<em><code>None</code></em>, <strong><code>datePublished</code></strong>=<em><code>None</code></em>, <strong><code>keyword</code></strong>=<em><code>None</code></em>, <strong><code>content</code></strong>=<em><code>None</code></em>, <strong><code>textContent</code></strong>=<em><code>None</code></em>, <strong><code>transcript</code></strong>=<em><code>None</code></em>, <strong><code>itemType</code></strong>=<em><code>None</code></em>, <strong><code>changelog</code></strong>=<em><code>None</code></em>, <strong><code>label</code></strong>=<em><code>None</code></em>, <strong><code>genericAttribute</code></strong>=<em><code>None</code></em>, <strong><code>measure</code></strong>=<em><code>None</code></em>, <strong><code>sharedWith</code></strong>=<em><code>None</code></em>, <strong><code>audio</code></strong>=<em><code>None</code></em>, <strong><code>citation</code></strong>=<em><code>None</code></em>, 
<strong><code>contentLocation</code></strong>=<em><code>None</code></em>, <strong><code>locationCreated</code></strong>=<em><code>None</code></em>, <strong><code>video</code></strong>=<em><code>None</code></em>, <strong><code>writtenBy</code></strong>=<em><code>None</code></em>, <strong><code>file</code></strong>=<em><code>None</code></em>, <strong><code>recordedAt</code></strong>=<em><code>None</code></em>, <strong><code>review</code></strong>=<em><code>None</code></em>, <strong><code>comment</code></strong>=<em><code>None</code></em>, <strong><code>noteList</code></strong>=<em><code>None</code></em>) :: <code>Note</code></p>
</blockquote>
<p>Provides a base class for all items. All items in the schema inherit from this class, and it provides some
basic functionality for consistency and to enable easier usage.</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="NoteLists">NoteLists<a class="anchor-link" href="#NoteLists"> </a></h1><p>A NoteList object represents a list contained in a written HTML note.</p>
</div>
</div>
</div>
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="INoteList" class="doc_header"><code>class</code> <code>INoteList</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/notelist.py#L25" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>INoteList</code>(<strong><code>dateAccessed</code></strong>=<em><code>None</code></em>, <strong><code>dateCreated</code></strong>=<em><code>None</code></em>, <strong><code>dateModified</code></strong>=<em><code>None</code></em>, <strong><code>deleted</code></strong>=<em><code>None</code></em>, <strong><code>externalId</code></strong>=<em><code>None</code></em>, <strong><code>itemDescription</code></strong>=<em><code>None</code></em>, <strong><code>starred</code></strong>=<em><code>None</code></em>, <strong><code>version</code></strong>=<em><code>None</code></em>, <strong><code>uid</code></strong>=<em><code>None</code></em>, <strong><code>importJson</code></strong>=<em><code>None</code></em>, <strong><code>title</code></strong>=<em><code>None</code></em>, <strong><code>abstract</code></strong>=<em><code>None</code></em>, <strong><code>datePublished</code></strong>=<em><code>None</code></em>, <strong><code>keyword</code></strong>=<em><code>None</code></em>, <strong><code>content</code></strong>=<em><code>None</code></em>, <strong><code>textContent</code></strong>=<em><code>None</code></em>, <strong><code>transcript</code></strong>=<em><code>None</code></em>, <strong><code>itemType</code></strong>=<em><code>None</code></em>, <strong><code>category</code></strong>=<em><code>None</code></em>, <strong><code>changelog</code></strong>=<em><code>None</code></em>, <strong><code>label</code></strong>=<em><code>None</code></em>, <strong><code>genericAttribute</code></strong>=<em><code>None</code></em>, <strong><code>measure</code></strong>=<em><code>None</code></em>, <strong><code>sharedWith</code></strong>=<em><code>None</code></em>, <strong><code>audio</code></strong>=<em><code>None</code></em>, 
<strong><code>citation</code></strong>=<em><code>None</code></em>, <strong><code>contentLocation</code></strong>=<em><code>None</code></em>, <strong><code>locationCreated</code></strong>=<em><code>None</code></em>, <strong><code>video</code></strong>=<em><code>None</code></em>, <strong><code>writtenBy</code></strong>=<em><code>None</code></em>, <strong><code>file</code></strong>=<em><code>None</code></em>, <strong><code>recordedAt</code></strong>=<em><code>None</code></em>, <strong><code>review</code></strong>=<em><code>None</code></em>, <strong><code>span</code></strong>=<em><code>None</code></em>, <strong><code>itemSpan</code></strong>=<em><code>None</code></em>, <strong><code>note</code></strong>=<em><code>None</code></em>) :: <code>NoteList</code></p>
</blockquote>
<p>Provides a base class for all items. All items in the schema inherit from this class, and it provides some
basic functionality for consistency and to enable easier usage.</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="ULNoteList">ULNoteList<a class="anchor-link" href="#ULNoteList"> </a></h2><p>A ULNoteList is the simplest kind of list: a list of items enclosed in &lt;ul&gt; &lt;/ul&gt; tags.</p>
</div>
</div>
</div>
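As a rough illustration of what "a list of items enclosed in ul tags" means in practice, the sketch below collects the item texts of flat `<ul>` lists using only the standard library. `ULCollector` is a hypothetical helper written for this example; it is not the `ULNoteList` class or the parser from this repository, and it assumes lists are not nested.

```python
from html.parser import HTMLParser


class ULCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <li> in flat <ul> lists."""

    def __init__(self):
        super().__init__()
        self.lists = []      # one list of item strings per <ul>
        self._depth = 0      # current <ul> nesting depth
        self._in_li = False  # True while inside an <li> of a top-level <ul>

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self._depth += 1
            if self._depth == 1:
                self.lists.append([])
        elif tag == "li" and self._depth == 1:
            self._in_li = True
            self.lists[-1].append("")

    def handle_endtag(self, tag):
        if tag == "ul" and self._depth > 0:
            self._depth -= 1
        elif tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Only text directly inside a top-level <li> is kept (assumption: no nesting)
        if self._in_li and self._depth == 1:
            self.lists[-1][-1] += data


collector = ULCollector()
collector.feed("<p>Groceries</p><ul><li>milk</li><li>eggs</li></ul>")
print(collector.lists)
```

Text outside the list (here the `<p>` element) is ignored, so only the two items are collected.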
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="ULNoteList" class="doc_header"><code>class</code> <code>ULNoteList</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/notelist.py#L56" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>ULNoteList</code>(<strong><code>dateAccessed</code></strong>=<em><code>None</code></em>, <strong><code>dateCreated</code></strong>=<em><code>None</code></em>, <strong><code>dateModified</code></strong>=<em><code>None</code></em>, <strong><code>deleted</code></strong>=<em><code>None</code></em>, <strong><code>externalId</code></strong>=<em><code>None</code></em>, <strong><code>itemDescription</code></strong>=<em><code>None</code></em>, <strong><code>starred</code></strong>=<em><code>None</code></em>, <strong><code>version</code></strong>=<em><code>None</code></em>, <strong><code>uid</code></strong>=<em><code>None</code></em>, <strong><code>importJson</code></strong>=<em><code>None</code></em>, <strong><code>title</code></strong>=<em><code>None</code></em>, <strong><code>abstract</code></strong>=<em><code>None</code></em>, <strong><code>datePublished</code></strong>=<em><code>None</code></em>, <strong><code>keyword</code></strong>=<em><code>None</code></em>, <strong><code>content</code></strong>=<em><code>None</code></em>, <strong><code>textContent</code></strong>=<em><code>None</code></em>, <strong><code>transcript</code></strong>=<em><code>None</code></em>, <strong><code>itemType</code></strong>=<em><code>None</code></em>, <strong><code>category</code></strong>=<em><code>None</code></em>, <strong><code>changelog</code></strong>=<em><code>None</code></em>, <strong><code>label</code></strong>=<em><code>None</code></em>, <strong><code>genericAttribute</code></strong>=<em><code>None</code></em>, <strong><code>measure</code></strong>=<em><code>None</code></em>, <strong><code>sharedWith</code></strong>=<em><code>None</code></em>, <strong><code>audio</code></strong>=<em><code>None</code></em>, 
<strong><code>citation</code></strong>=<em><code>None</code></em>, <strong><code>contentLocation</code></strong>=<em><code>None</code></em>, <strong><code>locationCreated</code></strong>=<em><code>None</code></em>, <strong><code>video</code></strong>=<em><code>None</code></em>, <strong><code>writtenBy</code></strong>=<em><code>None</code></em>, <strong><code>file</code></strong>=<em><code>None</code></em>, <strong><code>recordedAt</code></strong>=<em><code>None</code></em>, <strong><code>review</code></strong>=<em><code>None</code></em>, <strong><code>span</code></strong>=<em><code>None</code></em>, <strong><code>itemSpan</code></strong>=<em><code>None</code></em>, <strong><code>note</code></strong>=<em><code>None</code></em>) :: <a href="/integrators/indexers.NoteListIndexer.NoteList.html#INoteList"><code>INoteList</code></a></p>
</blockquote>
<p>A &lt;ul&gt; &lt;/ul&gt; list extracted from a note.</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">ULNoteList</span><span class="o">.</span><span class="n">from_data</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s2">&quot;Awesome title&quot;</span><span class="p">,</span> <span class="n">content</span><span class="o">=</span><span class="s2">&quot;Awesome content&quot;</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre># Awesome title
</pre>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Span">Span<a class="anchor-link" href="#Span"> </a></h2><p>We use spans to specify a range within a piece of text. For instance, given the text "Memri solves all your problems", a span with startIdx=6 and endIdx=16 points to "solves all".</p>
</div>
</div>
</div>
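The span example above can be checked with plain string slicing. The sketch below uses a hypothetical `TextSpan` dataclass as a stand-in for the schema's span item; only the `startIdx` and `endIdx` field names come from the documentation, and treating `endIdx` as exclusive is an assumption that happens to match the example.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    # Hypothetical stand-in for the schema's Span item; only the index fields are modeled
    startIdx: int
    endIdx: int

    def extract(self, text: str) -> str:
        # endIdx is treated as exclusive, like a Python slice (assumption)
        return text[self.startIdx:self.endIdx]


text = "Memri solves all your problems"
span = TextSpan(startIdx=6, endIdx=16)
print(span.extract(text))  # prints "solves all"
```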
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="ISpan" class="doc_header"><code>class</code> <code>ISpan</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/notelist.py#L77" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>ISpan</code>(<strong><code>dateAccessed</code></strong>=<em><code>None</code></em>, <strong><code>dateCreated</code></strong>=<em><code>None</code></em>, <strong><code>dateModified</code></strong>=<em><code>None</code></em>, <strong><code>deleted</code></strong>=<em><code>None</code></em>, <strong><code>externalId</code></strong>=<em><code>None</code></em>, <strong><code>itemDescription</code></strong>=<em><code>None</code></em>, <strong><code>starred</code></strong>=<em><code>None</code></em>, <strong><code>version</code></strong>=<em><code>None</code></em>, <strong><code>uid</code></strong>=<em><code>None</code></em>, <strong><code>importJson</code></strong>=<em><code>None</code></em>, <strong><code>startIdx</code></strong>=<em><code>None</code></em>, <strong><code>endIdx</code></strong>=<em><code>None</code></em>, <strong><code>changelog</code></strong>=<em><code>None</code></em>, <strong><code>label</code></strong>=<em><code>None</code></em>, <strong><code>genericAttribute</code></strong>=<em><code>None</code></em>, <strong><code>measure</code></strong>=<em><code>None</code></em>, <strong><code>sharedWith</code></strong>=<em><code>None</code></em>) :: <code>Span</code></p>
</blockquote>
<p>A span of an element in a piece of text</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="get_span" class="doc_header"><code>get_span</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/notelist.py#L87" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>get_span</code>(<strong><code>note</code></strong>, <strong><code>elem</code></strong>, <strong><code>parsed</code></strong>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
</div>
---
title: HTMLListParser
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/indexers.NoteListIndexer.Parser.ipynb"
---
<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: nbs/indexers.NoteListIndexer.Parser.ipynb
# command to build the docs after a change: nbdev_build_docs
-->
<div class="container" id="notebook-container">
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="HTMLListParser" class="doc_header"><code>class</code> <code>HTMLListParser</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/parser.py#L15" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>HTMLListParser</code>()</p>
</blockquote>
<p>Extracts lists from HTML data generated by an HTML text editor such as Evernote</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="HTMLListParser.get_lists" class="doc_header"><code>HTMLListParser.get_lists</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/parser.py#L27" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>HTMLListParser.get_lists</code>(<strong><code>note</code></strong>)</p>
</blockquote>
<p>Extracts lists from a note</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="HTMLListParser.get_unformatted_lists" class="doc_header"><code>HTMLListParser.get_unformatted_lists</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/parser.py#L67" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>HTMLListParser.get_unformatted_lists</code>(<strong><code>note</code></strong>, <strong><code>txt</code></strong>, <strong><code>parsed</code></strong>)</p>
</blockquote>
<p>Retrieves lists without &lt;ul&gt;&lt;/ul&gt; tags. There are two cases:
1) multiline lists prefixed with a title keyword (e.g. "Buy:", "Read:")
2) single-element, single-line lists</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="HTMLListParser.get_single_line_list" class="doc_header"><code>HTMLListParser.get_single_line_list</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/parser.py#L49" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>HTMLListParser.get_single_line_list</code>(<strong><code>elem</code></strong>)</p>
</blockquote>
<p>Gets single-line lists. An example could be: '<strong>read</strong>: great book title'</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
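A minimal sketch of the single-line case described above, using only the standard library. The helper name `parse_single_line_list` and the tag-stripping regex are assumptions made for illustration; they are not taken from `parser.py`, which may work differently.

```python
import re
from html import unescape

# Hypothetical helper, not the library's implementation: strips inline
# tags and splits a "title: item" line into its two parts.
TAG_RE = re.compile(r"<[^>]+>")

def parse_single_line_list(html_line):
    """Return (title, item) for a 'title: item' line, or None if it does not match."""
    text = unescape(TAG_RE.sub("", html_line)).strip()
    title, sep, item = text.partition(":")
    if not sep or not title.strip() or not item.strip():
        return None
    return title.strip(), item.strip()

result = parse_single_line_list("<strong>read</strong>: great book title")
print(result)
```

A line without a colon, or with nothing after it, yields `None` in this sketch, so it would not be treated as a list.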
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Running-the-indexer">Running the indexer<a class="anchor-link" href="#Running-the-indexer"> </a></h1>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let's see how this works for an example note. We start with a note that was imported from Evernote and show its content.</p>
</div>
</div>
</div>
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">evernote_file</span> <span class="o">=</span> <span class="n">PYI_TESTDATA</span> <span class="o">/</span> <span class="s2">&quot;notes&quot;</span> <span class="o">/</span> <span class="s2">&quot;evernote&quot;</span> <span class="o">/</span> <span class="s2">&quot;evernote-test-note-1.html&quot;</span>
<span class="n">txt</span> <span class="o">=</span> <span class="n">read_file</span><span class="p">(</span><span class="n">evernote_file</span><span class="p">)</span>
<span class="n">note</span> <span class="o">=</span> <span class="n">INote</span><span class="o">.</span><span class="n">from_data</span><span class="p">(</span><span class="n">content</span><span class="o">=</span><span class="n">txt</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">note</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_subarea output_stream output_stdout output_text">
<pre>INote (#None) &lt;div&gt;
&lt;div&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;/div&gt;
&lt;div&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Buy groceries&lt;/li&gt;
&lt;li&gt;Call john&lt;br clear=&#34;none&#34; /&gt;&lt;/li&gt;
&lt;li&gt;Do the taxes&lt;/li&gt;
&lt;li&gt;Take out the trash&lt;/li&gt;
&lt;li&gt;Reply to carls mail&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Buy groceries&lt;/li&gt;
&lt;li&gt;Call john&lt;ul&gt;
&lt;li&gt;He really needs to pick up&lt;/li&gt;
&lt;li&gt;Because I need to speak to him&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Do the taxes&lt;/li&gt;
&lt;li&gt;Take out the trash&lt;/li&gt;
&lt;li&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;/li&gt;
&lt;li&gt;Reply to carls mail&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;strong&gt;Buy&lt;/strong&gt;: Toothpaste&lt;br clear=&#34;none&#34; /&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;em&gt;Read&lt;/em&gt;: The age of surveillance capitalism&lt;br clear=&#34;none&#34; /&gt;&lt;br clear=&#34;none&#34; /&gt;Watch: Parasite&lt;br clear=&#34;none&#34; /&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;u&gt;Do&lt;/u&gt;: The dishes&lt;br clear=&#34;none&#34; /&gt;&lt;br clear=&#34;none&#34; /&gt;Read&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Twenty one lessons for the 21st century&lt;/li&gt;
&lt;li&gt;Dreams from my Father&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;strong&gt;Read&lt;/strong&gt;&lt;br clear=&#34;none&#34; /&gt;The Great Gatsby&lt;br clear=&#34;none&#34; /&gt;Alice&#39;s Adventures in Wonderland&lt;br clear=&#34;none&#34; /&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;strong&gt;Buy&lt;/strong&gt;&lt;br clear=&#34;none&#34; /&gt;groceries&lt;br clear=&#34;none&#34; /&gt;Shoes&lt;br clear=&#34;none&#34; /&gt;&lt;br clear=&#34;none&#34; /&gt;Read&lt;br clear=&#34;none&#34; /&gt;The Great Gatsby&lt;br
clear=&#34;none&#34; /&gt;The odyssey&lt;br clear=&#34;none&#34; /&gt;&lt;br clear=&#34;none&#34; /&gt;&lt;/div&gt;
&lt;/div&gt;
</pre>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>When rendered, this corresponds to:</p>
</div>
</div>
</div>
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">display</span><span class="p">(</span><span class="n">HTML</span><span class="p">(</span><span class="n">note</span><span class="o">.</span><span class="n">content</span><span class="p">))</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_html rendered_html output_subarea ">
<div>
<div><br clear="none" /></div>
<div><br clear="none" /></div>
<ul>
<li>Buy groceries</li>
<li>Call john<br clear="none" /></li>
<li>Do the taxes</li>
<li>Take out the trash</li>
<li>Reply to carls mail</li>
</ul>
<div><br clear="none" /></div>
<ul>
<li>Buy groceries</li>
<li>Call john<ul>
<li>He really needs to pick up</li>
<li>Because I need to speak to him</li>
</ul>
</li>
<li>Do the taxes</li>
<li>Take out the trash</li>
<li><br clear="none" /></li>
<li>Reply to carls mail</li>
</ul>
<div><br clear="none" /><strong>Buy</strong>: Toothpaste<br clear="none" /><br clear="none" /><em>Read</em>: The age of surveillance capitalism<br clear="none" /><br clear="none" />Watch: Parasite<br clear="none" /><br clear="none" /><u>Do</u>: The dishes<br clear="none" /><br clear="none" />Read</div>
<ul>
<li>Twenty one lessons for the 21st century</li>
<li>Dreams from my Father</li>
</ul>
<div><br clear="none" /><strong>Read</strong><br clear="none" />The Great Gatsby<br clear="none" />Alice's Adventures in Wonderland<br clear="none" /><br clear="none" /><strong>Buy</strong><br clear="none" />groceries<br clear="none" />Shoes<br clear="none" /><br clear="none" />Read<br clear="none" />The Great Gatsby<br
clear="none" />The odyssey<br clear="none" /><br clear="none" /></div>
</div>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>We can parse these using the <a href="/integrators/indexers.NoteListIndexer.Parser.html#HTMLListParser"><code>HTMLListParser</code></a></p>
</div>
</div>
</div>
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">parser</span> <span class="o">=</span> <span class="n">HTMLListParser</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">lists</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">get_lists</span><span class="p">(</span><span class="n">note</span><span class="p">)</span>
<span class="n">lists</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_text output_subarea output_execute_result">
<pre>[ULNoteList # Untitled
Buy groceries
Call john
Do the taxes
Take out the trash
Reply to carls mail
,
ULNoteList # Untitled
Buy groceries
Do the taxes
Take out the trash
Reply to carls mail
,
ULNoteList # Untitled
Twenty one lessons for the 21st century
Dreams from my Father
,
(INoteList) # Buy:
Toothpaste
,
(INoteList) # Read:
The age of surveillance capitalism
,
(INoteList) # Watch:
Parasite
,
(INoteList) # Do:
The dishes
,
(INoteList) # Read
The Great GatsbyAlice&#39;s Adventures in Wonderland
,
(INoteList) # Buy
groceriesShoes
,
(INoteList) # Read
The Great GatsbyThe odyssey
]</pre>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
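To make the `<ul>` case concrete, here is a rough sketch of collecting the items of each top-level `<ul>` with the standard library's `html.parser`. The class name `TopLevelListCollector` is hypothetical and this is not how `HTMLListParser` is implemented. Note one difference from the output above: this sketch keeps the text of an item that contains a nested list, whereas the output shown drops "Call john" from the second list.

```python
from html.parser import HTMLParser

class TopLevelListCollector(HTMLParser):
    """Collects the text of <li> items for each top-level <ul>."""

    def __init__(self):
        super().__init__()
        self.lists = []      # one list of item strings per top-level <ul>
        self._depth = 0      # current <ul> nesting depth
        self._item = None    # text pieces of the <li> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self._depth += 1
            if self._depth == 1:
                self.lists.append([])
        elif tag == "li" and self._depth == 1:
            self._item = []

    def handle_endtag(self, tag):
        if tag == "ul":
            self._depth -= 1
        elif tag == "li" and self._depth == 1 and self._item is not None:
            text = "".join(self._item).strip()
            if text:  # drop empty items such as <li><br /></li>
                self.lists[-1].append(text)
            self._item = None

    def handle_data(self, data):
        # Only keep text that sits directly in a top-level item.
        if self._depth == 1 and self._item is not None:
            self._item.append(data)

html = (
    "<ul><li>Buy groceries</li><li>Call john<ul><li>nested</li></ul></li>"
    "<li><br /></li></ul><ul><li>Dreams from my Father</li></ul>"
)
collector = TopLevelListCollector()
collector.feed(html)
print(collector.lists)
```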
</div>
---
title: NoteListIndexer
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/indexers.NoteListIndexer.ipynb"
---
<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: nbs/indexers.NoteListIndexer.ipynb
# command to build the docs after a change: nbdev_build_docs
-->
<div class="container" id="notebook-container">
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="NotesListIndexer" class="doc_header"><code>class</code> <code>NotesListIndexer</code><a href="" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>NotesListIndexer</code>(<strong>*<code>args</code></strong>, <strong>**<code>kwargs</code></strong>) :: <code>Indexer</code></p>
</blockquote>
<p>Extracts lists from notes and categorizes them.</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="ListTypePredictor" class="doc_header"><code>class</code> <code>ListTypePredictor</code><a href="" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>ListTypePredictor</code>()</p>
</blockquote>
<p>Predicts one of <a href="/integrators/indexers.NoteListIndexer.NoteList.html#LIST_CLASSES"><code>LIST_CLASSES</code></a> for a list in a note.</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">evernote_file</span> <span class="o">=</span> <span class="n">PYI_TESTDATA</span> <span class="o">/</span> <span class="s2">&quot;notes&quot;</span> <span class="o">/</span> <span class="s2">&quot;evernote&quot;</span> <span class="o">/</span> <span class="s2">&quot;evernote-test-note-1.html&quot;</span>
<span class="n">txt</span> <span class="o">=</span> <span class="n">read_file</span><span class="p">(</span><span class="n">evernote_file</span><span class="p">)</span>
<span class="n">note</span> <span class="o">=</span> <span class="n">INote</span><span class="o">.</span><span class="n">from_data</span><span class="p">(</span><span class="n">content</span><span class="o">=</span><span class="n">txt</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
{% endraw %}
</div>
---
title: Util
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/indexers.NoteListIndexer.util.ipynb"
---
<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: nbs/indexers.NoteListIndexer.util.ipynb
# command to build the docs after a change: nbdev_build_docs
-->
<div class="container" id="notebook-container">
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="get_toplevel_elements" class="doc_header"><code>get_toplevel_elements</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L13" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>get_toplevel_elements</code>(<strong><code>str_</code></strong>, <strong><code>element</code></strong>, <strong><code>parsed</code></strong>=<em><code>None</code></em>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="remove_html" class="doc_header"><code>remove_html</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L26" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>remove_html</code>(<strong><code>str_</code></strong>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="remove_prefix_chars" class="doc_header"><code>remove_prefix_chars</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L29" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>remove_prefix_chars</code>(<strong><code>s</code></strong>, <strong><code>chars</code></strong>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="is_newline" class="doc_header"><code>is_newline</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L33" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>is_newline</code>(<strong><code>str_</code></strong>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="is_newline_div" class="doc_header"><code>is_newline_div</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L39" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>is_newline_div</code>(<strong><code>div</code></strong>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="div_is_unstructured_list_title" class="doc_header"><code>div_is_unstructured_list_title</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L45" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>div_is_unstructured_list_title</code>(<strong><code>div</code></strong>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="find_till_double_br" class="doc_header"><code>find_till_double_br</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L61" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>find_till_double_br</code>(<strong><code>divs</code></strong>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="get_children" class="doc_header"><code>get_children</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L72" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>get_children</code>(<strong><code>elem</code></strong>)</p>
</blockquote>
<p>Fetches children of an element, but combines children when they are style elements like <strong>example</strong></p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="contains" class="doc_header"><code>contains</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/indexers/notelist/util.py#L86" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>contains</code>(<strong><code>str_</code></strong>, <strong><code>pat</code></strong>)</p>
</blockquote>
<p>case insensitive match</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
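The docs describe `contains` only as a "case insensitive match". One plausible reading, sketched below, is a regex search with `re.IGNORECASE`; whether `util.py` uses a regex or a plain substring test is not shown here, so treat the function name `contains_ci` and its body as assumptions.

```python
import re

def contains_ci(text, pattern):
    """Case-insensitive regex search; True if `pattern` occurs anywhere in `text`."""
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(contains_ci("Buy: Toothpaste", "buy"))
print(contains_ci("Watch: Parasite", "read"))
```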
</div>
......@@ -47,8 +47,34 @@ nb_path: "nbs/itembase.ipynb"
<div class="output_markdown rendered_html output_subarea ">
<h2 id="Edge" class="doc_header"><code>class</code> <code>Edge</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L60" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>Edge</code>(<strong><code>source</code></strong>, <strong><code>target</code></strong>, <strong><code>_type</code></strong>, <strong><code>label</code></strong>=<em><code>None</code></em>, <strong><code>sequence</code></strong>=<em><code>None</code></em>, <strong><code>created</code></strong>=<em><code>False</code></em>, <strong><code>reverse</code></strong>=<em><code>True</code></em>)</p>
<h2 id="Edge" class="doc_header"><code>class</code> <code>Edge</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L58" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>Edge</code>(<strong><code>source</code></strong>, <strong><code>target</code></strong>, <strong><code>_type</code></strong>, <strong><code>label</code></strong>=<em><code>None</code></em>, <strong><code>sequence</code></strong>=<em><code>None</code></em>, <strong><code>created</code></strong>=<em><code>False</code></em>, <strong><code>reverse</code></strong>=<em><code>True</code></em>)</p>
</blockquote>
<p>Makes a link between two <a href="/integrators/itembase.html#ItemBase"><code>ItemBase</code></a> Items</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="Edge.traverse" class="doc_header"><code>Edge.traverse</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L92" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>Edge.traverse</code>(<strong><code>start</code></strong>)</p>
</blockquote>
<p>traverse an edge starting from the source to the target or vice versa.</p>
</div>
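As an illustration of the traversal described above, a reduced stand-in for `Edge` might look like the following. `MiniEdge` is hypothetical, and raising `ValueError` for a start node that is neither endpoint is an assumption; the source does not show what the real method does in that case.

```python
class MiniEdge:
    """Hypothetical stand-in for Edge, reduced to its two endpoints."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def traverse(self, start):
        # Walk to whichever endpoint is opposite `start`.
        if start == self.source:
            return self.target
        if start == self.target:
            return self.source
        # Assumption: behavior for other inputs is not shown in the source.
        raise ValueError("start is not an endpoint of this edge")

edge = MiniEdge("note", "list")
print(edge.traverse("note"))
print(edge.traverse("list"))
```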
......@@ -80,6 +106,111 @@ nb_path: "nbs/itembase.ipynb"
<div class="output_markdown rendered_html output_subarea ">
<h2 id="ItemBase" class="doc_header"><code>class</code> <code>ItemBase</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L103" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>ItemBase</code>(<strong><code>uid</code></strong>=<em><code>None</code></em>)</p>
</blockquote>
<p>Provides a base class for all items. All items in the schema inherit from this class, and it provides some
basic functionality for consistency and to enable easier usage.</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="ItemBase.add_edge" class="doc_header"><code>ItemBase.add_edge</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L132" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>ItemBase.add_edge</code>(<strong><code>name</code></strong>, <strong><code>val</code></strong>)</p>
</blockquote>
<p>Creates an edge of type name and makes it point to val</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="ItemBase.is_expanded" class="doc_header"><code>ItemBase.is_expanded</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L141" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>ItemBase.is_expanded</code>()</p>
</blockquote>
<p>Returns whether the node is expanded. An expanded node has retrieved the nodes that are <em>directly</em> connected to it
from the pod and stored their values via edges in the object.</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="ItemBase.expand" class="doc_header"><code>ItemBase.expand</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L180" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>ItemBase.expand</code>(<strong><code>api</code></strong>)</p>
</blockquote>
<p>Expands a node (retrieves all directly connected nodes and adds them to the object).</p>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="ItemBase.inherit_funcs" class="doc_header"><code>ItemBase.inherit_funcs</code><a href="https://gitlab.memri.io/memri/integrators/tree/master/integrators/data/itembase.py#L216" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>ItemBase.inherit_funcs</code>(<strong><code>other</code></strong>)</p>
</blockquote>
<p>This function can be used to inherit new functionality from a subclass. It is a workaround for the fact
that Python does not provide a way to extend classes defined in a different file that is dynamic enough for
our use case.</p>
</div>
......
{
"Getting Started": {
"Overview": "/",
"ItemBase": "itembase.html",
"Pod Client": "pod.client.html"
"Pod Client": "pod.client.html",
"ItemBase": "itembase.html"
},
"Importers": {
"Overview": "indexers.indexer.html"
},
"Indexers": {
"Overview": "indexers.indexer.html"
"Overview": "indexers.indexer.html",
"GeoIndexer": "indexers.GeoIndexer.html",
"": {
"NoteListIndexer": {
"Parser": "indexers.NoteListIndexer.Parser.html",
"Data": "indexers.NoteListIndexer.NoteList.html"
}
}
},
"Downloaders": {
"Overview": "indexers.indexer.html"
}
}
\ No newline at end of file
}
......@@ -6,21 +6,55 @@ index = {"read_file": "basic.ipynb",
"read_json": "basic.ipynb",
"write_json": "basic.ipynb",
"Path.ls": "basic.ipynb",
"PYI_HOME": "basic.ipynb",
"PYI_TESTDATA": "basic.ipynb",
"GeoIndexer": "indexers.GeoIndexer.ipynb",
"LOCATION_EDGE": "indexers.GeoIndexer.ipynb",
"LIST_CLASSES": "indexers.NoteListIndexer.NoteList.ipynb",
"INote": "indexers.NoteListIndexer.NoteList.ipynb",
"INoteList": "indexers.NoteListIndexer.NoteList.ipynb",
"ULNoteList": "indexers.NoteListIndexer.NoteList.ipynb",
"ISpan": "indexers.NoteListIndexer.NoteList.ipynb",
"get_span": "indexers.NoteListIndexer.NoteList.ipynb",
"HTMLListParser": "indexers.NoteListIndexer.Parser.ipynb",
"NotesListIndexer": "indexers.NoteListIndexer.ipynb",
"ListTypePredictor": "indexers.NoteListIndexer.ipynb",
"get_toplevel_elements": "indexers.NoteListIndexer.util.ipynb",
"remove_html": "indexers.NoteListIndexer.util.ipynb",
"remove_prefix_chars": "indexers.NoteListIndexer.util.ipynb",
"is_newline": "indexers.NoteListIndexer.util.ipynb",
"is_newline_div": "indexers.NoteListIndexer.util.ipynb",
"div_is_unstructured_list_title": "indexers.NoteListIndexer.util.ipynb",
"find_till_double_br": "indexers.NoteListIndexer.util.ipynb",
"get_children": "indexers.NoteListIndexer.util.ipynb",
"contains": "indexers.NoteListIndexer.util.ipynb",
"HTML_LINEBREAK_REGEX": "indexers.NoteListIndexer.util.ipynb",
"IndexerBase": "indexers.indexer.ipynb",
"IndexerData": "indexers.indexer.ipynb",
"get_indexer_run_data": "indexers.indexer.ipynb",
"test_registration": "indexers.indexer.ipynb",
"POD_FULL_ADDRESS_ENV": "indexers.indexer.ipynb",
"RUN_UID_ENV": "indexers.indexer.ipynb",
"POD_SERVICE_PAYLOAD_ENV": "indexers.indexer.ipynb",
"DATABASE_KEY_ENV": "indexers.indexer.ipynb",
"OWNER_KEY_ENV": "indexers.indexer.ipynb",
"run_indexer": "indexers.indexer.ipynb",
"run_integrator_from_run_uid": "indexers.indexer.ipynb",
"run_integrator": "indexers.indexer.ipynb",
"ALL_EDGES": "itembase.ipynb",
"UID_GEN": "itembase.ipynb",
"DB": "itembase.ipynb",
"parse_base_item_json": "itembase.ipynb",
"Edge": "itembase.ipynb",
"ItemBase": "itembase.ipynb",
"API_URL": "pod.client.ipynb",
"DEFAULT_POD_ADDRESS": "pod.client.ipynb",
"PodClient": "pod.client.ipynb"}
modules = ["data/basic.py",
"indexers/geo/geo_indexer.py",
"indexers/notelist/notelist.py",
"indexers/notelist/parser.py",
"indexers/notelist/notelist_indexer.py",
"indexers/notelist/util.py",
"indexers/indexer.py",
"data/itembase.py",
"pod/client.py"]
......
# AUTOGENERATED! DO NOT EDIT! File to edit: nbs/basic.ipynb (unless otherwise specified).
__all__ = ['read_file', 'read_json', 'write_json']
__all__ = ['read_file', 'read_json', 'write_json', 'PYI_HOME', 'PYI_TESTDATA']
# Cell
from ..imports import *
# Cell
Path.ls = lambda x: list(x.iterdir())
PYI_HOME = Path.cwd().parent
PYI_TESTDATA = PYI_HOME / "test" / "data"
def read_file(path):
return Path(path).read_text()  # closes the file handle after reading
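A small self-contained sketch of how `/`-style path building and a file read fit together, using a temporary directory in place of the real test data. The helper `read_text_file` and the file name `sample.html` are made up for this example; only the `pathlib` and `tempfile` calls are standard.

```python
from pathlib import Path
import tempfile

def read_text_file(path):
    """Read a text file and close the handle when done."""
    with open(path, "r") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    # Mirrors the PYI_HOME / "test" / "data" layout with a throwaway root.
    data_dir = Path(tmp) / "test" / "data"
    data_dir.mkdir(parents=True)
    sample = data_dir / "sample.html"
    sample.write_text("<ul><li>Buy groceries</li></ul>")
    content = read_text_file(sample)
    # Same idea as the Path.ls patch: list the directory's entries.
    names = sorted(p.name for p in data_dir.iterdir())

print(content, names)
```

Because `PYI_HOME` is derived from `Path.cwd().parent`, the real constants depend on the directory the process is started from; the sketch avoids that by using an explicit root.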
......
# AUTOGENERATED! DO NOT EDIT! File to edit: nbs/itembase.ipynb (unless otherwise specified).
__all__ = ['ALL_EDGES', 'UID_GEN', 'DB', 'parse_base_item_json', 'Edge', 'ItemBase']
__all__ = ['ALL_EDGES', 'DB', 'parse_base_item_json', 'Edge', 'ItemBase']
# Cell
# hide
......@@ -8,7 +8,6 @@ from ..imports import *
ALL_EDGES = "allEdges"
SOURCE, TARGET, TYPE, EDGE_TYPE, LABEL, SEQUENCE = "_source", "_target", "_type", "_type", "label", "sequence"
UID_GEN= 10000 + (random.randint(0, 1e3) * 1000)
# Cell
# hide
......@@ -56,8 +55,8 @@ def parse_base_item_json(json):
return uid, dateAccessed, dateCreated, dateModified, deleted, externalId, itemDescription, starred, version, None, None
# Cell
class Edge():
"""Makes a link between two `ItemBase` Items"""
def __init__(self, source, target, _type, label=None, sequence=None, created=False, reverse=True):
self.source = source
self.target = target
......@@ -91,6 +90,7 @@ class Edge():
and self._type == other._type
def traverse(self, start):
"""traverse an edge starting from the source to the target or vice versa."""
if start == self.source:
return self.target
elif start == self.target:
......@@ -101,6 +101,8 @@ class Edge():
# Cell
class ItemBase():
"""Provides a base class for all items. All items in the schema inherit from this class, and it provides some
basic functionality for consistency and to enable easier usage."""
global_db = DB()
def __init__(self, uid=None):
......@@ -128,6 +130,7 @@ class ItemBase():
return val
def add_edge(self, name, val):
"""Creates an edge of type name and makes it point to val"""
val = Edge(self, val, name, created=True)
if name not in self.__dict__:
raise NameError(f"object {self} does not have edge with name {name}")
......@@ -136,16 +139,9 @@ class ItemBase():
self.__setattr__(name, res)
def is_expanded(self):
"""Returns whether the node is expanded. An expanded node has retrieved the nodes that are *directly* connected to it
from the pod and stored their values via edges in the object."""
return len(self.get_all_edges()) > 0
# if "_expanded" in self.__dict__:
# return self._expanded
# else:
# return False
# def __setattr__(self, name, val):
# if isinstance(val, ItemBase) or self.attr_is_edge(val):
# raise ValueError("Don't set edges directly, use node.add_edge instead")
# super().__setattr__(name, val)
def get_edges(self, name):
return object.__getattribute__(self, name)
......@@ -182,6 +178,7 @@ class ItemBase():
return len(res) == 1
def expand(self, api):
"""Expands a node (retrieves all directly connected nodes and adds them to the object)."""
self._expanded = True
res = api.get(self.uid, expanded=True)
for edge_name in res.get_all_edge_names():
......@@ -214,4 +211,11 @@ class ItemBase():
for v in edges.values():
v.source = res
return res
\ No newline at end of file
return res
def inherit_funcs(self, other):
"""This function can be used to inherit new functionality from a subclass. It is a workaround for the fact
that Python does not provide a way to extend classes defined in a different file that is dynamic enough for
our use case."""
assert issubclass(other, self.__class__)
self.__class__ = other
\ No newline at end of file
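To show what reassigning `__class__` does in practice, here is a minimal sketch with two made-up classes, `Item` and `Note`, standing in for `ItemBase` and one of its subclasses. Only the two statements inside `inherit_funcs` come from the diff; everything else is illustrative.

```python
class Item:
    """Hypothetical minimal base class, standing in for ItemBase."""

    def __init__(self, uid=None):
        self.uid = uid

    def inherit_funcs(self, other):
        # Same two statements as in the diff: only subclasses are accepted.
        assert issubclass(other, self.__class__)
        self.__class__ = other

class Note(Item):
    """Hypothetical subclass that adds one method."""

    def describe(self):
        return f"Note #{self.uid}"

item = Item(uid=7)
item.inherit_funcs(Note)
print(type(item).__name__, item.describe())
```

The instance keeps its attributes (`uid` here) and gains the subclass methods without being re-created. The subclass `__init__` is not run, so any attributes it would normally set are missing.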
......@@ -3,6 +3,7 @@ import re
import requests
import os
import random
from pathlib import Path
from collections import Counter
from pathlib import *
......
......@@ -6,7 +6,7 @@ __all__ = ['GeoIndexer', 'LOCATION_EDGE']
from ...data.schema import *
from ...data.itembase import *
from ...pod.client import PodClient
from ..indexer import IndexerBase, get_indexer_run_data
from ..indexer import IndexerBase, get_indexer_run_data, IndexerData, test_registration
from .. import *
import pycountry, requests
import reverse_geocoder as rg
......@@ -26,7 +26,6 @@ class GeoIndexer(IndexerBase):
country_name = pycountry.countries.get(alpha_2=geo_result["cc"]).name
return city_name, country_name
def get_country_by_name(self, api, name):
data = api.search_by_fields({"_type": "Country", "name": name})
if data is None or data == []: return None
......@@ -49,12 +48,15 @@ class GeoIndexer(IndexerBase):
return latlong, False
def index(self, api, indexer_run):
def get_data(self, api, indexer_run):
items_expanded = [d.expand(api) for d in get_indexer_run_data(api, indexer_run)]
items_with_location = [x for x in items_expanded
if any([loc.latitude is not None for loc in x.location])]
items_with_location = [x for x in items_expanded if any([loc.latitude is not None for loc in x.location])]
print(f"{len(items_with_location)} items found to index")
return IndexerData(items_with_location=items_with_location)
def index(self, data, indexer_run, api=None):
items_with_location = data.items_with_location
print(f"indexing {len(items_with_location)} items")
new_nodes = []
for n, item in enumerate(items_with_location):
......@@ -68,7 +70,7 @@ class GeoIndexer(IndexerBase):
# add information to indexer objects
item.city = city_name
# item.add_property("city", city_name)
country = self.get_country_by_name(api, country_name)
country = self.get_country_by_name(api, country_name) if api is not None else None
if country is None:
country = Country(name=country_name)
......@@ -82,7 +84,7 @@ class GeoIndexer(IndexerBase):
progress = int((n + 1) / len(items_with_location) * 100)
indexer_run.progress=progress
indexer_run.update(api, edges=False)
if api is not None: indexer_run.update(api, edges=False)
# indexer_run.set_progress(api, progress)
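In Python, division binds tighter than addition, so a percentage based on a 1-based item count needs parentheses around `index + 1`. A small sketch, with the hypothetical helper name `progress_percent`:

```python
def progress_percent(index, total):
    """Percentage of items done after finishing the item at 0-based `index`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return int((index + 1) / total * 100)

print([progress_percent(n, 4) for n in range(4)])
```

Without the parentheses, `index + 1 / total * 100` evaluates as `index + (1 / total) * 100`, which does not reach 100 on the last item in general.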
......