---
title: Indexer
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/indexers.indexer.ipynb"
---
{% raw %}

class IndexerBase

IndexerBase(indexerClass=None, *args, **kwargs) :: Indexer

Provides a base class for all items. All items in the schema inherit from this class, and it provides some basic functionality for consistency and to enable easier usage.

{% endraw %} {% raw %}

class IndexerData

IndexerData(**kwargs)

{% endraw %} {% raw %}

get_indexer_run_data

get_indexer_run_data(client, indexer_run)

{% endraw %} {% raw %}

test_registration

test_registration(integrator)

Check whether an integrator is registered. Registration is necessary to load the right indexer when retrieving it from the database.

{% endraw %}

Running your own indexer

Running an indexer takes four steps:

1. Get the indexer and the indexer run, based on the run uid.
2. Retrieve the data to index.
3. Run the indexer.
4. Populate the graph with the new information.

To mock this, we first create a client and add some toy data.
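The four steps can be mocked end to end without a pod. The sketch below is a hypothetical, in-memory stand-in: `InMemoryPod`, `Item`, `UppercaseIndexer` and `run_indexer` are illustrative names, not pyintegrators classes. Only the method names `get_data`, `index` and `populate`, the `indexer` edge, and the `indexerClass`/`targetDataType` properties mirror what this page uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Item:
    """Hypothetical stand-in for an item stored in the pod."""
    uid: int
    type_name: str
    properties: dict = field(default_factory=dict)


class InMemoryPod:
    """Hypothetical in-memory replacement for the pod client."""

    def __init__(self) -> None:
        self.items: Dict[int, Item] = {}
        self.edges: List[Tuple[int, int, str]] = []  # (source uid, target uid, edge name)

    def create(self, item: Item) -> None:
        self.items[item.uid] = item

    def create_edge(self, source: Item, target: Item, name: str) -> None:
        self.edges.append((source.uid, target.uid, name))

    def neighbours(self, source: Item, name: str) -> List[Item]:
        # Follow outgoing edges with the given name.
        return [self.items[t] for s, t, n in self.edges if s == source.uid and n == name]


class UppercaseIndexer:
    """Toy indexer: stores an upper-cased copy of each item's "name" property."""

    def get_data(self, pod: InMemoryPod, run: Item) -> List[Item]:
        wanted = run.properties["targetDataType"]
        return [i for i in pod.items.values() if i.type_name == wanted]

    def index(self, data: List[Item]) -> List[Tuple[Item, dict]]:
        return [(item, {"nameUpper": item.properties["name"].upper()}) for item in data]

    def populate(self, pod: InMemoryPod, outputs: List[Tuple[Item, dict]]) -> None:
        for item, new_properties in outputs:
            item.properties.update(new_properties)


def run_indexer(pod: InMemoryPod, run_uid: int, registry: Dict[str, type]) -> None:
    # 1) Get the run by uid, follow its "indexer" edge, instantiate the class it names.
    run = pod.items[run_uid]
    indexer_item = pod.neighbours(run, "indexer")[0]
    indexer = registry[indexer_item.properties["indexerClass"]]()
    # 2) Retrieve the items of the run's targetDataType.
    data = indexer.get_data(pod, run)
    # 3) Run the indexer.
    outputs = indexer.index(data)
    # 4) Write the results back into the graph.
    indexer.populate(pod, outputs)


pod = InMemoryPod()
person = Item(1, "Person", {"name": "alice"})
indexer_item = Item(2, "Indexer", {"indexerClass": "UppercaseIndexer"})
run = Item(3, "IndexerRun", {"targetDataType": "Person"})
for item in (person, indexer_item, run):
    pod.create(item)
pod.create_edge(run, indexer_item, "indexer")

run_indexer(pod, run_uid=3, registry={"UppercaseIndexer": UppercaseIndexer})
print(person.properties["nameUpper"])  # ALICE
```

Keeping the four steps as separate methods means a test can call each one in isolation, which is exactly what the walkthrough further down does with the real GeoIndexer.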

{% raw %}

run_integrator

run_integrator(environ=None, pod_full_address=None, integrator_run_uid=None, database_key=None, owner_key=None, verbose=False)

Runs an integrator. You can either provide the run settings as parameters to this function (for local testing) or via environment variables (this is how the pod communicates with integrators).
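Both ways of passing the run settings can be reduced to a single lookup step. This is a minimal sketch, assuming a hypothetical `resolve_settings` helper and hypothetical environment-variable names such as `POD_FULL_ADDRESS`; neither is taken from pyintegrators, and the address and keys below are placeholders.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RunSettings:
    pod_full_address: str
    integrator_run_uid: int
    database_key: str
    owner_key: str


# Hypothetical environment-variable names; the names the pod actually sets may differ.
ENV_NAMES = {
    "pod_full_address": "POD_FULL_ADDRESS",
    "integrator_run_uid": "POD_INTEGRATOR_RUN_UID",
    "database_key": "POD_DATABASE_KEY",
    "owner_key": "POD_OWNER_KEY",
}


def resolve_settings(environ: Optional[Mapping[str, str]] = None, **params) -> RunSettings:
    """Read run settings from `environ` when it is given, otherwise from keyword arguments."""
    if environ is not None:
        values = {name: environ.get(var) for name, var in ENV_NAMES.items()}
    else:
        values = {name: params.get(name) for name in ENV_NAMES}
    missing = sorted(name for name, value in values.items() if value is None)
    if missing:
        raise ValueError(f"missing run settings: {missing}")
    # Environment variables are always strings, so normalize the uid to an int.
    values["integrator_run_uid"] = int(values["integrator_run_uid"])
    return RunSettings(**values)


env = {
    "POD_FULL_ADDRESS": "http://localhost:3030",
    "POD_INTEGRATOR_RUN_UID": "9",
    "POD_DATABASE_KEY": "db-key",
    "POD_OWNER_KEY": "owner-key",
}
from_env = resolve_settings(environ=env)
from_args = resolve_settings(
    pod_full_address="http://localhost:3030",
    integrator_run_uid=9,
    database_key="db-key",
    owner_key="owner-key",
)
print(from_env == from_args)  # True
```

Normalizing types inside the resolver means the rest of the run never needs to know whether the settings came from the pod's environment or from a local test call.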

{% endraw %} {% raw %}
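from pyintegrators.indexers.geo.geo_indexer import GeoIndexer
# PodClient and the schema classes used below (Location, Address, Indexer,
# IndexerRun, Edge) are assumed to be imported in an earlier notebook cell.

client = PodClient()

def create_toy_dataset(client):
    # A location (Melbourne, Australia) linked to an empty address,
    # plus a GeoIndexer and an indexer run that targets Address items.
    location = Location.from_data(latitude=-37.81, longitude=144.96)
    address = Address.from_data()
    indexer = Indexer.from_data(indexerClass="GeoIndexer", name="GeoIndexer")
    indexer_run = IndexerRun.from_data(progress=0, targetDataType="Address")

    for x in [location, address, indexer, indexer_run]:
        client.create(x)

    # Call create_edge outside the assert so the edges are still created
    # when Python runs with assertions disabled (-O).
    edge_created = client.create_edge(Edge(indexer_run, indexer, "indexer"))
    assert edge_created
    edge_created = client.create_edge(Edge(location, address, "location"))
    assert edge_created
    return indexer, indexer_run, location, address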
{% endraw %}

Running an indexer by providing environment variables

{% raw %}

generate_test_env

generate_test_env(client, indexer_run)

{% endraw %} {% raw %}
indexer, indexer_run, location, address = create_toy_dataset(client)
{% endraw %} {% raw %}
run_integrator(environ=generate_test_env(client, indexer_run))
Reading run parameters from environment variables
1 items found to index
indexing 1 items
Loading formatted geocoded file...
updating IndexerRun (#4)
creating Country (#None)
updating Address (#2)
updating Country (#5)
updating Address (#2)
{% endraw %} {% raw %}
client.delete_all()
{% endraw %}

Run

Now we start from the setting we would normally have: a Memri client calls the pod to execute an indexer run. Let's start by getting the indexer run from its uid, and then its indexer.

{% raw %}
indexer, indexer_run, location, address = create_toy_dataset(client)
uid = indexer_run.uid; uid
9
{% endraw %} {% raw %}
indexer_run = client.get(uid)
indexer = indexer_run.indexer[0]
indexer
GeoIndexer (#8)
{% endraw %}

Next, we retrieve the data to index. Which items are selected is determined by the targetDataType property of the indexer run (here, Address).
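Conceptually, this step keeps only the items whose type equals the run's targetDataType. The hypothetical sketch below shows that filter on an in-memory list; `Node` and `select_target_items` are illustrative names, and the real `get_data` presumably asks the pod for these items rather than filtering locally.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an item in the pod."""
    uid: int
    type_name: str


def select_target_items(items: Iterable[Node], target_data_type: str) -> List[Node]:
    """Keep only the items whose type matches the run's targetDataType."""
    return [item for item in items if item.type_name == target_data_type]


# Mirrors the toy dataset: one item of each type.
pod_items = [Node(1, "Location"), Node(2, "Address"), Node(3, "Indexer"), Node(4, "IndexerRun")]
to_index = select_target_items(pod_items, target_data_type="Address")
print(to_index)  # [Node(uid=2, type_name='Address')]
```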

{% raw %}
data = indexer.get_data(client, indexer_run)
data
1 items found to index
IndexerData 
{'items_with_location': [Address (#7)]}
{% endraw %} {% raw %}
output_items = indexer.index(data, indexer_run, client)
indexing 1 items
updating IndexerRun (#9)
{% endraw %} {% raw %}
indexer.populate(client, output_items)
creating Country (#None)
updating Address (#7)
{% endraw %} {% raw %}
client.delete_all()
{% endraw %}

Running the full Indexer pipeline

Running an indexer by passing the run settings as function arguments

{% raw %}
indexer, indexer_run, location, address = create_toy_dataset(client)
run_integrator(pod_full_address=DEFAULT_POD_ADDRESS,
               integrator_run_uid=indexer_run.uid,
               database_key=client.database_key,
               owner_key=client.owner_key)

client.delete_all()
1 items found to index
indexing 1 items
updating IndexerRun (#14)
creating Country (#None)
updating Address (#12)
updating Country (#15)
updating Address (#12)
{% endraw %}

Registration

All indexers need to be registered before they can be run. We can test the registration as follows:

{% raw %}
test_registration(GeoIndexer)
{% endraw %}

{% include important.html content='An indexer must be registered before it can be run. You can do this by importing its module in integrators.indexer_registry.py.' %}
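Registration by import works because importing a module executes its class definitions. One common way to turn that into registration is a decorator that records each class in a module-level dict, which is then used to map the `indexerClass` string stored in the database back to a Python class. The sketch below assumes that pattern; `INDEXER_REGISTRY`, `register`, `load_indexer`, `check_registration` and `ToyIndexer` are hypothetical names, and pyintegrators' actual mechanism may differ.

```python
from typing import Dict

# Hypothetical registry mapping the class name stored in an Indexer item's
# indexerClass property to the Python class that implements it.
INDEXER_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator: record `cls` under its class name when its module is imported."""
    INDEXER_REGISTRY[cls.__name__] = cls
    return cls


def load_indexer(indexer_class: str):
    """Instantiate the indexer named by the indexerClass value read from the database."""
    if indexer_class not in INDEXER_REGISTRY:
        raise LookupError(f"{indexer_class} is not registered")
    return INDEXER_REGISTRY[indexer_class]()


def check_registration(cls: type) -> None:
    """Rough analogue of test_registration: fail if `cls` cannot be found by name."""
    assert INDEXER_REGISTRY.get(cls.__name__) is cls, f"{cls.__name__} is not registered"


@register
class ToyIndexer:
    pass


class ForgottenIndexer:  # never registered
    pass


check_registration(ToyIndexer)
print(type(load_indexer("ToyIndexer")).__name__)  # ToyIndexer
```

Looking up `ForgottenIndexer` raises `LookupError`, which is the failure mode registration is meant to prevent: the run exists in the database, but no Python class can be found for it.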