Onegov Search API

Integration

class onegov.search.integration.TolerantTransport(*args, **kwargs)[source]

A transport class that is less eager to rejoin connections when there’s a failure. Additionally logs all Elasticsearch transport errors in one location.

property skip_request

Returns True if the request should be skipped.

property seconds_remaining

Returns the seconds remaining until the next try or 0.

For each failure we wait an additional 10s (10s, then 20s, 30s, etc), up to a maximum of 300s (5 minutes).
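
The waiting rule above amounts to a linear backoff with a cap. A minimal sketch, assuming the transport tracks a failure count and the time of the last failure (the function names and arguments are illustrative and not part of onegov.search):

```python
from datetime import datetime, timedelta


def backoff_seconds(failures, step=10, maximum=300):
    """Total wait after the given number of consecutive failures."""
    return min(failures * step, maximum)


def seconds_remaining(failures, last_failure, now):
    """Seconds left until the next try, or 0 if a try is allowed."""
    if failures == 0:
        return 0
    elapsed = (now - last_failure).total_seconds()
    return max(0, int(backoff_seconds(failures) - elapsed))


last_failure = datetime(2024, 1, 1, 12, 0, 0)
now = last_failure + timedelta(seconds=5)

# Two failures mean a 20s wait, of which 5s have already passed.
remaining = seconds_remaining(2, last_failure, now)
```

In such a design, skip_request would presumably be True for as long as the remaining time is greater than 0.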

perform_request(*args, **kwargs)[source]

Perform the actual request. Retrieve a connection from the connection pool, pass all the information to its perform_request method and return the data.

If an exception was raised, mark the connection as failed and retry (up to max_retries times).

If the operation was successful and the connection used was previously marked as dead, mark it as live, resetting its failure count.

Parameters
  • method – HTTP method to use

  • url – absolute url (without host) to target

  • headers – dictionary of headers, will be handed over to the underlying Connection class

  • params – dictionary of query parameters, will be handed over to the underlying Connection class for serialization

  • body – body of the request, will be serialized using serializer and passed to the connection

class onegov.search.integration.ElasticsearchApp[source]

Provides elasticsearch integration for onegov.core.framework.Framework based applications.

The application must be connected to a database.

Usage:

from onegov.core import Framework
from onegov.search.integration import ElasticsearchApp

class MyApp(Framework, ElasticsearchApp):
    pass

Configures the elasticsearch client, leaving it as a property on the class:

app.es_client

The following configuration options are accepted:

enable_elasticsearch

If True, elasticsearch is enabled (defaults to True).

elasticsearch_hosts

A list of elasticsearch clusters, including username, password, protocol and port.

For example: https://user:secret@localhost:443

By default the client connects to the localhost on port 9200 (the default), and on port 19200 (the default of boxen).

At least one host in the list of servers must be up at startup.

elasticsearch_max_queue_size

The maximum queue size reserved for documents to be indexed. This queue fills up if the elasticsearch cluster cannot be reached.

Once the queue is full, warnings are emitted.

Defaults to 10’000

elasticsearch_verify_certs

If true, the elasticsearch client verifies the certificates of the ssl connection. Defaults to true. Do not disable, unless you are in testing!

elasticsearch_languages

The languages supported by onegov.search. Defaults to:
  • en

  • de

  • fr
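
As a rough illustration of the options above, the settings could be collected in a mapping like the one below. This is only a sketch: the lowercase spelling of the keys and the use of a plain dict are assumptions, and how the values reach the application depends on the deployment. The host entry is the example URL from the text, split with the standard library to show the parts it carries.

```python
from urllib.parse import urlsplit

# Hypothetical configuration mapping using the documented option names.
search_config = {
    'enable_elasticsearch': True,
    'elasticsearch_hosts': ['https://user:secret@localhost:443'],
    'elasticsearch_verify_certs': True,
    'elasticsearch_languages': ['en', 'de', 'fr'],
}

# Each host entry carries protocol, username, password, host and port.
host = urlsplit(search_config['elasticsearch_hosts'][0])
parts = (host.scheme, host.username, host.password, host.hostname, host.port)
```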

Returns a search scoped to the current application, with the given languages, types and private documents excluded by default.

es_search_by_request(request, types='*', explain=False, limit_to_request_language=False)[source]

Takes the current CoreRequest and returns an Elasticsearch search scoped to the current application, the request’s language and its access rights.

es_suggestions(query, languages='*', types='*', include_private=False)[source]

Returns suggestions for the given query.

es_suggestions_by_request(request, query, types='*', limit_to_request_language=False)[source]

Returns suggestions for the given query, scoped to the language and the login status of the given request.

Returns True if the given request is allowed to access private search results. By default every logged in user has access to those.

This method may be overridden if this is not desired.

es_perform_reindex()[source]

Reindexes all content.

This is a heavy operation and should be run with consideration.

Indexer

class onegov.search.indexer.IndexParts(hostname, schema, language, type_name, version)

property hostname

Alias for field number 0

property language

Alias for field number 2

property schema

Alias for field number 1

property type_name

Alias for field number 3

property version

Alias for field number 4

onegov.search.indexer.parse_index_name(index_name)[source]

Takes the given index name and returns the hostname, schema, language, type_name and version in a dictionary.

  • If the index_name doesn’t match the pattern, all values are None.

  • If the index_name has no version, the version is None.
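
The two rules above can be sketched with a regular expression and a named tuple whose fields follow the order documented for IndexParts. The hyphen-separated pattern is an assumption made for illustration (the actual pattern is not given here), and the function name is hypothetical:

```python
import re
from collections import namedtuple

IndexParts = namedtuple(
    'IndexParts', ('hostname', 'schema', 'language', 'type_name', 'version'))

# Assumed layout: hostname-schema-language-type_name[-version]
INDEX_NAME = re.compile(
    r'^(?P<hostname>[^-]+)'
    r'-(?P<schema>[^-]+)'
    r'-(?P<language>[^-]+)'
    r'-(?P<type_name>[^-]+)'
    r'(?:-(?P<version>[^-]+))?$'
)


def parse_index_name_sketch(index_name):
    match = INDEX_NAME.match(index_name)

    if match is None:
        # No match: every value is None.
        return IndexParts(None, None, None, None, None)

    # An absent optional group yields None, so version is None if missing.
    return IndexParts(**match.groupdict())


versioned = parse_index_name_sketch('example_org-myschema-de-pages-abc123')
unversioned = parse_index_name_sketch('example_org-myschema-de-pages')
unmatched = parse_index_name_sketch('nonsense')
```

Calling `_asdict()` on the result turns it into a dictionary keyed by field name.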

class onegov.search.indexer.Indexer(mappings, queue, es_client, hostname=None)[source]

Takes actions from a queue and executes them on the elasticsearch cluster. Depends on IndexManager for index management and expects to have the same TypeRegistry as ORMEventTranslator.

The idea is that this class does the indexing/deindexing, the index manager sets up the indices and the orm event translator listens for changes in the ORM.

A queue is used so the indexer can be run in a separate thread.

process(block=False, timeout=None)[source]

Processes the queue until it is empty or until there’s an error.

If there’s an error, the next call to this function will try to execute the failed task again. This is mainly meant for elasticsearch outages.

block

If True, the process waits for the queue to be available. Useful if you run this in a separate thread.

timeout

How long the blocking call should block. Has no effect if block is False.

Returns

The number of successfully processed items
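
The retry behaviour described for process can be pictured with a small stand-in class. This is a sketch under stated assumptions, not the Indexer itself: the class name, the execute callback and the failed_task attribute are hypothetical, and only the standard library queue is used.

```python
import queue


class QueueProcessorSketch:

    def __init__(self, tasks, execute):
        self.tasks = tasks          # a queue.Queue of pending tasks
        self.execute = execute      # callable that performs one task
        self.failed_task = None     # task to retry on the next call

    def process(self, block=False, timeout=None):
        processed = 0

        while True:
            if self.failed_task is not None:
                # Retry the task that failed during the previous call.
                task = self.failed_task
            else:
                try:
                    task = self.tasks.get(block, timeout)
                except queue.Empty:
                    return processed

            try:
                self.execute(task)
            except Exception:
                # Keep the task and stop, e.g. during an outage.
                self.failed_task = task
                return processed

            self.failed_task = None
            processed += 1
```

With block=True and no timeout, the call waits until a task arrives, which is the mode suited to a dedicated thread.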

bulk_process()[source]

Processes the queue in bulk. This offers better performance but it is less safe at the moment and should only be used as part of reindexing.

class onegov.search.indexer.IndexManager(hostname, es_client)[source]

Manages the creation/destruction of indices. The indices it creates have an internal name and an external alias. To facilitate that, versions are used.

query_indices()[source]

Queries the elasticsearch cluster for indices belonging to this hostname.

query_aliases()[source]

Queries the elasticsearch cluster for aliases belonging to this hostname.

ensure_index(schema, language, mapping, return_index='external')[source]

Takes the given database schema, language and type name and creates an internal index with a version number and an external alias without the version number.

schema

The database schema this index is based on.

language

The language in ISO 639-1 format.

mapping

The TypeMapping mapping used in this index.

return_index

The index name to return. Either ‘external’ or ‘internal’.

Returns

The (external/aliased) name of the created index.

remove_expired_indices(current_mappings)[source]

Removes all expired indices. An index is expired if its version number is no longer known in the current mappings.

Returns

The number of indices that were deleted.
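
The expiry rule can be sketched as a simple filter. The example assumes that the version is the last hyphen-separated part of an internal index name and that the current mappings can be reduced to a set of known versions; both are assumptions, as is the function name.

```python
def expired_indices_sketch(internal_names, current_versions):
    """Return the internal index names whose version is no longer known."""
    expired = []

    for name in internal_names:
        # Assumed layout: the version is the part after the last hyphen.
        version = name.rsplit('-', 1)[-1]

        if version not in current_versions:
            expired.append(name)

    return expired


existing = [
    'example_org-myschema-de-pages-v1',
    'example_org-myschema-de-pages-v2',
    'example_org-myschema-fr-pages-v2',
]
to_delete = expired_indices_sketch(existing, current_versions={'v2'})
```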

get_managed_indices_wildcard(schema)[source]

Returns a wildcard index name for all indices managed.

get_external_index_names(schema, languages='*', types='*')[source]

Returns a comma separated string of external index names that match the given arguments. Useful to pass on to elasticsearch when targeting multiple indices.

get_external_index_name(schema, language, type_name)[source]

Generates the external index name from the given parameters.

get_internal_index_name(schema, language, type_name, version)[source]

Generates the internal index name from the given parameters.
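
To make the relationship between external aliases and internal names concrete, here is a sketch that builds both. It assumes the parts are joined with hyphens and takes the hostname as an explicit argument because it has no IndexManager instance to read it from; the function names are hypothetical.

```python
def external_index_name_sketch(hostname, schema, language, type_name):
    # The external alias carries no version.
    return '-'.join((hostname, schema, language, type_name))


def internal_index_name_sketch(hostname, schema, language, type_name,
                               version):
    # The internal name is the external name plus the version.
    external = external_index_name_sketch(
        hostname, schema, language, type_name)
    return '-'.join((external, version))


def external_index_names_sketch(hostname, schema, languages, types):
    # One comma-separated string, usable to target several indices at once.
    return ','.join(
        external_index_name_sketch(hostname, schema, language, type_name)
        for language in languages
        for type_name in types
    )


alias = external_index_name_sketch('example_org', 'myschema', 'de', 'pages')
internal = internal_index_name_sketch(
    'example_org', 'myschema', 'de', 'pages', 'v2')
targets = external_index_names_sketch(
    'example_org', 'myschema', ('de', 'fr'), ('*',))
```

Passing '*' for a part relies on Elasticsearch accepting wildcards in index names.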

class onegov.search.indexer.ORMLanguageDetector(supported_languages)[source]

class onegov.search.indexer.ORMEventTranslator(mappings, max_queue_size=0, languages=('de', 'fr', 'en'))[source]

Handles the onegov.core orm events, translates them into indexing actions and puts the result into a queue for the indexer to consume.

The queue may be limited. Once the limit is reached, new events are no longer processed and an error is logged.
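
The bounded-queue behaviour can be sketched with the standard library, where a maxsize of 0 means the queue is unlimited (matching the default of max_queue_size=0). The helper name and logger name are hypothetical:

```python
import logging
import queue

log = logging.getLogger('search.sketch')


def put_event_sketch(event_queue, action):
    """Queue an indexing action, or log an error if the queue is full."""
    try:
        event_queue.put_nowait(action)
    except queue.Full:
        log.error('The event queue is full, dropping %r', action)
        return False

    return True


limited = queue.Queue(maxsize=2)
accepted = [put_event_sketch(limited, n) for n in range(3)]

unlimited = queue.Queue(maxsize=0)
always_accepted = all(put_event_sketch(unlimited, n) for n in range(100))
```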

Mixins

class onegov.search.mixins.Searchable[source]

Defines the interface required for an object to be searchable.

Note that es_id, es_properties and es_type_name must be class properties, not instance properties. So do this:

class X(Searchable):

    es_properties = {}
    es_type_name = 'x'

But do not do this:

class X(Searchable):

    @property
    def es_properties(self):
        return {}

    @property
    def es_type_name(self):
        return 'x'

The rest of the properties may be normal properties.
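
The difference between the two forms shows up when the values are read from the class itself rather than from an instance, which is presumably what the indexing machinery needs to do. That motivation is an assumption, and the classes below are plain stand-ins that do not inherit from Searchable:

```python
class WithClassAttributes:
    es_properties = {}
    es_type_name = 'x'


class WithInstanceProperties:

    @property
    def es_type_name(self):
        return 'x'


# Readable without creating an instance.
class_level = WithClassAttributes.es_type_name

# Without an instance, only the property object itself is available.
property_object = WithInstanceProperties.es_type_name
is_plain_string = isinstance(property_object, str)
```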

Polymorphic Identities

If SQLAlchemy’s Polymorphic Identities are used, each identity must have its own unique es_type_name. Though such models may share the es_properties from the base class, we don’t assume anything and store each polymorphic identity in its own index.

From the point of view of elasticsearch, each different polymorphic identity is a completely different model.

property es_language

Defines the language of the object. By default ‘auto’ is used, which triggers automatic language detection. Automatic language detection is reasonably accurate if provided with enough text. Short texts are not detected easily.

When ‘auto’ is used, expect some content to be misclassified. You should then search over all languages, not just the expected one.

This property can be used to manually set the language.

property es_public

Returns True if the model is available to be found by the public. If false, only editors/admins will see this object in the search results.

property es_skip

Returns True if the indexing of this specific model instance should be skipped.

property es_suggestion

Returns the suggest-as-you-type value of the document. The field used for this property should also be indexed, or the suggestion will lead nowhere.

If a single string is returned, the completion input equals the completion output. (My Title -> My Title)

If an array of strings is returned, all values are possible inputs and the first value is the output. (My Title/Title My -> My Title)
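
The two return forms can be normalized as follows. This is only a sketch of the rule stated above; the helper name and the shape of the returned dictionary are illustrative, not the structure onegov.search sends to elasticsearch.

```python
def completion_parts_sketch(suggestion):
    """Split an es_suggestion value into possible inputs and one output."""
    if isinstance(suggestion, str):
        inputs = [suggestion]
    else:
        inputs = list(suggestion)

    # The first value is always the output shown to the user.
    return {'input': inputs, 'output': inputs[0]}


single = completion_parts_sketch('My Title')
multiple = completion_parts_sketch(['My Title', 'Title My'])
```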

property es_last_change

Returns the date the document was created/last modified.

property es_tags

Returns a list of tags associated with this content.

class onegov.search.mixins.ORMSearchable[source]

Extends the default Searchable class with sensible defaults for SQLAlchemy orm models.

property es_last_change

Returns the date the document was created/last modified.

class onegov.search.mixins.SearchableContent[source]

Adds search to all classes using the core’s content mixin: onegov.core.orm.mixins.content.ContentMixin

property es_public

Returns True if the model is available to be found by the public. If false, only editors/admins will see this object in the search results.

property es_tags

Returns a list of tags associated with this content.

DSL

class onegov.search.dsl.Search(*args, **kwargs)[source]

Extends elasticsearch_dsl’s search object with ORM integration.

Works exactly the same as the original, but the results it returns offer additional methods to query the SQLAlchemy models behind the results (if any).

class onegov.search.dsl.Response(search, response, doc_class=None)[source]

Extends the default response (list of results) with additional methods to query the SQLAlchemy models behind the results.

query(type)[source]

Returns an SQLAlchemy query for the given type. You must provide a type, because a query can’t consist of multiple unrelated tables.

If no results match the type, None is returned.

load()[source]

Loads all results by querying the SQLAlchemy session in the order they were returned by elasticsearch.

Note that the resulting lists may include None values, since we might get elasticsearch results for which we do not have a model in the database (the data is then out of sync).
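
The ordering and the None placeholders can be illustrated without a database. In this sketch a plain dictionary stands in for the SQLAlchemy session, and both helper names are hypothetical:

```python
def load_in_result_order_sketch(result_ids, models_by_id):
    """Return models in the order of the search results.

    Results without a matching model yield None.
    """
    return [models_by_id.get(result_id) for result_id in result_ids]


def drop_missing_sketch(loaded):
    """Remove the None placeholders left by out-of-sync results."""
    return [model for model in loaded if model is not None]


models_by_id = {1: 'Page One', 3: 'Page Three'}
loaded = load_in_result_order_sketch([3, 2, 1], models_by_id)
usable = drop_missing_sketch(loaded)
```

Callers that render results would typically filter out the None values first, as drop_missing_sketch does.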

onegov.search.dsl.explanation_value(explanation, text)[source]

Gets the value from the explanation for descriptions starting with the given text.

class onegov.search.dsl.Hit(document)[source]

Extends a single result with additional methods to query the SQLAlchemy models behind the results.

query()[source]

Returns the SQLAlchemy query for this result.

load()[source]

Loads this result from the SQLAlchemy session.

Utils

onegov.search.utils.searchable_sqlalchemy_models(base)[source]

Searches through the given SQLAlchemy base and returns the classes of all SQLAlchemy models found which inherit from the onegov.search.mixins.Searchable interface.

onegov.search.utils.is_valid_index_name(name)[source]

Checks if the given name is a valid elasticsearch index name. Elasticsearch does its own checks, but we can do it earlier and we are a bit stricter.

onegov.search.utils.related_types(model)[source]

Gathers all related es type names from the given model. A type is counted as related if the model is part of a polymorphic setup.

If no polymorphic identity is found, the result is simply a set with the model’s type itself.

class onegov.search.utils.LanguageDetector(supported_languages)[source]

Detects languages with the help of langdetect.

Unlike langdetect this detector may be limited to a subset of all supported languages, which may improve accuracy if the subset is known and saves some memory.