Kolibri Documentation (Archive / Deprecated)

Kolibri - The Execution Engine that loves E-Commerce Search

Kolibri is the German word for hummingbird. I picked it as the project name to reflect the general aim of doing many small things fast. This still describes the batch processing logic quite well, although the overall concept has grown broader. Kolibri is built in Scala on top of Akka, whose feature set makes it suitable for a range of tasks: cluster orchestration, flexible and efficient execution definitions with akka-streams, (distributed/sharded) state keeping, or combinations of these.

The current focus is on batched, clustered multi-node execution with central supervision. The intention is to iteratively provide functionality commonly needed in (e-commerce) search in a scalable (e.g. multi-node) and efficient manner, such as

  • search result evaluations (judgement-list based)
  • efficient indexing
  • load-testing

The first use case provided is the batched grid evaluation of search result ordering based on judgement lists. It varies the involved parameters (URL parameters, request bodies, headers) and supports writing partial results, grouping and aggregating selected partial results, and analysing single results or groups, for example to determine which queries improve or decline under specific settings.
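
The idea of a judgement-based grid evaluation can be sketched roughly as follows. This is an illustration in Python, not Kolibri's Scala API: the function names (`expand_grid`, `dcg_at_k`, `ndcg_at_k`) and the example parameter names are hypothetical, and NDCG is used only as one common example of a judgement-based metric, since the text above does not name a specific one.

```python
from itertools import product
from math import log2


def expand_grid(params):
    """Return every combination of the given parameter values as a dict."""
    names = sorted(params)
    return [dict(zip(names, combo))
            for combo in product(*(params[name] for name in names))]


def dcg_at_k(result_ids, judgements, k):
    """Discounted cumulative gain of the first k results.

    Results without a judgement contribute 0.
    """
    return sum(judgements.get(pid, 0.0) / log2(rank + 2)
               for rank, pid in enumerate(result_ids[:k]))


def ndcg_at_k(result_ids, judgements, k):
    """DCG normalised by the DCG of the ideal ordering of the judged items."""
    ideal = sorted(judgements.values(), reverse=True)[:k]
    ideal_dcg = sum(j / log2(rank + 2) for rank, j in enumerate(ideal))
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(result_ids, judgements, k) / ideal_dcg
```

Each dict returned by `expand_grid` would correspond to one request variant. The result list returned for that variant is then scored against the judgement list of the query, which yields one metric value per query and parameter combination.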

The individual libraries are briefly described below.

Kolibri DataTypes

This library contains basic datatypes to simplify common tasks in batch processing and async state keeping.

Kolibri-DataTypes on GitHub

Kolibri-DataTypes on Maven Central

Kolibri Base

Kolibri Base provides a clusterable multi-node batch execution setup. Batch definitions make use of Akka-Streams, which allows flexible execution flows to be defined. Results are aggregated per batch and, on demand, combined into an overall result.

Features include:

  • Cluster forming / node discovery
  • Definition of datasets
  • Mechanisms to split those sets into smaller batches
  • Distribution of individual batches across the nodes, including state handling and collection of partial results
  • Definition of expectations per batch, including the maximum allowed runtime per batch and per batch element, and the fraction of failed executions at which a batch is considered failed
  • Retry mechanism on batch failure
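
How batch splitting, per-batch expectations and retries fit together can be sketched as below. This is a single-process illustration in Python under stated assumptions, not Kolibri's actual Scala/Akka implementation: all names (`BatchExpectation`, `split_into_batches`, `run_batch`, `run_with_retries`) are hypothetical, the runtime limit is only checked after the batch has finished, and per-element runtime limits are left out for brevity.

```python
import time
from dataclasses import dataclass


@dataclass
class BatchExpectation:
    max_failed_fraction: float  # batch fails if a larger share of elements fails
    max_batch_seconds: float    # batch fails if it takes longer than this


def split_into_batches(dataset, batch_size):
    """Split a list into consecutive batches of at most batch_size elements."""
    return [dataset[i:i + batch_size]
            for i in range(0, len(dataset), batch_size)]


def run_batch(batch, process, expectation):
    """Process all elements and return (results, expectation_met)."""
    if not batch:
        return [], True
    start = time.monotonic()
    results, failures = [], 0
    for element in batch:
        try:
            results.append(process(element))
        except Exception:
            failures += 1  # count the failure, keep processing the batch
    elapsed = time.monotonic() - start
    met = (failures / len(batch) <= expectation.max_failed_fraction
           and elapsed <= expectation.max_batch_seconds)
    return results, met


def run_with_retries(batch, process, expectation, max_retries):
    """Re-run a failed batch up to max_retries times; None if it never passes."""
    for _ in range(max_retries + 1):
        results, met = run_batch(batch, process, expectation)
        if met:
            return results
    return None
```

In a clustered setup the batches would be handed to different nodes and the partial results collected centrally. The sketch only shows the per-batch decision of whether a run counts as failed and should be retried.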

Use case job definitions include:

  • Search parameter grid evaluation with flexible tagging based on the request (e.g. by request parameter), the result (e.g. size of the result set or other characteristics of the search response) or the MetricRow result. Tagging allows results to be separated into distinct aggregations based on the concept a tag represents.
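
Tag-based aggregation can be pictured with the following minimal sketch. It is written in Python for illustration and does not reflect Kolibri's Scala types: the record layout (`params`, `result_size`, `metric`) and the function names are assumptions, and a plain average stands in for whatever aggregation a MetricRow actually performs.

```python
from collections import defaultdict


def tag_by_parameter(name):
    """Build a tagger that tags a record by the value of one request parameter."""
    return lambda record: f"{name}={record['params'].get(name)}"


def tag_by_result_size(record):
    """Tag a record by a characteristic of the search response."""
    return "empty" if record["result_size"] == 0 else "non-empty"


def aggregate_by_tag(records, tagger):
    """Average the metric value of all records that share a tag."""
    groups = defaultdict(list)
    for record in records:
        groups[tagger(record)].append(record["metric"])
    return {tag: sum(values) / len(values) for tag, values in groups.items()}
```

Passing a different tagger to `aggregate_by_tag` regroups the same records, so one evaluation run can be viewed per query, per result-set characteristic, or per any other concept a tag represents.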

Kolibri-Base on GitHub

Kolibri-Base on Maven Central

Kolibri-Base on DockerHub

Kolibri Watch (UI)

Kolibri Watch provides a UI for the Kolibri project. It allows monitoring the progress of job executions, defining executions and submitting them to the Kolibri backend.

Kolibri-Watch on GitHub

Kolibri-Watch on DockerHub