Configure / Run

After checking out the repo, you will find an example docker-compose.yml file in it. It contains an example setup of prometheus, grafana, kolibri-fleet-zio instances, dummy search service instances (response juggler) and kolibri-watch (the kolibri UI).

Why these?

Prometheus for pulling metrics from the kolibri service
Grafana for displaying the dashboard representing the service state
kolibri-fleet-zio: the actual service, which takes care of computations and state keeping. It needs to run on every node that is intended to take part in the distributed processing
response-juggler: simulating responses by a search system. Used to test request and parsing tasks defined in kolibri-fleet-zio
kolibri-watch: provides the UI to monitor available nodes and their resource consumption / utilization, plus controls to load job templates, create new ones, store job definitions and start / stop the processing of jobs

We will first look at the configurations that need setting before being able to start up the service, focussing on kolibri-fleet-zio:

  kolibri-zio-1:
    image: awagen/kolibri-fleet-zio:0.2.0
    cpu_count: 12
    mem_limit: 6144m
    mem_reservation: 4096m
    ports:
      - "8001:8001"
    user: "1000:1000"
    environment:
      JVM_OPTS: >
        -XX:+UseParallelGC
        -XX:ParallelGCThreads=2
        -Xms4096m
        -Xmx4096m        
      PROFILE: prod
      NODE_HASH: "abc1"
      HTTP_SERVER_PORT: 8001
      RUNNING_TASK_PER_JOB_MAX_COUNT: 20
      RUNNING_TASK_PER_JOB_DEFAULT_COUNT: 3
      MAX_NR_BATCH_RETRIES: 2
      PERSISTENCE_MODE: 'CLASS'
      PERSISTENCE_MODULE_CLASS: 'de.awagen.kolibri.fleet.zio.config.di.modules.persistence.LocalPersistenceModule'
      AWS_PROFILE: 'developer'
      AWS_S3_BUCKET: 'kolibri-dev'
      AWS_S3_PATH: 'kolibri_fleet_zio_test'
      AWS_S3_REGION: 'EU_CENTRAL_1'
      # the file paths in the job definitions are to be given relative to the path (or bucket path) defined
      # for the respective persistence configuration
      LOCAL_STORAGE_WRITE_BASE_PATH: '/app/test-files'
      LOCAL_STORAGE_READ_BASE_PATH: '/app/test-files'
      # JOB_TEMPLATES_PATH must be relative to the base path or bucket path, depending on the persistence selected
      JOB_TEMPLATES_PATH: 'templates/jobs'
      OUTPUT_RESULTS_PATH: 'test-results'
      JUDGEMENT_FILE_SOURCE_TYPE: 'CSV'
      # if judgement file format set to 'JSON_LINES', need to set 'DOUBLE' in case judgements are numeric in the json,
      # if the numeric value is represented as string, use 'STRING'. This purely refers to how the json value is 
      # interpreted, later this will be cast to double either way
      JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST: 'STRING'
      ALLOWED_TIME_PER_ELEMENT_IN_MILLIS: 4000
      ALLOWED_TIME_PER_BATCH_IN_SECONDS: 3600
      ALLOWED_TIME_PER_JOB_IN_SECONDS: 36000
      MAX_RESOURCE_DIRECTIVES_LOAD_TIME_IN_MINUTES: 10
      MAX_PARALLEL_ITEMS_PER_BATCH: 16
      CONNECTION_POOL_SIZE_MIN: 100
      CONNECTION_POOL_SIZE_MAX: 100
      CONNECTION_TTL_IN_SECONDS: 1200
      MAX_NR_JOBS_PROCESSING: 5
      MAX_NR_JOBS_CLAIMED: 5
      NETTY_HTTP_CLIENT_THREADS_MAX: 4
      BLOCKING_POOL_THREADS: 4
      NON_BLOCKING_POOL_THREADS: 4
    volumes:
      - ./tmp_data:/app/test-files
      - ${HOME}/.aws/credentials:/home/kolibri/.aws/credentials:ro

Configuration Options in Detail

General Setup Settings
PROFILE	The config file suffix for the config to be loaded on startup. Will try to find application-[PROFILE].conf in the resource folder.
NODE_HASH	The hash that identifies this specific node. If not set, will randomly set a hash. Note: nodes are identified by the node_hash, so it should be a unique identifier.
HTTP_SERVER_PORT	Port to reach the kolibri-fleet-zio API under.
ALLOWED_TIME_PER_ELEMENT_IN_MILLIS	Carried over from the initial job definitions for search evaluation, which can still be used as a format. This attribute currently has no effect and will be removed.
ALLOWED_TIME_PER_BATCH_IN_SECONDS	Carried over from the initial job definitions for search evaluation, which can still be used as a format. This attribute currently has no effect and will be removed.
ALLOWED_TIME_PER_JOB_IN_SECONDS	Carried over from the initial job definitions for search evaluation, which can still be used as a format. This attribute currently has no effect and will be removed.
MAX_RESOURCE_DIRECTIVES_LOAD_TIME_IN_MINUTES	Defines how much time loading of a global node-resource (such as judgement lists, parameters and the like) is allowed to take.
MAX_PARALLEL_ITEMS_PER_BATCH	Defines how many items are processed in parallel per batch at any given time.
MAX_NR_JOBS_PROCESSING	Defines the maximal number of batches that are allowed in progress per node at any given time.
MAX_NR_JOBS_CLAIMED	Defines the maximal number of batches that can be claimed for execution at any given time.
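
Since kolibri-fleet-zio runs on every participating node and nodes are told apart by `NODE_HASH`, an additional node could be sketched roughly as below. The service name `kolibri-zio-2`, the host port 8002 and the hash value "abc2" are illustrative assumptions, not taken from the shipped docker-compose.yml. Mounting the same data folder rests on the assumption that nodes share one storage location.

```yaml
  # Hypothetical second node (name, host port and NODE_HASH are assumptions)
  kolibri-zio-2:
    image: awagen/kolibri-fleet-zio:0.2.0
    ports:
      - "8002:8001"            # host port 8002 -> container port 8001
    user: "1000:1000"
    environment:
      PROFILE: prod
      NODE_HASH: "abc2"        # must differ from "abc1" used by kolibri-zio-1
      HTTP_SERVER_PORT: 8001   # port inside the container, same as for node 1
      # remaining settings (persistence, pools, limits) as for kolibri-zio-1
    volumes:
      - ./tmp_data:/app/test-files   # assumption: same storage as node 1
```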

Storage configuration
PERSISTENCE_MODE	The persistence mode used. Can be: AWS (s3), GCP (gcs), LOCAL (local file system), RESOURCE (local resources) or CLASS (if selected, the `PERSISTENCE_MODULE_CLASS` property must be defined, specifying the fully qualified name of the persistence module class to use).
PERSISTENCE_MODULE_CLASS	If `PERSISTENCE_MODE` is set to `CLASS`, set here the fully qualified name of the persistence module class to use, such as `de.awagen.kolibri.fleet.zio.config.di.modules.persistence.LocalPersistenceModule` (which happens to refer to the same persistence module as just specifying PERSISTENCE_MODE as LOCAL).
AWS_PROFILE	If `PERSISTENCE_MODE` is `AWS` (or `CLASS` and the AWS module is referenced above), specify here the profile to use.
AWS_S3_BUCKET	If AWS storage is used, define here the bucket to store the state / result data in.
AWS_S3_PATH	If AWS storage is used, define here the path within the above defined bucket to use as base path.
AWS_S3_REGION	If AWS storage is used, define the region here.
GCP_GS_BUCKET	If GCP storage is used, define here the bucket to store the state / result data in.
GCP_GS_PATH	If GCP storage is used, define here the path within the above defined bucket to use as base path.
GCP_GS_PROJECT_ID	If GCP storage is used, define here the project id under which you created the bucket.
LOCAL_STORAGE_WRITE_BASE_PATH	If LOCAL storage is used, define the base path here under which to store the data.
LOCAL_STORAGE_READ_BASE_PATH	If LOCAL storage is used, define the base path here from which data is read (should usually be the same as the write base path).
JOB_TEMPLATES_PATH	Relative subpath (relative to the defined base paths) under which job templates are found / stored.
OUTPUT_RESULTS_PATH	Relative subpath (relative to the defined base paths) under which results are persisted.
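
To illustrate how these settings combine, the following environment fragment sketches a switch from the local persistence of the example to S3. The AWS values are the ones already present in the example above. The resolved locations in the comments follow from the description of the relative subpaths and are an expectation, not verified output.

```yaml
    environment:
      PERSISTENCE_MODE: 'AWS'
      AWS_PROFILE: 'developer'
      AWS_S3_BUCKET: 'kolibri-dev'
      AWS_S3_PATH: 'kolibri_fleet_zio_test'
      AWS_S3_REGION: 'EU_CENTRAL_1'
      # relative to the bucket path, so templates are expected under
      # kolibri-dev/kolibri_fleet_zio_test/templates/jobs
      JOB_TEMPLATES_PATH: 'templates/jobs'
      # results are expected under kolibri-dev/kolibri_fleet_zio_test/test-results
      OUTPUT_RESULTS_PATH: 'test-results'
```

With the local setup from the example, the same relative subpaths would instead resolve against `/app/test-files`, for instance `/app/test-files/templates/jobs`.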

Judgement file format configuration
JUDGEMENT_FILE_SOURCE_TYPE	Type of the utilized judgement file. Possible values: `CSV` (per line: query, product and judgement score each separated by the configured delimiter (see below)) or `JSON_LINES` (one line per query in format `{"query": "q2", "products": [{"productId": "aa", "score": 0.30}, {"productId": "bb", "score": 0.11}]}`)
JUDGEMENT_FILE_COLUMN_DELIMITER	Delimiter used if the `JUDGEMENT_FILE_SOURCE_TYPE` is set to `CSV`. Default value: `\u0000`
JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST	If `JUDGEMENT_FILE_SOURCE_TYPE` is set to `JSON_LINES`, defines how to parse the `score` attribute from the above format. Options: `STRING` (if the number is wrapped in string delimiters; this was initially only a workaround) or `DOUBLE` (if the `score` is a number in the used json).
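
As a small sketch, a judgement file in the `JSON_LINES` format shown above, where scores are plain JSON numbers, would pair with settings along these lines:

```yaml
    environment:
      JUDGEMENT_FILE_SOURCE_TYPE: 'JSON_LINES'
      # 'DOUBLE' because the scores in the sample line are JSON numbers (0.30, 0.11);
      # 'STRING' would apply if they were written as "0.30" and "0.11"
      JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST: 'DOUBLE'
```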

Http Connection / Connection Pool Settings
CONNECTION_POOL_TYPE	Specifies whether to use a fixed (`FIXED`) or a dynamic (`DYNAMIC`) connection pool.
CONNECTION_POOL_SIZE_MIN	If pool type is dynamic, this gives the minimum number of connections. If it is fixed, gives the fixed number of connections.
CONNECTION_POOL_SIZE_MAX	If pool type is dynamic, gives the maximum number of connections. If pool type is fixed, this setting is not used.
CONNECTION_TTL_IN_SECONDS	If pool type is dynamic, gives the TTL of a connection. If pool type is fixed, this setting is not used.
CONNECTION_TIMEOUT_IN_SECONDS	In either pool type, this specifies the connection timeout.
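
The interplay of the pool settings can be sketched with a dynamic pool. The minimum size and the timeout below are illustrative assumptions. The maximum size and TTL are taken from the example configuration.

```yaml
    environment:
      CONNECTION_POOL_TYPE: 'DYNAMIC'
      CONNECTION_POOL_SIZE_MIN: 10        # illustrative lower bound (assumption)
      CONNECTION_POOL_SIZE_MAX: 100       # upper bound, only used by the dynamic pool
      CONNECTION_TTL_IN_SECONDS: 1200     # connection TTL, only used by the dynamic pool
      CONNECTION_TIMEOUT_IN_SECONDS: 5    # illustrative value (assumption)
      # For a fixed pool, set CONNECTION_POOL_TYPE: 'FIXED'. Per the table above,
      # CONNECTION_POOL_SIZE_MIN then gives the fixed number of connections,
      # while SIZE_MAX and TTL are not used.
```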

Thread Pool Settings
NETTY_HTTP_CLIENT_THREADS_MAX	Specifies the maximal number of netty http client threads.
BLOCKING_POOL_THREADS	Defines number of threads assigned to the thread pool used for blocking computations.
NON_BLOCKING_POOL_THREADS	Defines number of threads assigned to the thread pool used for non-blocking computations.

Volume Mounts

Volumes	Some mounts needed to access data within docker container
./tmp_data:/app/test-files	mounts the tmp_data folder in the project root to the /app/test-files folder within the container
[absolute-path-containing-your-aws-config-folder]/.aws/credentials:/home/kolibri/.aws/credentials:ro	read-only mount of the AWS credentials file on the local machine into the standard location in the container, where it is picked up automatically by the AWS library

Run It

This section is short: run docker-compose up within the project root.