Configure / Run
After checking out the repo, you will find an example docker-compose.yml
file in it.
It contains an example setup of prometheus, grafana, kolibri-fleet-zio instances,
dummy search service instances (response juggler) and kolibri-watch (the kolibri UI).
Why these?
Prometheus
: for pulling metrics from the kolibri service
Grafana
: for displaying the dashboard representing the service state
kolibri-fleet-zio
: the actual service taking care of computations and state keeping,
which needs to run on every node that is intended to take part in the distributed processing
response-juggler
: simulating responses by a search system. Used to test request and parsing tasks
defined in kolibri-fleet-zio
kolibri-watch
: providing the UI to monitor available nodes and their resource consumption / utilization,
plus controls to load job templates, create new ones, store job definitions and start / stop processing
of the jobs
We will first look at the configurations that need setting before being able to start up the
service, focussing on kolibri-fleet-zio:
kolibri-zio-1:
  image: awagen/kolibri-fleet-zio:0.2.0
  cpu_count: 12
  mem_limit: 6144m
  mem_reservation: 4096m
  ports:
    - "8001:8001"
  user: "1000:1000"
  environment:
    JVM_OPTS: >
      -XX:+UseParallelGC
      -XX:ParallelGCThreads=2
      -Xms4096m
      -Xmx4096m
    PROFILE: prod
    NODE_HASH: "abc1"
    HTTP_SERVER_PORT: 8001
    RUNNING_TASK_PER_JOB_MAX_COUNT: 20
    RUNNING_TASK_PER_JOB_DEFAULT_COUNT: 3
    MAX_NR_BATCH_RETRIES: 2
    PERSISTENCE_MODE: 'CLASS'
    PERSISTENCE_MODULE_CLASS: 'de.awagen.kolibri.fleet.zio.config.di.modules.persistence.LocalPersistenceModule'
    AWS_PROFILE: 'developer'
    AWS_S3_BUCKET: 'kolibri-dev'
    AWS_S3_PATH: 'kolibri_fleet_zio_test'
    AWS_S3_REGION: 'EU_CENTRAL_1'
    # the file paths in the job definitions are to be given relative to the path (or bucket path) defined
    # for the respective configuration of persistence
    LOCAL_STORAGE_WRITE_BASE_PATH: '/app/test-files'
    LOCAL_STORAGE_READ_BASE_PATH: '/app/test-files'
    # JOB_TEMPLATES_PATH must be relative to the base path or bucket path, depending on the persistence selected
    JOB_TEMPLATES_PATH: 'templates/jobs'
    OUTPUT_RESULTS_PATH: 'test-results'
    JUDGEMENT_FILE_SOURCE_TYPE: 'CSV'
    # if judgement file format set to 'JSON_LINES', need to set 'DOUBLE' in case judgements are numeric in the json,
    # if the numeric value is represented as string, use 'STRING'. This purely refers to how the json value is
    # interpreted, later this will be cast to double either way
    JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST: 'STRING'
    ALLOWED_TIME_PER_ELEMENT_IN_MILLIS: 4000
    ALLOWED_TIME_PER_BATCH_IN_SECONDS: 3600
    ALLOWED_TIME_PER_JOB_IN_SECONDS: 36000
    MAX_RESOURCE_DIRECTIVES_LOAD_TIME_IN_MINUTES: 10
    MAX_PARALLEL_ITEMS_PER_BATCH: 16
    CONNECTION_POOL_SIZE_MIN: 100
    CONNECTION_POOL_SIZE_MAX: 100
    CONNECTION_TTL_IN_SECONDS: 1200
    MAX_NR_JOBS_PROCESSING: 5
    MAX_NR_JOBS_CLAIMED: 5
    NETTY_HTTP_CLIENT_THREADS_MAX: 4
    BLOCKING_POOL_THREADS: 4
    NON_BLOCKING_POOL_THREADS: 4
  volumes:
    - ./tmp_data:/app/test-files
    - ${HOME}/.aws/credentials:/home/kolibri/.aws/credentials:ro
Configuration Options in Detail
General Setup Settings | |
---|---|
PROFILE | The config file suffix for the config to be loaded on startup. Will try to find application-[PROFILE].conf in the resource folder. |
NODE_HASH | The hash that identifies this specific node. If not set, a random hash is generated. Note: nodes are identified by the node hash, so it must be unique per node. |
HTTP_SERVER_PORT | Port to reach the kolibri-fleet-zio API under. |
ALLOWED_TIME_PER_ELEMENT_IN_MILLIS | Carried over from the initial job definitions for search evaluation, which can still be used as a format. This attribute currently has no effect and will be removed. |
ALLOWED_TIME_PER_BATCH_IN_SECONDS | Carried over from the initial job definitions for search evaluation, which can still be used as a format. This attribute currently has no effect and will be removed. |
ALLOWED_TIME_PER_JOB_IN_SECONDS | Carried over from the initial job definitions for search evaluation, which can still be used as a format. This attribute currently has no effect and will be removed. |
MAX_RESOURCE_DIRECTIVES_LOAD_TIME_IN_MINUTES | Defines how much time loading of a global node-resource (such as judgement lists, parameters and the like) is allowed to take. |
MAX_PARALLEL_ITEMS_PER_BATCH | Defines how many items are processed in parallel per batch at any given time. |
MAX_NR_JOBS_PROCESSING | Defines the maximal number of batches that are allowed in progress per node at any given time. |
MAX_NR_JOBS_CLAIMED | Defines the maximal number of batches that can be claimed for execution at any given time. |
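Because nodes are identified by their NODE_HASH, every additional kolibri-fleet-zio instance needs its own hash and its own port. The following is a minimal sketch of a second node entry, not taken from the example file: the service name kolibri-zio-2, the hash abc2 and port 8002 are illustrative assumptions, and the shared volume assumes that all nodes should read and write the same local storage.

```yaml
# Hypothetical second node; service name, NODE_HASH and port are assumptions.
kolibri-zio-2:
  image: awagen/kolibri-fleet-zio:0.2.0
  ports:
    - "8002:8002"
  user: "1000:1000"
  environment:
    PROFILE: prod
    # must differ from the hash of every other node
    NODE_HASH: "abc2"
    HTTP_SERVER_PORT: 8002
    # remaining environment settings as for kolibri-zio-1
  volumes:
    # assumption: same folder as the first node, so both see the same data
    - ./tmp_data:/app/test-files
```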
Storage configuration | |
---|---|
PERSISTENCE_MODE | The persistence mode used. Can be: AWS (s3), GCP (gcs), LOCAL (local file system), RESOURCE (local resources) or CLASS (if selected, the PERSISTENCE_MODULE_CLASS property must be set to the fully qualified name of the persistence module class to use). |
PERSISTENCE_MODULE_CLASS | If PERSISTENCE_MODE is set to CLASS, set here the fully qualified name of the persistence module class to use, such as de.awagen.kolibri.fleet.zio.config.di.modules.persistence.LocalPersistenceModule (which happens to refer to the same persistence module as specifying PERSISTENCE_MODE as LOCAL). |
AWS_PROFILE | If PERSISTENCE_MODE is AWS (or CLASS and the AWS module is referenced above), specify here the profile to use. |
AWS_S3_BUCKET | If AWS storage is used, define here the bucket to store the state / result data in. |
AWS_S3_PATH | If AWS storage is used, define here the path within the above defined bucket to use as base path. |
AWS_S3_REGION | If AWS storage is used, define the region here. |
GCP_GS_BUCKET | If GCP storage is used, define here the bucket to store the state / result data in. |
GCP_GS_PATH | If GCP storage is used, define here the path within the above defined bucket to use as base path. |
GCP_GS_PROJECT_ID | If GCP storage is used, define here the project id under which you created the bucket. |
LOCAL_STORAGE_WRITE_BASE_PATH | If LOCAL storage is used, define the base path here under which to store the data. |
LOCAL_STORAGE_READ_BASE_PATH | If LOCAL storage is used, define the base path here from which data is read (should usually be the same as the write base path). |
JOB_TEMPLATES_PATH | Relative subpath (relative to the defined base paths) under which job templates are found / stored. |
OUTPUT_RESULTS_PATH | Relative subpath (relative to the defined base paths) under which results are persisted. |
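As an illustration of the storage settings, the fragment below sketches how the environment section might look when switching from local persistence to S3. The AWS values are the ones already present in the example file; the GCP values in the commented alternative are placeholders, not real names. With these settings, job templates would be expected under kolibri_fleet_zio_test/templates/jobs within the kolibri-dev bucket, since the subpaths are relative to the bucket path.

```yaml
environment:
  PERSISTENCE_MODE: 'AWS'
  AWS_PROFILE: 'developer'
  AWS_S3_BUCKET: 'kolibri-dev'
  AWS_S3_PATH: 'kolibri_fleet_zio_test'
  AWS_S3_REGION: 'EU_CENTRAL_1'
  # subpaths are relative to AWS_S3_PATH within the bucket
  JOB_TEMPLATES_PATH: 'templates/jobs'
  OUTPUT_RESULTS_PATH: 'test-results'
  # GCP alternative (placeholder values, replace with your own):
  # PERSISTENCE_MODE: 'GCP'
  # GCP_GS_BUCKET: 'your-bucket'
  # GCP_GS_PATH: 'your-base-path'
  # GCP_GS_PROJECT_ID: 'your-project-id'
```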
Judgement file format configuration | |
---|---|
JUDGEMENT_FILE_SOURCE_TYPE | Type of the utilized judgement file. Possible values: CSV (per line: query, product and judgement score each separated by the configured delimiter (see below)) or JSON_LINES (one line per query in format {"query": "q2", "products": [{"productId": "aa", "score": 0.30}, {"productId": "bb", "score": 0.11}]} ) |
JUDGEMENT_FILE_COLUMN_DELIMITER | Delimiter used if JUDGEMENT_FILE_SOURCE_TYPE is set to CSV. Default value: \u0000 |
JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST | If JUDGEMENT_FILE_SOURCE_TYPE is set to JSON_LINES, defines how to parse the score attribute from the above format. Options: STRING (if the number is wrapped in string delimiters; this was initially only a workaround) or DOUBLE (if the score is a number in the used json). |
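For a judgement file in the JSON_LINES format shown above, where scores are plain JSON numbers, the two settings could be combined as in this sketch. The quoted-score variant in the comment is an illustration of the STRING case.

```yaml
environment:
  JUDGEMENT_FILE_SOURCE_TYPE: 'JSON_LINES'
  # scores are plain JSON numbers, e.g. "score": 0.30
  JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST: 'DOUBLE'
  # use 'STRING' instead if scores are quoted, e.g. "score": "0.30"
```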
Http Connection / Connection Pool Settings | |
---|---|
CONNECTION_POOL_TYPE | Specifies whether to use a fixed (FIXED) or a dynamic (DYNAMIC) connection pool. |
CONNECTION_POOL_SIZE_MIN | If pool type is dynamic, this gives the minimum number of connections. If it is fixed, gives the fixed number of connections. |
CONNECTION_POOL_SIZE_MAX | If pool type is dynamic, gives the maximum number of connections. If pool type is fixed, this setting is not used. |
CONNECTION_TTL_IN_SECONDS | If pool type is dynamic, gives the TTL of a connection. If pool type is fixed, this setting is not used. |
CONNECTION_TIMEOUT_IN_SECONDS | In either pool type, this specifies the connection timeout. |
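A dynamic pool configuration might look like the following sketch. The option names come from the table above; the minimum size of 10 and the timeout of 10 seconds are illustrative values, not recommendations from the project.

```yaml
environment:
  CONNECTION_POOL_TYPE: 'DYNAMIC'
  # illustrative lower bound; the pool can grow up to the maximum
  CONNECTION_POOL_SIZE_MIN: 10
  CONNECTION_POOL_SIZE_MAX: 100
  CONNECTION_TTL_IN_SECONDS: 1200
  # illustrative value; applies to both pool types
  CONNECTION_TIMEOUT_IN_SECONDS: 10
```

With CONNECTION_POOL_TYPE set to FIXED, only CONNECTION_POOL_SIZE_MIN and CONNECTION_TIMEOUT_IN_SECONDS would be taken into account, as described in the table.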
Thread Pool Settings | |
---|---|
NETTY_HTTP_CLIENT_THREADS_MAX | Specifies the maximal number of netty http client threads. |
BLOCKING_POOL_THREADS | Defines number of threads assigned to the thread pool used for blocking computations. |
NON_BLOCKING_POOL_THREADS | Defines number of threads assigned to the thread pool used for non-blocking computations. |
Volume Mounts
Volumes | Mounts needed to access data within the docker container |
---|---|
./tmp_data:/app/test-files | Mounts the tmp_data folder in the project root to the /app/test-files folder within the container. |
[absolute-path-containing-your-aws-config-folder]/.aws/credentials:/home/kolibri/.aws/credentials:ro | Read-only mount of the AWS credentials file on the local machine into the standard location in the container, where it is picked up automatically by the AWS library. |
Run It
This section is short: run docker-compose up
within the project root.