DataType Categories

Let's look at the distinct categories of data structures provided by the kolibri-datatypes project (this list might not be exhaustive).

Indexed Generators

Indexed generators generate elements on demand by index. An IndexedGenerator provides:

- its size, without iterating over the elements;
- an Iterator over its contained elements;
- methods to retrieve generators for sub-parts of the original generator;
- a mapping that transforms each generated element with the specified mapping function.
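
The contract above can be sketched in a few lines of Scala. The sketch covers a size known up front, elements computed by index, a lazy iterator, sub-part generators and mapping. The trait, the method names `get`, `getPart` and `mapGen`, and `ByFunctionGenerator` are hypothetical illustrations, not the kolibri-datatypes API.

```scala
// Hypothetical minimal version of an indexed generator; names and
// signatures are illustrative, not the kolibri-datatypes API.
trait IndexedGenerator[T] {
  // Number of elements, known without iterating.
  def size: Int

  // Computes the element at the given index on demand; None if out of range.
  def get(index: Int): Option[T]

  // Lazy iterator: each element is computed only when it is requested.
  def iterator: Iterator[T] = Iterator.range(0, size).flatMap(i => get(i))

  // Generator over the sub-part [startIndex, endIndex) of this generator.
  def getPart(startIndex: Int, endIndex: Int): IndexedGenerator[T] = {
    val parent = this
    val start = math.max(0, startIndex)
    val end = math.min(size, endIndex)
    new IndexedGenerator[T] {
      val size: Int = math.max(0, end - start)
      def get(index: Int): Option[T] =
        if (index < 0 || index >= size) None else parent.get(start + index)
    }
  }

  // Generator applying f to each element at the moment it is generated.
  def mapGen[U](f: T => U): IndexedGenerator[U] = {
    val parent = this
    new IndexedGenerator[U] {
      val size: Int = parent.size
      def get(index: Int): Option[U] = parent.get(index).map(f)
    }
  }
}

// In the spirit of ByFunctionNrLimitedIndexedGenerator: a fixed number of
// elements, each computed from its index by genFunc and never stored.
final case class ByFunctionGenerator[T](nrOfElements: Int, genFunc: Int => T)
    extends IndexedGenerator[T] {
  val size: Int = nrOfElements
  def get(index: Int): Option[T] =
    if (index < 0 || index >= size) None else Some(genFunc(index))
}

val squares = ByFunctionGenerator[Int](5, i => i * i)
```

Nothing is held in memory here: `squares.getPart(1, 3)` only shifts indices and `mapGen` only wraps the index function.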

The distinct types (subject to change) are the following:

| Type | Description |
|---|---|
| BatchByGeneratorIndexedGenerator | Given a Seq of IndexedGenerator[T] and the index of the generator to batch by, provides a generator of generators of Seq[T], whose elements reflect the permutations of the generators not batched by |
| BatchBySizeIndexedGenerator | Given a generator of Seq[T], provides a generator of generators, each of at most the maximum size |
| ByFunctionNrLimitedIndexedGenerator | Given the number of elements and a generator function mapping index i to type T, provides the respective element of type T |
| MergingIndexedGenerator | Merges two generators by applying a mergeFunc to elements of their (possibly distinct) types. The combinations of elements of generator1 and generator2 are permuted, and each combination is mapped to the needed type via the mergeFunc |
| NthIsNthForEachIndexedGenerator | IndexedGenerator that yields, for index n, the Seq of values made of one value per generator, where for each generator its n-th element is chosen. Thus no permutations are involved |
| OneAfterAnotherIndexedGenerator | Generator that starts picking elements from the next generator once the requested index exceeds the elements of the current one, i.e. it sequentially provides the elements of the distinct generators |
| PartitionByGroupIndexedGenerator | Takes a sequence of generators and acts like a normal OneAfterAnotherIndexedGenerator, i.e. it generates the elements of each contained generator sequentially, so the overall number of elements is the sum of the elements of the single generators. What differs is the partitions function, which keeps the groups: the generator can be partitioned while still keeping the logical groups within it, where each passed generator reflects one such logical group |
| PermutatingIndexedGenerator | Takes a number of generators of the same type and returns a generator that generates all permutations of the values within the distinct generators, keeping each generator's position in the resulting Seq |
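
The difference between sequential and index-aligned combination is easiest to see in code. Below is a minimal sketch of the OneAfterAnotherIndexedGenerator and NthIsNthForEachIndexedGenerator behaviour. It uses a hypothetical `Gen` case class (a size plus an index function) instead of the library's types, and the function names are illustrative only.

```scala
// Hypothetical stand-in for an indexed generator: a size plus a function
// computing the element at a given index.
final case class Gen[T](size: Int, at: Int => T)

// OneAfterAnother-style: resolve an overall index by skipping over whole
// generators until the index falls into one of them.
def oneAfterAnother[T](gens: Seq[Gen[T]]): Gen[T] = {
  val total = gens.map(_.size).sum
  def resolve(index: Int): T = {
    require(index >= 0 && index < total, s"index $index out of range")
    var remaining = index
    var g = 0
    while (remaining >= gens(g).size) {
      remaining -= gens(g).size
      g += 1
    }
    gens(g).at(remaining)
  }
  Gen[T](total, i => resolve(i))
}

// NthIsNthForEach-style: index n picks the n-th element of every generator,
// so no permutations arise. This sketch assumes the size is that of the
// smallest generator.
def nthIsNthForEach[T](gens: Seq[Gen[T]]): Gen[Seq[T]] =
  Gen[Seq[T]](if (gens.isEmpty) 0 else gens.map(_.size).min, n => gens.map(g => g.at(n)))

val a = Gen[String](2, i => s"a$i")
val b = Gen[String](3, i => s"b$i")
```

With these inputs, the sequential generator yields `a0, a1, b0, b1, b2` (size 2 + 3 = 5). The aligned one yields `Seq(a0, b0)` and `Seq(a1, b1)`.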

BatchIterable

Given an iterable and a maximal batch size, iterates through batches of at most the passed maximal size.

| Type | Description |
|---|---|
| BaseBatchIterable | Base implementation that, on each next call, requests the next batchSize elements from the iterator of the initially passed iterable |
| GeneratingBatchIterable | Implementation that, on each next call, only provides the IndexedGenerator of the next batch. This iterable thus creates elements only on demand, when they are requested, since an IndexedGenerator calculates the i-th element instead of holding all elements in memory |
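
Here is a minimal sketch of the BaseBatchIterable behaviour, assuming batches are materialised as `Vector`s. `SimpleBatchIterable` is a hypothetical name, not the library class.

```scala
// Sketch of the BaseBatchIterable idea: each call to next() pulls at most
// batchSize elements from one shared underlying iterator.
final class SimpleBatchIterable[T](source: Iterable[T], batchSize: Int)
    extends Iterable[Seq[T]] {
  require(batchSize > 0, "batchSize must be positive")

  def iterator: Iterator[Seq[T]] = new Iterator[Seq[T]] {
    private val underlying = source.iterator

    def hasNext: Boolean = underlying.hasNext

    def next(): Seq[T] = {
      if (!hasNext) throw new NoSuchElementException("no more batches")
      val batch = Vector.newBuilder[T]
      var taken = 0
      while (taken < batchSize && underlying.hasNext) {
        batch += underlying.next()
        taken += 1
      }
      batch.result()
    }
  }
}
```

For plain batching, the standard library's `Iterator.grouped(batchSize)` gives the same result. The explicit loop only makes visible that each `next()` pulls from one shared iterator. A GeneratingBatchIterable-style variant would return an IndexedGenerator per batch instead of a materialised `Vector`.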

CombinedIterator

Given two Iterables, provides for each element of iterable1 an Iterator that iterates over all elements of iterable2, yielding the value that results from applying the mergeFunc to the current elements of iterable1 and iterable2.
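
This behaviour amounts to a lazy nested loop. Here is a sketch using a hypothetical `combinedIterator` function, not the library's class:

```scala
// Sketch of the CombinedIterator idea: for every element of iterable1, walk
// over all of iterable2 and emit mergeFunc(e1, e2). Nothing is computed
// until the resulting iterator is consumed.
def combinedIterator[A, B, C](iterable1: Iterable[A],
                              iterable2: Iterable[B],
                              mergeFunc: (A, B) => C): Iterator[C] =
  iterable1.iterator.flatMap(a => iterable2.iterator.map(b => mergeFunc(a, b)))
```

Since iterable2 is traversed once per element of iterable1, it has to be re-iterable: a collection works, a one-shot iterator does not.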

Typed Maps

| Type | Description |
|---|---|
| TypeTaggedMap | Strongly typed map that uses TypeTags to check the element type, with ClassTyped[T] keys that also provide the respective type casting. Values of the wrong type cannot be added for a key; TypeTags get around type erasure |
| WeaklyTypedMap | Map with relaxed type assumptions. Only values of the correct type can be added, but only the top-level type is checked, so it suffers from type erasure |
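
The type-erasure limitation of WeaklyTypedMap can be illustrated with `scala.reflect.ClassTag`, which only records the erased runtime class. The sketch below uses hypothetical `TypedKey` and `WeakTypedStore` classes, not the library's. It rejects a value of the wrong top-level type but accepts a `Seq[String]` under a `Seq[Int]` key. TypeTaggedMap's TypeTag-based checks exist to avoid exactly that.

```scala
import scala.reflect.ClassTag

// Hypothetical key type: carries a name and the ClassTag of its value type.
final case class TypedKey[T](name: String)(implicit val tag: ClassTag[T])

// WeaklyTypedMap-like store (sketch): writes are checked against the key's
// ClassTag, which only knows the erased runtime class of T.
final class WeakTypedStore {
  private var entries = Map.empty[String, Any]

  // Accepts the value only if it is an instance of the key's runtime class.
  def put[T](key: TypedKey[T], value: Any): Boolean =
    key.tag.unapply(value) match {
      case Some(v) =>
        entries = entries.updated(key.name, v)
        true
      case None => false
    }

  // Returns the stored value if it passes the same (erased) runtime check.
  def get[T](key: TypedKey[T]): Option[T] =
    entries.get(key.name).flatMap(v => key.tag.unapply(v))
}

val nameKey = TypedKey[String]("name")
val idsKey = TypedKey[Seq[Int]]("ids")
val weakStore = new WeakTypedStore
```

Putting `Seq(1, 2)` under `nameKey` fails. Putting `Seq("not", "ints")` under `idsKey` succeeds, because at runtime both are just a `Seq`.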

Aggregation

| Type | Description |
|---|---|
| MetricAggregation | Keeps track of full MetricDocuments for keys of a defined type. Each key stands for a separate aggregation, which can be used to selectively aggregate subsets of results |
| AggregateValue | Keeps track of the current value and the count of samples the current value is based on |
| Aggregators | |
| BaseAggregator | Takes an aggregation function that combines a new element and the current aggregation value into a new aggregation value ((U, V) => V), a start value supplier, and a merge function for two aggregation values |
| BasePerClassAggregator | Similar to BaseAggregator, but keeps aggregation state per Tag |
| TagKeyRunningDoubleAvgPerClassAggregator | Keeps track of averages per Tag |
| TagKeyRunningDoubleAvgAggregator | Keeps track of the overall average |
| TagKeyMetricDocumentPerClassAggregator | Aggregates MetricRow elements per class into a MetricDocument |
| TagKeyMetricAggregationPerClassAggregator | Aggregates MetricRow elements per class into a MetricAggregation |
| BaseAnyAggregator | Wrapper for typed aggregators that accepts any message and aggregates only those matching the type |
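
Here is a minimal sketch of the BaseAggregator contract: an aggregation function `(U, V) => V`, a start value supplier and a merge function. A running average serves as the aggregation value. `SimpleAggregator`, `AvgValue` and `newAvgAggregator` are hypothetical names. The state lives in a plain `var`, so this sketch is not thread-safe.

```scala
// Hypothetical sketch of the BaseAggregator idea: U is the element type,
// V the aggregation value type.
final class SimpleAggregator[U, V](aggFunc: (U, V) => V,
                                   startValueGen: () => V,
                                   mergeFunc: (V, V) => V) {
  private var value: V = startValueGen()

  // Folds a single new element into the current aggregation value.
  def add(element: U): Unit = { value = aggFunc(element, value) }

  // Merges an aggregation value produced elsewhere (e.g. a partial result).
  def addAggregate(other: V): Unit = { value = mergeFunc(value, other) }

  def aggregation: V = value
}

// Mean plus the number of samples it is based on, similar in spirit to
// AggregateValue.
final case class AvgValue(mean: Double, count: Int)

// Weighted merge of two averages.
def avgMerge(a: AvgValue, b: AvgValue): AvgValue =
  if (a.count + b.count == 0) AvgValue(0.0, 0)
  else AvgValue((a.mean * a.count + b.mean * b.count) / (a.count + b.count), a.count + b.count)

def newAvgAggregator(): SimpleAggregator[Double, AvgValue] =
  new SimpleAggregator[Double, AvgValue](
    (x, acc) => avgMerge(acc, AvgValue(x, 1)),
    () => AvgValue(0.0, 0),
    avgMerge)
```

Keeping the count next to the mean is what makes merging partial aggregations correct. Averaging two means directly would weight them equally regardless of sample size.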

Multiple Values

| Type | Description |
|---|---|
| OrderedMultiValues | Container for multiple OrderedValues[Any]. Provides methods to find the n-th permutation and, for a given overall element index, the index within each of the values |
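
Finding the n-th permutation can be understood as mixed-radix arithmetic. With value sequences of sizes s_1, ..., s_k there are s_1 * ... * s_k elements in total, and an overall index decomposes into one index per value. This sketch assumes the first value varies fastest; the library's actual ordering may differ. The function names are hypothetical.

```scala
// Decomposes an overall element index into one index per value sequence,
// reading the index as a mixed-radix number (first value varies fastest).
def indicesForElement(sizes: Seq[Int], elementIndex: Int): Seq[Int] = {
  require(sizes.forall(_ > 0), "every value sequence must be non-empty")
  require(elementIndex >= 0 && elementIndex < sizes.product, "index out of range")
  sizes.foldLeft((elementIndex, Vector.empty[Int])) { case ((remaining, acc), size) =>
    (remaining / size, acc :+ (remaining % size))
  }._2
}

// The n-th permutation: pick, from every value sequence, the element at its
// decomposed index.
def nthPermutation[T](values: Seq[Seq[T]], n: Int): Seq[T] =
  values.zip(indicesForElement(values.map(_.size), n)).map { case (vs, i) => vs(i) }
```

For sizes `Seq(2, 3)` there are 6 elements, and index 5 decomposes into the per-value indices `Seq(1, 2)`. Because each element is computed from its index, no permutation ever has to be enumerated.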

Fail Reasons

| Type | Description |
|---|---|
| ComputeFailReason | Represents a fail type for a computation, together with a description |

Metric Stores

| Type | Description |
|---|---|
| MetricRow | Single metric row, identified by a set of parameters and holding the metric values for those parameters |
| MetricRecord | Storage of MetricValues for a given key |
| MetricDocument | Represents a map of parameter set to MetricRow. The implementation uses a mutable map and is not thread-safe, so it must only be accessed by a single thread at a time |
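
Here is a simplified sketch of how rows and the document relate. The types are hypothetical: parameters are a `Map[String, Seq[String]]`, and each metric is stored as a running (sum, count) pair instead of the library's MetricValue.

```scala
import scala.collection.mutable

// Hypothetical simplified row: identified by its parameters, holding
// metric name -> (running sum, count).
final case class SimpleMetricRow(params: Map[String, Seq[String]],
                                 metrics: Map[String, (Double, Int)]) {
  def addMetric(name: String, value: Double): SimpleMetricRow = {
    val (sum, count) = metrics.getOrElse(name, (0.0, 0))
    copy(metrics = metrics.updated(name, (sum + value, count + 1)))
  }
}

// MetricDocument-like container: parameter set -> row. Backed by a mutable
// map, so it is not thread-safe.
final class SimpleMetricDocument {
  private val rows = mutable.Map.empty[Map[String, Seq[String]], SimpleMetricRow]

  def add(params: Map[String, Seq[String]], metric: String, value: Double): Unit = {
    val row = rows.getOrElse(params, SimpleMetricRow(params, Map.empty))
    rows(params) = row.addMetric(metric, value)
  }

  def row(params: Map[String, Seq[String]]): Option[SimpleMetricRow] = rows.get(params)
}
```

The read-modify-write in `add` shows why single-threaded access matters. Two threads interleaving between `getOrElse` and the update would lose one of the samples.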

Tagging

| Type | Description |
|---|---|
| TaggedWithType | Trait mapping TagTypes to a Set of Tags |
| Tags | |
| ParameterMultiValueTag | Tag defined by a Map[String, Seq[String]] mapping |
| ParameterSingleValueTag | Tag defined by a Map[String, String] mapping |
| AggregationTag | Tag consisting of an id, a ParameterTag for the varied parameters and a ParameterTag for the fixed parameters |
| MultiTag | Tag that can hold multiple other tags |
| StringTag | Tag defined by a string value |
| NamedTag | Wrapper containing a name and the actual Tag |
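
Here is a simplified sketch of the tag types as a Scala ADT, plus a TaggedWithType-like container. The field names, the `TagType` values and the container's method names are hypothetical, and AggregationTag is omitted for brevity.

```scala
// Simplified tag hierarchy (hypothetical shapes, not the library's classes).
sealed trait Tag
final case class StringTag(value: String) extends Tag
final case class ParameterSingleValueTag(params: Map[String, String]) extends Tag
final case class ParameterMultiValueTag(params: Map[String, Seq[String]]) extends Tag
final case class MultiTag(tags: Set[Tag]) extends Tag
final case class NamedTag(name: String, tag: Tag) extends Tag

// Hypothetical tag types used as keys in the container below.
sealed trait TagType
case object AggregationTagType extends TagType
case object ResultTagType extends TagType

// TaggedWithType-like container: each TagType maps to a Set of Tags.
final class TaggedThing {
  private var tagsByType = Map.empty[TagType, Set[Tag]]

  def addTag(tagType: TagType, tag: Tag): Unit =
    tagsByType = tagsByType.updated(tagType, tagsForType(tagType) + tag)

  def tagsForType(tagType: TagType): Set[Tag] =
    tagsByType.getOrElse(tagType, Set.empty[Tag])
}
```

Case classes give structural equality, so adding the same `StringTag` twice keeps a single entry in the set.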

Permutations

| Type | Description |
|---|---|
| PermutationUtils | Helper functions to simplify permutation calculations |

PriorityStores

| Type | Description |
|---|---|
| BasePriorityStore | Given the number of elements to keep, an ordering, and a function deriving a key from each element, preserves only the top elements using a PriorityQueue |
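
Here is a sketch of a priority store that keeps the top `keep` elements using `scala.collection.mutable.PriorityQueue`. It assumes the derived key groups elements, with a separate top-N per key. That reading of the key function, and the `TopNPerKeyStore` name, are assumptions.

```scala
import scala.collection.mutable

// Sketch: for each derived key, keep only the `keep` highest-ranked
// elements according to ord.
final class TopNPerKeyStore[K, T](keep: Int, keyFunc: T => K)(implicit ord: Ordering[T]) {
  require(keep > 0, "keep must be positive")

  // Reversed ordering turns the max-heap into a min-heap, so the head is
  // the worst element currently kept.
  private val queues = mutable.Map.empty[K, mutable.PriorityQueue[T]]

  def add(element: T): Unit = {
    val queue = queues.getOrElseUpdate(keyFunc(element), mutable.PriorityQueue.empty[T](ord.reverse))
    if (queue.size < keep) queue.enqueue(element)
    else if (ord.gt(element, queue.head)) {
      queue.dequeue() // drop the current worst element
      queue.enqueue(element)
    }
  }

  // Kept elements for a key, best first.
  def result(key: K): Seq[T] =
    queues.get(key).map(_.toList.sorted(ord.reverse)).getOrElse(Seq.empty)
}
```

Using a min-heap of bounded size means each insert costs O(log keep), and memory stays at `keep` elements per key regardless of how many elements arrive.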

Values

| Type | Description |
|---|---|
| DistinctValues | A name for the parameter and a Seq of its distinct values |
| RangeValues | Defined by name, start value, end value and step size; generates all values within the boundaries |
| MetricValue | Simple container whose state is a BiRunningValue that tracks the occurring error types with their respective counts, together with an aggregated value that tracks the successful computations aggregated in the MetricValue |
| RunningValue | Keeps the count of elements the current value is made of, and provides functions to add other single values or AggregateValues |
| BiRunningValue | Running value of two distinct types, e.g. to record occurring errors and successful computation values in a single record, such as when your computation returns Either[SomeFailType, SomeComputationValue], or in similar settings where two values are connected. AggregateValue keeps the count of aggregated samples and the current value of the aggregation |
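
Here is a sketch of the BiRunningValue idea applied to `Either` results. Failures are counted per fail reason, and successes are folded into a running average that stands in for RunningValue. All names are hypothetical.

```scala
// Running average that also keeps the number of samples it is made of.
final case class RunningAvg(count: Int, mean: Double) {
  // Incremental mean update: mean + (x - mean) / (count + 1).
  def add(x: Double): RunningAvg =
    RunningAvg(count + 1, mean + (x - mean) / (count + 1))
}

// Two running values in one record: error counts (Left) and the running
// average of successful values (Right).
final case class BiRunning[E](errorCounts: Map[E, Int], successes: RunningAvg) {
  def add(result: Either[E, Double]): BiRunning[E] = result match {
    case Left(err) =>
      copy(errorCounts = errorCounts.updated(err, errorCounts.getOrElse(err, 0) + 1))
    case Right(x) =>
      copy(successes = successes.add(x))
  }
}

def emptyBiRunning[E]: BiRunning[E] = BiRunning(Map.empty[E, Int], RunningAvg(0, 0.0))
```

Folding a sequence of results, e.g. `results.foldLeft(emptyBiRunning[String])(_ add _)`, yields error counts and the success average in a single pass.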

Threadsafe Async Data Loading

| Type | Description |
|---|---|
| AtomicMapPromiseStore | Implementation ensuring thread safety of the value storage, and also ensuring that no more than one request leads to the creation of the stored resource, which can be expensive (e.g. when multiple experiment batch processing actors on a single node request the resource at once). Used, for example, to load data that is expensive to load within an object, so that there is only one data instance per node |
| ConcurrentUpdateMapOps | Update functions for AtomicReference[Map[U, V]] |
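
The core of both helpers can be sketched with `java.util.concurrent.atomic.AtomicReference` and Scala `Promise`s. The sketch has two parts:

- a compare-and-set retry loop for map updates;
- a store where only the caller whose Promise wins the insertion runs the expensive load.

`updateMap` and `PromiseStore` are hypothetical names. For simplicity, the load runs synchronously in the winning caller's thread.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{Future, Promise}
import scala.util.Try

// ConcurrentUpdateMapOps-style update: apply a pure function to the current
// map and retry until compareAndSet succeeds. The function may run more than
// once, so it must be free of side effects.
def updateMap[K, V](ref: AtomicReference[Map[K, V]], update: Map[K, V] => Map[K, V]): Map[K, V] = {
  var done = false
  var result = ref.get()
  while (!done) {
    val current = ref.get()
    result = update(current)
    done = ref.compareAndSet(current, result)
  }
  result
}

// AtomicMapPromiseStore-style store: the first caller for a key installs its
// Promise and performs the load; later callers get the same Future.
final class PromiseStore[K, V](load: K => V) {
  private val promises = new AtomicReference[Map[K, Promise[V]]](Map.empty)

  def get(key: K): Future[V] = {
    val fresh = Promise[V]()
    val updated = updateMap[K, Promise[V]](promises,
      current => if (current.contains(key)) current else current.updated(key, fresh))
    val stored = updated(key)
    // Only the caller whose Promise ended up in the map runs the load.
    if (stored eq fresh) fresh.complete(Try(load(key)))
    stored.future
  }
}
```

Storing the Promise rather than the value is what lets concurrent callers wait on the same in-flight load instead of each starting their own.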