DataType Categories
Let's look at the distinct categories of data structures provided by the kolibri-datatypes project
(the list might not be fully exhaustive).
Indexed Generators
Indexed generators generate elements on demand by index.
An IndexedGenerator provides its size without having to iterate over the elements,
an Iterator over the elements it contains, methods to retrieve
generators of subparts of the original generator, and a mapping
that transforms each generated element with the specified mapping function.
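The contract described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual interface: `SimpleIndexedGenerator`, `ByFunctionGenerator`, `getPart` and `mapGen` are hypothetical names chosen for this example.

```scala
// Simplified, hypothetical trait; the real kolibri-datatypes interface may differ.
trait SimpleIndexedGenerator[T] { self =>
  def size: Int                  // known without iterating over the elements
  def get(index: Int): Option[T] // element computed on demand, None if out of range

  // Iterator over all elements, computing each one only when requested.
  def iterator: Iterator[T] = Iterator.range(0, size).flatMap(i => get(i))

  // Generator over the sub-range [start, end) of this generator.
  def getPart(start: Int, end: Int): SimpleIndexedGenerator[T] = {
    val from = math.max(0, start)
    val upTo = math.min(self.size, end)
    new SimpleIndexedGenerator[T] {
      val size: Int = math.max(0, upTo - from)
      def get(index: Int): Option[T] =
        if (index >= 0 && index < size) self.get(from + index) else None
    }
  }

  // Generator whose elements are this generator's elements transformed by f.
  def mapGen[B](f: T => B): SimpleIndexedGenerator[B] = new SimpleIndexedGenerator[B] {
    val size: Int = self.size
    def get(index: Int): Option[B] = self.get(index).map(f)
  }
}

// Generator defined by a number of elements and a function from index to element.
final case class ByFunctionGenerator[T](size: Int, f: Int => T) extends SimpleIndexedGenerator[T] {
  def get(index: Int): Option[T] = if (index >= 0 && index < size) Some(f(index)) else None
}

object IndexedGeneratorDemo {
  val squares = ByFunctionGenerator[Int](5, i => i * i)
  val all: List[Int] = squares.iterator.toList                // List(0, 1, 4, 9, 16)
  val part: List[Int] = squares.getPart(1, 3).iterator.toList // List(1, 4)
  val mapped: Option[Int] = squares.mapGen(_ + 1).get(4)      // Some(17)
}
```

Because every element is derived from its index, sub-generators and mapped generators stay cheap: neither materializes the underlying elements.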
The distinct types (subject to change) are the following:
Type | Description |
---|---|
BatchByGeneratorIndexedGenerator | Given a Seq of IndexedGenerator[T] and the index of the generator to batch by, provides a generator of generators, each yielding Seq values that reflect the permutations of the generators not batched by |
BatchBySizeIndexedGenerator | Given a generator of Seq[T], provides a generator of generators, each holding at most the maximum batch size of elements |
ByFunctionNrLimitedIndexedGenerator | Given the number of elements and a generator function mapping index i to type T, provides the respective element of type T |
MergingIndexedGenerator | Merges two generators by applying a merge function (mergeFunc) to elements of their distinct types. The combinations of generator1 and generator2 are permuted, and each pair of elements from the two generators is mapped to the needed type via mergeFunc |
NthIsNthForEachIndexedGenerator | IndexedGenerator that yields for index n a Seq made of one value per generator, choosing the n-th element of each generator. No permutations are generated. |
OneAfterAnotherIndexedGenerator | Generator that starts picking elements from the next generator once the requested index exceeds the elements of the current one, i.e. it provides the elements of the distinct generators sequentially |
PartitionByGroupIndexedGenerator | Generator that takes a sequence of generators and acts like a normal OneAfterAnotherIndexedGenerator, i.e. it generates the elements of each contained generator sequentially, so the overall number of elements is the sum of the sizes of the single generators. What differs is the partitions function, which keeps the groups. This allows partitioning by this generator while still keeping logical groups within it, e.g. where each generator passed reflects such a logical grouping |
PermutatingIndexedGenerator | Takes a number of generators of the same type and returns a generator that generates all permutations of the values within the distinct generators, keeping the position in the resulting Seq. |
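To illustrate how a permutation can be computed by index rather than by enumeration, here is a small sketch that decomposes an overall index into one index per value list. The object and method names are hypothetical, and the ordering (first list varies fastest) is an assumption of this example, not necessarily the ordering used by PermutatingIndexedGenerator. The second method contrasts this with the "n-th of each" behavior.

```scala
object PermutationSketch {
  // Decompose the overall index n into one index per list (mixed-radix digits).
  // Assumption: the first list varies fastest.
  def indicesFor(sizes: Seq[Int], n: Int): Option[Seq[Int]] = {
    val total = sizes.map(_.toLong).product
    if (sizes.isEmpty || n < 0 || n >= total) None
    else {
      val (_, indices) = sizes.foldLeft((n, Vector.empty[Int])) {
        case ((rest, acc), size) => (rest / size, acc :+ (rest % size))
      }
      Some(indices)
    }
  }

  // n-th permutation: one value per list, position in the result matches the list position.
  def nthPermutation[T](values: Seq[Seq[T]], n: Int): Option[Seq[T]] =
    indicesFor(values.map(_.size), n).map(_.zip(values).map { case (i, vs) => vs(i) })

  // Contrast: the n-th element of every list, without any permutations.
  def nthOfEach[T](values: Seq[Seq[T]], n: Int): Option[Seq[T]] =
    if (values.nonEmpty && n >= 0 && values.forall(_.size > n)) Some(values.map(vs => vs(n)))
    else None
}

object PermutationDemo {
  val values = Seq(Seq("a", "b"), Seq("x", "y", "z"))
  val first = PermutationSketch.nthPermutation(values, 0)    // Some(Seq("a", "x"))
  val fourth = PermutationSketch.nthPermutation(values, 3)   // Some(Seq("b", "y"))
  val outOfRange = PermutationSketch.nthPermutation(values, 6) // None, only 2 * 3 = 6 permutations
  val secondOfEach = PermutationSketch.nthOfEach(values, 1)  // Some(Seq("b", "y"))
}
```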
BatchIterable
Given an iterable and a maximal batch size, iterates through batches, each an iterator over
at most the passed maximal number of elements.
Type | Description |
---|---|
BaseBatchIterable | Base implementation that on each next call requests the next batchSize elements from the iterator of the initially passed iterable. |
GeneratingBatchIterable | Implementation that on each next call only provides the IndexedGenerator of the next batch, meaning this iterable itself creates elements only on demand when they are requested, since IndexedGenerator assumes a mechanism to calculate the i-th element instead of holding all elements in memory |
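The base behavior can be approximated with the standard library's `grouped`. The class below is a hypothetical sketch of the idea, not the BaseBatchIterable implementation.

```scala
// Hypothetical sketch: wraps an iterable and hands out batches of at most batchSize elements.
final class SimpleBatchIterable[T](elements: Iterable[T], batchSize: Int) extends Iterable[Iterator[T]] {
  require(batchSize > 0, "batchSize must be positive")

  // Each next() call on the outer iterator pulls the next batchSize elements
  // from the underlying iterator; the last batch may be smaller.
  override def iterator: Iterator[Iterator[T]] =
    elements.iterator.grouped(batchSize).map(_.iterator)
}

object BatchIterableDemo {
  val batches: List[List[Int]] =
    new SimpleBatchIterable(1 to 7, 3).iterator.map(_.toList).toList
  // List(List(1, 2, 3), List(4, 5, 6), List(7))
}
```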
CombinedIterator
Given two Iterables, provides for each element of iterable1
an Iterator that iterates over all elements of iterable2
and yields the value resulting from applying mergeFunc to the
current elements of iterable1 and iterable2.
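A rough sketch of this behavior, using a hypothetical `combine` helper rather than the actual CombinedIterator class:

```scala
object CombinedIteratorSketch {
  // For each element of iterable1, yields an iterator over all elements of iterable2
  // merged with that element. Nothing is computed until the iterators are consumed.
  def combine[A, B, C](iterable1: Iterable[A], iterable2: Iterable[B])(mergeFunc: (A, B) => C): Iterator[Iterator[C]] =
    iterable1.iterator.map(a => iterable2.iterator.map(b => mergeFunc(a, b)))
}

object CombinedIteratorDemo {
  val combined: List[List[String]] =
    CombinedIteratorSketch
      .combine(Seq(1, 2), Seq("a", "b"))((i, s) => s"$s$i")
      .map(_.toList)
      .toList
  // List(List("a1", "b1"), List("a2", "b2"))
}
```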
Typed Maps
Type | Description |
---|---|
TypeTaggedMap | Strongly typed map utilizing TypeTags to check the element type and ClassTyped[T] keys that also provide the respective type casting. Values of the wrong type cannot be added for a key; TypeTags are used to get around type erasure |
WeaklyTypedMap | Map that relaxes the strict type assumptions. Only values of the correct type can be added, but the check covers only the top-level type, so it suffers from type erasure |
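The difference between the two maps comes down to how much type information is checked at runtime. The sketch below illustrates only the weaker, top-level check, using the standard library's `ClassTag`; `TypedKey` and `ClassCheckedMap` are hypothetical names and not the project's classes. A check based on TypeTags would additionally compare type arguments such as the `String` in `List[String]`.

```scala
import scala.reflect.ClassTag

// Hypothetical key that carries the runtime class expected for its value.
final case class TypedKey[T](name: String)(implicit val tag: ClassTag[T])

// Hypothetical map that checks only the top-level runtime class of a value.
final class ClassCheckedMap {
  private var entries = Map.empty[String, Any]

  // Stores the value and returns true only if its runtime class matches the key.
  def put[T](key: TypedKey[T], value: Any): Boolean = {
    val matches = key.tag.unapply(value).isDefined
    if (matches) entries = entries.updated(key.name, value)
    matches
  }

  def get[T](key: TypedKey[T]): Option[T] =
    entries.get(key.name).flatMap(v => key.tag.unapply(v))
}

object TypedMapDemo {
  val map = new ClassCheckedMap
  val names = TypedKey[List[String]]("names")

  // Rejected: a String is not a List.
  val rejected: Boolean = map.put(names, "not a list")
  // Accepted despite the wrong element type: after erasure only List is checked.
  val accepted: Boolean = map.put(names, List(1, 2, 3))
}
```

The accepted `List(1, 2, 3)` under a `List[String]` key is exactly the erasure problem the table refers to; it would only surface later, when an element is used as a String.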
Aggregation
Type | Description |
---|---|
MetricAggregation | Keeps track of full MetricDocuments for keys of a defined type. Each key stands for a separate aggregation, which can be used for selectively aggregating subsets of results |
AggregateValue | Keeps track of the current value and the count of samples the current value is based on |
Aggregators | |
---|---|
BaseAggregator | Takes an aggregation function that combines a new element with the current aggregation value to yield the new aggregation value ((U, V) => V), a start value supplier, and a merge function for two aggregation values |
BasePerClassAggregator | Similar to BaseAggregator, but keeps aggregation state per Tag |
TagKeyRunningDoubleAvgPerClassAggregator | Keeps track of averages per Tag |
TagKeyRunningDoubleAvgAggregator | Keeps track of overall average |
TagKeyMetricDocumentPerClassAggregator | Aggregates MetricRow elements per class into a MetricDocument |
TagKeyMetricAggregationPerClassAggregator | Aggregates MetricRow elements per class into a MetricAggregation |
BaseAnyAggregator | Wrapper for typed aggregators to accept any message and aggregate only those matching the type |
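The three ingredients named for BaseAggregator (an aggregation function `(U, V) => V`, a start value supplier, and a merge function) can be sketched as below. `SimpleAggregator` and the running-average helpers are hypothetical names for this illustration; the actual class signatures may differ.

```scala
// Hypothetical sketch of an aggregator built from the three functions described above.
final class SimpleAggregator[U, V](aggFunc: (U, V) => V, startValue: () => V, mergeFunc: (V, V) => V) {
  private var value: V = startValue()

  def add(element: U): Unit = { value = aggFunc(element, value) }
  def merge(other: SimpleAggregator[U, V]): Unit = { value = mergeFunc(value, other.aggregation) }
  def aggregation: V = value
}

object AggregatorDemo {
  type Avg = (Double, Int) // (running mean, sample count)

  def addSample(x: Double, acc: Avg): Avg = {
    val (mean, n) = acc
    ((mean * n + x) / (n + 1), n + 1)
  }

  // Count-weighted mean of two partial averages.
  def mergeAvg(a: Avg, b: Avg): Avg = {
    val n = a._2 + b._2
    if (n == 0) (0.0, 0) else ((a._1 * a._2 + b._1 * b._2) / n, n)
  }

  def newAggregator(): SimpleAggregator[Double, Avg] =
    new SimpleAggregator[Double, Avg](addSample, () => (0.0, 0), mergeAvg)

  // Two partial aggregations, e.g. computed on different nodes, merged into one.
  val left = newAggregator()
  Seq(1.0, 2.0, 3.0).foreach(left.add)
  val right = newAggregator()
  right.add(6.0)
  left.merge(right)
  val result: Avg = left.aggregation // (3.0, 4)
}
```

Keeping the merge function separate from the per-element function is what allows partial results to be combined without replaying individual elements.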
Multiple Values
Type | Description |
---|---|
OrderedMultiValues | Container for multiple OrderedValues[Any]. Provides methods to find the n-th permutation and the index per value for a given overall element index |
Fail Reasons
Type | Description |
---|---|
ComputeFailReason | Represents a failure type for a computation, with a description |
Metric Stores
Type | Description |
---|---|
MetricRow | Single metric row, identified by a set of parameters and holding the metric values that apply to those parameters |
MetricRecord | Storage of MetricValue for a given key |
MetricDocument | Represents a map of parameter set to MetricRow. The implementation uses a mutable map and is not thread-safe, so access it with a single thread at a time. |
Tagging
Type | Description |
---|---|
TaggedWithType | Trait mapping TagTypes to a Set of Tags |
Tags | |
---|---|
ParameterMultiValueTag | Tag defined by Map[String, Seq[String]] mapping |
ParameterSingleValueTag | Tag defined by Map[String, String] mappings |
AggregationTag | Tag consisting of id, a ParameterTag for the varied parameters and a ParameterTag for the fixed parameters |
MultiTag | A tag that can hold multiple other tags |
StringTag | Tag defined by string value |
NamedTag | Wrapper containing name and the actual Tag |
Permutations
Type | Description |
---|
PermutationUtils | Helper functions to simplify permutation calculations |
PriorityStores
Type | Description |
---|---|
BasePriorityStore | Given the number of elements to keep, an ordering, and a function to derive a key from each element, preserves only the top elements using a PriorityQueue |
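A bounded top-n store of this kind can be sketched with `scala.collection.mutable.PriorityQueue`. The sketch assumes that the derived key selects a separate queue per key, which is one plausible reading of the description and not confirmed by it; `TopNStore` is a hypothetical name.

```scala
import scala.collection.mutable

// Hypothetical sketch: keeps only the `keep` largest elements (by `ord`) per derived key.
final class TopNStore[K, T](keep: Int, keyOf: T => K)(implicit ord: Ordering[T]) {
  // One bounded queue per key (an assumption of this sketch). The queue uses the reversed
  // ordering, so dequeue() removes the smallest retained element.
  private val queues = mutable.Map.empty[K, mutable.PriorityQueue[T]]

  def add(element: T): Unit = {
    val queue = queues.getOrElseUpdate(keyOf(element), mutable.PriorityQueue.empty[T](ord.reverse))
    queue.enqueue(element)
    if (queue.size > keep) queue.dequeue()
  }

  // Retained elements for a key, largest first.
  def top(key: K): Seq[T] =
    queues.get(key).map(_.toSeq.sorted(ord.reverse)).getOrElse(Seq.empty)
}

object TopNStoreDemo {
  val byScore: Ordering[(String, Int)] = Ordering.by[(String, Int), Int](_._2)
  val store = new TopNStore[String, (String, Int)](2, _._1)(byScore)
  Seq(("a", 1), ("a", 5), ("a", 3), ("b", 2)).foreach(store.add)

  val topA: Seq[(String, Int)] = store.top("a") // Seq(("a", 5), ("a", 3))
  val topB: Seq[(String, Int)] = store.top("b") // Seq(("b", 2))
}
```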
Values
Type | Description |
---|---|
DistinctValues | Simply a name for the parameter and a Seq of the distinct values |
RangeValues | Defined by name, start value, end value and step size, generates all the values within the boundaries |
MetricValue | Simple container keeping state in a BiRunningValue, which tracks the occurring error types with their respective counts alongside an aggregated value type for the successful computations aggregated in the MetricValue |
RunningValue | Keeps count of the number of elements the current value is made of and provides functions to add other single values or AggregateValues |
BiRunningValue | Running value of two distinct types, e.g. can be used to record occurring errors and successful computation values in a single record, for instance when your computation returns Either[SomeFailType, SomeComputationValue], or in similar settings where two values are in some way connected. AggregateValue keeps the count of samples aggregated and the current value of the aggregation |
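The idea of recording failures and successes side by side can be sketched as follows. `RunningAvg` and `ErrorsAndAvg` are hypothetical stand-ins for this example (failure reasons are plain strings here) and do not reproduce the signatures of RunningValue or BiRunningValue.

```scala
// Hypothetical running average: current value plus the number of samples it is based on.
final case class RunningAvg(count: Int, value: Double) {
  def add(x: Double): RunningAvg = RunningAvg(count + 1, (value * count + x) / (count + 1))
}

// Hypothetical two-sided record: failure counts per reason and an average of the successes.
final case class ErrorsAndAvg(errors: Map[String, Int], avg: RunningAvg) {
  def add(result: Either[String, Double]): ErrorsAndAvg = result match {
    case Left(reason) => copy(errors = errors.updated(reason, errors.getOrElse(reason, 0) + 1))
    case Right(x)     => copy(avg = avg.add(x))
  }
}

object RunningValueDemo {
  val empty = ErrorsAndAvg(Map.empty, RunningAvg(0, 0.0))

  val results = Seq[Either[String, Double]](Right(2.0), Left("TIMEOUT"), Right(4.0), Left("TIMEOUT"))

  // Fold all computation results into a single record.
  val summary: ErrorsAndAvg = results.foldLeft(empty)((acc, r) => acc.add(r))
  // ErrorsAndAvg(Map("TIMEOUT" -> 2), RunningAvg(2, 3.0))
}
```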
Threadsafe Async Data Loading
Type | Description |
---|---|
AtomicMapPromiseStore | Implementation ensuring thread safety of the value storage and also ensuring that no more than one request leads to the creation of the stored resource, which could potentially be expensive (e.g. when multiple experiment batch processing actors on a single node request the resource at once). Used, for example, to load data that is expensive to load within an object, so that there is only one data instance per node |
ConcurrentUpdateMapOps | Update functions for AtomicReference[Map[U, V]] |
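The "at most one load per key" guarantee can be sketched with an `AtomicReference` holding an immutable map of promises, updated via compare-and-set. `OnceLoadingStore` is a hypothetical name, and the sketch loads synchronously on the thread that wins the race; the actual AtomicMapPromiseStore may organize this differently.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}
import scala.concurrent.{Future, Promise}
import scala.util.Try

// Hypothetical sketch: each key is loaded at most once, even under concurrent requests.
final class OnceLoadingStore[K, V](load: K => V) {
  private val promises = new AtomicReference(Map.empty[K, Promise[V]])

  def get(key: K): Future[V] = {
    val current = promises.get()
    current.get(key) match {
      case Some(existing) => existing.future
      case None =>
        val promise = Promise[V]()
        // Only the thread whose compareAndSet succeeds performs the (expensive) load.
        if (promises.compareAndSet(current, current + (key -> promise))) {
          promise.complete(Try(load(key)))
          promise.future
        } else get(key) // another thread changed the map first: retry with the new state
    }
  }
}

object OnceLoadingDemo {
  // Returns the number of loads performed and the stored value after 8 concurrent requests.
  def run(): (Int, Option[Try[Int]]) = {
    val loads = new AtomicInteger(0)
    val store = new OnceLoadingStore[String, Int](key => { loads.incrementAndGet(); key.length })
    val threads = (1 to 8).map(_ => new Thread(() => { store.get("data"); () }))
    threads.foreach(_.start())
    threads.foreach(_.join())
    (loads.get(), store.get("data").value)
  }
}
```

Callers that lose the race receive the winner's future, so they wait for the single load instead of starting their own.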