Prajna


DKVExtensionsToDSet

Extension member functions for DSet<'K*'V>

Static members

Static member | Description
AsyncMapByValue(x, func)
Signature: (x:DSet<'K * 'V> * func:('V -> Async<'V1>)) -> DSet<'K * 'V1>
Type parameters: 'K, 'V, 'V1

Map the DKV by value, where func is an async function. All elements in a serialization block are executed together with Async.Parallel.
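The per-block behaviour can be sketched with a small local model. This is plain Python rather than Prajna code: a list of (key, value) tuples stands in for one serialization block, `asyncio.gather` stands in for Async.Parallel, and the names `slow_square` and `async_map_by_value_block` are hypothetical.

```python
import asyncio

async def slow_square(v):
    # Stand-in for an asynchronous operation such as a network call.
    await asyncio.sleep(0.01)
    return v * v

async def async_map_by_value_block(block, func):
    # Start func for every value in the block and await them together.
    # gather returns results in the same order as its arguments.
    results = await asyncio.gather(*(func(v) for _, v in block))
    # Keys are carried through unchanged; only values are replaced.
    return [(k, r) for (k, _), r in zip(block, results)]

block = [("a", 1), ("b", 2), ("c", 3)]
mapped = asyncio.run(async_map_by_value_block(block, slow_square))
print(mapped)
```

Because the calls overlap, the block takes roughly as long as its slowest element rather than the sum of all elements.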

BinSortByKey(x, partFunc, comparer)
Signature: (x:DSet<'K * 'V> * partFunc:('K -> int) * comparer:IComparer<'K>) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Bin sort the DKV by key. Applies a partition function to repartition elements by key across nodes in the cluster. The number of partitions remains unchanged. Elements within each partition/bin are sorted using the 'comparer'.
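As a rough illustration of the two steps (bin, then sort within each bin), here is a local Python model. It is a sketch under assumptions, not the library's implementation: partitions are plain lists, `bin_sort_by_key` is a hypothetical name, and the partition function is assumed to return a valid bin index between 0 and the partition count minus one.

```python
from functools import cmp_to_key

def bin_sort_by_key(partitions, part_func, comparer):
    # Step 1: route every element to the bin chosen by part_func(key).
    # The number of bins equals the number of input partitions.
    bins = [[] for _ in partitions]
    for part in partitions:
        for k, v in part:
            bins[part_func(k)].append((k, v))
    # Step 2: sort each bin by key using the comparer
    # (negative, zero, positive, as with IComparer.Compare).
    key_of = cmp_to_key(comparer)
    return [sorted(b, key=lambda kv: key_of(kv[0])) for b in bins]

partitions = [[(5, "e"), (2, "b")], [(4, "d"), (1, "a"), (3, "c")]]
result = bin_sort_by_key(partitions, lambda k: k % 2, lambda a, b: a - b)
print(result)
```

Note that the result is sorted within each bin only; there is no global order across bins unless the partition function itself is order-preserving.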

BinSortNByKey(...)
Signature: (x:DSet<'K * 'V> * numPartitions:int * partFunc:('K -> int) * comparer:IComparer<'K>) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Bin sort the DKV by key. Applies a partition function to repartition elements by key into 'numPartitions' partitions across nodes in the cluster. Elements within each partition/bin are sorted using the 'comparer'.

BinSortPByKey(...)
Signature: (x:DSet<'K * 'V> * param:DParam * partFunc:('K -> int) * comparer:IComparer<'K>) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Bin sort the DKV by key. Applies a partition function to repartition elements by key across nodes in the cluster, according to the settings specified by 'param'. Elements within each partition/bin are sorted using the 'comparer'.

FilterByKey(x, func)
Signature: (x:DSet<'K * 'V> * func:('K -> bool)) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Creates a new dataset containing only the elements of the dataset for which the given predicate on key returns true.
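A minimal local model of the semantics, written in plain Python with a hypothetical helper name: the predicate sees only the key, and each surviving element keeps its value untouched.

```python
def filter_by_key(pairs, pred):
    # Keep a (key, value) pair only when the predicate accepts its key.
    return [(k, v) for k, v in pairs if pred(k)]

pairs = [("alice", 3), ("bob", 7), ("anna", 1)]
kept = filter_by_key(pairs, lambda k: k.startswith("a"))
print(kept)
```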

GroupByKey(x)
Signature: (x:DSet<'K * 'V>) -> DSet<'K * List<'V>>
Type parameters: 'K, 'V

Group all values of the same key into a List.
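The shape of the result, one (key, list of values) element per distinct key, can be sketched locally in Python. The helper name is hypothetical, and the output order shown here comes from Python dictionaries; it is an assumption of the sketch, not a documented guarantee of the library.

```python
from collections import defaultdict

def group_by_key(pairs):
    # Collect every value under its key, preserving arrival order per key.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return list(groups.items())

pairs = [("x", 1), ("y", 2), ("x", 3)]
grouped = group_by_key(pairs)
print(grouped)
```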

GroupByKeyN(x, numSerialization)
Signature: (x:DSet<'K * 'V> * numSerialization:int) -> DSet<'K * List<'V>>
Type parameters: 'K, 'V

Group all values of the same key into a List.

InnerJoinByMergeAfterBinSortByKey(...)
Signature: (x:DSet<'K * 'V> * x1:DSet<'K * 'V1> * comp:IComparer<'K> * func:('V -> 'V1 -> 'V2)) -> DSet<'K * 'V2>
Type parameters: 'K, 'V, 'V1, 'V2

Inner join the DKV with another DKV by merge join at each partition. It assumes that both DKVs have already been bin sorted by key (using one of the BinSortByKey methods). The bin sorts should have partitioned the two DKVs into the same number of partitions, and elements with the same key should have been placed into the same partition. For background on the join operators, see http://en.wikipedia.org/wiki/Join_(SQL).
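The merge step for a single pair of matching partitions can be sketched as follows. This is a plain-Python model, not Prajna code: both inputs are lists already sorted by key with the same comparer, the names `merge_inner_join` and `compare` are hypothetical, and for brevity the sketch assumes keys are unique within each side.

```python
def compare(a, b):
    # Three-way comparison in the style of IComparer.Compare.
    return (a > b) - (a < b)

def merge_inner_join(left, right, comparer, func):
    # Walk both sorted lists in lockstep, advancing the side with the smaller key.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        c = comparer(left[i][0], right[j][0])
        if c < 0:
            i += 1          # left key has no partner yet
        elif c > 0:
            j += 1          # right key has no partner yet
        else:
            # Keys match: combine the two values with func.
            out.append((left[i][0], func(left[i][1], right[j][1])))
            i += 1
            j += 1
    return out

left = [(1, "a"), (2, "b"), (4, "d")]
right = [(2, 20), (3, 30), (4, 40)]
joined = merge_inner_join(left, right, compare, lambda v, v1: (v, v1))
print(joined)
```

The single pass works only because both sides are sorted the same way and co-partitioned, which is why the preconditions above matter.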

LeftOuterJoinByMergeAfterBinSortByKey(...)
Signature: (x:DSet<'K * 'V> * x1:DSet<'K * 'V1> * comp:IComparer<'K> * func:('V -> 'V1 option -> 'V2)) -> DSet<'K * 'V2>
Type parameters: 'K, 'V, 'V1, 'V2

Left outer join the DKV with another DKV by merge join at each partition. It assumes that both DKVs have already been bin sorted by key (using one of the BinSortByKey methods). The bin sorts should have partitioned the two DKVs into the same number of partitions, and elements with the same key should have been placed into the same partition. For background on the join operators, see http://en.wikipedia.org/wiki/Join_(SQL).
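A local sketch of the left outer variant for one pair of partitions, in plain Python with hypothetical names. `None` stands in for the `None` case of 'V1 option, and the sketch assumes keys are unique within the right-hand side.

```python
def compare(a, b):
    # Three-way comparison in the style of IComparer.Compare.
    return (a > b) - (a < b)

def merge_left_outer_join(left, right, comparer, func):
    out, j = [], 0
    for k, v in left:
        # Skip right-hand keys that sort before the current left key.
        while j < len(right) and comparer(right[j][0], k) < 0:
            j += 1
        if j < len(right) and comparer(right[j][0], k) == 0:
            out.append((k, func(v, right[j][1])))
        else:
            # Every left element is kept; a missing partner becomes None.
            out.append((k, func(v, None)))
    return out

left = [(1, "a"), (2, "b"), (4, "d")]
right = [(2, 20), (3, 30), (4, 40)]
joined = merge_left_outer_join(left, right, compare, lambda v, v1: (v, v1))
print(joined)
```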

MapByValue(x, func)
Signature: (x:DSet<'K * 'V> * func:('V -> 'V1)) -> DSet<'K * 'V1>
Type parameters: 'K, 'V, 'V1

Create a new dataset by transforming only the values of the original dataset; keys are left unchanged.
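In a plain-Python model of the semantics (the helper name is hypothetical), the transformation touches only the second component of each pair, and the value type may change:

```python
def map_by_value(pairs, func):
    # Apply func to each value; the key passes through as-is.
    return [(k, func(v)) for k, v in pairs]

pairs = [("a", 1), ("b", 2)]
mapped = map_by_value(pairs, lambda v: str(v * 10))
print(mapped)
```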

ParallelMapByValue(x, func)
Signature: (x:DSet<'K * 'V> * func:('V -> Task<'V1>)) -> DSet<'K * 'V1>
Type parameters: 'K, 'V, 'V1

Map the DKV by value, where func returns a Task<_> and may contain asynchronous operations. The mapping function must start the first task itself. Prajna cannot start the task for you, because the returned task may not be a Task in the created state. See: http://blogs.msdn.com/b/pfxteam/archive/2012/01/14/10256832.aspx

ReduceByKey(x, reduceFunc)
Signature: (x:DSet<'K * 'V> * reduceFunc:('V -> 'V -> 'V)) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Aggregate all values that share a key in the DKV. Caution: because this function uses MapReduce, the network cost is not negligible. If the aggregated result is to be returned to the client rather than used further in the cluster, consider the .fold function instead for better performance.
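The result of the aggregation can be modelled locally in Python. This sketch shows only the semantics, not the distributed execution or its network cost; `reduce_by_key` is a hypothetical helper, and the reducer is assumed to be associative so that the grouping of pairwise calls does not affect the answer.

```python
def reduce_by_key(pairs, reduce_func):
    # Fold all values that share a key into one value using reduce_func.
    acc = {}
    for k, v in pairs:
        acc[k] = reduce_func(acc[k], v) if k in acc else v
    return acc

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
totals = reduce_by_key(pairs, lambda x, y: x + y)
print(totals)
```

The output has exactly one entry per distinct key, and its value type is the same as the input value type, matching the `'V -> 'V -> 'V` signature.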

RepartitionByKey(x, partFunc)
Signature: (x:DSet<'K * 'V> * partFunc:('K -> int)) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Applies a partition function to repartition elements by key across nodes in the cluster. The number of partitions remains unchanged.
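A small local model of repartitioning, in plain Python. The helper name is hypothetical, partitions are plain lists, and the partition function is assumed to return an index in the valid range for the existing partition count.

```python
def repartition_by_key(partitions, part_func):
    # Same number of partitions out as in; only membership changes.
    new_parts = [[] for _ in partitions]
    for part in partitions:
        for k, v in part:
            new_parts[part_func(k)].append((k, v))
    return new_parts

partitions = [[("apple", 1), ("kiwi", 2)], [("fig", 3), ("pear", 4)]]
# Illustrative partition function: parity of the key length.
moved = repartition_by_key(partitions, lambda k: len(k) % 2)
print(moved)
```

After this step, all elements with the same key live in the same partition, which is the property the merge joins rely on.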

RepartitionNByKey(...)
Signature: (x:DSet<'K * 'V> * numPartitions:int * partFunc:('K -> int)) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Applies a partition function to repartition elements by key into 'numPartitions' partitions across nodes in the cluster.

RepartitionPByKey(x, param, partFunc)
Signature: (x:DSet<'K * 'V> * param:DParam * partFunc:('K -> int)) -> DSet<'K * 'V>
Type parameters: 'K, 'V

Applies a partition function to repartition elements by key across nodes in the cluster, according to the settings specified by 'param'.

RightOuterJoinByMergeAfterBinSortByKey(...)
Signature: (x:DSet<'K * 'V> * x1:DSet<'K * 'V1> * comp:IComparer<'K> * func:('V option -> 'V1 -> 'V2)) -> DSet<'K * 'V2>
Type parameters: 'K, 'V, 'V1, 'V2

Right outer join the DKV with another DKV by merge join at each partition. It assumes that both DKVs have already been bin sorted by key (using one of the BinSortByKey methods). The bin sorts should have partitioned the two DKVs into the same number of partitions, and elements with the same key should have been placed into the same partition. For background on the join operators, see http://en.wikipedia.org/wiki/Join_(SQL).
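For one pair of partitions, the right outer variant can be sketched in plain Python with hypothetical names. Here every right-hand element is kept, `None` stands in for the `None` case of 'V option, and keys are assumed unique within the left-hand side.

```python
def compare(a, b):
    # Three-way comparison in the style of IComparer.Compare.
    return (a > b) - (a < b)

def merge_right_outer_join(left, right, comparer, func):
    out, i = [], 0
    for k, v1 in right:
        # Skip left-hand keys that sort before the current right key.
        while i < len(left) and comparer(left[i][0], k) < 0:
            i += 1
        if i < len(left) and comparer(left[i][0], k) == 0:
            out.append((k, func(left[i][1], v1)))
        else:
            # No left partner: the left value is absent.
            out.append((k, func(None, v1)))
    return out

left = [(1, "a"), (2, "b")]
right = [(2, 20), (3, 30)]
joined = merge_right_outer_join(left, right, compare, lambda v, v1: (v, v1))
print(joined)
```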
