Prajna


DKV<'K, 'V>

Functions for DSet<'U> when 'U represents a key-value pair with type 'K*'V

Static members

Static member | Description
asyncMapByValue(func x)
Signature: (func:('V -> Async<'V1>)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V1>

Asynchronously map the DKV by value: apply the Async function 'func' to each value, leaving the keys unchanged.
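The semantics can be sketched locally. The following is a minimal model, assuming a plain F# list of pairs stands in for DSet<'K * 'V>; asyncMapByValueLocal and lengthAsync are hypothetical names for illustration and are not part of Prajna.

```fsharp
// Local model only: a plain list of pairs stands in for DSet<'K * 'V>.
let asyncMapByValueLocal (func: 'V -> Async<'V1>) (pairs: ('K * 'V) list) : ('K * 'V1) list =
    pairs
    |> List.map (fun (k, v) ->
        async {
            let! v1 = func v      // await the asynchronous transformation
            return (k, v1)        // the key passes through untouched
        })
    |> Async.Parallel             // run all computations; results keep input order
    |> Async.RunSynchronously
    |> Array.toList

// Hypothetical asynchronous function: compute the length of a string.
let lengthAsync (s: string) = async { return s.Length }

let lengths = asyncMapByValueLocal lengthAsync [ (1, "a"); (2, "bcd") ]
// lengths = [ (1, 1); (2, 3) ]
```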

binSortByKey(partFunc comparer x)
Signature: (partFunc:('K -> int)) -> comparer:IComparer<'K> -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Bin sort the DKV by key. Apply a partition function to repartition elements by key across nodes in the cluster; the number of partitions remains unchanged. Elements within each partition (bin) are sorted using 'comparer'.
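As a rough local illustration of bin sorting, the sketch below assigns each element to a bin with the partition function and then sorts inside each bin. It assumes a plain list in place of a DSet and a Map from bin index to elements in place of cluster partitions; binSortByKeyLocal and the even/odd partition function are hypothetical.

```fsharp
open System.Collections.Generic

// Local model only: bins are represented as a Map from bin index to elements.
let binSortByKeyLocal (partFunc: 'K -> int) (comparer: IComparer<'K>) (pairs: ('K * 'V) list) =
    pairs
    |> List.groupBy (fun (k, _) -> partFunc k)      // assign each element to a bin
    |> List.map (fun (bin, elems) ->
        // sort within the bin using the supplied comparer
        bin, elems |> List.sortWith (fun (k1, _) (k2, _) -> comparer.Compare(k1, k2)))
    |> Map.ofList

let intComparer = Comparer<int>.Default :> IComparer<int>

// Hypothetical partition function: even keys go to bin 0, odd keys to bin 1.
let bins =
    binSortByKeyLocal (fun k -> k % 2) intComparer
        [ (5, "e"); (2, "b"); (3, "c"); (4, "d"); (1, "a") ]
// bins.[0] = [ (2, "b"); (4, "d") ]
// bins.[1] = [ (1, "a"); (3, "c"); (5, "e") ]
```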

binSortNByKey(...)
Signature: numPartitions:int -> (partFunc:('K -> int)) -> comparer:IComparer<'K> -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Bin sort the DKV by key. Apply a partition function to repartition elements by key into 'numPartitions' partitions across nodes in the cluster. Elements within each partition (bin) are sorted using 'comparer'.

binSortPByKey(param partFunc comparer x)
Signature: param:DParam -> (partFunc:('K -> int)) -> comparer:IComparer<'K> -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Bin sort the DKV by key. Apply a partition function to repartition elements by key across nodes in the cluster according to the settings specified by 'param'. Elements within each partition (bin) are sorted using 'comparer'.

filterByKey(func x)
Signature: (func:('K -> bool)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Create a new dataset containing only the elements of the dataset for which the given predicate on the key returns true.
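A minimal local sketch of the same behaviour, assuming a plain list of pairs in place of a DSet; filterByKeyLocal is a hypothetical helper, not a Prajna function.

```fsharp
// Local model only: keep the pairs whose key satisfies the predicate.
let filterByKeyLocal (func: 'K -> bool) (pairs: ('K * 'V) list) =
    pairs |> List.filter (fun (k, _) -> func k)   // the value is never inspected

let kept =
    filterByKeyLocal (fun (k: string) -> k.StartsWith("a"))
        [ ("apple", 1); ("pear", 2); ("avocado", 3) ]
// kept = [ ("apple", 1); ("avocado", 3) ]
```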

groupByKey(x)
Signature: (x:DSet<'K * 'V>) -> DSet<'K * List<'V>>

Group all values of the same key into a List.
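The grouping can be modelled locally as follows. This sketch assumes plain F# lists for both the dataset and the grouped values, which may differ from the List type in the signature above; groupByKeyLocal is a hypothetical name.

```fsharp
// Local model only: collect the values that share a key.
let groupByKeyLocal (pairs: ('K * 'V) list) =
    pairs
    |> List.groupBy fst                                  // bucket pairs by key
    |> List.map (fun (k, kvs) -> k, List.map snd kvs)    // keep only the values

let grouped = groupByKeyLocal [ ("a", 1); ("b", 2); ("a", 3) ]
// grouped = [ ("a", [1; 3]); ("b", [2]) ]
```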

groupByKeyN(serializationNum x)
Signature: serializationNum:int -> (x:DSet<'K * 'V>) -> DSet<'K * List<'V>>

Group all values of the same key into a List.

innerJoinByMergeAfterBinSortByKey(...)
Signature: comp:IComparer<'K> -> (func:('V -> 'V1 -> 'V2)) -> (x:DSet<'K * 'V>) -> (x1:DSet<'K * 'V1>) -> DSet<'K * 'V2>

Inner join the two DKVs by merge join at each partition. This assumes that both DKVs have already been bin sorted by key (using one of the BinSortByKey methods). The bin sorts should have partitioned the two DKVs into the same number of partitions, with elements that share a key placed in the same partition. See http://en.wikipedia.org/wiki/Join_(SQL) for background on the join operators.
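To show why both inputs must already be sorted, here is a minimal sketch of a merge join over a single partition. It assumes plain lists stand in for the two partitions and, as a simplification, that keys are unique within each input; innerMergeJoinLocal is a hypothetical helper and says nothing about how Prajna implements the join.

```fsharp
open System.Collections.Generic

// Local model of a merge join over one partition. Both inputs must already be
// sorted by key with the same comparer. Simplifying assumption: keys are unique
// within each input.
let innerMergeJoinLocal
        (comp: IComparer<'K>)
        (func: 'V -> 'V1 -> 'V2)
        (left: ('K * 'V) list)
        (right: ('K * 'V1) list) : ('K * 'V2) list =
    let rec loop l r acc =
        match l, r with
        | (kl, vl) :: lt, (kr, vr) :: rt ->
            let c = comp.Compare(kl, kr)
            if c = 0 then loop lt rt ((kl, func vl vr) :: acc)   // keys match: emit joined value
            elif c < 0 then loop lt r acc                         // left key is smaller: advance left
            else loop l rt acc                                    // right key is smaller: advance right
        | _ -> List.rev acc                                       // one side exhausted: no more matches
    loop left right []

let innerComparer = Comparer<int>.Default :> IComparer<int>

let innerJoined =
    innerMergeJoinLocal innerComparer (fun (s: string) (n: int) -> sprintf "%s%d" s n)
        [ (1, "a"); (2, "b"); (4, "d") ]
        [ (2, 20); (3, 30); (4, 40) ]
// innerJoined = [ (2, "b20"); (4, "d40") ]
```

Because each side is walked once from front to back, the join is linear in the partition size, but it silently misses matches if either input is unsorted.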

leftOuterJoinByMergeAfterBinSortByKey(...)
Signature: comp:IComparer<'K> -> (func:('V -> 'V1 option -> 'V2)) -> (x:DSet<'K * 'V>) -> (x1:DSet<'K * 'V1>) -> DSet<'K * 'V2>

Left outer join DKV 'x' with DKV 'x1' by merge join at each partition. This assumes that both DKVs have already been bin sorted by key (using one of the BinSortByKey methods). The bin sorts should have partitioned the two DKVs into the same number of partitions, with elements that share a key placed in the same partition. See http://en.wikipedia.org/wiki/Join_(SQL) for background on the join operators.
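The role of the 'V1 option argument can be illustrated with a local sketch of a left outer merge join over one partition. It assumes plain sorted lists with unique keys on each side; leftOuterMergeJoinLocal and the sample joining function are hypothetical.

```fsharp
open System.Collections.Generic

// Local model only. Every left element is emitted; the right value is passed as
// Some when a matching key exists and as None otherwise.
// Simplifying assumption: keys are unique within each sorted input.
let leftOuterMergeJoinLocal
        (comp: IComparer<'K>)
        (func: 'V -> 'V1 option -> 'V2)
        (left: ('K * 'V) list)
        (right: ('K * 'V1) list) : ('K * 'V2) list =
    let rec loop l r acc =
        match l, r with
        | [], _ -> List.rev acc                                        // left exhausted: done
        | (kl, vl) :: lt, [] -> loop lt [] ((kl, func vl None) :: acc) // no right elements remain
        | (kl, vl) :: lt, (kr, vr) :: rt ->
            let c = comp.Compare(kl, kr)
            if c = 0 then loop lt rt ((kl, func vl (Some vr)) :: acc)  // match
            elif c < 0 then loop lt r ((kl, func vl None) :: acc)      // left key has no partner
            else loop l rt acc                                         // skip unmatched right key
    loop left right []

let leftComparer = Comparer<int>.Default :> IComparer<int>

// Hypothetical joining function: append the number, or "?" when it is missing.
let describe (s: string) (n: int option) =
    match n with
    | Some x -> sprintf "%s%d" s x
    | None -> s + "?"

let leftJoined =
    leftOuterMergeJoinLocal leftComparer describe
        [ (1, "a"); (2, "b"); (4, "d") ]
        [ (2, 20); (3, 30) ]
// leftJoined = [ (1, "a?"); (2, "b20"); (4, "d?") ]
```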

mapByValue(func x)
Signature: (func:('V -> 'V1)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V1>

Create a new dataset by transforming only the values of the original dataset; the keys are left unchanged.

parallelMapByValue(func x)
Signature: (func:('V -> Task<'V1>)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V1>

Map the DKV by value, where func is a function returning a Task<_> that may contain asynchronous operations. You need to start the first task inside the mapping function. Prajna cannot start the task for you, because the returned task may not be a Task in the created state. See: http://blogs.msdn.com/b/pfxteam/archive/2012/01/14/10256832.aspx
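The requirement to start the task yourself can be illustrated with standard .NET tasks alone. In this sketch, doubleAsTask is a hypothetical mapping function of the shape 'V -> Task<'V1>; no Prajna call is involved.

```fsharp
open System.Threading.Tasks

// A Task built with the constructor is "cold": nothing runs until Start is called.
let coldTask = new Task<int>(fun () -> 21 * 2)
let statusBeforeStart = coldTask.Status        // TaskStatus.Created

// Hypothetical mapping function: it starts the task itself before returning it.
let doubleAsTask (v: int) : Task<int> =
    let t = new Task<int>(fun () -> v * 2)
    t.Start()                                  // the function, not the caller, starts the task
    t

let taskResult = (doubleAsTask 21).Result      // blocks until the started task completes
// taskResult = 42
```

Returning a task that was never started would leave any consumer waiting on it indefinitely, which is why the mapping function has to hand back a task that is already running or completed.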

reduceByKey(reduceFunc x)
Signature: (reduceFunc:('V -> 'V -> 'V)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Aggregate all values of each unique key of the DKV together. Caution: because this function uses mapreduce, the network cost is not negligible. If the aggregated result is to be returned to the client rather than used further in the cluster, consider the .fold function instead for better performance.
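A minimal local model of the aggregation, assuming a plain list in place of a DSet; reduceByKeyLocal is a hypothetical helper and does not reflect the distributed execution or its network cost.

```fsharp
// Local model only: combine all values that share a key with reduceFunc.
let reduceByKeyLocal (reduceFunc: 'V -> 'V -> 'V) (pairs: ('K * 'V) list) =
    pairs
    |> List.groupBy fst                                   // bucket pairs by key
    |> List.map (fun (k, kvs) ->
        k, kvs |> List.map snd |> List.reduce reduceFunc) // fold each bucket's values

let totals = reduceByKeyLocal (+) [ ("a", 1); ("b", 2); ("a", 3) ]
// totals = [ ("a", 4); ("b", 2) ]
```

As a general observation rather than a statement from the Prajna documentation: a distributed reduce typically does not guarantee the order in which values are combined, so an associative and commutative reduceFunc is the safe choice.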

repartitionByKey(partFunc x)
Signature: (partFunc:('K -> int)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Apply a partition function to repartition elements by key across nodes in the cluster. The number of partitions remains unchanged.

repartitionNByKey(...)
Signature: numPartitions:int -> (partFunc:('K -> int)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Apply a partition function to repartition elements by key into 'numPartitions' partitions across nodes in the cluster.

repartitionPByKey(param partFunc x)
Signature: param:DParam -> (partFunc:('K -> int)) -> (x:DSet<'K * 'V>) -> DSet<'K * 'V>

Apply a partition function to repartition elements by key across nodes in the cluster according to the settings specified by 'param'.

rightOuterJoinByMergeAfterBinSortByKey(...)
Signature: comp:IComparer<'K> -> (func:('V option -> 'V1 -> 'V2)) -> (x:DSet<'K * 'V>) -> (x1:DSet<'K * 'V1>) -> DSet<'K * 'V2>

Right outer join DKV 'x' with DKV 'x1' by merge join at each partition. This assumes that both DKVs have already been bin sorted by key (using one of the BinSortByKey methods). The bin sorts should have partitioned the two DKVs into the same number of partitions, with elements that share a key placed in the same partition. See http://en.wikipedia.org/wiki/Join_(SQL) for background on the join operators.
