Prajna


DSet<'U>

Generic distributed dataset. When persisting the content of a DSet to a store, it is highly advisable that 'U be a .NET system type, or that a customized serializer/deserializer be used to serialize 'U[]. Otherwise, the stored data may not be retrievable, as the BinaryFormatter may fail to deserialize a class whose associated code has changed.
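A minimal sketch of this advice, assuming the Prajna.Core and Prajna.Api.FSharp namespaces, a Cluster constructor taking a cluster description file, and Name/Cluster properties on DSet (none of which are shown in this page):

```fsharp
open Prajna.Core
open Prajna.Api.FSharp

let cluster = Cluster("cluster.inf")        // hypothetical cluster description file

// Build a DSet of floats across the cluster.
let vectors =
    DSet<float>(Name = "vectors", Cluster = cluster)
    |> DSet.distribute [ 0.0 .. 1023.0 ]

// float is a .NET system type, so the BinaryFormatter round-trips it
// safely even if this program's own types change later.
vectors.SaveToHDDByName "vectors-v1"
```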

Constructors

new()
Signature: unit -> DSet<'U>

Instance members

AsyncMap(func)
Signature: (func:('U -> Async<'U1>)) -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The given function is an async function; elements within the same collection are executed in parallel with Async.Parallel.
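For example, AsyncMap could overlap I/O-bound work within each collection. A sketch, where urls is a hypothetical DSet<string> built earlier, and the download helper comes from FSharp.Core's WebExtensions:

```fsharp
open Microsoft.FSharp.Control.WebExtensions   // provides AsyncDownloadString

// urls : DSet<string>, assumed built earlier (e.g. via DSet.distribute).
// Downloads within the same collection run under Async.Parallel.
let sizes =
    urls
    |> DSet.asyncMap (fun url -> async {
        use client = new System.Net.WebClient()
        let! body = client.AsyncDownloadString(System.Uri url)
        return url, body.Length })
```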

AsyncMapi(func)
Signature: (func:(int -> int64 -> 'U -> Async<'U1>)) -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The given function is an async function; elements within the same collection are executed in parallel with Async.Parallel. The first integer index passed to the function indicates the partition, and the second integer indexes (from 0) the element within the partition.

BinSort(partFunc, comparer)
Signature: (partFunc:('U -> int) * comparer:IComparer<'U>) -> DSet<'U>

Bin sort the DSet: apply a partition function to repartition the elements of the dataset across the nodes in the cluster. The number of partitions remains unchanged. Elements within each partition/bin are sorted using the 'comparer'.
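A sketch of bin-sorting integers by residue class, where numbers is a hypothetical DSet<int>:

```fsharp
open System.Collections.Generic

// numbers : DSet<int>, assumed built earlier.
// Each value lands in the bin chosen by the partition function
// (here its residue mod 16); the partition count is unchanged, and
// each bin is then sorted with the supplied comparer.
let sorted =
    numbers
    |> DSet.binSort (fun v -> abs (v % 16)) Comparer<int>.Default
```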

BinSortN(...)
Signature: (numPartitions:int * partFunc:('U -> int) * comparer:IComparer<'U>) -> DSet<'U>

Bin sort the DSet: apply a partition function to repartition the elements of the dataset into 'numPartitions' partitions across the nodes in the cluster. Elements within each partition/bin are sorted using the 'comparer'.

BinSortP(param, partFunc, comparer)
Signature: (param:DParam * partFunc:('U -> int) * comparer:IComparer<'U>) -> DSet<'U>

Bin sort the DSet: apply a partition function to repartition the elements of the dataset across the nodes in the cluster according to the settings specified by 'param'. Elements within each partition/bin are sorted using the 'comparer'.

Bypass()
Signature: unit -> DSet<'U> * DSet<'U>

Bypass the DSet in 2 ways. One of the DSets is pulled; the other DSet will be in push dataflow.

Bypass2()
Signature: unit -> DSet<'U> * DSet<'U>

Bypass the DSet in 2 ways. One of the DSets is pulled; the other DSet will be in push dataflow.

Bypass3()
Signature: unit -> DSet<'U> * DSet<'U> * DSet<'U>

Bypass the DSet in 3 ways. One of the DSets is pulled; the other two DSets will be in push dataflow.

Bypass4()
Signature: unit -> DSet<'U> * DSet<'U> * DSet<'U> * DSet<'U>

Bypass the DSet in 4 ways. One of the DSets is pulled; the other three DSets will be in push dataflow.

BypassN(nWays)
Signature: nWays:int -> DSet<'U> * DSet<'U> []

Bypass the DSet in n ways. One of the DSets is pulled; the other n - 1 DSets will be in push dataflow.

CacheInMemory()
Signature: unit -> DSet<'U>

Cache DSet content in memory, in raw object form.

Choose(func)
Signature: (func:('U -> 'U1 option)) -> DSet<'U1>

Applies the given function to each element of the dataset. The new dataset is composed of the results for each element where the function returns Some.

Collect(func)
Signature: (func:('U -> seq<'U1>)) -> DSet<'U1>

Applies the given function to each element of the dataset and concatenates all the results.

Count()
Signature: unit -> int64

Count the number of elements (rows) in the DSet.

CrossJoin(mapFunc, x1)
Signature: (mapFunc:('U -> 'U1 -> 'U2) * x1:DSet<'U1>) -> DSet<'U2>
Type parameters: 'U2

Cross join: for each pair of entries in x:DSet<'U> and x1:DSet<'U1>, apply mapFunc:'U->'U1->'U2. The resultant set is the Cartesian product of DSet<'U> and DSet<'U1>.
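For instance, pairwise distances between two point sets. A sketch, where pointsA and pointsB are hypothetical DSets of coordinates:

```fsharp
// pointsA : DSet<float * float>, pointsB : DSet<float * float>, assumed.
// Every element of pointsA is combined with every element of pointsB,
// so the result has |pointsA| * |pointsB| rows.
let distances =
    pointsA.CrossJoin(
        (fun (ax, ay) (bx, by) -> sqrt ((ax - bx) ** 2.0 + (ay - by) ** 2.0)),
        pointsB)
```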

CrossJoinChoose(mapFunc, x1)
Signature: (mapFunc:('U -> 'U1 -> 'U2 option) * x1:DSet<'U1>) -> DSet<'U2>
Type parameters: 'U2

Cross join with filtering: for each pair of entries in x:DSet<'U> and x1:DSet<'U1>, apply mapFunc:'U->'U1->'U2 option. Results for which mapFunc returns None are filtered out.

CrossJoinFold(...)
Signature: (mapFunc:('U -> 'U1 -> 'U2) * foldFunc:('S -> 'U2 -> 'S) * initialState:'S * x1:DSet<'U1>) -> DSet<'S>
Type parameters: 'U2, 'S

Cross join with fold: for each pair of entries in x:DSet<'U> and x1:DSet<'U1>, apply mapFunc:'U->'U1->'U2, then fold the 'U2 results into the state with foldFunc:'S->'U2->'S.

Distribute(sourceSeq)
Signature: sourceSeq:seq<'U> -> DSet<'U>

Span a sequence (dataset) into a distributed dataset by splitting the sequence into N partitions, with N being the number of active nodes in the cluster. Each node gets one partition of the dataset, so the number of partitions of the distributed dataset is N.
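A sketch of distributing a local sequence and counting it remotely; the Cluster constructor and DSet's Name/Cluster properties are assumptions not shown in this page:

```fsharp
open Prajna.Core
open Prajna.Api.FSharp

let cluster = Cluster("cluster.inf")          // hypothetical cluster file

// The local sequence is split into one partition per active node.
let ints =
    DSet<int>(Name = "ints", Cluster = cluster)
    |> DSet.distribute (seq { 1 .. 1_000_000 })

// Count() is an action, so this triggers distributed execution.
printfn "rows = %d" (ints.Count())
```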

DistributeN(num, sourceSeq)
Signature: (num:int * sourceSeq:seq<'U>) -> DSet<'U>

Span a sequence (dataset) into a distributed dataset by splitting the sequence into NUM * N partitions, with NUM being the number of partitions on each node and N being the number of nodes in the cluster. Each node gets NUM partitions of the dataset.

Execute(func)
Signature: (func:(unit -> unit)) -> unit

Execute a distributed customized functional delegate on each of the machines. The functional delegate runs remotely on each node. If any node in the cluster is not responding, the functional delegate does not run on that node.

ExecuteN(num, funcN)
Signature: (num:int * funcN:(int -> unit)) -> unit

Execute num distributed customized functional delegates on each of the machines. The functional delegates run remotely on each node. If any node in the cluster is not responding, the functional delegates do not run on that node.

Filter(func)
Signature: (func:('U -> bool)) -> DSet<'U>

Creates a new dataset containing only the elements of the dataset for which the given predicate returns true.

Fold(foldFunc, aggrFunc, state)
Signature: (foldFunc:('GV -> 'U -> 'GV) * aggrFunc:('GV -> 'GV -> 'GV) * state:'GV) -> 'GV

Fold the entire DSet with a fold function, an aggregation function, and an initial state. The initial state is deserialized (separately) for each partition. Within each partition, the elements are folded into the state variable using 'foldFunc'. Then 'aggrFunc' is used to aggregate the resulting state variables from all partitions into a single state.
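The two functions play different roles: the fold function runs within a partition, the aggregation function combines per-partition results. A sum-of-squares sketch, where data is a hypothetical DSet<float>:

```fsharp
// data : DSet<float>, assumed built earlier.
// Per partition: (fun acc v -> acc + v * v) folds rows into a local sum.
// Across partitions: (+) merges the per-partition sums into one value.
let sumOfSquares =
    data |> DSet.fold (fun acc v -> acc + v * v) (+) 0.0
```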

GetNumberOfPartitions()
Signature: unit -> int

Get the number of partitions. If the partitioning has not been set, set it up.

Identity()
Signature: unit -> DSet<'U>

Identity mapping: the new DSet is the same as the old DSet, with an encode function installed.

Import(serversInfo, importName)
Signature: (serversInfo:ContractServersInfo * importName:string) -> DSet<'U>

Generate a distributed dataset by importing a customized functional delegate from a local contract store. The imported functional delegate is usually exported by another service (either in the same remote container or in another Prajna remote container). Each node runs one local instance of the functional delegate, which forms one partition of the dataset. The number of partitions of the dataset is N, where N is the number of nodes in the cluster. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegate on that node.

ImportN(serversInfo, importNames)
Signature: (serversInfo:ContractServersInfo * importNames:string []) -> DSet<'U>

Generate a distributed dataset by importing customized functional delegates from a local contract store. The imported functional delegates are usually exported by another service (either in the same remote container or in another Prajna remote container). Each node runs multiple local instances of the functional delegates, each of which forms one partition of the dataset. The number of partitions of the dataset is N * num, where N is the number of nodes in the cluster and num is the number of contracts in the contract list. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegates on that node.

Init(initFunc, partitionSizeFunc)
Signature: (initFunc:(int * int -> 'U) * partitionSizeFunc:(int -> int)) -> DSet<'U>

Create a distributed dataset on the distributed cluster, with each element created by a functional delegate.

InitN(initFunc, partitionSizeFunc)
Signature: (initFunc:(int * int -> 'U) * partitionSizeFunc:(int -> int -> int)) -> DSet<'U>

Create a distributed dataset on the distributed cluster, with each element created by a functional delegate, using a given number of parallel executions per node. In the functional delegate that creates each element, the first integer index passed to the function indicates the partition, and the second integer indexes (from 0) the element within the partition. In the functional delegate that returns the size of a partition, the integer index passed to the function indicates the partition.

InitS(initFunc, partitionSize)
Signature: (initFunc:(int * int -> 'U) * partitionSize:int) -> DSet<'U>

Create a distributed dataset on the distributed cluster, with each element created by a functional delegate.

Iter(iterFunc)
Signature: (iterFunc:('U -> unit)) -> unit

Applies a function 'iterFunc' to each element

LazySaveToHDD()
Signature: unit -> unit

Lazily save the DSet to HDD. This is a lazy evaluation. This DSet must be a branch generated by Bypass or a child of such branch. The save action will be triggered when a pull happens on other branches generated by Bypass.

LazySaveToHDDByName(name)
Signature: name:string -> unit

Lazily save the DSet to HDD using "name". This is a lazy evaluation. This DSet must be a branch generated by Bypass or a child of such branch. The save action will be triggered when a pull happens on other branches generated by Bypass.
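A sketch of the Bypass + lazy-save pattern these two members describe, where data is a hypothetical DSet:

```fsharp
// data : DSet<'U>, assumed built earlier.
let pulled, pushed = data.Bypass()          // one pull branch, one push branch
pushed.LazySaveToHDDByName "checkpoint"     // queued on the push branch; nothing runs yet
pulled |> DSet.iter (fun _ -> ())           // pulling this branch triggers the
                                            // dataflow, and the lazy save fires
```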

LoadSource()
Signature: unit -> DSet<'U>

Trigger to load metadata.

LocalIter(iterFunc)
Signature: (iterFunc:('U -> unit)) -> unit

Read DSet back to local machine and apply a function on each value. Caution: the iter function will be executed locally as the entire DSet is read back.

Map(func)
Signature: (func:('U -> 'U1)) -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The given function is applied per collection as the dataset is iterated in a distributed fashion. The entire dataset is never materialized.

Map2(func, x1)
Signature: (func:('U -> 'U1 -> 'V) * x1:DSet<'U1>) -> DSet<'V>
Type parameters: 'V

Create a new DSet whose elements are the results of applying the given function to the corresponding elements of the two DSets pairwise. The two DSets must have the same partition mapping structure and the same number of elements (established, e.g., via Split2).

Map3(func, x1, x2)
Signature: (func:('U -> 'U1 -> 'U2 -> 'V) * x1:DSet<'U1> * x2:DSet<'U2>) -> DSet<'V>
Type parameters: 'U2, 'V

Create a new DSet whose elements are the results of applying the given function to the corresponding elements of the three DSets pairwise

Map4(func, x1, x2, x3)
Signature: (func:('U -> 'U1 -> 'U2 -> 'U3 -> 'V) * x1:DSet<'U1> * x2:DSet<'U2> * x3:DSet<'U3>) -> DSet<'V>
Type parameters: 'U2, 'U3, 'V

Create a new DSet whose elements are the results of applying the given function to the corresponding elements of the four DSets pairwise

MapByCollection(func)
Signature: (func:('U [] -> 'U1 [])) -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each collection of elements of the dataset. In the input DSet, a partition can have multiple collections, the size of which is determined by the SerializationLimit of the cluster.

Mapi(func)
Signature: (func:(int -> int64 -> 'U -> 'U1)) -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The first integer index passed to the function indicates the partition, and the second integer indexes (from 0) the element within the partition.

MapReduce(mapFunc, reduceFunc)
Signature: (mapFunc:('U -> seq<'K1 * 'V1>) * reduceFunc:('K1 * List<'V1> -> 'U2)) -> DSet<'U2>
Type parameters: 'V1, 'U2

MapReduce, see http://en.wikipedia.org/wiki/MapReduce
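The classic word-count example, sketched against the signature above, where lines is a hypothetical DSet<string>:

```fsharp
// lines : DSet<string>, assumed built earlier.
// mapFunc emits (word, 1) pairs; reduceFunc receives each distinct word
// together with the List of its emitted counts.
let wordCounts =
    lines
    |> DSet.mapReduce
        (fun (line : string) ->
            line.Split(' ') |> Seq.map (fun w -> w, 1))
        (fun (word, counts) -> word, Seq.sum counts)
```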

MapReduceP(param, mapFunc, reduceFunc)
Signature: (param:DParam * mapFunc:('U -> seq<'K1 * 'V1>) * reduceFunc:('K1 * List<'V1> -> 'U2)) -> DSet<'U2>
Type parameters: 'V1, 'U2

MapReduce, see http://en.wikipedia.org/wiki/MapReduce

MapReducePWithPartitionFunction(...)
Signature: (param:DParam * mapFunc:('U -> seq<'K1 * 'V1>) * partFunc:('K1 -> int) * reduceFunc:('K1 * List<'V1> -> 'U2)) -> DSet<'U2>
Type parameters: 'V1, 'U2

MapReduce, see http://en.wikipedia.org/wiki/MapReduce

Mix(x1)
Signature: x1:DSet<'U1> -> DSet<'U * 'U1>

Mix two DSets that have the same size and partition layout into a single DSet by pairing the individual data (rows).

Mix2(x1)
Signature: x1:DSet<'U1> -> DSet<'U * 'U1>

Mix two DSets that have the same size and partition layout into a single DSet by pairing the individual data (rows).

Mix3(x1, x2)
Signature: (x1:DSet<'U1> * x2:DSet<'U2>) -> DSet<'U * 'U1 * 'U2>
Type parameters: 'U2

Mix three DSets that have the same size and partition layout into a single DSet by combining the individual data (rows) into triples.

Mix4(x1, x2, x3)
Signature: (x1:DSet<'U1> * x2:DSet<'U2> * x3:DSet<'U3>) -> DSet<'U * 'U1 * 'U2 * 'U3>
Type parameters: 'U2, 'U3

Mix four DSets that have the same size and partition layout into a single DSet by combining the individual data (rows) into quadruples.

Multicast()
Signature: unit -> DSet<'U>

Multicast a DSet over the network, replicating its content to all peers in the cluster.

ParallelMap(func)
Signature: (func:('U -> Task<'U1>)) -> DSet<'U1>

Map the DSet, where func is a Task<_>-returning function that may contain asynchronous operations. You will need to start the first task in the mapping function; Prajna will not be able to start the task for you, as the returned task may not be a Task in the Created state. See: http://blogs.msdn.com/b/pfxteam/archive/2012/01/14/10256832.aspx
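The "already started" requirement can be met by returning a task produced by Task.Run, or by a library method that returns a hot task. A sketch, where files is a hypothetical DSet<string> of paths:

```fsharp
open System.Threading.Tasks

// files : DSet<string>, assumed built earlier.
// Task.Run returns a task that is already running, so the returned
// task is never in the Created state.
let lengths =
    files.ParallelMap(fun path ->
        Task.Run(fun () -> (System.IO.FileInfo path).Length))
```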

ParallelMapi(func)
Signature: (func:(int -> int64 -> 'U -> Task<'U1>)) -> DSet<'U1>

Map the DSet, where func is a Task<_>-returning function that may contain asynchronous operations. The first integer index passed to the function indicates the partition, and the second integer indexes (from 0) the element within the partition. You will need to start the first task in the mapping function; Prajna will not be able to start the task for you, as the returned task may not be a Task in the Created state. See: http://blogs.msdn.com/b/pfxteam/archive/2012/01/14/10256832.aspx

Printfn(fmt)
Signature: (fmt:TextWriterFormat<('U -> '?13274)>) -> unit

Read the DSet back to the local machine and print each value. Caution: the print function is executed locally as the entire DSet is read back.

Reduce(reducer)
Signature: (reducer:('U -> 'U -> 'U)) -> 'U

Reduces the elements using the specified 'reducer' function

ReorgWDegree(numParallelJobs)
Signature: numParallelJobs:int -> DSet<'U>

Reorganize collections in the dataset to accommodate a given degree of parallel execution.

Repartition(partFunc)
Signature: (partFunc:('U -> int)) -> DSet<'U>

Apply a partition function to repartition the elements of the dataset across the nodes in the cluster. The number of partitions remains unchanged.

RepartitionN(numPartitions, partFunc)
Signature: (numPartitions:int * partFunc:('U -> int)) -> DSet<'U>

Apply a partition function to repartition the elements of the dataset into 'numPartitions' partitions across the nodes in the cluster.

RepartitionP(param, partFunc)
Signature: (param:DParam * partFunc:('U -> int)) -> DSet<'U>

Apply a partition function to repartition the elements of the dataset across the nodes in the cluster according to the settings specified by 'param'.

RowsMergeAll()
Signature: unit -> DSet<'U>

Merge all rows of a partition into a single collection object.

RowsReorg(numRows)
Signature: numRows:int -> DSet<'U>

Reorganize collections of the dataset. If numRows = 1, the dataset is split into one row per collection. If numRows = Int32.MaxValue, the dataset is merged so that all rows in a partition are grouped into a single collection.

RowsSplit()
Signature: unit -> DSet<'U>

Reorganize collections of the dataset so that each collection has a single row.

SaveToHDD()
Signature: unit -> unit

Save the DSet to HDD. This is an action.

SaveToHDDByName(name)
Signature: name:string -> unit

Save the DSet to HDD using "name". This is an action.

SaveToHDDByNameWithMonitor(...)
Signature: (monitorFunc:('U -> bool) * localIterFunc:('U -> unit) * name:string) -> unit

Save the DSet to HDD using "name". This is an action. Attach a monitor function that selects the elements to be fetched to the local machine and iterated by 'localIterFunc'.

SaveToHDDWithMonitor(...)
Signature: (monitorFunc:('U -> bool) * localIterFunc:('U -> unit)) -> unit

Save the DSet to HDD. This is an action. Attach a monitor function that selects the elements to be fetched to the local machine and iterated by 'localIterFunc'.

Source(sourceSeqFunc)
Signature: (sourceSeqFunc:(unit -> seq<'U>)) -> DSet<'U>

Generate a distributed dataset through a customized seq functional delegate running on each of the machines. Each node runs a local instance of the functional delegate, which generates a seq<'U> that forms one partition of DSet<'U>. The number of partitions of DSet<'U> is N, the number of nodes in the cluster. The NumReplications is 1. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegate on that node.

SourceI(numPartitions, sourceISeqFunc)
Signature: (numPartitions:int * sourceISeqFunc:(int -> seq<'U>)) -> DSet<'U>

Generate a distributed dataset through customized seq functional delegates running on each of the machines. The number of partitions of the dataset is numPartitions. If any node in the cluster is not responding, the partition may be rerun on a different node.

SourceN(num, sourceNSeqFunc)
Signature: (num:int * sourceNSeqFunc:(int -> seq<'U>)) -> DSet<'U>

Generate a distributed dataset through customized seq functional delegates running on each of the machines. Each node runs num local instances of the functional delegates, each of which forms one partition of the dataset. The number of partitions of the dataset is N * num, where N is the number of nodes in the cluster and num is the number of partitions per node. The NumReplications is 1. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegates on that node.

Split2(fun0, fun1)
Signature: (fun0:('U -> 'U0) * fun1:('U -> 'U1)) -> DSet<'U0> * DSet<'U1>
Type parameters: 'U1

Correlated split of a dataset into two, each created by running a functional delegate that maps the elements of the original dataset. The resultant datasets all have the same partition and collection structure as the original dataset.
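Because a correlated split preserves partition and collection structure, its outputs satisfy Map2's precondition. A sketch, where records is a hypothetical DSet of key/value strings:

```fsharp
// records : DSet<string * string>, assumed built earlier.
// Both halves inherit the partition/collection structure of records.
let keys, lens =
    records.Split2(fst, fun (_, v) -> String.length v)

// The halves line up element-for-element, so Map2 can recombine them.
let tagged = keys.Map2((fun k n -> sprintf "%s:%d" k n), lens)
```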

Split3(fun0, fun1, fun2)
Signature: (fun0:('U -> 'U0) * fun1:('U -> 'U1) * fun2:('U -> 'U2)) -> DSet<'U0> * DSet<'U1> * DSet<'U2>
Type parameters: 'U1, 'U2

Correlated split of a dataset into three, each created by running a functional delegate that maps the elements of the original dataset. The resultant datasets all have the same partition and collection structure as the original dataset.

Split4(fun0, fun1, fun2, fun3)
Signature: (fun0:('U -> 'U0) * fun1:('U -> 'U1) * fun2:('U -> 'U2) * fun3:('U -> 'U3)) -> DSet<'U0> * DSet<'U1> * DSet<'U2> * DSet<'U3>
Type parameters: 'U1, 'U2, 'U3

Correlated split of a dataset into four, each created by running a functional delegate that maps the elements of the original dataset. The resultant datasets all have the same partition and collection structure as the original dataset.

Store(o, cts)
Signature: (o:seq<'U> * cts:CancellationToken) -> unit

Store a sequence to a persisted DSet

Store(o)
Signature: o:seq<'U> -> unit

Store a sequence to a persisted DSet

StoreInternal(o)
Signature: o:seq<'U> -> unit

Store a sequence to a persisted DSet

ToSeq()
Signature: unit -> IEnumerable<'U>

Convert DSet to a sequence seq<'U>

Union(source)
Signature: source:seq<DSet<'U>> -> DSet<'U>

Merge the content of multiple datasets into a single dataset. The original datasets become partitions of the resultant dataset. This can be considered merging datasets by rows; all datasets must have the same column structure.

Static members

asyncMap(func x)
Signature: (func:('U -> Async<'?13397>)) -> x:DSet<'U> -> DSet<'?13397>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The given function is an async function; elements within the same collection are executed in parallel with Async.Parallel.

asyncMapi(func x)
Signature: (func:(int -> int64 -> 'U -> Async<'U1>)) -> x:DSet<'U> -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The given function is an async function; elements within the same collection are executed in parallel with Async.Parallel. The first integer index passed to the function indicates the partition, and the second integer indexes (from 0) the element within the partition.

binSort(partFunc comparer x)
Signature: (partFunc:('U -> int)) -> comparer:IComparer<'U> -> x:DSet<'U> -> DSet<'U>

Bin sort the DSet: apply a partition function to repartition the elements of the dataset across the nodes in the cluster. The number of partitions remains unchanged. Elements within each partition/bin are sorted using the 'comparer'.

binSortN(...)
Signature: numPartitions:int -> (partFunc:('U -> int)) -> comparer:IComparer<'U> -> x:DSet<'U> -> DSet<'U>

Bin sort the DSet: apply a partition function to repartition the elements of the dataset into 'numPartitions' partitions across the nodes in the cluster. Elements within each partition/bin are sorted using the 'comparer'.

binSortP(param partFunc comparer x)
Signature: param:DParam -> (partFunc:('U -> int)) -> comparer:IComparer<'U> -> x:DSet<'U> -> DSet<'U>

Bin sort the DSet: apply a partition function to repartition the elements of the dataset across the nodes in the cluster according to the settings specified by 'param'. Elements within each partition/bin are sorted using the 'comparer'.

bypass(x)
Signature: x:DSet<'U> -> DSet<'U> * DSet<'U>

Bypass the DSet in 2 ways. One of the DSets is pulled; the other DSet will be in push dataflow.

bypass2(x)
Signature: x:DSet<'U> -> DSet<'U> * DSet<'U>

Bypass the DSet in 2 ways. One of the DSets is pulled; the other DSet will be in push dataflow.

bypass3(x)
Signature: x:DSet<'U> -> DSet<'U> * DSet<'U> * DSet<'U>

Bypass the DSet in 3 ways. One of the DSets is pulled; the other two DSets will be in push dataflow.

bypass4(x)
Signature: x:DSet<'U> -> DSet<'U> * DSet<'U> * DSet<'U> * DSet<'U>

Bypass the DSet in 4 ways. One of the DSets is pulled; the other three DSets will be in push dataflow.

bypassN(nWays x)
Signature: nWays:int -> x:DSet<'U> -> DSet<'U> * DSet<'U> []

Bypass the DSet in n ways. One of the DSets is pulled; the other n - 1 DSets will be in push dataflow.

cacheInMemory(x)
Signature: x:DSet<'U> -> DSet<'U>

Cache DSet content in RAM, in raw object form.

choose(func x)
Signature: (func:('U -> '?13379 option)) -> x:DSet<'U> -> DSet<'?13379>

Applies the given function to each element of the dataset. The new dataset is composed of the results for each element where the function returns Some.

collect(func x)
Signature: (func:('U -> seq<'?13457>)) -> x:DSet<'U> -> DSet<'?13457>

Applies the given function to each element of the dataset and concatenates all the results.

count(x)
Signature: x:DSet<'U> -> int64

Count the number of elements (rows) in the DSet.

crossJoin(mapFunc x x1)
Signature: (mapFunc:('U -> 'U1 -> 'U2)) -> x:DSet<'U> -> x1:DSet<'U1> -> DSet<'U2>
Type parameters: 'U2

Cross join: for each pair of entries in x:DSet<'U> and x1:DSet<'U1>, apply mapFunc:'U->'U1->'U2. The resultant set is the Cartesian product of DSet<'U> and DSet<'U1>.

crossJoinChoose(mapFunc x x1)
Signature: (mapFunc:('U -> 'U1 -> 'U2 option)) -> x:DSet<'U> -> x1:DSet<'U1> -> DSet<'U2>
Type parameters: 'U2

Cross join with filtering: for each pair of entries in x:DSet<'U> and x1:DSet<'U1>, apply mapFunc:'U->'U1->'U2 option. Results for which mapFunc returns None are filtered out.

crossJoinFold(...)
Signature: (mapFunc:('U -> 'U1 -> 'U2)) -> (foldFunc:('S -> 'U2 -> 'S)) -> initialState:'S -> x:DSet<'U> -> x1:DSet<'U1> -> DSet<'S>
Type parameters: 'U2, 'S

Cross join with fold: for each pair of entries in x:DSet<'U> and x1:DSet<'U1>, apply mapFunc:'U->'U1->'U2, then fold the 'U2 results into the state with foldFunc:'S->'U2->'S.

distribute(sourceSeq x)
Signature: sourceSeq:seq<'U> -> x:DSet<'U> -> DSet<'U>

Span a sequence (dataset) into a distributed dataset by splitting the sequence into N partitions, with N being the number of active nodes in the cluster. Each node gets one partition of the dataset, so the number of partitions of the distributed dataset is N.

distributeN(num sourceSeq x)
Signature: num:int -> sourceSeq:seq<'U> -> x:DSet<'U> -> DSet<'U>

Span a sequence (dataset) into a distributed dataset by splitting the sequence into NUM * N partitions, with NUM being the number of partitions on each node and N being the number of nodes in the cluster. Each node gets NUM partitions of the dataset.

execute(func x)
Signature: (func:(unit -> unit)) -> x:DSet<'U> -> unit

Execute a distributed customized functional delegate on each of the machines. The functional delegate runs remotely on each node. If any node in the cluster is not responding, the functional delegate does not run on that node.

executeN(num funcN x)
Signature: num:int -> (funcN:(int -> unit)) -> x:DSet<'U> -> unit

Execute num distributed customized functional delegates on each of the machines. The functional delegates run remotely on each node. If any node in the cluster is not responding, the functional delegates do not run on that node.

filter(func x)
Signature: (func:('U -> bool)) -> x:DSet<'U> -> DSet<'U>

Creates a new dataset containing only the elements of the dataset for which the given predicate returns true.

fold(folder aggrFunc state x)
Signature: (folder:('?13255 -> 'U -> '?13255)) -> (aggrFunc:('?13255 -> '?13255 -> '?13255)) -> state:'?13255 -> x:DSet<'U> -> '?13255

Fold the entire DSet with a fold function, an aggregation function, and an initial state. The initial state is broadcast to each partition. Within each partition, the elements are folded into the state variable using the 'folder' function. Then 'aggrFunc' is used to aggregate the resulting state variables from all partitions into a single state.

identity(x)
Signature: x:DSet<'U> -> DSet<'U>

Identity mapping: the new DSet is the same as the old DSet, with an encode function installed.

import(servers importName x)
Signature: servers:ContractServersInfo -> importName:string -> x:DSet<'U> -> DSet<'U>

Generate a distributed dataset by importing a customized functional delegate from a local contract store. The imported functional delegate is usually exported by another service (either in the same remote container or in another Prajna remote container). Each node runs one local instance of the functional delegate, which forms one partition of the dataset. The number of partitions of the dataset is N, where N is the number of nodes in the cluster. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegate on that node.

importN(servers importNames x)
Signature: servers:ContractServersInfo -> (importNames:string []) -> x:DSet<'U> -> DSet<'U>

Generate a distributed dataset by importing customized functional delegates from a local contract store. The imported functional delegates are usually exported by another service (either in the same remote container or in another Prajna remote container). Each node runs multiple local instances of the functional delegates, each of which forms one partition of the dataset. The number of partitions of the dataset is N * num, where N is the number of nodes in the cluster and num is the number of contracts in the contract list. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegates on that node.

init(initFunc partitionSizeFunc x)
Signature: (initFunc:(int * int -> 'U)) -> (partitionSizeFunc:(int -> int)) -> x:DSet<'U> -> DSet<'U>

Create a distributed dataset on the distributed cluster, with each element created by a functional delegate.

initN(initFunc, partitionSizeFunc) x
Signature: (initFunc:(int * int -> 'U) * partitionSizeFunc:(int -> int -> int)) -> x:DSet<'U> -> DSet<'U>

Create a distributed dataset on the distributed cluster, with each element created by a functional delegate, using a given number of parallel executions per node. In the functional delegate that creates each element, the first integer index passed to the function indicates the partition, and the second integer indexes (from 0) the element within the partition. In the functional delegate that returns the size of a partition, the integer index passed to the function indicates the partition.

initS(initFunc partitionSize x)
Signature: (initFunc:(int * int -> 'U)) -> partitionSize:int -> x:DSet<'U> -> DSet<'U>

Create a distributed dataset on the distributed cluster, with each element created by a functional delegate.

iter(iterFunc x)
Signature: (iterFunc:('U -> unit)) -> x:DSet<'U> -> unit

Iterate the given function to each element. This is an action.

lazySaveToHDD(x)
Signature: x:DSet<'U> -> unit

Lazily save the DSet to HDD. This is a lazy evaluation. This DSet must be a branch generated by Bypass or a child of such branch. The save action will be triggered when a pull happens on other branches generated by Bypass.

lazySaveToHDDByName(name x)
Signature: name:string -> x:DSet<'U> -> unit

Lazily save the DSet to HDD using "name". This is a lazy evaluation. This DSet must be a branch generated by Bypass or a child of such branch. The save action will be triggered when a pull happens on other branches generated by Bypass.

loadSource(x)
Signature: x:DSet<'U> -> DSet<'U>

Trigger to load metadata.

localIter(iterFunc x)
Signature: (iterFunc:('U -> unit)) -> x:DSet<'U> -> unit

Read DSet back to local machine and apply a function on each value. Caution: the iter function will be executed locally as the entire DSet is read back.

map(func x)
Signature: (func:('U -> '?13385)) -> x:DSet<'U> -> DSet<'?13385>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The given function is applied per collection as the dataset is iterated in a distributed fashion. The entire dataset is never materialized.

map2(func x x1)
Signature: (func:('U -> 'U1 -> 'V)) -> x:DSet<'U> -> x1:DSet<'U1> -> DSet<'V>
Type parameters: 'V

Create a new DSet whose elements are the results of applying the given function to the corresponding elements of the two DSets pairwise. The two DSets must have the same partition mapping structure and the same number of elements (established, e.g., via Split2).

map3(func x x1 x2)
Signature: (func:('U -> 'U1 -> 'U2 -> 'V)) -> x:DSet<'U> -> x1:DSet<'U1> -> x2:DSet<'U2> -> DSet<'V>
Type parameters: 'U2, 'V

Create a new DSet whose elements are the results of applying the given function to the corresponding elements of the three DSets pairwise

map4(func x x1 x2 x3)
Signature: (func:('U -> 'U1 -> 'U2 -> 'U3 -> 'V)) -> x:DSet<'U> -> x1:DSet<'U1> -> x2:DSet<'U2> -> x3:DSet<'U3> -> DSet<'V>
Type parameters: 'U2, 'U3, 'V

Create a new DSet whose elements are the results of applying the given function to the corresponding elements of the four DSets pairwise

mapByCollection(func x)
Signature: (func:('U [] -> 'U1 [])) -> x:DSet<'U> -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each collection of the elements of the dataset. In the input DSet, a partition can have multiple collections, the size of which is determined by the SerializationLimit of the cluster.

mapi(func x)
Signature: (func:(int -> int64 -> 'U -> 'U1)) -> x:DSet<'U> -> DSet<'U1>

Creates a new dataset whose elements are the results of applying the given function to each of the elements of the dataset. The first integer passed to the function indicates the partition, and the second integer indexes the element (from 0) within the partition.

mapReduce(mapFunc reduceFunc x)
Signature: (mapFunc:('U -> seq<'K1 * 'V1>)) -> (reduceFunc:('K1 * List<'V1> -> 'U2)) -> x:DSet<'U> -> DSet<'U2>
Type parameters: 'V1, 'U2

MapReduce, see http://en.wikipedia.org/wiki/MapReduce
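As a sketch, a word count over a hypothetical DSet<string> of text lines could look like this; the mapFunc emits (word, 1) pairs and the reduceFunc sums the counts grouped per key:

```fsharp
// Hypothetical sketch: 'lines' is assumed to be an existing DSet<string>.
let wordCounts : DSet<string * int> =
    lines
    |> DSet.mapReduce
        // mapFunc: emit one (key, value) pair per word in the line
        (fun line -> line.Split(' ') |> Seq.map (fun w -> w, 1))
        // reduceFunc: the values collected for each key are summed
        (fun (word, counts) -> word, Seq.sum counts)
```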

mapReduceP(param mapFunc reduceFunc x)
Signature: param:DParam -> (mapFunc:('U -> seq<'K1 * 'V1>)) -> (reduceFunc:('K1 * List<'V1> -> 'U2)) -> x:DSet<'U> -> DSet<'U2>
Type parameters: 'V1, 'U2

MapReduce, see http://en.wikipedia.org/wiki/MapReduce

mapReducePWithPartitionFunction(...)
Signature: param:DParam -> (mapFunc:('U -> seq<'K1 * 'V1>)) -> (partFunc:('K1 -> int)) -> (reduceFunc:('K1 * List<'V1> -> 'U2)) -> x:DSet<'U> -> DSet<'U2>
Type parameters: 'V1, 'U2

MapReduce, see http://en.wikipedia.org/wiki/MapReduce

mix(x x1)
Signature: x:DSet<'U> -> x1:DSet<'U1> -> DSet<'U * 'U1>

Mix two DSets that have the same size and partition layout into a single DSet by pairing the corresponding elements (rows) into tuples.
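A sketch of mixing, under the assumption that `d1 : DSet<int>` and `d2 : DSet<string>` already exist and share the same size and partition layout (e.g., as produced by split2):

```fsharp
// Hypothetical sketch: 'd1' (DSet<int>) and 'd2' (DSet<string>) are assumed
// to exist with identical size and partition layout.
// Corresponding rows are paired into tuples.
let pairs : DSet<int * string> = DSet.mix d1 d2
```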

mix2(x x1)
Signature: x:DSet<'U> -> x1:DSet<'U1> -> DSet<'U * 'U1>

Mix two DSets that have the same size and partition layout into a single DSet by pairing the corresponding elements (rows) into tuples.

mix3(x x1 x2)
Signature: x:DSet<'U> -> x1:DSet<'U1> -> x2:DSet<'U2> -> DSet<'U * 'U1 * 'U2>
Type parameters: 'U2

Mix three DSets that have the same size and partition layout into a single DSet by combining the corresponding elements (rows) into tuples.

mix4(x x1 x2 x3)
Signature: x:DSet<'U> -> x1:DSet<'U1> -> x2:DSet<'U2> -> x3:DSet<'U3> -> DSet<'U * 'U1 * 'U2 * 'U3>
Type parameters: 'U2, 'U3

Mix four DSets that have the same size and partition layout into a single DSet by combining the corresponding elements (rows) into tuples.

multicast(x)
Signature: x:DSet<'U> -> DSet<'U>

Multicast a DSet over network, replicate its content to all peers in the cluster.

parallelMap(func x)
Signature: (func:('U -> Task<'?13409>)) -> x:DSet<'U> -> DSet<'?13409>

Map DSet, in which func is a Task<_> function that may contain asynchronous operations. You will need to start the first task in the mapping function; Prajna will not be able to start the task for you, as the returned task may not be a Task in the Created state. See: http://blogs.msdn.com/b/pfxteam/archive/2012/01/14/10256832.aspx
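The note about starting the task can be sketched as follows; `urls` is a hypothetical pre-existing DSet<string>, and the download is an illustrative workload only:

```fsharp
// Hypothetical sketch: 'urls' is assumed to be an existing DSet<string>.
// The task returned by the delegate must already be started ("hot").
// DownloadStringTaskAsync returns a hot task; a bare 'new Task(...)'
// would remain in the Created state and never complete.
let contents : DSet<string> =
    urls |> DSet.parallelMap (fun url ->
        let client = new System.Net.WebClient()
        client.DownloadStringTaskAsync(url))
```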

parallelMapi(func x)
Signature: (func:(int -> int64 -> 'U -> Task<'?13415>)) -> x:DSet<'U> -> DSet<'?13415>

Map DSet, in which func is a Task<_> function that may contain asynchronous operations. The first integer passed to the function indicates the partition, and the second integer indexes the element (from 0) within the partition. You will need to start the first task in the mapping function; Prajna will not be able to start the task for you, as the returned task may not be a Task in the Created state. See: http://blogs.msdn.com/b/pfxteam/archive/2012/01/14/10256832.aspx

printfn(fmt x)
Signature: (fmt:TextWriterFormat<('U -> '?13277)>) -> x:DSet<'U> -> unit

Read DSet back to local machine and print each value using the given format. Caution: the printing is executed locally as the entire DSet is read back.

reduce(reducer x)
Signature: (reducer:('U -> 'U -> 'U)) -> x:DSet<'U> -> 'U

Reduces the elements using the specified 'reducer' function
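For example, summing a hypothetical DSet<int>; since the reduction runs in a distributed fashion, the reducer is presumably expected to be associative:

```fsharp
// Hypothetical sketch: 'd' is assumed to be an existing DSet<int>.
// (+) is associative, so partial sums from partitions combine correctly.
let total : int = d |> DSet.reduce (+)
```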

reorgWDegree(numParallelJobs x)
Signature: numParallelJobs:int -> x:DSet<'U> -> DSet<'U>

Reorganize the collections in a dataset to accommodate a certain degree of parallel execution.

repartition(partFunc x)
Signature: (partFunc:('U -> int)) -> x:DSet<'U> -> DSet<'U>

Apply a partition function, repartition elements of dataset across nodes in the cluster. The number of partitions remains unchanged.

repartitionN(numPartitions partFunc x)
Signature: numPartitions:int -> (partFunc:('U -> int)) -> x:DSet<'U> -> DSet<'U>

Apply a partition function, repartitioning elements of the dataset into 'numPartitions' partitions across nodes in the cluster.
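A sketch of repartitioning by element hash, assuming an existing DSet<string> named `d`:

```fsharp
// Hypothetical sketch: 'd' is assumed to be an existing DSet<string>.
// The partition function maps each element to a partition index in [0, 16);
// elements with the same hash modulo land in the same partition.
let repartitioned : DSet<string> =
    d |> DSet.repartitionN 16 (fun s -> abs (s.GetHashCode()) % 16)
```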

repartitionP(param partFunc x)
Signature: param:DParam -> (partFunc:('U -> int)) -> x:DSet<'U> -> DSet<'U>

Apply a partition function, repartition elements of dataset across nodes in the cluster according to the setting specified by "param".

rowsMergeAll(x)
Signature: x:DSet<'U> -> DSet<'U>

Merge all rows of a partition into a single collection object.

rowsReorg(numRows x)
Signature: numRows:int -> x:DSet<'U> -> DSet<'U>

Reorganize the collections of a dataset. If numRows = 1, the dataset is split into one row per collection. If numRows = Int32.MaxValue, the dataset is merged so that all rows in a partition are grouped into a single collection.

rowsSplit(x)
Signature: x:DSet<'U> -> DSet<'U>

Reorganize the collections of a dataset so that each collection holds a single row.

saveToHDD(x)
Signature: x:DSet<'U> -> unit

Save the DSet to HDD. This is an action.

saveToHDDByName(name x)
Signature: name:string -> x:DSet<'U> -> unit

Save the DSet to HDD using "name". This is an action.

saveToHDDByNameWithMonitor(...)
Signature: (monitorFunc:('U -> bool)) -> (localIterFunc:('U -> unit)) -> name:string -> x:DSet<'U> -> unit

Save the DSet to HDD using "name". This is an action. Attach a monitor function that selects the elements to be fetched to the local machine and iterated by 'localIterFunc'.

saveToHDDWithMonitor(...)
Signature: (monitorFunc:('U -> bool)) -> (localIterFunc:('U -> unit)) -> x:DSet<'U> -> unit

Save the DSet to HDD. This is an action. Attach a monitor function that selects the elements to be fetched to the local machine and iterated by 'localIterFunc'.

source(sourceSeqFunc x)
Signature: (sourceSeqFunc:(unit -> seq<'U>)) -> x:DSet<'U> -> DSet<'U>

Generate a distributed dataset through a customized seq functional delegate running on each of the machines. Each node runs a local instance of the functional delegate, which generates a seq<'U> that forms one partition of DSet<'U>. The number of partitions of DSet<'U> is N, the number of nodes in the cluster. The NumReplications is 1. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegate on that node.
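A sketch of creating a source dataset; `cluster` is a hypothetical configured Cluster, and the Name and Cluster constructor properties used below are assumptions not shown in this reference:

```fsharp
// Hypothetical sketch: 'cluster' is assumed to be a configured Cluster,
// and the Name/Cluster properties are assumed to exist on DSet<'U>.
// Each node evaluates the delegate once, yielding one partition.
let d : DSet<int> =
    DSet<int>(Name = "numbers", Cluster = cluster)
    |> DSet.source (fun () -> seq { 1 .. 1000 })
```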

sourceI(numPartitions sourceISeqFunc x)
Signature: numPartitions:int -> (sourceISeqFunc:(int -> seq<'U>)) -> x:DSet<'U> -> DSet<'U>

Generate a distributed dataset through customized seq functional delegates running on each of the machines. The number of partitions of the dataset is numPartitions. If any node in the cluster is not responding, the partition may be rerun on a different node.

sourceN(num sourceNSeqFunc x)
Signature: num:int -> (sourceNSeqFunc:(int -> seq<'U>)) -> x:DSet<'U> -> DSet<'U>

Generate a distributed dataset through customized seq functional delegates running on each of the machines. Each node runs num local instances of the functional delegate, each of which forms one partition of the dataset. The number of partitions of the dataset is N * num, where N is the number of nodes in the cluster and num is the number of partitions per node. The NumReplications is 1. If any node in the cluster is not responding, the dataset does not contain the data produced by the functional delegates on that node.

split2(fun0 fun1 x)
Signature: (fun0:('U -> '?13657)) -> (fun1:('U -> '?13658)) -> x:DSet<'U> -> DSet<'?13657> * DSet<'?13658>
Type parameters: '?13658

Correlated split of a dataset into two, each of which is created by running a functional delegate that maps the elements of the original dataset. The resultant datasets have the same partition and collection structure as the original dataset.
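A sketch of a correlated split, assuming an existing DSet<int> named `d`; the two results keep the partition and collection structure of `d`:

```fsharp
// Hypothetical sketch: 'd' is assumed to be an existing DSet<int>.
// Both outputs share the partition/collection structure of 'd',
// so they can later be recombined, e.g. via map2 or mix.
let originals, squares =
    d |> DSet.split2 (fun x -> x) (fun x -> x * x)
```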

split3(fun0 fun1 fun2 x)
Signature: (fun0:('U -> '?13666)) -> (fun1:('U -> '?13667)) -> (fun2:('U -> '?13668)) -> x:DSet<'U> -> DSet<'?13666> * DSet<'?13667> * DSet<'?13668>
Type parameters: '?13667, '?13668

Correlated split of a dataset into three, each of which is created by running a functional delegate that maps the elements of the original dataset. The resultant datasets have the same partition and collection structure as the original dataset.

split4(fun0 fun1 fun2 fun3 x)
Signature: (fun0:('U -> '?13677)) -> (fun1:('U -> '?13678)) -> (fun2:('U -> '?13679)) -> (fun3:('U -> '?13680)) -> x:DSet<'U> -> DSet<'?13677> * DSet<'?13678> * DSet<'?13679> * DSet<'?13680>
Type parameters: '?13678, '?13679, '?13680

Correlated split of a dataset into four, each of which is created by running a functional delegate that maps the elements of the original dataset. The resultant datasets have the same partition and collection structure as the original dataset. They can later be combined by the Map4 transform.

store(x o)
Signature: x:DSet<'U> -> o:seq<'U> -> unit

Store a sequence to a persisted DSet

toSeq(x)
Signature: x:DSet<'U> -> IEnumerable<'U>

Convert DSet to a sequence seq<'U>

tryFind(cluster searchPattern)
Signature: cluster:Cluster -> searchPattern:string -> DSet<'U> []

Find dataset with name that matches the search pattern.

union(source)
Signature: source:seq<DSet<'U>> -> DSet<'U>

Merge the content of multiple datasets into a single dataset. The original datasets become partitions of the resultant dataset. This can be considered as merging the datasets by rows, where all datasets share the same column structure.
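A sketch of union, assuming two existing DSets of the same element type:

```fsharp
// Hypothetical sketch: 'd1' and 'd2' are assumed to be existing DSet<int>.
// Their partitions become the partitions of the merged dataset.
let all : DSet<int> = DSet.union [ d1; d2 ]
```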
