If you are going to create two branches or forks, it may be more efficient to call this method first which will create a node in the cascading graph.
limit the output to at most count items.
This does a sum of values WITHOUT triggering a shuffle.
Export back to a raw cascading Pipe.
Merge two TypedPipes (no order is guaranteed) This is only realized when a group (or join) is performed.
Same as groupAll.
Filter and map.
Attach a ValuePipe to each element this TypedPipe
prints the current pipe to stdout
Returns the set of distinct elements in the TypedPipe
Returns the set of distinct elements identified by a given lambda extractor in the TypedPipe
Merge two TypedPipes of different types by using Either
Sometimes useful for implementing custom joins with groupBy + mapValueStream when you know that the value/key can fit in memory.
Keep only items that satisfy this predicate
If T is a (K, V) for some V, then we can use this function to filter.
Keep only items that don't satisfy the predicate.
common pattern of attaching a value and then filter
common pattern of attaching a value and then flatMap
flatten an Iterable
Force a materialization of this pipe prior to the next operation.
This is the default means of grouping all pairs with the same key.
Send all items to a single reducer
Given a key function, add the key, then call .
Forces a shuffle by randomly assigning each item into one of the partitions.
These operations look like joins, but they do not force any communication of the current TypedPipe.
Do an inner-join without shuffling this TypedPipe, but replicating argument to all tasks
Do an leftjoin without shuffling this TypedPipe, but replicating argument to all tasks
For each element, do a map-side (hash) left join to look up a value
Just keep the keys, or .
uses hashJoin but attaches None if thatPipe is empty
ValuePipe may be empty, so, this attaches it as an Option cross is the same as leftCross(p).
Transform each element via the function f
Transform only the values (sometimes requires giving the types due to scala type inference)
common pattern of attaching a value and then map
Used to force a shuffle into a given size of nodes.
Build a sketch of this TypedPipe so that you can do a skew-join with another Grouped
Reasonably common shortcut for cases of associative/commutative reduction returns a typed pipe with only one element.
Reasonably common shortcut for cases of associative/commutative reduction by Key
swap the keys with the values
use a TupleUnpacker to flatten U out into a cascading Tuple
Just keep the values, or .
Safely write to a TypedSink[T].
a pipe equivalent to the current pipe.