2014-10-19

Transducers Explained: Part 1

An introduction to transducers using JavaScript. We will work from reducing over arrays, to defining transformations as transformers, then incrementally introducing transducers and using them with transduce. We will conclude with a summary of what we've learned, what is coming in future articles, and links to additional resources and existing transducer libraries.

Transducers...

What are they?

Straight from the source.

Transducers are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element. Because transducers are decoupled from input or output sources, they can be used in many different processes - collections, streams, channels, observables, etc. Transducers compose directly, without awareness of input or creation of intermediate aggregates.

Hmm...

I don't get it

Let's walk through some code. When using transducers, the "algorithmic transformations" are defined, at least in part, by a function similar to what you pass reduce. The Clojure documentation refers to these as reducing functions. What are these things? Well, let's start with Array#reduce. Here is the definition from MDN.

Reduce

The reduce() method applies a function against an accumulator and each value of the array (from left-to-right) has to reduce it to a single value.

You can read the documentation for more explanation, but I'll just show a few quick examples since this may already be familiar.

function sum(result, item){
  return result + item;
}

function mult(result, item){
  return result * item;
}

// 10 (=1+2+3+4)
var summed = [2,3,4].reduce(sum, 1);

// 24 (=1*2*3*4)
var multed = [2,3,4].reduce(mult, 1);

The reducing functions above are sum and mult. They are passed to reduce with an initial value, 1 in both cases. The "input source" is the array [2, 3, 4] and the "output source" is the new array created internally to the reduce implementation.

It is important to remember that reduce:

  1. Starts with an initial value
  2. Operates on one item at a time with the reducing function using
    • The initial value for result in the first step
    • The return value of the step function for result of next iteration
  3. Outputs a value using the last computed result.

Notice, in both cases, the first argument of the reducing function is provided outside of reduce. It is either passed as an initial value or computed using the previous call to the reducing function. The second value is an individual element passed by some process of iteration. Here, reduce is iterating over every element of the array, but we will see other methods of iteration later. We use reducing functions to describe "the essence of the transformation".

Transformer

Let's formalize the steps to reduce in a transformer:

var transformer = function(reducingFunction){
  return {
    // 1. Start with an initial value
    init: function(){
      return 1;
    },

    // 2. Input one item at a time, passing
    //    each result to next iteration
    //    using reducing function
    step: reducingFunction,

    // 3. Output last computed result
    result: function(result){
      return result;
    }
  }
}

We created an object that wraps any reducing function named step and provides init to initialize the transformer, and result to convert the last computed result to the desired output. Note, in this article, we will focus on the step function. We will discuss init and result in more depth in a future article. For now, you can think of them as methods to help manage the lifecycle of the transformation: init to initialize any resources, step during iteration, result to clean up.

Now, let's use this transformer with reduce.

var input = [2,3,4];

var xf = transformer(sum);
var output = input.reduce(xf.step, xf.init());
// output = 10 (=1+2+3+4)

var xf = transformer(mult);
var output = input.reduce(xf.step, xf.init());
// output = 24 (=1*2*3*4)

Since the goal is to decouple the transformation from the input and output, we will define reduce as a function.

function reduce(xf, init, input){
  var result = input.reduce(xf.step, init);
  return xf.result(result);
}

To use reduce, we pass in a transformer, initial value and input source. The implementation uses the step function with array reduce, and returns the result of the step function as the output. The internal implementation still assumes that the input is an array. We will remove this assumption later.

We also accept init as an argument to reduce. We could have used transformer init, but when we are at the level of reduce, we want to preserve the ability to change the initial value. In practice, the transformer init function is only used in cases where an initial value is not provided.

Using the new reduce function is similar to what we did before.

var input = [2,3,4];
var xf = transformer(sum);
var output = reduce(xf, xf.init(), input);
// output = 10 (=1+2+3+4)

var input = [2,3,4];
var xf = transformer(mult);
var output = reduce(xf, xf.init(), input);
// output = 24 (=1*2*3*4)

We can change the initial value by passing it in.

var input = [2,3,4];
var xf = transformer(sum);
var output = reduce(xf, 2, input);
// output = 11 (=2+2+3+4)

var input = [2,3,4];
var xf = transformer(mult);
var output = reduce(xf, 2, input);
// output = 48 (=2*2*3*4)

Our reduce function currently requires a transformer. Since init is not used internally, and result is very often an identity function, a function that directly returns the single argument, we will define a helper that converts reducing function to a transformer, and use it in reduce.

function reduce(xf, init, input){
  if(typeof xf === 'function'){
    // make sure we have a transformer
    xf = wrap(xf);
  }
  var result = input.reduce(xf.step, init);
  return xf.result(result);
}


function wrap(xf){
  return {
    // 1. We require init as arg, so do not need here
    init: function(){
      throw new Error('init not supported');
    },

    // 2. Input one item at a time, passing
    //    each result to next iteration
    step: xf,

    // 3. Output last computed result
    result: function(result){
      return result;
    }
  }
}

We check to see if the xf argument is a function. If so, we assume it is a step function and call wrap to convert it to a transformer. We then proceed with reduce as we did before.

Now we can pass the reducing function directly to reduce.

var input = [2,3,4];
var output = reduce(sum, 1, input);
// output = 10 (=1+2+3+4)

var input = [2,3,4];
var output = reduce(mult, 2, input);
// output = 48 (=2*2*3*4)

But we can still pass in transformers if we so desire.

var input = [2,3,4];
var xf = wrap(sum);
var output = reduce(xf, 2, input);
// output = 11 (=2+2+3+4)

var input = [2,3,4];
var xf = wrap(mult);
var output = reduce(xf, 1, input);
// output = 24 (=1*2*3*4)

Notice we are using wrap externally to create a transformer directly from our existing reducing functions. This is a common thing to do when working with transducers: you define your transformations as simple functions, and let your transducers library take care of converting it to a transformer.

Fancy array copy

So far, we've been working with numbers as initial values and arithmetic reducing functions. This is not necessary, we can also use reduce with arrays.

function append(result, item){
  result.push(item);
  return result;
}

var input = [2,3,4];
var output = reduce(append, [], input);
// output = [2, 3, 4]

We define a stepper function append that pushes an item onto the array, and returns it. Using this, we can use reduce to create a copy of an array.

Are you impressed? I know. Not really. Where it gets interesting is when you add the ability to transform items before appending to the output array.

The loneliest number

Let's say we want to increment every value by one. Let's define a function that increments a single value.

function plus1(item){
  return item + 1;
}

Now create a transformer using this function to transform individual items within step.

var xfplus1 = {
  init: function(){
    throw new Error('init not needed');
  },
  step: function(result, item){
    var plus1ed = plus1(item);
    return append(result, plus1ed);
  },
  result: function(result){
    return result;
  }
}

We can use the transformer to step through the results.

var xf = xfplus1;
var init = [];
var result = xf.step(init, 2);
// [3] (=append([], 2+1)))

result = xf.step(result, 3);
// [3,4] (=append([3], 3+1)))

result = xf.step(result, 4);
// [3,4,5] (=append([3,4], 4+1)))

var output = xf.result(result);
// [3,4,5]

So we used a transformer to step through items and append each incremented item to an output array.

What if what we really want is the sum of the incremented items? We can use reduce.

var output = reduce(sum, 0, output);
// 12 (=0+3+4+5)

This works, but it is unfortunate that we had to create an intermediate array to get the final answer. Can we do better?

It turns out we can. Take a look at xfplus1 above. If we replaced the call to append with a call to sum, and started with an initial value of 0, we could define a transformer that sums the items directly without creating an intermediate aggregate array.

But, there may be cases where we want to see the immediate array, and since the change from append to sum is the only change, it would be nice to have a function that can define the transformation regardless of the transformer used to combine the results.

Let's do that.

var transducerPlus1 = function(xf){
  return {
    init: function(){
      return xf.init();
    },
    step: function(result, item){
      var plus1ed = plus1(item);
      return xf.step(result, plus1ed);
    },
    result: function(result){
      return xf.result(result);
    }
  }
}

The function accepts a transformer, xf, and uses it to return another transformer that delegates to xf after transforming the results with plus1. Since we can define the transformation completely using the step function, our new transformer just asks the xf for init or result. It calls step on the wrapped transformer after transforming the item with plus1.

Transducer

We've created our first transducer: a function that accepts an existing transformer and returns a new transformer that modifies the transformation in some way, and delegates additional handling to the wrapped transformer.

Let's use it! First, let's reiterate the previous example using our transducer.

var stepper = wrap(append);
var init = [];
var transducer = transducerPlus1;
var xf = transducer(stepper);
var result = xf.step(init, 2);
// [3] (=append([], 2+1)))

result = xf.step(result, 3);
// [3,4] (=append([3], 3+1)))

result = xf.step(result, 4);
// [3,4,5] (=append([3,4], 4+1)))

var output = xf.result(result);
// [3,4,5]

Works the same. That's good. The only difference is the creation of the transformer, xf. We used wrap to convert append into a transformer called stepper and then used our transducer to wrap the stepper and return a plus1 transformation. We then used the transformation, xf, the same as we did before to step through each item and get our results.

Intermediate aggregates

Here is where it gets interesting: we can use that same transducer to get our final summation without the need for an intermediate array by changing the stepper and initial value.

var stepper = wrap(sum);
var init = 0;
var transducer = transducerPlus1;
var xf = transducer(stepper);
var result = xf.step(init, 2);
// 3 (=sum(0, 2+1)))

result = xf.step(result, 3);
// 7 (=sum(3, 3+1)))

result = xf.step(result, 4);
// 12 (=sum(7, 4+1)))

var output = xf.result(result);
// 12

We get our answer in one pass of iteration, without computing an intermediate array. Only two things changed from the previous example:

  1. We wrap sum instead of append when creating the stepper
  2. We started with an initial value of 0 instead of [].

That's it. Everything else is the same.

Notice that only the stepper transformation is aware of the type of result. It is a number when wrapping sum and an array when wrapping append. The initial value is of the same type as the result of the stepper. The item can be anything, as long as the stepper knows how to combine the result with the new item and return a new combined result, which it can then combine on the possible next iteration.

These properties allow defining transformations independent of output sources.

Can be as bad as one

What if we want plus2? What would have to change? We could define a new transducerPlus2 that works like transducerPlus1. Take a minute to look at transducerPlus1 and decide what would have to change.

Can we do better?

It turns out that everything would be the same with the exception of calling plus2 instead of plus1 in the transformation step function.

Let's pull out plus1 and pass it in as a function f.

var map = function(f){
  return function(xf){
    return {
      init: function(){
        return xf.init();
      },
      step: function(result, item){
        var mapped = f(item);
        return xf.step(result, mapped);
      },
      result: function(result){
        return xf.result(result);
      }
    }
  }
}

We have defined the mapping transducer. Let's use it to step through the transformation.

function plus2(input){
  return input+2;
}
var transducer = map(plus2);
var stepper = wrap(append);
var xf = transducer(stepper);
var init = [];
var result = xf.step(init, 2);
// [4] (=append([], 2+2)))

result = xf.step(result, 3);
// [4,5] (=append([4], 3+2)))

result = xf.step(result, 4);
// [4,5,6] (=append([4,5], 4+1)))

var output = xf.result(result);
// [4,5,6]

Compare this to the example with plus1 and append above. The only difference is the creation of the transducer using map. We could similarly create the plus1 transducer using map(plus1). Our transducerPlus1 was short lived, but hopefully it helped illustrate the point.

Transduce

The previous examples illustrated using transducers to manually transform a series of inputs. Let's break this down.

First, we initialize the transformation by calling a transducer with a stepper transformation and defining our initial value.

var transducer = map(plus1);
var stepper = wrap(append);
var xf = transducer(stepper);
var init = [];

We then step through each input item by using the reducing function xf.step. We use the initial value as the first result to the step function, and the return value of the previous step function for all subsequent items.

var result = xf.step(init, 2);
// [3] (=append([], 2+1)))

result = xf.step(result, 3);
// [3,4] (=append([3], 3+1)))

result = xf.step(result, 4);
// [3,4,5] (=append([3,4], 4+1)))

We finalize the result to our output using xf.result.

var output = xf.result(result);
// [3,4,5]

You may have noticed that this is very similar to our reduce implementation defined above. In fact, it is. We can encapsulate this process into a new function transduce.

function transduce(transducer, stepper, init, input){
  if(typeof stepper === 'function'){
    // make sure we have a transformer for stepping
    stepper = wrap(stepper);
  }

  // pass in stepper to create transformer
  var xf = transducer(stepper);

  // xf is now a transformer
  // we now can use reduce defined above to
  // iterate and transform input
  return reduce(xf, init, input);
}

Just like reduce, we ensure that our stepper is a transformer. We then create our new transformer by passing the stepper to the transducer. Finally we use reduce with our new transformer to iterate and transform results.

Let's put it to use!

var transducer = map(plus1);
var stepper = append;
var init = [];
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// [3,4,5]

var transducer = map(plus2);
var stepper = append;
var init = [];
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// [4,5,6]

The only thing that changed is the function passed to map.

How about using different step functions and initial values?

var transducer = map(plus1);
var stepper = sum;
var init = 0;
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// 12 (=3+4+5)

var transducer = map(plus2);
var stepper = sum;
var init = 0;
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// 15 (=4+5+6)

var transducer = map(plus1);
var stepper = mult;
var init = 1;
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// 60 (=3*4*5)

var transducer = map(plus2);
var stepper = mult;
var init = 1;
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// 120 (=4*5*6)

Here we are only changing the stepper and initial value. Again, we can compute the sum and product without intermediate aggregates in one pass.

Composition

What if we want to add 3? We could define plus3 and use map, but we can take advantage of a special property of transducers.

It turns out we can define plus3 in terms of two other functions: plus1 and plus2.

var plus3 = function(item){
  var result = plus2(item);
  result = plus1(result);
  return result;
}

You may recognize this as function composition. Let's redefine plus3 in terms of composition.

function compose2(fn1, fn2){
  return function(item){
    var result = fn2(item);
    result = fn1(result);
    return result;
  }
}

var plus3 = compose2(plus1, plus2);

var output = [plus3(2), plus3(3), plus3(4)];
// [5,6,7]

Here we defined compose2 that returns the composition of two functions. Your transducers library, or Underscore.js and others, have a compose function that does the same thing, but with an arbitrary number of function arguments, composing them right-to-left. The last function is called with the item argument, then every other function to the right is called with the result of the adjacent computation.

Let's use compose2 to define a transducer for adding 3 to each item by composing plus1 and plus2.

var transducerPlus3 = map(compose2(plus1, plus2));
var transducer = transducerPlus3;
var stepper = append;
var init = [];
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// [5,6,7]

We used function composition to compose the function passed to map using existing functions, plus1 and plus2 instead of using the new plus3.

Why am I telling you all of this? It turns out we can also create new transducers by composing other transducers.

var transducerPlus1 = map(plus1);
var transducerPlus2 = map(plus2);
var transducerPlus3 = compose2(transducerPlus1, transducerPlus2);
var transducer = transducerPlus3;
var stepper = append;
var init = [];
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);
// [5,6,7]

You don't have to stop there, you can use the new composed transducer to compose again.

var transducerPlus1 = map(plus1);
var transducerPlus2 = map(plus2);
var transducerPlus3 = compose2(transducerPlus1, transducerPlus2);
var transducerPlus4 = compose2(transducerPlus3, transducerPlus1);
var transducer = transducerPlus4;
var stepper = append;
var init = [];
var input = [2,3,4];
var output = transduce(transducer, stepper, init, input);

Notice, again, that the only difference in the preceding examples in this section is the creation of the transducer. Everything else is the same.

Composition works because transducers are defined to accept a transformer and return a transformer. The single argument has the same type as the return type. Whenever this is the case, you can use function composition to create a new function with the same parameter type.

We have shown that transducers are "composable algorithmic transformations". This turns out to be very powerful in practice: you can define new transformations as a series of smaller transformations and compose them in pipelines. We'll show more examples of this in a future article.

It turns out that, although function composition is right-to-left, the transformation itself runs left-to-right. In the transducerPlus4 example above, the transformation of each item with plus3 happens before the transformation of the item with plus1.

Although the order does not matter in this example, the left-to-right order of transformation is important to remember when it does. This order makes the transformation easier to follow when you are reading the code as the transformations happen in the order you read them (if you use a language like English).

This concludes part 1...

Transducers allow the abstraction of composable algorithmic transformations independent from the input and output, and even the process of iteration.

In this article, we demonstrated how you could use transducers to abstract algorithmic transformations as a transducer that accepts and returns a transformer. The transformer could be used in transduce to iterate over and transform an input source.

Whereas Underscore.js or Lo-Dash operates on arrays and objects calculating intermediate results, transducers define the transformation in terms of functions similar to what you pass reduce: start with an initial value to use for the initial result, execute a function with a result and an item, and return the possibly transformed result for the next iteration. Once you abstract the transformation away from the data, you can apply the same transformations to different processes that start with an initial value and step through a result.

We have shown the same transducer can operate on different output sources by changing the initial value and a stepper transformation that you use to create the transducer. A benefit to this abstraction is that you can compute the result in one pass, without intermediate aggregate arrays.

Although, not explicitly stated, we also showed that transducers decouple iteration and the input source from the transducer. We used the same transducer by manually iterating over items by pushing values through the step function, and we used array reduce to pull values from an array.

I want more!

It's here! A future article will expand on this one and talk about transducers that do not send an item on each step, and early termination with reduced values. We glossed over the purpose of init and result, so we will talk more about those.

We will also see the input source could be anything that produces a sequences of values: lazy lists, indefinite sequence generation, CSP, [push streams][12], Node.js streams, iterators, generators, immutable-js data structures, etc.

But, I can't wait!

In the meantime, take a look at the Clojure documentation, or check out this video or this article. There are several more great introductions. Just Google it.

Want to use them right away? There are already two great libraries with very similar APIs: transducers-js and transducers.js. This article follows the transducers-js implementation closely, but the concepts apply to transducers.js as well.

A fan of Underscore.js? Checkout underarm, based on the transduce libraries.

How about using transducers with Node.js streams? We are only scratching the surface.

Want to be notified of new articles? Follow simplectic on Twitter.