Iterators and generators in ECMAScript 6

[2015-02-26] New version of this blog post: “Iterables and iterators in ECMAScript 6


The iterator pattern enables unified and simple access to the elements stored in data structures.
Generators are lightweight coroutines. Think: functions that can be suspended and resumed. Among other things, they help with implementing iterators.


This blog post explains how iterators and generators work in ECMAScript 6. The iterator protocol has recently changed, this post explains the new protocol.



Iterators



In the iterator pattern, the iterator is an object that functions as a “pointer” into a data structure. It produces a sequence with all of the data structure’s elements. The pattern has two advantages: First, you don’t need an intermediate data structure for storing the sequence of elements. Thus, it is reminiscent of explicit lazy programming and allows you to work with large (e.g. disk-based) data structures.


Second, many algorithms (map, filter, etc.) can based on iterators and are decoupled from how data structures store elements.

Iterator protocols



Let’s look at how other languages implement iterators. Typical ingredients are: return the next element, notify the caller that the end has been reached.

  • Java: iterators have two methods.

    • next() returns the next element.

    • hasNext() returns false if there are no more elements.



  • C#: iterators have

    • a property Current that returns the current element.

    • a method MoveNext() that advances the iterator and returns false if iteration has progressed beyond the last element.



  • Python: has a single method.

    • __next__(): returns the next element or raises a StopIteration exception if there are no further elements.


    That means that you invoke __next__ until an exception is thrown.


For ECMAScript 6, Python’s approach was initially adopted. After a long discussion, a different, more functional, prototcol was chosen. Iterators now have a single method, next(), that returns either of two values:

  1. Returning an element: { done: false, value: elem }

  2. After the last element: { done: true[, value: retVal] }


We will answer one important question later (in Sect. 3.2): Why can iterators (optionally) return a value after the last element? That capability is the reason for elements being wrapped. Otherwise, iterators could simply return a publicly defined sentinel (stop value) after the last element.

Implementing an iterator



The following function returns an iterator for an array.

function createArrayIterator(arr) {
let index = 0;
return {
next() {
if (index < arr.length) {
return { done: false, value: arr[index++] };
} else {
return { done: true };
}
}
}
}


Iterables



Data structures that can be iterated over are called iterables in ECMAScript 6. They have a method that returns an iterator for all of their elements. That method has a special key: the symbol Symbol.iterator.

class MySpecialTree {
...
[Symbol.iterator]() { // (*)
...
return theIterator;
}
}

The key of the method starting in line (*) is the symbol Symbol.iterator (not a string).

The for-of loop



ECMAScript 6 has a new loop, for-of. That loop works with iterables. Before we can use it with createArrayIterator(), we need to turn the result into an iterable.

function createArrayIterator(arr) {
let index = 0;
return {
[Symbol.iterator]() { // protocol: iterable
return this; // an iterator!
},
next() { // protocol: iterator
...
}
}
}

Method iterate returns an iterator, the object itself. Now we can use for-of:

const arr = [ 'red', 'green', 'blue' ];
for(let x of createArrayIterator(arr)) {
console.log(x);
}

In ECMAScript 6, array are iterable and the iteration is over their elements. But there is also a method Array.prototype.entries() for iterating over [index, element] pairs:

let arr = ['foo', 'bar', 'baz'];
for (let [index,element] of arr.entries()) {
console.log(index + '. ' + element);
}

The output of this code is:

0. foo
1. bar
2. baz


entries() returns an iterable, similar to createArrayIterator(), above.

Generators



Generators are a special kind of function that can be suspended and resumed. They have the yield operator which could be called a “resumable return”: it returns a value, but when the generator is resumed, execution doesn’t start fresh, it continues after the yield.
Similarly to Python, from which this feature has been borrowed, generators are tightly integrated with iterators in ECMAScript 6.


The following is a generator function (short: generator). It is written like a normal JavaScript function declaration, but with the keyword function* instead of function.


function* generatorFunction() {
yield 1;
yield 2;
}

Calling a generator function produces an object for controlling generator execution, a so-called generator object. Such objects are iterators. For example:

> let genObj = generatorFunction();
> genObj.next()
{ done: false, value: 1 }
> genObj.next()
{ done: false, value: 2 }
> genObj.next()
{ done: true }

Generator objects are also iterable – their iterate method returns this.

Using generators for iteration



Generators allow you to use recursion to implement an iterator. As an example, let’s say you want to traverse a tree that is encoded as nested arrays. A traditional recursive implementation would use a callback:

function traverseTree(tree, visitor) {
if (Array.isArray(tree)) {
// inner node
for(let i=0; i < tree.length; i++) {
traverseTree(tree[i], visitor); // (*) recursion
}
} else {
// leaf
visitor(tree);
}
}

The function in use:

> const tree = [ 'a', ['b', 'c'], ['d', 'e'] ];
> traverseTree(tree, function (x) { console.log(x) })
a
b
c
d
e

If we want to traverse the tree via an iterator and implement that iterator via a generator, we have to figure out how to perform the recursive call in line (*). The problem is that the generator cannot really call itself, because doing so would simply produce a generator object. It can’t use another function, either, because yield is only allowed in generator functions. Thus, ECMAScript 6 provides the dedicated operator yield* for this use case:

function* iterTree(tree) {
if (Array.isArray(tree)) {
// inner node
for(let i=0; i < tree.length; i++) {
yield* iterTree(tree[i]); // (*) recursion
}
} else {
// leaf
yield tree;
}
}

yield* in line (*) yields everything that is yielded by the iterable that is its operand. It is equivalent to iterating over the operand’s elements and yielding them. Since iterTree returns an iterable, it works with for-of:

const tree = [ 'a', ['b', 'c'], ['d', 'e'] ];
for(let x of iterTree(tree)) {
console.log(x);
}


Using generators as lightweight threads



Generators can also be used as lightweight threads. For example, you could break up a long-running task into smaller steps and pause via yield:

function* longRunningTask() {
// do something
yield; // pause
// do something
yield; // pause
...
}

This task would be executed via a scheduling function:

scheduler(longRunningTask());

The following is a very simple scheduling function.

function scheduler(task) {
setTimeout(function () {
if (!task.next().done) {
scheduler(task);
}
}, 0);
}

So far so good. But what if you want the steps of the long-running task to be performed by other (generator) functions? In a non-generator function, that would look as follows.

function longRunningTask() {
let result1 = step1();
// yield
let result2 = step2();
// yield
step3(result1, result2);
}
function step1() {
...
return result1;
}
function step2() {
...
return result2;
}
function step3(param1, param2) {
...
}

If we are to implement the above as a generator function, we have a problem: We need step1 and step2 to return values to their invoker. But until now, we have not seen how generators would be able to do that. If a generator function is called recursively via yield*, it can yield values, but those values don’t reach the location where yield* is used. Therefore, generators can return values in addition to yielding them. A returned value is passed on to yield*:

function* longRunningTask() {
let result1 = yield* step1();
yield;
let result2 = yield* step2();
yield;
yield* step3(result1, result2);
}
function* step1() {
...
return result1;
}
function* step2() {
...
return result2;
}
function* step3(param1, param2) {
...
}

Note that the returned value comes after everything has been yielded. This explains why iterators can optionally attach a value to the “end of iteration” marker: it is the value that is returned by generators and returned to yield*.

task.js



task.js takes the idea sketched in this section one step further: you write tasks as generator functions. If you yield a promise object (e.g. the result of a function call) from such a generator then task.js wakes it up once the promise is fulfilled. Therefore, your code looks synchronous, but can make asynchronous requests. Additionally, generators-as-tasks can call other generators as if they were normal functions – thanks to yield* and the ability of generators to return values.

When can I use these things?



Quoting a tweet from @wycats (Yehuda Katz):

Node 0.11.13 ships with generators. Alea iacta est

Node.js gets generators from V8. Thus, Chrome will support them, soon, too.


Firefox has had iterators and generators for a long time, but they still work slightly differently from the ECMAScript 6 standard.


Additionally, you can keep an eye on the ECMAScript 6 compatibility table to find out how much various engines support.

Comments

Popular posts from this blog

Steve Lopez and the Importance of Newspapers

Ideas for fixing unconnected computing

Omar to kill me