ES proposal: Shared memory and atomics
The ECMAScript proposal “Shared memory and atomics” by Lars T. Hansen has reached stage 4 this week and will be part of ECMAScript 2017. It introduces a new constructor SharedArrayBuffer
and a namespace object Atomics
with helper functions. This blog post explains the details.
Parallelism vs. concurrency
Before we begin, let’s clarify two terms that are similar, yet distinct: “parallelism” and “concurrency”. Many definitions for them exist; I’m using them as follows:
- Parallelism (parallel vs. serial): execute multiple tasks simultaneously
- Concurrency (concurrent vs. sequential): execute several tasks during overlapping periods of time (and not one after another).
Both are closely related, but not the same:
- Parallelism without concurrency: single instruction, multiple data (SIMD). Multiple computations happen in parallel, but only a single task (instruction) is executed at any given moment.
- Concurrency without parallelism: multitasking via time-sharing on a single-core CPU.
However, it is difficult to use these terms precisely, which is why interchanging them is usually not a problem.
Models of parallelism
Two models of parallelism are:
Data parallelism: The same piece of code is executed several times in parallel. The instances operate on different elements of the same dataset. For example: MapReduce is a data-parallel programming model.
Task parallelism: Different pieces of code are executed in parallel. Examples: web workers and the Unix model of spawning processes.
A history of JS parallelism
JavaScript started as being executed in a single thread. Some tasks could be performed asynchronously: browsers usually ran those tasks in separate threads and later fed their results back into the single thread, via callbacks.
Web workers brought task parallelism to JavaScript: They are relatively heavweight processes. Each worker has its own global environment. By default, nothing is shared. Communication between workers (or between workers and the main thread) evolved:
- At first, you could only send and receive strings.
- Then, structured cloning was introduced: copies of data could be sent and received. Structured cloning works for most data (JSON data, Typed Arrays, regular expressions,
Blob
objects,ImageData
objects, etc.). It can even handle cyclic references between objects correctly. However, error objects, function objects and DOM nodes cannot be cloned. - Transferables move data between workers: the sending party loses access as the receiving party gains access to data.
Computing on GPUs (which tend to do data parallelism well) via WebGL: It’s a bit of a hack and works as follows.
- Input: your data, converted into an image (pixel by pixel).
- Processing: OpenGL pixel shaders can perform arbitrary computations on GPUs. Your pixel shader transforms the input image.
- Output: again an image that you can convert back to your kind of data.
SIMD (low-level data parallelism): is supported via the ECMAScript proposal SIMD.js. It allows you to perform operations (such as addition and square root) on several integers or floats at the same time.
PJS (codenamed River Trail): the plan of this ultimately abandoned project was to bring high-level data parallelism (think map-reduce via pure functions) to JavaScript. However, there was not enough interest from developers and engine implementers. Without implementations, one could not experiment with this API, because it can’t be polyfilled. On 2015-01-05, Lars T. Hansen announced that an experimental implementation was going to be removed from Firefox.
The next step: SharedArrayBuffer
What’s next? For low-level parallelism, the direction is quite clear: support SIMD and GPUs as well as possible. However, for high-level parallelism, things are much less clear, especially after the failure of PJS.
What is needed is a way to try out many approaches, to find out how to best bring high-level parallelism to JavaScript. Following the principles of the extensible web manifesto, the proposal “shared memory and atomics” (a.k.a. “Shared Array Buffers”) does so by providing low-level primitives that can be used to implement higher-level constructs.
Shared Array Buffers
Shared Array Buffers are a primitive building block for higher-level concurrency abstractions. They allow you to share the bytes of a SharedArrayBuffer
object between multiple workers and the main thread (the buffer is shared, to access the bytes, wrap it in a Typed Array). This kind of sharing has two benefits:
- You can share data between workers more quickly.
- Coordination between workers becomes simpler and faster (compared to
postMessage()
).
Creating and sending a Shared Array Buffer
// main.js
const worker = new Worker('worker.js');
// To be shared
const sharedBuffer = new SharedArrayBuffer( // (A)
10 * Int32Array.BYTES_PER_ELEMENT); // 10 elements
// Share sharedBuffer with the worker
worker.postMessage({sharedBuffer}); // clone
// Local only
const sharedArray = new Int32Array(sharedBuffer); // (B)
You create a Shared Array Buffer the same way you create a normal Array Buffer: by invoking the constructor and specifying the size of the buffer in bytes (line A). What you share with workers is the buffer. For your own, local, use, you normally wrap Shared Array Buffers in Typed Arrays (line B).
Warning: Cloning a Shared Array Buffer is the correct way of sharing it, but some engines still implement an older version of the API and require you to transfer it:
worker.postMessage({sharedBuffer}, [sharedBuffer]); // transfer (deprecated)
In the final version of the API, transferring a Shared Array Buffer means that you lose access to it.
Receiving a Shared Array Buffer
The implementation of the worker looks as follows.
// worker.js
self.addEventListener('message', function (event) {
const {sharedBuffer} = event.data;
const sharedArray = new Int32Array(sharedBuffer); // (A)
// ···
});
We first extract the Shared Array Buffer that was sent to us and then wrap it in a Typed Array (line A), so that we can use it locally.
Accessing Shared Array Buffers
If you access shared memory like you access normal memory, you are facing two problems.
Problem 1: you may read intermediate results
Consider the following code where main.js
and worker.js
were set up like shown previously.
// Initialization before sharing the Array
sharedArray[0] = 1;
// main.js
sharedArray[0] = 2;
// worker.js
while (sharedArray[0] === 1) ; // (A)
console.log(sharedArray[0]); // (B)
Propagating shared state takes time, which is why the values we read in lines A and B may be neither 1 nor 2.
Problem 2: writes may be reordered
Let’s assume reading intermediate results is not an issue. Then you face the problem that readers (such as worker.js
in the following example) cannot rely on writes (by main.js
) happening in a deterministic order.
// Initialization before sharing the Array
sharedArray[0] = 1;
// main.js
sharedArray[0] = 2;
sharedArray[0] = 3;
// worker.js
let v;
while ((v = sharedArray[0]) === 1);
console.log(v); // either 2 or 3
worker.js
cannot rely on main.js
writing 2 first and then 3. The reason for that is that within the same thread, compilers (and even CPUs) may reorder writes if no (local) read depends on them.
How do we fix these two problems?
Solution: Atomics
You can’t use normal operations to read and write data to Shared Array Buffers, you need to use the functions provided via the namespace object Atomics
. For example, if we rewrite the first of the previous two code fragments, we get:
// Initialization before sharing the Array
Atomics.store(sharedArray, 0, 1);
// main.js
Atomics.store(sharedArray, 0, 2);
// worker.js
while (Atomics.load(sharedArray, 0) === 1) ;
console.log(Atomics.load(sharedArray, 0)); // 2
Atomics
ensures two things:
Its operations are like database transactions in that they happen atomically: all of the writing is done in a single step; you can’t observe intermediate states.
Additionally, the order of writes is fixed; they will never be reordered.
Shared Array Buffers and the run-to-completion semantics of JavaScript
JavaScript has so-called run-to-completion semantics: every function can rely on not being interrupted by another thread until it is finished. Functions become transactions and can perform complete algorithms without anyone seeing the data they operate on in an intermediate state.
Shared Array Buffers break run to completion (RTC): data a function is working on can be changed by another thread during the runtime of the function. However, code has complete control over whether or not this violation of RTC happens: if it doesn’t use Shared Array Buffers, it is safe.
This is loosely similar to how async functions violate RTC. There, you opt into a blocking operation via the keyword await
.
Shared Array Buffers and asm.js and WebAssembly
Shared Array Buffers enable emscripten to compile pthreads to asm.js. Quoting an emscripten documentation page:
[Shared Array Buffers allow] Emscripten applications to share the main memory heap between web workers. This along with primitives for low level atomics and futex support enables Emscripten to implement support for the Pthreads (POSIX threads) API.
That is, you can compile multithreaded C and C++ code to asm.js.
Discussion on how to best bring multi-threading to WebAssembly is ongoing. Given that web workers are relatively heavyweight, it is possible that WebAssembly will introduce lightweight threads. You can also see that threads are on the roadmap for WebAssembly’s future.
Sharing data other than integers
At the moment, only Arrays of integers (up to 32 bits long) can be shared. That means that the only way of sharing other kinds of data is by encoding them as integers. Tools that may help include:
TextEncoder
andTextDecoder
: The former converts strings to instances ofUint8Array
. The latter does the opposite.stringview.js: a library that handles strings as arrays of characters. Uses Array Buffers.
FlatJS: enhances JavaScript with ways of storing complex data structures (structs, classes and arrays) in flat memory (
ArrayBuffer
andSharedArrayBuffer
). JavaScript+FlatJS is compiled to plain JavaScript. JavaScript dialects (TypeScript etc.) are supported.TurboScript: is a JavaScript dialect for fast parallel programming. It compiles to asm.js and WebAssembly.
Eventually, there will probably be additional – higher-level – mechanisms for sharing data. And experiments will continue to figure out what these mechanisms should look like.
How much faster is code that uses Shared Array Buffers?
Lars T. Hansen has written two implementations of the Mandelbrot algorithm (as documented in his article “A Taste of JavaScript’s New Parallel Primitives” where you can try them out online): A serial version and a parallel version that uses multiple web workers. For up to 4 web workers (and therefore processor cores), speed-up improves almost linearly, from 6.9 frames per seconds (1 web worker) to 25.4 frames per seconds (4 web workers). More web workers bring additional performance improvements, but more modest ones.
Hansen notes that the speed-ups are impressive, but going parallel comes at the cost of the code being more complex.
Example
Let’s look at a more comprehensive example. Its code is available on GitHub, in the repository shared-array-buffer-demo
. And you can run it online.
Using a shared lock
In the main thread, we set up shared memory so that it encodes a closed lock and send it to a worker (line A). Once the user clicks, we open the lock (line B).
// main.js
// Set up the shared memory
const sharedBuffer = new SharedArrayBuffer(
1 * Int32Array.BYTES_PER_ELEMENT);
const sharedArray = new Int32Array(sharedBuffer);
// Set up the lock
Lock.initialize(sharedArray, 0);
const lock = new Lock(sharedArray, 0);
lock.lock(); // writes to sharedBuffer
worker.postMessage({sharedBuffer}); // (A)
document.getElementById('unlock').addEventListener(
'click', event => {
event.preventDefault();
lock.unlock(); // (B)
});
In the worker, we set up a local version of the lock (whose state is shared with the main thread via a Shared Array Buffer). In line B, we wait until the lock is unlocked. In lines A and C, we send text to the main thread, which displays it on the page for us (how it does that is not shown in the previous code fragment). That is, we are using self.postMessage()
much like console.log()
in these two lines.
// worker.js
self.addEventListener('message', function (event) {
const {sharedBuffer} = event.data;
const lock = new Lock(new Int32Array(sharedBuffer), 0);
self.postMessage('Waiting for lock...'); // (A)
lock.lock(); // (B) blocks!
self.postMessage('Unlocked'); // (C)
});
It is noteworthy that waiting for the lock in line B stops the complete worker. That is real blocking, which hasn’t existed in JavaScript until now (await
in async functions is an approximation).
Implementing a shared lock
Next, we’ll look at an ES6-ified version of a Lock
implementation by Lars T. Hansen that is based on SharedArrayBuffer
.
In this section, we’ll need (among others) the following Atomics
function:
Atomics.compareExchange(ta : TypedArray<T>, index, expectedValue, replacementValue) : T
If the current element ofta
atindex
isexpectedValue
, replace it withreplacementValue
. Return the previous (or unchanged) element atindex
.
The implementation starts with a few constants and the constructor:
const UNLOCKED = 0;
const LOCKED_NO_WAITERS = 1;
const LOCKED_POSSIBLE_WAITERS = 2;
// Number of shared Int32 locations needed by the lock.
const NUMINTS = 1;
class Lock {
/**
* @param iab an Int32Array wrapping a SharedArrayBuffer
* @param ibase an index inside iab, leaving enough room for NUMINTS
*/
constructor(iab, ibase) {
// OMITTED: check parameters
this.iab = iab;
this.ibase = ibase;
}
The constructor mainly stores its parameters in instance properties.
The method for locking looks as follows.
/**
* Acquire the lock, or block until we can. Locking is not recursive:
* you must not hold the lock when calling this.
*/
lock() {
const iab = this.iab;
const stateIdx = this.ibase;
var c;
if ((c = Atomics.compareExchange(iab, stateIdx, // (A)
UNLOCKED, LOCKED_NO_WAITERS)) !== UNLOCKED) {
do {
if (c === LOCKED_POSSIBLE_WAITERS // (B)
|| Atomics.compareExchange(iab, stateIdx,
LOCKED_NO_WAITERS, LOCKED_POSSIBLE_WAITERS) !== UNLOCKED) {
Atomics.wait(iab, stateIdx, // (C)
LOCKED_POSSIBLE_WAITERS, Number.POSITIVE_INFINITY);
}
} while ((c = Atomics.compareExchange(iab, stateIdx,
UNLOCKED, LOCKED_POSSIBLE_WAITERS)) !== UNLOCKED);
}
}
In line A, we change the lock to LOCKED_NO_WAITERS
if its current value is UNLOCKED
. We only enter the then-block if the lock is already locked (in which case compareExchange()
did not change anything).
In line B (inside a do-while
loop), we check if the lock is locked with waiters or not unlocked. Given that we are about to wait, the compareExchange()
also switches to LOCKED_POSSIBLE_WAITERS
if the current value is LOCKED_NO_WAITERS
.
In line C, we wait if the lock value is LOCKED_POSSIBLE_WAITERS
. The last parameter, Number.POSITIVE_INFINITY
, means that waiting never times out.
After waking up, we continue the loop if we are not unlocked. compareExchange()
also switches to LOCKED_POSSIBLE_WAITERS
if the lock is UNLOCKED
. We use LOCKED_POSSIBLE_WAITERS
and not LOCKED_NO_WAITERS
, because we need to restore this value after unlock()
temporarily set it to UNLOCKED
and woke us up.
The method for unlocking looks as follows.
/**
* Unlock a lock that is held. Anyone can unlock a lock that
* is held; nobody can unlock a lock that is not held.
*/
unlock() {
const iab = this.iab;
const stateIdx = this.ibase;
var v0 = Atomics.sub(iab, stateIdx, 1); // A
// Wake up a waiter if there are any
if (v0 !== LOCKED_NO_WAITERS) {
Atomics.store(iab, stateIdx, UNLOCKED);
Atomics.wake(iab, stateIdx, 1);
}
}
// ···
}
In line A, v0
gets the value that iab[stateIdx]
had before 1 was subtracted from it. The subtraction means that we go (e.g.) from LOCKED_NO_WAITERS
to UNLOCKED
and from LOCKED_POSSIBLE_WAITERS
to LOCKED
.
If the value was previously LOCKED_NO_WAITERS
then it is now UNLOCKED
and everything is fine (there is no one to wake up).
Otherwise, the value was either LOCKED_POSSIBLE_WAITERS
or UNLOCKED
. In the former case, we are now unlocked and must wake up someone (who will usually lock again). In the latter case, we must fix the illegal value created by subtraction and the wake()
simply does nothing.
Conclusion for the example
This gives you a rough idea how locks based on SharedArrayBuffer
work. Keep in mind that multithreaded code is notoriously difficult to write, because things can change at any time. Case in point: lock.js
is based on a paper documenting a futex implementation for the Linux kernel. And the title of that paper is “Futexes are tricky” (PDF).
If you want to go deeper into parallel programming with Shared Array Buffers, take a look at synchronic.js
and the document it is based on (PDF).
The API for shared memory and atomics
SharedArrayBuffer
Constructor:
new SharedArrayBuffer(length)
Create a buffer forlength
bytes.
Static property:
get SharedArrayBuffer[Symbol.species]
Returnsthis
by default. Override to control whatslice()
returns.
Instance properties:
get SharedArrayBuffer.prototype.byteLength()
Returns the length of the buffer in bytes.SharedArrayBuffer.prototype.slice(start, end)
Create a new instance ofthis.constructor[Symbol.species]
and fill it with the bytes at the indices from (including)start
to (excluding)end
.
Atomics
The main operand of Atomics
functions must be an instance of Int8Array
, Uint8Array
, Int16Array
, Uint16Array
, Int32Array
or Uint32Array
. It must wrap a SharedArrayBuffer
.
All functions perform their operations atomically. The ordering of store operations is fixed and can’t be reordered by compilers or CPUs.
Loading and storing
Atomics.load(ta : TypedArray<T>, index) : T
Read and return the element ofta
atindex
.Atomics.store(ta : TypedArray<T>, index, value : T) : T
Writevalue
tota
atindex
and returnvalue
.Atomics.exchange(ta : TypedArray<T>, index, value : T) : T
Set the element ofta
atindex
tovalue
and return the previous value at that index.
Atomics.compareExchange(ta : TypedArray<T>, index, expectedValue, replacementValue) : T
If the current element ofta
atindex
isexpectedValue
, replace it withreplacementValue
. Return the previous (or unchanged) element atindex
.
Simple modification of Typed Array elements
Each of the following functions changes a Typed Array element at a given index: It applies an operator to the element and a parameter and writes the result back to the element. It returns the original value of the element.
Atomics.add(ta : TypedArray<T>, index, value) : T
Performta[index] += value
and return the original value ofta[index]
.Atomics.sub(ta : TypedArray<T>, index, value) : T
Performta[index] -= value
and return the original value ofta[index]
.Atomics.and(ta : TypedArray<T>, index, value) : T
Performta[index] &= value
and return the original value ofta[index]
.Atomics.or(ta : TypedArray<T>, index, value) : T
Performta[index] |= value
and return the original value ofta[index]
.Atomics.xor(ta : TypedArray<T>, index, value) : T
Performta[index] ^= value
and return the original value ofta[index]
.
Waiting and waking
Waiting and waking requires the parameter ta
to be an instance of Int32Array
.
Atomics.wait(ta: Int32Array, index, value, timeout=Number.POSITIVE_INFINITY) : ('not-equal' | 'ok' | 'timed-out')
If the current value atta[index]
is notvalue
, return'not-equal'
. Otherwise go to sleep until we are woken up viaAtomics.wake()
or until sleeping times out. In the former case, return'ok'
. In the latter case, return'timed-out'
.timeout
is specified in milliseconds. Mnemonic for what this function does: “wait ifta[index]
isvalue
”.Atomics.wake(ta : Int32Array, index, count)
Wake upcount
number of workers who are waiting atindex
ofta
.
Miscellaneous
Atomics.isLockFree(size)
This function lets you ask the JavaScript engine if operands with the givensize
(in bytes) can be manipulated without locking. That can inform algorithms whether they want to rely on built-in primitives (compareExchange()
etc.) or use their own locking.Atomics.isLockFree(4)
always returnstrue
, because that’s what all currently relevant supports.
FAQ
What browsers support Shared Array Buffers?
At the moment, I’m aware of:
- Firefox (50.1.0+): go to
about:config
and setjavascript.options.shared_memory
totrue
- Safari Technology Preview (Release 21+): enabled by default.
- Chrome Canary (58.0+): There are two ways to switch it on.
- Via
chrome://flags/
(“Experimental enabled SharedArrayBuffer support in JavaScript”) --js-flags=--harmony-sharedarraybuffer --enable-blink-feature=SharedArrayBuffer
- Via
Further reading
More information on Shared Array Buffers and supporting technologies:
“Shared memory – a brief tutorial” by Lars T. Hansen
“A Taste of JavaScript’s New Parallel Primitives” by Lars T. Hansen [a good intro to Shared Array Buffers]
“SharedArrayBuffer and Atomics Stage 2.95 to Stage 3” (PDF), slides by Shu-yu Guo and Lars T. Hansen (2016-11-30) [slides accompanying the ES proposal]
“The Basics of Web Workers” by Eric Bidelman [an introduction to web workers]
Other JavaScript technologies related to parallelism:
- “The Path to Parallel JavaScript” by Dave Herman [a general overview of where JavaScript is heading after the abandonment of PJS]
- “Write massively-parallel GPU code for the browser with WebGL” by Steve Sanderson [fascinating talk that explains how to get WebGL to do computations for you on the GPU]
Background on parallelism:
- “Concurrency is not parallelism” by Rob Pike [Pike uses the terms “concurrency” and “parallelism” slightly differently than I do in this blog post, providing an interesting complementary view]
Comments
Post a Comment