Foundations¶
You should read through the quickstart before reading this document.
Distributed computing is hard for two reasons:
- Consistent coordination of distributed systems requires sophistication
- Concurrent network programming is tricky and error prone
The foundations of dask.distributed
provide abstractions to hide some
complexity of concurrent network programming (#2). These abstractions ease the
construction of sophisticated parallel systems (#1) in a safer environment.
However, as with all layered abstractions, ours has flaws. Critical feedback
is welcome.
Concurrency with Tornado Coroutines¶
Worker and Scheduler nodes operate concurrently. They serve several overlapping requests and perform several overlapping computations at the same time without blocking. There are several approaches for concurrent programming, we’ve chosen to use Tornado for the following reasons:
- Developing and debugging is more comfortable without threads
- Tornado’s documentation is excellent
- Stackoverflow coverage is excellent
- Performance is satisfactory
Communication with Tornado Streams (raw sockets)¶
Workers, the Scheduler, and clients communicate with each other over the network. They use raw sockets as mediated by tornado streams. We separate messages by a sentinel value.
Servers¶
Worker and Scheduler nodes serve requests over TCP. Both Worker and Scheduler
objects inherit from a Server
class. This Server class thinly wraps
tornado.tcpserver.TCPServer
. These servers expect requests of a particular
form.
-
class
distributed.core.
Server
(handlers, max_buffer_size=8353576960.0, connection_limit=512, deserialize=True, **kwargs)[source]¶ Distributed TCP Server
Superclass for both Worker and Scheduler objects. Inherits from
tornado.tcpserver.TCPServer
, adding a protocol for RPC.Handlers
Servers define operations with a
handlers
dict mapping operation names to functions. The first argument of a handler function must be a stream for the connection to the client. Other arguments will receive inputs from the keys of the incoming message which will always be a dictionary.>>> def pingpong(stream): ... return b'pong'
>>> def add(stream, x, y): ... return x + y
>>> handlers = {'ping': pingpong, 'add': add} >>> server = Server(handlers) >>> server.listen(8000)
Message Format
The server expects messages to be dictionaries with a special key, ‘op’ that corresponds to the name of the operation, and other key-value pairs as required by the function.
So in the example above the following would be good messages.
{'op': 'ping'}
{'op': 'add': 'x': 10, 'y': 20}
RPC¶
To interact with remote servers we typically use rpc
objects.
-
class
distributed.core.
rpc
(arg=None, stream=None, ip=None, port=None, addr=None, deserialize=True, timeout=3)[source]¶ Conveniently interact with a remote server
Normally we construct messages as dictionaries and send them with read/write
>>> stream = yield connect(ip, port) >>> msg = {'op': 'add', 'x': 10, 'y': 20} >>> yield write(stream, msg) >>> response = yield read(stream)
To reduce verbosity we use an
rpc
object.>>> remote = rpc(ip=ip, port=port) >>> response = yield remote.add(x=10, y=20)
One rpc object can be reused for several interactions. Additionally, this object creates and destroys many streams as necessary and so is safe to use in multiple overlapping communications.
When done, close streams explicitly.
>>> remote.close_streams()
Example¶
Here is a small example using distributed.core to create and interact with a custom server.
Server Side¶
from tornado import gen
from tornado.ioloop import IOLoop
from distributed.core import write, Server
def add(stream, x=None, y=None): # simple handler, just a function
return x + y
@gen.coroutine
def stream_data(stream, interval=1): # complex handler, multiple responses
data = 0
while True:
yield gen.sleep(interval)
data += 1
yield write(stream, data)
s = Server({'add': add, 'stream': stream_data})
s.listen(8888)
IOLoop.current().start()
Client Side¶
from tornado import gen
from tornado.ioloop import IOLoop
from distributed.core import connect, read, write
@gen.coroutine
def f():
stream = yield connect('127.0.0.1', 8888)
yield write(stream, {'op': 'add', 'x': 1, 'y': 2})
result = yield read(stream)
print(result)
>>> IOLoop().run_sync(f)
3
@gen.coroutine
def g():
stream = yield connect('127.0.0.1', 8888)
yield write(stream, {'op': 'stream', 'interval': 1})
while True:
result = yield read(stream)
print(result)
>>> IOLoop().run_sync(g)
1
2
3
...
Client Side with rpc¶
RPC provides a more pythonic interface. It also provides other benefits, such as using multiple streams in concurrent cases. Most distributed code uses rpc. The exception is when we need to perform multiple reads or writes, as with the stream data case above.
from tornado import gen
from tornado.ioloop import IOLoop
from distributed.core import rpc
@gen.coroutine
def f():
# stream = yield connect('127.0.0.1', 8888)
# yield write(stream, {'op': 'add', 'x': 1, 'y': 2})
# result = yield read(stream)
r = rpc(ip='127.0.0.1', 8888)
result = yield r.add(x=1, y=2)
print(result)
>>> IOLoop().run_sync(f)
3
Everything is a Server¶
Workers, Scheduler, and Nanny objects all inherit from Server. Each maintains separate state and serves separate functions but all communicate in the way shown above. They talk to each other by opening connections, writing messages that trigger remote functions, and then collect the results with read.