Summary
This is a proposal for asynchronous I/O in Python 3, starting with Python 3.3. Consider it the concrete proposal that is missing from PEP 3153. The proposal includes a pluggable event loop API, transport and protocol abstractions similar to those in Twisted, and a higher-level scheduler based on yield from (PEP 380). A reference implementation is in the works under the code name Tulip (a link to Tulip's repository is given in the references section at the end of the article).
Introduction
The event loop is where most interoperability occurs. It should be easy for (Python 3.3 ports of) frameworks like Twisted, Tornado or ZeroMQ to either adapt the default event loop implementation to their needs through a lightweight wrapper or proxy, or to replace the default implementation with an adaptation of their own event loop implementation. (Some frameworks, like Twisted, have multiple event loop implementations. Since these all share the same interface, this should not be a problem.)
It should even be possible for two different third-party frameworks to interoperate, either by sharing the default event loop implementation (each using its own adapter), or by sharing the event loop implementation of one of the frameworks. In the latter case, two levels of adaptation would occur (from framework A's event loop to the standard event loop interface, and from there to framework B's event loop). Which event loop implementation is used should be under the control of the main program (although a default policy for selecting the event loop is provided).
Thus, two separate APIs are defined:
Getting and setting the current event loop object;
The interface of a conforming event loop object and its minimum guarantees.
An event loop implementation may provide additional methods and guarantees.
The event loop interface does not depend on yield from. Instead, it uses a combination of callbacks, additional interfaces (transports and protocols) and Futures. The latter are similar to those defined in PEP 3148, but have a different implementation and are not tied to threads. In particular, they have no wait() method; users are expected to use callbacks.
For those who (like me) do not like using callbacks, a scheduler is provided for writing asynchronous I/O code as coroutines, using the yield from expressions of PEP 380. This scheduler is not pluggable; pluggability exists at the event loop level, and the scheduler should work with any conforming event loop implementation.
For interoperability between code written using coroutines and other asynchronous frameworks, the scheduler has a Task class that behaves like a Future. A framework that interoperates at the event loop level can wait for a Future to complete by adding a callback to it. Likewise, the scheduler provides an operation to suspend a coroutine until a callback is called.
Limited interoperability with threads is provided through the event loop interface: there is an API to submit a function to an executor (see PEP 3148), which returns a Future that is compatible with the event loop.
Non-Goals
Interoperability with systems like Stackless Python or greenlets/gevent is not a goal of this PEP.
Specifications
Dependencies
Python 3.3 is required. No new language or standard library features beyond Python 3.3 are required. No third-party modules or packages are required.
Module namespace
The specification here will live in a new top-level package. Different components will live in separate submodules of that package. The package will import commonly used APIs from their respective submodules and make them available as package attributes (similar to the way the email package works).
The name of the top-level package has not been decided yet. The reference implementation uses the name "tulip", but the name will probably change to something more boring if and when the implementation is added to the standard library (hopefully in Python 3.4).
Until the boring name is chosen, this PEP will use "tulip" as the name of the top-level package. Classes and functions given without a module name are assumed to be accessed through the top-level package.
Event loop policy: getting and setting the event loop
To get the current event loop, you can use get_event_loop(). This function returns an instance of the EventLoop class defined below or an equivalent object. get_event_loop() may return different objects based on the current thread, or return different objects based on other contextual concepts.
To set the current event loop, you can use set_event_loop(event_loop), where event_loop is an instance of the EventLoop class or an equivalent instance object. The same concept of context is used here as get_event_loop().
There is also a third policy function, new_event_loop(), provided for the benefit of unit tests and other special cases. It creates and returns a new EventLoop instance according to the policy's default rules. To make it the current event loop, you must call set_event_loop().
To change the way the above three functions work (including their notion of context), call set_event_loop_policy(policy), where policy is an event loop policy object. The policy object can be any object that has methods get_event_loop(), set_event_loop(event_loop) and new_event_loop() that behave like the functions described above. The default event loop policy is an instance of the DefaultEventLoopPolicy class. The current event loop policy object can be retrieved by calling get_event_loop_policy().
An event loop policy may, but does not have to, enforce that only one event loop exists. The default event loop policy does not enforce this, but it does enforce that there is only one event loop per thread.
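The policy mechanism described above can be illustrated with a toy policy object. This is a minimal sketch under stated assumptions, not Tulip's implementation: StubLoop, ThreadLocalPolicy and loops_seen_by_two_threads are hypothetical names, the "loop" does no I/O, and creating a loop lazily on first use is an assumption of this sketch rather than something the text specifies.

```python
import threading


class StubLoop:
    """Hypothetical stand-in for an EventLoop; it performs no I/O."""


class ThreadLocalPolicy:
    """Toy policy: at most one current event loop per thread."""

    def __init__(self):
        self._local = threading.local()

    def new_event_loop(self):
        # Create a loop without making it the current one.
        return StubLoop()

    def set_event_loop(self, event_loop):
        self._local.loop = event_loop

    def get_event_loop(self):
        loop = getattr(self._local, "loop", None)
        if loop is None:
            # Assumption of this sketch: create a loop lazily per thread.
            loop = self.new_event_loop()
            self.set_event_loop(loop)
        return loop


def loops_seen_by_two_threads(policy):
    """Return the loops seen by the calling thread and by a worker thread."""
    seen = {}

    def worker():
        seen["worker"] = policy.get_event_loop()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    seen["main"] = policy.get_event_loop()
    return seen
```

Because the policy stores the loop in thread-local storage, each thread gets its own "current" loop, which mirrors the per-thread context of the default policy described above.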
Event Loop Interface
A note about times: as usual in Python, all timeouts, intervals and delays are measured in seconds and may be ints or floats. The accuracy and precision of the clock are up to the implementation; the default implementation uses time.monotonic().
A note about callbacks and Handlers: any function that takes a callback and a variable number of arguments for it can also be given a Handler object instead of the callback. In that case no arguments should be given, and the Handler should represent an immediate callback (as returned from call_soon()), not a delayed callback (as returned from call_later()). If the Handler has already been cancelled, the call has no effect.
A standard-compliant event loop object has the following methods:
run(). Runs the event loop until there is nothing left to do. In particular, this means:
No more calls scheduled with call_later(), call_repeatedly(), call_soon() or call_soon_threadsafe(), apart from cancelled calls.
No more registered file descriptors. It is up to the registering party to unregister a file descriptor when it is closed.
Note: run() blocks until the termination condition is met or until stop() is called.
Note: If you schedule a call with call_repeatedly(), run() will not exit until you call stop().
TBD: How many variants of this do we really need?
run_forever(). The event loop runs until stop() is called.
run_until_complete(future, timeout=None). Runs the event loop until the Future is done. If a timeout is given, it waits at most that long. If the Future is done, its result is returned or its exception is raised; if the timeout expires before the Future is done, or if stop() is called, TimeoutError is raised (but the Future is not cancelled). This method cannot be called while the event loop is already running.
Note: This API is mostly useful for tests and similar work. It should not be used as a substitute for yield from future or other ways of waiting for a Future (e.g. registering a done callback).
run_once(timeout=None). Runs the event loop for a little while. If a timeout is given, any I/O poll made will block at most that long; otherwise, the I/O poll is not constrained in time.
Note: Exactly how much work this does is up to the implementation. One constraint: if a callback immediately schedules itself using call_soon(), causing an infinite loop, run_once() should still return.
stop(). Stops the event loop as soon as it is convenient. The loop can then be restarted using run() (or one of its variants).
Note: How soon exactly the loop stops is up to the implementation. All immediate callbacks that were already scheduled before stop() was called must still be run, but callbacks scheduled after stop() is called (or scheduled to run later) will not be run.
close(). Closes the event loop and releases any resources it held, such as file descriptors used by epoll() or kqueue(). This method should not be called while the event loop is running. It can be called multiple times.
call_later(delay, callback, *args). Arranges for callback(*args) to be called once, approximately delay seconds in the future, unless cancelled. Returns a Handler object representing the callback, whose cancel() method can be used to cancel the callback.
call_repeatedly(interval, callback, *args). Like call_later(), but calls the callback repeatedly, every interval seconds, until the returned Handler is cancelled. The first call happens after interval seconds.
call_soon(callback, *args). Equivalent to call_later(0, callback, *args).
call_soon_threadsafe(callback, *args). Like call_soon(callback, *args), but when called from another thread while the event loop is blocked waiting for I/O, it unblocks the event loop. This is the only method that is safe to call from another thread. (To schedule a callback for a later time in a thread-safe way, you can use ev.call_soon_threadsafe(ev.call_later, when, callback, *args).) It is not safe to call from a signal handler (because it may use locks).
add_signal_handler(sig, callback, *args). Whenever signal sig is received, arranges for callback(*args) to be called. Returns a Handler that can be used to cancel the signal callback. (Cancelling the returned Handler causes remove_signal_handler() to be called the next time the signal arrives. Calling remove_signal_handler() explicitly is preferred.) Specifying another callback for the same signal replaces the previous handler (only one handler can be active per signal). The sig argument must be a valid signal number defined in the signal module. If the signal cannot be handled, an exception is raised: ValueError if it is not a valid signal or if it is an uncatchable signal (such as SIGKILL), and RuntimeError if this particular event loop instance cannot handle signals (since signals are global per process, only an event loop associated with the main thread can handle them).
remove_signal_handler(sig). Removes the handler for signal sig, if one is set. Raises the same exceptions as add_signal_handler() (except that it may return False instead of raising RuntimeError for uncatchable signals). Returns True if a handler was removed successfully, and False if no handler was set.
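The scheduling methods above can be tried out with asyncio, the standard-library module that later grew out of this proposal. The sketch below is illustrative only: asyncio's API differs in detail from this draft (for example, the draft's run() and call_repeatedly() are not used here), and run_schedule_demo is a hypothetical name.

```python
import asyncio


def run_schedule_demo():
    """Schedule immediate, delayed and cancelled callbacks on a fresh loop."""
    loop = asyncio.new_event_loop()
    calls = []
    try:
        # Runs on the next iteration of the loop.
        loop.call_soon(calls.append, "soon")
        # A delayed call that is cancelled before it is due never runs.
        doomed = loop.call_later(0.01, calls.append, "never")
        doomed.cancel()
        # A delayed call that is left alone runs once.
        loop.call_later(0.02, calls.append, "later")
        # Stop the loop after the other callbacks have had time to run.
        loop.call_later(0.1, loop.stop)
        loop.run_forever()
    finally:
        loop.close()
    return calls
```

Note that the positional arguments for each callback are passed directly to the scheduling method, so no lambda is needed.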
Some methods in the standard conforming interface return Futures:
wrap_future(future). Takes a PEP 3148 Future (i.e. an instance of concurrent.futures.Future) and returns a Future compatible with the event loop (i.e. an instance of tulip.Future).
run_in_executor(executor, callback, *args). Arranges for callback(*args) to be called in an executor (see PEP 3148). Returns a Future whose result on success is the return value of that call. This is equivalent to wrap_future(executor.submit(callback, *args)). If executor is None, a default ThreadPoolExecutor with 5 threads is used.
set_default_executor(executor). Set a default executor used by run_in_executor().
getaddrinfo(host, port, family=0, type=0, proto=0, flags=0). Similar to the socket.getaddrinfo() function, but returns a Future. The Future's result on success is a list in the same format as the return value of socket.getaddrinfo(). The default implementation calls socket.getaddrinfo() via run_in_executor(), but other implementations may choose to implement their own DNS lookup. The optional arguments must be specified as keyword arguments.
getnameinfo(sockaddr, flags=0). Similar to socket.getnameinfo(), but returns a Future. The Future's result on success is a tuple (host, port). The same implementation remarks apply as for getaddrinfo().
create_connection(protocol_factory, host, port, **kwargs). Creates a stream connection to the given host and port. This creates an implementation-dependent Transport to represent the connection, then calls protocol_factory() to instantiate (or retrieve) the user's Protocol implementation, and finally ties the two together. (See the definitions of Transport and Protocol below.) The user's Protocol implementation is created or retrieved by calling protocol_factory() without arguments (*). The return value is a Future whose result on success is the (transport, protocol) pair; if a failure prevents the creation of a successful connection, the Future will have an appropriate exception set. Note that when the Future completes, the protocol's connection_made() method has not yet been called; that will happen when the connection handshake is complete.
(*) There is no requirement that protocol_factory be a class. If your protocol class needs specific arguments passed to its constructor, you can use a lambda or functools.partial(). You can also pass a trivial lambda that returns a previously constructed Protocol instance.
Optional keyword arguments:
family, proto, flags: Address family, protocol, and miscellaneous flags to be passed through to getaddrinfo(). These all default to 0. (The socket type is always SOCK_STREAM.)
ssl: Pass True to create an SSL transport (by default a plain TCP transport is created). Or pass an ssl.SSLContext object to override the default SSL context object to be used.
start_serving(protocol_factory, host, port, **kwds). Enters a loop that accepts connections. This returns a Future that completes once the loop is set up to serve; its result is None. Each time a connection is accepted, protocol_factory is called without arguments (*) to create a Protocol, a Transport is created to represent the network side of the connection, and the two are tied together by calling protocol.connection_made(transport).
(*) See the footnote above for create_connection(). However, since protocol_factory() is called once for each incoming connection, it is recommended that it return a new Protocol object each time it is called.
Optional keyword arguments:
family, proto, flags: Address family, protocol, and miscellaneous flags to be passed through to getaddrinfo(). These all default to 0. (The socket type is always SOCK_STREAM.)
TBD: Support SSL? I do not know how to do this asynchronously, and I suppose it needs a certificate.
TBD: Perhaps the Future's result could be an object that can be used to control the serving loop, e.g. to stop serving, abort all active connections, and (if supported) adjust the backlog or other parameters? It could also have an API to query active connections. Alternatively, return a Future (subclass?) that only completes if the loop stops serving due to an error, or if it cannot be started? Cancelling it might stop the loop.
TBD: Some platforms may not be interested in implementing all of these methods. For example, mobile apps may have no interest in start_serving(). (Although I do have a Minecraft server on my iPad...)
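The run_in_executor() and wrap_future() methods described earlier in this list can be sketched with asyncio, the standard-library descendant of this proposal. This is an illustration under stated assumptions: in asyncio, wrap_future() is a module-level function rather than an event loop method, and slow_square and executor_demo are hypothetical names.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for a blocking function that should not run in the loop."""
    return x * x


def executor_demo():
    """Run blocking calls in a thread pool and collect their results."""
    loop = asyncio.new_event_loop()
    executor = ThreadPoolExecutor(max_workers=2)
    try:
        # Submit the call to the executor and get a loop-compatible Future.
        future = loop.run_in_executor(executor, slow_square, 7)
        first = loop.run_until_complete(future)
        # Wrap a concurrent.futures.Future so the loop can wait on it.
        wrapped = asyncio.wrap_future(executor.submit(slow_square, 8), loop=loop)
        second = loop.run_until_complete(wrapped)
        return first, second
    finally:
        executor.shutdown()
        loop.close()
```

Both routes end with a Future that the event loop can wait on, which is the equivalence the description of run_in_executor() states.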
The following methods for registering callbacks for file descriptors are optional. If they are not implemented, accessing the method (without calling it) raises AttributeError. The default implementation provides these methods, but users normally do not use them directly; they are used exclusively by the transport implementations. Also, on Windows these methods may or may not be present, depending on whether a select-based or IOCP-based event loop is used. These methods take integer file descriptors only, not objects with a fileno() method. The file descriptor should represent something pollable, i.e. not a disk file.
add_reader(fd, callback, *args). Arranges for callback(*args) to be called whenever the file descriptor fd is ready for reading. Returns a Handler object that can be used to cancel the callback. Note that, unlike with call_later(), the callback may be called many times. Calling add_reader() again for the same file descriptor implicitly cancels the previous callback for that file descriptor. Note: Cancelling the Handler may be delayed until the handler would be called. If you plan to close fd, you should use remove_reader(fd) instead. (TODO: Change this to raise an exception if a handler is already set.)
add_writer(fd, callback, *args). Like add_reader(), but registers the callback to be called when the file descriptor is ready for writing.
remove_reader(fd). Cancels the current read callback for file descriptor fd, if one is set. If no callback is currently set for the file descriptor, this does nothing. (This alternative interface is provided because it is often more convenient to remember the file descriptor than to remember the Handler object.) Returns True if a handler was removed and False if not.
remove_writer(fd). Removes the write callback that has been set for file descriptor fd.
TBD: What about multiple callbacks per file descriptor? The current semantics is that add_reader() or add_writer() replaces a previously registered callback. This should change to raise an exception if a callback is already registered.
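The file descriptor callbacks above can be demonstrated with asyncio, the standard-library descendant of this proposal. The sketch assumes a selector-based loop (asyncio.SelectorEventLoop), uses a socket pair as the pollable file descriptor, and read_one_message is a hypothetical name; details such as return values differ between asyncio and this draft.

```python
import asyncio
import socket


def read_one_message():
    """Register a read callback on a socket and collect one message."""
    loop = asyncio.SelectorEventLoop()
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    received = []

    def on_readable():
        # Called by the loop when the descriptor is ready for reading.
        received.append(reader.recv(1024))
        # Unregister by file descriptor before the socket is closed.
        loop.remove_reader(reader.fileno())
        loop.stop()

    try:
        loop.add_reader(reader.fileno(), on_readable)
        # Write from inside the loop so the read callback gets triggered.
        loop.call_soon(writer.send, b"ping")
        loop.run_forever()
    finally:
        loop.close()
        reader.close()
        writer.close()
    return received
```

The callback unregisters itself with remove_reader(fd) before the socket is closed, following the advice given above for add_reader().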
The following methods for doing asynchronous I/O on sockets are optional. They are an alternative to the optional methods described above, intended for transport implementations on Windows that use IOCP (if the event loop supports it). The socket argument must be a non-blocking socket.
sock_recv(sock, n). Receives up to n bytes from the socket sock. Returns a Future whose result on success is a bytes object.
sock_sendall(sock, data). Sends the bytes data to the socket sock. Returns a Future whose result on success is None. (TBD: Is it better to emulate the semantics of sendall() or of send()? I think sendall(), but perhaps it should still be named send()?)
sock_connect(sock, address). Connects to the given address. Returns a Future whose result on success is None.
sock_accept(sock). Accepts a connection from a socket. The socket must be in listening mode and bound to an address. Returns a Future whose result on success is a tuple (conn, peer), where conn is a connected non-blocking socket and peer is the peer address. (TBD: People tell me that this style of API is too slow for high-volume servers. That is why there is also start_serving() above. Do we still need this?)
TBD: Optional methods are not so good. Perhaps these should all be required? It may still depend on the platform which set is more efficient. Another possibility is to document these as "for transports only" and the rest as "for anyone".
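The socket methods above can be exercised with asyncio, the standard-library descendant of this proposal. This is a sketch under stated assumptions: in current asyncio, sock_sendall() and sock_recv() are awaited from an async def coroutine, whereas this draft describes them as returning Futures used with yield from; send_and_receive and socketpair_roundtrip are hypothetical names.

```python
import asyncio
import socket


async def send_and_receive(loop, sender, receiver, payload):
    """Send payload on one socket and read it back from its peer."""
    await loop.sock_sendall(sender, payload)
    return await loop.sock_recv(receiver, 1024)


def socketpair_roundtrip(payload=b"hello"):
    """Push a short message through a connected pair of sockets."""
    loop = asyncio.new_event_loop()
    sender, receiver = socket.socketpair()
    # The socket methods require non-blocking sockets.
    sender.setblocking(False)
    receiver.setblocking(False)
    try:
        return loop.run_until_complete(
            send_and_receive(loop, sender, receiver, payload))
    finally:
        sender.close()
        receiver.close()
        loop.close()
```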
Callback order
When two callback functions are scheduled at the same time, they will be executed in the order in which they were registered. For example:
ev.call_soon(foo)
ev.call_soon(bar)
It is guaranteed that foo() will be called before bar().
If call_soon() is used, this guarantee holds even if the system clock runs backwards. The same is true for call_later(0, callback, *args). However, if call_later() is used with a non-zero delay while the system clock is running backwards, there is no such guarantee. (A good event loop implementation should use time.monotonic() to avoid problems caused by the system clock running backwards. See PEP 418.)
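One way an implementation could keep registration order for callbacks scheduled for the same time is to store a sequence number next to each deadline. The following is a minimal sketch of that idea, not the reference implementation; ToyTimerQueue and its methods are hypothetical names.

```python
import heapq
import itertools


class ToyTimerQueue:
    """Hypothetical timer queue that is stable for equal deadlines."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def schedule(self, when, callback, *args):
        # The sequence number breaks ties, so entries with equal deadlines
        # keep registration order and callbacks are never compared.
        heapq.heappush(self._heap, (when, next(self._counter), callback, args))

    def run_all(self):
        # Pop entries in (deadline, registration order) and call them.
        while self._heap:
            _when, _seq, callback, args = heapq.heappop(self._heap)
            callback(*args)
```

With this layout, two callbacks scheduled for the same deadline run in the order in which they were registered, while an earlier deadline still runs first.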
Context
All event loops have a notion of context. For the default event loop implementation, the context is a thread. An event loop implementation should run all callbacks in the same context. An event loop implementation should run only one callback at a time, so callbacks can assume automatic mutual exclusion with other callbacks scheduled in the same event loop.
Exceptions
There are two categories of exceptions in Python: those that derive from the Exception class and those that derive from BaseException. Exceptions deriving from Exception will generally be caught and handled appropriately; for example, they are passed through by Futures, and they are logged and ignored when they occur in a callback.
However, exceptions deriving only from BaseException are never caught; they usually cause the program to terminate with a traceback. (Examples of this category include KeyboardInterrupt and SystemExit; it would be unwise to treat these the same as most other exceptions.)
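The distinction above can be sketched as a small helper that an event loop might use when invoking a callback. This is an illustration only; run_callback, its boolean return value and the logger name are assumptions of this sketch, not part of the specification.

```python
import logging

logger = logging.getLogger("toy_loop")


def run_callback(callback, *args):
    """Run one callback; log and ignore ordinary exceptions."""
    try:
        callback(*args)
    except Exception:
        # KeyboardInterrupt and SystemExit do not derive from Exception,
        # so they are not caught here and propagate to the caller.
        logger.exception("Exception in callback %r", callback)
        return False
    return True
```

A callback that raises ValueError is logged and the loop would carry on, while KeyboardInterrupt passes straight through and can terminate the program.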
Handler class
The various methods for registering callbacks (such as call_later()) all return an object that represents the registration. This object can be used to cancel the callback. Although users never need to instantiate this class, the object still needs a name; it is called a Handler. This class has one public method:
cancel(). Attempts to cancel the callback. TBD: Exact specification.
Read-only public attributes:
callback. The callback function to be called.
args. The tuple of arguments with which to call the callback function.
cancelled. True if cancel() has been called.
Note that some callbacks (such as those registered with call_later()) are meant to be called only once. Others (such as those registered with add_reader()) are meant to be called multiple times.
TBD: An API to call the callback (encapsulating the necessary exception handling)? Should it record how many times it has been called? Maybe this API should simply be __call__()? (But it should suppress exceptions.)
TBD: A public attribute recording the real-time value at which the callback is scheduled? (This is needed anyway for storing it in a heap.)
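The Handler description above can be summarized in a short sketch. Since the exact specification of cancel() is still marked TBD, the class below only sets a flag; run_if_active is a hypothetical helper showing what an event loop might do when a Handler becomes due.

```python
class Handler:
    """Toy registration object with the attributes listed above."""

    def __init__(self, callback, args):
        self._callback = callback
        self._args = tuple(args)
        self._cancelled = False

    @property
    def callback(self):
        return self._callback

    @property
    def args(self):
        return self._args

    @property
    def cancelled(self):
        return self._cancelled

    def cancel(self):
        # Assumption of this sketch: cancelling only sets a flag.
        self._cancelled = True


def run_if_active(handler):
    """Hypothetical helper: call the callback unless it was cancelled."""
    if handler.cancelled:
        return False
    handler.callback(*handler.args)
    return True
```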
Futures
tulip.Future is intentionally designed to be similar to concurrent.futures.Future from PEP 3148, with only slight differences. Whenever Future is mentioned in this PEP, it refers to tulip.Future unless concurrent.futures.Future is explicitly specified. The public API supported by tulip.Future is as follows, with the differences from PEP 3148 pointed out:
cancel(). If the Future has been completed (or canceled), False is returned. Otherwise, change the status of the Future to the canceled state (which can also be understood as completed), schedule the callback function, and return True.
cancelled(). Returns True if the Future has been canceled.
running(). Always returns False. Unlike PEP 3148, there is no running state.
done(). Returns True if the Future has been completed. Note: A canceled Future is also considered completed (this is true here and elsewhere).
result(). Returns the result set by set_result(), or raises the exception set by set_exception(). Raises CancelledError if the Future has been cancelled. Unlike PEP 3148, there is no timeout argument and no waiting; if the Future is not yet done, an exception is raised.
exception(). Returns the exception set by set_exception(), or None if a result was set by set_result(). Raises CancelledError if the Future has been cancelled. As with result(), there is no timeout argument and no waiting.
add_done_callback(fn). Adds a callback to be run when the Future is done (or cancelled). If the Future is already done (or cancelled), the callback is scheduled using call_soon(). Unlike PEP 3148, the callback is never called immediately, and it is always run in the context of the caller. (Typically, a context is a thread.) You can think of this as calling the callback through call_soon(). Note: The callback (unlike all other callbacks in this PEP, and ignoring the convention from the "Callback Style" section below) is always called with a single argument, the Future object, and it should not be a Handler object.
set_result(result). The Future must not already be done (or cancelled). This method makes the Future done and schedules its callbacks. Unlike PEP 3148, this is a public API.
set_exception(exception). Same as above, setting exception.
The internal method set_running_or_notify_cancel() is no longer supported; there is no way to directly set it to the running state.
This PEP defines the following exceptions:
InvalidStateError. Raised whenever the Future is not in a state acceptable to the method being called (for example, calling set_result() on a Future that is already done, or calling result() on a Future that is not yet done).
InvalidTimeoutError. Raised by result() and exception() when a non-zero timeout argument is given.
CancelledError. Alias of concurrent.futures.CancelledError. This exception is thrown when the result() or exception() method is called on a canceled Future.
TimeoutError. Alias for concurrent.futures.TimeoutError. May be thrown by EventLoop.run_until_complete() method.
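The Future behavior described above (no waiting in result(), callbacks dispatched through the event loop rather than called immediately) can be observed with asyncio.Future, the standard-library descendant of tulip.Future. This sketch is illustrative: asyncio's class is not identical to the draft, and future_demo is a hypothetical name.

```python
import asyncio


def future_demo():
    """Record the order of events around completing a Future."""
    loop = asyncio.new_event_loop()
    events = []
    try:
        future = loop.create_future()
        try:
            # result() does not wait; on a pending Future it raises.
            future.result()
        except asyncio.InvalidStateError:
            events.append("result() before done raises")
        # The done callback receives the Future as its single argument.
        future.add_done_callback(
            lambda f: events.append(("callback", f.result())))
        future.set_result(42)
        # The callback has only been scheduled, not called, at this point.
        events.append("set_result() returned")
        value = loop.run_until_complete(future)
        events.append(("returned", value))
    finally:
        loop.close()
    return events
```

The recorded order shows that set_result() returns before the done callback runs; the callback is only invoked once the event loop gets a chance to run.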
A Future is associated with the default event loop when it is created. (TBD: Optionally allow passing in an alternative event loop instance?)
The wait() and as_completed() methods in the concurrent.futures package do not accept tulip.Future objects as parameters. However, there are similar APIs tulip.wait() and tulip.as_completed(), described below.
A tulip.Future object can be used in a yield from expression inside a coroutine. This is implemented through the __iter__() interface of the Future. See the "Subroutines and Scheduler" section below.
When a Future is garbage-collected, if it has an associated exception but neither result(), exception() nor __iter__() has ever been called (or the latter has not yet raised the exception), the exception should be logged. TBD: At what level?
In the future, we may unify tulip.Future and concurrent.futures.Future, for example by adding an __iter__() method to the latter that works with yield from. To prevent the event loop from being blocked by accidentally calling e.g. result() on a Future that is not yet done, the blocking operation could detect that an event loop is active in the current thread and raise an exception instead. However, this PEP strives to have no dependencies beyond Python 3.3, so changes to concurrent.futures.Future are off the table for now.
Transport layer
The transport layer refers to an abstraction layer based on sockets or other similar mechanisms (such as pipes or SSL connections). The transport layer here is heavily influenced by Twisted and PEP 3153. Users rarely implement or instantiate the transport layer directly. The event loop provides related methods for setting the transport layer.
The transport layer is used to work with protocols. Typical protocols do not care about the specific details of the underlying transport layer, and the transport layer can be used to work with a variety of protocols. For example, an HTTP client implementation can use a normal socket transport layer, or it can use an SSL transport layer. The plain socket transport layer can work with a large number of protocols outside of the HTTP protocol (e.g., SMTP, IMAP, POP, FTP, IRC, SPDY).
Most connections have asymmetrical properties: the client and server usually have different roles and behaviors. Therefore, the interface between the transport layer and the protocol is also asymmetric. From a protocol perspective, sending data is accomplished by calling the write() method of the transport layer object. The write() method returns immediately after putting the data into the buffer. When reading data, the transport layer will play a more active role: after receiving data from the socket (or other data source), the transport layer will call the protocol's data_received() method.
The transport layer has the following public methods:
write(data). Writes some bytes. The argument must be a bytes object. Returns None. The transport layer is free to buffer the bytes, but it must eventually transfer them to the other end, and it must maintain stream behavior. That is, t.write(b'abc'); t.write(b'def') is equivalent to t.write(b'abcdef'), and also to:
t.write(b'a')
t.write(b'b')
t.write(b'c')
t.write(b'd')
t.write(b'e')
t.write(b'f')
writelines(iterable). Equivalent to:
for data in iterable:
    self.write(data)
write_eof(). Closes the writing end of the connection. Subsequent calls to write() are not allowed. Once all buffered data has been transferred, the transport layer signals to the other end that no more data will be received. Some protocols do not support this; in that case, calling write_eof() raises an exception. (Note: This method used to be called half_close(), but unless you already know what it is for, that name does not indicate which end is closed.)
can_write_eof(). Returns True if the protocol supports write_eof(), and False otherwise. (This method is needed because some protocols have to change their behavior when write_eof() is unavailable. For example, in HTTP, to send data whose size is not known in advance, the end of the data is usually indicated with write_eof(). However, SSL does not support this, so the HTTP protocol implementation would have to use chunked transfer encoding in that case. But if the data size is known in advance, the best approach in both cases is to use the Content-Length header.)
pause(). Suspends delivery of data to the protocol until a subsequent resume() call. Between the pause() call and the resume() call, the protocol's data_received() method will not be called. This has no effect on write().
resume(). Resumes delivery of data to the protocol via data_received().
close(). Closes the connection. Any data buffered by write() will (eventually) be sent before the connection is actually closed. The protocol's data_received() method will not be called again. Once all buffered data has been sent, the protocol's connection_lost() method will be called with None as the argument. Note: This method does not wait for all of this to happen.
abort(). Aborts the connection immediately. Any data still buffered by the transport layer is discarded. Soon afterwards, the protocol's connection_lost() method will be called with None as the argument. (TBD: Should the argument to connection_lost() distinguish between close(), abort() and a close initiated by the other end? Or should a transport method be added to query this? Glyph's proposal is to pass different exceptions for this purpose.)
TBD: Provide flow control in the other direction: the transport layer may need to suspend the protocol if the amount of buffered data becomes a burden. Proposal: let the transport layer call protocol.pause() and protocol.resume() if they exist; if they do not exist, the protocol does not support flow control. (Perhaps pause() and resume() should have different names on protocols and transports to avoid confusion?)
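The stream behavior required of write() and writelines() can be illustrated with a toy in-memory transport. This is a minimal sketch under stated assumptions, not a real transport: BufferingTransport collects bytes instead of sending them, sent_bytes() is a hypothetical helper, and raising RuntimeError after write_eof() is a choice made for this sketch.

```python
class BufferingTransport:
    """Toy transport that collects writes in memory (no real socket)."""

    def __init__(self):
        self._buffer = bytearray()
        self._eof_written = False

    def write(self, data):
        if self._eof_written:
            # Assumption: the exception type is not fixed by the text.
            raise RuntimeError("write() called after write_eof()")
        if not isinstance(data, bytes):
            raise TypeError("data must be a bytes object")
        self._buffer.extend(data)

    def writelines(self, iterable):
        for data in iterable:
            self.write(data)

    def can_write_eof(self):
        return True

    def write_eof(self):
        self._eof_written = True

    def sent_bytes(self):
        """Hypothetical helper: the byte stream the peer would see."""
        return bytes(self._buffer)
```

However the caller splits its data across write() calls, the peer sees the same byte stream, which is the equivalence stated in the description of write().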
Protocols
Protocols are always used together with the transport layer. Although a few commonly used protocols are provided (for example, useful HTTP client and server implementations), most protocols will be implemented by user code or third-party libraries.
A protocol must implement the following methods, which will be called by the transport layer. Consider these to be callbacks that are always called by the event loop in the right context (see the "Context" section above).
connection_made(transport). Indicates that the transport layer is ready and connected to the entity at the other end. The protocol should save the transport reference as an instance variable (so that it can call its write() and other methods later), and it may also send an initial greeting or request at this point.
data_received(data). Indicates that the transport layer has read some bytes from the connection. The argument is always a non-empty bytes object. There are no guarantees about the minimum or maximum size of the data passed in this way. p.data_received(b'abcdef') should be treated as exactly equivalent to:
p.data_received(b'abc')
p.data_received(b'def')
eof_received(). This method is called when the other end calls write_eof() or an equivalent method. The default implementation calls the transport's close() method, which in turn causes the protocol's connection_lost() method to be called.
connection_lost(exc). The transport has been closed or aborted, the other end has closed the connection cleanly, or an exception has occurred. In the first three cases, the argument is None; in the case of an exception, the argument is the exception that caused the transport to fail. (TBD: Do we need to distinguish between the first three cases?)
Here is a diagram showing the order and multiplicity of the calls:
connection_made() -- exactly once
data_received() -- zero or more times
eof_received() -- at most once
connection_lost() -- exactly once
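The callback order above can be sketched with a protocol that records every call it receives. EchoLogProtocol, FakeTransport and drive() are hypothetical names used only for this illustration; FakeTransport calls connection_lost() synchronously, whereas a real transport would do so through the event loop.

```python
class EchoLogProtocol:
    """Hypothetical protocol that records each callback it receives."""

    def __init__(self):
        self.transport = None
        self.calls = []
        self.received = b""

    def connection_made(self, transport):
        self.transport = transport    # saved so close()/write() can be used later
        self.calls.append("connection_made")

    def data_received(self, data):
        self.received += data
        self.calls.append("data_received")

    def eof_received(self):
        self.calls.append("eof_received")
        self.transport.close()        # mirrors the default behaviour described above

    def connection_lost(self, exc):
        self.calls.append("connection_lost")


class FakeTransport:
    """Hypothetical stand-in for a transport; only close() is provided."""

    def __init__(self, protocol):
        self.protocol = protocol

    def close(self):
        self.protocol.connection_lost(None)


def drive(chunks):
    """Play the role of the transport: deliver chunks, then EOF."""
    protocol = EchoLogProtocol()
    protocol.connection_made(FakeTransport(protocol))
    for chunk in chunks:
        protocol.data_received(chunk)
    protocol.eof_received()
    return protocol


one = drive([b"abcdef"])
two = drive([b"abc", b"def"])
```

Both runs accumulate the same bytes, which is the equivalence that data_received() is expected to preserve regardless of how the data is split.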
TBD: Discuss whether user code needs to do anything to make sure that protocols and transports are not garbage-collected prematurely.
Callback function style
Most interfaces that take a callback also take positional arguments. For example, to schedule foo("abc", 42) to be called soon, you would call ev.call_soon(foo, "abc", 42). To schedule a call to foo(), use ev.call_soon(foo). This convention greatly reduces the number of small lambda expressions required in typical callback-based programming.
This convention explicitly does not support keyword arguments. Keyword arguments are reserved for passing optional extra information about the callback. This allows the API to evolve gracefully without having to worry about whether a keyword might be significant to a callee somewhere. If you have a callback that must be called with keyword arguments, you can use a lambda expression or functools.partial. For example:
ev.call_soon(functools.partial(foo, "abc", repeat=42))
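A minimal sketch of this convention follows. MiniLoop is a hypothetical stand-in for an event loop, written only to make the example self-contained; it implements nothing beyond call_soon() with positional arguments and a run() method that drains the queue.

```python
import collections
import functools


class MiniLoop:
    """Hypothetical event loop stand-in: only call_soon() and run()."""

    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, callback, *args):
        # Positional arguments only, following the convention above.
        self._ready.append((callback, args))

    def run(self):
        # Call every scheduled callback in the order it was scheduled.
        while self._ready:
            callback, args = self._ready.popleft()
            callback(*args)


calls = []


def foo(text="", repeat=1):
    calls.append(text * repeat)


ev = MiniLoop()
ev.call_soon(foo, "abc", 2)                             # foo("abc", 2)
ev.call_soon(foo)                                       # foo()
ev.call_soon(functools.partial(foo, "abc", repeat=3))   # keyword via partial
ev.run()
```

Because call_soon() never interprets keyword arguments itself, the loop remains free to grow its own keyword options later without colliding with a callback's parameters.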
Choosing an event loop implementation
TBD. (This concerns the use of select/poll/epoll and how to change the choice. It belongs to the event loop policy.)
Coroutines and Schedulers
This is a separate top-level section because its status is different from that of the event loop interface. Using coroutines is optional, and it is perfectly fine to write code using callbacks only. On the other hand, there is only one implementation of the scheduler/coroutine API, and if you use coroutines, that is the one you will be using.
Coroutine
A coroutine is a generator that follows certain conventions. For documentation purposes, all coroutines should be decorated with @tulip.coroutine, but this is not strictly required.
Coroutines use the yield from syntax introduced in PEP 380 instead of the original yield syntax.
The word "coroutine", like the word "generator", is used for two different (although related) concepts:
The function that defines a coroutine (a function definition decorated with tulip.coroutine). To disambiguate, we can call this a coroutine function.
The object obtained by calling a coroutine function. This object represents a computation or an I/O operation (usually both) that will complete eventually. To disambiguate, we can call this a coroutine object.
Things a coroutine can do:
result = yield from future -- suspends the coroutine until the future is done, then returns the future's result or raises the exception it carries.
result = yield from coroutine -- waits for another coroutine to produce a result (or to raise an exception, which is propagated). The coroutine expression must be a call to another coroutine.
return result -- produces a result for the coroutine that is waiting for this one using yield from.
raise exception -- raises an exception in the coroutine that is waiting for this one using yield from.
Calling a coroutine does not start its code running. It is just a generator, and the coroutine object returned by the call is really just a generator object, which does nothing until you iterate over it. There are two basic ways to start a coroutine object running: use yield from on it in another coroutine (assuming that other coroutine is already running), or convert it to a Task (see below).
Coroutines can only run when the event loop is running.
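The generator behaviour described in this section can be sketched with plain Python generators. The run() driver below is a hypothetical helper, not Tulip's scheduler, and the bare yield is only a stand-in for waiting on a Future; the @tulip.coroutine decorator is omitted because Tulip is not assumed to be installed.

```python
def compute(x):
    # A coroutine function in the sense used above: a generator function.
    yield                  # stand-in for "wait for some Future"
    return x * 2           # becomes the value of "yield from compute(x)"


def main(log):
    log.append("started")
    result = yield from compute(21)   # delegate to another coroutine
    log.append(result)
    return result


def run(coro):
    """Hypothetical driver: iterate a coroutine object until it returns."""
    while True:
        try:
            next(coro)
        except StopIteration as stop:
            return stop.value         # the coroutine's return value


log = []
coro = main(log)                      # creates a generator object; runs nothing
ran_before_iteration = bool(log)
answer = run(coro)
```

Nothing is appended to log until run() starts iterating, which matches the statement that calling a coroutine function does not execute its body.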
Waiting for multiple coroutines
To wait for multiple coroutines or Futures, two APIs are provided that are similar to the wait() and as_completed() APIs in the concurrent.futures package:
tulip.wait(fs, timeout=None, return_when=ALL_COMPLETED). This is a coroutine that waits for the Futures or coroutines given by fs to complete. Coroutine arguments will be wrapped in Tasks (see below). It returns a Future whose result on success is a tuple of two sets of Futures, (done, pending). done is the set of original Futures (or wrapped coroutines) that are done (or cancelled), and pending is the rest, that is, those that are not yet done (nor cancelled). The optional arguments timeout and return_when have the same meaning and defaults as for concurrent.futures.wait(): timeout, if not None, specifies a timeout for the overall operation; return_when specifies when to stop. The constants FIRST_COMPLETED, FIRST_EXCEPTION and ALL_COMPLETED are defined with the same values and the same meanings as in PEP 3148:
ALL_COMPLETED (default): Wait until all Futures are done or cancelled (or until the timeout occurs).
FIRST_COMPLETED: Wait until at least one Future is done or cancelled (or until the timeout occurs).
FIRST_EXCEPTION: Wait until at least one Future is done (not cancelled) with an exception set. (The exclusion of cancelled Futures from this filter is surprising, but PEP 3148 does it this way.)
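Since the return_when constants are stated to have the same meaning as in PEP 3148, their effect can be illustrated with the blocking concurrent.futures.wait() from the standard library. This is only an analogy: tulip.wait() itself is a coroutine and would be used with yield from, and the Futures below are completed by hand rather than by an executor.

```python
from concurrent.futures import (
    ALL_COMPLETED,
    FIRST_COMPLETED,
    FIRST_EXCEPTION,
    Future,
    wait,
)

# Three hand-made Futures: one with a result, one with an exception,
# and one that never completes during this example.
ok, failed, unfinished = Future(), Future(), Future()
ok.set_result("value")
failed.set_exception(ValueError("boom"))
fs = [ok, failed, unfinished]

# Returns at once: at least one Future is already done.
first = wait(fs, return_when=FIRST_COMPLETED)

# Returns at once: one of the done Futures carries an exception.
first_exc = wait(fs, return_when=FIRST_EXCEPTION)

# Cannot be satisfied while `unfinished` is pending, so the timeout ends it.
everything = wait(fs, timeout=0.05, return_when=ALL_COMPLETED)
```

In all three calls the done set contains ok and failed, and the remaining set contains unfinished; only the reason for returning differs.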
tulip.as_completed(fs, timeout=None). Returns an iterator whose values are Futures; waiting for successive values waits until the next Future or coroutine from fs completes, and returns its result (or raises its exception). The optional argument timeout has the same meaning and default as it does for concurrent.futures.wait(): when the timeout occurs, the next Future returned by the iterator will raise TimeoutError when waited for. Example of use:
for f in as_completed(fs):
    result = yield from f  # May raise an exception.
    # Use result.
Tasks
A Task is an object that manages an independently running coroutine. The Task interface is the same as the Future interface. A task is done when its coroutine returns or raises an exception: a returned value becomes the task's result, and a raised exception becomes the task's exception.
Cancelling a task that is not yet done prevents its coroutine from continuing to execute. In this case, an exception is raised inside the coroutine so that it can handle the cancellation properly; however, the coroutine is not required to handle this exception. This mechanism works by calling the generator's standard close() method, which is described in PEP 342.
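The close() mechanism from PEP 342 can be seen with a plain generator. The worker() function below is a hypothetical coroutine used only for illustration, and no Task class is involved; the sketch shows only what the coroutine itself observes when close() is called on it.

```python
import inspect


def worker(log):
    try:
        while True:
            log.append("step")
            yield                    # suspension point
    except GeneratorExit:
        # Raised at the suspension point by close(); cleanup is optional.
        log.append("cancelled")
        raise                        # close() must not be swallowed


log = []
coro = worker(log)
next(coro)                           # run up to the first suspension point
coro.close()                         # raises GeneratorExit inside worker()
state = inspect.getgeneratorstate(coro)
```

A coroutine that does not catch GeneratorExit is simply terminated at its suspension point, which is why handling the exception is optional.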
Scheduler
TBD: yield sleep(seconds). sleep(0) can be used to suspend the coroutine and poll for I/O.
The best way to use coroutines to implement protocols is probably to use a stream buffer that is filled by data_received() and can be read asynchronously using methods such as read(n) and readline() that return a Future. When the connection is closed, read() should return a Future whose result is an empty bytes object (b''), or raise an exception if connection_lost() was called with an exception.
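One possible shape for such a stream buffer is sketched below. SimpleFuture, StreamBuffer, feed_data() and feed_eof() are hypothetical names chosen for this illustration and are not Tulip's API; the Future is reduced to a result slot, only one pending read is supported, and readline() and error handling are left out.

```python
class SimpleFuture:
    """Hypothetical minimal Future: just a result slot."""

    def __init__(self):
        self._done = False
        self._result = None

    def done(self):
        return self._done

    def set_result(self, result):
        self._result = result
        self._done = True

    def result(self):
        if not self._done:
            raise RuntimeError("result is not ready")
        return self._result


class StreamBuffer:
    """Hypothetical buffer between a protocol and coroutine-style readers."""

    def __init__(self):
        self._data = bytearray()
        self._eof = False
        self._waiter = None          # (n, future) for one pending read

    def feed_data(self, data):
        # Intended to be called from the protocol's data_received().
        self._data.extend(data)
        self._wakeup()

    def feed_eof(self):
        # Intended to be called when the connection is closed cleanly.
        self._eof = True
        self._wakeup()

    def read(self, n):
        # Returns a Future for up to n bytes, or b"" once the stream ended.
        future = SimpleFuture()
        self._waiter = (n, future)
        self._wakeup()
        return future

    def _wakeup(self):
        if self._waiter is None:
            return
        n, future = self._waiter
        if self._data or self._eof:
            chunk = bytes(self._data[:n])
            del self._data[:n]
            self._waiter = None
            future.set_result(chunk)


buf = StreamBuffer()
pending = buf.read(3)                # no data yet
was_pending = not pending.done()
buf.feed_data(b"abcdef")             # as data_received() would do
first = pending.result()
second = buf.read(10).result()
buf.feed_eof()
at_eof = buf.read(10).result()
```

In a coroutine, the caller would write something like data = yield from buf.read(3) instead of calling result() directly, assuming a Future type that supports yield from.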
TBD. When a Task is cancelled, its coroutine may see an exception at any point where it is yielding to the scheduler (that is, potentially at any yield from operation). We need to spell out which exception is raised.
Also TBD: timeouts.
Known issues
A debugging API? For example, something that logs a lot of detail, or logs unusual conditions (such as a queue filling up faster than it drains), or even callbacks that take too much time...
Do we need introspection APIs? For example, asking for the read callback given a file descriptor, or when the next scheduled call is due, or the list of file descriptors registered with callbacks.
Transports may need a method that tries to return the address of the socket (and another that returns the peer address). These depend on the socket type, and there may not always be a socket; in that case None should be returned. (Alternatively, there could be a method that returns the socket itself, but it is conceivable that some transport implements IP connections without using sockets, and what should it do then?)
Need to handle os.fork(). (In Tulip's case this would probably be handled in the selector classes.)
Perhaps start_serving() needs a way to pass in an existing socket (gunicorn needs this, for example). The same applies to create_connection().
We may introduce some explicit locks, although they will be a bit painful to use, because the with lock: block syntax cannot be used (to wait for a lock we have to use yield from, which the with statement cannot do).
Support for datagram protocols and "connections". More socket I/O methods are probably needed, such as sock_sendto() and sock_recvfrom(). Or users can write their own (this is not rocket science). Is there any reason to override write(), writelines() and data_received() to mean a single datagram? (Glyph recommends the latter.) What should be used instead of write() then? Finally, do we need to support connectionless datagram protocols? (This would mean wrapping sendto() and recvfrom().)
We may need APIs to control various timeouts. For example, we might want to limit the time spent on DNS resolution, connecting, SSL handshakes, idle connections, or even a whole session. Perhaps it is sufficient to add a timeout keyword argument to some methods, and to implement other timeouts through clever use of call_later() and Task.cancel(). But some operations may need a default timeout, and we may want to change the default for a specific operation globally (that is, per event loop).
A NodeJS-style event emitter? Or should this be a separate PEP? It is easy enough to do in user space, although it might benefit from standardization. (See https://github.com/mnot/thor/blob/master/thor/events.py and https://github.com/mnot/thor/blob/master/doc/events.md for examples.)
References
PEP 380 describes the semantics of yield from. TBD: Greg Ewing's tutorial.
PEP 3148 describes concurrent.futures.Future.
PEP 3153, although rejected, nicely describes the need to separate transports and protocols.
Tulip repo: http://code.google.com/p/tulip/
A good blog entry by Nick Coghlan with some background on different approaches to asynchronous I/O, gevent, and ideas on how to use futures with constructs such as while, for and with: http://python-notes.boredomandlaziness.org/en/latest/pep_ideas/async_programming.html
TBD: References to Twisted, Tornado, ZeroMQ, pyftpdlib, libevent, libev, pyev, libuv, wattle, etc.
Acknowledgments
In addition to PEP 3153, influences include PEP 380 and Greg Ewing's yield from tutorial, Twisted, Tornado, ZeroMQ, pyftpdlib, tulip (the author's attempt to bring it all together), wattle (a counter-proposal by Steve Dower), extensive discussions on python-ideas from September to December 2012, a Skype session with Steve Dower and Dino Viehland, an email exchange with Ben Darnell, an audience with Niels Provos (the original author of libevent), and two face-to-face meetings with several Twisted developers, including Glyph, Brian Warner, David Reid, and Duncan McGreggor. The author's earlier work on asynchronous support for Google App Engine's NDB library was also an important influence.