Using twisted.internet.process
- Introduction
- The ProcessProtocol
- Things that can happen to your ProcessProtocol
- Things you can do from your ProcessProtocol
- An Example
Introduction
twisted.internet.reactor.spawnProcess is a Twisted function that lets your
Twisted-based application create and control child processes. Pipes to the
child process are created and added to the reactor core, so the application
does not block while sending data into or pulling data out of the new
process.
    import os
    from twisted.internet import reactor

    mypp = MyProcessProtocol()
    reactor.spawnProcess(mypp, program, args=[program, arg1, arg2],
                         env={'HOME': os.environ['HOME']})
To use this, you'll need to assemble a few things. The first is the exact specification of how the process should be run. This means a string that will be the program itself, a list that will be the argv array for the process, and a dict which provides the environment. Both the argv and env arguments are optional, but the default values are empty, and many programs depend upon both to be set correctly for them to function properly. At the very least, argv[0] should probably be the same as program. If you just provide os.environ for env, the child program will inherit the environment from the current program, which is usually the civilized thing to do (unless you want to explicitly clean the environment as a security precaution).
You can optionally set a path, in which case the child will switch to the given directory before starting the program. You can also set uid/gid, but only if you started as root.
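As a rough sketch of the "clean environment" option mentioned above, the helper below copies only a chosen set of variables from the parent's environment instead of passing os.environ wholesale. The name clean_env, its default list of variables, and the example program path are all made up for illustration; none of them are part of Twisted.

```python
import os


def clean_env(allowed=("HOME", "PATH", "LANG"), source=None):
    """Return a new dict holding only the allowed variables.

    clean_env and its default list are hypothetical, not Twisted API.
    """
    if source is None:
        source = os.environ
    return {name: source[name] for name in allowed if name in source}


program = "/bin/ls"        # example program path (an assumption)
argv = [program, "-l"]     # argv[0] repeats the program name
# A literal dict stands in for os.environ so the result is predictable.
env = clean_env(source={"HOME": "/home/me", "SECRET": "x"})
print(argv, env)
```

The resulting program, argv, and env would then be handed to reactor.spawnProcess along with your ProcessProtocol instance.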
The ProcessProtocol
The second thing you'll need to use spawnProcess is an instance of
ProcessProtocol (or, more likely, a subclass you have written) that is used
to control the data going into and out of the process. This object behaves
very much like a normal Protocol, in that you should write a subclass that
overrides certain methods. These methods are called when something happens
to the process, such as data arriving from it.
Things that can happen to your ProcessProtocol
These are the methods that you can usefully override in your subclass of ProcessProtocol:
- .connectionMade: this is called when the program is started, and makes a good place to write data into the stdin pipe (using self.transport.write()).
- .outReceived(data): this is called with data that was received from the process' stdout pipe. Pipes tend to provide data in larger chunks than sockets (one kilobyte is a common buffer size), so you may not experience the "random dribs and drabs" behavior typical of network sockets, but regardless you should be prepared to deal if you don't get all your data in a single call. To do it properly, outReceived ought to simply accumulate the data and put off doing anything with it until the process has finished.
- .errReceived(data): this is called with data from the process' stderr pipe. It behaves just like outReceived.
- .inConnectionLost: this is called when the reactor notices that the process' stdin pipe has closed. Programs don't typically close their own stdin, so this will probably get called when your ProcessProtocol has shut down the write side with self.transport.loseConnection().
- .outConnectionLost: this is called when the program closes its stdout pipe. This usually happens when the program terminates.
- .errConnectionLost: same as outConnectionLost, but for stderr instead of stdout.
- .processEnded(status): this is called when the child process has been reaped, and receives information about the process' exit status. The status is passed in the form of a Failure instance, created with a .value that either holds a ProcessDone object if the process terminated normally (it died of natural causes instead of receiving a signal, and the exit code was 0), or a ProcessTerminated object (with an .exitCode attribute) if something went wrong. This scheme may seem a bit weird, but I trust that it proves useful when dealing with exceptions that occur in asynchronous code. processEnded will always be called after inConnectionLost, outConnectionLost, and errConnectionLost are called.
XXX: check twisted/internet/process.py:v1.30:line357, I think death-by-signal wouldn't be reported properly.
The base-class definitions of these functions are all no-ops. This will result in all stdout and stderr being thrown away. Note that it is important for data you don't care about to be thrown away: if the pipe were not read, the child process would eventually block as it tried to write to a full pipe.
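The accumulate-then-process advice for outReceived can be sketched as follows. This is only an illustration: it is written as a plain class so that it runs without a reactor, whereas in a real program the same methods would live on your ProcessProtocol subclass. The class name CollectingProtocol and the on_done callback are hypothetical.

```python
class CollectingProtocol:
    """Remember each chunk, and only look at the data once the child is gone.

    Hypothetical sketch: a real version would subclass
    twisted.internet.protocol.ProcessProtocol.
    """

    def __init__(self, on_done):
        self.out_chunks = []
        self.err_chunks = []
        self.on_done = on_done

    def outReceived(self, data):
        # May be called several times; just store the chunk.
        self.out_chunks.append(data)

    def errReceived(self, data):
        self.err_chunks.append(data)

    def processEnded(self, status):
        # All three pipes are closed by now, so the output is complete.
        self.on_done(b"".join(self.out_chunks), b"".join(self.err_chunks))


# Drive the callbacks by hand, in the order the reactor would use.
results = []
pp = CollectingProtocol(lambda out, err: results.append((out, err)))
pp.outReceived(b"     40     2")      # output may be split at any point
pp.outReceived(b"50    1420\n")
pp.processEnded(None)  # a real call receives a Failure; unused in this sketch
print(results)
```

Keeping the chunks in a list and joining them once avoids repeatedly copying a growing string when the child produces a lot of output.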
Things you can do from your ProcessProtocol
The following are the basic ways to control the child process:
- self.transport.write(data): stuff some data in the stdin pipe. Note that this write method will queue any data that can't be written immediately.
- self.transport.closeStdin: close the stdin pipe. Programs which act as filters (reading from stdin, modifying the data, writing to stdout) usually take this as a sign that they should finish their job and terminate. For these programs, it is important to close stdin when you're done with it, otherwise the child process will never terminate.
- self.transport.closeStdout: not usually called, kind of mean, since you're putting the process into a state where any attempt to write to stdout will cause a SIGPIPE error.
- self.transport.closeStderr: not usually called, same reason as closeStdout.
- self.transport.loseConnection: close all three pipes.
- os.kill(self.transport.pid, signal.SIGKILL): kill the child process. This will eventually result in processEnded being called.
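The write-then-closeStdin pattern for filter programs can be tried out without spawning anything by using a stand-in transport that records the calls made on it. RecordingTransport and feed_filter are hypothetical names invented for this sketch; only write and closeStdin mirror the transport methods listed above.

```python
class RecordingTransport:
    """Hypothetical test double mimicking two process-transport methods."""

    def __init__(self):
        self.calls = []

    def write(self, data):
        self.calls.append(("write", data))

    def closeStdin(self):
        self.calls.append(("closeStdin",))


def feed_filter(transport, lines):
    """Send every line to the child's stdin, then signal end of input."""
    for line in lines:
        transport.write(line)
    # Without this, a filter such as wc keeps waiting for more input.
    transport.closeStdin()


transport = RecordingTransport()
feed_filter(transport, [b"one\n", b"two\n"])
print(transport.calls)
```

Inside a ProcessProtocol, the same function could be called from connectionMade with self.transport as its first argument.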
An Example
Here is an example that is rather verbose about exactly when all the methods are called. It writes a number of lines into the wc program and then parses the output.
    #!/usr/bin/env python3

    import re

    from twisted.internet import protocol
    from twisted.internet import reactor


    class MyPP(protocol.ProcessProtocol):
        def __init__(self, verses):
            self.verses = verses
            self.data = b""  # stdout from the child accumulates here

        def connectionMade(self):
            print("connectionMade!")
            for i in range(self.verses):
                self.transport.write(b"Aleph-null bottles of beer on the wall,\n" +
                                     b"Aleph-null bottles of beer,\n" +
                                     b"Take one down and pass it around,\n" +
                                     b"Aleph-null bottles of beer on the wall.\n")
            self.transport.closeStdin()  # tell them we're done

        def outReceived(self, data):
            print("outReceived! with %d bytes!" % len(data))
            self.data = self.data + data

        def errReceived(self, data):
            print("errReceived! with %d bytes!" % len(data))

        def inConnectionLost(self):
            print("inConnectionLost! stdin is closed! (we probably did it)")

        def outConnectionLost(self):
            print("outConnectionLost! The child closed their stdout!")
            # now is the time to examine what they wrote
            #print("I saw them write:", self.data)
            (dummy, lines, words, chars, file) = re.split(r'\s+', self.data.decode("ascii"))
            print("I saw %s lines" % lines)

        def errConnectionLost(self):
            print("errConnectionLost! The child closed their stderr.")

        def processEnded(self, status_object):
            print("processEnded, status %d" % status_object.value.exitCode)
            print("quitting")
            reactor.stop()


    pp = MyPP(10)
    reactor.spawnProcess(pp, "wc", ["wc"], {})
    reactor.run()
The exact output of this program depends upon the relative timing of some un-synchronized events. In particular, the program may observe the child process close its stderr pipe before or after it reads data from the stdout pipe. One possible transcript would look like this:
    % ./process.py
    connectionMade!
    inConnectionLost! stdin is closed! (we probably did it)
    errConnectionLost! The child closed their stderr.
    outReceived! with 24 bytes!
    outConnectionLost! The child closed their stdout!
    I saw 40 lines
    processEnded, status 0
    quitting
    Main loop terminated.
    %
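The example splits wc's output with re.split and relies on the leading whitespace producing an empty first field. A slightly more forgiving way to parse the same output might look like the sketch below. parse_wc is a hypothetical helper, and it assumes wc was run without a file argument, so that the output holds exactly three numeric fields; the sample bytes are made up to resemble such output.

```python
def parse_wc(output):
    """Parse wc's "lines words chars" summary into three ints.

    Hypothetical helper: assumes wc read from stdin, so no file name
    follows the three counts.
    """
    fields = output.split()  # split() ignores leading and trailing whitespace
    if len(fields) != 3:
        raise ValueError("unexpected wc output: %r" % (output,))
    lines, words, chars = (int(field) for field in fields)
    return lines, words, chars


print(parse_wc(b"     40     250    1420\n"))
```

Because split() with no arguments discards surrounding whitespace, this works whether or not the wc implementation pads its columns.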
Brian Warner <warner@lothar.com> Last modified: Thu Oct 3 02:14:56 PDT 2002