Brian's weblog

Wed, 24 Jun 2009

darcs-fast-export

So idnar just turned me on to darcs-fast-export, which can be used with git-fast-import to quickly convert a repository from darcs to git. I've been using Git more and more in the last few months, and I'm growing quite fond of it. Tahoe is managed in darcs, and I've been using a private Git mirror to manage the several dozen feature branches that I work on at any given moment. I wanted to make a more-official mirror that would be reasonable to publish on GitHub.

I had to patch the darcs-fast-export script a little bit: first because our darcs repository happens to have some bad (non-UTF8) characters in some old patches (from before darcs started rejecting those), and second because I wanted to preserve our tag names (like "allmydata-tahoe-1.4.1"), and darcs-fast-export was squashing the hyphens down to underscores.

Tahoe has about 4000 patches. darcs-fast-export started out at about 170ms/patch (roughly 6 patches per second), and towards the end of the job had slowed to about 1.1s/patch. In contrast, when I first tried the conversion with tailor, the "darcs pull" operation was taking about 20 seconds per patch. Tailor finished after 13.5 hours. darcs-fast-export took 42 minutes.
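To put the two totals side by side, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above; the patch count is the approximate "about 4000", so the per-patch averages are rough.

```python
# Rough arithmetic on the quoted totals (the patch count is approximate).
patches = 4000
tailor_seconds = 13.5 * 3600      # 13.5 hours
dfe_seconds = 42 * 60             # 42 minutes

avg_tailor = tailor_seconds / float(patches)   # average seconds per patch
avg_dfe = dfe_seconds / float(patches)
speedup = tailor_seconds / float(dfe_seconds)

print("tailor:            %.2f s/patch on average" % avg_tailor)
print("darcs-fast-export: %.2f s/patch on average" % avg_dfe)
print("overall speedup:   about %.0fx" % speedup)
```

The darcs-fast-export average lands between the 170ms starting rate and the 1.1s finishing rate, as you would expect for a job that slows down as it goes.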

darcs-fast-export also takes care of incremental updates, so I can update the mirror later as more darcs patches arrive. It also suggests that it can be used bidirectionally. I might start using this to move my git patches back into trunk.

posted at: 12:03 | path: /version-control | permanent link to this entry

Sat, 20 Jun 2009

Foolscap-0.4.2 released

I've released foolscap-0.4.2 .. download it from http://foolscap.lothar.com/trac . I made the release last week, and as usual I've managed to not send out the announcement email yet. One reason for that is that I wanted to blog about it first, and I've started using a foolscap-0.4.2-based tool to manage my blog, and I effectively got into a circular dependency between the blog and the blog software, with the email depending upon both.

The big new feature in this release is the "FooLscap APPlication SERVER", or "flappserver". It's like twistd for foolscap, enabling non-programmers to deploy pre-written tools without needing to write new code. twistd makes it easy to create and launch things like a web server or FTP server. flappserver makes it easy to create and launch a service which is accessed remotely via a secure FURL. There is a corresponding "flappclient" which takes a FURL (and some arguments) and does something with that service. The service runs as whichever user started the server, and it's easy to daemonize the server and run it in the background. Typically you'd start the server from a @reboot crontab entry or /etc/init.d script or LaunchAgent.plist file.

Eventually flappserver will have a plugin mechanism, but for now it comes with two remarkably useful basic services. The first is named "upload-file": the client provides the file and basename, the server provides the directory. It's like a write-only drop-box, accessed with a FURL. This is great for buildslaves that need to drop a generated package into some world-visible directory: the buildslave can touch that one directory and no others, and there are no funny filenames or shell-escape tricks it can use to break out of there.
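The "one directory and no others" property can be sketched in a few lines. This is only an illustration of the idea, not foolscap's actual code: `safe_upload_path` is a hypothetical helper, and it assumes POSIX-style paths.

```python
import os

def safe_upload_path(target_dir, client_basename):
    """Return the path where a client-named file would be stored.

    Hypothetical helper: the server chooses target_dir, the client only
    supplies a bare filename, and anything else is rejected.
    """
    if client_basename in ("", ".", "..") or "\x00" in client_basename:
        raise ValueError("bad filename: %r" % (client_basename,))
    # A bare filename is unchanged by basename(); anything with a
    # directory component ("../x", "/etc/passwd") is not.
    if os.path.basename(client_basename) != client_basename:
        raise ValueError("filename must not contain a directory part: %r"
                         % (client_basename,))
    return os.path.join(os.path.abspath(target_dir), client_basename)
```

With this check, `safe_upload_path("/srv/incoming", "foo.jpg")` yields a path inside the server-chosen directory, while names like `"../evil"` or `"/etc/passwd"` raise `ValueError` before any file is opened.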

The second service is named "run-command": the server controls everything about the command: executable, arguments, and working directory. The client just gets to push the button. It's like a remote-garage-door-opener for program execution. Optionally, the client can pass stdin and get stdout, letting you use it like a secure network pipe to a server that's run on-demand, sort of like inetd but with actual security.
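The garage-door-opener model is easy to mimic locally. The sketch below is not how flappserver implements run-command; `make_run_command` is a made-up name, and the point is only that the executable, arguments, and working directory are fixed when the button is created, while the caller may supply nothing but stdin.

```python
import subprocess
import sys

def make_run_command(argv, cwd):
    """Build a 'button' that runs one fixed command (hypothetical helper)."""
    argv = list(argv)  # private copy: later changes by the creator have no effect

    def push_button(stdin_data=b""):
        # No shell is involved, and nothing from the caller reaches argv or cwd.
        proc = subprocess.Popen(argv, cwd=cwd,
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
        stdout_data, _ = proc.communicate(stdin_data)
        return proc.returncode, stdout_data

    return push_button

# Example: a fixed command that upper-cases whatever arrives on stdin.
button = make_run_command(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    cwd=".")
returncode, output = button(b"hello")
```

In the real service the button press arrives over the FURL-secured connection rather than as a local function call.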

It is nominally possible to do this sort of thing over SSH, but you have to start by creating a keypair for each purpose and add it to your authorized_keys file, and then figure out what sort of command= option to add to keep that key from being able to control your entire account (which usually means writing a script to implement the exact functionality you *do* want to offer), then hope that nothing they sent as an environment variable will compromise your security, then give them the 600-plus-character-long pubkey, then have them write a script which translates their input arguments into some "ssh -i single-purpose-pubkey hostname args-for-processing" command.

With a running flappserver, it's just:

flappserver add ~/server upload-file ~/incoming # returns FURL

flappclient --furl FURL upload-file foo.jpg

As a demo of what you can do with those two tools, I've started to update this very blog's back-end Git repository over a flappserver-based connection. The half-a-dozen computers that I use all have a copy of the "update my blog" FURL (really the "run git-daemon in the blog entries directory" FURL). The details are in the foolscap source tree, in doc/examples (in TRUNK, not in 0.4.2). More about this in the next post.

posted at: 12:59 | path: /foolscap | permanent link to this entry

Fri, 19 Jun 2009

moved blog to git

I just finished moving this weblog to be managed in a Git repository, using the scheme described in http://joemaller.com/2008/11/25/a-web-focused-git-workflow/ . Plus, I'm running the connection over Foolscap.. more on that in a moment if this update actually works..

posted at: 13:20 | path: /weblog | permanent link to this entry

Thu, 29 May 2008

web updates

I finally updated the system that hosts http://buildbot.net and http://foolscap.lothar.com (a dedicated VM that just runs apache for CGIs, needed to make trac and mod_python work well). Upgrading it from edgy to anything newer was a hassle, because the "update-manager" package that I wanted to use wasn't installed, and because edgy is now too old to appear on most Ubuntu mirrors. It does appear on http://old-releases.ubuntu.com , but unfortunately the update-manager package doesn't work unless both the "from" and the "to" releases are available on the same APT repository. Since feisty isn't old enough for old-releases yet, there's nothing you can put in your /etc/apt/sources.list that will appease update-manager.

So I had to do it the old-fashioned way: change sources.list, apt-get update, apt-get dist-upgrade . That worked, but then trac broke: the default version of python switched from 2.4 to 2.5, and none of the trac plugins I was using had eggs that were built for 2.5 . I decided to upgrade all the way to hardy before trying to fix anything else.

After fixing the eggs, it turned out that python-clearsilver in hardy is just broken: it doesn't include a 2.5 version, and I guess it was trying to make do with a 2.4 version, because I was getting errors about missing symbols. I finally found https://bugs.launchpad.net/ubuntu/+source/trac/+bug/114930 and followed the advice to rebuild the python-clearsilver package with the right version of python.

I also had to upgrade the trac databases in the process, but that's an easy "trac-admin TRACDIR upgrade".

And now everything is working again, with only an hour of unexpected downtime.

posted at: 18:59 | path: /web | permanent link to this entry

Wed, 28 May 2008

pastebinit

Another package that appeared in debian today: pastebinit, which is a command-line tool to upload bits of code to some of the various pastebin web servers out there (handy when you want to discuss some code over IRC and don't want to jam the whole thing into the channel.. it is much more polite to put it in a pastebin and then refer to it by URL).

Now what I want is an emacs interface to this, since the code I'd be referring to would always come from one of my emacs buffers anyways.

posted at: 18:34 | path: /emacs | permanent link to this entry

Mutation Testing

I've often thought that it would be a great idea to test your test suite by randomly changing bits of code and seeing if the tests catch it. It turns out that other people feel the same way: I just saw a Ruby library named "Heckle" show up in debian sid (the package is named libheckle-ruby). The blurb says:

Heckle is a mutation tester. It modifies your code and runs your tests to make sure they fail. The idea is that if code can be changed and your tests don't notice, either that code isn't being covered or it doesn't do anything.

In a security context, this is similar to an approach thought up by (I believe) David Wagner, Ka-Ping Yee, and Mark Miller, during the security analysis of Ping's electronic voting software. The unusual challenge was that the defined security goal was to be safe against the author of the software, not just the usual malicious attackers (who try to provide bad input, or make the code act in surprising ways). Their scheme was to have one team modify the code to insert intentional errors (or opportunities for mischief), then the second team try to find those errors. If the second team finds other errors, then the code is obviously buggy, and loses. If the second team can't find the errors, then the code is too complicated to analyze, and it loses. If the design of the code is so straightforward that bugs and backdoors stand out like a sore thumb, the code wins.

Of course, this requires really good, really tightly specified unit tests. In my experience, if you're using the right language, a test that specifies the desired result so precisely is effectively your functional code anyways, so you have to be careful to define your tests in some way that doesn't mean you're writing the same code twice.

I don't know Ruby, but I may need to learn enough about it to be able to read this Heckle library and see if it can be ported to Python.
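As a rough idea of what a Python version might look like, here is a tiny mutation tester built on the standard `ast` module. It is a sketch under simplifying assumptions, not a port of Heckle: it knows only one kind of mutation (flipping a comparison operator), and `FlipComparison`, `surviving_mutants`, and the sample functions are invented for the example.

```python
import ast

class FlipComparison(ast.NodeTransformer):
    """Invert the comparison operator at the Nth eligible comparison."""
    SWAPS = {ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
             ast.Lt: ast.GtE, ast.GtE: ast.Lt,
             ast.Gt: ast.LtE, ast.LtE: ast.Gt}

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0  # number of eligible comparisons visited so far

    def visit_Compare(self, node):
        self.generic_visit(node)
        op_type = type(node.ops[0])
        if op_type in self.SWAPS:
            if self.seen == self.target_index:
                node.ops[0] = self.SWAPS[op_type]()
            self.seen += 1
        return node

def surviving_mutants(source, run_tests):
    """Return the indexes of mutations that the tests failed to notice."""
    survivors = []
    index = 0
    while True:
        mutator = FlipComparison(index)
        tree = mutator.visit(ast.parse(source))
        if mutator.seen <= index:
            break  # every comparison has been mutated once
        ast.fix_missing_locations(tree)
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        if run_tests(namespace):  # tests still pass: the mutant survived
            survivors.append(index)
        index += 1
    return survivors

SOURCE = """
def is_adult(age):
    return age >= 18

def is_zero(n):
    return n == 0
"""

def weak_tests(ns):
    # Exercises is_adult only, so a mutation inside is_zero goes unnoticed.
    return ns["is_adult"](30) and not ns["is_adult"](5)

print(surviving_mutants(SOURCE, weak_tests))
```

The surviving mutant is the flipped comparison in `is_zero`, which is exactly the code the weak test suite never touches.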

posted at: 18:24 | path: /code | permanent link to this entry

Emacs Trick of the Day

There are a few million gems hidden inside emacs. The two that I ran into most recently are:

C-x r m, C-x r b, C-x r l : these create named bookmarks, each of which records the file that you're visiting and a position within that file. When I need to hold my place while I look elsewhere, I usually split the window (C-x 2) and leave one of them fixed while I move around in the other one to find something. Then C-x 0 makes that window go away, leaving me in my original position. But if you do that too deeply, the windows get too small.

C-x r m creates a bookmark, and the name defaults to the name of the file (so if you only use one bookmark per file, you don't even have to type anything). Then C-x r b jumps back to that bookmark. C-x r l lists all your bookmarks.

Bookmarks can also be persistent.

highlight-trailing-space: by setting this to 't', any trailing whitespace will be highlighted in an ugly orange color that makes you want to delete it right away. Darcs does the same thing when you're committing code (it shows you a special "[_$_]" -like symbol to make you aware of the whitespace at the end of the line), so I've been in the habit of deleting that whitespace anyways.. even wrote a little python tool to find it all for me. With highlight-trailing-space turned on, I get to see the whitespace as I'm editing, so I can remove it earlier.

posted at: 18:15 | path: /emacs | permanent link to this entry

Mon, 28 Apr 2008

Levenshtein Distance

A library just showed up in debian ("python-levenshtein") to measure the Levenshtein Distance between two strings: the minimum number of edits (inserts, changes, deletes) necessary to turn one string into another.
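For reference, the definition translates directly into the textbook dynamic-programming algorithm. This is a plain-Python sketch of that algorithm, not the python-levenshtein implementation.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, and changes
    needed to turn string a into string b."""
    # previous[j] is the distance between the prefix of a handled so far
    # and b[:j]; it starts as the distance from "" to each prefix of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]  # turning a[:i] into "" takes i deletes
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # change ca to cb (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```

The cost is proportional to len(a) times len(b), and it needs both strings in full, which is the problem for very large files that the next paragraph is about.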

I've been thinking about ways to implement efficiently-edited large mutable files for Tahoe, and it seems like a tool like this might help. Something clever like what rsync does is probably going to be involved too. The trick is that you want to determine what deltas to store without reading the whole file over the wire, from a server that isn't allowed to see the plaintext. You can store whatever ciphertext hashes you want on the far end. We're planning to provide insert/delete delta messages on the server side, using something like Mercurial's "revlog" format. The question is how to efficiently figure out the deltas on a very large file.

posted at: 18:45 | path: / | permanent link to this entry

Fri, 20 Jul 2007

sparkfun toys

I was thumbing through some of my old del.icio.us bookmarks today, and came across sparkfun electronics again. Man, their coolness doubles in size every six months. $25 for a half-inch square self-contained radio data link, serial interface that you can run with a microcontroller, 3V, built-in antenna. Wow. $6 for a white Luxeon 1W LED ($8 for 3W, $25 for 5W). $5 for a 1W Luxeon that's TWO FRIGGING MILLIMETERS on a side. Holy crap.

And $20 for a color LCD like the ones from a cellphone. And speaking of cellphones, $184 gets you a quad-band cellphone module with a GPS receiver, camera driver, and a python interpreter. Add an antenna, a battery, a serial port, and a SIM card, and you've got a mobile data node. And I think you can even get prepaid SIM cards that can be topped-off online.

(note to self, places like this sell such cards, generally 5 to 20 cents per minute, which can be recharged with scratch-off coupons. And it looks like you can buy them from retail cellphone shops too. They all come with a phone number.. no wonder the phone numberspace is getting so crowded, you can buy them from vending machines in some countries..)

Each time I visit these folks (or browse through the digikey catalog, or just look through my old notebooks), I feel such a strong drive to build something. The delay involved in actually getting the parts usually means I don't get around to doing it. But maybe if I just keep buying stuff and stocking my workbench then the next time I'm in a construction mood I'll have everything I need already at hand and I can just start soldering away...

posted at: 18:46 | path: /hardware | permanent link to this entry

Tue, 17 Jul 2007

trac spam

Oh happy day! The buildbot.net trac instance just recently got visited by the link spammers. They haven't caused any actual damage yet, just a user account created with advertising in the profile text, but I'm afraid it's only a matter of time before the bots descend upon us and we're smothered by a wave of sentient AIs dedicated to filing mass buildbot bug reports containing nothing but links to offshore casinos and faux designer watches.

sigh.

I guess I should add some sort of "prove you can read" test to the account-creation page, just barely enough to make the script kiddies work for a living. Something like "what is 1+2?" or "type the word 'please' in here" or something.

Reminds me of a suggestion someone made to me while I was working on petmail: you don't need super-clever CAPTCHA techniques if you can manage to have a whole bunch of different requirements instead, like each user creating their own simple technique. A bot could be written to mass-solve any particular one, but since everybody is creating their own, the bot-writer's job is that much harder.

And sometimes, just raising the bar a bit is good enough for now. As the joke goes, I don't have to outrun the lion.. I just have to outrun you :-).

posted at: 00:47 | path: /spam | permanent link to this entry

Fri, 13 Jul 2007

foolscap.lothar.com

I just finished building a Trac instance for Foolscap, now online at http://foolscap.lothar.com/trac . It's got a (mercurial-based) code browser, tickets, and a wiki.

Setting it up required some twisted.web hacking, because my setup puts a twisted.web server out front, and reverse-proxies certain requests to a separate Xen virtual machine which handles all CGI (for multiple sites, like buildbot.net and foolscap.lothar.com). That CGI host is running apache, and since URLs inside returned pages are not being rewritten, I had to use named virtual hosts to distinguish between, say, http://buildbot.net/trac and http://foolscap.lothar.com/trac .

But the normal twisted.web.proxy ReverseProxyResource clobbers the Host: header when it forwards the request (setting it equal to the new host being targeted). I suppose this is to hide the presence of the proxy from the new host, but in my situation it has the effect of making it impossible to use vhosts on the apache side to distinguish between requests that were received for different hostnames.

So I subclassed and commented out that line, and apache is happy. Now that I can have more than one trac instance on this box, I'm creating Tracs for everything. Whee!
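The difference between the two behaviours can be modelled with plain dictionaries. This is deliberately not Twisted code: `headers_for_backend` and the `cgi-vm.internal` hostname are made up for the illustration, and the real change is a one-line edit in a ReverseProxyResource subclass.

```python
def headers_for_backend(received_headers, backend_host, preserve_host=True):
    """Return the headers a reverse proxy would forward to its backend.

    Toy model: header names are lower-cased, and the only decision made
    is whether the Host header is rewritten to name the backend.
    """
    forwarded = dict((name.lower(), value)
                     for name, value in received_headers.items())
    if not preserve_host:
        # Stock behaviour as described above: the backend only ever
        # sees its own name, so it cannot tell the sites apart.
        forwarded["host"] = backend_host
    return forwarded

request = {"Host": "foolscap.lothar.com", "User-Agent": "example"}
clobbered = headers_for_backend(request, "cgi-vm.internal", preserve_host=False)
preserved = headers_for_backend(request, "cgi-vm.internal", preserve_host=True)
```

With the Host header preserved, a name-based virtual host on the backend can still match on the hostname the client originally asked for.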

posted at: 12:41 | path: /twisted | permanent link to this entry

Mon, 09 Jul 2007

mercurial

Wow, so long since I updated this. Each time I remember that I do have a technical blog, and think to add something to it, I am tempted to start by rewriting the whole blog system in some brand new way that will make it easier to post to (and, the theory goes, therefore make me more likely to write in it). The process of writing more code creates something that I'm even less likely to understand next time, and code begets more code. It's like a depth-first search through an infinite design space. Bad idea.

And speaking of technical distractions, I've been playing with Mercurial recently. I like it. I moved Foolscap from Darcs to Mercurial last week, mostly to learn more about it, and I've been pleased. My main reason was to make it easier for folks to hack on Foolscap: darcs is all fine if you're running debian and someone else has compiled it for you, but if you have to build it yourself you have to start by building GHC, which is a non-trivial adventure.

Mercurial's plugin architecture is pretty nice: one line in the .hgrc file tells it to import a .py file, which registers a set of new subcommands with the main /usr/bin/hg entry point. Which reminds me that I want to adapt Trac's plugin mechanism (which lets you drop an .egg file in a specific directory and then reference modules inside it from the config file) to Buildbot, to make it easier for users to get interesting code into their master.cfg files. Not that huge of a change, but it would make the installation instructions for that code simpler; no need to change sys.path from within master.cfg .

And because the plugin approach makes it easy, people are writing fun plugins. The Tk-based graphical revision browser is great (and has a little tram-line-style graph of which revisions got merged into which, very cute). The 'bisect' extension helps you do an efficient binary search for the revision which introduced (or fixed) a bug.
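The idea behind bisecting is an ordinary binary search over history. The sketch below assumes a simple linear list of revisions in which every revision after the first bad one is also bad; real Mercurial history is a graph, so this shows the principle rather than what the extension actually does, and `first_bad_revision` is a name invented here.

```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision for which is_bad(rev) is true.

    Assumes revisions is ordered oldest to newest, the newest one is bad,
    and once a revision is bad all later ones are too.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid        # the culprit is mid or something earlier
        else:
            lo = mid + 1    # the culprit is strictly after mid
    return revisions[lo]

# Example: 100 revisions, with the bug introduced in revision 37.
tested = []
def is_bad(rev):
    tested.append(rev)      # stands in for "check out, build, run the test"
    return rev >= 37

culprit = first_bad_revision(list(range(100)), is_bad)
```

Each step halves the remaining range, so 100 revisions need at most seven build-and-test cycles instead of up to a hundred.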

I'm still trying to figure out the "forest" extension, though. I think it's what I want for tracking a couple dozen separate small projects (things I've been doing in CVS for years, since I can update just one at a time, or commit the whole lot of them and push the work from my laptop to my desktop). But for the life of me I can't figure out how to use it, and the documentation is heavy on the per-subcommand reference and light on the big-picture descriptions.

And mercurial is fast. The cgi-based web server lets them speed up the initial checkout: for the full Foolscap repository, doing a 'darcs get' through the naive (twisted.web) server took 22 seconds (of which probably 17 was network), whereas doing the equivalent 'hg clone' from a hgwebdir.cgi server (under apache) took 6 total. Mercurial manages to store the history more compactly too: the tree with full history under darcs was 4.4MB, and 2.9MB in hg.

I've been using Darcs for a year or two now, and we've been using it extensively at work, and it's fun (the incremental commit feature is amazing, and I miss it in hg, and it wouldn't be impossible to add). But every once in a while something explodes (possibly because we've used 'darcs oblit' more than once, and that seems to be an underexplored corner of the darcs jungle). I really like the append-only and cryptographically-secure nature of hg revisions, and regret that you can't securely and concisely name a specific darcs revision the way you can with mercurial. Having spent a lot of time defining sha-256 hash-based identifiers recently, I'm coming to be wary of any system that doesn't let me create strong references like that.

So I'm looking forward to playing with it more. Commuting patches is nifty, but for things like Buildbot and Foolscap I'm not really creating crazy branches with patches that need to be held out of trunk for months at a time. So I think hg has a lot of promise.

posted at: 21:29 | path: / | permanent link to this entry

Mon, 05 Mar 2007

forgetfulness-based development

You're probably familiar with eXtreme Programming, and branch-based development, and agile development. But I've discovered that I've been using a new technique recently, that I call Forgetfulness-Based Development. The way it works is this: I come up with something insanely complicated, that takes me weeks to get my head around and document and implement and test, but seems like it's the best way to solve whatever the current problem is. And then I go away on vacation for two weeks, and forget absolutely everything about it. And then I come back, and look at it again, and discover how little I can understand. After a few days of cursing the fool who wrote the insane thing, I start seeing ways that it could be done more simply, or more generally, or more robustly, or more understandably. And then I write some more code to replace the old stuff.

Lather, rinse, repeat, and eventually you wind up with a design that solves the problem *and* makes sense to a new employee/developer. As the python folks say, Readability Matters. And as Brian Kernighan says: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

(of course, to make this work right, you have to take a lot of vacations. but usually it's a sacrifice I'm willing to make.)

posted at: 17:55 | path: /python | permanent link to this entry

Thu, 01 Mar 2007

PyCon2007, Buildbot

I just got back from PyCon. Highly inspirational as always, saw some fascinating projects and some thought-provoking keynotes. r0ml's talk in particular has me thinking about how to structure code as a narrative, trying to bring the world of human-to-human communication and the world of human-to-machine communication closer together. He had a lot to say about parallels between the development of writing systems (the introduction of random-access pages in a book rather than linear-access scrolls, the use of standardized fonts, the use of spaces between words) and the development of programming languages.

I ran a Buildbot BOF, and had about 25 people show up! There are a lot of folks out there using this thing. Very gratifying.

I spent a few days sprinting, mostly working with Eric Mangold (aka teratorn) on a Buildbot plugin for Trac. It's starting to take shape nicely.

Also, I foolishly walked into a room where a bunch of people were playing a PyGame space-themed production game called Galcon, and stupidly installed it. It's amazingly addictive for such a straightforward game. Now I'm seeing little spaceships launching and crashing into planets every time I close my eyes. I'm hopeful that the hallucinations will only last a few days.

posted at: 14:06 | path: /python | permanent link to this entry

Mon, 29 Jan 2007

Trac

I've been setting up a Trac instance for Buildbot, to make it easier for people other than me to publish advice and tips in a persistent and easily-searchable fashion, also to make the Buildbot web page a little bit less ugly. Trac is quite spiffy, and I've been looking over the Trac Hacks page at the wide variety of neat plugins that are available. In particular the one that exposes wiki-page editing via XMLRPC (in conjunction with the emacs wiki-editing tool) is quite intriguing.

I hope that one day Buildbot will have a list of plugins like that.

posted at: 02:29 | path: / | permanent link to this entry

Sat, 07 Oct 2006

utilities

/usr/bin/watch is a little utility that will erase the screen, run a command, sleep for a few seconds, then repeat. You can use it to follow files in /proc without continually re-typing the command.
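The whole behaviour fits in a dozen lines, which makes it a nice one to re-create. This is a rough imitation rather than the real tool's logic: it assumes an ANSI-capable terminal, and the `iterations` parameter is an addition so that the loop can be stopped.

```python
import subprocess
import sys
import time

def watch(argv, interval=2.0, iterations=None):
    """Clear the screen, run argv, sleep, and repeat.

    Runs forever when iterations is None; otherwise stops after that many
    runs and returns the number of runs performed.
    """
    count = 0
    while iterations is None or count < iterations:
        sys.stdout.write("\x1b[2J\x1b[H")  # ANSI: clear screen, cursor to top-left
        sys.stdout.flush()
        subprocess.call(argv)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
    return count
```

For example, `watch(["cat", "/proc/loadavg"])` keeps refreshing the load average until interrupted with C-c.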

This program has been around since 1991. How is it that I've been unaware of it all this time? How many other thousands of useful tools like this are lurking on my system right now that I've remained ignorant of?

So the advice of the day: spend some time getting to know your /usr/bin directory. Tomorrow, make it a point to learn a new emacs keybinding that you've never used before (if you don't know about M-/, start with that one).

posted at: 15:12 | path: / | permanent link to this entry

Mon, 25 Sep 2006

promise syntax

Zooko's in town, and already I feel 20% smarter. I roped him into a discussion about the Promise syntax I'm developing for Foolscap, and he suggested an alternative that has some good properties.

I'll illustrate with an example where promise-pipelining actually does you some good. (many of the use cases I've been thinking of involve some sort of publish/subscribe scheme, and in those cases you win almost nothing with pipelining). I'm imagining a theoretical Buildbot status interface using newpb, and a tool that wants to connect to the buildmaster and retrieve the results of the latest build for a given Builder. The oldpb code would look like this:

    # Example 1
    def checkResults(results):
        if results == SUCCESS:
            print "yay!"
    def oops(failure):
        print "boo"
    #
    s = getStatus()
    d = s.callRemote("getBuilder", "python-2.4-full")
    d.addCallback(lambda builder: builder.callRemote("getBuild", -1))
    d.addCallback(lambda build: build.callRemote("getResults"))
    d.addCallback(checkResults)
    d.addErrback(oops)

The syntax I've currently got in Foolscap would make it look like this:

    # Example 2
    s = getStatus()
    b = send(s).getBuilder("python-2.4-full")
    b1 = send(b).getBuild(-1)
    r = send(b1).getResults()
    when(r).addCallback(checkResults).addErrback(oops)

The big win with the promise pipelining is that all 3 calls (4 if you include getStatus) take place in one round trip, whereas the oldpb approach requires 3 or 4 separate roundtrips. As MarkM has said, the pipes are getting wider but not shorter, and eventually the round-trip latency will be the biggest bottleneck.

The syntax that Zooko suggested would make this all look much more like the (blocking) synchronous form:

    # Example 3
    s = getStatus()
    b = s.getBuilder("python-2.4-full")
    b1 = b.getBuild(-1)
    r = b1.getResults()
    r._then(checkResults)._except(oops)

Or you could chain it all into a single column, wrapped in outer parentheses (which keep my editor indenting happily, and which python needs in order to accept the continuation lines):

    # Example 4
    (getStatus().getBuilder("python-2.4-full")
      .getBuild(-1)
      .getResults()
      ._then(checkResults)
      ._except(oops))

which is a lot easier to read than the same collapsed form with my send() syntax:

    # Example 5
    when(send(send(send(getStatus()).getBuilder("python-2.4-full")).getBuild(-1)).getResults()).addCallback(checkResults).addErrback(oops)

Now, a syntax which looks synchronous is great for programmers who aren't familiar with asynchronous control flows: they can look at example 3 or 4 and, except for the funny _then clause, it all looks exactly like what they expect from xmlrpclib or other blocking RPC mechanisms. The problem with this syntax is that they might forget that they're actually dealing with Promises, and try to do something like:

    results = b1.getResults()
    if results == SUCCESS:
        print "yay!"

and forget that 'results' is actually a Promise, and the only things you can do with a promise are to send messages to it, or invoke _then or _except. In some cases this could just raise an exception:

    counter = b.getCounter()
    print counter + 1
    # TypeError: unsupported operand type(s) for +: 'instance' and 'int'

And in other cases (like 'results is SUCCESS') it might fail silently, always returning False. Whereas the send() syntax would make it obvious that you're dealing with a Promise.

One thing I like about Zooko's approach is that I can have the _then and _except methods be simplified wrappers for the more general purpose _when or _when_resolved method, the one that returns a Deferred:

    results = b1.getResults()
    d = results._when()
    d.addCallback(checkResults)

That way *I* can use Deferreds for my control flow, while the newcomers for whom Deferreds still seem magical can use a somewhat-familiar _then(callback) approach. (without this, we'd be walking backwards in time to the beginning of the evolutionary path that has resulted in Deferreds as a general-purpose callback management tool).

In addition, these two syntaxes aren't necessarily mutually exclusive. I could have one kind of Promise that implements the __getattr__ magic necessary to make Zooko's syntax work, but if you call send() on one, it sets a flag to disable that magic, so that you end up using the send/when syntax.
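To make the `__getattr__` magic concrete, here is a toy, single-process Promise in which attribute access queues a method call against the eventual value. It is a sketch of the syntax only, not Foolscap's implementation: there is no network, no pipelining, no `_except` or error handling, and `_resolve` plus the `Builder` and `Build` classes are invented for the example.

```python
class Promise(object):
    """Toy promise: calling a method on it returns another promise for
    the result of calling that method on the eventual value."""

    def __init__(self):
        self._resolved = False
        self._value = None
        self._callbacks = []

    def _resolve(self, value):
        self._resolved = True
        self._value = value
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(value)

    def _then(self, callback):
        if self._resolved:
            callback(self._value)
        else:
            self._callbacks.append(callback)
        return self

    def __getattr__(self, name):
        # Only reached for names that normal lookup did not find.
        if name.startswith("_"):
            raise AttributeError(name)

        def method(*args, **kwargs):
            result = Promise()

            def deliver(target):
                result._resolve(getattr(target, name)(*args, **kwargs))

            self._then(deliver)
            return result

        return method

class Build(object):
    def getResults(self):
        return "SUCCESS"

class Builder(object):
    def getBuild(self, number):
        return Build()

b = Promise()                      # stands in for a Builder not yet available
r = b.getBuild(-1).getResults()    # both calls are queued; nothing has run yet
seen = []
r._then(seen.append)
b._resolve(Builder())              # the queued calls now run in order
```

Note that `r` is a Promise the whole time, so `r == "SUCCESS"` is quietly False both before and after resolution, which is the silent-failure hazard described above.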

There was more to the discussion but it's all in a notebook in the other room and I'm too sleepy to express it all right now.

posted at: 23:32 | path: /twisted | permanent link to this entry

Sun, 24 Sep 2006

new microcontrollers

I've been playing with Phidgets recently, having a lot of fun. They're great for prototyping, but they would be too expensive to use for most of the production purposes I have in mind. I've been thinking that for gadgets I plan to make more than one of, I'd use an FTDI usb-to-serial chip (somewhere around 2.5UKP from their web store, and I think about $5 from the parallax store) and a small AVR microcontroller (for another few dollars). The FTDI web store also sells adapter modules (USB B on one side, header pins on the other) for 10UKP. For the basic make-lights-blink peripheral I have in mind, the FTDI chip alone would suffice, as it's got 5 GPIO pins in addition to the serial port.

I've played with the AnchorChips/Cypress EZUSB before, and it's pretty handy, and you can get them from digikey (page 493 of the digikey catalog lists the full-speed ones at about $10, and the high-speed ones from $15 to $20), but it uses an 8051 core, which is a real drag to program.

So I was pleased to see that Atmel is in the USB game, with their AT90USB1286 and related parts. 128K flash, 8K ram, firmware that lets you program the flash over the USB bus, sample code and libraries to do mouse/keyboard/HID stuff (although it doesn't look like the sample code is under a free software license, boo), and a $31 evaluation kit (basically a USB dongle with breakout headers). Digikey has the chips for $14, and a cheaper 64K-flash version is due out soon.

And Atmel also has a handful of ZigBee/802.15.4 chips available, which could be really cool. They include the MAC stack. It's not clear where to buy them or how much they'll cost, though. It looks like there's enough RF goo that you'd want to go with the eval board, and that probably means a couple hundred bucks. But eventually this stuff will make it out to smaller boards.

They're also coming out with a new series of AVRs with really low power consumption, down below a microamp.

posted at: 18:19 | path: /hardware | permanent link to this entry

Sat, 23 Sep 2006

Promises

Aaaagh! Promises are hurting my brain.

I'm trying to figure out how to provide a useful subset of E's reference mechanics in newpb/foolscap. Specifically, one of the clever things that E does is to provide Promise Pipelining, a limited form of remote code execution, in which I can ask you for an object and tell you to deliver a message to that object in a single round trip (rather than the usual two). So I want to be able to do something like:

# target, record, and results are all Promise objects
target = tub.getReferenceAsPromise(sturdyref)
record = send(target).getRecord(args)
results = send(record).getField(otherargs)
def printResults(r):
  print r
when(results).addCallback(printResults) # when() returns a Deferred

You can also include Promises as arguments:

record = send(target).getRecord(args)
send(laserprinter).printRecord(record)

So I'd like to provide this feature in python/foolscap, both because using Promises as a programming technique holds a lot of promise (as it were) for being a cleaner asynchronous style, and because it opens up the possibility of doing pipelining (which is an actual performance win).

The challenge is that E has very different reference mechanics than python. In E, any reference could be a Promise. (specifically, each reference is in any one of 5 states: LocalPromise, RemotePromise, Near, Far, and Broken). Whereas in python, references are always Near, and we have to fake everything else with wrapper objects.

My current approach is to have the Promise class be the wrapper and have it handle everything except Near references. The basic Promise is created with a matching resolver:

promise, resolver = foolscap.makePromise()
resolver(result) # resolves the promise

But the most common way to get one is to do an eventual send to something:

from foolscap import send
class Adder:
  def add(self, arg):
    return arg+1
a = Adder()
promise = send(a).add(4)

There are only two things you can do with a promise: send it more messages, and wait for it to resolve. The former is done with send (which accepts either a promise or a regular object, and always does an eventual-send), the latter is done with when:

from foolscap import when
d = when(promise)
d.addCallback(printResults)

The when always returns a Deferred that will fire with the resolution of the Promise. So send moves us from the synchronous world to the asynchronous+promise world, while when and addCallback move us back to the synchronous one. (when by itself moves us from the asynchronous+promise world to the asynchronous+Deferred world).
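A toy, synchronous model of the promise/resolver pair and when() may make the plumbing a bit more concrete. Nothing here is foolscap's real API: ToyPromise, make_promise, and the callback list are invented for illustration, and a plain callback stands in for the Deferred that when() would return.

```python
class ToyPromise:
    """Holds either nothing yet, or a resolution (hypothetical class)."""
    def __init__(self):
        self.resolved = False
        self.resolution = None
        self.waiting = []  # callbacks to run once we resolve

def make_promise():
    """Return (promise, resolver), mirroring the makePromise() shape above."""
    p = ToyPromise()
    def resolver(result):
        p.resolved = True
        p.resolution = result
        waiting, p.waiting = p.waiting, []
        for cb in waiting:
            cb(result)
    return p, resolver

def when(p, callback):
    """Run callback with the resolution, now or later (stand-in for a Deferred)."""
    if p.resolved:
        callback(p.resolution)
    else:
        p.waiting.append(callback)

seen = []
promise, resolver = make_promise()
when(promise, seen.append)   # nothing happens yet
resolver(5)                  # now the callback fires with 5
```

The only state is "resolved or not" plus a list of waiters, which is roughly what the wrapper has to carry for references that aren't Near.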

So far so good. But here are some problems:

posted at: 10:25 | path: /twisted | permanent link to this entry

Mon, 18 Sep 2006

newpb-0.0.2 released

I finally got some twisted time this weekend, so I fixed ticket #1999 and moved newpb out of the Twisted subdirectory entirely, renaming it to Foolscap in the process. I also released version 0.0.2, so there's a complete tarball ready to install and play with.

Having it live outside the Twisted tree has a number of advantages. Twisted is mature enough to have moved to a slower development model that preserves stability at the expense of making new development easy. Each potential change to the codebase must be reviewed before being applied to the trunk, so all development takes place on branches and must serve to fix a specific ticket. Very little of the newpb development falls under this model, and there is a distinct scarcity of people able to review newpb code. By moving it outside the Twisted tree, I can continue to work on it in a more suitable development model.

In addition, moving it outside the twisted. package makes it much easier to test and deploy. When it lived in twisted/pb/*.py, you had to actually install it before using it, into the same directory as the rest of Twisted. Now that it lives in foolscap/*.py instead, you can run it from the source tree. This will make things easier for everybody.

The new name is a bit of a compromise, though. I'm not entirely satisfied with "Foolscap". It has some good properties (google thinks it is fairly unique, it has "cap" which might make you think of capabilities, it has "oo" which might make you think of objects, there's the visual of a twisted foolscap of paper, the jester's hat-and-bells could make a nice logo). But it also has some bad ones (MarkM points out that there's enough negative baggage around the word "capabilities" that you might not want "cap" in your protocol name, using the word "fool" gives some negative connotations, the promise-pipelining aspects are really more interesting than the capabilities ones, and anyways "foolscap" doesn't really flow off the tongue in a glib manner). But it needed a name to live outside Twisted, and now it has one. That might change, but Foolscap should get us through the next couple of months.

I've been staring at E's CapTP protocol a lot, thanks to help from Mark Miller, trying to understand what their goals are, how they accomplish them, and what pieces would be useful to implement in Foolscap. What I learned last week was how the CapTP 3-Vat introduction system works. I think I can implement it in Foolscap, but I'm trying to decide if it's worth it. CapTP does some funny tricks to make sure that messages which introduce two Vats are delivered in the correct order relative to other messages between those Vats (this is called E-Order in MarkM's papers). I assume this is a good property to maintain (my general approach is to assume that everything MarkM does has a good reason behind it, and that if I work at it long enough I may learn that reason for myself, but for now just shut up and implement it).

But a lot of CapTP is tied up in Promises, and I'm still getting my head around how to provide something in python that resembles a Promise and is still useable. We don't have a lot of the language features that E does, in particular the way that an E object holding a reference to a Promise will eventually discover (after the promise has been resolved) that they're holding a reference to some other object. We don't have that sort of silent slot mutation in Python, so I'm trying to figure out what would be a meaningful equivalent. So far the Promise syntax is looking something like:

 p2 = send(p1).foo(args)
 #  equivalent of E's:  p2 = p1 <- foo(args)

Of course you can also use send() on non-promises if you just want to do an eventual-send. This is a more precise way to accomplish what I've been (crudely) doing with reactor.callLater(0,..) all these years. I'm also writing a sendOnly for when you want to throw away the return value. E has compiler support for this, it knows whether the results of the send are used or not, and can switch between send and sendOnly automatically. Python does not have such a context sensor, so we have to do it by hand.

Then, when you want to interface back to the synchronous world, you use when() to turn the promise into a Deferred, to which you can then attach some code to run:

 def _stuff(value):
     print value
 d = when(p2)
 d.addCallback(_stuff)

Trying to get this to work with the actual eventual-send queue and make the result Promises work correctly is making my head spin. I need to sit down with Zooko on this stuff, he'll understand it well enough to help me get my brain around it.

posted at: 00:45 | path: /twisted | permanent link to this entry

Wed, 29 Mar 2006

antispam

I ran some stats on my spambuckets tonight, comparing which of my email addresses get a lot of spam now versus 6 months ago, and noticed a few addresses that had stopped getting spam altogether. This gives me hope that by making my 10-year-old primary address less harvestable, the 500-plus spams per day might start drying up somewhat.

So a bit of find and perl later, and my web site no longer has a bare email address on it. I obfuscated it with just a character entity, so cut-and-paste will still work. Now I'll give it a few months and re-run the stats, to see if there is any noticeable effect.
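A rough idea of the character-entity trick, sketched in Python rather than perl. Which character got encoded isn't stated above, so replacing the "@" is an assumption, and user@example.com is a placeholder address.

```python
import html

def obfuscate(address):
    """Replace "@" with its numeric character entity (assumed choice)."""
    return address.replace("@", "&#64;")

encoded = obfuscate("user@example.com")
decoded = html.unescape(encoded)  # what a browser shows, and what cut-and-paste yields
```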

posted at: 01:45 | path: / | permanent link to this entry

Sat, 29 Oct 2005

new kernel options

I'm in the process of upgrading my systems to linux-2.6.14, and noticed a couple of neat patches that made it into the kernel this time around.

One is that FUSE (http://fuse.sourceforge.net) has finally gotten in. One thing I'd like to use this for is setting up a UML-based jail, to limit the authority of applications to the minimum necessary. Each app would get a separate jail. The guest code runs in a virtual kernel that has read-only access to things like /bin and /usr/lib (so system administration isn't a nightmare, plus you don't have to have multiple copies of your base system, plus memory-mapped libraries like libc.so can be shared amongst *everything* in the system rather than each kernel keeping its own copy around). The jail would then have a private read-write copy of everything it's supposed to have read-write (say /tmp and /var).

The nice thing about this approach as opposed to the usual big-file-as-block-device scheme that usually gets used with UML is that you can look into the filesystem from the outside. If you want to see what exactly the program has changed on its "disk", you just diff -r their workspace with a checkpoint that you cp -r'ed out earlier. In contrast, the fake-block-device approach requires that you *log in* to the guest system and examine it from the inside, and if you assume that a malicious program has already compromised as much as it can from the inside, you may no longer be able to trust the tools that you would use to perform the comparison.

Of course, you still have to trust that the guest code is unable to compromise the UML kernel, otherwise it now has control of a local user on the host system, and may be able to bootstrap that upwards. But it limits the immediate danger of a root compromise within the guest system, and allows for better monitoring of the jail.

And you still need to patch the host kernel with the SKAS patch (http://www.user-mode-linux.org/~blaisorblade/) because, while the necessary arch-specific code to create a UML guest kernel has been merged into the linux source, the changes that enable fast and safe UML operation in the host have not.

The other neat feature that just showed up is CONFIG_SECCOMP. From the blurb:

Once seccomp is enabled via /proc/<pid>/seccomp, it cannot be disabled and the task is only allowed to execute a few safe syscalls defined by each seccomp mode.

The idea is that you have a parent process that opens up a couple of pipes to itself, forks off the child, then throws the child into seccomp mode. After that, the child relies upon RPC over those pipes to make requests of the parent. In this way, you get to run compiled languages at full speed, but they are dependent upon an external entity to actually *do* anything. The parent process can then grant capabilities to the child. Someone at the cap-talk meeting at HP mentioned an approach like this about a month ago, somebody had speculated about setting up an SELinux policy that prohibited all syscalls except read(), write(), and select(). It appears that CONFIG_SECCOMP does exactly this without requiring a full SELinux setup. (although SELinux might be trivial to set up.. I've never tried).
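The pipe-and-fork shape of this can be sketched as follows. This does not actually enable seccomp (the comment marks where that would happen), and the tiny "add" request format and the run_confined name are made up for the example; it only shows a child that gets things done solely by asking the parent over a pair of pipes. Unix only, since it uses os.fork.

```python
import os

def run_confined(request):
    """Fork a child that talks to the parent only over two pipes.

    Returns (request as seen by the parent, child's exit status).
    """
    req_r, req_w = os.pipe()    # child writes requests, parent reads
    resp_r, resp_w = os.pipe()  # parent writes replies, child reads
    pid = os.fork()
    if pid == 0:
        # Child. A real version would switch on seccomp here, after which
        # read() and write() on these descriptors are about all it can do.
        code = 1
        try:
            os.close(req_r)
            os.close(resp_w)
            os.write(req_w, request)
            reply = os.read(resp_r, 1024)
            code = int(reply)  # report what we were told via the exit status
        finally:
            os._exit(code)
    # Parent: holds the real authority and decides what to do per request.
    os.close(req_w)
    os.close(resp_r)
    data = os.read(req_r, 1024)
    op, a, b = data.split()
    reply = str(int(a) + int(b)) if op == b"add" else "0"
    os.write(resp_w, reply.encode("ascii"))
    os.close(resp_w)
    os.close(req_r)
    _, status = os.waitpid(pid, 0)
    return data, os.WEXITSTATUS(status)

seen, status = run_confined(b"add 2 3")
```

The parent is the only place where policy lives: it can refuse, rate-limit, or log any request before acting on it.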

SECCOMP comes from Andrea Arcangeli, who is using it to provide exactly these sorts of services on a pennies-per-CPU-second basis (http://www.cpushare.com), using a bunch of Twisted-based code, no less.

Less exciting: CONFIG_CONNECTOR, which makes it easier to write kernel-side event-driven interfaces that userspace can access through normal socket/bind/send/recv/poll calls (via special netlink sockets). I've built interfaces like this through magic /dev/foo files, but you have to build up your own queueing routines, and implementing the necessary poll() hooks is a nuisance. This unifies everything into an existing event-oriented interface, so things like /dev/foo can stick to synchronous "give me the current state *now*" -style applications. Also RELAYFS, which creates a filesystem interface for efficiently transferring large streams of data from kernelspace to userspace.

Also of interest to me: netfilter's netlink-socket interface has been unified, so the IPv4-only ipt_ULOG target is turning into an all-protocol NFNETLINK target. This is also intended to replace the syslog-based ipt_LOG target. Queueing packets to userspace is being changed the same way, with the more-flexible TARGET_NFQUEUE. Finally the kernel interface allows multiple queues to userspace, which addresses some of the traffic-control problems inherent to multiple kinds of traffic all sharing the same queue.

Plus, the ieee80211 code made it into the kernel, so I don't need to keep building a separate module for my laptop's ipw2200 driver. And HOSTAP is now in the kernel, for my PCMCIA prism2 card.

posted at: 14:04 | path: / | permanent link to this entry

Thu, 15 Sep 2005

concurrency

Had a great chat with Donovan today, about newpb and E and secure python and concurrency management. It turns out we have some of the same ideas about interesting things to do with these kinds of tools. He pointed me at a language named Io that's doing some neat stuff with lightweight coroutines, and had some interesting thoughts on coroutines in python (making protocol-parsing code look a good bit simpler than the purely data-driven model that twisted Protocol classes tend to have).

posted at: 23:26 | path: /twisted | permanent link to this entry

Fri, 29 Jul 2005

happy birthday!

% whois lothar.com
...
domain:         LOTHAR.COM
person:         Brian Warner
nic-hdl:        BW116-GANDI
address:        The Castle Lothar
...
reg_created:    1995-07-29 00:00:00

Ten years ago today, I registered my little personal domain, with a woman at best.com named Pandora, who was nicely amused by the "company name". In the intervening time, it has been through two registrars, three hosting companies, four IP addresses, and five server platforms. For a while it lived as a verio vhost, for a while it ran on a Cobalt Qube on the near end of a DSL line, and for a while on a mini-ITX board booting from a read-only USB drive. These days it is a UML slice at linode.com.

I keep meaning to do more with it, but overall I'm pretty happy just to have a little corner of the 'net that I can call home.

posted at: 20:07 | path: / | permanent link to this entry

Wed, 13 Jul 2005

hacking

The last few weeks have been mostly filled with Buildbot hacking. I'm neck-deep in the implementation phase of a big new set of features, and it's taking *forever*. But I think I'm finally past the hardest part, the design issues that remain to be solved are at last medium-sized ones instead of huge ones, and even the unit tests pass. So I'm feeling pretty good about that.

I'm also trying to hack on Petmail a little bit more. There's a spam conference at Stanford next week that I'll be attending, and even though it's unlikely I'll be showing it off to anyone, I'd like to be sufficiently back in the Petmail mindset that I can discuss it intelligently while I'm there.

I'm trying to shift Petmail's configuration interface from the current Gtk app into a web page one, using Nevow, because eventually (when Bill gets some time to work on the Thunderbird plugin) the send/receive mail interface will be through XMLRPC (or whatever Mozilla code can get to most conveniently). I haven't figured it out yet, though, nevow provides some nice features for free, but I don't yet know if they're the ones that I need to implement this sort of add/edit/remove configuration stuff.

Also, I'm moving Petmail development over to Darcs. I've been a bit frustrated with my recent Buildbot development push, because I'm using Bazaar on my laptop, with a local repository so I can make commits offline, but pushing changes back and forth between repositories is enough of a hassle that I just don't do it. So I'm doing all the buildbot work slouched over my laptop (which I really like, but the keyboard is making my hands just a little bit uncomfortable), rather than the desktop with the proper keyboard and proper chair. It looks like Darcs would make it a bit easier to fling changes from one place to another, so using it might encourage me to do development anywhere I feel like. (plus, I should really get a new monitor for the desktop machine.. my ex-brother-in-law has a gorgeous 20" LCD, something from Dell, which I'm really tempted to splurge for).

So anyway, there's a Darcs tree for Petmail available at http://petmail.lothar.com/repos/trunk , which replaces the old CVS repository on that same site. I don't have a Darcs equivalent for ViewCVS up yet, though. I've seen a web-based Darcs patch viewer, but I wasn't really impressed. So I'll keep looking.

posted at: 23:08 | path: / | permanent link to this entry

Sat, 28 May 2005

Go Tools

I was talking with my brother-in-law about a gadget to make playing Go online a bit more like playing it in person. The feel of the board and the THWACK! as you plunk down stones adds a lovely touch to the game, but you don't get that when clicking on the cgoban window. We talked about using a real Go board at each end, pointing a camera at it to figure out where you've just moved and relay it to the server, and using a targettable laser pointer (on a pair of servos) to point to where your partner has just played.

I ran into this blog today about a guy who's interested in part of this problem, specifically using image-processing software to create a log of a game in progress. He also has a link to a Japanese academic paper about doing the same thing (specifically creating a game log, aka "Kifu", from a recording of a TV program).

I visited the SF Go Club for the first time last week, and had a great time.. looking forward to going again next week.

posted at: 12:29 | path: /go | permanent link to this entry

Fri, 27 May 2005

Twist-E

Spent another great day down at HP, talking about implementing E and web-calculus concepts within Twisted and newpb. Tyler Close was kind enough to spend the entire afternoon with me, explaining how his web-calculus works and the design decisions behind it. I'm really excited about implementing this stuff in newpb: I think we can make a system that's both secure *and* highly usable. Some of the ideas I came away with that I want to write up before I forget:

Promises: In addition to Deferred, we can build a Promise. The usage syntax would look like:

 p = tub.getReference(url)
 p.authorize(credentials).subscribe(self)
 when(p.getReady()).addCallback(lambda res: p.trigger())
 p2 = Promise(d1) # turn "deferred which fires with an instance" into a Promise
 p3 = p2.invoke()
 d2 = when(p3)
 d2.addCallback(stuff)

The Promise object is basically a wrapper around any Deferred that expects to fire with an instance. It has a __getattr__ which lets it pretend to implement any method. Such methods just queue the call and its arguments, then finish immediately, returning a new Promise. Something like:

class Promise:
  def __getattr__(self, methname):
    if self.resolved:
        m = getattr(self.resolution, methname)
        assert callable(m)
        return m
    def newmethod(*args, **kwargs):
        self.calls.append((methname, args, kwargs))
        # except more cleverness in case the method is invoked after the
        # promise is resolved
    return newmethod

When the Deferred fires, all pending calls are invoked on the instance it fired with. Each call also returns a Promise, possibly already fulfilled, with the results of that call, so that p.meth1().meth2() is the asynchronous equivalent of o.meth1().meth2(), or func2(func1(o)). 'p.meth1(); p.meth2()' means that meth2 must be invoked *after* meth1: I'm not sure what other kind of sequencing promises to make (should we wait until meth1 has finished before invoking meth2?).

If the Deferred errbacks instead, then the Promise is "smashed", which is like an errback. No further method calls are made, any dependent Promises are smashed too.
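To see the queue-then-replay and smashing behaviour end to end, here is a self-contained toy that runs without Twisted. It is only a sketch of the idea: the _resolve/_smash/_flush names and the three-state field are assumptions made for this example, and it resolves directly instead of being driven by a Deferred.

```python
class Promise:
    """Toy promise that queues method calls until resolved (a sketch only)."""
    def __init__(self):
        self._state = "pending"   # "pending", "resolved", or "smashed"
        self._value = None        # the resolution, or the failure reason
        self._calls = []          # queued (methname, args, kwargs, result_promise)

    def __getattr__(self, methname):
        # Only reached for names not found normally, i.e. the pretend methods.
        if methname.startswith("_"):
            raise AttributeError(methname)
        def newmethod(*args, **kwargs):
            result = Promise()
            self._calls.append((methname, args, kwargs, result))
            if self._state != "pending":
                self._flush()     # invoked after resolution: run right away
            return result
        return newmethod

    def _resolve(self, value):
        self._state, self._value = "resolved", value
        self._flush()

    def _smash(self, reason):
        self._state, self._value = "smashed", reason
        self._flush()

    def _flush(self):
        calls, self._calls = self._calls, []
        for methname, args, kwargs, result in calls:
            if self._state == "smashed":
                result._smash(self._value)   # dependent promises are smashed too
                continue
            try:
                result._resolve(getattr(self._value, methname)(*args, **kwargs))
            except Exception as e:
                result._smash(e)

p = Promise()
p2 = p.upper().strip()   # pipelined: nothing runs yet
p._resolve("  hi ")      # replays upper(), then strip() on its result
```

This version invokes queued calls strictly in the order they were made and does not wait for one to "finish" before starting the next, which is one possible answer to the sequencing question above, not necessarily the right one.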

The idea is to make the asynchronous domain be the normal case, and mark the boundary with the synchronous domain specially. when() would be a function that turns a Promise into a Deferred, with which the transition could be scheduled:

from twisted.internet import defer

def when(p):
  if not isinstance(p, Promise):
    # not a Promise at all: hand back the value itself, already fired
    return defer.succeed(p)
  if p.resolved:
    return defer.succeed(p.resolution)
  else:
    d = defer.Deferred()
    p.waiting.append(d)
    return d

He pointed out that E currently has two separate method invocation syntaxes: 'o.foo()' requires a local reference, and may or may not return a Promise. 'p <- foo()' can accept either a local reference or a Promise, and always returns a Promise. (actually I'm not sure I'm getting this right, but the implication was that there were two forms, one for local and one for remote, whereas Tyler felt that there should only be one).

Then, later, we'll create the RemotePromise, which is a Promise that's associated with a RemoteReference. rp.foo(args) is equivalent to d.addCallback(lambda res: res.callRemote("foo", args)) . When Promises are serialized, they get a clid and show up as another Promise on the far end. You push the waiting as far away as possible, apparently this is the way to reduce the probability of deadlocks.

My main concern with this syntax is that it may confuse the synchronous-domain developers that we (as Twisted) have been trying to gently nudge into the world of asynchronous programming. We're not blocking, but the code looks a lot like that's what's happening. But, once you've stopped thinking that the lack of a .callLater implies immediate execution, the p.meth(args) syntax really is a lot cleaner. You just assume that everything could be a promise, and you use when() if you need to assure that you have an immediate value.

One problem with reference counting is that your peer can force you to retain an object for arbitrarily long times, by just never sending you the decref (and Gifts make things even worse). Tyler's hunch is that distributed reference counting is the wrong approach, and it is more practical to manage object lifetime with the Vat/Tub. Break application processing into units, create a Tub for each unit, when the unit is finished, destroy the Tub. All objects that pass through a Tub are registered (under an unguessable name) in that Tub, so they remain accessible for the lifetime of the Tub, and then become inaccessible when the Tub is destroyed.
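A minimal sketch of that lifetime model, with object lifetime tied to the Tub rather than to reference counts. ToyTub and its methods are hypothetical names for this example, not newpb's API, and using secrets.token_urlsafe for the unguessable name is just one reasonable choice.

```python
import secrets

class ToyTub:
    """Registry whose entries live exactly as long as the Tub does."""
    def __init__(self):
        self._objects = {}
        self.alive = True

    def register(self, obj):
        """Register obj under a fresh unguessable name and return the name."""
        name = secrets.token_urlsafe(16)
        self._objects[name] = obj
        return name

    def lookup(self, name):
        if not self.alive:
            raise KeyError(name)
        return self._objects[name]

    def destroy(self):
        """End of the processing unit: everything becomes inaccessible."""
        self.alive = False
        self._objects.clear()

tub = ToyTub()
record = {"field": 1}
name = tub.register(record)
found = tub.lookup(name)   # reachable while the Tub lives
tub.destroy()              # no decref from the peer is needed
```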

To use this well, it must be easy to create new Tubs and destroy them later. These Tubs must be able to share listener ports, which can distinguish the desired Tub by its keyid. To accomplish this with newpb, I think we may need a module-level registry of Listeners, so that two Tubs that are asked to listen on the same port will register with the same Listener. (it might also make sense to use newtub = oldtub.makeTub(), and have the Listener be inherited). We should pay attention to the possibility of sharing a TCP connection to an existing Tub, but keep in mind that separate TLS keys will require separate TCP connections.

Secure PB URLs want a key as the primary specifier, followed by a list of location hints, followed by a Tub-scoped name.

PBY url: pby://key@1.2.3.4,foo.com,[::1],loc2,loc3/name
 key is base32(sha1(tub.pubkey))
 unix socket is trickier
 non-authenticated url still requires Tub ID
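As a rough illustration of that layout, here is a sketch that derives the key and splits a pby:// URL into its parts. The function names are made up, lowercasing the base32 text is an assumption, and unix sockets (whose paths contain "/") are deliberately not handled.

```python
import base64
import hashlib

def tub_key(pubkey_bytes):
    """base32(sha1(pubkey)); lowercasing the result is an assumption."""
    digest = hashlib.sha1(pubkey_bytes).digest()
    return base64.b32encode(digest).decode("ascii").lower()

def parse_pby(url):
    """Split a pby:// URL into (key, location hints, name). Sketch only."""
    scheme = "pby://"
    if not url.startswith(scheme):
        raise ValueError("not a pby URL: %r" % (url,))
    rest = url[len(scheme):]
    key, at, rest = rest.partition("@")
    hints, slash, name = rest.partition("/")
    if not (key and at and slash and name):
        raise ValueError("malformed pby URL: %r" % (url,))
    return key, [h for h in hints.split(",") if h], name

parts = parse_pby("pby://key@1.2.3.4,foo.com,[::1],loc2,loc3/name")
```
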
He also feels that DoS prevention (one of the three reasons for Constraints, the other two being semantic typechecking assertions and API documentation) is difficult to implement and hard to get right, and unlikely to do the complete job that you'd want out of it. He said MarkM burned a lot of cycles trying to build DoS prevention techniques into CapIDL, and it would be worth asking him for his thoughts.

He said one deployment pattern would be to put security proxies in a set of separate processes, which perform deserialization, check arguments, etc, and then pass the results on to the real object. The security proxies would be CPU/memory limited, and there would be one per connection, so that if someone started to abuse their connection, only they would suffer. Once you get to a service large enough to be worried about DoS attacks, you'd want this architecture anyway because then you can distribute it out to multiple machines. I was skeptical about how to go about implementing this sort of proxy: how much CPU time do you give it? If it takes 1ms to deserialize a message that then consumes 1s of server time, do you have to restrict it to 1/1000th the CPU time of the server? Note that other possibilities include strict prioritization of the processes/threads (so the connections are starved until the server becomes idle), and enforcing one-at-a-time processing of messages.

His approach in web-amp was just to limit each serialized argument to 8kb. The objection that this might not be enough is countered by the fact that if you're sending more data than that, you should mark it explicitly (by creating a publish/subscribe model), because there's a good chance that the data is being used on the wrong side of the wire. The attacker is allowed to do whatever evil they can accomplish in 8kb, maybe that means a 2k-deep nested series of lists, but whatever it is won't be too big. I feel that at some point you have to enforce a limit.. in web-amp, you must limit the total number of arguments they can send you, or the number of method calls per second, or something.

The non-DoS-related semantic typechecking (I'm expecting an int, is it really an int?) is just as easily done with assert()s inside the method body. I want this kind of checking to happen as close to the top of the method as possible.. doing it in a RemoteInterface in some separate file feels wrong to me. One approach is a func.guard method attribute (whose constructor takes arguments much like the RemoteInterface methods do), which could be pulled up to the top of the method body with a decorator. The big difference in thought here is the idea of providing objects (which happen to implement a certain set of methods) versus providing methods (which happen to be bound to a particular object).
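One way the guard-as-decorator idea could look, as a sketch. The guard name comes from the paragraph above, but everything about its behaviour here is an assumption: it only checks keyword arguments with isinstance, and Counter is a made-up example class.

```python
import functools

def guard(**types):
    """Check keyword arguments against expected types before the body runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, **kwargs):
            for name, expected in types.items():
                if not isinstance(kwargs.get(name), expected):
                    raise TypeError("%s must be %s" % (name, expected.__name__))
            return func(self, **kwargs)
        wrapper.guard = types   # keep the constraint inspectable, e.g. for docs
        return wrapper
    return decorator

class Counter:
    def __init__(self):
        self.total = 0

    @guard(amount=int)
    def add(self, amount):
        self.total += amount
        return self.total

c = Counter()
total = c.add(amount=3)
```

The constraint sits right above the method body instead of in a separate interface file, and it stays attached to the function as add.guard.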

A lot of the typechecking concerns are eased with finer-grained capabilities. Ideally, the worst they can do by sending you a weird object type is to cause an exception. As long as you haven't registered an Unslicer that gives the resulting object some ambient authority, you aren't going to give them any new privileges by invoking a method on something they *can* give you. Tyler says you only do typechecking when you're considering granting them some new privileges. The notion is that it's the bound-method capability that is the basis of power, not what they do with it or what they send to it.

The constraints are useful for method documentation, especially if they can be serialized and passed to an object browser, but can only document the list of methods and the names/types of their arguments. The actual API description still needs to be in epydoc, which can provide (non-machine-parseable) argument name/type docs too.

positional parameters for interoperability with java:

java doesn't have keyword args. To provide interoperability, the python-newpb method call serializer needs to send args in strict order, the java newpb receiver would ignore the argument names (only using the values). In the other direction, the java method call serializer would send None for the argument names, and the python receiver would use the local RemoteInterface to turn the argument list into a kwargs dict.
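The python side of that scheme might look roughly like this. ARG_NAMES stands in for whatever the local RemoteInterface would provide, and getRecord with its table/key arguments is invented for the example.

```python
# Hypothetical stand-in for the argument order declared in a RemoteInterface.
ARG_NAMES = {"getRecord": ["table", "key"]}

def serialize_call(methname, kwargs):
    """Python sender: emit (name, value) pairs in the interface's strict order."""
    return [(name, kwargs[name]) for name in ARG_NAMES[methname]]

def deserialize_call(methname, pairs):
    """Python receiver: rebuild kwargs, even if the sender sent None for names."""
    names = ARG_NAMES[methname]
    if len(pairs) != len(names):
        raise ValueError("wrong number of arguments for %s" % methname)
    kwargs = {}
    for expected, (name, value) in zip(names, pairs):
        if name is not None and name != expected:
            raise ValueError("argument %r out of order" % (name,))
        kwargs[expected] = value
    return kwargs

from_python = serialize_call("getRecord", {"key": 7, "table": "users"})
from_java = [(None, "users"), (None, 7)]   # a java sender has no names to send
```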

Finally, I need to study the XML schemas in the web-calculus more closely. In it, the bound method closure URL can be used for two purposes: a GET returns the method schema (a description of what types the positional parameters will accept), while a POST will invoke the closure. However, the object which provided that URL has a class, and the method clause had a name, and the method schema is always the same for any given (class, methodname) pair, so even a fully send-time-checking implementation doesn't have to retrieve any method schema more than once. I had first thought that there was some redundancy in the XML data being returned, but Tyler's put a lot of thought and time into it to minimize the round-trips and avoid redundancy. newpb would be well-served by studying his approach carefully.
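The "never fetch a schema twice" point amounts to a cache keyed on (class, methodname). A minimal sketch, where fake_fetch stands in for the GET on the closure URL and the schema contents are made up:

```python
_schema_cache = {}  # (class name, method name) -> schema

def get_schema(classname, methname, fetch):
    """Return the schema for a method, calling fetch() at most once per pair."""
    key = (classname, methname)
    if key not in _schema_cache:
        _schema_cache[key] = fetch(classname, methname)
    return _schema_cache[key]

fetches = []

def fake_fetch(classname, methname):
    # Stand-in for the network round trip; records each one.
    fetches.append((classname, methname))
    return {"params": ["int", "string"]}

get_schema("Account", "deposit", fake_fetch)
get_schema("Account", "deposit", fake_fetch)  # served from the cache
```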

posted at: 18:00 | path: /twisted | permanent link to this entry

Thu, 26 May 2005

books

I started in on Alastair Reynolds' _Century Rain_ last night, got about halfway through before I finally succumbed to sleep. It's a good read: finally he gets to have at least a few chapters that don't involve pervasive nanotechnology or uploaded personality constructs or galaxy-spanning machine intelligences.

I was thrown at first, however, because he's got a system-wide human government named The Polity, and just last week I had finished reading Neal Asher's _Line Of Polity_, in which *his* galaxy-wide human government (also named The Polity) is considerably more powerful, and somewhat less conflicted, and certainly motivated by different things. It took me a while to put that Polity out of my mind.

posted at: 12:57 | path: / | permanent link to this entry

Tue, 24 May 2005

and a calendar too

Hey, that wasn't too bad. I also added some CSS to make everything a tiny bit less ugly.

Now all I need is auto-completion on the category elisp..

posted at: 01:18 | path: /weblog | permanent link to this entry

Mon, 23 May 2005

adding subcategories

I think I've gotten my elisp code to handle pyblosxom categories now. pyblosxom was easy, but I have to add the glue to let you choose a category. Unfortunately creating new categories requires manual work (registering the CVS directory).

Next step: find a pyblosxom plugin to create that spiffy little category sidebar I've seen on so many other blogs.

posted at: 20:29 | path: /weblog | permanent link to this entry

Sat, 21 May 2005

great week

Man, what a great week. I spent a couple of days working with Donovan at his office on a couple of issues: making py.test capable of running Twisted test cases, improving LivePage event notification, and setting up a BuildBot for their in-house test suite.

Thursday night was the BayPIGgies meeting (a local Python users group), held at Google's spiffy office complex in Mountain View. I handed off some JavaButton hardware that I'm loaning to Pavel for a month, and wound up hanging out with Zooko for the rest of the evening, talking about some software licensing ideas he's been thinking about. We agreed that they need a bit of work, but were still quite promising, and we were up pretty late arguing about the details. When you start talking about metalicenses, you know it's getting late.

Friday I spent at HP hanging out with some of the E/Capabilities people. In the discussion I happened to mention an essay I'd seen about expectations of privacy in online spaces, unfortunately I wasn't able to remember the site or the author in realtime. Of course it turns out that it was written by Danny O'Brien, whom I met at CodeCon and when we talked to the ZigBee people about licensing their technology and brands in a way that would make them more compatible with free-software. Small world.

The afternoon was spent at Kragen's office watching him and Donovan and Mark work on Wheat. When Tyler showed up we spent about half an hour talking about how newpb could incorporate some of the ideas of his web-calculus model. This was really useful, it sounds like he's addressed most of the problems we've encountered in building newpb. I think there exists a possibility that we could use his serialization scheme and (since they're working on making E speak the same protocol) thus make newpb interoperate with E. That would be a nice accomplishment.

posted at: 15:08 | path: / | permanent link to this entry

Mon, 16 May 2005

SPF

I've been trying to decide whether to publish an SPF record for lothar.com or not. The last few days have seen an absolute deluge of spam from some german bastards, much of which is being forged in my name. The only real solution is, of course, to sign everything and make sure the entire rest of the world knows about that practice. Or magically switch everybody over to my http://petmail.lothar.com/ project.

But I'm starting to think that SPF might address the specific frustration I'm feeling with this forgery. And I'm seeing about 2-3 TXT record lookups per hour, so *somebody* out there is using it.

http://homepages.tesco.net/~J.deBoynePollard/FGA/smtp-spf-is-harmful.html

http://www.anders.com/projects/sysadmin/djbdnsRecordBuilder/

posted at: 20:05 | path: / | permanent link to this entry

iButtons

I was talking with Pavel (aka PenguinOfDoom, on #twisted) last week about iButtons, and mentioned the JavaButton I picked up years ago that I haven't really managed to do anything with yet. That prompted me to poke around the web site (was dalsemi.com, since bought by http://www.maxim-ic.com), and it turns out they have a new-ish version of the portable C code that interfaces with the things. The last time I looked (version 300b2), there was a single function left unimplemented which prevented the use of JavaButtons on a USB adapter under Linux. (Non-Java buttons were ok, serial port adapters were ok, it was just the combination that didn't work.) I don't yet know if that's been fixed in the "new" (2004) version 300 library.

Trying to buy a JavaButton looks hard, much harder than it was when I got mine. It probably requires talking to a sales rep. I got a starter kit that included a DS1957 on a USB key fob, very nicely designed. The only part I can see listed on their web site is the DS1955, which has like 8kB of RAM (the DS1957 has more like 150kB). The JavaButtons include cryptographic code, so they require a license/export agreement, but it would have been nice if they made it clear how you obtain such a thing.

Anyway, here are a handful of links, since their web site seems particularly hard to navigate.

http://www.maxim-ic.com/products/ibutton/software/1wire/wirekit.cfm

http://www.maxim-ic.com/pl_list.cfm/filter/22 list of iButton data sheets

http://www.maxim-ic.com/1-Wire.cfm regular ICs (not in a steel can) using the same protocol, usually TO-92

http://www.maxim-ic.com/products/microcontrollers/crypto_ibutton_license_application.cfm might be the entry point to buying a JavaButton, or maybe just one of their crypto iButtons

UPDATE: no, version 300 still does not support JavaButtons over USB. The specific issue is that JavaButtons require a strong pullup to provide lots of power while they're crunching away in the crypto routines. The USB adapter can do this, but the Linux interface code doesn't know how to turn it on. lib/general/Link/USB_Linux/usblnk.c has a routine named hasPowerDelivery, which currently reads:

SMALLINT hasPowerDelivery(int portnum)
{
   // Adapter supports it but not implemented yet
   return FALSE;
}

Sigh.

posted at: 11:27 | path: / | permanent link to this entry

Sat, 07 May 2005

sparklines

My friend Drew just sent this one along:

http://bitworking.org/news/Sparklines_in_data_URIs_in_Python

I'm pondering things I might do with this. I've been using Data: URIs for one of my projects, they're pretty handy and both Firefox and Safari are more than happy to take ridiculously large ones (50k or more). Like Drew, I'm wondering what I could do with sparklines.
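
For reference, a data: URI is just a MIME type plus the base64-encoded payload. Here is a minimal sketch of building one; it is written for a current Python 3, and make_data_uri is a hypothetical helper name of my own, not something taken from the linked article:

```python
import base64


def make_data_uri(payload, mime_type="image/png"):
    """Return a data: URI that embeds `payload` (bytes) as base64 text."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)
```

Calling make_data_uri(b"hello", "text/plain") gives data:text/plain;base64,aGVsbG8= and the same function would wrap the bytes of a small PNG for use in an img tag.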

The first thing that comes to mind is a compact representation of BuildBot test results. When you look at the history of a single builder, a series of builds over time, what you care about is how the results have changed from one build to the next. I've been thinking about having the buildbot pay attention to things like when any given test starts failing or starts passing again, but until I get around to writing that code, you could use a sparkline to represent the test results in a compact glyph, and then just show the last 50 of those. The user could then scan them visually to look for changes.
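
To make the "starts failing or starts passing again" idea concrete, here is a rough sketch. It assumes each build's result has been reduced to a single boolean (True for pass), which is a simplification; find_transitions and spark_text are hypothetical names, not Buildbot APIs, and the text strip is only a stand-in for a real graphical sparkline:

```python
def find_transitions(results):
    """Return (index, old, new) for each build whose result differs
    from the result of the build just before it."""
    transitions = []
    for i in range(1, len(results)):
        if results[i] != results[i - 1]:
            transitions.append((i, results[i - 1], results[i]))
    return transitions


def spark_text(results, width=50):
    """Render the last `width` builds as one character each:
    '.' for a passing build, '!' for a failing one."""
    return "".join("." if ok else "!" for ok in results[-width:])
```

For a history of [True, True, False, False, True], find_transitions reports a change at build 2 (pass to fail) and at build 4 (fail to pass), and spark_text renders the same history as ..!!.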

I'm not sure where else to use them yet. I'm tempted to write a Nevow renderer to create them, though, because that would make it a lot easier to insert them into other pages. That would let you use some HTML like: <div nevow:render="sparkline" nevow:data="stuff" /> and then implement a data_stuff method that would return whatever you wanted to put into the sparkline.

posted at: 12:40 | path: / | permanent link to this entry

Wed, 04 May 2005

pyblosxom-noindex

After some amount of perseverance, I finally figured out how to make pyblosxom insert "noindex" meta tags in the top-level index page. This was the last barrier keeping me from linking this blog to the main site, since I didn't want Google indexing a page that's going to change every few days anyway. For reference, here's the plugin I made. It's remarkably simple, after I traced through the code for several hours to figure out what function needed to be hooked.
#! /usr/bin/python

import sys

template = \
"""<html>
<head><title>$blog_title_with_path</title>
<meta name="robots" content="follow,noindex" />
</head>

<body><h1>$blog_title</h1><p>$pi_da $pi_mo $pi_yr</p>

"""

def cb_head(args):
    """This replaces the HEAD portion of the template whenever a 'directory'
    is being rendered. The modified template adds special 'noindex' meta tags
    to tell google that it shouldn't bother indexing the main page (since it
    will change), but to index the permalink pages instead.
    """
    #print >>sys.stderr, args['template']
    if args['request'].getData()['bl_type'] == "dir":
        args['template'] = template
    return args
posted at: 03:13 | path: /weblog | permanent link to this entry

Wed, 27 Apr 2005

buildbot versus windows

I just spent several hours getting a reasonable python environment working under Windows, something I had hoped to never have a need for. The Buildbot is having some.. disagreements.. with Windows, and it became clear that being able to reproduce the problem locally was the only sane way to fix it.

Man, was that painful.

For the record, here's what I did. Many thanks to Bear for creating this checklist and walking me through the process.

0. Check to make sure your PATHEXT environment variable has ";.PY" in it -- if not set your global environment to include it.

Control Panels / System / Advanced / Environment Variables / System variables

1. Install python -- 2.4 -- http://python.org * run win32 installer - no special options needed so far

2. install zope interface package -- 3.0.1final -- http://www.zope.org/Products/ZopeInterface * run win32 installer - it should auto-detect your python 2.4 installation

3. python for windows extensions -- build 203 -- http://pywin32.sourceforge.net/ * run win32 installer - it should auto-detect your python 2.4 installation

the installer complains about a missing DLL. Download mfc71.dll from the site mentioned in the warning (http://starship.python.net/crew/mhammond/win32/) and move it into c:\Python24\DLLs

4. at this point, to preserve my own sanity, I grabbed cygwin.com's setup.exe and started it. It behaves a lot like dselect. I installed bash and other tools (but *not* python). I added C:\cygwin\bin to PATH, allowing me to use tar, md5sum, cvs, all the usual stuff. I also installed emacs, going from the notes at http://www.gnu.org/software/emacs/windows/ntemacs.html . Their FAQ at http://www.gnu.org/software/emacs/windows/faq3.html#install has a note on how to swap CapsLock and Control.

I also modified PATH (in the same place as PATHEXT) to include C:\Python24 and C:\Python24\Scripts . This will allow 'python' and (eventually) 'trial' to work in a regular command shell.

5. twisted -- 2.0 -- http://twistedmatrix.com/projects/core/ * unpack tarball and run python setup.py install Note: if you want to test your setup - run: python c:\python24\Scripts\trial.py -o -R twisted (the -o will format the output for console and the "-R twisted" will recursively run all unit tests)

I had to edit Twisted (core)'s setup.py, to make detectExtensions() return an empty list before running builder._compile_helper(). Apparently the test it uses to detect if the (optional) C modules can be compiled causes the install process to simply quit without actually installing anything.

I installed several packages: core, Lore, Mail, Web, and Words. They all got copied to C:\Python24\Lib\site-packages\

At this point

trial --version

works, so 'trial -o -R twisted' will run the Twisted test suite. Note that this is not necessarily setting PYTHONPATH, so it may be running the test suite that was installed, not the one in the current directory.

6. I used CVS to grab a copy of the latest Buildbot sources. To run the tests, you must first add the buildbot directory to PYTHONPATH. Windows does not appear to have a Bourne-shell-style syntax to set a variable just for a single command, so you have to set it once and remember it will affect all commands for the lifetime of that shell session.

set PYTHONPATH=.
trial -o -r win32 buildbot.test

To run against both buildbot-CVS and, say, Twisted-SVN, do:

set PYTHONPATH=.;C:\path to\Twisted-SVN
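
If the shell can't scope a variable to one command, a small wrapper script can: hand the child process its own copy of the environment. A minimal sketch using the standard subprocess module; run_with_pythonpath is a hypothetical helper name, not part of Buildbot or Twisted:

```python
import os
import subprocess
import sys


def run_with_pythonpath(command, extra_paths):
    """Run `command` (a list of arguments) with `extra_paths` prepended to
    PYTHONPATH for that one child process only; returns its exit code."""
    env = dict(os.environ)  # copy, so this process's environment is untouched
    parts = list(extra_paths)
    if env.get("PYTHONPATH"):
        parts.append(env["PYTHONPATH"])
    # os.pathsep is ';' on Windows and ':' elsewhere
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return subprocess.call(command, env=env)
```

Something like run_with_pythonpath(["trial", "-o", "-r", "win32", "buildbot.test"], ["."]) would then leave the rest of the shell session alone.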

posted at: 15:14 | path: / | permanent link to this entry

Sat, 23 Apr 2005

buildbot hacking

I'm pushing to get a new BuildBot release out on Monday, so the last few days have been a flurry of commits (and the weekend will probably be the same). I was very pleased to hear that the Boost crew have implemented a Buildbot to run their (very large) regression test suite, especially because Dave Abrahams and I talked about setting one up two years ago, at PyCon, and I was never able to give them the time to make it happen. I was even more pleased to hear that their goal is to move all their testing over to buildbot. You couldn't ask for better marketing than for the STL heir-apparent to be using your project :).

Both Thomas (at Fluendo) and the Boost folks have patched their buildbots to allow the waterfall display to be themed with CSS, and the results look great. I'm looking forward to getting Thomas's code pulled into the mainline sources.. finally a way to make the waterfall display less ugly.

Finally, the metabuildbot is shaping up. This is a buildbot that works to run the buildbot's own unit tests. I need to find a reasonable hostname and link for it, then I'll make it publicly visible. Bear has put a lot of time so far into making the win32 slave work correctly, with no success yet (the specific problem is that I'm using Arch to get up-to-date sources out to the buildslaves, and tla is not happy on win32, some kind of 260-character limit on pathnames that tla runs up against when it does a checkout). I've dropped back to CVS for now (with a three-hour timeout in the hopes of getting around sf.net's enormous anoncvs latency), but a separate bug in the buildslave, compounded by a bug in the buildslave's error-handling code, has conspired to get the win32 slave into a state that requires manual intervention to un-jam. Grr, stupid windows.

posted at: 03:50 | path: / | permanent link to this entry

Wed, 20 Apr 2005

twisted talk

So I think the talk went really well. I spoke for about an hour before the room was needed for another meeting, to about 10 or 15 OSAF developers. I managed to cover the reactor, Protocols, Factories, building higher-level protocols, Failures, Deferreds, reactor.run() vs twistd -y vs mktap/twistd -f, and even a bit of twisted.web (the resource-tree model) and threads (reactor.callInThread/callFromThread). The things that were on my list but which I didn't get to cover were Cred, usage.Options, PB, and Interfaces.

But all in all I think the session helped a lot of people get their heads around the architecture.. I think they're now in a position to understand the existing HOWTOs and other documentation.

After the session, I sat down with Brian and two other OSAF folks: Lisa Dusseault and Grant Baillie. They are working on WebDAV, and have a strong interest in a functional WebDAV client library. As I understand it, this library's top-level API would need to look like an abstract file system, with directory lookups, pathnames, something like file handles, and file attributes. Inside, it would need to have a back end which actually speaks WebDAV to some server, creating new connections when necessary, or re-using persistent connections if possible. There would also need to be some sort of cache-management policy thing, since smart caching can make or break the performance of a WebDAV session.

Given their needs, we agreed that a Twisted WebDAV client library would be a great solution, and they've got the motivation and the knowledge (apparently Lisa was one of the primary WebDAV folks at Microsoft) to pull it off.

I described the recent work that's gone into an abstract file system (by spiv and others, for twisted.ftp), thinking that it would be the best place to start. The next step will probably be to introduce them to spiv, and float a post on twisted-python to see who else has an interest.

Brian also gave me a quick demo of Chandler, giving me a better idea about where they're going and what their plans are. It's funny, about 15 years ago I had a summer job at a research lab that had a similar goal. They were working on OCR and search technology, and wanted to make a box that could digitize and read all the random bits of paper that you produce in the course of a day, then let you index the information contained on them in a useful way. The Chandler folks want to take all the random bits of digital information that you create in the course of a day (email, IMs, calendar entries, todo lists) and organize/share them in a useful way. Kinda neat. I look forward to seeing where it goes.

posted at: 01:15 | path: /twisted | permanent link to this entry

Tue, 19 Apr 2005

OSAF Twisted talk

This is a rough outline of the talk I'll be giving at the OSAF tomorrow.
definition of Twisted, resources:
 http://www.twistedmatrix.com
  svn://svn.twistedmatrix.com/svn/Twisted/trunk
  http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
  http://twistedmatrix.com/bugs/
  http://twistedmatrix.com/buildbot/
 #twisted, #twisted.web on freenode

relationship of subprojects, dependencies:
 core, names, mail, web, words, conch, trial
 zope.interface, python2.2
 optional: pyopenssl, db stuff

directory overview:
 twisted.python: usage.Options, Failure, log
 twisted.internet: reactors, base classes for Protocol+Factory, Deferred
 twisted.protocols: simple protocols: finger, socks, telnet
 subproject directories
 doc/*/howto
 doc/core/howto/tutorial/listings/finger/*.py

motivation:
 simple client
 simple server
 not-so-simple server
 client+server
 need for a generalized solution
 threads, processes, event loop
event loop:
 asyncore
 reactor

picture: reactor with select() call, sockets in .readers/.writers
 sockets have .doRead, .doWrite, are scheduled with .addReader/etc
 timers
 different kinds of reactors, using other event loops: gtk, kqueue

picture: Protocol with Transports, reactor
 Protocol: connectionMade, dataReceived, connectionLost, transport.write

how do those Protocols get created?
reactor.listenTCP(port, factory)
picture (server): Protocols, Factory
 listening socket (Port) points to Factory, creates new Protocols
 Factory gets startFactory, stopFactory, buildProtocol
 Protocols generally have .factory

reactor.connectTCP(host, port, factory)
picture (client):
 Factory gets startedConnecting, clientConnectionFailed, clientConnectionLost
  as well as startFactory, stopFactory, buildProtocol
 Connector is responsible for getting a connection to host+port+factory
  possibly multiple times, for ReconnectingClientFactory
 skip over Connector stuff

writing Protocols, using existing ones
picture: t.p.finger.Finger
 overridable methods for getUser, getDomain, forwardQuery
 subclass, override method
 make a Factory which instantiates your new subclass
 attach to listenTCP

Protocols are used for both clients and servers
 state machine
 return one-shot results with Deferreds
 return multi-shot results by overriding methods

larger protocols have more complex setup

names: protocol parses the query, hands to factory
 factory does self.handleQuery, asks self.resolver, calls self.sendReply
 # good example of API, use of deferred: t.n.server.py:120, dns.py:1050

web: basic HTTP protocol creates Requests, then does req.process
 twisted.web.site implements a Resource tree
  picture(web): root, getChild(), isLeaf, render(req)
  specialized subclasses provide CGI processing, static.File, distrib

imap: involves cred, Mailbox objects, Message objects

top-level invocation:
 __main__, reactor.run()
  connectTCP, listenTCP
 or, creating an Application, then using twistd
  motivation: daemonization, logging, setuid/chroot, reactor, profiling
   think /etc/init.d
  picture: trees of Service/MultiService objects
   each gets startService, stopService
   t.a.internet.TCPServer(port, factory), TCPClient
  twistd -y foo.tac, script which creates an Application object
   sidebar: python as a configuration language
  serialize the Application, then launch it again later: twistd -f foo.tap
  shortcuts for common applications: mktap
  mktap plugins: Options, makeService(), register with plugins.tml

threads:
 nothing here needs threads
 where are they useful?
  wrapping blocking APIs: adbapi in particular
  integrating with other code
 threadpool: run a function in a thread, tell me when it is done

t.p.log:
 log.msg(msg, msg) emits a log
 log.err() emits the current exception
 log.err(f) emits a Failure object
 log output goes to an observer
 running from twistd: goes to twistd.log, or syslog
 running from __main__: log messages are discarded
 log.startLogging()

Failure:
 encapsulates a python exception
 can be serialized, printed, queried about what caused it
 Failure() inside an except: block wraps the current exception

Deferred:
 callback management
 use web.client.getPage as an example
 synchronous style:
   a=foo()
   b=bar(a)
   baz(b)
 asynchronous style:
   d=foo();
   d.addCallback(bar)
   d.addCallback(baz)
 callback vs errback, ladder diagram
 fire-before-addCallback is safe
 callbacks can return Deferreds: sub-ladders

usage.Options:
 create subclass, attributes indicate valid options
  optFlags, optParameters, subCommands
  define opt_foo(self,str) to implement --foo=str
 methods can customize processing further
  parseArgs, postOptions
 str() provides usage message
 Options implements the dict interface, opts['foo'], opts['v']
 usually invoked with opts.parseOptions(), which grabs sys.argv
 why? mktap plugins use the 'Options' class from the plugin to parse argv

lore:
 turn .xhtml into .html (or .latex, others)
  inline listings, pretty-print python code
  links to epydoc-generated API docs

pb:
 translucent RPC
 f=pb.PBServerFactory(root); reactor.listenTCP(port, f)
 cf=pb.PBClientFactory(); reactor.connectTCP(host, port, cf)
 d=cf.getRootObject(); d.addCallback(dostuff)
 ref.callRemote("method", args)
 def remote_method(self, args)

cred: howto is really good
 avatar, portal, realm, credentials, checker, mind
 portal has a set of checkers
 checker gets credentials, decides if they're ok, provides an avatarID
 realm gets avatarID and desired interfaces, returns an avatar
 protocol gets back the avatar, does stuff with it

interfaces: PEP245-style
 twisted/python/components.py
 zope.interface, tiny portion of Zope3
 many APIs want "object that can be adapted to IFoo" rather than an instance
  of a specific class
 some systems use it extensively: nevow's 'context': IRequest,ISession,ISite
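
To illustrate the Deferred part of the outline (synchronous vs. asynchronous style, and why fire-before-addCallback is safe), here is a toy stand-in written with nothing but plain Python. ToyDeferred is a made-up class for illustration only; it is not Twisted's implementation, and it leaves out errbacks and callbacks that return further Deferreds:

```python
class ToyDeferred:
    """Toy illustration of callback chaining; NOT Twisted's Deferred."""

    def __init__(self):
        self.callbacks = []
        self.fired = False
        self.result = None

    def addCallback(self, fn):
        self.callbacks.append(fn)
        if self.fired:
            # The result already arrived, so run the new callback right away.
            self._run()
        return self

    def callback(self, result):
        """Deliver the result and run every callback queued so far."""
        self.fired = True
        self.result = result
        self._run()

    def _run(self):
        # Each callback receives the previous callback's return value.
        while self.callbacks:
            fn = self.callbacks.pop(0)
            self.result = fn(self.result)


def bar(a):
    return a + 1


received = []


def baz(b):
    received.append(b)


# Asynchronous style: queue bar and baz first, the result shows up later.
d = ToyDeferred()
d.addCallback(bar)
d.addCallback(baz)
d.callback(41)          # baz is called with 42

# Fire-before-addCallback: the callback still runs, just immediately.
late = ToyDeferred()
late.callback("already here")
late.addCallback(baz)
```

The point of the design is that the caller's code looks the same whether the result is already available or arrives much later.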

posted at: 02:44 | path: /twisted | permanent link to this entry

Mon, 18 Apr 2005

emacs

I set up a few tools to post blog entries from emacs. All entries are kept in CVS, and the whole tree is rsync'ed over to the web server. The elisp which actually publishes the entry looks like this:
(defvar pyblosxom-entry-dir "~/stuff/Projects/WebLog/entries")

;; adapted from http://wiki.woozle.org/BlogdorEngine
;; and http://list-archive.xemacs.org/xemacs/200211/msg00022.html

(defun char-isalpha-p (thechar)
  "Check to see if thechar is a letter"
  (and (or (and (>= thechar ?a) (<= thechar ?z))
	   (and (>= thechar ?A) (<= thechar ?Z)))))

(defun char-isnum-p (thechar)
  "Check to see if thechar is a number"
  (and (>= thechar ?0) (<= thechar ?9)))

(defun char-isalnum-p (thechar)
  (or (char-isalpha-p thechar) (char-isnum-p thechar)))


(require 'cl-seq)

(defun blog-publish ()
  "Publish the blog entry in the current buffer"
  (interactive)
  (shell-command (format "cvs commit -m 'blog entry' %s"
                         (file-name-nondirectory buffer-file-name)))
  (shell-command "make -C .. publish")  ; publish
)

(define-minor-mode pyblosxom-post-minor-mode
  "Minor mode for blog posts"
  nil
  " blog-post"                          ; mode-line indicator
  '(
    ("\C-c\C-c" . blog-publish)
    )
  ()                                    ; forms run on mode entry/exit
)

(defun blog-post (title)
  "Create a journal entry"
  (interactive "sTitle: ")
  (let ((filetitle (substitute-if-not ?_
                                      (lambda (c) (char-isalnum-p c))
                                      title)))
    (find-file (concat pyblosxom-entry-dir "/"
                       filetitle
                       (format-time-string "-%Y-%m-%d-%H-%M")
                        ".txt"))
    (goto-char (point-min))
    (insert title "\n\n")
    (save-buffer)
    (vc-register)
    (pyblosxom-post-minor-mode 1)
))
posted at: 02:13 | path: /weblog | permanent link to this entry

Powered by PyBlosxom