May 2005
I was talking with my brother-in-law about a gadget to make playing Go online a bit more like playing it in person. The feel of the board and the THWACK! as you plunk down stones adds a lovely touch to the game, but you don't get that when clicking on the cgoban window. We talked about using a real Go board at each end, pointing a camera at it to figure out where you've just moved and relay it to the server, and using a targetable laser pointer (on a pair of servos) to point to where your partner has just played.
I ran across a blog today by a guy who's interested in part of this problem: using image-processing software to create a log of a game in progress. He also links to a Japanese academic paper about doing the same thing (specifically, creating a game log, aka "Kifu", from a recording of a TV program).
I visited the SF Go Club for the first time last week and had a great time. I'm looking forward to going again next week.
Spent another great day down at HP, talking about implementing E and web-calculus concepts within Twisted and newpb. Tyler Close was kind enough to spend the entire afternoon with me, explaining how his web-calculus works and the design decisions behind it. I'm really excited about implementing this stuff in newpb: I think we can make a system that's both secure *and* highly usable. Here are some of the ideas I came away with, which I want to write up before I forget:
Promises: In addition to Deferred, we can build a Promise. The usage syntax would look like:
```python
p = tub.getReference(url)
p.authorize(credentials).subscribe(self)
when(p.getReady()).addCallback(lambda res: p.trigger())
p2 = Promise(d1)  # turn "deferred which fires with an instance" into a Promise
p3 = p2.invoke()
d2 = when(p3)
d2.addCallback(stuff)
```
The Promise object is basically a wrapper around any Deferred that expects to fire with an instance. It has a __getattr__ which lets it pretend to implement any method. Such methods just queue the call and its arguments, then finish immediately, returning a new Promise. Something like:
```python
class Promise:
    def __getattr__(self, methname):
        if self.resolved:
            m = getattr(self.resolution, methname)
            assert callable(m)
            return m
        def newmethod(*args, **kwargs):
            self.calls.append((methname, args, kwargs))
            # except more cleverness in case the method is invoked after the
            # promise is resolved
        return newmethod
```
When the Deferred fires, all pending calls are invoked on the instance it fired with. Each call also returns a Promise, possibly already fulfilled, with the results of that call, so that p.meth1().meth2() is the asynchronous equivalent of o.meth1().meth2(), or func2(func1(o)). 'p.meth1(); p.meth2()' means that meth2 must be invoked *after* meth1: I'm not sure what other kind of sequencing promises to make (should we wait until meth1 has finished before invoking meth2?).
If the Deferred errbacks instead, then the Promise is "smashed", which is like an errback: no further method calls are made, and any dependent Promises are smashed too.
The idea is to make the asynchronous domain be the normal case, and mark the boundary with the synchronous domain specially. when() would be a function that turns a Promise into a Deferred, with which the transition could be scheduled:
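```python
from twisted.internet import defer

def when(p):
    if not isinstance(p, Promise):
        # an ordinary value (not a Promise) is already available
        return defer.succeed(p)
    if p.resolved:
        return defer.succeed(p.resolution)
    else:
        # fired later, once the Promise's own Deferred fires
        d = defer.Deferred()
        p.waiting.append(d)
        return d
```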
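To check that the queue-and-replay semantics hang together, here is a small standalone sketch that runs without Twisted. It is not newpb code, and several names are my assumptions: `_resolve()` and `_smash()` are hypothetical hooks (a Twisted version would attach them to the Promise's Deferred with `addCallbacks()`, which is why they return their argument), the smash reason is assumed to be an exception instance, and `concurrent.futures.Future` stands in for the Deferred that `when()` returns. Queued calls are replayed in the order they were made, each call returns a new Promise so chains work, and a smash propagates to every dependent Promise.

```python
from concurrent.futures import Future


class Promise:
    """Queues method calls until resolved, then replays them in order."""

    def __init__(self):
        self._state = "pending"  # "pending", "resolved", or "smashed"
        self._value = None       # the resolution, or the smash reason
        self._queued = []        # (methname, args, kwargs, result Promise)
        self._waiting = []       # Futures handed out by when()

    def __getattr__(self, methname):
        # Only reached for names not found normally: pretend to have them.
        if methname.startswith("_"):
            raise AttributeError(methname)

        def queued_call(*args, **kwargs):
            result = Promise()  # every call returns a new Promise
            if self._state == "pending":
                self._queued.append((methname, args, kwargs, result))
            else:
                self._deliver(methname, args, kwargs, result)
            return result
        return queued_call

    def _deliver(self, methname, args, kwargs, result):
        if self._state == "smashed":
            result._smash(self._value)  # dependents are smashed too
            return
        try:
            value = getattr(self._value, methname)(*args, **kwargs)
        except Exception as e:
            result._smash(e)
        else:
            result._resolve(value)

    def _resolve(self, value):
        self._settle("resolved", value)
        return value  # pass-through, as a Deferred callback would

    def _smash(self, reason):
        # reason is assumed to be an exception instance
        self._settle("smashed", reason)
        return reason

    def _settle(self, state, value):
        if self._state != "pending":
            return  # the first resolution wins
        self._state, self._value = state, value
        queued, self._queued = self._queued, []
        for methname, args, kwargs, result in queued:
            self._deliver(methname, args, kwargs, result)
        waiting, self._waiting = self._waiting, []
        for f in waiting:
            _fire(f, self)


def _fire(f, p):
    # Copy a settled Promise's outcome into a Future.
    if p._state == "resolved":
        f.set_result(p._value)
    else:
        f.set_exception(p._value)


def when(p):
    """Cross into the synchronous domain: Promise (or plain value) -> Future."""
    f = Future()
    if not isinstance(p, Promise):
        f.set_result(p)
    elif p._state == "pending":
        p._waiting.append(f)
    else:
        _fire(f, p)
    return f
```

For example, `when(p.add(2).add(3).total())` hands back a Future that stays pending until `p._resolve()` is given an object with `add()` and `total()` methods; at that point all three queued calls run in order and the Future fires with the final result.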
He pointed out that E currently has two separate method invocation syntaxes: 'o.foo()' requires a local reference, and may or may not return a Promise. 'p <- foo()' can accept either a local reference or a Promise, and always returns a Promise. (actually I'm not sure I'm getting this right, but the implication was that there were two forms, one for local and one for remote, whereas Tyler felt that there should only be one).
Then, later, we'll create the RemotePromise, which is a Promise that's associated with a RemoteReference: rp.foo(args) is equivalent to d.addCallback(lambda res: res.callRemote("foo", args)). When Promises are serialized, they get a clid and show up as Promises on the far end. The idea is to push the waiting as far away as possible; apparently this reduces the probability of deadlocks.
My main concern with this syntax is that it may confuse the synchronous-domain developers that we (as Twisted) have been trying to gently nudge into the world of asynchronous programming. We're not blocking, but the code looks a lot like that's what's happening. But, once you've stopped thinking that the lack of a .callLater implies immediate execution, the p.meth(args) syntax really is a lot cleaner. You just assume that everything could be a promise, and you use when() if you need to ensure that you have an immediate value.
One problem with reference counting is that your peer can force you to retain an object for arbitrarily long times, by just never sending you the decref (and Gifts make things even worse). Tyler's hunch is that distributed reference counting is the wrong approach, and it is more practical to manage object lifetime with the Vat/Tub. Break application processing into units, create a Tub for each unit, when the unit is finished, destroy the Tub. All objects that pass through a Tub are registered (under an unguessable name) in that Tub, so they remain accessible for the lifetime of the Tub, and then become inaccessible when the Tub is destroyed.
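A toy model of Tyler's suggestion, assuming a hypothetical `Tub` class with `register()`, `lookup()`, and `destroy()` methods (these are not newpb's actual API): every object that passes through gets an unguessable name, stays reachable for the life of the Tub, and becomes unreachable all at once when the Tub is destroyed, with no per-object decref traffic for a peer to withhold.

```python
import secrets


class Tub:
    """Toy lifetime manager: registered objects live exactly as long as their Tub."""

    def __init__(self):
        self._objects = {}   # unguessable name -> object
        self._names = {}     # id(object) -> name, so re-registering is stable
        self.destroyed = False

    def register(self, obj):
        if self.destroyed:
            raise RuntimeError("Tub has been destroyed")
        # Safe to key on id(): the object stays alive in _objects while registered.
        name = self._names.get(id(obj))
        if name is None:
            name = secrets.token_urlsafe(20)   # 20 random bytes = 160 bits
            self._names[id(obj)] = name
            self._objects[name] = obj
        return name

    def lookup(self, name):
        if self.destroyed:
            raise KeyError(name)
        return self._objects[name]

    def destroy(self):
        # Drop every strong reference at once; no distributed refcounting.
        self._objects.clear()
        self._names.clear()
        self.destroyed = True
```

In this model the application would create one Tub per unit of work and call `destroy()` when that unit finishes.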
To use this well, it must be easy to create new Tubs and destroy them later. These Tubs must be able to share listener ports, which can distinguish the desired Tub by its keyid. To accomplish this with newpb, I think we may need a module-level registry of Listeners, so that two Tubs that are asked to listen on the same port will register with the same Listener. (it might also make sense to use newtub = oldtub.makeTub(), and have the Listener be inherited). We should pay attention to the possibility of sharing a TCP connection to an existing Tub, but keep in mind that separate TLS keys will require separate TCP connections.
Secure PB URLs want a key as the primary specifier, followed by a list of location hints, followed by a Tub-scoped name.
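A URL of that shape can be built and split mechanically. This sketch takes the key to be base32(sha1(pubkey)), per the notes on this format; the lowercase, unpadded rendering and the helper names `tub_key()`, `make_pby_url()`, and `parse_pby_url()` are my assumptions. Location hints are comma-separated, so an IPv6 hint like [::1] (which contains colons but no commas or slashes) survives the split.

```python
import base64
import hashlib


def tub_key(pubkey_bytes):
    """base32(sha1(pubkey)), lowercase and unpadded (formatting is assumed)."""
    digest = hashlib.sha1(pubkey_bytes).digest()   # 20 bytes -> 32 base32 chars
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def make_pby_url(key, hints, name):
    return "pby://%s@%s/%s" % (key, ",".join(hints), name)


def parse_pby_url(url):
    """Split pby://key@hint1,hint2,.../name into (key, [hints], name)."""
    scheme, sep, rest = url.partition("://")
    if scheme != "pby" or not sep:
        raise ValueError("not a pby URL: %r" % url)
    key, at, rest = rest.partition("@")
    hints_part, slash, name = rest.partition("/")
    if not at or not slash:
        raise ValueError("malformed pby URL: %r" % url)
    return key, hints_part.split(","), name
```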
PBY URL: pby://key@1.2.3.4,foo.com,[::1],loc2,loc3/name, where key is base32(sha1(tub.pubkey)). Unix sockets are trickier. A non-authenticated URL still requires a Tub ID.
He also feels that DoS prevention (one of the three reasons for Constraints, the other two being semantic typechecking assertions and API documentation) is difficult to implement and hard to get right, and unlikely to do the complete job that you'd want out of it. He said MarkM burned a lot of cycles trying to build DoS prevention techniques into CapIDL, and it would be worth asking him for his thoughts.
He said one deployment pattern would be to put security proxies in a set of separate processes, which perform deserialization, check arguments, etc, and then pass the results on to the real object. The security proxies would be CPU/memory limited, and there would be one per connection, so that if someone started to abuse their connection, only they would suffer. Once you get to a service large enough to be worried about DoS attacks, you'd want this architecture anyway because then you can distribute it out to multiple machines. I was skeptical about how to go about implementing this sort of proxy: how much CPU time do you give it? If it takes 1ms to deserialize a message that then consumes 1s of server time, do you have to restrict it to 1/1000th the CPU time of the server? Note that other possibilities include strict prioritization of the processes/threads (so the connections are starved until the server becomes idle), and enforcing one-at-a-time processing of messages.
His approach in web-amp was just to limit each serialized argument to 8kb. The objection that this might not be enough is countered by the fact that if you're sending more data than that, you should mark it explicitly (by creating a publish/subscribe model), because there's a good chance that the data is being used on the wrong side of the wire. The attacker is allowed to do whatever evil they can accomplish in 8kb; maybe that means a 2k-deep nested series of lists, but whatever it is won't be too big. I feel that at some point you have to enforce a limit: in web-amp, you must limit the total number of arguments they can send you, or the number of method calls per second, or something.
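A rough sketch of a web-amp-style cap, under assumptions: each argument is assumed to arrive as its own length-delimited chunk of bytes, JSON stands in for the real encoding, and `decode_args()`, `MAX_ARG_BYTES`, and the `MAX_ARGS` count limit are illustrative names and numbers. The sizes are checked before anything is decoded, since the point of a DoS limit is to refuse work before doing it.

```python
import json

MAX_ARG_BYTES = 8 * 1024   # per serialized argument, as in web-amp
MAX_ARGS = 32              # illustrative cap on the number of arguments


def decode_args(raw_args):
    """raw_args: list of bytes objects, one encoded argument each (assumed framing)."""
    if len(raw_args) > MAX_ARGS:
        raise ValueError("too many arguments: %d" % len(raw_args))
    for i, raw in enumerate(raw_args):
        if len(raw) > MAX_ARG_BYTES:
            raise ValueError("argument %d is %d bytes (limit %d)"
                             % (i, len(raw), MAX_ARG_BYTES))
    # Only now spend CPU on deserialization.
    return [json.loads(raw) for raw in raw_args]
```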
The non-DoS-related semantic typechecking (I'm expecting an int, is it really an int?) is just as easily done with assert()s inside the method body. I want this kind of checking to happen as close to the top of the method as possible. Doing it in a RemoteInterface in some separate file feels wrong to me. One approach is a func.guard method attribute (whose constructor takes arguments much like the RemoteInterface methods do), which could be pulled up to the top of the method body with a decorator. The big difference in thought here is the idea of providing objects (which happen to implement a certain set of methods) versus providing methods (which happen to be bound to a particular object).
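One possible shape for the func.guard idea, as a sketch: the `guard` decorator here is hypothetical, not a newpb API. It takes per-argument constraints (plain classes, in this version) much like a RemoteInterface method would, attaches them to the method as a `guard` attribute so documentation tools could still read them, and runs the checks before the method body.

```python
import functools
import inspect


def guard(**constraints):
    """Attach per-argument type constraints and check them on every call."""
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def checked(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, expected in constraints.items():
                # Arguments left at their defaults are not in bound.arguments.
                if name in bound.arguments and not isinstance(bound.arguments[name], expected):
                    raise TypeError("%s should be %s, got %r"
                                    % (name, expected.__name__, bound.arguments[name]))
            return func(*args, **kwargs)

        checked.guard = constraints   # still visible to browsers/doc tools
        return checked
    return decorate


class Account:
    @guard(amount=int, memo=str)
    def deposit(self, amount, memo=""):
        return amount
```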
A lot of the typechecking concerns are eased with finer-grained capabilities. Ideally, the worst they can do by sending you a weird object type is to cause an exception. As long as you haven't registered an Unslicer that gives the resulting object some ambient authority, you aren't going to give them any new privileges by invoking a method on something they *can* give you. Tyler says you only do typechecking when you're considering granting them some new privileges. The notion is that it's the bound-method capability that is the basis of power, not what they do with it or what they send to it.
The constraints are useful for method documentation, especially if they can be serialized and passed to an object browser, but can only document the list of methods and the names/types of their arguments. The actual API description still needs to be in epydoc, which can provide (non-machine-parseable) argument name/type docs too.
Positional parameters for interoperability with Java:
Java doesn't have keyword args. To provide interoperability, the python-newpb method call serializer needs to send args in strict order, and the Java newpb receiver would ignore the argument names (using only the values). In the other direction, the Java method call serializer would send None for the argument names, and the Python receiver would use the local RemoteInterface to turn the argument list into a kwargs dict.
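On the Python receiving side, turning a name-less argument list into a kwargs dict can be as simple as binding the values against the local method's parameter order. In this sketch `inspect.signature()` stands in for consulting the local RemoteInterface (an assumption), and `args_to_kwargs()` and `invoke_positional()` are hypothetical helpers; it only handles plain named parameters, not `*args`.

```python
import inspect


def args_to_kwargs(method, args):
    """Turn a Java-style positional argument list into a kwargs dict.

    The sender supplied no argument names, so the order of `args` must
    match the parameter order of the local method definition.
    """
    bound = inspect.signature(method).bind(*args)
    return dict(bound.arguments)


class Calculator:
    def remote_add(self, a, b, scale=1):
        return (a + b) * scale


def invoke_positional(obj, methname, args):
    method = getattr(obj, methname)       # bound method: 'self' is excluded
    return method(**args_to_kwargs(method, args))
```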
Finally, I need to study the XML schemas in the web-calculus more closely. In it, the bound method closure URL can be used for two purposes: a GET returns the method schema (a description of what types the positional parameters will accept), while a POST will invoke the closure. However, the object which provided that URL has a class, and the method clause has a name, and the method schema is always the same for any given (class, methodname) pair, so even a fully send-time-checking implementation doesn't have to retrieve any method schema more than once. I had first thought that there was some redundancy in the XML data being returned, but Tyler's put a lot of thought and time into it to minimize the round-trips and avoid redundancy. newpb would be well-served by studying his approach carefully.
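That observation means a send-time checker only needs a small cache keyed by (class, methodname). A sketch, with `fetch_schema` standing in for whatever actually performs the GET on the closure URL (hypothetical; no real web-calculus client is assumed, and the schema is treated as an opaque value):

```python
class SchemaCache:
    """Fetch each method schema once per (class, methodname) pair."""

    def __init__(self, fetch_schema):
        self._fetch = fetch_schema    # callable(closure_url) -> schema
        self._cache = {}
        self.fetches = 0

    def schema_for(self, classname, methname, closure_url):
        key = (classname, methname)
        if key not in self._cache:
            # Any closure URL for this pair yields the same schema, so the
            # first one we happen to see is as good as any other.
            self._cache[key] = self._fetch(closure_url)
            self.fetches += 1
        return self._cache[key]
```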
I started in on Alastair Reynolds' _Century Rain_ last night, and got about halfway through before I finally succumbed to sleep. It's a good read: finally he gets to have at least a few chapters that don't involve pervasive nanotechnology or uploaded personality constructs or galaxy-spanning machine intelligences.
I was thrown at first, however, because he's got a system-wide human government named The Polity, and just last week I had finished reading Neal Asher's _Line Of Polity_, in which *his* galaxy-wide human government (also named The Polity) is considerably more powerful, and somewhat less conflicted, and certainly motivated by different things. It took me a while to put that Polity out of my mind.
Hey, that wasn't too bad. I also added some CSS to make everything a tiny bit less ugly.
Now all I need is auto-completion on the category elisp.
I think I've gotten my elisp code to handle pyblosxom categories now. pyblosxom was easy, but I have to add the glue to let you choose a category. Unfortunately creating new categories requires manual work (registering the CVS directory).
Next step: find a pyblosxom plugin to create that spiffy little category sidebar I've seen on so many other blogs.
Man, what a great week. I spent a couple of days working with Donovan at his office on a couple of issues: making py.test capable of running Twisted test cases, improving LivePage event notification, and setting up a BuildBot for their in-house test suite.
Thursday night was the BayPIGgies meeting (a local Python users group), held at Google's spiffy office complex in Mountain View. I handed off some JavaButton hardware that I'm loaning to Pavel for a month, and wound up hanging out with Zooko for the rest of the evening, talking about some software licensing ideas he's been thinking about. We agreed that they need a bit of work, but were still quite promising, and we were up pretty late arguing about the details. When you start talking about metalicenses, you know it's getting late.
Friday I spent at HP hanging out with some of the E/Capabilities people. In the discussion I happened to mention an essay I'd seen about expectations of privacy in online spaces; unfortunately I wasn't able to remember the site or the author in real time. Of course it turns out that it was written by Danny O'Brien, whom I met at CodeCon and again when we talked to the ZigBee people about licensing their technology and brands in a way that would make them more compatible with free software. Small world.
The afternoon was spent at Kragen's office watching him and Donovan and Mark work on Wheat. When Tyler showed up we spent about half an hour talking about how newpb could incorporate some of the ideas of his web-calculus model. This was really useful: it sounds like he's addressed most of the problems we've encountered in building newpb. I think there's a possibility that we could use his serialization scheme and (since they're working on making E speak the same protocol) thus make newpb interoperate with E. That would be a nice accomplishment.
I've been trying to decide whether to publish an SPF record for lothar.com or not. The last few days have seen an absolute deluge of spam from some German bastards, much of which is being forged in my name. The only real solution is, of course, to sign everything and make sure the entire rest of the world knows about that practice. Or magically switch everybody over to my http://petmail.lothar.com/ project.
But I'm starting to think that SPF might address the specific frustration I'm feeling with this forgery. And I'm seeing about 2-3 TXT record lookups per hour, so *somebody* out there is using it.
http://homepages.tesco.net/~J.deBoynePollard/FGA/smtp-spf-is-harmful.html
http://www.anders.com/projects/sysadmin/djbdnsRecordBuilder/
I was talking with Pavel (aka PenguinOfDoom, on #twisted) last week about iButtons, and mentioned the JavaButton I picked up years ago that I haven't really managed to do anything with yet. That prompted me to poke around the web site (was dalsemi.com, since bought by http://www.maxim-ic.com), and it turns out they have a new-ish version of the portable C code that interfaces with the things. The last time I looked (version 300b2), there was a single function left unimplemented which prevented the use of JavaButtons on a USB adapter under linux. (non-java buttons were ok, serial port adapters were ok, it was just the combination that didn't work). I don't yet know if that's been fixed in the "new" (2004) version 300 library.
Trying to buy a JavaButton looks hard, much harder than it was when I got mine. It probably requires talking to a sales rep. I got a starter kit that included a DS1957 on a USB key fob, very nicely designed. The only part I can see listed on their web site is the DS1955, which has about 8kB of RAM (the DS1957 has more like 150kB). The JavaButtons include cryptographic code, so they require a license/export agreement, but it would have been nice if they had made it clear how you obtain such a thing.
Anyway, here are a handful of links, since their web site seems particularly hard to navigate.
http://www.maxim-ic.com/products/ibutton/software/1wire/wirekit.cfm
http://www.maxim-ic.com/pl_list.cfm/filter/22 list of iButton data sheets
http://www.maxim-ic.com/1-Wire.cfm regular ICs (not in a steel can) using the same protocol, usually TO-92
http://www.maxim-ic.com/products/microcontrollers/crypto_ibutton_license_application.cfm might be the entry point to buying a JavaButton, or maybe just one of their crypto iButtons
UPDATE: no, version 300 still does not support JavaButtons over USB. The specific issue is that JavaButtons require a strong pullup to provide lots of power while they're crunching away in the crypto routines. The USB adapter can do this, but the Linux interface code doesn't know how to turn it on. lib/general/Link/USB_Linux/usblnk.c has a routine named hasPowerDelivery, which currently reads:
SMALLINT hasPowerDelivery(int portnum)
{
// Adapter supports it but not implemented yet
return FALSE;
}
Sigh.
My friend Drew just sent this one along:
http://bitworking.org/news/Sparklines_in_data_URIs_in_Python
I'm pondering things I might do with this. I've been using data: URIs for one of my projects; they're pretty handy, and both Firefox and Safari are more than happy to take ridiculously large ones (50k or more). Like Drew, I'm wondering what I could do with sparklines.
The first thing that comes to mind is a compact representation of BuildBot test results. When you look at the history of a single builder, a series of builds over time, what you care about is how the results have changed from one build to the next. I've been thinking about having the buildbot pay attention to things like when any given test starts failing or starts passing again, but until I get around to writing that code, you could use a sparkline to represent the test results in a compact glyph, and then just show the last 50 of those. The user could then scan them visually to look for changes.
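Here is a dependency-free sketch of that idea: it draws one bar per build (full height for a pass, a short stub for a failure) into a tiny grayscale PNG and wraps it in a data: URI. The bitworking article uses PIL; writing the PNG by hand with `zlib` and `struct` is my substitution so the example runs anywhere, and `build_sparkline()` with its pass/fail encoding is hypothetical.

```python
import base64
import struct
import zlib


def _png_chunk(kind, data):
    # Each PNG chunk: length, type, data, then a CRC over type + data.
    body = kind + data
    return (struct.pack(">I", len(data)) + body
            + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF))


def build_sparkline(results, height=12):
    """results: list of booleans (True = build passed). Returns a data: URI."""
    width = max(1, 2 * len(results))           # 1px bar + 1px gap per build
    rows = []
    for y in range(height):
        row = bytearray([0])                   # filter type 0 (None) per row
        for passed in results:
            bar_top = 0 if passed else height - 4   # failures: short bars
            row.append(0 if y >= bar_top else 255)  # black bar on white
            row.append(255)                          # gap column
        row.extend([255] * (width + 1 - len(row)))   # pads only when empty
        rows.append(bytes(row))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)  # 8-bit gray
    png = (b"\x89PNG\r\n\x1a\n"
           + _png_chunk(b"IHDR", ihdr)
           + _png_chunk(b"IDAT", zlib.compress(b"".join(rows)))
           + _png_chunk(b"IEND", b""))
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")
```

The returned string can go straight into an img tag's src attribute; showing the last 50 builds would just mean passing in the last 50 results.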
I'm not sure where else to use them yet. I'm tempted to write a Nevow renderer to create them, though, because that would make it a lot easier to insert them into other pages. That would let you use some HTML like: <div nevow:render="sparkline" nevow:data="stuff" /> and then implement a data_stuff method that would return whatever you wanted to put into the sparkline.
```python
#! /usr/bin/python
import sys

template = \
"""<html>
<head><title>$blog_title_with_path</title>
<meta name="robots" content="follow,noindex" />
</head>
<body><h1>$blog_title</h1><p>$pi_da $pi_mo $pi_yr</p>
"""

def cb_head(args):
    """This replaces the HEAD portion of the template whenever a 'directory'
    is being rendered. The modified template adds special 'noindex' meta tags
    to tell google that it shouldn't bother indexing the main page (since it
    will change), but to index the permalink pages instead.
    """
    #print >>sys.stderr, args['template']
    if args['request'].getData()['bl_type'] == "dir":
        args['template'] = template
    return args
```