Brian's weblog

28 Apr 2008

< April 2008 >
SuMoTuWeThFrSa
   1 2 3 4 5
6 7 8 9101112
13141516171819
20212223242526
27282930   
/ (45)
  code/ (1)
  emacs/ (3)
  foolscap/ (1)
  go/ (1)
  hardware/ (2)
  python/ (2)
  security/ (1)
  spam/ (1)
  twisted/ (8)
  version-control/ (1)
  web/ (1)
  weblog/ (6)
Mon, 28 Apr 2008

Levenshtein Distance

A library just showed up in debian ("python-levenshtein") to measure the Levenshtein Distance between two strings: the minimum number of edits (inserts, changes, deletes) necessary to turn one string into another.

I've been thinking about ways to implement efficiently-edited large mutable files for Tahoe, and it seems like a tool like this might help. Something clever like what rsync does is probably going to be involved too. The trick is that you want to determine what deltas to store without reading the whole file over the wire, from a server who isn't allowed to see the plaintext. You can store whatever ciphertext hashes you want on the far end. We're planning to provide insert/delete delta messages in the server side, using something like Mercurial's "revlog" format. The question is how to efficiently figure out the deltas on a very large file.

posted at: 18:45 | path: / | permanent link to this entry

Powered by PyBlosxom