Living fearlessly with a laptop

My Thinkpad X61s is tiny and powerful, so I like to take it places with me. Of course, when you go places with a laptop, there is a risk of it getting lost, broken, or stolen. For me (as for most people, I would guess) the data on my laptop is far more valuable than the laptop itself. Fortunately, there are two measures I take which almost entirely mitigate the risks of my laptop being stolen, lost, or broken, and as a result I have become more willing to bring it places. Here is what I do:

Use hard disk encryption. The Alternate installers in Ubuntu and Debian make it easy to set up an encrypted hard disk. Everything that goes onto the disk, including swap but excluding the /boot partition, is transparently encrypted; you just need to type a password whenever you boot up your computer. There is some overhead associated with encrypting everything, but if you have more than one core you will rarely notice it.
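
If you want to double-check, after installing, that everything really did end up on encrypted devices, a few commands like the following will do it. The mapping name sda5_crypt is only an example; yours will depend on what the installer chose.

ls /dev/mapper/                    # encrypted volumes appear here as mapped devices
sudo cryptsetup status sda5_crypt  # sda5_crypt is an example mapping name
swapon -s                          # swap should live on a /dev/mapper/* device
mount | grep ' / '                 # ...and so should the root filesystem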

This means that, should your laptop fall into the wrong hands, no useful information whatsoever can be extracted from the hard disk.

For this to work well, you need to lock your screen when you are not using the computer, and your computer needs to be configured to lock the screen when it wakes from suspend. It should be noted that some attacks on hard disk encryption have been described (for example, cold boot attacks, which recover encryption keys from RAM). While the risk of these attacks remains low for most targets, if you are paranoid you should shut down (not just suspend) your computer before taking it places, and either overwrite the most sensitive areas of memory before you shut down or leave the computer powered off for a couple of hours before taking it anywhere.

Keep backups. I do my backups over the internet so that I can back up from anywhere. I use a variant of the following script:

rsync -vaxE --delete --ignore-errors --delete-excluded --filter="merge excluded-files" /home/phil/ remotehost:/path/to/backup/destination/

where excluded-files is a file like the one below, listing paths that I don't want backed up (mostly local caches, which take up space without being terribly useful):

- /.local/share/Trash/
- /.mozilla/firefox/*/Cache/
- /.thumbnails/

I run this about as often as I can remember to, and before I shut down my laptop to take it somewhere. That's all there is to it.

With this measure, I can be quite confident that were my laptop to vaporize, I would lose nothing at all. It has the fortunate side effect of making it super easy to reinstall an operating system.
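
If remembering to run it is the weak link, the command can also live in a small script and be run from cron or anacron. This is just a sketch reusing the example paths above; the absolute path to excluded-files is an assumption.

#!/bin/sh
# backup.sh -- wrapper around the rsync command above, suitable for running
# by hand, before shutdown, or from a daily cron/anacron job.
rsync -vaxE --delete --ignore-errors --delete-excluded \
      --filter="merge /home/phil/excluded-files" \
      /home/phil/ remotehost:/path/to/backup/destination/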

Linus Torvalds on Git

I finally got around to watching the video of the tech talk that Linus gave at Google discussing the design of Git.

In this video, Linus explains a lot of the advantages of using a distributed system. But it is also enlightening because it's a window into Linus's motivations: he discusses the ways in which his own needs— as a system maintainer— drove the design of the system, in particular in the areas of workflow, speed, and data integrity.

One interesting idea is that in DVCS, the preferred development workflow (you pull from a small group of people you trust, who in turn pull from people they trust...) mirrors the way humans are wired to think about social situations. You cannot directly trust a huge group of people, but you can transitively trust many people via a web of trust, a familiar concept from security. A centralized system cannot scale because there are n² pairs of conflicts waiting to happen, and they will happen, because groups of people are distributed (not everyone is in the same room at the same time on the same LAN). But a DVCS workflow can scale, because it is fundamentally based on interactions between people and not on the artificial technical requirement that there has to be a single canonical place for everything.

Warning: Linus has strong opinions. I think he refers to at least three different groups of people as "ugly and stupid" in the course of his 70-minute talk.

A million lines of Lisp

People rave about Lisp, and one reason is that you can use macros to make new kinds of abstractions. Sure, in other languages you can write functions to define new procedures, but in Lisp you can write macros to define new control flow constructs (for starters). You can write functions that write functions and programs that write programs, a power which is pretty much unimaginable in most other languages. In other languages you can only stack your abstractions so high, but in Lisp, the sky's the limit. Because of macros and its other features, Lisp is sometimes called the programmable programming language.

I suspect that because of macros, Lisp programs grow in a way that is qualitatively different from programs in other languages. With every line of additional code in Lisp, you can do more, more quickly; in ordinary languages, the return on additional lines of code is constant if not decreasing. Various people have tried to estimate the advantage that Lisp has over C++ (in the ratio of the number of lines needed to do a particular task). Whatever the figure, I believe that Lisp's advantage should increase for larger and larger programs.

In pretty much any language, a program with a million lines of code is regarded with considerable respect. So if Lisp can do so much more with so much less, what would a million line Lisp program look like? Could we even imagine it? (Would it be sentient, or what?)

It turns out that such a program exists, and is widely available.

As of right now (23 June 2008), a checkout of GNU Emacs has 1,112,341 lines of Lisp as well as 346,822 lines of C.

It is somewhat astonishing that Emacs, a program with more than 30,000 built-in functions (all in a single namespace), can keep from falling apart under its own weight! In Emacs, activating minor or major modes, or setting variables, can have effects on how particular buffers look, what is done when Emacs saves a file, and how different keys (or other events) are interpreted, among other things. Unlike operating systems, which are broken down into programs (which only talk to each other in limited ways), Emacs has many parts which all have the potential to interact with each other.

In some cases it is necessary to add special-case code to say what should happen for particular pairs of interacting features, but a robust system should be able to make the right thing happen most of the time even for features which are totally oblivious of each other. One way that Emacs tames this complexity is with the use of hooks.

Hooks (sometimes referred to elsewhere as listeners) are a way of telling Emacs to run a particular piece of code whenever a certain situation arises (e.g. when a file is opened). Many Emacs modules add hooks in order to do their thing. For example, VC checks every time I open a file whether it is in a version-controlled directory, so that it can report the file's version and status. Another example: Emacs activates follow-mode, if I have set a particular option, whenever I open a file. The variable find-file-hook is a list containing, among others, two functions responsible for performing the tasks described above; each function in that list is executed whenever Emacs opens a new file. If I were to add a function of my own to one of these hooks, Emacs would happily run it, too.

As an alternative, you might consider implementing such an extensible system using polymorphism and inheritance: e.g. to add behaviors, you would create a subclass of a particular class which implemented some method differently. The chief advantage of using hooks over that approach is that with hooks, changing behavior is very obviously additive: if you can call for either A, B, or C to happen in a particular situation, you can also easily call for any combination of those things, but A, B, and C can very well be implemented completely independently— a key requirement for systems of a million lines or more.

My Top Ten Essential Emacs Tips

I wrote a brief article detailing what I consider to be the top ten can't-live-without-them Emacs features.

Switching to Openbox

I switched window managers recently, to Openbox. I also switched panels from gnome-panel to pypanel.

My new setup has these chief advantages over the old setup:

  1. Speed. Getting from GDM to a ready desktop is much faster than it was under Gnome. I blame gnome-session and the gnome-panel.
  2. Openbox is much more flexible.
    • I can rebind the window management keys, so I'm now using the Windows key (which was previously not seeing much use). I've bound W-Tab to serve the same purpose that Alt-Tab used to. I've bound W-j and W-k to switch virtual desktops, and W-1, W-2, and other keys to do window management tasks (maximizing, minimizing, etc.). No more reaching for the arrow keys or the function keys to complete common tasks. And it frees up Alt-Tab, which Emacs uses.
    • I can bind keys to other actions: for example, W-c starts up a terminal and W-b starts up a web browser.
    • I can bind mouse actions, too. I've set up W-drag to move windows (just as Alt-drag usually does), and W-right-drag to resize them, which is a lot better than trying to grab window borders (hello, Fitts's Law!).
    • The configuration files can be easily version-controlled, so I synchronize my Openbox settings everywhere. This is not true of my Gnome/Metacity setup.
  3. Space. I turned off window decorations to save space. Who needs window decorations? I don't need them to see what application I'm using because I have a panel. I don't need them to move windows because W-drag is faster. And I don't need them for window context menus because W-SPC already gets me one.

To install it on Debian/Ubuntu: apt-get install openbox

The kicker is that Openbox was apparently designed with Gnome interoperability in mind. And it is easy to switch back and forth while you are testing: all you need to do is log out to get back to GDM and then select either the "Gnome" or "Openbox" session from the Session menu. No futzing with files in your home directory to select your session. Moreover, the default Openbox setup loads a bunch of stuff that your GTK+/Gnome apps need to work. So your GTK themes and Gnome settings will be just fine, D-Bus and other session services will be launched as normal, and all that. It was surprisingly painless to switch.

To configure Openbox, after installing it, copy the XML files in /etc/xdg/openbox to ~/.config/openbox. Check out the default configuration to see what you are in for. Remember that when you log in to Openbox, right-clicking on the desktop gets you a menu, which is probably enough to get you out of a jam. Enjoy!
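
For reference, that first copy step amounts to something like this (the wildcard assumes the stock files, typically rc.xml and menu.xml):

mkdir -p ~/.config/openbox
cp /etc/xdg/openbox/*.xml ~/.config/openbox/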

Installing Ubuntu from hard disk + grub

Previously, I praised Debian for supporting an installation method ("hard disk booting") that only requires an existing filesystem and Grub, and can be kicked off just by downloading two files and asking Grub to boot from them. This is really convenient because you can install a full system without a CD (or USB key, or any media), without PXE (or configuring any other hosts on your LAN), and with a download that is quite small.

Well, it turns out that Ubuntu supports this installation method too (which is, now that I think about it, not surprising). It just doesn't seem to be advertised anywhere! (I suppose that instead of praising Debian I should praise the Debian documentation.) I just used it to install Ubuntu with encrypted LVM on my Thinkpad X61s, a machine which would otherwise be nontrivial to reinstall because it has no optical drive.

Here are the links to the downloads (the "netboot" installer): i386, amd64. Download initrd.gz (the installer's ramdisk image) and linux (the kernel). Make sure you put them somewhere Grub knows how to get to (i.e. not on a networked or encrypted volume; /boot is a good place); there is a sketch of this download step after the list below. Then do the following to boot into your new installer (more complete instructions from Debian):

  1. Restart your computer and wait for Grub to load.
  2. Find some existing boot entry and press e to edit it.
  3. Edit the root line to make sure that it corresponds to the partition where you downloaded the files. It might already be correct.
  4. Edit the kernel line so it reads kernel /boot/wherever/you/put/linux (on recent grubs, you may have to use linux instead of kernel).
  5. Edit the initrd line so it reads initrd /boot/wherever/you/put/initrd.gz
  6. Press b to boot.

Enjoy your new installer!
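
For the record, the download-and-place step (done before rebooting) might look something like the following. NETBOOT_URL is just a stand-in for whichever of the links above you picked, and /boot/newinstall is an arbitrary but convenient spot.

# NETBOOT_URL is a placeholder for the i386 or amd64 netboot directory
# linked above; /boot/newinstall is one Grub-readable place to put the files.
sudo mkdir -p /boot/newinstall
cd /boot/newinstall
sudo wget "$NETBOOT_URL/linux" "$NETBOOT_URL/initrd.gz"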

Update, 19 Jun 2008: I have documented this procedure in the Ubuntu Wiki here.

Using wget or curl to download web sites for archival

wget is useful for downloading entire web sites recursively. For archival purposes, what you want is usually something like this:

wget -rkp -l3 -np -nH --cut-dirs=1 http://web.psung.name/emacstips/

This will start at the specified URL and recursively download pages up to 3 links away from the original page, but only pages which are in the directory of the URL you specified (emacstips/) or one of its subdirectories.

wget will also rewrite the links in the pages it downloads so that your local copy is usable on its own, and it will download all page prerequisites (images, stylesheets, and the like).

The last two options -nH --cut-dirs=1 control where wget places the output. If you omitted those two options, wget would, for example, download http://web.psung.name/emacstips/index.html and place it under a subdirectory web.psung.name/emacstips of the current directory. With only -nH ("no host directory") wget would write that same file to a subdirectory emacstips. And with both options wget would write that same file to the current directory. In general, if you want to reduce the number of extraneous directories created, change cut-dirs to be the number of leading directories in your URL.
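
Concretely, here is where the example page ends up under each combination of those options:

wget -rkp -l3 -np http://web.psung.name/emacstips/
#  -> web.psung.name/emacstips/index.html   (host and path directories kept)
wget -rkp -l3 -np -nH http://web.psung.name/emacstips/
#  -> emacstips/index.html                  (host directory dropped)
wget -rkp -l3 -np -nH --cut-dirs=1 http://web.psung.name/emacstips/
#  -> index.html                            (leading directory cut as well)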

Bonus: downloading files with curl

Another tool, curl, provides some of the same features as wget but also some complementary features. One thing that curl can do is to download sequentially numbered files, specified using brackets [..]. For example, the following string:

http://www.cl.cam.ac.uk/~rja14/Papers/SE-[01-24].pdf

refers to the 24 chapters of Ross Anderson's Security Engineering: http://www.cl.cam.ac.uk/~rja14/Papers/SE-01.pdf, http://www.cl.cam.ac.uk/~rja14/Papers/SE-02.pdf, etc., http://www.cl.cam.ac.uk/~rja14/Papers/SE-24.pdf.

You can give curl a pattern for naming the output files. For example, if I wanted the files to be named SE-chapter-01.pdf, etc., then the appropriate curl incantation would be:

curl http://www.cl.cam.ac.uk/~rja14/Papers/SE-[01-24].pdf -o "SE-chapter-#1.pdf"

In addition to specifying consecutively numbered files, you can also use braces {..} to specify alternatives, as you would in a shell, e.g. http://web.psung.name/page/{one,two,three}.html. Specifying output patterns with "#1" works with braces too.
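
For example (the output name page-#1.html is just an illustration), quote the URL so that curl, rather than the shell, gets to expand the braces:

# curl substitutes whichever alternative matched for #1 in the output name.
curl "http://web.psung.name/page/{one,two,three}.html" -o "page-#1.html"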

Managing dotfiles with git, continued

Previously, I commented on my setup for keeping my dotfiles (.emacs, .bashrc, etc.) synchronized using git. Here is one refinement I've made to this process in the meantime.

Allowing local customizations

Occasionally there are changes I'd like to keep local to one machine. These may be on a permanent basis (for example, if there are certain things I'd like to happen, or not happen, on my laptop but not my desktop) or on a temporary basis (if I want to test out some change locally for some time before pushing it to my canonical repo). In version control a setup like this is best represented using branches. Here's how I've done this:

The master branch contains customizations that are supposed to be common to all machines and are appropriate to apply everywhere. Most changes are of this form. But on each machine I maintain a local branch named, for example, laptop-custom. This branch contains all the changes in master, plus usually no more than a couple of changes specific to that machine.

To initially set this up, after making a clone, I create a new branch and switch to it. Most of the time I stay on this branch.

git checkout -b laptop-custom

When I make changes, they initially go into laptop-custom as local changes:

emacs # make some changes...
git add ...
git commit

If I decide a change is appropriate to apply everywhere, I put it on the master branch by using git-cherry-pick. I then rebase the local branch so that the local patches are always at the "end". When you cherry-pick a patch to master and then rebase the other branch, git recognizes that the patch has already been applied on master and does not attempt to apply it again. So as you move changes over, the number of patches which are exclusive to the local branch decreases.

git checkout master
git cherry-pick ccddeef
git checkout laptop-custom
git rebase master

Pushing and pulling the master branch (containing the common customizations) is done in the same way as before, except that I always rebase the local branch afterwards.

git checkout master
git pull
git push
git checkout laptop-custom
git rebase master

For the benefit of posterity

index-pack died of signal 25

may occur when you try to pull from a repo created with git 1.5.x using git 1.4.x. Just wanted to put that out there for Google.

FreeRunner entering mass production

Word on the OpenMoko community list is that FreeRunner has been cleared to enter mass production.

FreeRunner will be the world's first freed phone, and it is arriving not a minute too soon. Mobile phones are now everywhere, and they are becoming the premier mode of communication and computation for many, especially in the developing world. Mobile phones can deliver on the promise of ubiquitous computing— but only if they have been freed.

For the mobile phone, or any technology, to realize its true potential, the ones with the incentive to see it improve— the users— must have the power to improve it. That is as sure a law as there ever was one, and should be pretty apparent to anyone who has taken an economics class. Unfortunately, essentially all phones sold today are deficient in that respect.

The power to improve the system may, of course, be exercised directly (if I do some work myself) or indirectly (if I pay someone else to do it). But when this power is totally sequestered away, that necessarily puts a damper on innovation. This is the case with any proprietary software product: the vendor is the only one with the power and the right to make changes to the software. Sure, you could attempt to pay the vendor to make the changes. But they, being the only ones who can do it anyway, will charge monopoly prices. And they can refuse to do it at all if doing so would, for example, cut into sales of another of their products. So as long as they are the sole entry point, you are beholden to them.

Even if one can assume that the vendor is generally benevolent, they still have a finite amount of resources. They cannot entertain implementation requests from every guy in his office, school, or lab. And that is unfortunate because one of those people has the next big thing on his hands. Creativity is everywhere.

The two great revolutions in computing, the rise of the PC and the emergence of web applications, demonstrate that freedom leads to the kind of innovation that transforms people's lives. It is no accident that the explosion in personal computers happened on the platform that had commodity hardware, not the one with a single hardware vendor. And I can say with some confidence that the web would not be what it is today had AOL (yes, remember AOL?) been its sole gatekeeper for both access and content.

The mobile phone ecosystem is still in its infancy. Today, mobile phone software and hardware do not support (and sometimes actively inhibit) using a device to its fullest. But when (and only when) mobile phones are unshackled, we will see creative innovations that we can probably not even imagine today. When mobile phones are truly ubiquitous they will be not just devices for communication but also for computation, sensing, and entertainment, and they will be deeply integrated into the activities of our lives.

One of the goals for FreeRunner is to have a phone which runs on free software, but what is neat about OpenMoko is that they realize that they are not just a software project. They are doing whatever it takes to help the mobile phone reach ubiquity. OpenMoko released the CAD files for the case of the FreeRunner— people are talking about machining cases in different colors, alternate styles, even a bicycle mount for the FreeRunner. I cannot wait to see what is next.

Comparing directory trees with diff or rsync

When you are trying to figure out whether (and how) the contents of two directories differ, there are a couple of standard ways to use diff.

diff -Naur DIR1 DIR2 shows all differences between the two directories as a unified diff:

$ diff -Naur website website-new
diff -Naur website/index.shtml website-new/index.shtml
--- website/index.shtml        2008-05-22 20:16:12.000000000 -0400
+++ website-new/index.shtml    2008-06-04 12:10:50.000000000 -0400
@@ -14,6 +14,7 @@
 <!-- page body -->
 <div id="body">

+<p>Welcome!</p>

 <p>
   <b>About:</b> This subject is aimed at students with little or no
diff -Naur website/style.css website-new/style.css
--- website/style.css  2008-04-11 01:25:12.000000000 -0400
+++ website-new/style.css      2008-06-04 12:11:01.000000000 -0400
@@ -24,7 +24,7 @@
     color: white; text-decoration: none; font-weight: bold; padding: 0 0.25em;
 }

-div#body { padding: 0.1em 55px 2em 55px; font-size: small }
+div#body { padding: 0.1em 55px 2em 55px; font-size: medium }

 dd { margin-bottom: 1em }

On the other hand, if you just want to get a quick sense of the differences, diff -qr DIR1 DIR2 merely names the differing files:

$ diff -qr website website-new
Files website/index.shtml and website-new/index.shtml differ
Files website/style.css and website-new/style.css differ

rsync can do something similar, and it works even when the files are not on the same host. rsync -rvnc --delete DIR1/ remotehost:path/to/DIR2/ (the trailing slash on DIR1/ is important here!) will tell you what files rsync would have updated or deleted on the remote host. (The -n option makes rsync do a "dry-run", meaning it makes no changes on the remote host.)

$ rsync -rvnc --delete website/ laptop:projects/website/
deleting schedule.shtml
style.css

The -c option is used because we're paranoid: it forces rsync to compute and compare checksums for each file to verify that they really are the same. Otherwise, rsync assumes that files are the same if they have the same timestamp and size, so if timestamps were not preserved when you made the copy, identical files may be reported as differing. If that is acceptable, you can omit -c to get a speedup.
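
For instance, if the copy was originally made in a way that preserved timestamps (say, with rsync -a or cp -a), the faster check is simply:

# Same comparison, but trusting size and timestamp instead of checksums.
rsync -rvn --delete website/ laptop:projects/website/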