WebCert+ Phishing Attempt
UPDATE: Definitely phishing.
I recently got an email ostensibly from Bank of America. It said I needed to sign up for their new “WebCert+” service, and that if I didn’t, my account would be suspended and I would be charged a hefty $45 reactivation fee. I received one email (which went to Spam) containing an embedded web form asking for all kinds of personal information, and three others adjuring me to fill it out lest terrible things happen to me and my bank account.
I think this is a phishing attempt. I want to know if there are others getting the same thing and if anyone can confirm that it is illegitimate.
My first clue that it’s a phishing attempt is that it’s sent from alerts@bankofamericaalerts.0nlinereport.com. These days it’s so easy (right?) to spoof a FROM address, you wonder why more people don’t.
Second was the embedded form. Who sends embedded email forms? I should have to log into my account online and _then_ fill out the form.
But all of that is circumstantial.
Anyone else see this or can confirm that it is attempted thievery?
iPod Touch Troubles
My 1st generation iPod Touch would not sync today. As soon as I plugged the USB cable into the computer, iTunes would register the new iPod’s presence, but immediately error out with:
The iPod Minerva cannot be synced. The required disk cannot be found.
After googling a bit, it seemed most people solved this by removing or plugging in a different device’s USB cable right after hitting the Sync button. Unfortunately, this didn’t work for me because iTunes started syncing automatically as soon as I plugged my iPod in, no matter what options I set. Finally, I stumbled on one person’s advice: the cable is bad.
Or at least unable to sufficiently communicate with iTunes.
So I switched cables and everything’s working fine now.
Getting mGSD to work on Chrome under Ubuntu
mGSD is a Getting Things Done organizer for your web browser. It’s based on TiddlyWiki. It’s pretty neat.
Anyway, out of the box you can’t use mGSD on Google Chrome, because it needs a Java plugin and the ability to store cookies. The former may already work for you, but you’ll need a special flag for Chrome to let a local HTML file store cookies.
I’m using Ubuntu 11.04 (64-bit) with the Unity interface on my new netbook. Also, I’m not using Chromium, which is available through Synaptic; I’m using a Chrome build I downloaded from Google. Anyway, here’s how you do it.
Edit the file ~/.local/share/applications/google-chrome.desktop. Scroll to the bottom where you see the line
Exec=/opt/google/chrome/google-chrome %U
Change it to:
Exec=/opt/google/chrome/google-chrome --enable-file-cookies %U
Next, remove OpenJDK:
sudo apt-get remove default-jdk openjdk-6-jre openjdk-6-jdk
sudo apt-get autoremove
And install Sun’s JDK:
sudo apt-get install sun-java6-jdk sun-java6-jre sun-java6-fonts
Finally, tell Chrome about the new plugin. If you’re on a 32 bit machine, use i386 instead of amd64 in the next command.
sudo mkdir -p /opt/google/chrome/plugins
sudo ln -s /usr/lib/jvm/java-6-sun/jre/lib/amd64/libnpjp2.so /opt/google/chrome/plugins/
Then restart Chrome, and you should see the Java plugin if you browse to chrome://plugins. Now try running and saving your very own mGSD.
New versions of funsat and bitset
I wrote (and sometimes maintain) two Haskell programs: funsat and bitset. Funsat is a native Haskell CDCL (conflict-driven clause-learning) SAT solver. Bitset is a small library for representing sets of items using bits under the hood (as opposed to trees, which are common in functional programming).
I just released funsat 0.6.2 and bitset 1.1.
Funsat 0.6.2 has some compatibility changes, so it now works with GHC 6.12. It is also cabal-installable now. (Before, it would die because of stupid dependencies, or something.)
Bitset 1.1 now gives you access to the Integer that is used in the underlying bit representation.
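The bit-level representation is easy to picture with a small sketch. It assumes nothing about the bitset package's actual API. The names below (BitSet, insertSet, toBits, and so on) are hypothetical. A set of small non-negative Ints can be packed into an Integer using Data.Bits from base:

```haskell
import Data.Bits (clearBit, popCount, setBit, testBit, (.&.), (.|.))

-- A set of small non-negative Ints packed into the bits of an Integer.
-- Hypothetical names for illustration; this is not the bitset package's API.
newtype BitSet = BitSet Integer
  deriving (Eq, Show)

emptySet :: BitSet
emptySet = BitSet 0

-- Element i is present iff bit i is set. Elements must be non-negative.
insertSet :: Int -> BitSet -> BitSet
insertSet i (BitSet n) = BitSet (setBit n i)

deleteSet :: Int -> BitSet -> BitSet
deleteSet i (BitSet n) = BitSet (clearBit n i)

memberSet :: Int -> BitSet -> Bool
memberSet i (BitSet n) = testBit n i

-- Union and intersection are single bitwise operations on the Integer.
unionSet, intersectSet :: BitSet -> BitSet -> BitSet
unionSet (BitSet a) (BitSet b) = BitSet (a .|. b)
intersectSet (BitSet a) (BitSet b) = BitSet (a .&. b)

-- Cardinality is the number of set bits.
sizeSet :: BitSet -> Int
sizeSet (BitSet n) = popCount n

-- Expose the underlying Integer, in the spirit of what bitset 1.1 added.
toBits :: BitSet -> Integer
toBits (BitSet n) = n
```

For example, inserting 3, 0 and 5 into emptySet yields a set whose toBits is 41, since bits 0, 3 and 5 are set (1 + 8 + 32).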
Functional Priority Queues
In a previous post on my SAT solver I promised to write about a dead end I hit using functional priority queues. I believe the most efficient data structure for dynamic variable ordering is a priority queue. In fact, it looks like a job for a Fibonacci heap. According to CLRS, “Fibonacci heaps are especially desirable when the number of Extract-Min and Delete operations is small relative to the number of other operations performed.” At least when just a few decisions lead to a conflict, we might end up adjusting the priorities of many variables (to handle the conflict), so that there’s more adjusting than deciding.
The asymptotic complexity of the Fibonacci heap works out because of its mutable tree structure. Specifically, adjusting the key of a node in the heap requires the caller to provide a pointer to that node in the heap, and that node needs to be able to navigate anywhere else in the heap. This can be done in Haskell (either with the IO monad or the ST monad), but the result isn’t elegant by any means. It also feels wrong — I program in a functional language because it lets me do equational reasoning, and think about data dependencies rather than state machine models of computation.
Functional Fibonacci Heap?
I tried for a while to see if I could come up with a Zipper view of the heap instead of the traditional linked-lists-of-heap-ordered-trees approach. That is, maybe a zipper could approximate a pointer. What I’d really need is a zipper with arbitrarily many points of access into the heap (instead of one), and I’ve no idea how to write that. Even then, any update of the heap would require updating all these zippers, which seems to destroy the complexity of the Decrease-Key operation.
Another option is to implement Decrease-Key by searching for the node to update, instead of providing a pointer. But I couldn’t figure out a way to integrate searching within the structure of a Fibonacci heap.
Eventually I considered this a dead end and posted to comp.lang.functional, where Ben Franksen pointed me to finger trees.
Finger Trees
Finger trees are a functional data structure for persistent sequences. Ralf Hinze and Ross Paterson described a Haskell implementation of 2-3 finger trees, which is available from Hackage, so I used it. In the paper they even describe how to implement max-priority queues on top of finger trees, which seemed like a good idea at first. Unfortunately, I don’t think that construction admits an efficient Decrease-Key, so I used the paper’s description of ordered sequences instead. This seems like the right thing to do, given that the paper says:
Ordered sequences subsume priority queues, as we have immediate access to the smallest and the greatest element, and search trees, as we can partition ordered sequences in logarithmic time.
The ability to act as a search tree gives us an efficient (sub-linear) Decrease-Key. Since this queue is a max-priority queue, that operation appears below as increaseKey. My priority queue operations are:
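-- Remove the entry with the largest key. The maximum sits at the left end
-- of the sequence. (Partial: the pattern fails on an empty heap.)
extractMax :: (Ord k) => Heap k a -> (Info k a, Heap k a)
extractMax (OrdSeq s) = (x, OrdSeq s')
  where x :< s' = viewl s

-- Raise the key of oldInfo to newKey, locating it by search rather than
-- by following a pointer into the structure.
increaseKey :: (Ord k, Eq a) => Info k a -> k -> Heap k a -> Heap k a
increaseKey oldInfo newKey (OrdSeq t) = OrdSeq (l' >< eqs' >< r')
  where
    -- l: the bigger side of the split; r: everything else
    (l, r)    = split (<= measure oldInfo) t
    -- eqs: the run of entries whose key equals the old key
    (eqs, r') = split (< measure oldInfo) r
    -- drop oldInfo itself from that run
    eqs'      = foldr (\i t' -> if i == oldInfo then t' else i <| t')
                      FT.empty eqs
    -- newInfo is bigger, so must fit on bigger side of split
    (OrdSeq l') = insert newInfo (OrdSeq l)
    newInfo   = oldInfo{ key = newKey }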
These implementations essentially do what I described in the previous section: implement a pointer approximation and integrate efficient searching. (Read the paper for more details.)
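For comparison, the same interface can be sketched with Data.Set from the containers package. Data.Set is a balanced search tree, not a finger tree, so this is a swapped-in structure and not what funsat uses. Entries are (priority, item) pairs, so the item breaks ties between equal keys; that is why items need Ord here rather than just Eq. Changing a key is a search-and-delete followed by an insert, both O(log n), with no pointer into the structure required. All names are hypothetical.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in for the finger-tree OrdSeq: a balanced search tree
-- of (priority, item) pairs, ordered by priority first, then by item.
newtype Heap k a = Heap (Set.Set (k, a))

emptyHeap :: Heap k a
emptyHeap = Heap Set.empty

insertH :: (Ord k, Ord a) => k -> a -> Heap k a -> Heap k a
insertH k x (Heap s) = Heap (Set.insert (k, x) s)

-- O(log n): the largest pair sits at the right end of the tree.
extractMax :: Heap k a -> Maybe ((k, a), Heap k a)
extractMax (Heap s) = fmap (\(m, s') -> (m, Heap s')) (Set.maxView s)

-- O(log n): find the old entry by search (no pointer needed), then reinsert
-- it under the new key. Works for any new key, larger or smaller.
increaseKey :: (Ord k, Ord a) => k -> a -> k -> Heap k a -> Heap k a
increaseKey oldKey x newKey (Heap s) =
  Heap (Set.insert (newKey, x) (Set.delete (oldKey, x) s))
```

Usage: after inserting (2,'b'), (1,'a') and (3,'c'), calling increaseKey 1 'a' 5 moves 'a' to the front. The next extractMax returns (5,'a').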
So What?
Well, I used finger trees in funsat. They were faster than whatever I was doing before.
Wondermark wonderland
Wondermark is a webcomic that I enjoy reading. It is one of the few that I read consistently. You should read it.
Here are a few choice ones that I enjoy:
- http://wondermark.com/566/ — supernatural collective nouns
- http://wondermark.com/554/ — mad libs
- http://wondermark.com/442/ — the best comeback ever
- http://wondermark.com/538/ — a popular show
What’s your favorite Wondermark?
Converting RealMedia Audio to MP3
I used mplayer and lame. MPlayer decodes the input rm audio stream into a WAVE file; lame encodes that to an mp3.
Just save the following script to a file and run it on your favorite rm file.
#!/bin/bash
# Usage: rm2mp3 FILE.rm   (writes the result into the mp3/ directory)
FILE="$1"
OUTDIR="mp3"
OUTPUT="$OUTDIR/$(basename "$FILE" .rm).mp3"
# We use a fifo file so that encoding the mp3 with lame can start immediately
# after decoding with mplayer starts.
FIFO=rm2mp3.fifo
if ! test -f "$FILE"; then
    echo "error: '$FILE' does not exist" >&2
    exit 1
fi
if ! test -p "$FIFO"; then
    mkfifo "$FIFO"
fi
if ! test -d "$OUTDIR"; then
    mkdir "$OUTDIR"
fi
echo "Input: '$FILE'"
echo "Output: '$OUTPUT'"
sleep 2 # Give time for user to kill if the input/output is wrong
# Show commands as they are executed.
set -x
# Send rm audio to fifo; pcm suboptions go in a single -ao (fast = not real time)
mplayer -ao pcm:fast:file="$FIFO" -vc null -vo null "$FILE" >/dev/null 2>&1 &
# Create MP3 from WAV
lame -h -V 6 "$FIFO" "$OUTPUT"
# Let mplayer exit before removing the fifo
wait
rm -f "$FIFO"
Please send along any improvements (such as better flags for mplayer/lame).
Dealing with Data Corruption with GIT, or, I’m Famous!
I store all the data I care about in git. A while ago I had updated a few files in one of my repositories when my computer kernel panicked. Right before the panic I was getting ready to push my changes to a backup repository on another machine. When I tried to push after rebooting, git noticed what turned out to be data corruption.
One minor aside: if I had been using some other VCS, one that did not hash everything I put into it, I probably wouldn’t have had to deal with this problem right away. But then I would have noticed the corruption only much later, after I’d forgotten all the changes, when I ran into some hard-to-diagnose problem. Dealing with it up front is definitely better.
This prompted a thread on the git mailing list (the post contains a detailed description of the symptoms). The problem was that two git objects corresponding to two different, recent commits in my repository had been corrupted. Now, there are several ways one can proceed when dealing with corruption, roughly from simplest to hardest:
- Copy the corrupted objects from a backup repository.
- Replace the offending commit with a new commit altogether, re-creating approximately the same changes.
- Re-create exactly the changes that turn the commit-before-offending-commit into the offending commit.
I couldn’t take option 1, because I didn’t have a backup, and I couldn’t take option 3, because I didn’t remember the exact changes between the previous and offending commits. So I had to go with option 2, which, incidentally, is not covered explicitly in the relevant section of the manual. Fortunately, since I usually commit early and often, it was pretty easy for me to reproduce these changes.
Replacing a Corrupted Commit
As noted in the thread, my git repository was in the following state:
$ git fsck --full
error: 320bd6e82267b71dd2ca7043ea3f61dbbca16109: object corrupt or missing
missing blob 320bd6e82267b71dd2ca7043ea3f61dbbca16109
Jakub Narebski was kind enough to explain with diagrams how this is done. Assume that A is the SHA1 ID of the commit preceding the corrupt commit, and B is the ID of the commit immediately following. First, create a new branch, here called corruption, whose head is the commit before the corrupt commit.
$ git checkout -b corruption A
$ ... edit edit edit ...
$ git commit -a -m <something-like-corrupted-commit-msg>
The next step is teaching the git repository to ignore the corrupted commit. To accomplish this we use the undocumented grafts file. Conceptually, this file is extremely simple. It consists of any number of entries, one per line. Each entry is of the form:
<sha1-id> <sha1-id>*
that is, a SHA1 ID followed by zero or more IDs. The effect of this is that in your local repository, git will treat the first named object as having the parents given. In this way we can trick git by adding the following entry, where <new-commit> is the ID of the commit just made on the corruption branch:
$ echo B '<new-commit>' >> .git/info/grafts
At this point, switch back to master.
$ git checkout master
$ git fsck --full
There should be no errors. (There might be warnings.)
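The grafts semantics described above are simple enough to model directly. Here is a hedged Haskell sketch. It assumes commit IDs are plain strings and that a commit's recorded parents come from some lookup function; both are hypothetical here, as are all the names. A graft entry, when present, replaces the parents git would otherwise read from the commit object.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

type CommitId = String

-- Parse grafts-file text: each non-empty line is a commit ID followed by
-- zero or more parent IDs, separated by whitespace. Blank lines are skipped
-- because they do not match the (c:ps) pattern.
parseGrafts :: String -> Map.Map CommitId [CommitId]
parseGrafts txt = Map.fromList [ (c, ps) | (c : ps) <- map words (lines txt) ]

-- The parents git would use locally: a graft entry, if present, overrides
-- the parents recorded in the commit object.
effectiveParents
  :: Map.Map CommitId [CommitId]   -- parsed grafts
  -> (CommitId -> [CommitId])      -- recorded parents (hypothetical lookup)
  -> CommitId
  -> [CommitId]
effectiveParents grafts recorded c = fromMaybe (recorded c) (Map.lookup c grafts)
```

With the entry "B <new-commit>", effectiveParents reports <new-commit> as B's only parent. The corrupt commit is no longer reachable through B, which is why fsck stops complaining.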
Unfortunately, this is not the whole story. The grafts file is purely a local measure. Every clone of this repository will still have the corruption. So we have to teach git to write the grafts information directly into the history. Enter git filter-branch.
Replacing a Corrupted Commit For All Time
git filter-branch rewrites history, optionally passing each commit through filters that alter it. Because it honors the grafts file, running it with no filters at all carves the grafted parents into an actual git history.
(on master) $ git filter-branch HEAD
Rewrite (3/3)
Ref 'refs/heads/master' was rewritten
Now clones made from this repository get the rewritten history. Voilà!
You shouldn’t lie about being famous
Behold, recorded for all time in the git source tree:
commit e9039dd35194b7c1cf4ecd479928638166b8458f
Author: Linus Torvalds
Date: Tue Jun 10 18:47:18 2008 -0700
Consolidate SHA1 object file close
This consolidates the common operations for closing the new
temporary file that we have written, before we move it into
place with the final name.
There's some common code there (make it read-only and check
for errors on close), but more importantly, this also gives a
single place to add an fsync_or_die() call if we want to add
a safe mode.
This was triggered due to Denis Bueno apparently twice being
able to corrupt his git repository on OS X due to an unlucky
combination of kernel crashes and a not-very-robust
filesystem.
Signed-off-by: Linus Torvalds
Signed-off-by: Junio C Hamano
Also, I think I’m the reason for a recent patch adding a new git configuration option, core.fsyncobjectfiles, described as:
This is a total waste of time and effort on a filesystem that orders data writes properly, but can be useful for filesystems that do not use journalling (traditional UNIX filesystems) or that only journal metadata and not file contents (OS X’s HFS+, or Linux ext3 with “data=writeback”).
Linear-time First UIP calculation
In a previous post I mentioned I was using a super-linear algorithm for calculating the first unique implication point (UIP) learned clause in funsat. That algorithm follows the definition of first UIP directly and requires calculating graph dominators of an explicitly constructed conflict graph. By contrast, the linear-time algorithm described in the Minisat paper never explicitly constructs the graph; it merely inspects the trail in reverse, using the reason for each assignment to decide which literals belong in the conflict clause. The former algorithm has the advantage that it directly implements what it means to calculate the first UIP clause; in other words, it is easily seen to be correct. The latter isn’t obviously correct, but it is faster and leaner.
The Minisat paper only gives lightly documented pseudocode for the algorithm. There are no data structure invariants nor proof of correctness. Here’s my attempt at explaining how and why it works.
The implication graph is described well in this handbook chapter. Basically, it is a directed graph in which the nodes are literals from the assignment, and an edge x → y indicates the assignment x helped propagate the assignment y.
A UIP of an implication graph is a node at the current decision level d such that every path from the decision variable at level d to the conflict variable or its negation must go through it. Intuitively, it is a single reason at level d that causes the conflict. (This paragraph is from the same chapter.)
There may be many UIPs for the current decision level. The last decision variable is always a UIP. The first UIP is one with the shortest path to the conflict node.
From the UIP definition it is clear why graph dominators are involved: every UIP is a dominator of the conflict variables with respect to the last decision variable. My first implementation explicitly calculated those dominators, and chose the one closest to the conflict nodes.
Once the desired UIP is found, we have to calculate the corresponding learned clause. It turns out that good learned clauses correspond to cuts of the implication graph during a conflict (this is often called the conflict graph). The learned clause corresponding to a cut (S,T) is the set of nodes that are cut edge sources. Formally, this is the set $\{x \in S \mid \exists y \in T.\; (x \to y) \in E\}$, where $E$ is the edge set of the implication graph. To tie the knot, one need only know that the UIP u determines the cut (S,T) where $T = \{v \mid u \to^{+} v\}$, the proper descendants of u (which include the conflict node), and S contains every other node. This information is sufficient to calculate the learned clause corresponding to a UIP.
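The trail is the current assignment arranged in reverse chronological unit-propagation order (last assigned first out). The reason for a literal q is the set $\{p \mid (p \to q) \in E\}$, that is, its predecessors in the implication graph: the literals whose assignments propagated q.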
Algorithm
The algorithm outputs a learned clause (sequence of literals). There are a few important variables and conventions for the following pseudocode:
- p — Invariant: literal from the current decision level, initially the propagated literal that caused the conflict. The top of the trail is not p.
- c — Invariant: number of unprocessed but seen variables from current decision level, initially 0.
- We can mark a variable as seen. All variables are initially unseen.
- Every literal included in the learned clause has sign opposite what it does under the current assignment. (In the case of the conflicting literal, we include its negation.)
Onto the pseudocode:
Process literals starting with p until we process all the literals we see at the current decision level.
do
    Process literal p:
    foreach literal q in the reason for p
        if var(q)¹ is unseen
            mark var(q) seen
            if q is from the current decision level
                increment c
            else if q is from a lower decision level
                add q to the learned clause
    Select the next interesting literal to follow:
    do
        assign p to head of trail
        undo head of trail
    while p is unseen
    decrement c
while c > 0
By now, p is the first UIP node of the current decision level. Add the negation of p to the learned clause and output it.
1 var(x) is the variable corresponding to the literal x.
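Here is a minimal Haskell sketch of the pseudocode. It assumes DIMACS-style Int literals (negative means negated) and a hypothetical Solver record; none of these names come from funsat. It differs from the pseudocode in one way. Rather than starting from a single propagated literal p, it starts from the true literals that together falsify the conflicting clause, which is closer to how the Minisat paper seeds the loop. As in the pseudocode, every lower-level literal (including level 0) is added to the clause in negated form.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Literals are non-zero Ints; a negative Int is the negated literal.
type Lit = Int
type Var = Int

var :: Lit -> Var
var = abs

-- Hypothetical solver snapshot, for illustration only.
data Solver = Solver
  { trail    :: [Lit]              -- true literals, most recently assigned first
  , levelOf  :: Map.Map Var Int    -- decision level of each assigned variable
  , reasonOf :: Map.Map Var [Lit]  -- true literals that propagated a variable
  , curLevel :: Int                -- current decision level d
  }

-- First-UIP learned clause. The argument lists the (true) literals whose
-- conjunction falsifies the conflicting clause.
firstUIP :: Solver -> [Lit] -> [Lit]
firstUIP s conflictReason = go conflictReason (trail s) Set.empty 0 []
  where
    lvl q = Map.findWithDefault 0 (var q) (levelOf s)

    -- Process one reason set, then pick the next seen literal off the trail.
    go reason tr seen c learned =
      let (seen', c', learned') = foldl visit (seen, c, learned) reason
          (p, tr') = nextSeen seen' tr
          c'' = c' - 1                 -- p is now processed
      in if c'' > 0
           then go (Map.findWithDefault [] (var p) (reasonOf s)) tr' seen' c'' learned'
           else negate p : learned'    -- p is the first UIP

    -- Mark q seen; count it if it is at the current level, otherwise add
    -- its negation to the learned clause.
    visit (seen, c, learned) q
      | var q `Set.member` seen = (seen, c, learned)
      | lvl q == curLevel s     = (Set.insert (var q) seen, c + 1, learned)
      | otherwise               = (Set.insert (var q) seen, c, negate q : learned)

    -- "Undo the head of the trail" until we reach a seen variable.
    nextSeen seen (x : xs)
      | var x `Set.member` seen = (x, xs)
      | otherwise               = nextSeen seen xs
    nextSeen _ [] = error "firstUIP: trail exhausted"
```

Take this scenario: level 1 decides 5; level 2 decides 1 and propagates 2 (from 1), 3 (from 2) and 4 (from 2 and 5); then the clause (¬3 ∨ ¬4) is falsified. The sketch learns [-2, -5], that is (¬2 ∨ ¬5), with 2 as the first UIP.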
Correctness
The algorithm performs a backwards breadth-first search for the first UIP node. The trail is the BFS queue. The counter allows us to deduce when p is the closest dominator of the conflict variable. Recall that a node’s being seen means having been discovered as a reason during another node’s processing. At the bottom of the loop, the counter contains the number of unprocessed but seen nodes we know about which end a reverse path from the conflict variable backwards consisting only of current-level nodes (say it three times fast). When c reaches zero, it means there are no seen reverse paths back from the conflict node to the decision variable. Since p is from the current decision level, however, it must have a path from the decision variable. Therefore p must dominate all paths from the decision variable to the conflict variable. p is a UIP node. Moreover, since p is the first such node, it must be the first UIP node.
The first UIP learned clause is determined by the literals that cross the cut (S,T) determined by p, as indicated above. Every proper descendant of p is on the T side of the cut. Therefore, any lower-decision-level node must be on the S side of the cut. (If such a node x were on the T side, there would be a path from the decision node for d to x, and x would have level d, contradicting the assumption that x has a lower decision level.) The first such nodes encountered during traversal, as well as p, cross the cut. The algorithm includes exactly these variables in the learned clause.
Update 2011-12-24: Corrected algorithm to decrement c properly, as Peter reported in the comments.
ICFP Contest 2008 — The One Liners
This year I participated in the International Conference on Functional Programming (ICFP) contest with a friend, Sooraj Bhat. Our team was The One Liners. As is the fashion, and because we found the exercise rewarding, here is our official ICFP 2008 post mortem.
We used git to manage our source (tarball). Here are some basic statistics:
- We wrote 447 lines of Haskell (generated using David A. Wheeler’s ‘SLOCCount’).
- Individual commits: 206 (me), 41 (Lazy*)
- Changed lines: 120620 (me), 2953 (Sleepy)
Introduction
The task was to write software to control a Mars rover and make it to home base, while avoiding three types of obstacles: boulders, craters, and martians. The rover would ricochet after hitting a boulder, fall into craters, and get captured by martians. Thus, you really want to try avoiding all obstacles if you can, although boulders are the least penalising. Our basic strategy was to accelerate toward home base unless there was an obstacle along that path. If there was an obstacle, we’d pick a point to the left or the right of it, and go there, preferring directions we were already facing.
We chose Haskell because we’re both functional programming nuts. It worked out well. A priori, I thought speed would be an issue, but we never used more than 2% of our processor (both of us had dual-core machines), whereas the server was close to 100% constantly. Our model of the world is basically a list of all known static objects (boulders and craters). The collision avoidance code looks through this list for the closest object to avoid. I know that some other teams had cleverer obstacle representations (like quadtrees) to find the nearest obstacle more quickly. But it appears not to have been necessary, at least on the maps provided by the contest organisers.
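The basic strategy described above can be sketched as two steps. First, a linear scan of the obstacle list finds the nearest obstacle on the straight path home. Second, a detour waypoint is picked to one side of it, preferring the side the rover already faces. The circular obstacle model, the margin parameter and all names are assumptions for illustration; this is not the original contest code.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Vec = (Double, Double)

-- Hypothetical obstacle model: a circle (boulder or crater).
data Obstacle = Obstacle { center :: Vec, radius :: Double }
  deriving (Eq, Show)

sub, add :: Vec -> Vec -> Vec
sub (a, b) (c, d) = (a - c, b - d)
add (a, b) (c, d) = (a + c, b + d)

scale :: Double -> Vec -> Vec
scale k (a, b) = (k * a, k * b)

dot :: Vec -> Vec -> Double
dot (a, b) (c, d) = a * c + b * d

norm :: Vec -> Double
norm v = sqrt (dot v v)

-- Distance from point p to the segment from a to b.
segDist :: Vec -> Vec -> Vec -> Double
segDist a b p = norm (p `sub` (a `add` scale t ab))
  where
    ab = b `sub` a
    t  = max 0 (min 1 (dot (p `sub` a) ab / max 1e-9 (dot ab ab)))

-- Linear scan of the obstacle list: the nearest obstacle (to the rover)
-- that the straight path from pos to goal passes through, if any.
blocking :: Vec -> Vec -> [Obstacle] -> Maybe Obstacle
blocking pos goal obstacles =
  case filter (\o -> segDist pos goal (center o) < radius o) obstacles of
    []   -> Nothing
    hits -> Just (minimumBy (comparing (norm . sub pos . center)) hits)

-- A waypoint beside obstacle o, offset perpendicular to the rover's line of
-- sight. Both candidates are equally far from the rover, so the larger dot
-- product with the heading is the side the rover already faces.
sidestep :: Double -> Vec -> Vec -> Obstacle -> Vec
sidestep margin pos heading o
  | dot heading (left `sub` pos) >= dot heading (right `sub` pos) = left
  | otherwise = right
  where
    d        = center o `sub` pos
    (dx, dy) = scale (1 / max 1e-9 (norm d)) d
    offset   = radius o + margin
    left     = center o `add` scale offset (-dy, dx)
    right    = center o `add` scale offset (dy, -dx)
```

A list scan like this costs O(n) per query; a spatial index such as a quadtree would change only the blocking function.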
A Sad Tale
When the contest started, we got right to work. Sooraj went off to get lunch for three hours, and I prepared a script that would make a tarball of the repo from HEAD, and test each of the contest requirements (the README, bin/install, etc. are present and non-empty). I then wrote bin/install, which attempted to build our source using only the (paltry number of) packages available on the LiveCD.
Since one of the requirements was that bin/install should not attempt to write outside of the icfp08 directory, I planned to use the --package-db and --prefix Cabal flags to make it compile and install all the libraries to a place under icfp08. This was a great idea, except that every release of Cabal has bugs with this flag. (I did not realise this until several hours into trying this out.) Duncan Coutts (dcoutts on #haskell) was kind enough to apply a patch from the bleeding edge Cabal (1.5 branch) to the 1.4 branch for me, fixing the issue (I found out later it was only part of the issue, unfortunately). After a while, I stopped, confident I could probably figure it out later, and had better actually think about the problem.
Around 1330 EDT on Monday 14 July, we started preparing our submission. I again went to work trying to convince Cabal to install to and read from the right places. Every package built fine except network, which has some C glue code. For some reason I never figured out, Cabal wasn’t passing the -package-conf flag to GHC in this case.
In any case, 40 minutes before contest end, when I was feverishly trying to figure out why this wasn’t working, a message was posted saying that network package would be available. At that point I could have deleted 4 lines from bin/install and submitted, but I never saw the message. (Sooraj didn’t see it either. He was making cookies or something. Oh yeah, he wasn’t subscribed to the mailing list.) So, with great anguish, we didn’t actually submit anything.
In any case, we had a lot of fun, and here are our thoughts.
Thoughts
What Worked
- Since Sooraj is in Atlanta and I’m in New York, we used webcams and Skype for communication, and we were rarely not in communication. We recommend microphones you don’t have to wear.
- For tricky code, working together on the same code, rather than on independent subtasks, helped noticeably. It’s tempting to concentrate on the parallelism afforded by a team, but ours benefitted especially from synchronous activity.
For example, beeline was the name of our tactic which, given a desired end position, generated rover commands to control the steering and acceleration to send us roughly in that direction. I was working on this alone while Sooraj was working on other things, and both of us came up with mediocre half-solutions. When we started working together things started clicking, and we produced shorter, simpler, and correct code.
However, for straightforward stuff, parallelism worked well. For example, we needed code that would establish a TCP/IP connection and manage the incoming stream of messages, as well as send outgoing control messages. We also needed a parser that would take a message and turn it into a convenient Haskell data type. I worked on the former while Sooraj worked on the latter, and after an hour or two, when we had finished, each had at most one bug.
- git-gui: Sooraj, aka, the four-hour-dinner man, had never used git. git-gui, a graphical front-end for interacting with a git repository, made using git relatively painless. Personally, I’ve always found GUIs for version control to be inhibiting, but git-gui gives me exactly what I want, 95% of the time. I used it, too.
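The steering half of a beeline-style tactic, like the one mentioned above, can be sketched as choosing a turn direction from the signed angle between the rover's heading and the bearing to the target. This covers steering only. Several things are assumptions: angles in radians measured counterclockwise from the x axis, the tolerance parameter, and the Turn type. Acceleration and the contest's actual command protocol are left out.

```haskell
-- Hypothetical steering decision; not the original contest code.
data Turn = TurnLeft | Straight | TurnRight
  deriving (Eq, Show)

-- Signed angle (radians) from the heading to the bearing of the target,
-- normalized into (-pi, pi]; positive means the target is to the left.
angleTo :: (Double, Double) -> Double -> (Double, Double) -> Double
angleTo (x, y) heading (tx, ty) = normalize (atan2 (ty - y) (tx - x) - heading)
  where
    normalize a
      | a > pi    = normalize (a - 2 * pi)
      | a <= -pi  = normalize (a + 2 * pi)
      | otherwise = a

-- Turn toward the target unless it is already within the tolerance cone.
steer :: Double -> (Double, Double) -> Double -> (Double, Double) -> Turn
steer tolerance pos heading target
  | a > tolerance  = TurnLeft
  | a < -tolerance = TurnRight
  | otherwise      = Straight
  where
    a = angleTo pos heading target
```

For example, a rover at the origin heading along the positive x axis (heading 0) gets TurnLeft for a target at (0, 10) and Straight for a target at (10, 0).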
Advice for Next Time
- More testing tools: we later found out that other teams had come up with some neat tools for debugging and testing their rovers. One team even drew a map the same size as the rover’s, and on it had the rover’s heading vector and subgoal position marked. We should have made a similar effort in making the modify-test-feedback loop tighter. If we had, we would have found bugs earlier.
- We think this particular task would benefit from a higher-level control interface. We should have come up with several types of explicit goals which reflect a high-level structure, and which a series of translations turns into tactics over time.
One way in which we did this right is by explicitly choosing a goal location (either the home base or to the side of the nearest obstacle) for the rover, and generating commands to meet that goal. This is done by what we called the sidestep planner. However, we do not do this for the acceleration of the rover; we mostly just accelerate as much as possible. We should have come up with a way to describe acceleration goals, and planned using those.
- Use a console logger that can separate output from different programmers. This way each developer can insert his own debugging messages without creating a bunch of noise for the others. It looks like hslogger would fit the bill nicely. Early on in development, we used the Haskell equivalent of printf(), because that was really all we needed. After our parsing and basic tactic infrastructure was in place, we really didn’t need a logger anymore, so we got rid of the output.
- Have a quantitative, automatic way to assess progress over time, e.g. performance/score graphs. We didn’t spend any time coming up with a way to store our results so that we could compare to them later. This will save you time in the end, because you can quickly reject approaches that aren’t working.
Finally, our README
For posterity.
=== ICFP 2008 Contest Submission README ===
Team: The One Liners
Language: Haskell
Compiler: GHC 6.8.2
== Third Party Libraries ==
bytestring-0.9.1.0 -- Fast strings
Cabal-1.4.0.2 -- Build/package manager
mtl-1.1.0.1 -- Monad transformers
network-2.1.0.0 -- Network facilities
parsec-3.0.0 -- Parser combinators
== Overview of the modules / our strategy ==
Main --
Controller -- This contains our main logic. The basic structure
consists of a low-level routine that is responsible for
navigating to a specified location, and a high-level routine that
is responsible for choosing subgoal locations to go to.
Controller.Util -- Generically useful support routines for our
controller strategies.
Controller.World -- Any state that the controller wishes to
persist between runs.
Protocol -- Basic datastructures that hold information from the
messages.
Protocol.Wire -- Routines to read/write the messages from/to
the network.
Data.Vector -- Vectors in R^2
* — Warning: all denigrating comments toward Sooraj are more true sarcastic than they appear.