Text search over vast amounts of text

I have vast amounts of text. I want to search for substrings. (Searching for regular expressions would be a bonus.) grep is too slow if I have many queries, which I do.

There must be some UNIX tool, or pipeline, that will create indices from the text and allow me to search them and correlate the matches to the files containing the matches. Does anyone know of a simple way?

When I find the answer I will update this post.

Getting mGSD to work on Chrome under Ubuntu

mGSD is a Getting Things Done (GTD) organizer for your web browser. It’s based on TiddlyWiki. It’s pretty neat.

Anyway, out of the box, you can’t use mGSD on Google Chrome because it needs a Java plugin and the ability to store cookies. The former may already work for you, but you’ll need a special flag for Chrome to grant a local HTML file the ability to store cookies.

I’m using Ubuntu 11.04 (64 bit) with the Unity interface on my new netbook. Also, I’m not using the Chromium available through Synaptic; I’m using a Chrome I downloaded from Google. Anyway, here’s how you do it.

Edit the file ~/.local/share/applications/google-chrome.desktop. Scroll to the bottom where you see the line

Exec=/opt/google/chrome/google-chrome %U

Change it to:

Exec=/opt/google/chrome/google-chrome --enable-file-cookies %U
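
If you would rather not edit the file by hand, the same change can be sketched as a small shell function. This is only a sketch: add_file_cookies_flag is a name I made up, it assumes the Exec lines start with exactly the path shown above, and it adds the flag to every such Exec line in the file. It keeps a backup with a .bak suffix.

```shell
#!/bin/bash

# Hypothetical helper: add --enable-file-cookies to the Exec lines of a
# Chrome launcher file. Assumes Chrome lives at /opt/google/chrome/google-chrome.
add_file_cookies_flag() {
    local desktop_file="$1"

    # Do nothing if the flag is already there, so running this twice is safe.
    if grep -q -- '--enable-file-cookies' "$desktop_file"; then
        return 0
    fi

    # Keep a backup, then rewrite the launcher from the backup.
    cp "$desktop_file" "$desktop_file.bak" || return 1
    sed 's|^Exec=/opt/google/chrome/google-chrome|& --enable-file-cookies|' \
        "$desktop_file.bak" > "$desktop_file"
}
```

You would call it as add_file_cookies_flag ~/.local/share/applications/google-chrome.desktop and then check the Exec line by eye.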

Next, remove OpenJDK:

sudo apt-get remove default-jdk openjdk-6-jre openjdk-6-jdk
sudo apt-get autoremove

And install Sun’s JDK:

sudo apt-get install sun-java6-jdk sun-java6-jre sun-java6-fonts

Finally, tell Chrome about the new plugin. If you’re on a 32 bit machine, use i386 instead of amd64 in the next command.

sudo mkdir -p /opt/google/chrome/plugins
sudo ln -s /usr/lib/jvm/java-6-sun/jre/lib/amd64/libnpjp2.so /opt/google/chrome/plugins/
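
The amd64 versus i386 choice can also be worked out from uname -m. Here is a rough sketch, assuming the JVM directory is named amd64 or i386 as in the command above; plugin_arch is a made-up helper name, and it only knows about those two cases.

```shell
#!/bin/bash

# Hypothetical helper: map a machine name (default: `uname -m`) to the
# directory name used under /usr/lib/jvm/java-6-sun/jre/lib/.
plugin_arch() {
    local machine="${1:-$(uname -m)}"
    case "$machine" in
        x86_64) echo amd64 ;;
        i?86)   echo i386 ;;
        *)      echo "unsupported machine: $machine" >&2; return 1 ;;
    esac
}
```

With that defined, the symlink source becomes /usr/lib/jvm/java-6-sun/jre/lib/$(plugin_arch)/libnpjp2.so.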

Then restart Chrome and you should see the Java plugin if you browse to chrome://plugins. Now try running and saving your very own mGSD.

Converting RealMedia Audio to MP3

I used mplayer and lame. MPlayer decodes the input RealMedia (rm) audio stream into a WAVE file; lame encodes that to an MP3.

Just save the following script to a file and run it on your favorite rm file.

#!/bin/bash

FILE="$1"

OUTDIR="mp3"
OUTPUT="$OUTDIR/$(basename "$FILE" .rm).mp3"

# We use a fifo file so that encoding the mp3 with lame can start immediately
# after decoding with mplayer starts.
FIFO=rm2mp3.fifo

if ! test -f "$FILE"; then
    echo "error: '$FILE' does not exist"
    exit 1
fi
if ! test -p "$FIFO"; then
    mkfifo "$FIFO"
fi
if ! test -d "$OUTDIR"; then
    mkdir "$OUTDIR"
fi

echo "Input: '$FILE'"
echo "Output: '$OUTPUT'"
sleep 2 # Give time for user to kill if the input/output is wrong

# Show commands as they are executed.
set -x

# Send rm audio to fifo
mplayer -ao pcm:fast:file="$FIFO" -vc null -vo null "$FILE" >/dev/null 2>&1 &

# Create MP3 from WAV
lame -h -V 6 "$FIFO" "$OUTPUT"

rm -f "$FIFO"

Please send along any improvements (such as better flags for mplayer/lame).
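
If you have a whole directory of rm files, a loop like the one below can run the script on each of them in turn. It is only a sketch: convert_all is a made-up name, and ./rm2mp3.sh is just my assumption about where you saved the script above. Files are converted one at a time because the script uses a single fifo name.

```shell
#!/bin/bash

# Hypothetical wrapper: run a converter on every .rm file in a directory.
# $1 is the directory, $2 the converter command (default: ./rm2mp3.sh,
# an assumed file name for the script above).
convert_all() {
    local dir="$1"
    local script="${2:-./rm2mp3.sh}"
    local f

    for f in "$dir"/*.rm; do
        # When nothing matches, the glob stays literal; skip it.
        [ -f "$f" ] || continue
        "$script" "$f" || echo "warning: conversion failed for '$f'" >&2
    done
}
```

For example, convert_all ~/recordings would leave the results in the mp3 directory under wherever you ran it from.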

New Blogger

I have recently been preparing some posts that require typesetting some math. Blogger does not support this. I have seen other Blogger blogs try to use HTML to typeset math… it’s not pretty.

I’m trying out WordPress, which supports embedded LaTeX. Here’s my blog.

I may be switching blogs soon. If so, it will be obvious.

1, 2, 3…

I would have found the following useful when I was first learning to count. Perhaps you would have found it useful, too.

I haven’t found any such matrix in any of the books I’ve studied on counting and probability. The information is all there, sure. But the matrix is helpful for the subtle distinctions.

Emanuela Arcadia Bueno

Please welcome my first daughter into the world. She was born on 30 December at 03:19, weighing 8 lbs 10 oz and measuring 20 inches long. There are other pictures.

Mother is very glad that she was born (finally… it’s always finally) and father is glad that she was born between semesters (phew).