Friday, July 22, 2011

A Busy Summer

The summer is proving busier than expected. In June, I started taking remote classes at Stanford in pursuit of a Graduate Certificate in Data Mining and Applications through the Stanford Center for Professional Development (SCPD). So far, I am enjoying my first class, stats202. As I suspected, I really like working with data to find hidden patterns, trends, etc.

At the same time, I have a project at work with an aggressive mid-August deadline. So I've been putting in extra hours. The good news is that the project is going well and I'm doing lots of R programming to analyze performance data. More fun working with data. I like R and am gaining a lot of experience with it through work and school projects.

Any spare time not spent with family is being used to work on a few hobby projects. I'm building a Zen Toolworks desktop CNC machine so that my son and I can make cool things. I bought an Arduino and Ultimate Microcontroller Pack from the MakerShed that I'm having fun with. As if that weren't enough, I'm slowly putting together a plan to build a tricopter.

A busy summer indeed.

Friday, May 6, 2011

TomTom Foolery

Three years ago I bought a TomTom ONE XL for a family trip. It worked great, but I haven't used it much since then. We took another family trip a couple of weeks ago, and again I wanted to take the TomTom. I decided to update its maps, and an unexpected adventure ensued.

I bought an updated (and overpriced) map via the TomTom Home desktop application. Partway through copying the map to the device, it complained that it was out of space. A little digging revealed that there wasn't room for both the old and new maps. I looked for a way to uninstall the original map from the desktop application but couldn't find one. Luckily, OSX mounted the TomTom as a FAT-formatted disk, so I just deleted the old map files. No more out-of-space problem.

I confidently restarted copying the map to the TomTom but ran into another problem: the TomTom suddenly disconnected. Repeated attempts all ended with the same error message, that the USB device had unexpectedly disconnected. I tried different USB cables and ports, but nothing worked. With the trip looming, it was time for some hacking.

I opened an OSX Terminal and used dd to write files of increasing size to the TomTom. It consistently disconnected when writing files larger than 100MB. Writing smaller files with intervening pauses seemed to work OK. Now that I knew what I could do, I turned my attention to what I needed to do.
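A minimal sketch of that kind of probing loop follows. The function name probe_write_sizes, the test-file names, and the pause between writes (PAUSE, default 2 seconds) are illustrative assumptions, not the exact commands used. It writes zero-filled files of increasing size and stops at the first size that fails.

```shell
#!/bin/sh
# Hypothetical helper: probe_write_sizes DEST SIZE_MB...
# Writes zero-filled test files of increasing size into DEST (e.g. the
# device's mount point) and reports the first size whose write fails.
probe_write_sizes() {
    dest=$1
    shift
    for mb in "$@"; do
        f="$dest/ddtest_${mb}MB.bin"
        # bs=1048576 (1 MiB) is spelled the same for BSD (OS X) and GNU dd
        if dd if=/dev/zero of="$f" bs=1048576 count="$mb" 2>/dev/null; then
            echo "ok ${mb}MB"
        else
            echo "FAILED ${mb}MB"
            rm -f "$f"          # clean up any partial test file
            return 1
        fi
        rm -f "$f"              # free the space before the next, larger test
        sleep "${PAUSE:-2}"     # short pause between writes
    done
}
```

For example, `probe_write_sizes /Volumes/TOMTOM 25 50 75 100 125`, where /Volumes/TOMTOM stands in for wherever the device is actually mounted.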

I figured out that TomTom Home put the new map files at the path,

~/Documents/TomTom/HOME/Download/complete/map/USA_and_Canada/

The directory contained the files,

$ ls -lh
total 1790632
-rw-r--r--  jcardent  staff   7.7K Apr 13 16:25 USA_and_Canada-1.gif
-rw-r--r--  jcardent  staff   1.6K Apr 13 17:22 USA_and_Canada.gif
-rw-r--r--  jcardent  staff   2.6K Apr 13 17:30 USA_and_Canada.toc
-rw-r--r--  jcardent  staff   874M Apr 13 17:30 USA_and_Canada.zip
-rw-r--r--  jcardent  staff   305B Apr 13 17:22 activation.zip

Clearly, the file USA_and_Canada.zip contained the majority of the data. So I unzipped it and found,

$ ls -lh
total 1790504
-rw-r--r--  jcardent  staff   252K Jan 10 11:05 USA_and_Canada-308.meta
-rwxr-xr-x  jcardent  staff    60B Jan 10 11:05 USA_and_Canada.pna
drwxr-xr-x  jcardent  staff   3.1K Jan 10 11:04 brand
-rwxr-xr-x  jcardent  staff   446M Jan 10 11:04 cline.dat
-rwxr-xr-x  jcardent  staff   113M Jan 10 11:05 cname.dat
-rwxr-xr-x  jcardent  staff   123M Jan 10 11:05 cnode.dat
-rwxr-xr-x  jcardent  staff    23M Jan 10 11:05 cphoneme.dat
-rwxr-xr-x  jcardent  staff    56M Jan 10 11:05 faces.dat
-rwxr-xr-x  jcardent  staff    30B Jan 10 11:05 mapinfo.dat
-rwxr-xr-x  jcardent  staff    92M Jan 10 11:05 poi.dat
-rwxr-xr-x  jcardent  staff    15M Feb 18 10:53 tables.dat
-rwxr-xr-x  jcardent  staff   4.9M Jan 10 11:05 tmccodes.dat
-rwxr-xr-x  jcardent  staff   129B Jan 10 11:05 traffic.dat

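Three files over 100MB, oh bother. But there was no need to fear, for I was armed with dd. I proceeded to use a command like the following to copy the large files to the TomTom in 50MB chunks,

dd if=./<map file> of=/<TomTom path> bs=1024 count=51200 \
  iseek=<offset> oseek=<offset>

With bs=1024, count=51200 copies 50MB (51,200 blocks of 1,024 bytes). The iseek and oseek offsets are also counted in 1,024-byte blocks, so each successive chunk starts at the next multiple of 51200 (0, 51200, 102400, and so on).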
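Running that command by hand for every chunk gets tedious, so the loop can be scripted. The following is a sketch under stated assumptions: the function name chunked_copy, the 1 MiB block size, and the pause between chunks (PAUSE, default 5 seconds) are hypothetical choices, and it uses the portable skip/seek operands, which OS X's dd accepts as synonyms for iseek/oseek.

```shell
#!/bin/sh
# Hypothetical helper: chunked_copy SRC DST [CHUNK_MB]
# Copies SRC to DST in CHUNK_MB-sized pieces (default 50), syncing and
# pausing between pieces so no single write is very large.
chunked_copy() {
    src=$1
    dst=$2
    chunk_mb=${3:-50}
    bs=1048576                                   # 1 MiB dd block
    [ -f "$src" ] || return 1
    size=$(wc -c < "$src" | tr -d ' ')           # OS X wc pads with spaces
    chunk_bytes=$((chunk_mb * bs))
    nchunks=$(( (size + chunk_bytes - 1) / chunk_bytes ))   # round up

    : > "$dst" || return 1                       # start from an empty DST
    i=0
    while [ "$i" -lt "$nchunks" ]; do
        off=$((i * chunk_mb))                    # offset in bs-sized blocks
        # conv=notrunc keeps the chunks that were already written
        dd if="$src" of="$dst" bs="$bs" count="$chunk_mb" \
            skip="$off" seek="$off" conv=notrunc 2>/dev/null || return 1
        sync                                     # flush before pausing
        sleep "${PAUSE:-5}"
        i=$((i + 1))
    done
}
```

For example, `chunked_copy ./cline.dat /Volumes/TOMTOM/cline.dat 50`, where the destination is a hypothetical path to be replaced with the map's directory on the device. Emptying the destination first means a rerun after a failed attempt overwrites the partial copy cleanly instead of leaving stale bytes at the end.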

After an hour or so, all the data was on the TomTom. I disconnected and rebooted it only to get a "map not authorized" error. After some cursing, I recalled the other downloaded file, activation.zip. I unzipped it, copied its contents to a couple of places on the TomTom (I wasn't sure where they belonged), and rebooted. Woot! The updated map worked!

I'm happy to report that the TomTom worked flawlessly for our vacation.

The moral of the story: know and use your UNIX command-line tools.

Friday, April 29, 2011

Diving into R

I've wanted to learn R for a long time. A new project at work is providing an ideal opportunity to finally use it. So far, it's been a great experience. R is an incredibly powerful tool for data analysis. It's allowed me to dive deep into the project's data and automate much of the analysis process.

Programming in R has been easier than expected. I've previously programmed in Matlab, which has helped greatly. Some of the concepts are still foreign, but I'm confident that they will become less so with time.

The greatest joy has been getting "lost" for hours writing R functions to analyze the data and produce reports. R's interactive interface has made it easy to build up code in an exploratory manner. This is my preferred way to program; I find it lets me stay in a flow state for long periods of time. The experience has been very similar to programming in Lisp dialects, which I also deeply enjoy.

Although there is a lot of good information about R available for free on the web, I've found the following O'Reilly books the best resource for coming up to speed quickly,

A particularly powerful library is ggplot2 by Hadley Wickham. With it, I've been able to create very complex graphs and charts with minimal code. ggplot2 builds graphics in layers using a grammar of graphics, which can be challenging to learn at first. The website is informative, but the ggplot2 book has been the best resource and well worth the money.

Another useful library is brew, which I am using to auto-generate pleasant-looking PDF reports via LaTeX.

I look forward to working more with R. Data science is a growing interest of mine and this opportunity to use R is adding to the momentum.

Monday, April 11, 2011

Book Review: Final Jeopardy

Final Jeopardy: Man vs Machine and the Quest to Know Everything by Stephen Baker

I found the Watson exhibition very exciting, so I was eager to read Baker's new book, Final Jeopardy, which recounts the inception of IBM's Jeopardy Grand Challenge and the software team that met it by creating Watson. Although light on technical details, the book provides a good overview of the primary challenges. It also discusses the non-technical issues that the Watson and Jeopardy teams struggled with in staging the man-machine competition. Overall, a very good and enjoyable book. If you enjoyed Baker's The Numerati, you'll probably enjoy this book too.

The next challenge is to create a computer that can write Jeopardy questions rather than just answering them.

Monday, April 4, 2011

Seymour Cray Videos

I've long admired Seymour Cray as the genius behind early supercomputers such as the CDC 6600, the Cray-1, and later Cray systems. However, I knew little about Cray himself, so I was happy to discover two YouTube videos of Cray speaking about his career and systems.

In this 1976 talk, Cray describes the design of the Cray-1. Among other topics, he explains the factors that gave rise to the Cray-1's iconic shape.

Thirteen years later, in this talk, Cray discusses the design of the Cray-3 and Cray-4 systems and his decision to use gallium arsenide, then a leading-edge material. I wasn't aware of the three-dimensional modules used in the Cray-3. Cool stuff.

I enjoyed both talks. Cray was much more personable than I expected. He was very humble and claimed ignorance in a number of areas related to computing. It was refreshing to see someone of Cray's caliber display these characteristics.

It was amusing to see that the fundamental problems of building computing systems have remained the same for decades: speed, size, and power. The more things change, the more they stay the same.

Sunday, March 6, 2011

Quants: The Alchemists of Wall Street

Last week, I stumbled across a good documentary by VPRO on quantitative analysts. It features a couple of famous "quants", Paul Wilmott and Emanuel Derman, as well as Michael Osinski, who wrote the software used by many banks to securitize mortgages.

The documentary discusses the challenges associated with financial modeling. For example,

  • Many models were based on limited historical data that was insufficient to represent macroeconomic swings.
  • Many executives did not understand the technical aspects of financial modeling and were therefore unable to recognize the associated risks that led to the subprime crisis.

I strongly agree with Paul Wilmott on the following (paraphrased) point,

People who take risks should be compensated. But they should not be compensated for taking risks with other people's money.

Hear, hear. Wilmott is extremely impressive. When the subprime crisis hit, I was surprised to learn that he had been warning about model-related risks. Given how highly he is regarded in quant circles, I'm puzzled that his warnings were not better heeded.

Saturday, February 26, 2011

Commemorating Discovery's Last Launch

In commemoration of the Space Shuttle Discovery's last flight, I decided to post a link to the YouTube videos of MIT's Fall 2005 session of Aircraft Systems Engineering (16.885J). The course was co-taught by ex-shuttle astronaut Jeffrey Hoffman and ex-NASA official Aaron Cohen. It featured many guest speakers from the Shuttle program who went into a lot of technical detail about the system's design and operations.

All of the videos are good but my favorites are,

Whenever I need a hard-core technical fix, I watch one of these videos. Works every time. These were real engineers.

Other materials from this course are available on MIT's OCW website.

Thank you, Discovery, for twenty-seven years of service. It's disappointing that the space program is returning to rockets. It's just so 20th century.