Monday, July 19, 2010

Thoughts about thinking

Challenging work assignments, family matters, and home maintenance projects have kept me very busy lately - hence the significant drop-off in blog posts over the past couple of months. I'm hoping to correct that soon.

Not surprisingly, I've been thinking a lot about leisure time - in particular its benefits for creative thinking. My reflections are guided by two lectures on the topic.

The first is a Google Tech Talk by Dr. David M. Levy, No Time to Think. In the talk, Dr. Levy asserts that deep contemplation not only promotes creativity but also provides a sense of calm and satisfaction. He further argues that deep thinking cannot be forced directly - instead, we must make ourselves available to it by seeking out silence and sanctuary. Dr. Levy observes that this need for sanctuary conflicts with modern social pressures to multi-task, remain in constant communication, and solve problems through repetitive searches of existing information.

The second is a lecture, Solitude and Leadership, by William Deresiewicz at West Point in 2009. The talk is deeply insightful and worth reading in its entirety but, for this post, can be fairly summarized by the following three points:

  1. Original thinking is a core attribute of leadership.
  2. Formulating original thoughts requires long periods of concentration and distance from the thoughts of others.
  3. Quiet solitude is necessary to do both.

Like Dr. Levy, Mr. Deresiewicz laments social pressures to multi-task, remain in constant communication, and rely on existing information to solve problems. To quote,

Multitasking, in short, is not only not thinking, it impairs your ability to think. Thinking means concentrating on one thing long enough to develop an idea about it. Not learning other people’s ideas, or memorizing a body of information, however much those may sometimes be useful. Developing your own ideas. In short, thinking for yourself. You simply cannot do that in bursts of 20 seconds at a time, constantly interrupted by Facebook messages or Twitter tweets, or fiddling with your iPod, or watching something on YouTube.

and,

Here’s the other problem with Facebook and Twitter and even The New York Times. When you expose yourself to those things, especially in the constant way that people do now—older people as well as younger people—you are continuously bombarding yourself with a stream of other people’s thoughts. You are marinating yourself in the conventional wisdom. In other people’s reality: for others, not for yourself. You are creating a cacophony in which it is impossible to hear your own voice, whether it’s yourself you’re thinking about or anything else.

Thinking originally about difficult problems is an activity that I deeply enjoy and find satisfying. I reached conclusions similar to Dr. Levy's and Mr. Deresiewicz's years ago, and I have guarded my time and attention vigorously ever since. I've often been teased for my resistance to social media, and at times I've felt self-conscious about this choice - even deficient or outdated. These talks give me renewed confidence in my choices.

The challenge, as indicated at the outset of this post, is finding the time to think - a hard task given the pace of modern society, and of the high-tech industry in particular.

Wednesday, June 23, 2010

Org-mode hack: tasks done last month

I'm a big fan of Emacs's org-mode. Over the past year, I've started using it for everything - tracking tasks, taking notes, and drafting all my reports, papers, and blog posts. Org-mode is the only task-tracking software that I've used for more than a week.

At work, I am required to produce a monthly status report. To automate part of the process, I figured out a way to have org-mode produce a list of the tasks completed during a specific month. Since I couldn't find a similar example through a Google search, I thought I would post my approach for the benefit of others (and as a reminder to myself!).

Below is an example org file containing completed tasks that I'll use to illustrate the approach. The tracking closed items feature has been configured to add a time-stamp when each task is transitioned to the DONE state. The header specifies a category, Foo, that org will associate with all of the tasks in the file.

#+Category: Foo

* DONE Feed the dog
   CLOSED: [2010-04-30 Fri]

* DONE Mow the lawn
   CLOSED: [2010-05-01 Sat]

* DONE Take out the trash
   CLOSED: [2010-05-20 Thu]

* DONE Pay the bills
   CLOSED: [2010-06-01 Tue]

First, configure org-mode's agenda feature and use the C-c [ command to add the example file to the agenda files list.
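For reference, here is a minimal sketch of the corresponding init-file settings - the file path below is a placeholder rather than my actual setup:

(require 'org)

;; Add a CLOSED time-stamp when a task is marked DONE
;; (the "tracking closed items" feature mentioned above).
(setq org-log-done 'time)

;; C-c [ maintains this list interactively; it can also be set
;; directly. "~/org/foo.org" is a placeholder path.
(setq org-agenda-files (list "~/org/foo.org"))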

At this point, a list of the tasks completed in May can be produced by issuing the agenda tag matching command, C-c a m, and giving it the following match string:

CATEGORY="Foo"+TODO="DONE"+CLOSED>="[2010-05-01]"+CLOSED<="[2010-05-31]"

This should produce the following list (slightly reformatted to fit blog width):

Headlines with TAGS match: CATEGORY="Foo"+TODO="DONE"\
+CLOSED>="[2010-05-01]"+CLOSED<="[2010-05-31]"
Press `C-u r' to search again with new search string
  Foo:        DONE Mow the lawn
  Foo:        DONE Take out the trash

Although this works, entering the search string is a cumbersome task. A better solution would avoid this step.

Agenda provides a way to define custom commands that can perform searches using pre-defined match strings. The following elisp code defines a custom command that performs the above tag search automatically.

(setq org-agenda-custom-commands
  `(("F" "Closed Last Month"
     tags ,(concat "CATEGORY=\"Foo\""
                   "+TODO=\"DONE\""
                   "+CLOSED>=\"[2010-05-01]\""
                   "+CLOSED<=\"[2010-05-31]\""))))

After eval-ing this command, typing C-c a F will produce the same list as above without having to enter the match string. This approach is indeed better but uses a hard-coded match string. An even better solution would generate the match string based on the current date.

Although the call to concat in the example above programmatically generates the match string, it does so only when the setq is evaluated. If the setq lives in an initialization file (e.g. ~/.emacs), the match string will be generated based on the date Emacs was started, not the date on which the search is performed. This can produce erroneous searches when using an Emacs instance started before the turn of the month. In such cases, the setq could be re-evaluated manually to generate the correct match string, but an automatic solution would be better.

Unfortunately, org doesn't currently support providing a lambda to generate the match string at search time. For instance, this example:

(setq org-agenda-custom-commands
  `(("F" "Closed Last Month"
     tags
     (lambda ()
       (concat "CATEGORY=\"Foo\""
               "+TODO=\"DONE\""
               "+CLOSED>=\"[2010-05-01]\""
               "+CLOSED<=\"[2010-05-31]\"")))))

produces the error message "Wrong type argument: stringp, …". Patching org-mode to support lambdas for match strings is an option, but I prefer to leave the stock org-mode code unmodified.

Thanks to the near-infinite hackability of Emacs, it's possible to extend the stock org-mode functionality without modifying it directly. The elisp code below defines two new interactive functions that call into org-mode to perform a tag search for a specific month.

(require 'calendar)

(defun jtc-org-tasks-closed-in-month (&optional month year match-string)
  "Produces an org agenda tags view list of the tasks completed 
in the specified month and year. Month parameter expects a number 
from 1 to 12. Year parameter expects a four digit number. Defaults 
to the current month when arguments are not provided. Additional search
criteria can be provided via the optional match-string argument "
  (interactive)
  (let* ((today (calendar-current-date))
         (for-month (or month (calendar-extract-month today)))
         (for-year  (or year  (calendar-extract-year today))))
    (org-tags-view nil 
          (concat
           match-string
           (format "+CLOSED>=\"[%d-%02d-01]\"" 
                   for-year for-month)
           (format "+CLOSED<=\"[%d-%02d-%02d]\"" 
                   for-year for-month 
                   (calendar-last-day-of-month for-month for-year))))))

(defun jtc-foo-tasks-last-month ()
  "Produces an org agenda tags view list of all the tasks completed
last month with the Category Foo."
  (interactive)
  (let* ((today (calendar-current-date))
         (for-month (calendar-extract-month today))
         (for-year  (calendar-extract-year today)))
       (calendar-increment-month for-month for-year -1)
       (jtc-org-tasks-closed-in-month 
        for-month for-year "CATEGORY=\"Foo\"+TODO=\"DONE\"")))

The first function, jtc-org-tasks-closed-in-month, generates an appropriate query string and calls the internal org-mode agenda function org-tags-view. The function defaults to the current month but takes optional arguments for the desired month and year. The function also takes a match-string argument that can be used to provide additional match criteria.

The second function, jtc-foo-tasks-last-month, calculates the prior month and calls jtc-org-tasks-closed-in-month with an additional match string to limit the list to DONE tasks from the category Foo. Executing jtc-foo-tasks-last-month interactively automatically produces a list of the tasks closed in the prior month. For my purposes, this is close enough to the ideal solution. Using the optional match-string argument, I can re-use this solution to search for tasks completed in other categories or with specific tags.
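For illustration - the category and tag names here are hypothetical, not from my actual files - such reuse looks like this:

;; Tasks closed in April 2010 in a hypothetical category "Bar".
(jtc-org-tasks-closed-in-month 4 2010 "CATEGORY=\"Bar\"+TODO=\"DONE\"")

;; DONE tasks closed this month that carry a hypothetical :report: tag.
;; Omitting month and year is safe because concat ignores nil arguments.
(jtc-org-tasks-closed-in-month nil nil "report+TODO=\"DONE\"")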

My typical workflow is to archive the closed tasks after my status report is written. Org-mode's agenda makes this easy. First, I mark each task for a bulk operation by typing m on it. Then I perform a bulk archive by typing B $. This moves the closed tasks to an archive file - typically a file of the same name with an _archive suffix appended.
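The archive destination is controlled by the org-archive-location variable. Its default value, shown explicitly below, produces the sibling _archive file described above:

;; The default: archive entries to a file named after the source file
;; with "_archive" appended, filing them at the top level.
(setq org-archive-location "%s_archive::")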

Org-mode is a great productivity tool. Combined with Emacs's hackability, it lets you create tools optimized for your particular workflow.

Addendum

I found that searching on CLOSED date ranges didn't work in org-mode version 6.34a. The problem appears to be fixed in the 6.36c release, so be sure to have the right version if you want to replicate this method.

Tuesday, May 25, 2010

Book Review: The Quants

The Quants by Scott Patterson

In The Quants, Patterson provides an intriguing account of Wall Street's most successful quantitative analysts (aka quants) and the role they played in the subprime crisis.

The first few chapters introduce the main players and provide a brief introduction to quantitative finance. Patterson begins by describing how Ed Thorp applied his mathematics background and his experience pioneering blackjack card-counting techniques to invent various hedging strategies and start the first arbitrage hedge fund. From there, Patterson lightly introduces other important concepts like Brownian motion, random walk theory, the efficient market hypothesis, and statistical arbitrage. The introduction winds down with the October 1987 market crash, which is used as the context to introduce the "fat tail" contrary point of view personified by Benoit Mandelbrot and Nassim Nicholas Taleb - a not-so-subtle bit of foreshadowing.

The "middle" of the book discusses the background, career, and substantial success of primarily five high-profile quants: Pete Muller, Ken Griffin, Cliff Asness, Boaz Weinstein, and Jim Simmons. Other Wall Street personalities are also mentioned but to a lesser extent. This part of the book more or less establishes that the above quants are very smart, and very rich.

The last part of the book provides a blow-by-blow account of the subprime crisis. All of the quants appear to have been caught off guard, perplexed by the market's "irrational" behavior and unsure of how to adjust their models to prevent further losses. Throughout the ordeal, many of the quants are forced to question the very foundations of their mathematical models and prior success - was it all just luck?

Overall, I thought the book was OK, but it tried to cover too much ground as a quantitative finance primer, homage to the quants, historical account of the subprime crisis, and financial mystery-thriller all at once. Since my interest lies more in the technical details, I was disappointed with those portions of the book and uninterested in the dramatized historical account. Perhaps I simply had the wrong expectations of the book.

It's also possible that my expectations were set artificially high by Poundstone's excellent book Fortune's Formula, which provides a detailed historical and technical account of the events that gave rise to the quantitative finance industry. If you're interested in this topic, then I highly recommend Fortune's Formula.

While paging through the book to write this review, a passage on page 250 caught my eye regarding a study performed by MIT professor Andrew Lo and his student Amir Khandani:

There was also the worry about what happened if high-frequency quant funds, which had become a central cog of the market, helping transfer risk at lightning speeds, were forced to shut down by extreme volatility. "Hedge funds can decide to withdraw liquidity at a moment's notice," they wrote, "and while this may be benign if it occurs rarely and randomly, a coordinated withdrawal of liquidity among an entire sector of hedge funds could have disastrous consequences for the viability of the financial system if it occurs at the wrong time and in the wrong sector."

There is some evidence indicating that the withdrawal of high-frequency liquidity was a contributing factor to the May 6, 2010 flash crash. I doubt the story of the quants is over just yet.

Wednesday, April 14, 2010

Book Review: Daemon & Freedom™

Daemon and Freedom™ by Daniel Suarez

I haven't enjoyed a science-fiction techno-thriller this much since reading Neal Stephenson's Cryptonomicon. I liked Daemon so much that I finished the sequel, Freedom™, before I got the chance to write a review (or do anything else, for that matter).

I tried a couple of times to summarize the basic plot without revealing too much, but failed. So I think I'll just say that if you're into computers, AI, hacking, MMORPGs, augmented reality, sustainable technologies, and overthrowing corporate social control, then you'll probably like these books. My only criticism is that they are a bit too graphic in places for my taste (mostly violence, but some sex).

One of the most refreshing things about the books is that the author is an IT specialist, so the technology isn't too bogus. Even the "far-fetched" technology in the book is really just an extrapolation of the current state of the art.

Preview chapters are available online for both Daemon and Freedom™ if you would like to read a sample before buying. I actually listened to the audiobook versions of both books via iTunes, which worked out quite well - the reader's voices enhanced the overall experience, especially the Daemon's computer-generated, English-accented female voice.

One of my favorite quotes from the book was:

"Technology. It is the physical manifestation of the human will. It began with simple tools. Then came the wheel, and on it goes to this very day. Civilizations rise and fall based on technological innovation. Bronze falls to iron. Iron falls to steel. Steel falls to gunpowder. Gunpowder falls to circuitry."

I don't think there is any doubt that circuitry, more specifically digital information, is becoming the dominant source of power. Why destroy a nation when you can simply crash its infrastructure and delete its data? Daemon and Freedom™ certainly drive this point home.

The future suggested by Daemon and Freedom™ is both frightening and exciting. Although they are works of fiction primarily intended to entertain, I think some valuable lessons and cautions can be drawn from the story. Good stuff.

Monday, March 29, 2010

StudyHacks' Stretch Churn

Although I am no longer a student, I really enjoy reading Cal Newport's StudyHacks blog. In particular, I like its focus on achieving success through good time management, hard focus, deliberate practice, and the cultivation of outstanding skill.

In this post on James McLurkin, Cal discusses an interesting concept called Stretch Churn. Paraphrased from Cal's post:

  • Stretch Project: A project that requires a skill you don't have at the outset. Importantly, a stretch project is hard enough to stretch your ability but reasonable enough to be completed.
  • Stretch Churn: The number of stretch projects you complete per unit time.

The premise is that the higher your stretch churn rate, the more likely you are to obtain the kind of skill required to be a leader in your chosen field. As the interview with James demonstrates, highly successful people are adept at maintaining a high stretch churn rate. I suspect this is one of the underlying attributes of Outliers.

I think the stretch churn concept is an important insight because it clarifies how to apply the deliberate practice concept in engineering and research environments. Instead of working on a single problem over a long period of time - a common approach in research - the stretch churn concept suggests that it is better to work on a series of related, hard-but-achievable projects. In a way, this strikes me as the agile development model applied to becoming a domain expert.

On a personal level, I found the stretch churn concept interesting for two reasons. First, it explains why I value my advanced development experience so highly - the very nature of the work has allowed me to maintain a high stretch churn rate for years. Second, it helped me realize that if I want to become a real domain expert, I'll have to focus my stretch projects more tightly so that they build upon each other. It's a vector math problem - stretch projects pointing in many different directions sum to very little when added together.

I suspect the stretch churn concept will be a valuable addition to my self-development toolbox.

Sunday, March 21, 2010

Recovering Deleted JPEGs from a FAT File System - Part 9

Part 9 in a series of posts on recovering deleted JPEG files from a FAT file system.

A month ago (!), in part 8, we looked at the JPEG file format specification to determine whether there was sufficient determinism in the on-disk layout to allow the recovery of deleted files by analyzing the residual data in the file system. The answer was mixed:

  1. GOOD: Uniquely valued markers, discoverable through data inspection, identify the beginning and type of the segments that constitute a JPEG file.
  2. GOOD: The metadata segments have a pre-defined size.
  3. BAD: The length of the entropy-encoded image data is, to the best of my knowledge, unspecified in the START-OF-SCAN segment header. Instead, an END-OF-IMAGE marker is used to identify the end of the entropy-encoded data. The theory is that this allows JPEG files to be written as the image is processed.

Essentially, this means that there is no way to determine, through data inspection, the length or location of the clusters containing the encoded image data. The only clue available is the END-OF-IMAGE marker at the end of the entropy-encoded data.

One option is to discover and analyze latent directory entries in the data area - doing so could provide valuable clues to the start and length of erased JPEG files. The downsides to this approach are added complexity (recovering deleted directory entries) and incompleteness (directory entries for deleted JPEG files may not exist due to reuse).

A simpler approach is to inspect each cluster in the data area to see if it begins with a START-OF-IMAGE marker or contains an END-OF-IMAGE marker. Any extent of clusters bounded by START-OF-IMAGE and END-OF-IMAGE markers stands a good chance of being the data for a contiguous JPEG file - the very kind of file we've been trying to recover in this series. In this post, I'll implement this simple method and test the results; a rough sketch of the scanning idea appears below. Follow the "Read more" link for the rest of the post.
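To make the idea concrete, here's a minimal Emacs Lisp sketch - not the code from the rest of the post - that reports which clusters of a raw file-system image begin with a START-OF-IMAGE marker (0xFF 0xD8) or contain an END-OF-IMAGE marker (0xFF 0xD9). The function name, the image path, and the 4 KiB cluster size are all assumptions for illustration; it reads the whole image into a buffer (fine for small test images), and a marker split across a cluster boundary would be missed by this naive version.

(defun jpeg-scan-clusters (image-file cluster-size)
  "Scan IMAGE-FILE in CLUSTER-SIZE byte chunks for JPEG markers.
Return a list of (CLUSTER SOI EOI) entries, where SOI is non-nil
if the cluster begins with a START-OF-IMAGE marker (0xFF 0xD8)
and EOI is non-nil if it contains an END-OF-IMAGE marker
(0xFF 0xD9)."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; treat the contents as raw bytes
    (insert-file-contents-literally image-file)
    (let ((eoi-marker (unibyte-string #xFF #xD9))
          (size (buffer-size))
          (cluster 0)
          (results '()))
      (while (< (* cluster cluster-size) size)
        (let* ((beg (1+ (* cluster cluster-size))) ; positions are 1-based
               (end (min (+ beg cluster-size) (1+ size)))
               (soi (and (> (- end beg) 1)
                         (= (char-after beg) #xFF)
                         (= (char-after (1+ beg)) #xD8)))
               (eoi (progn (goto-char beg)
                           (and (search-forward eoi-marker end t) t))))
          (when (or soi eoi)
            (push (list cluster soi eoi) results)))
        (setq cluster (1+ cluster)))
      (nreverse results))))

;; Hypothetical usage on a raw image dump with 4 KiB clusters:
;; (jpeg-scan-clusters "~/images/fat16.dd" 4096)

Clusters flagged with a START-OF-IMAGE marker, together with later clusters flagged with an END-OF-IMAGE marker, can then be paired into the candidate extents described above.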

Friday, March 19, 2010

Wait a moment...

The other day, I posted a lament about modern (popular) programming being mostly a matter of connecting pre-existing components or libraries with minimal work. In the post I stated that I didn't find such work satisfying.

The next day, however, I realized something - I've just spent nearly a decade in advanced development roles, happily creating prototypes by modifying and combining pre-existing software components with as little work as possible. Spot the inconsistency?

This perplexed me - why did I react this way to Mike Taylor's post when I have enjoyed such prototyping work so much?

After some thought, I concluded that my prototype work has involved modifying complex systems. This has required first understanding enough about each system's design and software to determine the minimal changes needed to implement the desired functionality. So, although the eventual changes were relatively minor (hundreds to tens of thousands of lines of code), along the way I had to develop a deep understanding to complete the task. I think this is the material difference between the prototyping work I've enjoyed and the kind of "library gluing" that I dislike (and Mike does too, of course!). If true, then there is no inconsistency after all.

Clearly the issue isn't black-and-white - the amount of work performed doesn't reflect the amount of understanding required. To some degree this reminds me of the Simplicity Cycle - after a certain point, enough understanding is achieved to make the solution simpler, not more complex. From this perspective, I suspect that the satisfaction that I - and perhaps others - seek comes from crossing that complexity-understanding threshold. This may be the "grokking" point that was easier to reach in simpler times (e.g. 8-bit programming).