Thursday, January 15, 2015

The Cognitive Cost of Switching Technology Stacks

I do kinda feel like my head is full! My context switching penalty is high and my process isolation is not what it used to be.
- Elon Musk, Reddit AMA, Jan 5, 2015

Cognitive load is a term applied to the overall effort used in working memory by an individual performing a task. Faced with any technology choice, we tend to concoct an approximation in our minds of the cost of effort, compared to the benefit of change. The cost that has been on my mind recently is that of cognitive load. Even thinking about the irony of that statement adds to my cognitive load.

I moved to Singapore in 2007, with roads and driver’s seats opposite those I learned on in Canada. When driving there, conversations with passengers were halting, stressful and mentally draining. I could feel my brain fighting to avoid old reflexes, which seemed to conspire against my progress. Switching contexts between driving and navigating was a chore. I circled and doubled back quite often.

This experience left me sensitized to my own cognitive load. I began to notice how I reacted to context switching between computing technology and my own scientific domain, biology. Like Elon Musk, my context switching comes with a penalty best described as a “mental lag”: a period of time where I can remember nothing about what I know. This lag is a brief moment of stupidity, lasting seconds to minutes. It is as though my brain needs time and more clues to rebuild the branching needed to recall those things that do indeed reside deep in my memory. It seems that the path into deep memory gets displaced by whatever I was last doing. The more cognitive load my last task used, the longer the lag seems. The discomfort of switching contexts seems to drive me to try to reduce my cognitive load.

Educators design instructional material to reduce cognitive load in a few ways.
  • Physical integration of information. (Think Wikipedia as our savior for any trivia question.)
  • Eliminating unnecessary redundancy. (We Canadians fill out government forms in one or the other official language, never both, no matter how fluently bilingual we are. ) 
  • Worked examples. 
  • Open-ended exercises.
So my hypothesis here is that technology stack components that are successful – ones that entice people to switch to them – seem to reduce cognitive load in ways that approach the list above. At the same time they have a low switching cost.  It is as though knowledge workers carry this approximation in their heads, balancing the real and cognitive costs and benefits of switching to new technologies, while at the same time watching network behavior so as not to get stranded on the wrong side of emerging successes.

Over the last 3 years, I have changed my complete computing stack, including infrastructure, operating systems, databases, and language. I attribute this big changeover to my fundamental need to reduce my cognitive load, and I am pleased to say it has. 

Switching from physical infrastructure to a hybrid IaaS system has made life much easier. Aside from the usual number-of-cores-on-the-head-of-a-pin, or CapEx-vs-OpEx arguments, I would argue that the popularity of IaaS and PaaS cloud computing relates directly to reducing the cognitive load of developers. The cloud based web developer doesn't need to carry around the mental baggage of physical systems knowledge. Maybe this is obvious to everyone.

Cloud computers are invisible computers, and “out of sight – out of mind” seems to me to be a key to minimizing cognitive load. Cloud VMs remove the burden of needing to know much about the idiosyncrasies of physical server hardware installations: power, security, firewalls, networking, cabling, storage formatting, remote console booting, and OS driver updating. By using cloud computing, developers get a free chunk of cognitive load back to use for other fun software development stuff.

Another example is the rise of server-side JavaScript. Through Node.js, JavaScript has become a general purpose computing language, adding to its original role as the common language in web browsers. This is a clear case of two birds, one stone. Developers work more efficiently in a single language that spans the back-end and front-end of modern web applications. Efficiency improvements come from re-using code on both the server and browser side, but also from shedding the cognitive load of coding in two separate languages and avoiding context switching penalties.

The popular rise of Docker also seems to fit the pattern of a reduced cognitive load – as it chunks software deployments, abstracts Linux system calls, and promises to make it easier to deploy across heterogeneous cloud platforms. Packaging a Docker image consolidates familiar git based builds and package management based installations, carrying with it all the application dependencies.  The Docker ecosystem promises to democratize the compute image itself, making it vendor neutral, and, at the same time, reducing the worry over cloud vendor lock-in. Docker embodies the concept of the physical integration of computing information, which was the first bullet point from the list above. But it will take at least another year of ecosystem development before Docker is widely used in production systems.

While these three are arguably successful and appear to reduce cognitive load, other new systems are eating your newly-freed cognitive capacity for lunch.

There has been a sprawl of database and storage technologies supporting unstructured data and Big Data. These aspire to provide solutions that scale by changing fundamental compute paradigms of databases and storage. If you believe Big Data listicles, our fate seems sealed: we are driving towards a NoSQL, MongoDB, Hadoop/HDFS world for Data Science. But Hadoop, in particular, has not won many converts among my kindred spirits in Bioinformatics. The Bioinformatics stack depends on general purpose Unix/Linux computing and POSIX file systems. Somehow the cost to convert tools to the Hadoop/HDFS world is not yet justified to my peers.

So, what happens when general purpose computing systems catch up with these diverse and specialized Big Data systems? Consider two systems, PostgreSQL 9.4 and Manta, both released as open source in late 2014. Both offer consolidations that enable unstructured data and Big Data computing within existing computing paradigms.

In December 2014 PostgreSQL 9.4 was released with the JSONB data type and extensions for JSON indexing and querying. These extensions allow any SQL column containing JSON data to work as a fast, SQL-indexed and queryable JSON store. In practice, this makes NoSQL/JSON databases functionally redundant and embeds the functionality in an SQL system. One can query arbitrary JSON structures with SQL syntax extensions and, importantly, connect rich SQL-based visualization and computing tools, like Tableau and R, to the database. The specialization movement towards NoSQL/JSON databases, as implemented in MongoDB, has now lost its raison d'être. SQL has caught up in functionality. PostgreSQL could be a winner in the long term and start edging out NoSQL and MySQL systems in a couple of short years.
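As a rough illustration of what querying JSON with SQL syntax extensions looks like, here is a minimal Python sketch. The table name `samples`, the column `doc` and the helper functions are hypothetical, invented for this example. The SQL uses the JSONB containment operator `@>`, the `->>` text extraction operator and a GIN index. The `find_samples` helper assumes a DB-API connection to a PostgreSQL 9.4 or later server (for instance one opened with psycopg2) and is not exercised here.

```python
import json

# Hypothetical table and column names, chosen only for illustration.
CREATE_TABLE = "CREATE TABLE samples (id serial PRIMARY KEY, doc jsonb)"
# A GIN index lets containment queries (@>) use the index.
CREATE_INDEX = "CREATE INDEX samples_doc_gin ON samples USING GIN (doc)"
# ->> extracts a field as text; @> tests whether doc contains the given JSON.
FIND_BY_CONTAINMENT = (
    "SELECT id, doc->>'organism' FROM samples WHERE doc @> %s::jsonb"
)


def containment_filter(**fields):
    """Serialize keyword arguments as the JSON document placed to the right of @>."""
    return json.dumps(fields, sort_keys=True)


def find_samples(connection, **fields):
    """Run the containment query on a DB-API connection (assumed: PostgreSQL 9.4+)."""
    cursor = connection.cursor()
    try:
        cursor.execute(FIND_BY_CONTAINMENT, (containment_filter(**fields),))
        return cursor.fetchall()
    finally:
        cursor.close()


# Building the filter needs no database at all.
example_filter = containment_filter(organism="E. coli", depth=3)
```

The point of the sketch is that the JSON filter travels as an ordinary SQL parameter, so existing SQL tooling keeps working unchanged.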

In terms of Big Data/Data Science technology stacks, consider the on-disk data (and concept) redundancies that exist between traditional POSIX data storage, persistent cloud object storage (e.g. AWS S3), and Hadoop HDFS block storage. These are all separate systems that the popular Data Science stack depends upon for Big Data computing. Data Science practitioners must recall all the implementation details necessary to manage conventional POSIX file systems, cloud storage objects (buckets and eventual consistency models) and some obscure bits, like how to write Java tools to pack small files into optimally sized HDFS Sequence Files for Hadoop processing. 

Manta consolidates this fragmented storage by providing a persistent object store with general purpose Map/Reduce computing using the CPU cores of commodity storage nodes. It provides a strongly consistent, hierarchical object store built on ZFS, a POSIX-compliant, copy-on-write file system. It allows secure container-based compute tasks to be moved to storage, rather than moving storage to computing nodes. This eliminates AWS S3 – EC2 – HDFS transfer time and duplicated data costs, and saves us from having to optimize HDFS by packing small files into a larger SequenceFile via a peculiar re-invention of tar.

On Manta, a user can build complex Map/Reduce pipelines involving any runtime language or compiled special purpose software (e.g. ffmpeg, OpenCV, BLAST, CRAM), and run Map/Reduce in a secure multi-tenant environment. Manta functions without the drag of Hadoop’s Java-based process control code and HDFS management code running on top of the operating system and storage. Manta ZFS storage is LZ4 compressed at the operating system level, a unique feature that makes better use of commodity disk space. Open-source Manta software development kits (SDKs) allow Map/Reduce systems to be rapidly constructed with one-line Bash or PowerScript scripts, as well as in Node.js, Python, Go, R, Java and Ruby.
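The shape of such a Map/Reduce job can be sketched in a few lines of Python. This is a conceptual toy, not the Manta API: the object paths, the in-memory `objects` store and the function names are all hypothetical. The map phase runs once per stored object, much as a `wc -l` task would run next to each object, and the reduce phase combines the per-object results.

```python
from functools import reduce

# Hypothetical stand-in for stored objects: path -> contents.
objects = {
    "/demo/stor/reads/a.txt": "ACGT\nTTGA\n",
    "/demo/stor/reads/b.txt": "GGCC\n",
    "/demo/stor/reads/c.txt": "ATAT\nCGCG\nTTTT\n",
}


def map_phase(contents):
    """Per-object task: count lines, as `wc -l` would."""
    return len(contents.splitlines())


def reduce_phase(total, count):
    """Combine one map output into the running total."""
    return total + count


def run_job(store):
    """Run the map task on every object, then fold the outputs together."""
    mapped = [map_phase(contents) for contents in store.values()]
    return reduce(reduce_phase, mapped, 0)


total_lines = run_job(objects)
```

Each map task touches only its own object, which is what makes it possible to run the task where the data already lives instead of copying the data to a compute node.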

Despite the elimination of real storage redundancies and cognitive load redundancies imposed by maintaining three separate storage paradigms, Manta has a switching penalty – it is not a Linux based system. It runs on the x86 illumos based distro – SmartOS, a server operating system forked from OpenSolaris. SmartOS itself is specialized in running secure operating system containers (Zones), as well as the KVM hypervisor which can run Linux or Windows virtual machines.

Released as a cloud service in 2013, Manta had four distinct cognitive costs giving adopters pause. Now that it is open source, as of Nov 2014, there are three left. First, it has to be installed on physical or hosted commodity computing infrastructure. Someone has to rig the required networking and learn how to administer the system. Second is the cognitive cost of switching your operating system command memory from Linux to SmartOS (i.e. reading the docs, cheat sheets, list-servers).

Third, and arguably the biggest sticking point, is the cognitive cost in cross-compiling Linux specialized code to SmartOS. Although over ten thousand packages have been ported to SmartOS’s package manager, writing software that is cross-platform between Linux and Unix systems is, in many cases, a dying art form. It would seem that the success of Linux has allowed developers to reduce their cognitive load and stop caring about cross-platform software compatibility.

But this last cognitive load barrier is falling fast. In 2014 Joyent’s engineers began updating the lx-branded zone, based on the old OpenSolaris Linux operating system call emulator. Now in open beta, lx-branded zones are containers that run current versions of CentOS and Ubuntu, and a majority of current 32- and 64-bit Linux software. Extensive community testing is helping find bugs, which are being eliminated fast and furiously.

The lx zone provides a way for Linux software to run on SmartOS at bare metal speeds without cross compiling code, and gives die-hard Linux users their cherished apt and yum package managers.

A key piece of Linux software targeted for the lx zone is the Docker daemon itself. For Manta this is most significant as it will allow Docker images to form the building blocks of Big Data computing on storage. When Joyent succeeds at this effort, it will tip the balance. The actual cost of maintaining data and moving it between three separate storage paradigms, POSIX, S3 and HDFS, will outweigh the cognitive cost of switching to Manta. 

Then the Big Data/Data Science stack may be ready for serious disruption.

Disclaimer: I no longer work for Joyent, and have no competing financial interests. 

Tuesday, April 30, 2013

Open Science: Inverting Metagenomics Pipelines - 10 things you may not know about RPS-BLAST

1 May 2013- Singapore. This is an Open Science post intended to spark ideas and conversations with other scientists.

I was thinking about some of the assembly problems that have been described in large metagenomic (e.g. soil) samples by @ctitusbrown, @phylogenomics and many others.

To start with - I am not an “NGS” metagenomics expert. I am a “FGS” metagenomics guy. We published log-odds scores for detecting over 100 species by amino acid composition in 2002. I know a thing or two about large scale bioinformatics pipelines and optimizations.

So. Here are my current thoughts about inverting the metagenomics assembly pipeline.

In a nutshell:

Short read -> RPS-Blast -> PSSM Hierarchy Bin -> Bin Assembly -> Bin contigs -> RPS-Blast hybrid ejection -> Function Assignment -> Phylogenetic assignment -> Taxon Bin -> Assembly. 

Inversion? How is this possible?

First - the "small database effect".
What is the small database effect? It is the inverse effect of the large database problem. Google "cancer" and you get (today) 644 million hits. The larger the database, the more hits you get. A smaller database gives you correspondingly fewer hits.

Consider that the number of protein functional classes (PSSMs or HMMs) is smaller than the number of possible species. Organisms are built on common protein building blocks, and there are more ways to organize the blocks and encode them (i.e. species) than there are blocks themselves.
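The small database effect can be put in numbers with the standard Karlin-Altschul expectation used by BLAST-style searches, E = K * m * n * exp(-lambda * S), where m is the query length, n is the size of the search space and S is the raw score. The sketch below is only illustrative: the database sizes are made up, and the default `k` and `lam` are placeholder values rather than parameters taken from any particular scoring system.

```python
import math


def expected_chance_hits(score, query_len, db_len, k=0.041, lam=0.267):
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S).

    k and lam are placeholder statistical parameters for illustration.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Illustrative search-space sizes only, not measured values.
large_db = 10_000_000_000
small_db = 10_000_000

e_large = expected_chance_hits(score=40, query_len=10, db_len=large_db)
e_small = expected_chance_hits(score=40, query_len=10, db_len=small_db)

# For the same score, chance hits scale linearly with search-space size.
ratio = e_large / e_small
```

Because E is linear in n, the same alignment score is expected to turn up far fewer chance hits in the smaller search space, which is why a weak match from a short read has a better chance of standing out against a compact PSSM collection.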

In principle, searching for the function of a short read should outperform a search for the taxon of a short read, in terms of both hit quality and number. Can this work in practice? I think so. Here’s why. (I have tried out steps 1-3, and they work with command-line RPS-BLAST, which is what got me started on this idea.)
  1. RPS-BLAST can find PSSMs that match sequences of 9-12 amino acids. Change the E-value threshold to around 100. Multiple PSSMs may be hit in the process, and some of these may be false positives. See point 3 about reducing the number of hits by up to an order of magnitude, and point 6 about false positive hits.
  2. Short reads on the order of 30bp can be input directly into RPS-BLAST as nucleotides. Select the appropriate genetic code. Assume everything encodes protein and close your eyes for the moment.
  3. CDDs are not independent, rather they are curated. The PSSMs are organized into a hierarchy of evolutionary families and superfamilies. Most hits are in fact within a branch of the hierarchy. The collection of over 40,000 PSSMs is itself redundant and a last-common family or superfamily can be readily found by traversing the ancestral hierarchy. This can collapse the number of candidate PSSM hits to the short sequence considerably. In some cases, by an order of magnitude. For example 50 PSSM hits can collapse into 2-5 superfamily CDD PSSMs.
  4. With the assignment of lowest common ancestor CDD PSSMs to the short read sequence, one can use the resulting PSSM / CDD identifier as a database key for binning. A short read sequence can be assigned to multiple PSSM bins without harm at this point. By grouping together all the short-reads into PSSM based bins, one can subdivide the metagenomics dataset into potentially related protein functional units by putative peptide sequence.
  5. Once the reads are binned, one can assemble the sequences in each PSSM bin into contigs independently, by feeding the binned read set into a conventional assembler.
  6. One can remove erroneous hybrid contigs from the bin by a second RPS-BLAST scan against the parent PSSM or CDD PSSM branch hierarchy. This is a tiny fraction of the entire PSSM set. At this stage the successful contigs should represent species fragments with coding regions based on nucleotide and reading frame overlap, with minimal interference from random overlap. The resulting set of contigs is functionally related to the PSSM, so the assignment of function is already in hand.
  7. Digital normalization could be applied to the contigs within the PSSM bin to remove redundant information for further genome assembly. There will be a pile of un-binned reads as well that do not map to any PSSMs. These could be assembled/digitally normalized separately from the binned sequences, and merged for the final assembly pass.
  8. The PSSM itself could be used as a scaffold for sub-assembly. That will require some code, but it is probably not necessary.
  9. The processed contigs can be sent to another system in the pipeline for phylogenetic assignment (e.g. BlastX). Final assembly can be performed assuming species-specific sequences are grouped better by the preceding steps.
  10. PSSMs are divisible and thus can be cut up and subdivided into smaller PSSMs. If there are unmatched portions of the original PSSMs in a set of contigs, they can be chopped into mini-PSSMs, related in the existing CDD hierarchy to the parent PSSM we cut them from. This now changes the search statistics. A new custom RPS-BLAST database can then be compiled out of these leftovers. The unassigned or discarded short reads can be searched against this. This mini-PSSM database will be much smaller in size than the original. This could further extend the previously found contigs by assigning new, weaker hit contigs into the existing PSSM bins up the hierarchy. 
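Points 3 and 4 can be sketched in Python as follows. Everything here is a toy: the hierarchy, the identifiers and the function names are hypothetical and do not reflect real CDD accessions or any RPS-BLAST output format. As a simplification, the sketch collapses each hit to its top-level superfamily rather than computing a true lowest common ancestor.

```python
from collections import defaultdict

# Hypothetical child -> parent links for a toy hierarchy; real CDD identifiers differ.
PARENT = {
    "famA1": "famA",
    "famA2": "famA",
    "famA": "superA",
    "famB1": "superB",
    "superA": None,
    "superB": None,
}


def root_of(node):
    """Walk up the hierarchy until a node with no parent is reached."""
    while PARENT[node] is not None:
        node = PARENT[node]
    return node


def collapse_hits(hits):
    """Collapse a read's PSSM hits to the distinct superfamilies they fall under."""
    return sorted({root_of(hit) for hit in hits})


def bin_reads(read_hits):
    """Group read identifiers by collapsed superfamily; a read may land in several bins."""
    bins = defaultdict(list)
    for read_id, hits in read_hits.items():
        for root in collapse_hits(hits):
            bins[root].append(read_id)
    return dict(bins)


# Hypothetical hits per read, as might be parsed from search output.
read_hits = {
    "read1": ["famA1", "famA2", "famA"],
    "read2": ["famA2", "famB1"],
    "read3": [],
}
bins = bin_reads(read_hits)
```

Here three hits for `read1` collapse into a single bin key, `read2` lands in two bins without harm, and `read3` stays unbinned, matching the pile of unmapped reads described in point 7. Each bin could then be handed to a conventional assembler on its own.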

Feel free to substitute HMM for PSSM in most of this. However I have no idea whether implementations of protein family HMM searching can use short nt sequences as input (points 1,2), or how to achieve the hierarchical reduction in number of hits without an evolutionary model for the entire set, as curated by CDD (point 3), or whether a HMM can be simply divisible (point 10) without a lot of re-processing.

Tuesday, October 23, 2012

The mouse trap, redux.

The Hook of Job: 
History reduces the irreducible.

tldr? - scroll down to Fig 5 to skip the details.

            Tracing the historical provenance of the mouse trap's unique design back to 1847 reveals its inventor, Job Johnson, and shows that it is reducible to a functioning single-part animal trap, the fish-hook.

 Figure 1. A modern Victor Brand mouse trap with bait-pedal up,
showing the vestigial profile of the fishhook, from which it originated.

          In his book Darwin’s Black Box [1], and follow-up The Edge of Evolution [2], Biochemist and Intelligent Design instigator Michael J. Behe uses the mouse trap as the defining example of a device that is irreducibly complex. He explains how it cannot function without all of its parts, and that none of its parts, alone or in various combinations, can do the function of the entire trap. And he explains how the trap could not have been created by a small succession of modifications to some simpler precursor that performed the same function of mouse trapping. Niall Shanks, in his critique of Intelligent Design [3], made an effort to address the historical origin of the mouse trap, but could not get past what seems like a popular myth, when in fact there is much older supporting evidence for the origin of the mouse trap in both the Patent Office and in antique mouse trap collections.

          The origin story for the mouse trap is important because it maps the progress of a complex idea. It is the first moment of invention of the mouse trap that marks the start of the idea of its complexity. It is in the original trap that the human intelligent design work was performed. The complex mouse traps designed thereafter are a cascade of copycat follow-ups with incremental changes to the original design. The idea of a snap-style mouse trap is an idea that, once it started, became hugely popular.

          If you could collect one example of each mouse trap produced every year after its original invention and place each trap on a long table side by side in chronological order, how do you think these mouse traps would appear to have changed over time? How far back in time would we have to go to find the original snap-style mouse trap? Would the original snap-style mouse trap even look the same? The long line of mouse traps we see before us would be a record of the legacy of small improvements to the original snap-style mouse trap design. 

          While we have no table of mouse traps handy, there is an excellent record in the U.S. Patent office that documents, illustrates and patents each important improvement in its evolution. For example, today some Victor brand mouse traps [4,5,6] are sold “Pre-Baited” with yellow plastic bait parts shaped like little slices of Swiss cheese impregnated with a chemical scent to attract mice, a high-tech improvement eliminating the need to supply and load bait. But this chemical innovation is really just a small incremental change to the previous stamped metal bait tray [7]. Everything else is the same. If the plastic was soft enough to gnaw on and could be further impregnated with a mouse-specific poison, perhaps the rest of the mechanical parts of the mouse trap would become dispensable, converting it from a trap to a poison. So minor innovations, like the new “Pre-Baited” trap, can accumulate and obscure the details of the original mouse trap design.

Figure 2. Patent drawings of the pre-baited cheese-shaped bait pedal. 

          This is precisely why we must go back in time and conduct some research to try to find the original invention and an authentic artifact or illustration to comprehend its design. The original inventor of the mouse trap had no mouse trap to look at or think about, just a problem to solve: how to trap and kill a wild animal busily gnawing away at his food stores in the cold of winter. Would Behe’s questions about irreducible complexity and the nature of the parts of the mouse trap hold true for the original first mouse trap? Or would the original device betray Behe with answers that would prove to be the undoing of the most fundamental straw-man of the Intelligent Design argument?

          The snap-style mouse trap design is what Richard Dawkins [8] calls a meme, which he defines as an original thought or first of a kind idea that has been copied and repeated many times since its origin. The mouse trap idea began with the very first such trap, and it is at the time of origin of this idea where the analysis of the irreducibly complex features must be properly considered, not the countless copies that follow or their incremental changes to the original design.

          Let’s look at the history of the mouse trap to understand this. Now, the modern mouse trap is a very common item, simple enough to be understood mechanically, and very easy to illustrate. Definitive statements can be made about today’s mouse trap mechanism and its complexity. In his argument, Behe asks his readers to imagine how well the modern version of a mouse trap would work missing one, two or even more parts. But we know nothing of earlier designs, so this puts the reader in the position of having to imagine a design history for the modern mouse trap. For us to conclude that there could be no simpler form of the mouse trap, based on the very sparse imaginary history we fabricate in our minds, is a flawed way of thinking. Like the many versions of the Robin Hood story we have seen, starring a broad range of stars, from an animated fox to Russell Crowe, there can be a big difference between an imaginary history and a real history.

          Since old mouse trap designs are obscure, we must look up some of the mouse trap patents and find old mouse traps in collections in order to follow them back to their origin. There are thousands of illustrations of mouse traps filed with the U.S. Patent Office and other patent offices worldwide. By looking back through old patent documents, we can find many forms of mouse traps, some with recognizable snap-style trap features, but also many others with springs in strange places or iron parts where wood is expected. So it is better to consider the information in the patent records and in actual artifacts as original sources of information.

          Who invented the mouse trap? A crowd of inventors might say “I did!” each raising his hand. With over 4,400 U.S. patents on mouse traps, many can claim to have invented a mouse trap. But who was first? The mouse trap’s invention story is remarkably convoluted and obscured by thousands of inventors all trying to “build a better mouse trap”. More problematic is that the most successful American mouse trap company, Woodstream, has perpetuated, in the popular media, a myth about the invention of the mouse trap, attributing the modern mouse trap design mostly to its founder, John Mast.

          In fact there seem to be two separate corporate mouse trap origin myths in publications, one American and the other British. The good folk of Lititz PA would point to their long-running company, Woodstream, makers of the famous Victor brand mouse trap, and credit company founder John Mast with the invention in 1899, which was patented in 1903 [9]. Although some popular articles [10] correctly place the invention of the Victor Brand mouse trap in 1894, that is not the date of the Mast patent. In an interesting parallel, folks more familiar with the British “Little Nipper” mouse trap would credit James H. Atkinson with the invention in 1897. His British patent (GB 13277 of 1899) was sold in 1913 to a company named Proctor for 1000 British Pounds. Both Proctor and Woodstream have been making mouse traps for a very long time, so it is no surprise that each takes credit for the invention, though separated by vastly different markets and an ocean.

          Now, while Mast and Atkinson are the inventor-founders of the two major mouse trap companies, the race to perfect and market a cheap and reliable mouse trap was not new in the late 1890s, the period in which they started their respective companies. Their own patent documents show that they borrowed many design ideas from other trap inventors. One earlier patented mouse trap design [11] is more similar to Woodstream’s modern Victor brand design than any other, even closer to it than Mast’s own 1903 patent. It was the 1894 design of William C. Hooker of Abingdon Illinois. Hooker founded The Animal Trap Co. of Abingdon Illinois with his invention. According to mouse trap collector and expert Rick Cicciarelli, The Animal Trap Co. first marketed their unmistakably modern looking mouse trap as the “Out O’ Sight” mouse and rat trap in two different sizes.

Figure 3.  1894 W.C. Hooker snap style "animal trap".

Figure 4. Hooker 1894 Design Patent Animal Traps - "Out O' Sight" Rat Trap and Mouse Trap.
Photo courtesy Rick Cicciarelli.

This is important because the snap-style mouse trap and the rat trap are identical designs, just scaled copies of each other, one mouse-sized and one rat-sized. Cicciarelli tells us that the company that Mast had founded acquired Hooker’s design in 1905. This explains why the Victor mouse trap is sometimes credited with being invented in 1894 [11], the date of the Hooker patent, which became their intellectual property.

So it was Hooker’s design that became the modern mouse trap, but even his design was also predated by earlier traps and patents. And while Mast was indeed a master of turning the idea into a low-cost product, the essence of the snap-trap was not his idea, nor even Hooker’s idea, for that matter. The origin is found further back in time, back when trapping animals for food was far more important than trapping pests like mice and rats.

To find the original mouse trap design idea, let us try to define the essence of the mouse trap. How do we describe the unmistakable part of the mouse trap that can be recognized by its structure as the invention? The snap-style mouse trap is distinguished by a U-shaped bar that travels 180 degrees from one side of a thin rectangular platform of wood, to the other. It is powered by a coiled spring under torque, not by compression or stretch. And it is triggered by the mouse moving a bait platform, releasing a catch bar, and allowing the torque spring to move the U-shaped bar forward with deadly force. A rat trap is just a bigger version of the mouse trap. So we must also consider that the very first design could have been either in the form of a mouse or a rat trap. So let us restate the “Who invented the mouse trap?” question to reflect the details of its successful design idea; “Who invented the snap-style mouse/rat trap with wooden base, bait trigger mechanism, perpendicular torque-spring and U-shaped kill bar traveling over 180 degrees?” Hooker’s patent satisfies this description. Are there older ones?

Examining snap-style mouse trap patents shows that what varies most in the design, generally speaking, is the trigger mechanism and bait tray. This is the part that holds the cheese that the mouse or rat gnaws on that triggers the trap to snap shut. The bait tray has been changed and refined over time by many inventors, including the modern fake plastic chemical-scented Swiss cheese version. You can see the difference in triggers on the Victor mouse traps for sale today [4,7] as compared to Hooker’s patent [11]. The modern bait tray and trigger is much more sensitive than the original design and stamped from a single piece of metal. We can see a number of small improvements in trigger sensitivity, as Hooker’s design worked only when pressed down. Hooker’s bait tray and trigger was made from three separate pieces of metal, but the improved single-part bait tray and trigger senses chewing motion in any direction, whereas earlier ones could be defeated by clever mice who knew how to chew in the right direction. I imagine that sideways-gnawing mice may have escaped with their lives and a cheek full of bait cheese, and gone on to breed even more sideways-gnawing mice, were it not for these numerous slight modifications to the bait pedal and trigger mechanism over time.

Now let us go back further, as several patents predate Hooker’s 1894 patent. The C.B. Trumble Patent of 1892 [12] and the W. H. Castle patent of 1888 [13] are key examples of what patent lawyers call “prior art”. These older designs retain the essence of the snap trap design as I defined it above, but with variations that include different triggers and cast iron bases instead of wooden bases, and double torque springs instead of a single torque spring, in the case of the Castle patent. John Mast referred to both of these earlier patents in the text of his own 1903 Patent [9]. In addition, Mast referred to two patents from 1855 that were granted to Lucien B. Bradley for rat trap designs [14,15], but he avoids mention of the Hooker Patent [11] in his filing, instead drawing attention to the older designs, which helps our quest.

The 1855 Bradley rat trap patents [14,15] were powered by springs that were compressed and released and were oriented in a straight line to the bait, rather than in the perpendicular torque spring arrangement. This may have been an effort to alter an earlier torque spring design as a patent workaround. With the 1855 patents of Bradley, we are close to the origin point, and nearly half a century before the date of the Mast patent which is so commonly mistaken for the original.

Rick Cicciarelli is an avid collector and authority on the history of antique mouse and rat traps, and he has read through all the US patents in his quest as a collector. Cicciarelli owned a prized snap-style rat trap marked “JOB JOHNSON BROOKLYN NY PATENTED 1847”, which he bought at an antique store as a boy for only $30. He believes it is the earliest flat snap-trap design. In his correspondence to me, Rick says, “Generally speaking, that flat snap trap design is attributed to Hooker. However, I have a flat snap rat trap which was patented in 1847, and I believe THIS trap to be the earliest flat snap trap design.”

Figure 5. Rat trap, marked 

Photo courtesy Rick Cicciarelli.

Upon examination, the antique trap of Job Johnson does fully satisfy our question of the essence of the mouse trap design idea, with wooden base, U-shaped kill bar, and torque spring. But the patent of 1847 to Job Johnson16, assigned the very early U.S. Pat. Number 5,256, was not the design of a rat trap. Rather, it was one of the three first novel designs for a spring-loaded fish hook. Johnson’s fish hooks are gloriously illustrated in photographs in a collector’s book on spring-loaded fishing tackle and fish traps by William Blauser and Timothy Mierzwa17. The text of patent number 5,256, being a sworn and witnessed statement to the U.S. government, tells us part of Job Johnson’s story:

Be it known that I, JOB JOHNSON, of the city of Brooklyn, State of New York, fish hook manufacturer, a native of England, having been resident more than one year next preceding the date hereof in the United States, and having duly declared my intention to become a citizen thereof, have invented and made and applied to use certain new and useful improvements in the constructive application, arrangement, and combination of mechanical means whereby the bite of a fish at the bait on a hook causes a crooked barb-dart to strike into and hold the nose, head or gills of the fish, independently both of the line and of the person holding the line, and the general arrangement of which, when of a proper size, may be applied to the capture of any kind of fish or of any destructive or ferocious animal, and for which improvement I seek Letters Patent of the United States;16

Figure 6. Spring-loaded fish-hook illustrations from 1847 Job Johnson Patent 5,256.

Job Johnson, as it turns out, was a prodigious inventor. His legacy was nearly lost to history but it has been recently rediscovered by Dr. Todd Larson, a historian at Xavier University. Larson is an expert on Job Johnson’s inventive legacy, which can be found in Larson’s book, The History of the Fish hook in America18. According to Larson, Job Johnson was a prolific American inventor with 38 patents ranging from fishing tackle to elevated railways, demonstrating his very broad creative capabilities. Johnson got rich from a thriving automatic fish hook manufacturing business. This success obviously put him in a position to explore commercial designs for traps for other animals. His experience with springs and wire, and his workshop, filled with springs and triggers from his work on the spring-loaded fish hook, would have been the perfect place to experiment with other forms of traps.

We know, thanks to Dr. Larson’s research, that American farmers needed rat traps and thought about using fish traps to catch rats, as is mentioned in an article in the 1847 edition of The Prairie Farmer. Entitled “A New Fish hook”19, the article described Job Johnson’s fish hook and concluded, “Those who wish to catch rats have got the right machine here.” So how did Johnson make the move from spring-loaded fish hooks to the rat trap? According to Cicciarelli it is pretty obvious to the naked eye that Job Johnson used his patented spring-loaded fish hook trigger design, stuffed the spring-loaded fish hook mechanism in a hole in the wooden base, and rigged the torque-spring with a vicious serrated U-shaped striking bar. In Cicciarelli’s words:

You can't really see the mechanism of the trap very well from the photo, but the bait hook is actually a long fish hook, the end of which goes through the base and comes out at the rear to hook into the jaw when the jaw is pulled back into the set position. It works just like the spring hook.

So the Job Johnson rat trap actually contained a copy of his spring-loaded fish hook as a bait platform and trigger mechanism. As such, it was fully covered by the wording and considerations in U.S. Patent 5,256 and so the rat trap was duly stamped “PATENTED 1847”.

Johnson was already a well-known fish hook maker, having started the first American effort in their fabrication in 1843. While now banned as unsportsmanlike, spring-loaded fish hooks were, in their day, an important way for working fishermen to maximize their catches. Todd Larson devotes an entire chapter to Job Johnson’s inventions and says that he was a man “whose hooks were so good they inspired poetry.” Indeed, Job Johnson’s fish hooks are mentioned repeatedly in the 400-line ballad “The Legend of the Great Tautog”, written by an anonymous author and published in The Spirit of the Times on 23 October 1852. According to Dr. Larson, Job Johnson’s fish traps were known to be capable of catching small game, including rats, simply by hanging them above the ground by a string and baiting them.

Larson’s careful research shows that Johnson’s patent 5,256 for the spring-loaded fish hook was, in fact, an improvement on the first spring-loaded fish hook, invented in 1845 by a 16-year-old boy named George Washington Griswold18. Griswold’s design was assigned to, and patented by, entrepreneur Engelbrecht and lawyer Skiff in 184617,19, and given U.S. Patent number 4,670. It was the first U.S. patent involving a device to catch a fish17.

Griswold used a flat spring to cause two hooks to close on the mouth of a nibbling fish. Job Johnson improved the Griswold design with a more powerful contractile helical spring, driving the two hooks together. Job Johnson was a natural at spring-making, probably from his early training with iron wire fabrication for fish hooks. So we know Johnson had the skills to design and assemble the parts of the first rat trap. And we know that Johnson contemplated that other animals could be trapped with his spring-loaded fish hook by the words in the patent text itself16. Finally, we know that Cicciarelli’s prized artifact shows that Job Johnson fabricated rat traps containing the exact same spring-loaded fish hook mechanism, with the added torque spring and kill bar, and mounted on a flat piece of wood.

Well, this is where we come to the end of the line of commercialized mouse and rat traps, where we run out of artifacts and patents. Older inventions may have existed, but they were not spread as ideas. There was a wave of innovation in the mid-1840s involving a proliferation of patents in fish traps, popularized by many articles in the early editions of Scientific American18, one of the carriers of design ideas in its day. The starting point of all this innovation was the 16-year-old Griswold’s first spring-loaded fish trap.

As I mentioned, collectors Blauser and Mierzwa have a wonderfully illustrated book showing these early spring-loaded fish hook designs17. Their photographs show that Johnson’s first design was a spring-loaded fish hook from 1846 fabricated with three metal parts, three rivets and one spring (page 16), but no corresponding 1846 patent was found matching this artifact. Interestingly, his 1847 patented device was altered to be made with four metal parts, four rivets and one spring (page 21). So Job Johnson likely took the Griswold design and made at least two successive, slight modifications to create a superior spring-loaded fish hook design that would continue to be sold into the 1900s.

Figure 7. Job Johnson 1846 and 1847 spring-loaded fish hooks.
Arrow points to the additional part and rivet added in the 1847 design.
Photos courtesy Tim Mierzwa.

There is little doubt that both Johnson and Griswold had other prototypes made in between these that failed to work. While most of the failed prototypes of the earliest fish traps and rat traps are lost and long forgotten, the additional 1846 fish trap of Job Johnson is evidence of a prototyping process prior to the broad spread of the idea and more inventions in spring-loaded fish traps, mouse and rat traps.

Thanks to Cicciarelli, we can conclude that Job Johnson was the earliest known inventor and original spreader of the snap-style rat and mouse trap idea. Yet it was a branch of an idea started by the young Griswold, a simple idea about how to trap a fish, which was modified into a snap-style rodent-killing machine. Like a viral predator, spring-loaded fish traps crossed the species barrier to become rat traps and mouse traps, providing two independent and successful lineages of traps, one for trapping food, and the other for getting rid of destructive pests.

So finally, now we can go back to the context of Behe’s steps1 to determine irreducible complexity and apply them to the Job Johnson rat trap. According to Behe, the first step is to specify the function of the system and all the system components. The second step is to ask if all the components are required for the system function. Well, the system function of the rat trap is to lure in unsuspecting rats and immobilize them so they can no longer wreak havoc on stored food supplies. So the system function is unchanged; it is just adapted for a larger rodent.

Now, we also know that the spring-loaded fish hook used inside the Job Johnson rat trap was itself a standalone animal trap, so we can state conclusively that not all the components are required for the system function. Importantly, this is where the answer changes from yes to no. Not all parts are required for the entire system to function. And in fact we can dispose of all but a single part and still retain system function. The simple one-part animal trap, the fish hook, is clearly visible as the bait holder in the Job Johnson rat trap. So one part of the Job Johnson rat trap is a physical precursor that remains largely unchanged throughout its design: the fish hook.

Griswold started off with a design that duplicated the fish hook into a grabbing mechanism in a configuration like the gripping talons of a bird of prey. Johnson’s intermediate work shows increased part counts in his first two spring-loaded fish hook designs. He added one part and one rivet, a small incremental step up in complexity, to provide better leverage for the trigger mechanism. It is clear from the historical accounts that people used spring-loaded fish hooks by themselves to catch small animals. So the piece of wood is dispensable, as are the torque spring and kill bar. It is just as likely that one could catch and kill a rat with a baited barbed fish hook as one could a fish, provided one could stay awake long enough to wait for a rat to bite the hook in the middle of the night.

So we have now uncovered that the irreducibly complex mouse trap is a conclusion made by a convenient omission of a forgotten history. The original invention is reducible to a functioning single-part animal trap, the fish hook, retaining the same system function throughout the transition. The historical lineage of the snap-style mouse trap comprises evidence showing that the mouse trap is an example of reducible complexity, thereby disproving the notion that it was irreducibly complex when it was originally created. The mouse trap voyage through time takes us back to the fish hook. And as I show in Figure 1, on the current Victor mouse trap7, the profile of the metal bait pedestal is, remarkably, still shaped with the same curve as a vestigial fish hook.

Job Johnson is the most likely inventor of the mouse trap design. Why don’t we know more about Johnson? Dr. Larson’s book tells us how Johnson’s last few inventions and investments were considered the work of a crackpot, as he suffered from senility in his 80s18. His inventive reputation suffered greatly as these failures mounted. Johnson’s senility left a poor impression on the historical record of his later years, and his earlier successes were overlooked as things like spring-loaded fish hooks fell out of popularity for being unsportsmanlike. And so the story of the invention of the mouse trap may be obscured by Mr. Johnson’s own illnesses later in life. Yet his highly successful 1847 fish hook design continued well after the patent expired, sold by the Sears and Roebuck and Montgomery Ward catalogues into the early 1900s. Only two examples are known of the Job Johnson rat trap, and there are also very few examples of the original Job Johnson spring-loaded fish hooks in the hands of collectors. Oddly, the spring-loaded fish hook line of inventions starting with Griswold was destined for extinction, while the early diverging line of rat and mouse traps is still successful, and still being used today.

Recall that long table and line of old mouse traps we were going to set up? Well we know it goes back to 1847 and we know that next to the first Job Johnson rat trap we must put the first three spring-loaded fish hook designs, two of Johnson’s, and the first from Griswold. The mouse trap line is a branch from another long line of inventions – the spring-loaded fish hooks. Men invented machines to trap pests at the same time they were inventing machines to trap food. At the very beginning of the two lines of traps lies a single-part animal trap called the fish hook, the manufacture of which was Johnson’s trade, a set of skills passed from father to son.

So let me more boldly ask whether any other device of complexity is, historically speaking, going to withstand the kind of scrutiny we just gave to the mouse trap design? Of course I will not suggest the mouse trap evolved without human intelligence. After all, it is a product of human design. The key point of the historical study is this: when you stop and identify individual intelligent human designers as individuals and you look carefully at how they design things, you see that they always apply their intelligence in small doses of creativity in the context of prior knowledge. Incremental additions to prior designs and prior knowledge are the way humans achieve creative new designs. This is a principle recognized by the Patent Office and the patent process. Small changes are how intelligent human beings get to complexity. The true nature of human intelligent design is that humans design complex systems in a step-wise approach, not in an all-at-once magical fashion. Simple designs, such as the mouse trap bait pedestal/trigger part, are made up from small improvements over time borrowing from earlier ideas. The Patent for the silly Swiss cheese shaped and scented bait pedal5, invented in 1981 and made ornamental in 19896, makes reference to an earlier cartoonish mouse trap with decorative holes from a design patented in 194821. Again, simple ideas combine into a minor modification4 to an existing mouse trap design7.

Inventions accumulate new parts with variations on existing design memes just as evolving creatures accumulate new genes based on variations of prior ones. The key is that the knowledge is stored, either in memories, in writing or in the form of the artifact itself. In living creatures the memory of the prior prototype is simply the DNA, the gene which encodes the part. Each gene is an accumulated store of information about the successful small innovations in the mechanical parts, the proteins, within a cell. No intelligence is required in evolution because the memory of prior prototypes does not require an intelligent being to extract the information out and copy it and make small modifications to it.

Now, in the case of human design, many other examples exist, and the patent record holds many of the forgotten details. I invite you to scrutinize the patent history of any object put forward as irreducibly complex. So far there are no examples of spontaneous intelligent designs of inherent complexity that withstand proper scrutiny, and if there are any that appear to, it is simply because the historical evidence of prototypes and of trial and error has not been preserved to tell the story. Even the modern design complexity research community itself acknowledges that complex product design is a process of small step-by-step improvements to prior designs22, and that this is the preferred way our own most intelligent engineering teams pool their efforts to approach complex design tasks.

My conclusion is that human intelligent design is itself a process more similar to the evolutionary process than it is different. Only a few design ideas are successful over the long term. Even successful ones, like the spring-loaded fish hook, can become obsolete and disappear in a process rather like extinction, while another related design, the mouse trap, thrives. One of the benefits of having the Patent Office and its process is that these design ideas are captured for all to examine and modify and reproduce. Patents are the collective DNA of our human innovative genius and the genome of the industrial and technological revolutions.


1.              Michael J. Behe. Darwin’s Black Box. The Biochemical Challenge to Evolution. 1996 Free Press, New York USA.

2.              Michael J. Behe. The Edge of Evolution. The Search for the Limits of Darwinism. 2007 Free Press, New York USA.

3.              Niall Shanks. God, the Devil and Darwin. A Critique of Intelligent Design Theory. 2006, Oxford University Press, Oxford UK.

4.              Victor® Brand Easy-Set Mouse Trap (Part No. 19032P), Woodstream, Lititz PA, USA.

5.              USPTO 4245423. 1 Dec 1978. Anthony J. Souza and Joseph H. Bumsted. Animal Trap.

6.              USPTO D300163. 7 Mar 1989. Harper Landell and Donald W. Warren. Bait Pedal for a Mouse or Rat Trap.

7.              Victor® Brand Mouse Trap (Part No. 19032P), Woodstream, Lititz PA, USA.

8.              Richard Dawkins. The Selfish Gene. 1976. Oxford University Press, Oxford, UK.

9.              USPTO 744379. 17 Nov 1903. John M. Mast. Animal-Trap.

10.           Joseph Rosenbloom. Snap, Crackle, Pop!  Inc. Magazine, May 2000.

11.           USPTO 528671. 6 Nov 1894. William C. Hooker. Animal-Trap.

12.           USPTO 481707. 30 Aug 1892. Chauncey B. Trumble. Animal-Trap.

13.           USPTO 391118. 16 Oct 1888. William H. Castle. Animal-Trap.

14.           USPTO 12892. 22 May 1855. Lucius B. Bradley. Rat trap.

15.           USPTO 13843. 28 Aug 1855. Lucius B. Bradley. Trap for Catching Animals.

16.           USPTO 5256. 21 Aug 1847. Job Johnson. Improvement in Fish hooks.

17.           William Blauser and Timothy Mierzwa. Spring-Loaded Fish hooks, Traps & Lures. Identification and Value Guide. 2006, Collector Books. Paducah KY USA.

18.           Todd E.A. Larson. The History of the Fish hook in America: An Illustrated Overview of the Origins, Development, and Manufacture of the American Fish hook. Volume I 2007, Whitefish Press, Duluth MN USA.

19.           A New Fish hook in the 1847 edition of The Prairie Farmer, op cit18.

20.           USPTO 4670. 28 Jul 1846. Theodore F. Engelbrecht and George F. Skiff. Improvement In Fish hooks.

21.           USPTO D155513. 8 Sep 1948. Charles E. Jones. Design for an Animal Trap or The Like.

22.           Eppinger, S D; Whitney, D E; Smith, R P; Gebala, D A. A model-based method for organizing tasks in product development. Research in Engineering Design 6, 1-13, 1994

23.           Dagg, J.L. Exploring Mousetrap History. Evo. Edu. Outreach. 4:397-414, 2011

Copyright (c) 2012 Christopher W. V. Hogue. All Rights Reserved.