I read this book a year ago and, for some reason, I liked it. As a result I have held off, until now, on writing about a major flaw in the method it relies on so heavily: data mining. This flaw undermines the credibility of the causal chains, found through data mining, that make up the bulk and focus of the book. Maybe it was Levitt's early admission in the introduction that he never understood mathematics that made the book endearing.
Data mining is an interdisciplinary, somewhat nascent branch of computer science - Usama Fayyad's specialty, by the way - that deals with data sets too massive for the human brain to penetrate. The sole purpose of the field is to use computational power to detect patterns of interest in a given database.
The most substantial flaw in this method is that the computer has already been told what types of patterns to look for, much like a researcher who cannot come up with a theory merely by reading data: she needs some presupposition of what to look at or for, even if it only occurred to her during a quick sift. This presupposition is, in essence, an implicit hypothesis. It is likely to shape the next step, and the one after that, until it evolves into a whole program of inspection and research grounded in and influenced by that initial hypothesis. Philosophically, this is bound up with the problem of induction, attributed to David Hume. It is worth noting that studies of complex phenomena, such as those in the social and medical sciences, are the most plagued by this problem.
When such patterns of interest are detected, they are fleshed out with speculation about the nature of the relationship among the parameters found to correlate - e.g. the enactment of a law that legalizes abortion in a certain state and the crime rate in that same state some time later. But what if it was a coincidence? Human intuition fools us into thinking that pure coincidence is an unlikely explanation for the relationship.
Classically, the limits of human intuition in such cases are exposed using the birthday problem. With the simplest probabilistic notions we were introduced to as 5th graders, we can easily tell that the chance that two randomly picked human beings share a birthday is ~1/365 and, at the other extreme, that the odds are 100% of finding at least one pair with the same birthday in a randomly chosen group of 366 - as you can see, I didn't count leap days as possible birthdays, because that would complicate the problem. Instead, I will deem those born on the 29th of February soulless bastards and move on. Stereotyping and bigotry make life easier for dummies, as you can see. - What lies in the middle, though, is where our intuition fails us.
There, the problem can't be solved by intuition or a quick mental shortcut. The probability can be written down as a product of fractions, but to get actual numbers out of it for every group size we turn to computational methods, or, as it is sometimes called, brute force. You can think of analytic methods as humane investigations that put extra emphasis on finding general forms of solutions, while computational ones are more like bloody interrogations, where we lousy engineers - engineers being lousy by nature, not that there are breeds of engineers who are not - sometimes get fed up with a problem and crack it open to get numerical answers. No wonder it is dubbed "using brute force"!
Anyway, solving our problem computationally yields 99% as the odds of finding at least one pair who share a birthday in a randomly chosen group of 57 people, and 50% for a group of 23. Very counter-intuitive, isn't it? For the record, this problem haunted me and made me doubt my understanding of probability theory from the 5th grade until I found its solution in a book. I was relieved. I still doubt my understanding of the theory, though!
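For the curious, the 23 and 57 figures can be checked in a few lines of Python. This is only a minimal sketch, assuming 365 equally likely birthdays and no leap days, as in the simplification above. The function names `p_shared_birthday` and `smallest_group` are made up for this illustration. The idea is to compute the chance that everyone's birthday is different, then subtract it from 1.

```python
from math import prod


def p_shared_birthday(n: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday.

    Assumes every day is equally likely and ignores leap days.
    """
    if n > days:
        return 1.0  # more people than days: a shared birthday is certain
    # P(all different) = (365/365) * (364/365) * ... * ((365 - n + 1)/365)
    p_all_different = prod((days - k) / days for k in range(n))
    return 1.0 - p_all_different


def smallest_group(target: float, days: int = 365) -> int:
    """Smallest group size whose shared-birthday chance reaches target."""
    n = 1
    while p_shared_birthday(n, days) < target:
        n += 1
    return n


if __name__ == "__main__":
    for size in (2, 23, 57, 366):
        print(size, round(p_shared_birthday(size), 4))
    print("first size reaching 50%:", smallest_group(0.50))
    print("first size reaching 99%:", smallest_group(0.99))
```

Stepping through group sizes one at a time until the target is reached is exactly the "brute force" spirit: no clever inversion of the formula, just trying sizes until one works.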
The kernel here is that when we think about huge sets of data using our everyday intuition, we severely underestimate the odds that a strong correlation among a group of parameters in that set is the outcome of mere coincidence. Practically, this means that Levitt found many other "dazzling" relationships that can be as absurd as one between the iris colors of newborns and the percentage of German car owners in any given society. Had he found something entertaining and reasonable to write about such a relation, it would definitely have made it into the book or one of his other publications.
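The same effect is easy to reproduce with pure noise. The sketch below is an illustration under my own assumptions, not a reconstruction of anything Levitt did: it generates 200 made-up "parameters" of 20 random observations each, all independent by construction, and reports the strongest correlation found among all pairs. The sizes, the seed, and the function names are arbitrary choices for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_vars=200, n_obs=20, seed=0):
    """Largest |correlation| among pairs of independent random columns.

    Every column is unrelated noise, so any correlation found is coincidence.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    pairs = 200 * 199 // 2
    print("pairs inspected:", pairs)
    print("strongest correlation found:", round(strongest_chance_correlation(), 3))
```

With 19,900 pairs to choose from, some pair of unrelated columns will look strongly related; the more parameters a database holds, the more such coincidences it offers to anyone sifting for patterns.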
But again, his stated indifference to the profound branch of epistemology, and his explicit early confession that he does not understand math - nor, it seems to me, even the tool he chiefly relies on in expanding his program of economics as the study of human incentives - disarm the reader of her critical mindset from the very beginning and make her drop her guard, relax, and enjoy as she reads on. Add to this that a recurring theme in the book is exposing cheaters and analyzing crimes, which makes one condone the not-so-scientific method he uses. In this sense, Levitt becomes a sort of academic Robin Hood.
The true value of Freakonomics, however, lies in its daring attempt to break the tyranny that prestigious academic institutions and mainstream currents of thought maintain over the production and development of knowledge, and to do so from the heart of one of those institutions: the Chicago School of Economics.