27 November 2009

It's the Code, Stupid!

Bad code offsets are rumored to have been purchased in bulk by the University of East Anglia's Climatic Research Unit.
Previously, it was suggested that the CRU programmers are "in over their heads." As people dig further into the "hacked" files, the truth of this assessment becomes increasingly evident. Take, for example, the file in which "Harry" records his (possibly FOIA-motivated) efforts in 2006-2009 to resurrect / repair his predecessors' handiwork.

The programs in question process data from thousands of weather stations world-wide. Now consider the following: Each station has an ID, of course. But station IDs don't conform to a standard format. Neither do the data they report. From time to time, the IDs change; likewise the format in which the data is reported. Not all stations report all of the data all of the time. Sometimes they report some of it; sometimes, none at all. For better or worse, the folks at CRU substitute "synthetic" data for the missing numbers, which they create from data reported by nearby sites. To determine which sites to use, they locate each station on a map and draw a circle around its position. If another station lies within the circle, its data goes into the mix. But coastal stations pose a problem, because land temperatures and sea temperatures are handled differently. So a station's position (latitude and longitude) can be critical. And sometimes the stations move. In short, assembling the numbers from which CRU computes global temperature "anomalies" is a book-keeping nightmare. Add to this the fact that the programming was done by graduate students, who left thousands of poorly (if at all) documented files for their successors to puzzle over, and you get a witches' brew that cries out for external review and / or independent replication.
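To make that bookkeeping concrete, here is a minimal sketch of the infilling step just described: if a station's value is missing, look for other stations within some radius of its position and average whatever they reported. This is not CRU's code. The station records, the 500 km radius and the straight averaging are my own illustrative assumptions; the real procedure is presumably more elaborate (distance weighting, anomalies rather than raw temperatures, and so on).

    # A toy version of the "draw a circle, use the neighbours" infilling step.
    # Everything here (records, radius, plain averaging) is an assumption made
    # for illustration; it is not the CRU procedure.
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points, in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + \
            cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    def fill_missing(target, stations, radius_km=500.0):
        """Return the station's own value, or an average of neighbours
        lying within radius_km when the value is missing."""
        if target["value"] is not None:
            return target["value"]
        neighbours = [s["value"] for s in stations
                      if s["id"] != target["id"] and s["value"] is not None
                      and haversine_km(target["lat"], target["lon"],
                                       s["lat"], s["lon"]) <= radius_km]
        return sum(neighbours) / len(neighbours) if neighbours else None

    stations = [
        {"id": "A", "lat": 52.6, "lon": 1.3, "value": None},  # missing report
        {"id": "B", "lat": 52.9, "lon": 0.9, "value": 9.8},
        {"id": "C", "lat": 51.5, "lon": -0.1, "value": 10.4},
    ]
    print(fill_missing(stations[0], stations))  # synthetic value from B and C

Even in this toy form, the coastal complication is visible: nothing in the radius test distinguishes land neighbours from marine ones, so the records would need an explicit land/sea flag.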

Here are some gems culled from the Harry_Read_Me file by TickerForum.org poster Asimov — my comments in square brackets. They've been referenced elsewhere, but some things in this life merit repetition. Also, as of 1041 h (EST), 29 November, the site is unreachable:
  • "Well, dtr2cld is not the world's most complicated program. Wheras [sic] cloudreg is, and I immediately found a mistake! Scanning forward to 1951 was done with a loop that, for completely unfathomable reasons, didn't include months! So we read 50 grids instead of 600!!"
  • "... have just located a 'cld' directory in Mark New's disk [Mark New was one of the student programmers], containing over 2000 files. Most however are binary [and therefore unreadable] and undocumented."
  • "The conclusion of a lot of investigation is that the synthetic cloud grids for 1901-1995 have now been discarded. This means that the cloud data prior to 1996 are static. ... For 1901 to 1995 - stay with published data. No clear way to replicate process as undocumented. For 1996 to 2002: ... This should approximate the correction needed." [Emphasis added]
  • "I am seriously worried that our flagship gridded data product is produced by Delaunay triangulation [new to this author] - apparently linear [as opposed to great circles on a sphere?] as well. As far as I can see, this renders the station counts totally meaningless. [Not sure why, but for now, I'll take Harry's word for it] It also means that we cannot say exactly how the gridded data is arrived at from a statistical perspective - since we're using an off-the-shelf product that isn't documented sufficiently to say that. Why this wasn't coded up in Fortran I don't know - time pressures perhaps? Was too much effort expended on homogenisation, that there wasn't enough time to write a gridding procedure? Of course, it's too late for me to fix it too." [Emphasis added]
  • "On we go.. firstly, examined the spc database.. seems to be in % x10. Looked at published data.. cloud is in % x10, too. First problem: there is no program to convert sun percentage to cloud percentage. I can do sun percentage to cloud oktas or sun hours to cloud percentage! So what the hell did Tim [Mitchell, the other student-programmer] do?!! As I keep asking." [Emphasis added]
  • "Then - comparing the two candidate spc databases: spc.0312221624.dtb [and] spc.94-00.0312221624.dtb[,] I find that they are broadly similar, except the normals lines (which both start with '6190') are very different. I was expecting that maybe the latter contained 94-00 normals [not sure what these are, but apparently they're important; maybe 1994-2000 averages], what I wasn't expecting was that thet [sic] are in % x10 not %! Unbelievable - even here the conventions have not been followed. It's botch after botch after botch. Modified the conversion program to process either kind of normals line." [Emphasis added]
  • "Decided to go with the 'spc.94-00.0312221624.dtb' database, as it hopefully has some of the 94-00 normals in. I just wish I knew more." [Emphasis added]
  • "These [results of a trial run] are very promising. The vast majority in both cases are within 0.5 degrees of the published data. However, there are still plenty of values more than a degree out." [As Asimov notes, "He [Harry]'s trying to fit the results of his programs and data to PREVIOUS results." N.B. While discrepancies of 0.5 to 1.0 degrees C may not seem especially troubling, it should be recalled that a century's worth of CO2-induced warming (according to the models) is on the order of 3-5 degrees C.] [Emphasis added]
Runtime Errors. Additional discussion of the coding issues can be found at L'Ombre de l'Olivier — two posts, "The HADCRU Code as From the CRU Leak" and "More CRU Code Thoughts". Their author, a retired programmer, categorizes the various infelicities. Arguably the one with the greatest adverse potential is the
"use of program libra[r]y subroutines that ... fail at undefined times and ... when the function fails[,] the program silently continues without reporting the error" [Emphasis in the original].
Sigh! Runtime errors are the bane of programming. If they do something dramatic, like crash the program or produce obviously nonsensical output, it's OK. Eventually, you find the bug and squash it. But if the effects are subtle, your first notice that something's wrong may be a query from a colleague / competitor wondering why he couldn't replicate your results.
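What that failure mode looks like in miniature (all names hypothetical, and Python rather than anything CRU actually used): a library routine signals trouble through a status flag, the caller throws the flag away, and a junk number sails downstream looking like a real answer.

    # Sketch of a silent library failure. library_regression() is a made-up
    # stand-in for a library call that reports failure via a status flag.
    def library_regression(xs, ys):
        """Return (slope, ok). With fewer than two points it cannot fit,
        so it hands back a junk slope with ok=False."""
        if len(xs) < 2:
            return 0.0, False
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return num / den, True

    def trend_silent(xs, ys):
        slope, _ok = library_regression(xs, ys)  # status flag thrown away
        return slope                             # junk looks like an answer

    def trend_checked(xs, ys):
        slope, ok = library_regression(xs, ys)
        if not ok:
            raise ValueError("regression failed: not enough data")
        return slope

    print(trend_silent([1990], [14.2]))      # quietly returns 0.0
    try:
        trend_checked([1990], [14.2])
    except ValueError as err:
        print("caught:", err)                # at least the failure is visible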

Quality Control. The mess that is CRU bears on quality control standards, on peer review and on the changing nature of scientific research. With regard to quality control, our L'Ombre de l'Olivier correspondent observes:
"When I was a developer, in addition to the concepts of version control and frequent archiving, one thing my evil commercially oriented supervisors insisted on were 'code reviews'. This is the hated point where your manager and/or some other experienced developer goes through your code and critiques it in terms of clarity and quality."
Obviously, nothing of the sort transpired at CRU, most likely because there were no "experienced developers" to go through the codes. This raises the question: "Should it have transpired?" Poster Patrick at L'Ombre observes that "It is probably better to have scientists writing bad code than programmers doing bad science." Other posters note that confidence in science comes, not from someone's approving the "how" of what was done, but from independent replication: if an experiment, then on someone else's lab bench; if a theoretical calculation, then with someone else's algorithm; and if data analysis, then with someone else's ad hoc assumptions and the code that implements them. These are valid observations, but there are two important caveats. The first is that replication really has to be independent. A bon mot (at the expense of the computational physics crowd) that has stayed with me over the years involves the passing of code from one lab to another, gremlins intact. That's true for things as run-of-the-mill as differential equation solvers, matrix inverters, etc. The important point is that as soon as things get too complicated to prove what's going on (i.e., proposition, lemma, theorem, Q.E.D.), one is doing experiments. So, if it's differential equations being studied, the sensible investigator convinces himself that his results are not solver-dependent. Likewise, in the case of data analysis, one wants to be sure that the results are robust with regard to the programs that crunch the numbers and, when data are manipulated prior to analysis (and, boy, are climatological time series ever), that trends, "statistical significance," etc. aren't artifactual.
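The solver-dependence check, at least, is easy to illustrate. The fragment below is a sketch only (the equation, methods and tolerances are arbitrary choices of mine): integrate the same ODE with two different SciPy methods and complain if the answers disagree. The analogous exercise for data analysis is to push the same raw numbers through two independently written programs.

    # Sanity check that a result is not solver-dependent: solve the same
    # (arbitrary) ODE with two methods and compare. Illustration only.
    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, y):
        return [-0.5 * y[0] + np.sin(t)]     # an arbitrary test equation

    t_eval = np.linspace(0.0, 10.0, 101)
    a = solve_ivp(rhs, (0.0, 10.0), [1.0], method="RK45",
                  t_eval=t_eval, rtol=1e-8, atol=1e-10)
    b = solve_ivp(rhs, (0.0, 10.0), [1.0], method="LSODA",
                  t_eval=t_eval, rtol=1e-8, atol=1e-10)

    max_diff = np.max(np.abs(a.y[0] - b.y[0]))
    print(f"max disagreement between solvers: {max_diff:.2e}")
    assert max_diff < 1e-5, "results look solver-dependent; investigate"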

Finally, there's the second caveat: the source of the data. AGW proponents like to point out that similar results have been reported by different groups. But as McIntyre, McKitrick and others have pointed out, the same data sets are used over and over, and the analyses are therefore not independent. This appears to be the case in both paleoclimatological and historical studies, which rely on a limited number of data sets.

Peer Review. In my honors and upper-division classes, I insist that students reference the peer-reviewed literature. If it's not peer reviewed, I tell them, they have no way of judging the truth of the conclusions. I also tell them that every scientific paper that's ever been published is wrong, the only questions being how wrong and how long it takes the scientific community to discover the errors. So how much protection does peer review actually provide? Not a whole lot if the article in question fails to provide enough information to allow the reader to replicate the results. Many (most? all?) of the climate change papers that I have seen fail this test resoundingly. Indeed, most reviewers restrict themselves to assessing a paper's overall plausibility, the appropriateness of the methods, the extent to which the results are consistent with previously published studies, etc. Only occasionally does a reviewer attempt to reproduce the results, and, in such cases, the editorial response is unpredictable. Sometimes, the reviewer receives a letter thanking him for going the extra mile; on other occasions, one accusing him of attempting a hit job. And, of course, reviewers sometimes have an ax to grind. If the paper in question goes against their own work, they may do what they can to see that it is rejected. Correspondingly, if the paper supports a reviewer's work, he may be inclined to offer a favorable response, even if there are problems. Finally, peer review, to say nothing of the funding process, discriminates against ideas and approaches that are outside the box. Let me be clear. My object is not to deny the utility of peer review, but to suggest that it is something less than a guarantee of accuracy.

The Changing Nature of Science. More and more, scientific research is being carried out by teams who tackle projects that, because of their sheer magnitude, do not lend themselves to checking. This is true across disciplines, and it is certainly true of climatology. To verify the results of an outfit like CRU, one needs another group of roughly comparable size. Now it is true that CRU is not the only entity engaged in large-scale climate studies. But it is also true that there are only a couple of others, and they all cooperate. The result is what amounts to monopolistic practices and the need for the scientific equivalent of anti-trust legislation.
