27 November 2009

It's the Code, Stupid!

Bad code offsets are rumored to have been purchased in bulk by the University of East Anglia's Climate Research Unit.
Previously, it was suggested that the CRU programmers are "in over their heads." As people dig further into the "hacked" files, the truth of this assessment becomes increasingly evident. Take, for example, the file in which "Harry" records his (possibly FOIA motivated) efforts in 2006-2009 to resurrect / repair his predecessors' handiwork.

The programs in question process data from thousands of weather stations world-wide. Now consider the following: Each station has an ID, of course. But station IDs don't conform to a standard format. Neither do the data they report. From time to time, the IDs change; likewise the format in which the data are reported. Not all stations report all of the data all of the time. Sometimes they report some of it; sometimes, none at all. For better or worse, the folks at CRU substitute for the missing numbers "synthetic" data, which they create from data reported by nearby sites. To determine which sites to use, they locate each station on a map and draw circles around its position. If another station lies within the circle, its data goes into the mix. But coastal stations make for a problem because land temperatures and sea temperatures are handled differently. So a station's position (latitude and longitude) can be critical. And sometimes the stations move. In short, assembling the numbers from which CRU computes global temperature "anomalies" is a book-keeping nightmare. Add to this the fact that the programming was done by graduate students, who left thousands of poorly (if at all) documented files for their successors to puzzle over, and you get a witches' brew that cries out for external review and / or independent replication.
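For readers who want to see the circle-drawing idea in concrete form, here is a minimal sketch. It is emphatically not CRU's code: the unweighted mean, the radius, the dictionary layout and the function names are all my own assumptions, chosen only to illustrate "fill a gap from whatever reporting stations lie within a given distance."

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def infill(target, stations, radius_km):
    """Mean of the values reported by stations inside the circle.

    Returns None when no station within radius_km reported a value.
    The unweighted mean is a placeholder, not CRU's actual scheme.
    """
    values = [
        s["value"]
        for s in stations
        if s["value"] is not None
        and great_circle_km(target["lat"], target["lon"],
                            s["lat"], s["lon"]) <= radius_km
    ]
    return sum(values) / len(values) if values else None


# Hypothetical stations: two nearby reporters, one distant, one silent.
target = {"lat": 52.0, "lon": 1.0}
stations = [
    {"lat": 52.5, "lon": 1.0, "value": 10.0},
    {"lat": 52.0, "lon": 1.5, "value": 12.0},
    {"lat": 60.0, "lon": 1.0, "value": 99.0},
    {"lat": 52.1, "lon": 1.0, "value": None},
]
synthetic = infill(target, stations, radius_km=100.0)
```

Even in this toy version, the answer depends on the stations' recorded positions and on the radius chosen, which is why a station that moves, or a wrong latitude, matters.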

Here are some gems culled from the Harry_Read_Me file by TickerForum.org poster Asimov; my comments are in square brackets. (As of 1041 h (EST), 29 November, the site is unreachable.) They've been referenced elsewhere, but some things in this life merit repetition:
  • "Well, dtr2cld is not the world's most complicated program. Wheras [sic] cloudreg is, and I immediately found a mistake! Scanning forward to 1951 was done with a loop that, for completely unfathomable reasons, didn't include months! So we read 50 grids instead of 600!!"
  • "... have just located a 'cld' directory in Mark New's disk [Mark New was one of the student programmers], containing over 2000 files. Most however are binary [and therefore unreadable] and undocumented."
  • "The conclusion of a lot of investigation is that the synthetic cloud grids for 1901-1995 have now been discarded. This means that the cloud data prior to 1996 are static. ... For 1901 to 1995 - stay with published data. No clear way to replicate process as undocumented. For 1996 to 2002: ... This should approximate the correction needed." [Emphasis added]
  • "I am seriously worried that our flagship gridded data product is produced by Delaunay triangulation [new to this author] - apparently linear [as opposed to great circles on a sphere?] as well. As far as I can see, this renders the station counts totally meaningless. [Not sure why, but for now, I'll take Harry's word for it] It also means that we cannot say exactly how the gridded data is arrived at from a statistical perspective - since we're using an off-the-shelf product that isn't documented sufficiently to say that. Why this wasn't coded up in Fortran I don't know - time pressures perhaps? Was too much effort expended on homogenisation, that there wasn't enough time to write a gridding procedure? Of course, it's too late for me to fix it too." [Emphasis added]
  • "On we go.. firstly, examined the spc database.. seems to be in % x10. Looked at published data.. cloud is in % x10, too. First problem: there is no program to convert sun percentage to cloud percentage. I can do sun percentage to cloud oktas or sun hours to cloud percentage! So what the hell did Tim [Mitchell, the other student-programmer] do?!! As I keep asking." [Emphasis added]
  • "Then - comparing the two candidate spc databases: spc.0312221624.dtb [and] spc.94-00.0312221624.dtb[,] I find that they are broadly similar, except the normals lines (which both start with '6190') are very different. I was expecting that maybe the latter contained 94-00 normals [not sure what these are, but apparently they're important; maybe 1994-2000 averages], what I wasn't expecting was that thet [sic] are in % x10 not %! Unbelievable - even here the conventions have not been followed. It's botch after botch after botch. Modified the conversion program to process either kind of normals line." [Emphasis added]
  • "Decided to go with the 'spc.94-00.0312221624.dtb' database, as it hopefully has some of the 94-00 normals in. I just wish I knew more." [Emphasis added]
  • "These [results of a trial run] are very promising. The vast majority in both cases are within 0.5 degrees of the published data. However, there are still plenty of values more than a degree out." [As Asimov notes, "He [Harry]'s trying to fit the results of his programs and data to PREVIOUS results." N.B. While discrepancies of 0.5 to 1.0 degrees C may not seem especially troubling, it should be recalled that a century's worth of CO2-induced warming (according to the models) is on the order of 3-5 degrees C.] [Emphasis added]
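The first of Harry's gems, the loop that "didn't include months," is easy to reproduce. The sketch below is mine, not CRU's: it assumes one grid per month starting in January 1901 (a start year inferred from the "50 grids instead of 600" arithmetic), and the names are hypothetical.

```python
from itertools import product

MONTHS_PER_YEAR = 12
FIRST_YEAR = 1901   # assumed start of the file
TARGET_YEAR = 1951  # the year the program meant to scan forward to

# One (year, month) label per monthly grid, in file order.
monthly_grids = list(product(range(FIRST_YEAR, 2001),
                             range(1, MONTHS_PER_YEAR + 1)))


def first_grid_after_skip(n_skipped):
    """Label of the grid reached after reading past n_skipped grids."""
    return monthly_grids[n_skipped]


years_to_skip = TARGET_YEAR - FIRST_YEAR

# Buggy scan: one read per year, forgetting there are 12 grids in each.
buggy_position = first_grid_after_skip(years_to_skip)

# Corrected scan: one read per month of every skipped year.
fixed_position = first_grid_after_skip(years_to_skip * MONTHS_PER_YEAR)
```

Under these assumptions the buggy scan stops in March 1905 while believing it has reached January 1951, and nothing crashes to say so.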
Runtime Errors. Additional discussion of the coding issues can be found at L'Ombre de l'Olivier — two posts, "The HADCRU Code as From the CRU Leak" and "More CRU Code Thoughts". Their author, a retired programmer, categorizes the various infelicities. Arguably the one with the greatest adverse potential is the
"use of program libra[r]y subroutines that ... fail at undefined times and ... when the function fails[,] the program silently continues without reporting the error" [Emphasis in the original].
Sigh! Runtime errors are the bane of programming. If they do something dramatic, like crash the program or produce obviously nonsensical output, it's OK. Eventually, you find the bug and squash it. But if the effects are subtle, your first notice that something's wrong may be a query from a colleague / competitor wondering why he couldn't replicate your results.
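To make the point about silent failure concrete, here is a small sketch of my own devising (not taken from the CRU code; the field values and function names are hypothetical). One parser swallows a bad field and carries on; the other complains.

```python
def parse_silent(field):
    """Return 0.0 for an unreadable field and say nothing about it."""
    try:
        return float(field)
    except ValueError:
        return 0.0  # the failure is hidden from the caller


def parse_checked(field):
    """Refuse to continue when a field cannot be read."""
    try:
        return float(field)
    except ValueError as exc:
        raise ValueError(f"unreadable temperature field: {field!r}") from exc


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical record with one corrupted field.
fields = ["10.0", "12.0", "1x.5", "14.0"]

# Runs to completion and yields a plausible-looking but wrong number.
silent_mean = mean(parse_silent(f) for f in fields)

# The checked variant stops at the bad field instead.
try:
    mean(parse_checked(f) for f in fields)
    checked_failed = False
except ValueError:
    checked_failed = True
```

The silent version reports a mean of 9.0 where the three readable values average 12.0; the output looks like a temperature, so nobody looks twice.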

Quality Control. The mess that is CRU bears on quality control standards, on peer review and on the changing nature of scientific research. With regard to quality control, our L'Ombre de l'Olivier correspondent observes:
"When I was a developer, in addition to the concepts of version control and frequent archiving, one thing my evil commercially oriented supervisors insisted on were 'code reviews'. This is the hated point where your manager and/or some other experienced developer goes through your code and critiques it in terms of clarity and quality."
Obviously, nothing of the sort transpired at CRU, most likely because there were no "experienced developers" to go through the codes. This raises the question: "Should it have transpired?" Poster Patrick, at L'Ombre, observes that "It is probably better to have scientists writing bad code than programmers doing bad science." Other posters note that confidence in science results not from someone's approving the "how" of what was done, but from independent replication: if an experiment, then on someone else's lab bench; if a theoretical calculation, then with someone else's algorithm; and if data analysis, then with someone else's ad hoc assumptions and the code that implements them. These are valid observations, but there are two important caveats. The first is that replication really has to be independent. A bon mot (at the expense of the computational physics crowd) that has stayed with me over the years involves the passing of code from one lab to another, gremlins intact. That's true for things as run of the mill as differential equation solvers, matrix inverters, etc. The important point is that as soon as things get too complicated to prove what's going on (i.e., proposition, lemma, theorem, Q.E.D.), one is doing experiments. So, if it's differential equations being studied, the sensible investigator convinces himself that his results are not solver-dependent. Likewise, in the case of data analysis, one wants to be sure that the results are robust with regard to the programs that crunch the numbers and, when data are manipulated prior to analysis (and, boy, are climatological time series ever), that trends, "statistical significance," etc. aren't artifactual.
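By way of illustration, here is what a check for solver-dependence might look like in miniature. The test equation (simple exponential decay), the two textbook integrators and the step counts are my choices for the sketch; a real study would compare independently written solvers on the actual problem.

```python
import math


def euler(f, y0, t0, t1, n):
    """Forward Euler with n equal steps from t0 to t1."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y


def rk4(f, y0, t0, t1, n):
    """Classical fourth-order Runge-Kutta with n equal steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def decay(t, y):
    """dy/dt = -y, whose exact solution is y(t) = exp(-t)."""
    return -y


def solver_gap(n):
    """Disagreement between the two solvers at t = 1 for n steps."""
    return abs(euler(decay, 1.0, 0.0, 1.0, n) - rk4(decay, 1.0, 0.0, 1.0, n))


# If the result is not solver-dependent, the gap should shrink
# as the step size is refined.
gaps = [solver_gap(n) for n in (10, 100, 1000)]
```

If the two answers failed to converge as the steps were refined, one would know that the "result" owed something to the solver rather than to the equation.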

Finally, there's the source of the data. AGW proponents like to point out that similar results have been reported by different groups. But as McIntyre, McKitrick and others have pointed out, the same data sets are used over and over, and the analyses are therefore not independent. This appears to be the case in both paleoclimatological and historical studies, which rely on a limited number of data sets.

Peer Review. In my honors and upper division classes, I insist that students reference the peer reviewed literature. If it's not peer reviewed, I tell them, they have no way of judging the truth of the conclusions. I also tell them that every scientific paper that's ever been published is wrong, the only questions being how wrong and how long it takes the scientific community to discover the errors. So how much protection does peer review actually provide? Not a whole lot if the article in question fails to provide enough information to allow the reader to replicate the results. Many (most? all?) of the climate change papers that I have seen fail this test resoundingly. Indeed, most reviewers restrict themselves to assessing a paper's overall plausibility, the appropriateness of the methods, the extent to which the results are consistent with previously published studies, etc. Only occasionally does a reviewer attempt to reproduce the results, and, in such cases, the editorial response is unpredictable. Sometimes, the reviewer receives a letter thanking him for going the extra mile; on other occasions, one accusing him of attempting a hit job. And, of course, reviewers sometimes have an ax to grind. If the paper in question goes against their own work, they may do what they can to see that it is rejected. Correspondingly, if the paper supports a reviewer's work, he may be inclined to offer a favorable response, even if there are problems. Finally, peer review, to say nothing of the funding process, discriminates against ideas and approaches that are outside the box. Let me be clear. My object is not to deny the utility of peer review, but to suggest that it is something less than a guarantee of accuracy.

The Changing Nature of Science. More and more, scientific research is being carried out by teams who tackle projects that, because of their sheer magnitude, do not lend themselves to checking. This is true across disciplines, and it is certainly true of climatology. To verify the results of an outfit like CRU, one needs another group of roughly comparable size. Now it is true that CRU is not the only entity engaged in large-scale climate studies. But it is also true that there are only a couple of others, and they all cooperate. The result is what amounts to monopolistic practices and the need for the scientific equivalent of anti-trust legislation.

23 November 2009

And So It Begins.

Hide the Decline

Well done and a good laugh, even though it's the wrong decline — see Marc Sheppard, "Understanding Climategate's Hidden Decline," at The American Thinker.
At The Global Warming Policy Foundation, recently launched, we read the following:
"In response to recent revelations contained in leaked e-mails originating from the Climate Research Unit at the University of East Anglia, Lord Lawson, Chairman of the Board of Trustees of the GWPF, has called for a rigorous and independent inquiry into the matter. While reserving judgment on the contents of the e-mails, Lord Lawson said these are very serious issues and allegations that reach to "the heart of scientific integrity and credibility":
'Astonishingly, what appears, at least at first blush, to have emerged is that (a) the scientists have been manipulating the raw temperature figures to show a relentlessly rising global warming trend; (b) they have consistently refused outsiders access to the raw data; (c) the scientists have been trying to avoid freedom of information requests; and (d) they have been discussing ways to prevent papers by dissenting scientists being published in learned journals.

'There may be a perfectly innocent explanation. But what is clear is that the integrity of the scientific evidence on which not merely the British Government, but other countries, too, through the Intergovernmental Panel on Climate Change, claim to base far-reaching and hugely expensive policy decisions, has been called into question. And the reputation of British science has been seriously tarnished. A high-level independent inquiry must be set up without delay.' "
We concur and further urge that parallel inquiries be initiated at the home institutions of the individual scientists involved. These entities are recipients of millions of dollars of governmental funds, both as direct costs and as overhead returns that have become the crack cocaine of the academy. They have an obligation to see to it that minimal — forget about "the highest" — standards of ethical behavior are observed by their employees.

Meanwhile. Luboš Motl continues his review of emails over at The Reference Frame. I especially like the one (from Tom Wigley to Timothy Carter, dated 24 April, 2003) about getting rid of a journal editor consequent to the publication of an article adverse to the Hockey Stick:
"PS Re CR [Climate Research], I do not know the best way to handle the specifics of the editoring. Hans von Storch is partly to blame -- he encourages the publication of crap science 'in order to stimulate debate'. One approach is to go direct to the publishers and point out the fact that their journal is perceived as being a medium for disseminating misinformation under the guise of refereed work. I use the word 'perceived' here, since whether it is true or not is not what the publishers care about -- it is how the journal is seen by the community that counts. [Emphasis added]

"I think we could get a large group of highly credentialed scientists to sign such a letter -- 50+ people.

"Note that I am copying this view only to Mike Hulme and Phil Jones. Mike's idea to get editorial board members to resign will probably not work -- must get rid of von Storch too, otherwise holes will eventually fill up with people like Legates, Balling, Lindzen, Michaels, Singer, etc. I have heard that the publishers are not happy with von Storch, so the above approach might remove that hurdle too." [Emphasis added]
In the event, von Storch resigned along with four other editors because CR's publisher refused to print a letter he had composed suggesting "that the publication of the Soon & Baliunas article [the paper in question] was an error, and that the review process at Climate Research would be changed in order to avoid similar failures. ... The problem," he continues,
"is not whether the Medieval Warm Period was warmer than the 20th century, or if Mann's hockey stick is realistic; the problem is that the methodological basis for such a conclusion was simply not given. ... However, my authority as Editor-in-Chief did obviously not cover the publication of an editorial spelling out the problem. The publisher declined the publication, and I canceled my task as Editor-in-Chief immediately on 28 July 2003."
More recently, and subsequent to the present scandal's irruption, von Storch has written what he calls a "little addendum" (same link):
"I have been often in the cross-fire of alarmists and skeptics, two politicized gangs of climate activists - who often have something useful to say, but who are conditioned by their respective loyalties to their "agendas", while not being too much interested in providing the cold and impassionate science needed to come up with reasonable and acceptable climate policies." [Emphasis added]
For additional commentary, go here (discussion of the Hockey Stick controversy) and here (discussion of the hacked emails). In the latter, we read the following:
"I would assume ... that a useful debate about the degree of politicization of climate science will emerge. A conclusion could be that the principle, according to which data must be made public, so that also adversaries may check the analysis, must be really enforced. Another conclusion could be that scientists like Mike Mann, Phil Jones and others should no longer participate in the peer-review process or in assessment activities like IPCC." [Emphasis added]
A Prescription for Error. Von Storch's conclusions, I believe, are self-evident. They also raise a more fundamental problem, which is that the people doing the science should not also be formulating policy. But even when the scientist's role is limited to doing science, there are problems. Objectivity is a precious and vulnerable commodity; asking investigators to assess the validity of their own ideas, a prescription for error. The late Michael Crichton put it this way:
"Just as we have established a tradition of double-blinded research to determine drug efficacy, we must institute double-blinded research in other policy areas as well. Certainly the increased use of computer models, such as GCMs, cries out for the separation of those who make the models from those who verify them. The fact is that the present structure of science is entrepreneurial, with individual investigative teams vying for funding from organizations which all too often have a clear stake in the outcome of the research - or appear to, which may be just as bad. This is not healthy for science.

"Sooner or later, we must form an independent research institute in this country. It must be funded by industry, by government, and by private philanthropy, both individuals and trusts. The money must be pooled, so that investigators do not know who is paying them. The institute must fund more than one team to do research in a particular area, and the verification of results will be a foregone requirement: teams will know their results will be checked by other groups. In many cases, those who decide how to gather the data will not gather it, and those who gather the data will not analyze it. If we were to address the land temperature records with such rigor, we would be well on our way to an understanding of exactly how much faith we can place in global warming, and therefore [with] what seriousness we must address this." [Emphasis added]
Returning to the emails, the extent to which von Storch's decision to resign was prompted by the tactics discussed in Wigley's letter is unclear. What is clear is that the very consideration of such tactics is a stain on the profession. What were these people thinking?

22 November 2009

Just Read it!

For an interesting look at what goes on behind closed doors, go to http://www.tickerforum.org/cgi-ticker/akcs-www?post=118625&page=13. At issue is a file called HARRY_READ_ME.txt, which, according to the poster, consists of "15,000 lines of comments, much of it copy/pastes of code or output by somebody (who's harry?) trying to make sense of it all ...." Just read it; I'm not going to opine other than to suggest that close inspection of the simulation codes, i.e., the gobbledygook that implements the "models," might prove even more discomfiting. Briggs (see previous post) suggests that the Climategate principals are intelligent. Of course they are. But they're also in over their heads, way over. And doing this stuff in FORTRAN — I say this as a long-time programmer in that language — doesn't make things easier. Nor does it appear, from some of the shenanigans in the code, that the coders were particularly fluent in the language — system calls to "wc" and the like. Yuck!

04 November 2009

Three Wrights and a Wrong.

Pavel Trofimovich Morozov, hero-informer of the infamous Soviet morality tale.
The word "wright" (worker), deriving from the Old English "wryhta," survives principally in combinations ("playwright," "wheelwright," etc.) and also as a surname. Famous Wrights include the brothers, Wilbur and Orville, the architect, Frank Lloyd, and the population geneticist, Sewall. The first two are known to most; the third, principally to students of evolution. Perhaps, in a later post, we will discuss the latter's contributions to the "modern synthesis," a between-the-world-wars confection of mathematics, observation and experiment that sought to harmonize nineteenth century Darwinism with the then nascent science of genetics. Perhaps, even, we will discuss Wright's famous quarrel with R. A. Fisher, another important contributor to the theory. But those undertakings, as the Teletubby remarked, we "save for later."

Wright Makes Wrong. Like our faithful canine companions, not all Wrights go to Heaven. One who may have difficulty gaining admittance is the Reverend Jeremiah. Accuracy in Media reports that this notorious promoter of things nefarious has good things to say (is anyone surprised?) about Marxism, the recorded evidence of which endorsement surfaced briefly at Vimeo. That video has now disappeared, but, as sleuthed by Cliff Kincaid, whose exegetical commentary can be read here, Reverend Wright's address can still be viewed in parts (here, here and here).

What interests this correspondent is not the content of brother Jeremiah's remarks, which, like that of his character, is questionable, but the parallel to practices of the Great Soviet Encyclopedia (GSE). In the halcyon days of Soviet socialism, GSE subscribers sometimes received replacements for articles no longer deemed accurate along with instructions to delete the originals — literally, to cut and paste. "Accurate," of course, meant consistent with the changing party line, which circumstance, in addition to providing grist for George Orwell, necessitated the continuous rewriting of history.

Pavel Trofimovich Morozov. Regarding the Wright video, AIM's editor notes, "We do not know why the original ... was taken down, but [we] have our suspicions." So too, one hazards, does Svetlana Kunin, whose most recent article on this nation's rush to recreate the Soviet past merits serious consideration. The pattern would likewise have been familiar to the family of Pavlik Morozov, hero-informer of the infamous Soviet morality tale to which Kunin alludes. The official story, most likely apocryphal (see Pavlik Morozov: Soviet Boy Hero, Seventeen Moments in Soviet History and this recent article in Pravda), has young Pavel reporting his father to the Cheka for crimes against the state and subsequently dying a martyr at the hands of vengeful relatives. Remarkably, a modest contribution by George Soros is being (or already has been) used to reopen the museum that once honored Pavlik's memory, "this time," according to the first reference, "with a display placing [his] life ... in the context of the collectivization campaign, and of the political repression that it represented." How ironic: the tale foisted upon generations of Soviet school children celebrated the primacy of government over family, the same objective being pursued by Soros' far more generously funded American minions at Service.gov. The world wonders.