McShane and Wyner

12 11 2010

An influential new paper on paleoclimate reconstructions of the past millennium has been doing the rounds since August.

Authored by Blake McShane and Abraham Wyner, two professional statisticians with business school affiliations, the work is remarkable in more than one way:

  1. It is performed by professional statisticians on an up-to-date network of proxy data (that of Mann et al., PNAS 2008), a refreshing change from the studies of armchair skeptics who cherry-pick their proxies so they can get a huge Medieval Warm Period.
  2. It uses modern statistical methods, unlike the antiquated analyses that Climate Audit and other statistical conservatives have wanted us to use for a few centuries now. (Whether these methods lead to correct conclusions will be investigated shortly.)
  3. It makes the very strong claim that climate proxies are essentially useless.

It is the last claim that has justifiably generated a great uproar in my community. For, if it were correct, it would imply that my colleagues and I have been wasting our time all along: we had better go to Penn or Kellogg and beg for food, as the two geniuses have just put us out of a job. Is that really so?

Before I start criticizing it, here are a few points I really liked:

  • I agree with their conclusion that calibration/verification scores are a poor metric of out-of-sample performance, which is indeed the only way we can judge the reliability of a reconstruction (“models that perform similarly at predicting the instrumental temperature series [..] tell very different stories about the past”). In my opinion, a much more reliable assessment of performance comes from realistic pseudoproxy experiments: the Oracle of paleoclimate reconstructions. Although there are problems with how realistic they are at present, the community is moving fast to improve this.
  • Operational climate reconstruction methods should indeed outperform sophisticated nulls, but caution is required here: estimating key parameters from the data to specify the AR models, for instance, reduces the degree of “independence” between the null models and the climate signal.
  • Any climate reconstruction worth its salt should give credible intervals (or confidence intervals if you insist on being a frequentist; however, whenever you ask a non-statistician what a confidence interval is, their answer invariably corresponds to the definition of a credible interval).
  • The authors really are spot on about the big question: what is the probability that the current temperature (say, averaged over a decade) is unprecedented in the past 2 millennia? And more importantly, what is the probability that the current rate of warming is unprecedented in the past 2 millennia? Bayesian methods give a real advantage here, as their output readily gives you an estimate of such probabilities, instead of wasting time with frequentist tests, which are almost always ill-posed.
  • Finally, I agree with their main conclusion:

“the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.”

That being said, it is my postulate that when climate reconstruction methods incorporate the latest advances in the field of statistics, it will indeed be found that the current warming and its rate are unprecedented… the case for it just isn’t completely airtight yet.
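To make the "probability of being unprecedented" question concrete, here is a minimal sketch of how such a number could be read off a set of posterior draws. It is not McShane and Wyner's procedure: the function names (`running_means`, `prob_unprecedented`) are hypothetical, the "posterior draws" are synthetic toy series made of white noise plus an arbitrary offset in the last decade, and the only assumption on the input is that each draw is at least two windows long.

```python
import random


def running_means(series, width):
    """Means of every contiguous window of `width` values."""
    return [sum(series[i:i + width]) / width
            for i in range(len(series) - width + 1)]


def prob_unprecedented(draws, width=10):
    """Fraction of draws whose final `width`-year mean exceeds the mean of
    every earlier window that does not overlap it.

    Assumes every draw has at least 2 * width values.
    """
    hits = 0
    for series in draws:
        means = running_means(series, width)
        final = means[-1]            # the most recent window
        earlier = means[:-width]     # windows ending before the final one starts
        if final > max(earlier):
            hits += 1
    return hits / len(draws)


# Toy stand-in for posterior draws (NOT real reconstructions):
# 500 series of 200 "years" of white noise, with an arbitrary 0.3 offset
# added to the last 10 years.
rng = random.Random(0)
toy_draws = [
    [rng.gauss(0.0, 0.2) + (0.3 if year >= 190 else 0.0) for year in range(200)]
    for _ in range(500)
]
p = prob_unprecedented(toy_draws, width=10)
print(f"Estimated P(last decade is unprecedented) = {p:.2f}")
```

The same counting argument would apply to the rate of warming: replace the window mean with a window trend and count the draws in which the final trend exceeds all earlier ones.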

Now for some points of divergence:

  • Pseudoproxies are understood to mean timeseries derived from the output of long integrations of general circulation models, corrupted by noise (Gaussian white or AR(1)) to mimic the imperfect correlation between climate proxies and the climate field they purport to record. The idea, as in much of climate science, is to use these numerical models as a virtual laboratory in which controlled experiments can be run. The climate output gives an “oracle” from which one can quantitatively assess the performance of a method meant to extract a climate signal from a set of noisy timeseries (the pseudoproxies), for varying qualities of the proxy network (as measured by their signal-to-noise ratio or sparsity). McShane and Wyner show a complete lack of reading of the literature (always a bad idea when you’re about to make bold claims about someone else’s field) by mistaking pseudoproxies for “nonsense predictors” (random timeseries generated independently of any climate information). That whole section of their paper is therefore of little relevance to climate problems.
  • Use of the lasso to estimate a regression model. As someone who has spent the better part of the past 2 years working on improving climate reconstruction methodologies (a new paper will be posted here in a few weeks), I am always delighted to see modern statistical methods applied to my field. Yet every good statistician will tell you that the method must be guided by the structure of the problem, and not the other way around. What is the lasso? It is an L1-penalized method aimed at selecting very few predictors from a list of many. That is extremely useful when you face a problem where there are many variables but you only expect a few to actually matter for the thing you want to predict. Is that the case in paleoclimatology? Nope. Climate proxies are all noisy, and one never expects a small subset of them to dominate the rest of the pack, but rather their collective expression to have predictive power. Using the lasso to decimate the set of proxy predictors, and then concluding that the ailing ones fail to tell you anything, is like treating a whooping cough with chemotherapy, then diagnosing that the patient is much more ill than in the first place and ordering a coffin. For a more rigorous explanation of this methodological flaw, please see the excellent reviews by Martin Tingley and Peter Craigmile & Bala Rajaratnam.
  • Useless proxies? Given the above, the statement “the proxies do not predict temperature significantly better than random series generated independently of temperature” is positively ridonkulus. Further, many studies have shown that in fact, they do beat nonsense predictors in most cases (they had better!!!). Michael Mann’s recent work (2008, 2009) has actually been systematically reporting RE scores against benchmarks obtained from nonsense predictors. I have found the same in the case of ENSO reconstructions.
  • Arrogance. When entering a new field and finding results at odds with the vast majority of its investigators, two conclusions are possible: (1) you are a genius and everyone else is blinded by the force of habit; (2) you have the humility to recognize that you might have missed an important point, which may warrant further reading and possibly, interactions with people in said field. Spooky how McShane & Wyner jumped on (1) without considering (2). I just returned from a very pleasant visit to the statistics department at Carnegie Mellon, where I was happy to see that top-notch statisticians have quite different attitudes towards applications: they start by discussing with colleagues in an applied field before they unroll (or develop) the appropriate machinery to solve the problem. Since temperature reconstructions are now in the statistical spotlight, I can only hope that other statisticians interested in this topic will first seek to make of a climate scientist a friend rather than a foe.
  • Bayesianism: there is no doubt in my mind that the climate reconstruction problem is ideally suited to the Bayesian framework. In fact, Martin Tingley and colleagues wrote a very nice (if somewhat dense) paper explaining how all attempts made to date can be subsumed under the formalism of Bayesian inference. When Bayesians ask me why I am not part of their cult, I can only reply that I wish I were smart enough to understand and use their methods. So the authors’ attempt at a Bayesian multiproxy reconstruction is most welcome, because as said earlier it enables us to answer climate questions in probabilistic terms, therefore giving a measure of uncertainty about the result. However, I was stunned to see them mention Martin Tingley’s pioneering work on the topic, and then proceed to ignore it. The reason why he (and Li et al 2007) “do not use their model to produce temperature reconstructions from actual proxy observations” is that they are a tad more careful than the authors, and prefer to validate their methods on known problems (e.g. pseudoproxies) before recklessly applying them to proxy data and proceeding to fanciful conclusions. However, given that climate scientists have generally been quite reckless themselves in their use of statistical methods (without always subjecting them to extensive testing beforehand), I’ll say it’s a draw. Now that we’re even, let’s put down the gloves and be friends. Two wrongs have never made a right, and I am convinced that the way forward is for statisticians (Bayesians or otherwise) to collaborate with climate scientists to come up with a sensible method, test it on pseudoproxies, and then (and only then) apply it to proxy data. That’s what I will publish in the next few months, anyway… stay tuned 😉
  • 10 PCs: after spending so much time indicting proxies, it is a little surprising to see how tersely the methodology is described. I could not find a justification for their choice of the number of PCs used to compress the design matrices, and I wonder how consequential that choice is. Hopefully this will be clarified in the responses to the flurry of comments that have flooded the AOAS website since the article’s publication.
  • RegEM is misunderstood here once again. Both ridge regression and truncated total least squares (TTLS) are “Errors-in-Variables” methods aimed at regularizing an ill-conditioned or rank-deficient sample covariance matrix to obtain what is essentially a penalized maximum likelihood estimate of the underlying covariance matrix of the { proxy + temperature } matrix. The authors erroneously state that Mann et al. (PNAS 2008) use ridge regression, while in fact they use TTLS regularization. The error is of little consequence to their results, however.
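Since the distinction between pseudoproxies and nonsense predictors is central to the first points above, a small sketch may help. It is a toy under loudly stated assumptions, not anyone's published experiment: the "climate" target is an AR(1) series standing in for general circulation model output, all function names are hypothetical, and the signal-to-noise ratio (0.5), AR coefficients and network size (50) are arbitrary. Pseudoproxies are the target plus AR(1) noise; nonsense predictors are generated with no reference to the target at all.

```python
import math
import random


def ar1_noise(n, phi, rng):
    """AR(1) series with unit marginal variance: e[t] = phi * e[t-1] + innovation."""
    innovation_sd = math.sqrt(1.0 - phi ** 2)
    series = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, innovation_sd))
    return series


def standardize(x):
    """Center to zero mean and scale to unit sample standard deviation."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]


def pseudoproxy(signal, snr, phi, rng):
    """Standardized signal corrupted by AR(1) noise with standard deviation 1/snr."""
    z = standardize(signal)
    noise = ar1_noise(len(z), phi, rng)
    return [s + e / snr for s, e in zip(z, noise)]


def correlation(x, y):
    """Pearson correlation between two equal-length series."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def composite(series_list):
    """Average of the standardized series: a crude 'collective expression'."""
    standardized = [standardize(s) for s in series_list]
    return [sum(column) / len(column) for column in zip(*standardized)]


rng = random.Random(42)
n_years = 500

# Assumption: an AR(1) series stands in for the model temperature "oracle".
target = ar1_noise(n_years, 0.7, rng)

# Pseudoproxies carry the target signal; nonsense predictors do not.
proxies = [pseudoproxy(target, snr=0.5, phi=0.3, rng=rng) for _ in range(50)]
nonsense = [ar1_noise(n_years, 0.3, rng) for _ in range(50)]

r_single = correlation(proxies[0], target)
r_composite = correlation(composite(proxies), target)
r_nonsense = correlation(composite(nonsense), target)

print(f"one pseudoproxy vs target:        r = {r_single:.2f}")
print(f"composite of 50 pseudoproxies:    r = {r_composite:.2f}")
print(f"composite of 50 nonsense series:  r = {r_nonsense:.2f}")
```

In this toy, each pseudoproxy is individually poor, yet averaging many of them recovers much of the signal, whereas averaging nonsense predictors recovers nothing. That is the sense in which the collective expression of the proxies, rather than a select few, carries the predictive power.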

In summary, the article ushers in a new era in research on the climate of the past 2000 years: we now officially have the attention of professional statisticians. I believe that overall this is a good thing, because it means that smart people will start working with us on this topic where their expertise is much needed. But I urge all those who believe they know everything to please consult with a climate scientist before treating them like idiots! Much as it is my responsibility to make sure that another ClimateGate does not happen, may all statisticians who read this feel inspired to give their field a better name than McShane and Wyner’s doings!

PS: Many other commentaries have been offered thus far. For a climate perspective, the reader is referred to a recent refutation by Gavin Schmidt and Michael Mann.