McShane and Wyner

12 11 2010

An influential new paper on paleoclimate reconstructions of the past millennium has been doing the rounds since August.

Authored by Blakeley McShane and Abraham Wyner, two professional statisticians with business-school affiliations, the work is remarkable in more than one way:

  1. It is performed by professional statisticians on an up-to-date network of proxy data (that of Mann et al, PNAS 2008) – a refreshing change from the studies of armchair skeptics who cherry-pick their proxies so they can get a huge Medieval Warm Period.
  2. It uses modern statistical methods, unlike the antiquated analyses that Climate Audit and other statistical conservatives have wanted us to use for a few centuries now. (Whether these methods lead to correct conclusions is investigated below.)
  3. It makes the very strong claim that climate proxies are essentially useless.

It is the last claim that has justifiably generated a great uproar in my community. For, if it were correct, it would imply that my colleagues and I have been wasting our time all along – we had better go to Wharton or Kellogg and beg for food, as the two geniuses have just put us out of a job. Is that really so?

Before I start criticizing it, here are a few points I really liked:

  • I agree with their conclusion that calibration/verification scores are a poor metric of out-of-sample performance, which is ultimately the only way we can judge reliability (“models that perform similarly at predicting the instrumental temperature series [..] tell very different stories about the past”). In my opinion, a much more reliable assessment of performance comes from realistic pseudoproxy experiments: the oracle of paleoclimate reconstructions. Although there are problems with how realistic they are at present, the community is moving fast to improve this.
  • Operational climate reconstruction methods should indeed outperform sophisticated nulls, but caution is required here: estimating key parameters from the data to specify the AR models, for instance,  reduces the degree of  “independence” between the null models and the climate signal.
  • Any climate reconstruction worth its salt should give credible intervals (or confidence intervals if you insist on being a frequentist… However, whenever you ask a non-statistician what a confidence interval is, their answer invariably corresponds to the definition of a credible interval).
  • The authors are really spot-on about the big question: what is the probability that the current temperature (say, averaged over a decade) is unprecedented in the past 2 millennia? And more importantly, what is the probability that the current rate of warming is unprecedented in the past 2 millennia? Bayesian methods give a real advantage here, as their output readily yields an estimate of such probabilities, instead of wasting time on frequentist tests, which are almost always ill-posed (a minimal sketch of such a probability calculation appears just after this list).
  • Finally, I agree with their main conclusion:

“the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth. Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.”  That being said, it is my postulate that when climate reconstruction methods incorporate the latest advances in the field of statistics, it will indeed be found that the current warming and its rate are unprecedented… the case for it just isn’t completely airtight now.
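
To make the Bayesian point above concrete, here is a minimal sketch of the kind of probabilistic statement posterior output makes easy. Everything in it is hypothetical: the posterior draws are faked for illustration, whereas a real analysis would use MCMC samples from a reconstruction model.

```python
import numpy as np

# Hypothetical posterior draws of decadal-mean temperature:
# samples[i, t] = decadal mean in draw i for decade t over the past 2,000 years.
# Faked here for illustration; a real analysis would use MCMC output.
rng = np.random.default_rng(0)
n_draws, n_decades = 5000, 200
samples = rng.normal(0.0, 0.2, size=(n_draws, n_decades))
samples[:, -1] += 0.5  # pretend the most recent decade is anomalously warm

# P(the current decade is the warmest of the past 2 millennia) is just the
# fraction of posterior draws in which it exceeds every earlier decade.
p_unprecedented = np.mean(samples[:, -1] > samples[:, :-1].max(axis=1))
print(f"P(current decade warmest) ≈ {p_unprecedented:.3f}")
```

No frequentist contortions required: the posterior hands you the probability directly.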

Now for some points of divergence:

  • Pseudoproxies are understood to mean timeseries derived from the output of long integrations of general circulation models, corrupted by noise (Gaussian white or AR(1)) to mimic the imperfect correlation between climate proxies and the climate field they purport to record. The idea, as in much of climate science, is to use these numerical models as a virtual laboratory in which controlled experiments can be run. The model’s climate output provides an “oracle” against which one can quantitatively assess the performance of a method meant to extract a climate signal from a set of noisy timeseries (the pseudoproxies), for varying qualities of the proxy network (as measured by signal-to-noise ratio or sparsity). McShane and Wyner show a complete lack of reading of the literature (always a bad idea when you are about to make bold claims about someone else’s field) by mistaking pseudoproxies for “nonsense predictors” (random timeseries generated independently of any climate information); the first sketch following this list illustrates the difference. That whole section of their paper is therefore of little relevance to climate problems.
  • Use of the lasso to estimate a regression model. As someone who has spent the better part of the past 2 years working on improving climate reconstruction methodologies (a new paper will be posted here in a few weeks), I am always delighted to see modern statistical methods applied to my field. Yet every good statistician will tell you that the method must be guided by the structure of the problem, not the other way around. What is the lasso? It is an L1-penalized method aimed at selecting very few predictors from a list of many. That is extremely useful when you face a problem with many variables but expect only a few of them to actually matter for the thing you want to predict. Is that the case in paleoclimatology? Nope. Climate proxies are all noisy, and one never expects a small subset of them to dominate the rest of the pack; rather, it is their collective expression that has predictive power. Using the lasso to decimate the set of proxy predictors, and then concluding from the enfeebled survivors that proxies fail to tell you anything, is like treating whooping cough with chemotherapy, then diagnosing that the patient is sicker than before and ordering a coffin (a toy comparison also follows this list). For a more rigorous explanation of this methodological flaw, please see the excellent reviews by Martin Tingley and by Peter Craigmile & Bala Rajaratnam.
  • Useless proxies? Given the above, the statement “the proxies do not predict temperature significantly better than random series generated independently of temperature” is positively ridonkulous. Further, many studies have shown that they do in fact beat nonsense predictors in most cases (they had better!). Michael Mann’s recent work (2008, 2009) has systematically reported RE scores against benchmarks obtained from nonsense predictors. I have found the same in the case of ENSO reconstructions.
  • Arrogance. When entering a new field and finding results at odds with the vast majority of its investigators, two conclusions are possible: (1) you are a genius and everyone else is blinded by the force of habit; (2) you have the humility to recognize that you might have missed an important point, which may warrant further reading and, possibly, interactions with people in said field. Spooky how McShane & Wyner jumped on (1) without considering (2). I just returned from a very pleasant visit to the statistics department at Carnegie Mellon, where I was happy to see that top-notch statisticians have quite a different attitude towards applications: they start by discussing with colleagues in an applied field before they unroll (or develop) the appropriate machinery to solve the problem. Since temperature reconstructions are now in the statistical spotlight, I can only hope that other statisticians interested in this topic will first seek to make a friend, rather than a foe, of a climate scientist.
  • Bayesianism: there is no doubt in my mind that the climate reconstruction problem is ideally suited to the Bayesian framework. In fact, Martin Tingley and colleagues wrote a very nice (if somewhat dense) paper explaining how all attempts made to date can be subsumed under the formalism of Bayesian inference. When Bayesians ask me why I am not part of their cult, I can only reply that I wish I were smart enough to understand and use their methods. So the authors’ attempt at a Bayesian multiproxy reconstruction is most welcome because, as said earlier, it enables one to answer climate questions in probabilistic terms, thereby giving a measure of uncertainty about the result. However, I was stunned to see them mention Martin Tingley’s pioneering work on the topic and then proceed to ignore it. The reason why he (and Li et al 2007) “do not use their model to produce temperature reconstructions from actual proxy observations” is that they are a tad more careful than the authors, and prefer to validate their methods on known problems (e.g. pseudoproxies) before recklessly applying them to proxy data and proceeding to fanciful conclusions. However, given that climate scientists have generally been quite reckless themselves in their use of statistical methods (without always subjecting them to extensive testing beforehand), I’ll call it a draw. Now that we’re even, let’s hang up the gloves and be friends. Two wrongs have never made a right, and I am convinced that the way forward is for statisticians (Bayesians or otherwise) to collaborate with climate scientists to come up with a sensible method, test it on pseudoproxies, and then (and only then) apply it to proxy data. That’s what I will publish in the next few months, anyway… stay tuned 😉
  • 10 PCs: after spending so much time indicting proxies, it is a little surprising to see how tersely the methodology is described. I could not find a justification for their choice of the number of PCs used to compress the design matrices, and I wonder how consequential that choice is. Hopefully this will be clarified in the responses to the flurry of comments that have flooded the AOAS website since the article’s publication.
  • RegEM is misunderstood here once again. Both ridge regression and truncated total least squares are “errors-in-variables” methods aimed at regularizing an ill-conditioned or rank-deficient sample covariance matrix, to obtain what is essentially a penalized maximum-likelihood estimate of the underlying covariance matrix of the {proxy + temperature} matrix. The authors erroneously state that Mann et al, PNAS 2008 use ridge regression, when in fact they use TTLS regularization. The error is of little consequence to their results, however.
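
To illustrate the pseudoproxy point above, here is a minimal sketch. All names and the stand-in “climate signal” are mine; a real experiment would use the temperature field from a long GCM integration, and would then run a full reconstruction method on many such series, scoring RE against the known truth. A pseudoproxy is the known signal plus AR(1) noise at a prescribed signal-to-noise ratio; a nonsense predictor contains no signal at all.

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, phi, rng):
    """Standardized AR(1) series with lag-1 autocorrelation phi."""
    x = np.zeros(n)
    eps = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return (x - x.mean()) / x.std()

n = 1000                    # years of simulated climate
signal = ar1(n, 0.7, rng)   # stand-in for a GCM temperature series

snr = 0.5  # signal-to-noise ratio typical of pseudoproxy studies
pseudoproxy = signal + (1.0 / snr) * ar1(n, 0.3, rng)  # signal + AR(1) noise
nonsense = ar1(n, 0.3, rng)                            # no climate signal at all

print("corr(pseudoproxy, truth):", round(np.corrcoef(pseudoproxy, signal)[0, 1], 2))
print("corr(nonsense,    truth):", round(np.corrcoef(nonsense, signal)[0, 1], 2))
```

The pseudoproxy retains a degraded imprint of the truth (correlation near 1/√(1 + 1/SNR²) ≈ 0.45 here); the nonsense predictor retains none. Conflating the two is precisely the mistake at issue.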

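And here is the toy lasso comparison promised above, a sketch under assumptions of my own making (the scikit-learn estimators are real; the synthetic “proxies” are not): when many predictors each carry a little signal and none dominates, an L1 penalty throws most of them away, while an L2 penalty (ridge) keeps them all and averages the noise down.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(2)
n, p = 150, 1000          # years in the calibration window vs. number of proxies
beta = np.full(p, 0.05)   # every proxy weakly informative; none dominates

X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)                 # noisy "instrumental" target
X_test = rng.normal(size=(2000, p))
y_test = X_test @ beta + rng.normal(size=2000)

lasso = LassoCV(cv=5).fit(X, y)                            # L1: keeps a sparse subset
ridge = RidgeCV(alphas=np.logspace(-1, 4, 30)).fit(X, y)   # L2: shrinks all predictors

print("proxies retained by the lasso:", int(np.sum(lasso.coef_ != 0)), "of", p)
# On typical draws the ridge fit generalizes better, because it pools the
# many weak signals instead of betting everything on a few survivors.
print("lasso out-of-sample R^2:", round(lasso.score(X_test, y_test), 2))
print("ridge out-of-sample R^2:", round(ridge.score(X_test, y_test), 2))
```
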
In summary, the article ushers in a new era in research on the climate of the past 2000 years: we now officially have the attention of professional statisticians. I believe that overall this is a good thing, because it means that smart people will start working with us on a topic where their expertise is much needed. But I urge all those who believe they know everything to please consult with a climate scientist before treating us like idiots! Just as it is my responsibility to help make sure another ClimateGate does not happen, may all statisticians who read this feel inspired to give their field a better name than McShane and Wyner have!

PS: Many other commentaries have been offered thus far. For a climate perspective, the reader is referred to the recent refutation by Gavin Schmidt and Michael Mann.

11 responses

15 11 2010
Frank

The “armchair skeptics who cherry-pick their proxies” never claimed to have accurately reconstructed temperature for the last millennium. They merely showed that the methods of those who claim to have done so can give radically different answers when other logical choices of proxies are made. They show that most of the signal reconstructing unusual 20th-century warming is contained in a relatively small subset of dubious proxies.

M&W justified the choice of the Lasso as follows (p14): “We chose the Lasso because it is a reasonable procedure that has proven powerful, fast, and popular, and it performs comparably well in a p ≫ n context. Thus, we believe it should provide predictions which are as good or better than other methods that we have tried (evidence for this is presented in Figure 12). Furthermore, we are as much interested in how the proxies fare as predictors when varying the holdout block and null distribution (see Sections 3.3 and 3.4) as we are in performance.” Are you saying these professional statisticians made the wrong choice and that this reasoning is wrong?

M&W go on to say: “In fact, all analyses in this section have been repeated using modeling procedures other than the Lasso and qualitatively all results remain more or less the same.” Is there really ANY reason to mention possible weaknesses in the Lasso when the choice of the Lasso apparently makes no difference to the conclusion that “the proxies do not predict temperature significantly better than random series generated independently of temperature”? (It would have been nice to see the output from the other methods.)

I can’t wait for your commentary on GS&MM’s “refutation”. If the 36 of 95 proxies that Mann eliminated from his analysis were really just noise, M&W would have found roughly the same climate signal in all 95 proxies that Mann found in his cherry-picked subset – oops, I mean those that passed “objective standards of reliability” designed to ensure that his reconstruction would show unusual 20th-century warmth whether it was present or not. IF there were a useful climate signal to extract using anyone’s methodology, maybe the potential advantages GS&MM demonstrated for hybrid RegEM with pseudoproxies would be valuable. Unfortunately, their pseudoproxies are constructed assuming that other factors influencing tree growth have no autocorrelation (unlikely) and that natural climate has the same autocorrelation as the output of climate models (which don’t show long-term oscillations like the AMO or PDO). Do you think all GS&MM’s posturing will make anyone forget that they keep finding a robust signal in data that doesn’t contain one?

M&W certainly showed that their arrogance was unwarranted when they misunderstood the purpose of pseudoproxies. Perhaps they have been taking lessons at the Michael Mann school of behavior. I’m glad that you welcome the entry of professional statisticians. I doubt that Mike does.

15 11 2010
El Niño

Hi Frank,
thanks for dropping by.

Are you saying these professional statisticians made the wrong choice and that this reasoning is wrong?

Yes I do, but only after extensive discussions with other professional statisticians (who, like other human beings, are capable of disagreement).

I can’t wait for your commentary on GS&MM’s “refutation”

I don’t think theirs was the most effective rebuttal, which is why I wrote mine. There is value in diversity.

Do you think all GS&MM’s posturing will make anyone forget that they keep finding a robust signal in data that doesn’t contain one?

I wouldn’t be in this business if I weren’t thoroughly convinced that there are robust signals to be found in multiple proxies. However, insisting that current methodologies are perfect may not be the most effective way of convincing others with a bent against that idea. I hope to contribute to this endeavor in short order, but I harbor no illusions: those who can’t bear the idea of changing anything about their way of life will always find a way to bury their heads in the sand about anthropogenic global warming.

M&W certainly showed that their arrogance was unwarranted when they misunderstood the purpose of pseudoproxies. Perhaps they have been taking lessons at the Michael Mann school of behavior. I’m glad that you welcome the entry of professional statisticians. I doubt that Mike does.

You may have misgivings about Mike’s work, but I don’t think arrogance was his problem – I think he’s been attacked from multiple sides for reasons that had very little to do with science (cf. Barton’s bullyfest of a congressional inquiry), and given this level of harassment even the best of us would have gotten a little snappy. I can’t claim I would have kept my cool under such an enormous amount of pressure. Does that mean his attitude was exemplary? No. But if professional statisticians had gone into this with a bit of a positive attitude, I’m sure Mike would have welcomed their work.
I do think that we as a community could have done a better job of engaging statisticians. That is what I am doing now, and I find most of them eager to contribute something useful.

17 11 2010
Frank

I do hope you are able to extract a robust signal with reliable confidence intervals from your proxies – whatever the answer may be. However, getting a reliable answer can be extremely difficult when the choices you make can bias the final result. See the discussion of the history of measurements of the charge of the electron in Feynman’s Cargo Cult Science. http://www.lhup.edu/~DSIMANEK/cargocul.htm

You made a very revealing statement above: “it is my postulate that when climate reconstruction methods incorporate the latest advances in the field of statistics, it will indeed be found that the current warming and its rate are unprecedented… the case for it just isn’t completely airtight now.”

Notice the use of the word “postulate”, which is defined as “a proposition that is not proved or demonstrated but considered to be either self-evident, or subject to necessary decision”. These words suggest philosophy (finding good reasons to believe what you already believe) more than science. The proper approach for a scientist is to say: “current temperature is innocent of being higher than that of the MWP until it has been statistically proven guilty beyond a reasonable doubt by proxies”. (Or, phrased in more scientific terms, until the null hypothesis has been rejected.) In legal proceedings, the jobs of prosecutor, defense attorney and jury are performed by different people; but in science one person needs to do all three jobs – including the “presumed innocent” part. In general, beyond a reasonable doubt means something equivalent to p<0.05. (Your friends in statistics will tell you that other measures of reasonable doubt are acceptable, but only if you define your standard before performing your analysis.) Have you seen any papers without significant flaws that demonstrate that current temperatures “are guilty” by this standard? As Feynman said in Cargo Cult Science:

"The first principle is that you must not fool yourself–and you are
the easiest person to fool. So you have to be very careful about
that."

Have you fooled yourself? a) You are convinced that the MWP must have been significantly cooler than present by methods that even you admit are not air-tight. (Isn't it likely that you are convinced that the MWP must have been significantly cooler than present because there was less CO2 in the air? We certainly have reliable evidence that increasing CO2 causes warming, but no proof that unknown factors didn't make the MWP as warm or warmer than present. Only proxies can prove that the MWP was cooler than present.) b) You haven't acknowledged that M&W claim that methods other than Lasso give similar results, making criticism of this issue irrelevant. M&W could have misled readers about this subject, but GS&MM or some other commenter would have noticed by now. c) You can praise and criticize M&W, but all you can say about GS&MM is that their response wasn't "the most effective rebuttal". Why can't you discuss them as candidly as M&W? d) You ignore the hypothesis that pre-screening proxies is merely a sophisticated method of cherry-picking data. If the proxies that were eliminated were merely noise, shouldn't the same signal be found in the full data set?

"Barton’s bullyfest" was a political response to the gross political misuse of the Hockey Stick by the IPCC. After all, the IPCC described the warmest decade in the millennium conclusion "likely", meaning that there could be as much as a 1/3 chance that it was wrong. Nothing that uncertain should have become a scientific icon.

18 11 2010
El Niño

Hi Frank,
first of all, thanks again for participating in this discussion. I much appreciate your insight and the structured nature of your comments. So in order:

a) You are convinced that the MWP must have been significantly cooler than present by methods that even you admit are not air-tight. (Isn’t it likely that you are convinced that the MWP must have been significantly cooler than present because there was less CO2 in the air? We certainly have reliable evidence that increasing CO2 causes warming, but no proof that unknown factors didn’t make the MWP as warm or warmer than present. Only proxies can prove that the MWP was cooler than present.)

re-reading my post, I realize that some semantic mistake of mine has led to a big misconception, and I flatly apologize for it. By “postulate” I meant that I have a hunch (borne out of looking at this kind of paleoclimate data) that the current decade is the warmest in the past 2000 years. Every experimenter, every theoretician, everybody has a hunch. But of course you can’t call hunches science because as the venerable Dr Feynman said so aptly, “you are the easiest person to fool”. So it is that my entire research program is about systematically investigating whether that might be correct or not, and whether people believe it or not, I actually don’t feel strongly about the outcome. I certainly don’t use this as a prior to guide my research!
So I apologize for the confusion, and no, I am not “convinced that the MWP must have been significantly cooler than present”. However, I do make the prediction that, once the dust settles (however long that takes), we will find this to be true. Contact me again in 10 years and see how much the story has changed with exciting new data and methods! (caution: it could take 20)

I agree with you that no published paper has established with p < 0.05 that current temperatures are "guilty". BTW, this might not ever be possible, but if we can compute the probability of this guilt and find it to be above 0.9, I'd consider that pretty convincing evidence. Wouldn't you? Of course, it will ALWAYS be conditional on a probability model and some assumptions about the data, so just like a "jury of peers", it will never be perfectly objective. But hopefully it will be transparent enough to be accepted by people without PhDs in climate science. It is never a small task to convince people who have all kinds of ideological biases (priors) against anything we do. I'm not saying that's true of you, but you don't have to go far in the blogosphere to find staunchly stubborn anti-climatologist behavior…

b) You haven’t acknowledged that M&W claim that methods other than Lasso give similar results, making criticism of this issue irrelevant. M&W could have misled readers about this subject, but GS&MM or some other commenter would have noticed by now.

I am not sure I follow your logic. Clearly the “M&W claim that methods other than Lasso give similar results” is false in general; otherwise we could never find verification scores significantly better than those generated by nonsense predictors. We do (e.g. Mann et al 2008, 2009, and my own work on ENSO reconstructions). I am not saying that M&W misled anyone there (I do believe they are acting in good faith), simply suggesting that they might not have set up the problem right (cf. Martin Tingley’s piece on that question). Put it this way: if I buy a fancy new telescope and it can’t see Jupiter, does that mean Jupiter doesn’t exist? Or could it be that I haven’t taken the lid off? My personal approach to science is to start with the postulate that I could be wrong, rather than the opposite. Do you think the revered Feynman would consider the M&W article a model of integrity? (Thanks for sharing the link, BTW; his wisdom is always a great read.)

c) You can praise and criticize M&W, but all you can say about GS&MM is that their response wasn’t “the most effective rebuttal”. Why can’t you discuss them as candidly as M&W?

OK, there’s an argument I’d like to see disappear from this blog and others, so we can start having more productive discussions. It goes something like: “you can’t criticize any paper challenging climate orthodoxy unless you have previously challenged all orthodox papers and opinion pieces published to date”. People gave me the same rap when I dared criticize the Loehle extravaganza. On climate-dissent websites I don’t see people shooting down their own colleagues before they take up the pen. Does anyone have a good reason why I should do it? Since when, and on what planet, has this become a rhetorical golden rule? Or is it just that the “Michael Mann shooting game” is currently the most popular on the blogosphere, and people want me to join that cult so badly they can’t be bothered to read about anything else?

Now, Frank, if you have any misgivings about the refutation written by Gavin and Mike, please feel free to list them here. Since I don’t use any of their arguments, I simply don’t see why I should defend them. Does that answer your question?

d) You ignore the hypothesis that pre-screening proxies is merely a sophisticated method of cherry-picking data. If the proxies that were eliminated were merely noise, shouldn’t the same signal be found in the full data set?

Yes and no. In a world with numerical errors, it is well known that adding noise to a matrix can screw up many a least-squares problem. Apparently L1 methods are more immune to that disease, but I have yet to convince myself that a foolproof method exists, able to perfectly separate the wheat from the chaff.
Now, the “cherry-picking” you speak of is called feature selection in the statistical literature, and we climate scientists are neither the first nor the last to use it. Cherry-picking is of course a problem when you go for a very small subset of your data while ignoring the vast majority of the rest. Is that what’s happening here? Not really: Mann et al 2008 assembled a database of high-resolution proxy data based on a number of criteria covering time resolution, continuity, length, willingness of paleoclimatologists to share their data, etc. One could find subjectivity at every step of the way (there is no such thing as a perfectly objective scientific investigation), but I submit that they really tried to incorporate all the available data. Some of it reflects temperature; some of it is more strongly influenced by precipitation; some of these records don’t have a particularly clear-cut interpretation in terms of local or distal climate variables. So of course you had better screen the proxies for a temperature signal before throwing them at your regression method. You’d be pretty crazy not to. If I added baseball scores and the results of running-rat experiments to the design matrix, and then used a correlation-based feature selection method to toss them out, would you call that “cherry-picking”? I think not. You’d call it sanity. Well, there are proxies that have little to do with surface temperature per se, so I find it wise to screen them out (not all my colleagues agree). Now, we can argue about which correlation-based criteria should be used for that – one of my pet peeves, as it turns out. But I don’t believe the general principle can be put in doubt.

Incidentally, by your measure, dear Frank, investigators in any field “cherry-pick” all day long when selecting data that is relevant to their problem, mostly subconsciously. So why is it OK for particle physicists to do it, but not climate physicists?

15 11 2010
Latimer Alder

Ummm

Re arrogance.

‘Climate science’ has such a long history of poor professional practice and self-delusion that it would be very unwise for anyone from a different field (especially one where they actually have serious expertise) to let themselves be influenced by the existing thinking and practices of the climate scientists in that area.

Good auditors should not begin their investigations by going out drinking with the Chief Accountant of the audited organisation, but start with a clean sheet of paper and a nasty, suspicious mind. More of this attitude and less buddy-buddy would perhaps have stopped Madoff and Enron earlier. It may not lead to comfortable cocktail parties, but that is not the purpose.

What you may consider arrogance, the outside world – used to external scrutiny as a means of trying to keep people honest – considers the maintenance of a sensible professional independence.

15 11 2010
El Niño

Interesting viewpoint. So we should have had Enron finances audited by people who knew nothing about corporate finance? Do you really think it would have solved the problem? It seems that you view climate scientists as “guilty until proven innocent”, which is not the most productive view.

16 11 2010
Latimer Alder

@El Nino

‘So we should have had Enron finances audited by people who knew nothing about corporate finance?’

Not at all. That is the exact opposite of my view.

We should have works of statistics audited by proper statisticians, as McS & W are. The article suggested that it was unpleasant of them not to have sat down first with climatologists and understood their problems, rather than starting from the first principles of statistics.

In Enron’s case, the reason the auditors failed so spectacularly (a failure which brought the end of Arthur Andersen as an accounting firm) was that they had adopted precisely that approach. They had become too buddy-buddy with the auditees and accepted their word and rationale with too much credulity and too little suspicion.

Auditors should always view the auditees as ‘of unproven probity’ until they have completed their investigations. Not quite the same as ‘guilty until proven innocent’ but far more effective than ‘innocent until proven guilty’

As I said before, it may not make for comfortable cocktail parties, but that is a minor niggle about a system supposed to keep people honest. Being productive (whatever that means in this context) does not enter into it.

16 11 2010
El Niño

Hello Latimer Alder,
I appreciate that the business crowd thinks we are out to delude everyone with our data and models, and that this is why attempts at “auditing” what we do fall short of revealing anything of substance. When you say “We should have works of statistics audited by proper statisticians”, you are making the implicit assumption that all there is to paleoclimate reconstructions is statistics. It is a crucial part, yes, but not the only part. Once again, I repeat that I have met some very bright statisticians who do not look down upon the applications: they understand that data are not just random numbers, and that to make appropriate use of statistics one must first understand how the data are generated and get a basic sense of what they mean. Failing to do this leads to pretty egregious errors, which Wegman, McShane & Wyner could all have avoided by doing their basic climatology homework. Now, if having conversations with people amounts to an unacceptable level of buddy-buddiness, perhaps they should consider taking a class! I’d be happy to teach some of them the fundamentals of climate science, and trade that for a much needed deepening of my knowledge of statistics. But assuming you know everything is always a terrible way of studying a problem, whatever the field… Before I wrote this post, I, for one, consulted with statisticians for the parts I wasn’t sure I understood.

17 11 2010
PolyisTCOandbanned

Thanks for weighing in.

28 11 2010
Frank

El Nino wrote: “I agree with you that no published paper has established with p < 0.05 that current temperatures are ‘guilty’.” Yet the IPCC considered the conclusion (current temperatures > MWP) drawn from Mann’s original hockey stick to be merely “likely” (far less than Mann’s estimate of statistical significance), but his work became the highlight of the TAR.

Given the difficulties of getting p<0.1 in climate science, I won't argue with this as a standard for statistical significance. But one should always keep in mind, as you read a journal full of articles using a p<0.1 standard, that as many as one out of ten articles you read could be there by a chance arrangement of data, others because of subtle, hopefully-unconscious investigator bias in selecting data, and others because of systematic errors. A p<0.1 standard is fairly iffy. p<0.05 gives another 0.5 standard deviations of safety margin, and there is real added value in showing that p is <0.01 or even <0.001. Before the FDA will approve a new drug, it usually demands TWO independent double-blind clinical trials with p<0.05 (neither the patient nor the evaluating doctor knows who was randomly selected to receive drug vs placebo). The FDA also receives all of the raw data for each patient and does its own independent statistical analysis of all of the sponsoring drug company's claims. The IPCC wants the world to take its extremely expensive "low carbon" medicine on the basis of scientific studies that fall far short of the FDA's standards.

El Nino wrote: "I'm not saying that is your case, but you don't have to go far on the blogosphere to find a staunchly stubborn anti-climatologist behavior…" Agreed! I've occasionally written comments about the absurd "science" at some prominent skeptical sites, but they just gets lost in the noise and rarely draw any sensible discussion. However, I see just as many problems at CAGW sites written by people who seem to be more capable of accurately describing the science behind GW. (RealClimate, of course, censors my comments.) My favorite reply to a comment I made at one site was: "Which side are you on?" I'm all in favor of some scientists trying to make the world a better place by applying their scientific expertise to issues of public policy, but I'd prefer to see the IPCC's reports written by scientists who have chosen to remain above the corrupting political fray.

Thanks for introducing me to the term "feature selection". Mann et al should have analyzed their data with and without "feature selection" and demonstrated why they believe their selection process removed noise rather than signal that disagreed with their conclusion. In the middle of the discussion of proxy data networks, Mann claims to have made reconstructions from the "full" and "screened" proxy networks, but I can't find any reconstructions made from the full network. As best I can tell, M&W are telling us what such a reconstruction would look like if Mann had actually shown it. Note: M&W explicitly stated in their paper that other methods besides Lasso give results similar to Lasso (but it would have been preferable to see the evidence). I would also like to see how the choice of correlation coefficient in the "feature selection" process influences the final result, and how much bias "feature selection" introduces with pseudoproxies + various amounts of noise. When Mann et al chose to use "feature selection", it was their scientific obligation to prove to the reader that the selection process wasn't a sophisticated form of cherry-picking.

1 12 2010
El Niño

Hello Frank,
thank you for chiming in.
Believe it or not, the IPCC tries to attach a probability to all of its major statements, with language such as ‘likely’, ‘very likely’, ‘unlikely’, etc., corresponding to explicitly defined levels of likelihood. However, in climate science as elsewhere, it can be challenging to ascribe quantitative probabilities to certain results. In the paleoclimate problem of interest here, I think the frequentist terminology of p-values could be advantageously replaced by a Bayesian approach, where one can attach a probability to any result. One issue is that frequentist tests can be very easily fooled by all sorts of things, in particular serial correlation.
Now, is p < 0.1 unacceptable? That is in the eye of the beholder. As long as we tell decision-makers what the odds are, it is their job to make decisions. If a doctor told me that unless I quit smoking I will get lung cancer with 90% or 95% certainty, I would quit smoking in either case. The FDA has strict rules and clinical trials designed to get n large enough to reach the level of confidence you aspire to. That's great. But we simply don't have the luxury of re-running the experiment called "the Earth".

My own experience with feature selection is that:
(1) it makes a great deal of difference to the end result
(2) there are various methods to implement it, and they do not all give identical results

Hence, I agree with your analysis that as a community we should be more explicit about it. It turns out that the Mann et al 2008 reconstruction uses 484 proxies selected with the p < 0.1 correlation threshold, but it took one of my students a lot of digging through supplementary info and code to make that clear. The thing is, if you do that selection a different way (my own method retained 522 proxies, some of which were not in Mann et al.'s 484), you get almost exactly the same result. However, if you select far too few proxies (which is what McShane & Wyner did: the lasso will retain at most as many predictors as there are years in the calibration interval, hence 146), you get the result that "proxies have no signal", i.e. junk.
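
For concreteness, here is a minimal sketch of that kind of correlation screening. The function and the toy data are mine, not Mann et al.'s actual code, which also involves local temperature and corrections for serial correlation:

```python
import numpy as np
from scipy import stats

def screen_proxies(proxies, temp, p_threshold=0.10):
    """Indices of proxies whose correlation with temperature over the
    calibration window is significant at the chosen level.
    proxies: (n_years, n_proxies) array; temp: (n_years,) array."""
    keep = []
    for j in range(proxies.shape[1]):
        r, p = stats.pearsonr(proxies[:, j], temp)
        if p < p_threshold:
            keep.append(j)
    return keep

# Toy data, purely illustrative: 500 proxies, all weakly temperature-driven.
rng = np.random.default_rng(3)
temp = rng.normal(size=100)
proxies = 0.3 * temp[:, None] + rng.normal(size=(100, 500))
print(len(screen_proxies(proxies, temp)), "of 500 proxies retained")
```

Naive p-values like these are inflated by serial correlation, which is exactly why the choice of screening criterion matters.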

To be sure, “feature selection” is not an invention of climate scientists but of statisticians; if you have issues with it, you might want to discuss it with a statistician – no need to put scare quotes around it as if it had the plague 😉 The unstated justification for using it in paleoclimate reconstructions is (in my experience) that you get junk results (cf. M&W) if you don't. Now, I am also unsatisfied with how we do this in my field, and I am actually collaborating with professional statisticians on more objective ways to perform model selection – hopefully that will alleviate your (legitimate) concerns, and those of other intelligent skeptics, about this screening. You bring forth valid points that we, as a community, should be able to address head-on and with clarity.

Note: M&W explicitly stated in their paper that other methods besides Lasso give results similar to Lasso (but it would have been preferable to see the evidence)

I love that M&W can get away with not presenting everything, when no climate scientist ever could: that’s what I mean by “guilty until proven innocent”. In any case, I surmise this is why they got the same result regardless of the method: with that large a p/n ratio (number of variables vs number of observations), you need a very sharp way of discerning which predictors to retain (“model selection”), and according to the professional statisticians I talk to (and quote in my post), none of the methods they used would do that – so one would expect proxy noise to dominate every time. The issue is, as McShane and Wyner acknowledge in the article, that paleoclimate data are very complex, and hence one cannot just throw any random method at them and expect to do well. That is why no statistician can succeed at this task alone and why, for the same reason, no climate scientist without a knowledge of modern statistical methods will. That is why climate scientists and statisticians need to work together, and that is why I was calling for more collaboration than antagonism between the two fields. Now, people obsessed with “audits” will take some time to get over that, but hopefully one day they’ll realize no one is manufacturing evidence; we are simply trying to solve a challenging scientific problem with political implications, so collaboration is a more promising route than fearful suspicion.
