Thursday, April 27, 2017

The Law of No Restraint

There's a new law of science reporting or, perhaps more accurately put, of the science jungle.  The law: feed any story, no matter how fantastic, to science journalists (including your university's PR spinners), and they will pick up whatever can be spun into a Big Story and feed it to the eager mainstream media.  Caveats may appear somewhere in the stories, but not in the headlines, so that, however weak, tentative, or incredible, the story gets its exposure anyway.  Then on to tomorrow's over-sell.

One rationale for this is that unexpected findings--typically presented breathlessly as 'discoveries'--sell: they rate the headline. The caveats and doubts that might un-headline the story may be reported as well, but often buried in minimal terms late in the report.  Even if the report balances skeptics and claimants, simply publishing the story is enough to give at least some credence to the discovery.

The science journalism industry is heavily inflated in our commercial, 24/7 news environment. It would be better for science, if not for sales, if all these hyped findings, rather than being publicized the moment the paper is published, first appeared in musty journals for specialists to argue over, and reached the pop-sci news only after some mature judgments had been made about them.  Of course, that's not good for commercial or academic business.

We have just seen a piece reporting that humans were in California something like 135,000 years ago, rather than the well-established continental dates of about 12,000 years.  The report, which I won't grace by citing here (you've probably seen it anyway), then went on to speculate about which 'species' of our ancestors these early arrivals might have been.

Why is this so questionable?  If it were a finding on its own, it might seem credible, but given the plethora of skeletal and cultural archeological findings up and down the Americas, such an ancient habitation seems a stretch.  There is no comparable trail of earlier settlements in northeast Asia or Alaska that might suggest it, and there are lots of animal and human archeological remains, all basically consistent with each other; so why has no earlier finding been made before now?  It is of course possible that this is the first such finding and a correct one, but it is far too soon for it to merit a headline story, even with caveats.

Another piece we saw today reported that a new analysis casts doubt on whether diets high in saturated fat are bad for you.  This was a meta-analysis of various other studies that have been done, and it got some headline treatment because the authors report that, contrary to many findings over many years, saturated fats don't clog arteries; instead, they say, coronary heart disease is a chronic inflammatory condition.  Naturally, the study is being challenged, as reflected in the story's discussion, by critiques of its data and methods.  These get into details we're not qualified to judge, so we can't comment on the relative merits of the case.
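
For readers unfamiliar with what a meta-analysis actually computes, here is a minimal sketch of the standard inverse-variance ('fixed-effect') pooling, with invented numbers; real analyses add random-effects terms, bias adjustments, and data-quality judgments, which is exactly where the disputes over a study like this one arise.

```python
# Minimal fixed-effect (inverse-variance) meta-analysis sketch.  The study
# effects below are invented for illustration, not values from the
# saturated-fat literature.
import math

# (effect estimate, standard error) per hypothetical study,
# e.g. log relative risk of heart disease per unit of intake
studies = [(0.10, 0.05), (0.02, 0.08), (0.15, 0.06), (-0.01, 0.09)]

weights = [1 / se**2 for _, se in studies]   # precision weights
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# Cochran's Q: how much the studies disagree beyond chance
q = sum(w * (est - pooled)**2 for (est, _), w in zip(studies, weights))

print(f"pooled effect: {pooled:.3f}, 95% CI +/- {1.96 * pooled_se:.3f}")
print(f"heterogeneity Q = {q:.2f} on {len(studies) - 1} df")
```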

However, one thing we can note is that, with respect to coronary heart disease, study after study has reported more or less the same, or at least consistent, findings about the correlation between saturated fats and risk. Still, despite so much careful science, including physiological studies as well as statistical analyses of population samples, we apparently still cannot be sure about a dietary component that we've been told for years should play a much reduced role in what we eat.  How on earth could we possibly still not know whether saturated-fat diets affect disease risk?

If this very basic issue is unresolved after so long, and the story is similar for risk factors for many complex diseases, then what is all this promise of 'precise' medicine all about?  Causal explanations are still fundamentally unclear for many cancers, dementias, psychiatric disorders, heart disease, and so on.  So why isn't the most serious conclusion that our methods and approaches themselves are for some reason simply not adequate to answer such seemingly simple questions as 'is saturated fat bad for you?'  Were the plethora of previous studies all flawed in some way?  Is the current study?  Does the publicizing of the studies itself change behaviors in ways that affect future studies?

There may be no better explanation than that diets and physiology are hard to measure and are complex, and that no simple answer is true.  We may all differ for genetic and other reasons to such an extent that population averages are untrustworthy, or our habits may change enough that studies don't get consistent answers.  Or asking about one such risk factor when diets and lifestyles are complex is a science modus operandi that developed for studying simpler things (like exposure to toxins or bacteria, the basis of classical epidemiology), and we simply need a better gestalt from which to work.

Clearly a contributory sociological factor is that the science industry has simply been cruising down the same rails, despite the constant popping of promise bubbles, for decades now.  It's always 'more money for more and bigger studies'.  It's rarely 'let's stop, take a deep breath, and think of some better way to understand (in this case) dietary relationships to physical traits'.  In times past, at least, most stories like the ancient Californians didn't get ink so widely and rapidly.  But if I'm running a journal or a media network, or am a journalist needing to earn my living, and I need to turn a buck, naturally I need to write about things that aren't yet understood.

Unfortunately, as we've noted before, the science industry is a hungry beast that needs continual feeding, and (like our 3 cats) always demands more, more, and more.  There are ways we could reform things, at least up to a point.  We'll never change the fact that some scientists will claim almost anything to get attention, and we'll always be faced with data that suggest one thing but turn out another way.  But we should be able to temper the level of BS and get back to sober science rather than sausage-factory 'productivity', and to educate the public that some questions can't be answered the way we'd like, or aren't being asked in the right way.  That is something science might address effectively, if it weren't so rushed and pressured to 'produce'.

Thursday, April 20, 2017

Some genetic non-sense about nonsense genes

The April 12 issue of Nature has a research report and a main article about what is basically presented as the discovery that people typically carry doubly knocked-out genes but show no effect. The editorial (p 171) notes that the report (p 235) uses an inbred population to identify double-knockout genes (that is, recessive homozygous null mutations) and look at their effects.  The population sampled, from Pakistan, has high levels of consanguineous marriage.  The criterion for calling a knockout mutation was based on the protein-coding sequence.

We have no reason to question the technical accuracy of the papers, nor their relevance to biomedical and other genetics, but there are reasons to assert that this is nothing newly discovered, and that the story misses the really central point that should, I think, be undermining the expensive Big Data/GWAS approach to biological causation.

First, for some years now there have been reports of samples of individual humans (perhaps also of yeast, but I can't recall specifically) in which both copies of a gene appear to be inactivated.  The criteria for saying so are generally indirect, based on nonsense, frameshift, or splice-site mutations in the protein code.  That is, there are other aspects of coding regions that may be relevant to whether whatever is coded really is non-functional, so this is not a truly thorough search; the authors mention some of these.  Basically, costly as it is, this is science on the cheap, because it clearly addresses only some aspects of gene functionality, and it would be almost impossible to show either that the gene was never expressed or that it never worked. For our purposes here, we need not question the finding itself.  But the fact that this is not a first discovery does raise the question of why a journal like Nature is so desperate for Dramatic Finding stories, since this one really should instead be a report in one of the many specialty human genetics journals.
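
To make concrete just how indirect these knockout calls are, here is a hedged sketch of the usual filtering logic.  The field names and records are hypothetical; real pipelines (built on annotators such as VEP or snpEff) apply many more checks, and none of this proves the gene product is actually non-functional.

```python
# Sketch of how putative loss-of-function (LoF) 'knockouts' are typically
# called from coding annotations alone.  Data and field names are made up.

LOF_CONSEQUENCES = {"stop_gained", "frameshift", "splice_donor", "splice_acceptor"}

def is_putative_lof(variant):
    """LoF status here rests only on the predicted coding consequence."""
    return variant["consequence"] in LOF_CONSEQUENCES

def double_knockouts(variants):
    """Genes where an individual is homozygous for a putative LoF allele.
    Regulatory silencing, compound heterozygotes, RNA-level effects, and
    somatic mutations are all invisible to a screen like this."""
    return {v["gene"] for v in variants
            if is_putative_lof(v) and v["genotype"] == "hom_alt"}

person = [
    {"gene": "GENE_A", "consequence": "stop_gained", "genotype": "hom_alt"},
    {"gene": "GENE_B", "consequence": "missense",    "genotype": "hom_alt"},
    {"gene": "GENE_C", "consequence": "frameshift",  "genotype": "het"},
]
print(double_knockouts(person))  # -> {'GENE_A'}
```

Note what such a screen cannot see, which is exactly what the points below discuss.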

Secondly, there are causes other than coding mutations for gene inactivation. They have to do with regulatory sequences, and inactivating mutations in that part of a gene's functional structure are much more difficult, if not impossible, to detect with any completeness.  A gene's coding sequence itself may seem fine, but its regulatory sequences may simply not enable it to be expressed. Gene regulation depends on epigenetic DNA modification, on multiple transcription factor binding sites, on the functional state of the many proteins required to activate a gene, and on other aspects of the local cellular environment (such as RNA editing or RNA interference).  The point here is that there are likely to be many other instances of people with complete or effectively complete double knockouts of genes.

Thirdly, the assertion that these double KOs have no effect depends on various assumptions.  Mainly, it assumes that the sampled individuals will not, in the future, experience the otherwise-expected phenotypic effects of their defunct genes.  Effects may depend on age, sex, and environmental exposures rather than being a congenital yes/no functional outcome.

Fourthly, there may be many coding mutations that make the protein non-functional but are ignored by this sort of study because they aren't clear knockout mutations; yet they are in whatever data are used for comparison of phenotypic outcomes.  There are post-translational modifications, RNA editing and modification, and other aspects of a 'gene' that this approach does not pick up.

Fifthly, and by far most important, I think, is that this is the tip of the iceberg of redundancy in genetic functions.  In that sense, the current paper is a kind of factoid that reflects what GWAS has been showing in great, if implicit, detail for a long time: there is great complexity and redundancy in biological functions.  Individual mapped genes typically affect trait values or disease risks only slightly.  Different combinations of variants at tens, hundreds, or even thousands of genome sites can yield essentially the same phenotype (and here we ignore the environment which makes things even more causally blurred).
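
As a toy illustration of that redundancy (my own sketch, not the paper's or any particular GWAS model): simulate a trait built from many small-effect loci and count how many entirely different genotypes land at essentially the same trait value.

```python
# Toy polygenic-redundancy simulation (illustrative only): many different
# genotype combinations across small-effect loci yield essentially the
# same trait value, and zeroing any one locus barely moves the trait.
import random

random.seed(1)
N_LOCI = 200
effects = [random.gauss(0, 0.1) for _ in range(N_LOCI)]  # small per-locus effects

def trait(genotype):
    """Additive trait value; genotype = allele counts (0, 1, 2) per locus."""
    return sum(g * b for g, b in zip(genotype, effects))

people = [[random.choice([0, 1, 2]) for _ in range(N_LOCI)] for _ in range(5000)]
values = [trait(g) for g in people]

# How many distinct genotypes fall within a narrow window around the median?
mid = sorted(values)[len(values) // 2]
lookalikes = sum(1 for v in values if abs(v - mid) < 0.05)
print(f"{lookalikes} of 5000 genotypes within +/-0.05 of the median trait value")

# 'Knock out' one locus in one individual: the trait shift is usually tiny
knocked = people[0][:]
knocked[0] = 0
print(f"trait shift from zeroing locus 0: {trait(knocked) - trait(people[0]):+.3f}")
```

The exact numbers are arbitrary; the point is that hordes of distinct genotypes are phenotypically interchangeable, which is just what GWAS keeps finding.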

Sixthly, other samples, and certainly other populations, as well as individuals within the Pakistani database, surely carry various parts of redundant pathways, from plenty of them to none.  Indeed, the inbreeding exploited in this study obviously affects the rest of the genome, and there's no particular way to know in what way or, more importantly, in which individuals.  The authors found a number of basically trivial or no-effect results as it is, even after their hunt across the genome; whether some individuals had an attributable effect of a particular double knockout is problematic at best.  Every sample, even of the same population, and certainly of other populations, will have different background genotypes (homozygous or not), so this is largely a fishing expedition in a particular pond that cannot seriously be extrapolated to other samples.

Finally, this study cannot address the effect of somatic mutation on phenotypes and their risk of occurrence.  Who knows how many local tissues have experienced double-knockout mutations and produced (or not produced) some disease or other phenotypic outcome?  Constitutive genome sequencing cannot detect this.  Surely we should know this very inconvenient fact by now!

Given the well-documented and pervasive biological redundancy, it is not any sort of surprise that some genes can be non-functional and the individual phenotypically within a viable, normal range. Not only is this not a surprise, especially by now in the history of genetics, but its most important implication is that our Big Data genetic reductionistic experiment has been very successful!  It has, or should have, shown us that we are not going to be getting our money's worth from that approach.  It will yield some predictions in the sense of retrospective data fitting to case-control or other GWAS-like samples, and it will be trumpeted as a Big Success, but such findings, even if wholly correct, cannot yield reliable true predictions of future risk.

Does environment, by any chance, affect the studied traits?  We have, in principle, no way to know what future environmental exposures (or somatic mutations) will be like.  The by-now very well documented leaf-litter of rare and/or small-effect variants plagues GWAS for practical statistical reasons (and is why usually only a fraction of heritability is accounted for).  Naturally, finding a single doubly inactivated gene may, but need not, yield reliable trait predictions.

By now, we know of many individual genes whose coded function is so proximate or central to some trait that mutations in such genes can have predictable effects.  This is the case with many of the classical 'Mendelian' disorders and traits that we've known for decades.  Molecular methods have admirably identified the gene and mutations in it whose effects are understandable in functional terms (for example, because the mutation destroys a key aspect of a coded protein's function).  Examples are Huntington's disease, PKU, cystic fibrosis, and many others.

However, these are at best the exceptions that lured us into thinking that even more complex, often late-onset traits would be mappable, so that we could parlay massive investment in computerized data sets into solid predictions and identify the druggable 'genes for' traits that Big Pharma could target.  This was predictably an illusion, as some of us were saying long ago, and for the right reasons.  Everyone should know better now, and this paper just reinforces the point, to the extent that one can assert that it's the political-economic aspects of science funding, science careers, and hungry publications, and not the science itself, that drive the persistence of the same methods anyway.  Naturally (or should one say reflexively?), the authors advocate a huge Human Knockout Project to study every gene--today's reflex Big Data proposal.**

Instead, it's clearly time to recognize the relative futility of this, and change gears to more focused problems that might actually punch their weight in real genetic solutions!

** [NOTE added in a revision.  We should have a wealth of data by now, from many different inbred mouse and other animal strains, and from specific knockout experiments in such animals, to know that the findings of the Pakistani family paper are to be expected.  About a quarter to a third of knockout experiments in mice have no effect, or not the same effect as in humans, or no or a different effect in other inbred mouse strains.  How many times do we have to learn the same lesson?  Indeed, with existing genomewide sequence databases from many species, one can already search for doubly knocked-out genes.  We don't really need a new megaproject to have lots of comparable data.]

Wednesday, April 12, 2017

Reforming research funding and universities

Any aspect of society needs to be examined on a continual basis to see how it could be improved.  University research, such as that which depends on grants from the National Institutes of Health, is one area that needs reform. It has gradually become an enormous, money-directed, and largely self-serving industry, and its need for external grant funding turns science into a factory-like enterprise, which undermines what science should be about: advancing knowledge for the benefit of society.

The Trump policy, if there is one, is unclear, as with much of what he says on the spur of the moment. He has threatened to reduce the NIH budget, but he is also said to favor an increase, so it's hard to know whether this represents whims du jour or policy.  But regardless of what comes from on high, it is clear to many of us with experience in the system that health and other science research has become very costly relative to its promise, and too often mechanical rather than inspired.

For these reasons, it is worth considering what reforms could be taken to deliver, in a more targeted and cost-efficient way, on what researchers promise--knowing that changing the direction of a dependency behemoth like NIH research funding has to be slow, because too many people's self-interests will be threatened.  Here's a list of some changes that are long overdue.  In what follows, I include a few bracketed FYI asides for readers who are unfamiliar with the issues.

1.  Reduce grant overhead amounts
[FYI:  Federal grants come with direct and indirect costs.  Direct costs pay for the research staff, supplies and equipment, travel, data collection, and so on.  Indirect costs ('overhead') are negotiated for each university and are awarded on top of the direct costs--and given to the university administration.  If I get $100,000 on a grant, my university will get $50,000 or more, sometimes even more than $100K.  Its claim to this money is that it has to provide the labs, libraries, electricity, water, administrative support and so on for the project, and that without the project it would not have these expenses. Indeed, an indicator of the fat that is in overhead is that, as an 'incentive' or 'reward', some overhead is returned as extra cash to the investigator who generated it.]

University administrations have notoriously been ballooning.  Administrators and their often fancy offices depend on individual grant overhead, which naturally puts intense pressure on faculty members to 'deliver'.  Educational institutions should be lean and efficient: universities should pay for their own buildings and libraries and pare back bureaucracy. Some combination of state support, donations, and block grants could be developed to cover infrastructure, not tied to individual projects or investigators' grants.
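
To make the FYI arithmetic above concrete, a toy calculation; the 52% rate is invented for the example, though negotiated rates of that size are common.

```python
# Hypothetical illustration of how indirect costs inflate what a grant
# costs the funder; the 52% rate is made up for this sketch.
direct = 100_000          # pays for the actual research
indirect_rate = 0.52      # negotiated per university, paid on top
indirect = direct * indirect_rate
print(f"cost to the funder: ${direct + indirect:,.0f} "
      f"(${indirect:,.0f} goes to the university administration)")
# -> cost to the funder: $152,000 ($52,000 goes to the university administration)
```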

2.  No faculty salaries on grants
[FYI:  Federal grants, from NIH at least, allow faculty investigators' salaries to be paid from grant funds.  That means that in many health-science universities, the university itself pays only a fraction, often tiny and perhaps sometimes none, of its faculty's salaries.  Faculty without salary-paying grants will be paid some fraction of their purported salaries, and often for a limited time only.  And salaries generate overhead, so higher pay means higher overhead for administrators--duh, a no-brainer!]

Universities should pay their faculty's salaries from their own resources.   Originally, in my understanding, faculty investigators' salaries were paid on grants so that the university could hire temporary faculty to cover the PI's teaching and administrative obligations while s/he was doing the research.  Otherwise, if faculty are already being paid to do research, what's the need? Salary charges on grants should be allowed only for that purpose, not just as a source of cash.  And faculty should not be paid on soft money, because the need to steadily hustle one's own salary is an obvious corrupting force on scientific originality and creativity.

3.  Limit on how much external funding any faculty member or lab could have
There is far too much reward for empire-builders. Some do, or at least started out doing, really good work, but that's not always the case, and diminishing returns for expanding cost are typical.  One consequence is that new faculty are given reduced teaching and administrative duties so they can (must!) write grant applications. Research empires are typically too large to be effective, often have absentee PIs off hustling, and are under pressure to keep the factory running.  That understandably generates intense pressure to play it safe (while claiming to be innovative); but good science is not a predictable factory product.

4.  A unified national health database
We need health care reform, and if we had a single national health database it would reduce medical costs and could be anonymized so research could be done, by any qualified person, without additional grants.  One can question the research value of such huge databases, as is true even of the current ad hoc database systems we pay for, but they would at least be cost-effective.

5. Temper the growth ethic 
We are over-producing PhDs, largely to satisfy the current faculty game in which status is gained by running large labs.  There are too many graduate students and post-docs for the long-term job market, and this is taking a heavy personal toll on aspiring scientists.  Meanwhile, there is inertia at the top, where we have been prevented from imposing mandatory retirement ages.  Amicably changing this system will be hard and will require creative thinking; but it won't be as cruel as the system we have now.

6. An end to deceptive publication practices
We routinely see papers listing more authors than there are residents in the NY phone book.  This is pure careerism in our factory-production mode.  As was once the standard, every author should in principle be able to explain his/her paper on short notice--I've heard 15 minutes. Those who helped on a paper, such as by providing some DNA samples, should be acknowledged but not listed as authors. Dividing papers into least-publishable units isn't new, but with the proliferation of journals it's out of hand.  Limiting CV lengths (and not including grants on them) when it comes to promotion and tenure could focus researchers' attention on doing what's really important rather than chaff-building.  Chairs and Deans would have to recognize this, and move away from safe but gameable bean-counting.

[FYI: We've moved towards judging people internally, and sometimes externally in grant applications, on the quantity of their publications rather than the quality, or on supposedly 'objective' (computer-tallied) citation counts.  This is play-it-safe bureaucracy, and it obviously encourages CV padding, which is reinforced by the proliferation of for-profit publishing.  Of course some people are highly successful both in the real scientific sense of making a major discovery and in publishing their work.  But it is naive not to realize that many, often the big players grant-wise, manipulate any counting-based system.  For example, they can cite their own work in ways that increase the 'citation count' that Deans see.  Papers with very many authors also lead to credit-claiming that is highly exaggerated relative to the actual scientific contribution.  Scientists quickly learn how to manipulate such 'objective' evaluation systems.]

7.  No more too-big-and-too-long-to-kill projects
The Manhattan Project and many others taught us that if we propose huge, open-ended projects we can have funding for life.  That's what the 'omics era and other epidemiological projects reflect today.  But projects that are so big they become politically invulnerable rarely continue to deliver the goods.  Of course the PIs, the founders and subsequent generations, naturally cry that stopping their important project after so much money has been invested would be wasteful!  But it's not as wasteful as continuing to invest in diminishing returns.  Project duration should be limited, and known to all from the beginning.

8.  A re-recognition that science addressing focal questions is the best science
Really good science is risky because serious new findings can't be ordered up like hamburgers at McD's.  We have to allow scientists to try things, and most ideas won't go anywhere.  But we don't have to allow open-ended 'projects' to scale up interminably, as has happened in the 'Big Data' era, where, despite often-forced claims and PR spin, most projects don't go very far either, though by their sheer size they generate a blizzard of results.

9. Stopping rules need to be in place  
For many multi-year or large-scale projects, an honest assessment part-way through would show that the original question or hypothesis was wrong or won't be answered.  Such a project (and its funds) should be ended when it is clear that its promise will not be met.  It should be a credit to an investigator to acknowledge that an idea just isn't working out, and those who don't should be barred for some years from further federal funding.  This is not a radical new idea: it is precedented in drug trials, and we should do the same in research.

It should be routine for universities to provide continuity funding for productive investigators so they don't have to cling to go-nowhere projects. Faculty investigators should always have an operating budget so that they can do research without an active external grant.  Right now, they have to piggy-back their next idea on funds in their current grant, and without internal continuity funding this naturally leads to safe, 'fundable' projects rather than really innovative ones.  The reality is that truly innovative projects typically are not funded, because it's easy for grant review panels to find fault and move on to the safer proposals.

10. Research funding should not be a university welfare program
Universities are important to society and need support.  But universities, like scientists, become entrenched; it's natural.  Society deserves something for its funding generosity, though, and one of the facts of funding life should be that funds move.  Scientists shouldn't have a lock on funding any more than anybody else, and universities should be structured so they are not addicted to external grant funding. Will this threaten jobs?  Most people in society have to deal with that, and scientists are generally very skilled people; if one area of research shrinks, others will expand.

11.  Rein in costly science publishing
Science publishing has become what one might call a greedy racket.  There are far too many journals rushing out half-reviewed papers for pay-as-you-go authors.  Publication fees are typically paid from grant budgets (though one can ask how often young investigators shell out their own personal money to keep their careers afloat).  Profiteering journals are proliferating to serve the CV-padding, hyper-hasty, bean-counting science industry that we have established.  Yet the vast majority of papers have basically no impact.  That money should go to actual research.

12.  Other ways to trim budgets without harming the science 
Budgets could be trimmed in many other ways, too: no buying journal subscriptions on a grant (universities have subscriptions); less travel to meetings (we have Skype and Hangouts!); shared costly equipment rather than a sequencer in every lab.  Grants should be smaller but of longer duration, so investigators can spend their time on research rather than hustling new grants. Junk the use of 'impact' factors and other bean-counting ways of judging faculty: the practice had a point once--to reduce discrimination and be more objective--but it has long been strategized and manipulated, substituting quantity for quality.  Better means of evaluation are needed.

These suggestions are perhaps rather radical, but to the extent that they can somehow be implemented, it would have to be done humanely.  After all, people playing the game today are only doing what they were taught they must do.  Real reform is hard because science is now an entrenched part of society.  Nonetheless, a fair-minded (but determined!) phase-out of the abuses that have gradually developed would be good for science, and hence for the society that pays for it.

***NOTES:  As this was being edited, New York state has apparently just made its universities tuition-free for those whose families are not wealthy.  If true, what a step back towards sanity and the public good!  The more states can get off the hooks of grants and strings-attached private donations, the more independent they should be able to be.

Also, the April 12 Wall St Journal has a story (paywalled, unless you search for it on Twitter) showing the faults of an over-stressed health research system, including some of the points made here.  The article points out problems of non-replicability and other technical mistakes that are characteristic of our heavily over-burdened system.  But it doesn't go after the System as such: the bureaucracy, the wastefulness, the pressure for 'big data' studies rather than focused research, and the need to be hasty and 'productive' in order to survive.