As one of my many personality flaws, I often overlook social niceties as I engage in scientific discussions. I have never understood the value of niceties in an activity that supposedly concerns fact, logic, and reasoned arguments.
Nor have I ever had problems separating an argument about science from an argument with my neighbor after, for example, his dog has messed up my pool deck. Scientific arguments are engaged as scientific arguments. Nothing else. And certainly not as a mechanism to establish a pecking order among scientists. So after a jolly fight, I have never had any difficulty conceding defeat. In fact, defeat is wonderful; it means that I have learned something.
But there is another reason to avoid niceties: One of the best ways to assess the intellectual integrity of a new scientific acquaintance is to make a provocative challenge on a point of science, and see how they respond.
Thirty years ago, I did exactly that to a Uruguayan-Canadian scientist named Gaston Gonnet. Gaston, then 42 years young, had been hired by the Informatics Department at the ETH Zurich because of his stellar career in computer science at the University of Waterloo. He was perhaps best known for Maple, a symbolic computation platform.
But Gaston was a polymath. Two years earlier, with Frank Tompa at Waterloo, he had founded OpenText, with ambitions to do whole web searches nearly a decade ahead of Larry Page and Sergey Brin, whom you may have heard of. As of 2014, OpenText was Canada’s largest software company.
I had been told by my wife that Gaston was to speak on his latest project, also with Frank, a full-text searchable version of the Oxford English Dictionary. This allowed a user to, for example, learn how many words were introduced into the English language between 1670 and 1690 from the Czech language. My wife thought Gaston’s lecture would help me manage my struggle with the German language, which I was using to teach chemistry students, students who were begging me to (please) lecture in English instead.
After the lecture, I approached Gaston with an impolite remark characteristic of my age of 35: “If you are going to do all this work, why don’t you work with an interesting set of data? Not this Oxford English Dictionary stuff?”
Without a hint of personal affront, Gaston responded: “What data did you have in mind?”
Genuine curiosity. A remarkable thing in a scientist. I referred him to the then-emerging data describing the sequences of proteins and their encoding genes.
It is difficult today to imagine the world of 1990 with respect to DNA sequencing. It was still possible to go to Chemical Abstracts, then a bound paper volume (What is “paper”? How does it have “volume”?), search under subject heading “protein sequences”, pad to the shelf holding the journal, and make a xerox copy of the sequence itself. The “Atlas of Protein Sequences”, published by Margaret Dayhoff, was a book holding the only good source of aligned sequences of proteins related by common ancestry. Jaap Beintema was still determining the amino acid sequences of ribonucleases (they are proteins) by getting pancreas tissues from expired zoo animals, treating them with acid, and doing Edman degradations (Google it).
And the only “matrix” describing how amino acids are replaced by other amino acids as proteins divergently evolve under functional constraints had been built from only a few dozen protein families.
What evolved was a scientific collaboration that produced over a dozen papers, as well as a deep personal friendship. Every Friday at 4 PM, after the crises of the week had been settled and the administrators had gone home, I would join Gaston in his office. He would provide Swiss chocolate. I would bring the latest problem in molecular evolution. And we would work things out.
What emerged, exactly 30 years ago, is captured in a technical report (Number 154, March 1991). The first web-based tools to do multiple protein sequence alignments. An updated replacement matrix that covered the entire protein sequence database, published in Science. A model for how proteins insert and delete segments of the protein chain. The rectified database, eight years later, became a commercial product (the “MasterCatalog”) with $3.4 million in lifetime sales. The MasterCatalog precomputed multiple sequence alignments of all protein families, built trees showing their evolutionary relationships, and computed ancestral sequences, the structures of proteins from now-extinct organisms. Gaston wrote a bioinformatics programming language called Darwin, atop Maple, to manipulate protein sequences. While the origin of all of this is largely forgotten by a generation of scientists who do not know what “paper” is, all of these advances are now routine in the field.
And the products of this large-scale comprehensive analysis of protein sequences persist. The analysis of patterns of amino acid replacement in proteins allowed us to predict the folded structure of protein kinase from sequence data alone, the first time that this had been done “blind” and published ahead of actual knowledge of the structure. Using emerging biotechnology, the ancestral sequences determined on those Friday afternoons were resurrected in the laboratory. We physically held in our hands proteins from ruminants that lived two million years ago, and then 80 million years ago, and then proteins from bacteria that lived three billion years ago. The field of experimental paleogenetics, adumbrated by Linus Pauling and Emile Zuckerkandl 30 years before, was born on these Friday afternoons.
Paleogenetics is now driving medicine. Also driving medicine was the recognition that standard alignment algorithms treat proteins as if they were linear strings of letters, as in the Oxford Dictionary. However, proteins are not linear strings of letters, but rather folded and functioning molecules. “Big data” (well, it was “big” in 1991) showed that signals of fold and function are contained in the differences between how proteins actually evolve and how they would evolve if they were simple strings of letters. This signal is captured in arcane terms such as “homoplasy”, “heterotachy”, and “covariance”. A decade later, we, Eric Gaucher, David Liberles, and others were to use this signal to understand the function, in mammals and in humans, of proteins related to gout, cancer, diabetes, placental reproduction, and obesity.
All from an impolite remark, one that a scientist having an ego more fragile than Gaston’s would have taken as an insult.
Now, I wouldn’t recommend this approach to young scientists seeking research collaborators. For every scientist like Gaston, dozens more will interpret a challenge to an idea as a challenge to them personally. Our civilization is not that far removed from a world where one’s status and alliances with one’s tribesmen were the difference between living an easy life and dying alone on the savannah. As Feyerabend frequently observed, very little in science, as it is actually practiced, is about fact, logic, or reasoning.
So, my advice is to always use social niceties when speaking to your organizational superiors.