Archive for the ‘NIFarious Ideas’ Category

Your grandmother is much better at open reproducible science than you!

Posted on July 15th, 2016 in Anita Bandrowski, News & Events, NIFarious Ideas | No Comments »

Yes, you read that correctly: I am calling you out on your ability to do open reproducible science.

This 3-minute video should convince you:

[Video]

If not, then leave a comment!

Protect Yourself from Zombie Papers

Posted on April 25th, 2016 in Anita Bandrowski, Data Spotlight, NIFarious Ideas | No Comments »

Another fun flier to post around the department.

Zombification of papers: the inability to use or validate the information in a paper.
How can we stop this terrible plague on the scientific literature? RRIDs help ensure that key biological reagents are identified and authenticated.

Feel free to print this fun flier and post it on your office door!

[Flier]

RRID: Improve your Impact Factor!

Posted on April 25th, 2016 in Anita Bandrowski, News & Events, NIFarious Ideas | No Comments »

Please feel free to take this fun flier and post it around your lab to help your labmates remember how to get an RRID into your next methods section or grant application.
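As a concrete illustration of getting an RRID into a methods section, here is a minimal Python sketch that formats a reagent mention as name, vendor, catalog number and RRID. The `cite_resource` helper, the exact citation layout and the loose RRID pattern are assumptions for illustration, not an official specification; the vendor, catalog number and identifier in the example are hypothetical placeholders.

```python
import re

# Loose check for RRID syntax: "RRID:" + registry code + "_" or ":" + accession.
# This pattern is an illustrative assumption, not the official RRID grammar.
RRID_PATTERN = re.compile(r"^RRID:[A-Za-z]+[_:][A-Za-z0-9_:.-]+$")


def cite_resource(name: str, vendor: str, catalog: str, rrid: str) -> str:
    """Format a methods-section mention as 'name (Vendor Cat# N, RRID:...)'."""
    # Accept the identifier with or without its "RRID:" prefix.
    if not rrid.startswith("RRID:"):
        rrid = "RRID:" + rrid
    if not RRID_PATTERN.match(rrid):
        raise ValueError(f"does not look like an RRID: {rrid!r}")
    return f"{name} ({vendor} Cat# {catalog}, {rrid})"


# Hypothetical vendor, catalog number and RRID, for illustration only.
print(cite_resource("rabbit anti-GFP antibody", "ExampleVendor", "X-123", "AB_0000000"))
# rabbit anti-GFP antibody (ExampleVendor Cat# X-123, RRID:AB_0000000)
```

Rejecting malformed identifiers at writing time is the point of the check: a typo caught here never reaches the published methods section.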

[Flier]

Brain Health Registry

Posted on March 3rd, 2014 in Anita Bandrowski, News & Events, NIFarious Ideas | No Comments »

The Brain Health Registry — led by researchers at UCSF — is a groundbreaking, web-based project designed to speed up cures for Alzheimer’s, Parkinson’s and other brain disorders.  It uses online questionnaires and online neuropsychological tests (which are very much like online brain games). It can make clinical trials — which are needed to develop cures — faster, better and less expensive.


The project is scheduled for a public launch in the spring, but we’re inviting you to be among the first to participate and provide feedback.


Click here to see our website and get more information about the Brain Health Registry.

  • It’s easy. It takes a few minutes to sign up and less than 3 hours per year. And it’s all done online, so you can do it from home — or anywhere you have Internet access.
  • It offers a breakthrough. 85% of clinical trials have trouble recruiting enough participants. By creating a large online database of pre-qualified recruits, the Brain Health Registry can dramatically cut the cost and time of conducting clinical trials. This is the first neuroscience project to leverage online possibilities in this way and on this scale.
  • It’s meaningful. With every click of the mouse, you help researchers get closer to a cure for Alzheimer’s and other brain diseases. If Alzheimer’s runs in your family, this may be an important gift to your loved ones.
  • It’s safe. Top scientists from some of the most respected institutions in medicine are leading the Brain Health Registry. They understand your need for privacy, and they will protect it every step of the way.

We’re currently in our pre-launch phase.  Try it out!  If you offer feedback – and we hope you do – we will read it, consider it carefully, and respond to you directly.


As an early adopter, you can help us in two ways.  You can help in the way all members can help — by answering the questionnaires and taking the online brain tests, you strengthen the database that the scientific community needs.  You can also help us improve our new website – we’ll be making many changes, based on your feedback, before our public launch.


Please take the time to visit our site, sign up and offer your feedback.

Resource Identification Initiative – AntibodiesOnline is giving away “nerd mugs” to help identify antibodies in your paper.

Posted on December 16th, 2013 in Anita Bandrowski, News & Events, NIFarious Ideas | No Comments »

Dear NIF Community members,

Our partners in crime, FORCE11, are hosting a working group, the Resource Identification Initiative, that is working with journals to make it easier to identify research resources used in the materials and methods of biomedical research through the use of unique identifiers.  For the pilot project, we are concentrating on antibodies, genetically modified organisms and software.  Our goal is to make identification of research resources:  1)  machine-processable;  2)  available outside any paywall;  3)  uniform across journals.  More information can be found at:

Our colleagues at Antibodies Online have set up a beta testing site specifically for antibodies:

If you use antibodies in your research, or know anyone who does, please help us test the tools and provide feedback. Antibodies Online is generously providing nerd mugs and shirts to those who participate (they make great holiday gifts!).
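To illustrate what “machine-processable” identification buys us, here is a minimal Python sketch that pulls RRIDs out of free methods text with a regular expression. The pattern (an `RRID:` prefix, a registry code such as `AB` or `SCR`, then an accession) is a loose assumption rather than the initiative’s official grammar, and the identifiers in the example text are hypothetical.

```python
import re

# Loose, assumed pattern for RRIDs in running text: "RRID:" (optionally followed by a
# space), a registry code such as AB or SCR, "_" or ":", then an accession that ends
# in a letter or digit so trailing punctuation is not captured.
RRID_IN_TEXT = re.compile(r"RRID:\s?([A-Za-z]+[_:][A-Za-z0-9_:-]*[A-Za-z0-9])")


def extract_rrids(methods_text: str) -> list[str]:
    """Return the RRIDs mentioned in a methods section, in order of first appearance."""
    found = ("RRID:" + m.group(1) for m in RRID_IN_TEXT.finditer(methods_text))
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(found))


# Hypothetical methods text; the identifiers are placeholders, not real registry entries.
methods = (
    "Sections were stained with anti-GFP (ExampleVendor Cat# X-1, RRID:AB_0000001) "
    "and analyzed with ExampleSoft (RRID: SCR_0000002). Controls used RRID:AB_0000001."
)
print(extract_rrids(methods))
# ['RRID:AB_0000001', 'RRID:SCR_0000002']
```

Because the identifier has a fixed, uniform shape across journals, a few lines of code can find it; a free-text reagent description offers no such handle.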

Do you know what you don’t know? A gap analysis of Neuroscience Data.

Posted on October 17th, 2013 in Anita Bandrowski, Data Spotlight, Inside NIF, NIFarious Ideas | No Comments »

My thesis adviser, a colorful spirit and one whose wisdom will long be missed, used to say that undergraduate or professional students differed from graduate students in that they were asked to learn what was known about a subject, while graduate students were asked to tackle the unknown.

We, in higher education, are essentially seeking to find out what is not known and to start coming up with new answers. How does one find out what is not known? In fact, is it even possible? Don’t most graduate students or postdocs add to a lab’s existing body of knowledge, exploring the unknown by building on the known? If this is how we work, does it create a very skewed picture of the brain? How would we even know what is truly unknown?

Now we have entered the omics era, where we try to find out everything about a whole class of things. We no longer want to know about a gene; we want to know about all of the genes, the genome of an organism. We want to account for every piece of DNA and figure out which parts do what. In neuroscience, this tends to be more difficult, mainly because we do not have a finite list of things to account for. There are a large number of species with brains, or at least ganglia, and a single human brain has billions of cells and many more connections between them. The worst part is that these connections are not even static, so a wiring diagram of a single brain is only good for a few minutes before the brain reorganizes some of them.

Is there hope for an “omics” approach to neuroscience?

Well, the space is not infinite and has been studied over the last 100+ years so we have some ways of getting at the problem. We have a map!
Can we use this map to figure out some basic information about what we do and do not study? Well, the short answer at least for some things seems to be yes!

The Neuroscience Information Framework (NIF) project has been aggregating data of various sorts that are useful to neuroscientists, as well as a set of vocabularies for all of the brain parts: the map of the nervous system. So we can start to ask which labels are used for tagging data and which are found in the literature. Are all parts of the brain represented by relatively even amounts of data or papers, or are there hot spots and cold spots?

Below is a heat map, generated using the Kepler tool, of data sources vs. brain regions across the canonical brain regions (a hierarchy built to resemble what one may find in a graduate-level textbook of neuroanatomy).
[Figure: heat map of data sources vs. canonical brain regions]

Although the heat map is very hard to read (the darker the green, the more data; you can generate your own by clicking on the graph icon in NIF), there is little doubt that not all brain regions are equal. Some have very little data while others have a plethora, which raises the question: are there popular and not-so-popular brain regions?
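The heat map in this post was generated with the Kepler tool; as a stand-in, the minimal Python sketch below shows the underlying idea of counting data records per (data source, brain region) pair so that empty cells stand out as cold spots. The `count_matrix` function and the example records are hypothetical and do not reproduce NIF’s actual pipeline or data.

```python
from collections import Counter


def count_matrix(annotations):
    """Build a data-source x brain-region count matrix.

    annotations: iterable of (data_source, brain_region) pairs, one per data record.
    Returns (sources, regions, matrix) where matrix[i][j] counts records from
    sources[i] labeled with regions[j]; zero cells are the "cold spots".
    """
    counts = Counter(annotations)
    sources = sorted({source for source, _ in counts})
    regions = sorted({region for _, region in counts})
    # Counter returns 0 for pairs that never occurred.
    matrix = [[counts[(s, r)] for r in regions] for s in sources]
    return sources, regions, matrix


# Hypothetical records, not NIF data.
records = [
    ("SourceA", "hippocampus"),
    ("SourceA", "hippocampus"),
    ("SourceA", "cerebellum"),
    ("SourceB", "cerebellum"),
]
sources, regions, matrix = count_matrix(records)
print(regions)  # ['cerebellum', 'hippocampus']
for source, row in zip(sources, matrix):
    print(source, row)
# SourceA [1, 2]
# SourceB [1, 0]
```

Coloring each cell by its count (darker for more data) turns this matrix into a heat map like the one described above.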

[Figure: popular and not-so-popular brain regions]

Indeed, some brain region annotations are found more often than others when looking at data, and, much like pop stars, they tend to have shorter names. The most popular data label is actually “brain”, and the least popular appears to be the oculomotor nerve root. This is starting to tell us that most data are just labeled as “brain vs kidney”, but can we do better as neuroscientists? In fact, we can break the labels down into major regions like hindbrain, midbrain and forebrain and add up all of the data that fit into each of these. Note that most of the data are attributed to the forebrain, home to some of the most popular brain regions such as the cerebral cortex and the hippocampus, but the hindbrain also comes back with some reasonable data, mainly for the cerebellum. Adding up all the data labels for midbrain regions, however, leaves the awkward impression that the midbrain may be completely non-essential to brain research. Yet the midbrain appears to be essential to life, so why do neuroscientists not know much, or at least publish much, about it?
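Adding up all of the data under hindbrain, midbrain and forebrain amounts to rolling counts up a region hierarchy. Here is a minimal Python sketch, assuming each data record carries exactly one region label and that the hierarchy is a tree given as a child-to-parent map; the hierarchy is a tiny toy and the counts are made up for illustration, not NIF numbers.

```python
def rollup(counts, parent):
    """Add each region's record count to that region and to every ancestor.

    counts: region label -> number of data records carrying that label.
    parent: child region -> parent region (assumed to form a tree, no cycles).
    """
    totals = {}
    for region, n in counts.items():
        node = region
        while node is not None:
            totals[node] = totals.get(node, 0) + n
            node = parent.get(node)  # None once we pass the root
    return totals


# Toy hierarchy and made-up counts for illustration only.
parent = {
    "forebrain": "brain", "midbrain": "brain", "hindbrain": "brain",
    "cerebral cortex": "forebrain", "hippocampus": "forebrain",
    "superior colliculus": "midbrain", "cerebellum": "hindbrain",
}
counts = {"brain": 50, "cerebral cortex": 30, "hippocampus": 20,
          "cerebellum": 10, "superior colliculus": 1}
totals = rollup(counts, parent)
print({k: totals.get(k, 0) for k in ("forebrain", "midbrain", "hindbrain", "brain")})
# {'forebrain': 50, 'midbrain': 1, 'hindbrain': 10, 'brain': 111}
```

In this sketch, records labeled only “brain” count toward the root but toward none of the three divisions, so coarse labeling leaves the per-division totals thinner than the overall total.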

[Figure: data by major brain division]

So if you are hiding a big pile of data about the midbrain in your desk drawer, I would like to formally ask you to share it with NIF (just email us), so that I can stop thinking of the midbrain as the tissue equivalent of fly-over country.


Resource Identification Guidelines – now at Elsevier

Posted on September 6th, 2013 in Anita Bandrowski, Curation, Interoperability, NIFarious Ideas | No Comments »

Many groups have attributed the problem of irreproducible results to scientists having very large data sets and highlighting the interesting, yet most likely statistically anomalous, findings, and to other science no-no’s like reporting only positive results.

Our group has been working to improve the reporting of methods and reagents, and I am happy to report that these ideas are resonating.

In a working group sponsored by FORCE11, researchers, reagent vendors and publishers have been meeting to discuss how best to achieve better reporting across the literature, and both the NIH and the publishers themselves are now becoming interested in its success. The latest and greatest evidence of this can be found on the Elsevier website as a guideline to authors; this will soon be followed by a pilot project, to be launched at the Society for Neuroscience meeting, with over 25 journals and most major publishers.

Of course, there is no reason to wait for an editor to ask you to include catalog numbers for reagents or stock numbers for transgenic animals. These are things we should be trained to do in graduate school, as good practice for reporting our findings.

We seem to be getting ready to change (or change back) to more rigorous methods reporting, which should strengthen the recently eroded credibility of the scientific enterprise. I, for one, hope that the message communicated will be: “scientists don’t hide problems, even endemic ones; we examine them and find workable solutions”.

Top 20 – Publications – of the month!

Posted on July 22nd, 2013 in Anita Bandrowski, Data Spotlight, NIFarious Ideas | No Comments »

At NIF we try to mix it up. Instead of bringing you the top databases this month, we thought that this may be of more interest.

Heartfelt congratulations to the most annotated papers of all time! It is sort of like being the most cited, but it is more like being the most useful.

The winners are clearly publications about databases: congratulations to InterPro and the Protein Data Bank, the Gene Ontology itself, and also complete genomes for various organisms. The dark horse in the race asks “How many drug targets are there?”, and the answer appears to be at least 7000. Heyndrickx and Vandepoele may have the most citations per human being because they wrote the only paper on this list with only 2 authors; most of the rest dilute their citations by an order of magnitude.

In any case, we thought that this would be a fun set of data to look at.

* Mulder et al. The InterPro Database (2003), cited 49030 times

* Camon et al. The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro (2003), cited 49030 times

* Heyndrickx and Vandepoele. Systematic identification of functional plant modules through the integration of complementary data sources (2012), cited 37987 times

* Matsuyama et al. ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe (2006), cited 14156 times

* Barbe et al. Toward a confocal subcellular atlas of the human proteome (2008), cited 9574 times

* Simmer et al. Genome-wide RNAi of C. elegans using the hypersensitive rrf-3 strain reveals novel gene functions (2003), cited 8249 times

* Heidelberg et al. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae (2000), cited 8192 times

* Imming et al. Drugs, their targets and the nature and number of drug targets (2006), cited 7742 times

* Overington et al. How many drug targets are there? (2006), cited 7664 times

* Gibbs et al. (Rat Genome Sequencing Project Consortium). Genome sequence of the Brown Norway rat yields insights into mammalian evolution (2004), cited 7594 times

* Berman et al. The Protein Data Bank (2000), cited 7107 times

* Ceron et al. Large-scale RNAi screens identify novel genes that interact with the C. elegans retinoblastoma pathway as well as splicing-related components with synMuv B activity (2007), cited 6691 times

* Moran et al. Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment (2004), cited 6389 times

* Read et al. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria (2003), cited 6155 times

* Young et al. Odorant receptor expressed sequence tags demonstrate olfactory expression of over 400 genes, extensive alternate splicing and unequal expression levels (2003), cited 5935 times

* Buell et al. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000 (2003), cited 5518 times

* Heidelberg et al. Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis (2002), cited 5183 times

* Methé et al. Genome of Geobacter sulfurreducens: metal reduction in subsurface environments (2003), cited 4084 times

* Nelson et al. Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species (2004), cited 3881 times

* Ward et al. Genomic insights into methanotrophy: the complete genome sequence of Methylococcus capsulatus (Bath) (2004), cited 3877 times

The data are composed of annotations aggregated from over 40 individual databases and the Gene Ontology Consortium. For a complete and current list of databases included please see the NIF annotations information page, and to see a complete list of current annotations see the NIF annotations complete data set.
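Ranking papers by how many annotations point at them is essentially a frequency count over (database, publication) pairs. Here is a minimal Python sketch, assuming the aggregated annotations are available as such pairs; the `top_annotated` name and the example identifiers are hypothetical, and this is not NIF’s actual implementation.

```python
from collections import Counter


def top_annotated(annotations, n=20):
    """Rank publications by the number of annotations pointing at them.

    annotations: iterable of (database, publication_id) pairs, one per annotation.
    Returns up to n (publication_id, count) pairs, most annotated first.
    """
    counts = Counter(pub for _database, pub in annotations)
    return counts.most_common(n)


# Hypothetical annotation pairs; the identifiers are placeholders.
annotations = [
    ("DB1", "paper-1"), ("DB2", "paper-1"), ("DB3", "paper-1"),
    ("DB1", "paper-2"),
    ("DB2", "paper-3"), ("DB2", "paper-3"),
]
print(top_annotated(annotations, n=2))
# [('paper-1', 3), ('paper-3', 2)]
```

Because the ranking is recomputed from whatever annotations exist at the time, the list naturally shifts as databases add data.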

Note: these numbers are accurate as of July 20th, 2013, but may change as data are added.

There is a Link between literature and data, it has been there for years, but nobody ever found it

Posted on July 10th, 2013 in Anita Bandrowski, Curation, Data Spotlight, Force11, Interoperability, NIFarious Ideas | No Comments »

The NIH recently issued a request for information about the NIH data catalog, to which our group and many others have responded. Many voices, including fairly important ones from the White House, are now calling for scientific research data to be made open, available and linked to the publications written about the data. This is a very good thing. It should lead to better handling and comparison of data and to better science.

However, sitting in many recent meetings with members of various national libraries, who shall remain nameless, I have been astounded to learn that not only scientists but also librarians have never found the LinkOut feature in PubMed.

LinkOut is a little option at the bottom of every article record in PubMed, hidden by the good staff in complete obscurity; please see the screenshot below if you don’t believe me that such a feature exists.

[Figure: PubMed record showing the LinkOut section]

The article above links to two data sets: one is based on a curated set of annotations linking genes to genetic disorders, and the other is a set of statements about the antibody reagents used in this paper. Links from other papers lead to computational model code described in the paper, activation foci or data repositories.

Although LinkOut is certainly rarely used, the model organism communities, data repositories and researchers have been diligently adding their data to PubMed in the form of links. We may quibble about the fact that PubMed asks many of us to reduce specific links to data to generic links that lead to another version of the same article, but the fact is that the links to data are present! Because they are present, if the National Library of Medicine ever decides to search them, export them, or acknowledge their existence, they would form a treasure trove of data-to-literature links that would not require a huge new investment in infrastructure.
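For those who want to see the links for themselves, LinkOut entries for an individual PubMed record can, to our understanding, be listed through NCBI’s E-utilities `elink` service with `cmd=llinks`. The minimal Python sketch below builds such a request and parses the XML reply. The element names it reads (`ObjUrl`, `Url`, `Provider/Name`) reflect our understanding of the response format and should be checked against NCBI’s current E-utilities documentation; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def linkout_request_url(pmid: str) -> str:
    """Build an elink request asking for the LinkOut URLs of one PubMed record."""
    params = {"dbfrom": "pubmed", "id": pmid, "cmd": "llinks"}
    return ELINK + "?" + urllib.parse.urlencode(params)


def parse_linkout(xml_text) -> list[tuple[str, str]]:
    """Extract (provider name, URL) pairs from an elink llinks XML reply (str or bytes).

    Assumes each link is an <ObjUrl> element whose direct <Url> child is the link
    and whose <Provider><Name> names the resource that supplied it.
    """
    root = ET.fromstring(xml_text)
    links = []
    for obj in root.iter("ObjUrl"):
        url = obj.findtext("Url", default="").strip()  # direct child only
        provider = obj.findtext("Provider/Name", default="").strip()
        if url:
            links.append((provider, url))
    return links


def fetch_linkout(pmid: str) -> list[tuple[str, str]]:
    """Live request to NCBI; keep request rates low when looping over many IDs."""
    with urllib.request.urlopen(linkout_request_url(pmid), timeout=30) as resp:
        return parse_linkout(resp.read())
```

Calling `fetch_linkout` with a real PubMed ID performs a network request; parsing is kept separate so the same code can process replies saved to disk.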

I am not suggesting that our infrastructure could not be upgraded; in fact, we have many more technical gripes that I will not bring up here. But I am suggesting that we all take advantage of the massive investment of time and energy that curators and authors have made over the last decades to meticulously link their data or data repositories to the literature.

The LinkOut broker has helped NIF aggregate a list of about 250,000 links from ~40 databases, but PubMed must have a much, much larger set. The links provided by NIF can be searched through the NIF site, filtered by category and by database, and extracted and embedded into other sites like ScienceDirect. Of these 1/4 million links that we provide to PubMed, between 100 and 200 users find them per month. I think that we can and should do better.

  • We can ask that PubMed make links to data prominent.
  • We can ask that any links in PubMed be of good quality, e.g., results of text-mining output should not be included without verification by authors or curators.
  • We can ask that the links show actual data as opposed to the representation of the paper in another site (currently required).

If you feel the sudden urge to be an armchair activist, then please let PubMed know that it would be nice if they celebrated the current links between data and publications instead of hiding them.

The experience of a bench scientist with open publishing.

Posted on June 21st, 2013 in Anita Bandrowski, Force11, NIFarious Ideas | No Comments »

I recently asked a bench scientist about her experiences publishing in this very new mode of scholarly communication, i.e., in F1000Research, which is open access, has an open review process and is about as transparent as the community has ever asked any journal to be. The question was how she viewed this process.

To give a bit of background, she is still attempting to publish 3 articles in F1000Research about work she has done on tracking down the switch from benign to malignant tumor growth. Two of the articles are now accepted for publication and in the process of being indexed by PubMed (F1000Research 2013, 2:10 (doi: 10.12688/f1000research.2-10.v1); F1000Research 2013, 2:9 (doi: 10.12688/f1000research.2-9.v2)), and the last is in the bowels of the publishing machinery (Witkiewicz et al. Article I).

I asked her a set of questions about the review process, which she answers below. She agreed to let me post them here. Just as a note, prior to publication the articles were viewed 1415, 1373 and 1005 times and downloaded 231, 330 and 321 times, respectively. This sort of buzz is seldom generated by published work, so I have been quite surprised that it can be generated prior to publication.

Your questions are easy to answer; however, I would like to point out that my answers may not represent the larger community of younger bench scientists well. My sense of right and wrong has been shaped in different countries (Poland, Austria, and Canada) and at different times. Nevertheless, here they are, for whatever it is worth:

How do you view the landscape of open scholarly communication, do you get lost in it?

If I do not feel lost in the maze of the new ways of communicating, it may be because I have not explored it enough. So far I have been relying mostly on the traditional ways of searching the literature: PubMed, and following references within articles found that way, as needed. I do get personal copies of The Scientist and Nature Methods and attend meetings in San Diego that are relevant to my work. I think it was in The Scientist that I first read about PLoS ONE and later F1000Research. From the meetings I get new clues for additional searches of the literature on my own.

If you were asked to change your methods to include catalog numbers or unique identifiers, would this make you mad and would you comply?

The catalog numbers for antibodies, the strain of GFP-labeled mice and references to cell lines are all in the first versions of the articles, as they should be. These sorts of things, although tedious, do not bother me, and in the long run having all practical details in one easy-to-find place is helpful.

Were there things you appreciated about having an open review?

Yes, definitely. I very much appreciated the professional editorial help up front. Another and even more critical point is that if the referees listed by the journal decline the invitation to write a review, others, not listed there may be considered as well. I waited too long for second and third reviews not realizing that they would not come.

Were there things that were a lot harder?

No. It is perhaps a little hard to take that defending one’s position does not change anything in the end. The editor does not judge one way or another. However, I do not mind that because the negative comments do not disqualify the article if there are others. That is fair enough. Any rules are fine with me provided all parties play by the same rules. ‘Dura lex sed lex’: harsh law but law.

Do you think that open review is more or less fair than traditional reviews?

Open review is fairer, although fewer people are free enough to take sides in public.