Archive for the ‘Essays’ Category

Reproducibility in the Recipes of Science

Posted on March 31st, 2016 in Anita Bandrowski, Essays, Interoperability, News & Events | No Comments »

Hello SciCrunch/NIF Community! Please read below for Dr. Bandrowski’s post on “Reproducibility in the Recipes of Science”

Reproducibility in science is a very difficult question, and much has been said about it by industry, government and researchers themselves. For a good summary, please see the [Nature Special](http://www.nature.com/news/reproducibility-1.17552) on reproducibility.

Many more things can and will be said about this complex issue, but I wanted to ask a different question: do we not already have a good exemplar of reproducibility when we look at our favorite recipe sites? There are lists of ingredients, pictures of the finished product, and of course plenty of detailed instructions. There are even places for novice cooks to tell the recipe owner “hey this is hard to reproduce” or “it does not work with Indian saffron”.

Why doesn’t the **methods** section of our papers look like that?

I would like to introduce the members of the Society for Neuroscience to an initiative that has been trying to move the methods to look a little more like your favorite recipe site, by asking authors to do a slightly more thorough job when listing their key ingredients, such as antibodies.

It has been going on for a couple of years as an agreement between several of the key journals in neuroscience, which ask authors to provide an RRID, a unique identifier for key biological reagents, in their methods sections. Currently these RRIDs are generated for antibodies, organisms and software tools.

We are very pleased that [eNeuro](http://eneuro.org/content/3/2/ENEURO.0046-16.2016) has recently joined the charge as has [Neuron](http://www.cell.com/neuron/rrid), so you may be seeing some of these RRIDs in the papers that you read.

However, we know that many journals will allow authors to add the RRIDs to their papers even if the journals are not officially pushing authors themselves, so we hope that you will join the charge with your next paper. Please go to [scicrunch.org/resources](https://scicrunch.org/resources) and search for your key biological resources, open the “cite this” box and copy the citation into your methods. This way, tracking down the ingredients used in any paper becomes much easier and that is of course the first step in experimental reproducibility.
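Once a citation has been copied into the methods, the identifiers also become easy to pick out by machine. The following is a minimal sketch of that idea, assuming an RRID appears in the text as `RRID:` followed by a short prefix (for example `AB` for antibodies or `SCR` for software tools) and an accession. The pattern is an illustrative approximation rather than an official grammar, and the identifiers in the sample text are placeholders invented for this example.

```python
import re

# Assumed shape of an RRID: "RRID:" + letters + "_" + accession characters.
# This is an approximation for illustration, not an official specification.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_:\-]+)")


def find_rrids(methods_text):
    """Return the RRID accessions found in a methods section, in order."""
    return RRID_PATTERN.findall(methods_text)


# Placeholder identifiers, invented for illustration only.
example_methods = (
    "Sections were stained with a primary antibody (Example Vendor, "
    "Cat# 0000, RRID:AB_0000000) and images were processed with an "
    "image analysis package (RRID:SCR_0000000)."
)

print(find_rrids(example_methods))
```

A methods section without identifiers simply yields an empty list, which is one quick way to spot papers whose ingredients cannot yet be tracked down.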

Eating my own Dog Food!

Posted on July 4th, 2015 in Anita Bandrowski, Curation, Interoperability, News & Events | No Comments »

While not all of you have been fortunate enough to attend the first Beyond the PDF meeting, I will say this: it was eye-opening for this scientist. To me, the most memorable statement from the meeting was when Geoffrey Bilder argued from the back of the room that we should all Eat Our Own Dog Food! What he meant was that anyone building tools should actually use them, and anyone proclaiming broad “thou shalts” should live up to the proclamation themselves.

Easier said than done, Geoffrey!

In the years since this historic meeting, these statements have been eating away at my psyche.

I lead the Resource Identification Initiative, a project to add unique identifiers to all papers that use antibodies, model organisms, software tools or databases. Basically, I am telling authors to do “my bidding”: to make their papers easier to search and to give academic credit to developers of software tools like R or ImageJ. I am asking these authors to help others selflessly and do something different than they have done before.

When submitting a paper to Frontiers in Neuroinformatics, as a middle author at the very beginning of the RII project, I felt very reluctant to add RRIDs to the paper. Who was I to suggest such a thing? I waited for the editor to remind us to add the identifiers; I waited, and no question came. Before final submission, I overcame my very uncharacteristic muteness and asked my collaborators to add the RRIDs to a table where I felt they were appropriate. It turned out that my colleagues did not object, and the journal editor also didn’t say anything about including them. His journal was not yet on board, something that has been remedied since.

Why did I feel so strongly that I should not include an identifier for tools while telling others to do it?
What was I afraid of?
Change is hard!

I am really not sure now what I was so afraid of, because after overcoming this initial scientific recalcitrance I simply put RRIDs in the next paper without a second thought and have continued to put them in ever since.

As I was drafting this blog, a colleague asked me to contribute to a table in her paper. I will be one of those middle authors (it is a huge paper with tons of authors), but this time, as with my own papers, I asked her to include the RRIDs without being afraid. It took me about 8 minutes to pull all relevant RRIDs from scicrunch.org/resources, and the paper was just submitted. I do not care whether the journal is participating in the initiative officially or not.

I guess that what I have learned from all of this is that once you accept change it becomes the new normal, and RRIDs are a great new normal. Thanks, Geoffrey, for nagging me; I am very glad to say that I have Eaten My Own Dog Food!

NIH Plan for Increasing Access to Scientific Publications and Digital Scientific Data

Posted on March 4th, 2015 in Anita Bandrowski, Interoperability, News & Events | No Comments »

The NIH put out a plan to increase access to scientific data.

What do they really mean and what does this mean to researchers?

Researchers have been asked to provide PubMed Central (PMC) identifiers in grant applications, and this single requirement has pushed authors to submit their papers to PMC. Many journals now do this as a matter of course, leading to a large corpus of publications with fully searchable text. I think that researchers are now familiar with this process and see the benefit, as I do when I am at home and need to look up a piece of information from an old paper of mine that a publisher tries to charge me $36 to read.

What happens to data and what is meant by data?
Will authors need to submit all of their supplementary data files to PMC?

Perhaps not. Some wording in the NIH document shows that they know that data is not homogeneous. They recognize that they can’t handle the diversity well without working with existing repositories.

They point out that data should be FAIR:
Findable
Accessible
Interoperable
Reusable
This is known as the FAIR standard.

They also state:
“A strategy for leveraging existing archives, where appropriate, and fostering public-private partnerships with scientific journals relevant to the agency’s research; Encourage public-private collaboration; Encourage public-private collaboration to … otherwise assist with implementation of the agency plan; Ensure that publications and metadata are stored in an archival solution that… uses standards, widely available and, to the extent possible, nonproprietary archival formats for text and associated content (e.g., images, video, supporting data).”

So will there be a set of repositories that are “approved” community standards? Will the NIH have a box for grantees to put in their community repository IDs?
Seems like a good direction!

For now, NIF has a very large list of repositories that will house your data.
Try this registry search.
There are over 1000 that respond to the query, but which one or which ones can you use?
It does not seem that the NIH is willing to be prescriptive, so it will be left to individual communities to rally around repositories that best serve them.
For now, NIF just aggregates the information around these and attempts to make them findable (the F in FAIR).

Statement of Commitment from Earth and Space Science Publishers and Data Facilities:

Posted on January 14th, 2015 in Anita Bandrowski, Interoperability, News & Events | No Comments »

This is an important commitment from the CODATA and earth science community. I am very much looking forward to circulating a similar document from the neuroscience community.

 

Coalition on Publishing Data in the Earth and Space Sciences

 
Earth and space science data are special resources, critical for advancing science and
addressing societal challenges – from assessing and responding to natural hazards and
climate change, to use of energy and natural resources, to managing our oceans, air, and
land. The need for and value of open data have been encoded in major Earth and space
science society position statements, foundation initiatives, and more recently in
statements and directives from governments and funding agencies in the United States,
United Kingdom, European Union, Australia, and elsewhere. This statement of
commitment signals important progress and a continuing commitment by publishers and
data facilities to enable open data in the Earth and space sciences.
 

Scholarly publication is a key high-value entry point in making data available, open,
discoverable, and usable. Most publishers have statements related to the inclusion or
release of data as part of publication, recognizing that inclusion of the full data enhances
the value and is part of the integrity of the research. Unfortunately, the vast majority of
data submitted along with publications are in formats and forms of storage that makes
discovery and reuse difficult or impossible.
 

Repositories, facilities, and consortia dedicated to the collection, curation, storage, and
distribution of scientific data have become increasingly central to the scientific enterprise.
The leading Earth and space science repositories not only provide persistent homes for
these data, but also ensure quality and enhance their value, access, and reuse. In addition
to data, these facilities attend to the associated models and tools. Unfortunately, only a
small fraction of the data, tools, and models associated with scientific publications makes
it to these data facilities.
 

Connecting scholarly publication more firmly with data facilities thus has many
advantages for science in the 21st century and is essential in meeting the aspirations of
open, available, and useful data envisioned in the position statements and funder
guidelines. To strengthen these connections, with the aim of advancing the mutual
interests of authors, publishers, data facilities, and end-users of the data, a recent Earth
and space science data and publishing conference, supported by the National Science
Foundation, was held at AGU Headquarters on 2-3 October 2014. It brought together
major publishers, data facilities, and consortia in the Earth and space sciences, as well as
governmental, association, and foundation funders. Further informational meetings were
held with Earth and space science societies, publishers, facilities, and librarians that were
not present at the October meeting. Collectively the publishers, data facilities, and
consortia focused on open data for Earth and space science formed a working group:
Coalition on Publishing Data in the Earth and Space Sciences. As one outcome, this
group collectively endorsed the following commitments to make meaningful progress
toward the goals above. We encourage other publishers and data facilities and consortia
to join in support.
 

Signatory data facilities, publishers, and societies, in order to meet the need for
expanding access to data and to help authors, make the following commitments:
● We reaffirm and will ensure adherence to our existing repository, journal, and
publisher policies and society position statements regarding data sharing and
archiving of data, tools, and models.
● We encourage journals, publishers, and societies that do not have such statements
to develop them to meet the aspirations of open access to research data and to
support the integrity and value of published research. Examples of policies and
position statements from signatory journals and societies are listed here.
● Earth and space science data should, to the greatest extent possible, be stored in
appropriate domain repositories that are widely recognized and used by the
community, follow leading practices, and can provide additional data services.
We will work with researchers, funding agencies, libraries, institutions, and other
stakeholders to direct data to appropriate repositories, respecting repository
policies.
● Where it is not feasible or practical to store data on community-approved
repositories, journals should encourage and support archiving of data using
community-established leading practices, which may include supplementary
material published with an article. These should strive to follow existing NISO
guidelines.

Over the coming year, the signatory Earth and space science publishers, journals, and
data facilities will work together to accomplish the following:
● Provide a usable online community directory of appropriate Earth and space
science community repositories for data, tools, and models that meet leading
standards on curation, quality, and access that can be used by authors and journals
as a guide and reference for data deposition.
● Promulgate metadata information and domain standards, including in the online
directory, to help simplify and standardize data deposition and re-use.
● Promote education of researchers in data management and organize and develop
training and educational tools and resources, including as part of the online
directory.
● Develop a working committee to update and curate this directory of repositories.
● Promote referencing of data sets using the Joint Declaration of Data Citation
Principles, in which citations of data sets should be included within reference
lists.
● Include in research papers concise statements indicating where data reside and
clarifying availability.
● Promote and implement links to data sets in publications and corresponding links
to journals in data facilities via persistent identifiers. Data sets should ideally be
referenced using registered DOI’s.
● Promote use of other relevant community permanent identifiers for samples
(IGSN), researchers (ORCID), and funders and grants (FundRef).
● Develop workflows within the repositories that support the peer review process
(for example, embargo periods with secure access) and within the editorial
management systems that will ease transfer of data to repositories.
 

A major challenge today is that much more Earth and space science data are being
collected than can be reasonably stored, curated, or accessed. This includes physical
samples, information about them, and digital data (sometimes streaming at rates of
terabytes per minute). Researchers and publishers are looking for guidance on what
constitutes archival data across diverse fields and disciplines. The major data repositories
provide leading practices that should help guide the types of samples, data, metadata, and
data processing descriptions that should be maintained, including information about
derivations, processing, and uncertainty.
 

To enable improved coordination and availability of open data, we encourage funders to
support these commitments, ensure a robust infrastructure of data repositories, and enable
broad outreach with researchers. As a general rule, data management plans promulgated
by funders should indicate that release into leading repositories, where available, of those
data necessary to support published results is expected at publication. The ultimate
measure of success is in the replicability of science, generation of new discoveries, and in
progress on the grand challenges facing society that depend on the integration of open
data, tools, and models from multiple sources.
 

Signatories
American Astronomical Society
American Geophysical Union
American Meteorological Society
Biological and Chemical Oceanography Data Management Office, Woods Hole
Oceanographic Institution (BCO-DMO)
Center for Open Science
CLIVAR and Carbon Hydrographic Data Office (CCHDO)
Community Inventory of EarthCube Resources for Geosciences Interoperability
(CINERGI)
Council of Data Facilities
Elsevier
European Geosciences Union
Geological Data Center of Scripps Institution of Oceanography
ICSU World Data System
Incorporated Research Institutions for Seismology (IRIS)
Integrated Earth Data Applications (IEDA)
John Wiley and Sons
Magnetics Information Consortium (MagIC)
Mineralogical Society of America
National Snow and Ice Data Center
Nature Publishing Group
Proceedings of the National Academy of Sciences
Rolling Deck to Repository (R2R)
Science

Big Data vs Small Data: Is it really about size?

Posted on October 31st, 2014 in Anita Bandrowski, Curation, Data Spotlight, Inside NIF, Interoperability | No Comments »

We have been hearing for some time that when it comes to data, it is all about size. The “bigger is better” mantra has been all over the press, but is it really size that matters?

There are the so-called “Big Data” projects such as the Allen Brain Atlas, which generates data, sans hypothesis, over the whole brain for thousands of genes. This is great because the goal of the project is to generate consistent data and not worry about which disease will or will not be impacted by each data point. That may be a great new paradigm for science, but there are not many projects like this “in the wild”.

Most data being generated in the world of science can be considered small, i.e., would fit on a personal computer, and there are a LOT of labs out there generating this sort of data. So the question that we addressed in the recent Big Data issue of Nature Neuroscience is whether small data could be organized to become big data. If such a thing is desirable, then what would be the steps to accomplish this lumping?

Here are the principles that we have extracted from working on NIF that we think will really help small data (from Box 2):

Discoverable. Data must be modeled and hosted in a way that they can be discovered through search. Many data, particularly those in dynamic databases, are considered to be part of the ‘hidden web’, that is, they are opaque to search engines such as Google. Authors should make their metadata and data understandable and searchable, (for example, use recognized standards when possible, avoid special characters and non-standard abbreviations), ensure the integrity of all links and provide a persistent identifier (for example, a DOI).

Accessible. When discovered, data can be interrogated. Data and related materials should be available through a variety of methods including download and computational access via the Cloud or web services. Access rights to data should be clearly specified, ideally in a machine-readable form.

Intelligible. Data can be read and understood by both human and machine. Sufficient metadata and context description should be provided to facilitate reuse decisions. Standard nomenclature should be used, ideally derived from a community or domain ontology, to make it machine readable.

Assessable. The reliability of data sources can be evaluated. Authors should ensure that repositories and data links contain sufficient provenance information so that a user can verify the source of the data.

Useable. Data can be reused. Authors should ensure that the data are actionable, for example, that they are in a format in which they can be used without conversion or that they can readily be converted. In general, PDF is not a good format for sharing data. Licenses should make data available with as few restrictions as possible for researchers. Data in the laboratory should be managed as if it is meant to be shared; many research libraries now have data-management programs that can help.
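The five principles above can be read as a checklist for a dataset description. The sketch below shows one way such a checklist might look, assuming a very simple record with one field per principle. The class name, field names and the checks themselves are hypothetical, chosen only for illustration; they are not part of NIF or of the original Box 2.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    # All field names are hypothetical, chosen for this illustration.
    persistent_id: str = ""      # e.g. a DOI (Discoverable)
    access_url: str = ""         # download or web-service endpoint (Accessible)
    ontology_terms: tuple = ()   # terms from a community ontology (Intelligible)
    provenance: str = ""         # where the data came from (Assessable)
    file_format: str = ""        # format the data are shared in (Useable)


def unmet_principles(record):
    """List the principles for which the record lacks basic information."""
    checks = {
        "Discoverable": bool(record.persistent_id),
        "Accessible": bool(record.access_url),
        "Intelligible": bool(record.ontology_terms),
        "Assessable": bool(record.provenance),
        # The text notes that PDF is generally not a good format for sharing data.
        "Useable": bool(record.file_format) and record.file_format.lower() != "pdf",
    }
    return [name for name, ok in checks.items() if not ok]


# Placeholder values, invented for illustration only.
record = DatasetRecord(
    persistent_id="doi:10.0000/example",
    access_url="https://example.org/data",
    file_format="PDF",
)
print(unmet_principles(record))
```

A real assessment would need far richer checks (license terms, link integrity, machine-readable access rights), but even a crude list like this makes the gaps in a small dataset visible before it is shared.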

 

RRIDs are in the wild! Thanks to JCN and PeerJ

Posted on April 9th, 2014 in Anita Bandrowski, Curation, Essays, News & Events | 1 Comment »

We believe that reproducing science starts with being able to know what “materials” were used in generating the results.

Along with a truly dedicated group of volunteers from academia, government and non-government institutes, publishers and commercial antibody companies we have been running the Resource Identification Initiative (RII).

This initiative is meant to accomplish the following lofty goal: Ask authors to uniquely identify their antibodies (no easy task), organisms (an even harder task), and the databases and software tools that they used in their paper.

In order to ask them at the appropriate time, we gathered a group of journal chief editors to help us pose this question when authors are most interested in answering it: during the process of publication. We also created several resources to help authors identify these things: a database that stores information for 5 of the most common species used in experiments, antibody catalogs from over 200 vendors, and a database and tool catalog that contains over 3000 software tools and over 2500 academic databases, the largest of its kind.

We have been granted 3 months to determine whether authors would actually do this. Two months in, we have fielded requests from about 30 users who could not find their resources, and more than 40 new software tools or databases have been registered in our tools registry, along with more than 100 antibodies, but we kept waiting for RRIDs to show up in the literature.

Today our wait is over, thanks to two papers: Khalil and Levitt in the Journal of Comparative Neurology and Joshi et al. in PeerJ.

These authors apparently were able to correctly identify resources such as Matlab, NeuroLucida, ProteinDataBank and antibodies including anti-cholera toxin antibody from List Bio.

What does this tell us?

Well, to start, it tells us that this process is not impossible: identifiers do exist for many things, and the process of obtaining new ones is not so difficult that people can’t do it. It also tells us that, when asked at the right time, authors are willing to go the extra step to find and provide identifiers for their reagents or software tools!

Great, but why do I care about a single paper that uses an antibody or Matlab?

Well, it turns out that for many years JCN and NIF staff have been working diligently to link papers through that same identifier. In the case of this cholera toxin antibody, we have marked 23 other papers that have used it since 2006.


Open Science? Try Good Science.

Posted on April 7th, 2014 in Author, Curation, Essays, Maryann Martone, News & Events | 1 Comment »

If the Neuroscience Information Framework is any guide, we are certainly in an era of “Openness” in biomedical science.  A search of the NIF Registry of tools, databases and projects for biomedical science for “Open” leads to over 700 results,  ranging from open access journals, to open data, to open tools.  What do we mean by “open”?  Well, not closed or, at least, not entirely closed.  These open tools are, in fact, covered by a myriad of licenses and other restrictions on their use.  But, the general theme is that they are open for at least non-commercial use without fees or undue licensing restrictions.

So, is Open Science already here?  Not exactly.  Open Science is more than a subset of projects that make data available or sharing of software tools, often because they received specific funding to do so.  According to Wikipedia, “Open science is the umbrella term of the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open notebook science, and generally making it easier to publish and communicate scientific knowledge.”   Despite the wealth of Open platforms, most of the products of science, including, most notably, the data upon which scientific insights rest, remain behind closed doors.  While attitudes and regulations are clearly changing, as the latest attempts by PLoS to establish routine sharing of data illustrate (just Google #PLOSfail), we are not there yet.

Why are so many pushing for routine sharing of data and a more open platform for conducting science?  I became interested in data sharing in the late 1990’s as a microscopist as we started to scale up the rate and breadth at which we could acquire microscopic images.  Suddenly, due to precision stages and wide field cameras, we were able to image tissue sections at higher resolution over much greater expanses of tissue than before, when we were generally restricted to isolated snapshots or low magnification surveys.   I knew that there was far more information within these micrographs and reconstructions than could be analyzed by a single scientist.  It seemed a shame that they were not made more widely available.  To help provide a platform, we established the Cell Centered Database, which has recently merged with the Cell Image Library.  Although the CCDB was successful in attracting data from outside researchers, we were rarely contacted by researchers wanting to deposit their data.  Most of the time we had to ask, although many would release the data if we did.  But I do distinctly remember one researcher saying to me:  “I understand how sharing my data helps you, but not me”.

True.  So in the interest of full disclosure, let me state a few things.  I try to practice Open Science, but am not fanatical. I try to publish in open access journals, although I am not immune to the allure of prestigious closed journals.  I do blog, make my slides available through Slide Share, and upload pre-prints to Research Gate.  But I continue to remain sensitive to the fact that through my informatics work in the Neuroscience Information Framework and my advocacy for transforming scholarly communications through FORCE11 (the Future of Research Communications and e-Scholarship), I am now in a field where:  A)  I no longer really generate data.  I generate ontologies and other information artefacts, and these I share, but not images, traces, sequences, blots, structures;  B)  I do benefit when others share their data, as I build my research these days on publicly shared data.

But do I support Open Science because I am a direct beneficiary of open data and tools?  No.  I support Open Science because I believe that Open Science = Good Science.  To paraphrase Abraham Lincoln:  “If I could cure Alzheimer’s disease by making all data open, I would do so;  if I could cure Alzheimer’s disease by making all data closed, I would do so.”  In other words, if the best way to do science is the current mode:  publish findings in high impact journals that only become open access after a year, make sure no one can access or re-use your data, make sure your data and articles are not at all machine-processable, publish under-powered studies with only positive results, allow errors introduced by incorrect data or analyses to stay within the literature for years, then I’m all for it.

But, we haven’t cured Alzheimer’s disease or much else in the neurosciences lately.  That’s not to say that our current science, based on intense competition and opaque data and methods, has not produced spectacular successes.  It surely has.  But the current system has also led to some significant failures, as the retreat of pharmaceutical companies from neuroscience testifies.  Can modernizing and opening up the process of science to humans and machines alike accelerate the pace of discovery?  I think we owe the taxpayers, who fund our work in hope of advancing society and improving human health, an honest answer here.   Are we doing science as well as it can be done?

I don’t believe so.  And, as this is a blog and not a research article, I am allowed to state that categorically.  I believe that at a minimum, Open Science pushes science towards increased transparency, which, in my view, helps scientists produce better data and helps weed out errors more quickly.  I also believe that our current modes of scientific communication are too restrictive, and create too high a barrier for us to make available all of the products of our work, and not just the positive results.  At a maximum, I believe that routine sharing of data will help drive biomedical sciences towards increased discovery, not just because we will learn to make data less messy, but because we will learn to make better use of the messy data we have.

Many others have written on why scientists are hesitant or outright refuse to share their data and process  (see #PLOSfail above) so I don’t need to go into detail here.  But at least one class of frequent objections has to do with the potential harm that sharing will do to the researcher who makes data available.  A common objection is that others will take advantage of data that you worked hard to obtain before you can reap the full benefits.  Others say that there is no benefit to sharing negative results, detailed lab protocols or data, or blogging, saying that it is more productive for them to publish new papers than to spend time making these other products available.   Others are afraid that if they make data available that might have errors, their competitors would attack them and their reputations would be tarnished.  Some have noted that unlike in the Open Source Software community, where identifying and fixing a bug is considered a compliment, in other areas of scholarship, it is considered an attack.

All of these are certainly understandable objections.  Our current reward system does not provide much incentive for Open Science, and changing our current culture, as I’ve heard frequently, is hard.  Yes it is.  But if our current reward system is supporting sub-optimal science, then don’t we as scientists have an obligation to change it?  Taxpayers don’t fund us because they care about our career paths.  No external forces that I know of support, or even encourage, our current system of promotion and reward:  it is driven entirely by research scientists.  Scientists run the journals, the peer-review system, the promotion committees, the academic administration, the funding administration, the scientific societies and the training of more scientists.  Given that non-scientists are beginning to notice, as evidenced by articles in the Economist (2013) and other non-science venues about lack of reproducibility, perhaps it’s time to start protecting our brand.

While many discussions on Open Science have focused on potential harm to scientists who share their data and negative results, I haven’t yet seen discussions on the potential harm that Opaque Science does to scientists.  Have we considered the harm that is done to graduate students and young scientists when they spend precious months or years trying to reproduce a result that was perhaps based on faulty data or selective reporting of results?  I once heard a heart-breaking story of a promising graduate student who couldn’t reproduce the results of a study published in a high impact journal.  His advisor thought the fault was his, and he was almost ready to quit the program.  When he was finally encouraged to contact the author, he found that they couldn’t necessarily reproduce the results either.   I don’t know whether the student eventually got his degree, but you can imagine the impact such an experience has on young scientists.   Beyond my anecdotal example above, we have documented examples where errors in the literature have significant effects on grants awarded or the ability to publish papers that are in disagreement (e.g., Miller,  2006).  All of these have a very real human cost to science and scientists.

On a positive note, for the first time in my career, since I sipped the Kool Aid back in the early days of the internet, I am seeing real movement by not just a few fringe elements, but by journals, senior scientists, funders and administrators, towards change.  It is impossible to take a step without tripping over a reference to Big Data or metadata.  Initiatives are underway to create a system of reward around data in the form of data publications and data citations.  NIH has just hired Phil Bourne, a leader in the Open Science movement, as Associate Director of Data Science.  And, of course, time is on our side, as younger scientists and those entering into science perhaps have different attitudes towards sharing than their older colleagues.   Time will also tell whether Open Science = Good Science.  If it doesn’t, I promise to be the first to start hoarding my data again and publishing only positive results.

References:

Economist, How Science Goes Wrong, Oct 19, 2013

Miller, G.  (2006) A scientist’s nightmare: software problem leads to five retractions.  Science, 22, 314, pp 1856-1857.

 

Blog originally posted to Wiley Exchanges.

Resource Identification Quarter is Rapidly Approaching

Posted on January 27th, 2014 in Anita Bandrowski, Interoperability | No Comments »

What is resource identification quarter?
At the 2012 Society for Neuroscience meeting, NIF met with the editors-in-chief of about 25 neuroscience journals to convince them that research resources (software tools, antibodies, and model organisms) should be treated as first-class research objects and cited appropriately. At a follow-up meeting at the National Institutes of Health, the journal editors agreed to start a pilot project to identify these resources using a uniform standard.

Why should we identify research objects?
The neuroscience literature unfortunately does not contain enough information for anyone to find many research resources. In a very typical paper, Paz et al. (2010) refer to an antibody as “GFAP, polyclonal antibody, Millipore, Temecula, CA.” Anyone who tried to find that antibody in 2010 would have seen 40 antibodies at Millipore; today, after the merger with EMD, the catalog contains 51 antibodies for GFAP, with no indication of which ones were present before and which are new. Without even a catalog number, a researcher can only contact the authors, or buy all of these antibodies and try them to see whether any has a similar profile. At several hundred dollars per antibody, plus the weeks needed to optimize staining, that is no better than a shot in the dark.

What about transgenic mice? They can’t possibly be that hard to identify, can they?
Many conversations with model organism database curators at MGI revealed that people’s data is often left out of the database specifically because the world experts on mice can’t tell which mouse is being used in the paper. The nomenclature of transgenic mice is somewhere between an art form and black magic; however, including a stock number seems like a fairly simple solution to the problem. The nomenclature authorities for mice, rats, worms and flies happen to have convenient forms to ask for help or to name a new critter (the list is available here).

How should we identify research objects?
We need a set of unique identifiers for research resources, much like ORCID iDs or social security numbers for people. NIF has created a web page where authors can find these in one reasonably convenient location. Searching for information that authors should already have, such as the catalog number for an antibody or the stock number for an organism, should return a result in the appropriate tab, and a rather large “cite this” button will appear with the appropriate citation style.
A more detailed set of instructions and the search box can be found here: www.scicrun.ch/resources

So if you are considering submitting a paper over the next few months to one of the journals below, you will be asked to include a unique identifier for your research objects. We certainly hope that this is reasonably easy to accomplish and eagerly await a time when we would be able to ask the question: which antibody did Paz et al actually use on which transgenic mouse?

If you would like to see whether the antibodies you used in any of your previous papers can also be annotated with the unique identifiers, we will be happy to help. Our friends at antibodies-online are giving away coffee mugs and t-shirts to help get people interested in doing so. To annotate your paper you can go through their survey at:
http://www.antibodies-online.com/resource-identification-initiative/ This will earn you a free mug, and the data will be included in the antibody registry for posterity.

List of Participating Journals:
Annals of Neurology
Brain
Amer Journal of Human Genetics
Behavioral and Brain Functions
Biological Psychiatry
BMC Neurology
BMC Neuroscience
Brain and Behavior
Brain Cell Biology
Brain Structure & Function
Cell
Cerebral Cortex
Developmental Neurobiology
Frontiers in Human Neuroscience
Frontiers in Neuroinformatics
Hippocampus
J. Neuroscience
Journal of Comparative Neurology
Journal of Neuroinflammation
Journal of Neuroinformatics
Journal of Neuroscience Methods
Molecular Brain
Molecular Pain
Nature
Nature Neuroscience
Neural Development
Neuroimage
Neuroscience

Vasculature Morphogenesis: Synopsis of three related articles by Halina Witkiewicz, Phil Oh and Jan Schnitzer

Posted on January 3rd, 2014 in Data Spotlight, Essays, General information | No Comments »

The following is a guest blog by Krystyna Gutowska.
The topic is a set of three articles by Witkiewicz et al. previously blogged about by NIF as an exemplar of open access publishing and a new open review process utilized by the Faculty of 1000 journals.

Common sense dictates that inhibiting malignant tumor growth (cancer) should be possible by inhibiting formation of new blood vessels. An artistic vision of that belief was painted in 1940 by Diego Rivera in “The Hands of Dr Moore” (San Diego Museum of Art). In 1972 Judah Folkman presented the concept of treating solid tumors by anti-angiogenesis [Greek angeion = vessel], for which he became well known. Yet the quest for such a treatment has not, so far, resulted in a cure for cancer.

In countless review articles the formation of tumor vessels was presented graphically to reflect researchers’ current understanding of how it was supposed to happen at the cellular and molecular level. Instead of such artistic and graphic representations, the three studies published in F1000Research on January 10th 2013 provide photographic documentation of the process, which turned out to be different from what had been imagined. Nobody had expected blood elements to be produced first at the site of new tissue growth, with single cells subsequently assembling around them into vessels. That would have been counter-intuitive; after all, to handle any fluid one needs a vessel. Besides, blood formation after birth was supposed to be localized in bone marrow (medulla ossea), not extramedullary. Therefore the new vessels were assumed to be formed first and filled with blood next. That misconception prevailed in science for a long time. The essential contributing factor was a false belief about how the red blood elements were formed, specifically about how the erythrocyte precursors (erythroblasts) eliminated their nuclei to become anucleated erythrocytes, meaning red cells without a nucleus.

Paradoxically, the concept of the nucleus being separated from cytoplasm by expulsion (resembling separation of egg yolk from the white) came from 1967 morphological studies that used electron microscopy, just like the 2013 studies discussed here. Although the same methodology was used at both times, the animal models subjected to the ultrastructural analysis were different. The critical difference was the type of analyzed tissue (bone marrow or spleen versus tumor). In the bone marrow and spleen the erythrogenesis is relatively rare and to observe it the process had to be experimentally stimulated by inducing anemia in dogs or mice. However, the tumor-hosting mice used recently were not subjected to any pathogenic treatment except the surgical implantation of cultured tumor cells into dorsal skin fold of the animals. In that model the erythrogenesis was spontaneous and frequent enough for capturing images of various stages. The expulsion of the nucleus from the erythroblasts was not happening and macrophages (the cells said to internalize and digest the released nuclei) were rare in that environment. The nuclei were degraded in the process of erythrogenic autophagy not by the macrophages but by the very cells they were a part of. Each erythrogenic cell was remodeled into a few smaller red vacuoles by a nucleo-cytoplasmic conversion. No nuclear waste was released for macrophages to clean up. Such red vacuoles are no longer cells, although they have been referred to as the erythrocytes (red cells). The 2013 studies use the term erythrosome (red body) as a synonym for erythrocyte because it describes the sub-cellular nature of those blood elements better. The cells converting into erythrosomes were visually recognizable in tissue sections and their location outside bone marrow was evident.

Erythrogenesis turned out to be the central element of the vasculature formation induced locally by the growing tumor or by healing of the surgical injury inflicted by the implantation of the tumor cells. It was central both figuratively and literally, i.e., as a source of energy and as a structural morphogenetic element chemotactically attracting cells. (Erythrocytes are known to secrete the high-energy molecule ATP.) Those ultrastructural studies demonstrated for the first time that extramedullary hematopoiesis and vasculogenesis were inseparable in live animals; hence, the term ‘vasculature’ included both blood and vessels. Historically, the term ‘angiogenesis’ (expansion of existing vessels) was replaced by ‘neovasculogenesis’ (generation of new vessels from bone marrow derived precursor cells). Now, the most appropriate term appears to be ‘vasculature morphogenesis’ (formation of blood and vessels from local tissue stem cells).

In the first article the authors discuss the implications of the new findings for malignant as well as non-malignant tissue or organ morphogenesis (the organoblasts concept) and for tissue definition. The second article shows formation of a capsular vasomimicry that could potentially lead to spreading of tumor cells to various locations by fusing with morphologically similar lymphatic vessels or veins, i.e. to metastasis. The third article deals with tumor energy metabolism and explains Otto Warburg’s conjecture, now more than half a century old, by the cellular heterogeneity of tumors. That connection was missed by metabolic studies based on samples isolated from tissues because the isolation process destroys the tissue. Also missed was the role that the anaerobic metabolic pathway discovered by Warburg plays during cell division, when the structural changes in the mitochondria indicate their temporary functional disability. That reversible malfunction correlates with a particular stage of the cell cycle (mitosis).

The 1967 conclusion that erythrocytic enucleation occurs by expulsion of the nucleus was derived from pathologically changed systems and erroneously extrapolated to systems not modified experimentally. Consequently it had a long-lasting and misleading effect on multiple studies in vascular biology. The appealing concept of inhibiting angiogenesis to stop tumor growth suffered from a lack of understanding of the cellular interactions leading to vasculature formation (blood as well as vessels). To study those interactions the tissue must be preserved. Electron microscopy supported by immunocytochemistry (using antibodies to identify specific molecules in situ) is the method of choice for that purpose. The observations reported in the three articles discussed here appear to be of profound significance for tissue morphogenesis in general, not only in malignancy.

Refocusing on inhibiting tumor vasculature formation, with the full force of currently available technologies, presents a realistic chance to cure solid tumors in a large number of patients despite the tumors’ genetic diversity, which is perpetually introduced during cell divisions. Such a strategy would not interfere with proliferation of normal cells that does not result in tissue growth. One prominent example is renewal of the gut epithelium. From a clinical point of view the gut epithelium is critically important because of the role it plays in absorption of nutrients.

Where is your data now?

Posted on December 20th, 2013 in Anita Bandrowski, Essays, Interoperability | No Comments »

An interesting report out of UBC: very little data is left after a few years.

Vines et al. 2014 point out that after articles are published, the data is lost at a rate of about 17% per year.
This means that my experience with my data on those Zip disks from the 1990s is probably fairly common among researchers. I guess I just have to accept that I will never actually look at that old data, and that other people who might have found it useful will never find it.
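
To get a feel for what a loss of about 17% per year adds up to, here is a minimal sketch in Python. It assumes the loss compounds at a constant annual rate, which is my own simplification for illustration and not necessarily the model used by Vines et al.; the function names are hypothetical.

```python
import math


def fraction_remaining(years, annual_loss=0.17):
    """Fraction of data still retrievable after `years`.

    Assumption (not from Vines et al.): a constant compound loss
    of `annual_loss` per year.
    """
    return (1.0 - annual_loss) ** years


def years_until_fraction(target, annual_loss=0.17):
    """Years until only `target` (e.g. 0.5) of the data remains,
    under the same constant-rate assumption."""
    return math.log(target) / math.log(1.0 - annual_loss)


# Print the surviving fraction at a few horizons.
for years in (1, 5, 10, 20):
    print(f"{years:2d} years: {fraction_remaining(years):.0%} remaining")

print(f"Half gone after about {years_until_fraction(0.5):.1f} years")
```

Under that assumption, half of the data would be gone in a little under four years, and very little would be left of anything stored in the 1990s.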

Is it ok that all of this data is lost? Well, it has always been that way, but the question is: should it be? We have the ability to share data sets with each other in unprecedented ways, almost for free. The granting agencies are requiring us to do it, and we are getting a lot of push from the media. Will this be enough to change the way that I and my colleagues think about data? I certainly hope so.

If you would like to have a place to put your data, please share it with any of the standard repositories in your field or, if you can’t find one, share it with NIF.