Archive for the ‘Interoperability’ Category

Reproducibility in the Recipes of Science

Posted on March 31st, 2016 in Anita Bandrowski, Essays, Interoperability, News & Events

Hello SciCrunch/NIF Community! Please read below for Dr. Bandrowski’s post on “Reproducibility in the Recipes of Science”

Reproducibility in the Recipes of Science

Reproducibility in science is a difficult question, and much has been said about it by industry, government, and researchers themselves; for a good summary, please see the [Nature Special](http://www.nature.com/news/reproducibility-1.17552) on reproducibility.

Many more things can and will be said about this complex issue, but I wanted to ask a different question: do we not already have a good exemplar of reproducibility when we look at our favorite recipe sites? There are lists of ingredients, pictures of the finished product, and of course plenty of detailed instructions. There are even places for novice cooks to tell the recipe owner “hey, this is hard to reproduce” or “it does not work with Indian saffron”.

Why doesn’t the **methods** section of our papers look like that?

I would like to introduce the members of the Society for Neuroscience to an initiative that has been trying to make methods sections look a little more like your favorite recipe site by asking authors to do a more thorough job of listing their key ingredients, such as antibodies.

The initiative has been going on for a couple of years as an agreement among several of the key journals in neuroscience, which ask authors to provide an RRID, a unique identifier for key biological reagents, in their methods sections. Currently these RRIDs are generated for antibodies, organisms, and software tools.

We are very pleased that [eNeuro](http://eneuro.org/content/3/2/ENEURO.0046-16.2016) has recently joined the charge as has [Neuron](http://www.cell.com/neuron/rrid), so you may be seeing some of these RRIDs in the papers that you read.

However, we know that many journals will allow authors to add RRIDs to their papers even if the journal is not officially asking for them, so we hope that you will join the charge with your next paper. Please go to [scicrunch.org/resources](https://scicrunch.org/resources), search for your key biological resources, open the “cite this” box, and copy the citation into your methods. This makes tracking down the ingredients used in any paper much easier, which is of course the first step in experimental reproducibility.
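For readers who like to script this kind of bookkeeping, here is a minimal sketch of how one might sanity-check an RRID string before pasting it into a methods section. The prefix pattern and the resolver URL scheme at scicrunch.org are my assumptions rather than an official specification, and the RRID in the example is a hypothetical placeholder.

```python
import re

# Minimal sketch: check that an RRID string looks well formed and build a
# resolver link for it. The pattern below (an authority prefix such as AB_ for
# antibodies or SCR_ for software, followed by an accession) and the
# scicrunch.org/resolver/ URL scheme are assumptions, not an official spec.
RRID_PATTERN = re.compile(r"^RRID:[A-Za-z]+[_:][A-Za-z0-9_:.-]+$")

def resolver_url(rrid: str) -> str:
    """Return a resolver link for a well-formed RRID string."""
    if not RRID_PATTERN.match(rrid):
        raise ValueError(f"does not look like an RRID: {rrid!r}")
    return f"https://scicrunch.org/resolver/{rrid}"

# Hypothetical identifier, for illustration only:
print(resolver_url("RRID:AB_123456"))
```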

Eating my own Dog Food!

Posted on July 4th, 2015 in Anita Bandrowski, Curation, Interoperability, News & Events

While not all of you were fortunate enough to attend the first Beyond the PDF meeting, I will say this: it was eye-opening for this scientist. To me, the most memorable statement from the meeting came when Geoffrey Bilder argued from the back of the room that we should all Eat Our Own Dog Food! What he meant was that anyone building tools should actually use them, and anyone proclaiming broad “thou shalts” should live up to that proclamation themselves.

Easier said than done, Geoffrey!

In the years since this historic meeting, these statements have been eating away at my psyche.

I lead the Resource Identification Initiative, a project to add unique identifiers to all papers that use antibodies, model organisms, software tools, or databases. Basically, I am telling authors to do “my bidding”: make their papers easier to search and give academic credit to the developers of software tools like R or ImageJ. I am asking these authors to help others selflessly and to do something different from what they have done before.

When submitting a paper to Frontiers in NeuroInformatics as a middle author at the very beginning of the RII project, I felt very reluctant to add RRIDs to the paper. Who was I to suggest such a thing? I waited for the editor to remind us to add the identifiers; I waited, and no question came. Before final submission, I overcame my very uncharacteristic muteness and asked my collaborators to add the RRIDs to a table where I felt they were appropriate. It turned out that my colleagues did not object, and the journal editor also didn’t say anything about including them. His journal was not yet on board, something that has since been remedied.

Why did I feel so strongly that I should not include an identifier for tools while telling others to do it?
What was I afraid of?
Change is hard!

I am really not sure now what I was so afraid of, because after overcoming this initial scientific recalcitrance I simply put RRIDs in the next paper without a second thought and have continued to include them ever since.

As I was drafting this blog post, a colleague asked me to contribute to a table in her paper. I will again be one of those middle authors (a huge paper with tons of authors), but this time, as with my own papers, I asked her to include the RRIDs without being afraid. It took me about 8 minutes to pull all of the relevant RRIDs from scicrunch.org/resources, and the paper was just submitted. I do not care whether the journal is officially participating in the initiative or not.

I guess what I have learned from all of this is that once you accept change, it becomes the new normal, and RRIDs are a great new normal. Thanks, Geoffrey, for nagging me; I am very glad to say that I have Eaten My Own Dog Food!

NIH Plan for Increasing Access to Scientific Publications and Digital Scientific Data

Posted on March 4th, 2015 in Anita Bandrowski, Interoperability, News & Events

The NIH put out a plan to increase access to scientific data.

What do they really mean and what does this mean to researchers?

Researchers have been asked to provide PubMed Central (PMC) identifiers in grant applications, and this single requirement has pushed authors to submit their papers to PMC; many journals now do this as a matter of course, leading to a large corpus of publications with fully searchable text. I think that researchers are now familiar with this process and see the benefit, as I do when I am at home and need to look up a piece of information from one of my old papers that a publisher tries to charge me $36 to read.

What happens to data and what is meant by data?
Will authors need to submit all of their supplementary data files to PMC?

Perhaps not; some wording in the document from the NIH shows that they know that data are not homogeneous. They recognize that they can’t handle this diversity well without working with existing repositories.

They point out that data should be FAIR:
Findable
Accessible
Interoperable
Reusable
These are known as the FAIR principles.

They also state:
“A strategy for leveraging existing archives, where appropriate, and fostering public-private partnerships with scientific journals relevant to the agency’s research; Encourage public-private collaboration; Encourage public-private collaboration to … otherwise assist with implementation of the agency plan; Ensure that publications and metadata are stored in an archival solution that… uses standards, widely available and, to the extent possible, nonproprietary archival formats for text and associated content (e.g., images, video, supporting data).”

So will there be a set of repositories that are “approved” community standards? Will the NIH have a box for grantees to put in their community repository IDs?
Seems like a good direction!

For now, NIF has a very large list of repositories that will house your data.
Try this registry search.
There are over 1000 that respond to the query, but which one or which ones can you use?
It does not seem that the NIH is willing to be prescriptive, so it will be left to individual communities to rally around the repositories that best serve them.
For now, NIF just aggregates the information around these and attempts to make them findable (the F in FAIR).

Statement of Commitment from Earth and Space Science Publishers and Data Facilities:

Posted on January 14th, 2015 in Anita Bandrowski, Interoperability, News & Events

This is an important commitment from the CODATA and Earth science communities. I am very much looking forward to circulating a similar document from the neuroscience community.

 

Coalition on Publishing Data in the Earth and Space Sciences

 
Earth and space science data are special resources, critical for advancing science and
addressing societal challenges – from assessing and responding to natural hazards and
climate change, to use of energy and natural resources, to managing our oceans, air, and
land. The need for and value of open data have been encoded in major Earth and space
science society position statements, foundation initiatives, and more recently in
statements and directives from governments and funding agencies in the United States,
United Kingdom, European Union, Australia, and elsewhere. This statement of
commitment signals important progress and a continuing commitment by publishers and
data facilities to enable open data in the Earth and space sciences.
 

Scholarly publication is a key high-value entry point in making data available, open,
discoverable, and usable. Most publishers have statements related to the inclusion or
release of data as part of publication, recognizing that inclusion of the full data enhances
the value and is part of the integrity of the research. Unfortunately, the vast majority of
data submitted along with publications are in formats and forms of storage that make
discovery and reuse difficult or impossible.
 

Repositories, facilities, and consortia dedicated to the collection, curation, storage, and
distribution of scientific data have become increasingly central to the scientific enterprise.
The leading Earth and space science repositories not only provide persistent homes for
these data, but also ensure quality and enhance their value, access, and reuse. In addition
to data, these facilities attend to the associated models and tools. Unfortunately, only a
small fraction of the data, tools, and models associated with scientific publications makes
it to these data facilities.
 

Connecting scholarly publication more firmly with data facilities thus has many
advantages for science in the 21st century and is essential in meeting the aspirations of
open, available, and useful data envisioned in the position statements and funder
guidelines. To strengthen these connections, with the aim of advancing the mutual
interests of authors, publishers, data facilities, and end-users of the data, a recent Earth
and space science data and publishing conference, supported by the National Science
Foundation, was held at AGU Headquarters on 2-3 October 2014. It brought together
major publishers, data facilities, and consortia in the Earth and space sciences, as well as
governmental, association, and foundation funders. Further informational meetings were
held with Earth and space science societies, publishers, facilities, and librarians that were
not present at the October meeting. Collectively the publishers, data facilities, and
consortia focused on open data for Earth and space science formed a working group:
Coalition on Publishing Data in the Earth and Space Sciences. As one outcome, this
group collectively endorsed the following commitments to make meaningful progress
toward the goals above. We encourage other publishers and data facilities and consortia
to join in support.
 

Signatory data facilities, publishers, and societies, in order to meet the need for
expanding access to data and to help authors, make the following commitments:
● We reaffirm and will ensure adherence to our existing repository, journal, and
publisher policies and society position statements regarding data sharing and
archiving of data, tools, and models.
● We encourage journals, publishers, and societies that do not have such statements
to develop them to meet the aspirations of open access to research data and to
support the integrity and value of published research. Examples of policies and
position statements from signatory journals and societies are listed here.
● Earth and space science data should, to the greatest extent possible, be stored in
appropriate domain repositories that are widely recognized and used by the
community, follow leading practices, and can provide additional data services.
We will work with researchers, funding agencies, libraries, institutions, and other
stakeholders to direct data to appropriate repositories, respecting repository
policies.
● Where it is not feasible or practical to store data on community-approved
repositories, journals should encourage and support archiving of data using
community-established leading practices, which may include supplementary
material published with an article. These should strive to follow existing NISO
guidelines.

Over the coming year, the signatory Earth and space science publishers, journals, and
data facilities will work together to accomplish the following:
● Provide a usable online community directory of appropriate Earth and space
science community repositories for data, tools, and models that meet leading
standards on curation, quality, and access that can be used by authors and journals
as a guide and reference for data deposition.
● Promulgate metadata information and domain standards, including in the online
directory, to help simplify and standardize data deposition and re-use.
● Promote education of researchers in data management and organize and develop
training and educational tools and resources, including as part of the online
directory.
● Develop a working committee to update and curate this directory of repositories.
● Promote referencing of data sets using the Joint Declaration of Data Citation
Principles, in which citations of data sets should be included within reference
lists.
● Include in research papers concise statements indicating where data reside and
clarifying availability.
● Promote and implement links to data sets in publications and corresponding links
to journals in data facilities via persistent identifiers. Data sets should ideally be
referenced using registered DOIs.
● Promote use of other relevant community permanent identifiers for samples
(IGSN), researchers (ORCID), and funders and grants (FundRef).
● Develop workflows within the repositories that support the peer review process
(for example, embargo periods with secure access) and within the editorial
management systems that will ease transfer of data to repositories.
 

A major challenge today is that much more Earth and space science data are being
collected than can be reasonably stored, curated, or accessed. This includes physical
samples, information about them, and digital data (sometimes streaming at rates of
terabytes per minute). Researchers and publishers are looking for guidance on what
constitutes archival data across diverse fields and disciplines. The major data repositories
provide leading practices that should help guide the types of samples, data, metadata, and
data processing descriptions that should be maintained, including information about
derivations, processing, and uncertainty.
 

To enable improved coordination and availability of open data, we encourage funders to
support these commitments, ensure a robust infrastructure of data repositories, and enable
broad outreach with researchers. As a general rule, data management plans promulgated
by funders should indicate that release into leading repositories, where available, of those
data necessary to support published results is expected at publication. The ultimate
measure of success is in the replicability of science, generation of new discoveries, and in
progress on the grand challenges facing society that depend on the integration of open
data, tools, and models from multiple sources.
 

Signatories
American Astronomical Society
American Geophysical Union
American Meteorological Society
Biological and Chemical Oceanography Data Management Office, Woods Hole
Oceanographic Institution (BCO-DMO)
Center for Open Science
CLIVAR and Carbon Hydrographic Data Office (CCHDO)
Community Inventory of EarthCube Resources for Geosciences Interoperability
(CINERGI)
Council of Data Facilities
Elsevier
European Geophysical Union
Geological Data Center of Scripps Institution of Oceanography
ICSU World Data System
Incorporated Research Institutions for Seismology (IRIS)
Integrated Earth Data Applications (IEDA)
John Wiley and Sons
Magnetics Information Consortium (MagIC)
Mineralogical Society of America
National Snow and Ice Data Center
Nature Publishing Group
Proceedings of the National Academy of Sciences
Rolling Deck to Repository (R2R)
Science

Big Data vs Small Data: Is it really about size?

Posted on October 31st, 2014 in Anita Bandrowski, Curation, Data Spotlight, Inside NIF, Interoperability

We have been hearing for some time that when it comes to data, it is all about size. The bigger-is-better mantra has been all over the press, but is it really size that matters?

There are the so-called “Big Data” projects, such as the Allen Brain Atlas, which generates data, sans hypothesis, over the whole brain for thousands of genes. This is great because the goal of the project is to generate consistent data and not worry about which disease will or will not be impacted by each data point. That may be a great new paradigm for science, but there are not many projects like this “in the wild”.

Most data being generated in the world of science can be considered small, i.e., it would fit on a personal computer, and there are a LOT of labs out there generating this sort of data. So the question that we addressed in the recent Big Data issue of Nature Neuroscience is whether small data could organize to become big data. If such a thing is desirable, then what would be the steps to accomplish this lumping?

Here are the principles that we have extracted from working on NIF that we think will really help small data (from Box 2):

Discoverable. Data must be modeled and hosted in a way that they can be discovered through search. Many data, particularly those in dynamic databases, are considered to be part of the ‘hidden web’, that is, they are opaque to search engines such as Google. Authors should make their metadata and data understandable and searchable (for example, use recognized standards when possible, avoid special characters and non-standard abbreviations), ensure the integrity of all links and provide a persistent identifier (for example, a DOI).

Accessible. When discovered, data can be interrogated. Data and related materials should be available through a variety of methods including download and computational access via the Cloud or web services. Access rights to data should be clearly specified, ideally in a machine-readable form.

Intelligible. Data can be read and understood by both human and machine. Sufficient metadata and context description should be provided to facilitate reuse decisions. Standard nomenclature should be used, ideally derived from a community or domain ontology, to make it machine readable.

Assessable. The reliability of data sources can be evaluated. Authors should ensure that repositories and data links contain sufficient provenance information so that a user can verify the source of the data.

Useable. Data can be reused. Authors should ensure that the data are actionable, for example, that they are in a format in which they can be used without conversion or that they can readily be converted. In general, PDF is not a good format for sharing data. Licenses should make data available with as few restrictions as possible for researchers. Data in the laboratory should be managed as if it is meant to be shared; many research libraries now have data-management programs that can help.
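To make several of these principles concrete (a persistent identifier for discoverability, a machine-readable license for accessibility, and a non-PDF distribution format for usability), here is a minimal sketch of dataset metadata using the schema.org Dataset vocabulary, generated in Python. The DOI, license, URLs, and field values are hypothetical placeholders, not a prescribed NIF format.

```python
import json

# Minimal sketch of machine-readable dataset metadata (schema.org Dataset vocabulary).
# All identifiers and values below are hypothetical placeholders for illustration.
metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example whole-brain gene expression dataset",
    "description": "Processed expression values with acquisition provenance.",
    "identifier": "https://doi.org/10.xxxx/example",  # persistent identifier (Discoverable)
    "license": "https://creativecommons.org/licenses/by/4.0/",  # machine-readable rights (Accessible)
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",  # a reusable, non-PDF format (Useable)
        "contentUrl": "https://repository.example.org/dataset.csv",
    },
}

print(json.dumps(metadata, indent=2))
```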

 

Resource Identification Quarter is Rapidly Approaching

Posted on January 27th, 2014 in Anita Bandrowski, Interoperability

What is resource identification quarter?
At the 2012 Society for Neuroscience meeting, NIF met with the editors-in-chief of about 25 neuroscience journals, attempting to convince them that research resources (software tools, antibodies, and model organisms) should be treated as first-class research objects and cited appropriately. At a follow-up meeting at the National Institutes of Health, the journal editors agreed to start a pilot project to identify these resources using a uniform standard.

Why should we identify research objects?
The neuroscience literature unfortunately does not contain enough information for anyone to find many research resources. In a very typical paper (Paz et al., 2010), an antibody is referred to as “GFAP, polyclonal antibody, Millipore, Temecula, CA.” If someone had tried to find the antibody in 2010, they would have seen that there were 40 GFAP antibodies at Millipore; today, after the merger with EMD, the catalog contains 51 antibodies for GFAP, with no indication of which ones were present before and which are new. Without even a catalog number, a researcher can potentially contact the authors or buy all of these antibodies and try them to see if any have a similar profile. You can imagine that, at several hundred dollars per antibody and the weeks needed to optimize staining, this is no better than a shot in the dark.

What about transgenic mice, they can’t possibly be that hard to identify?
After many conversations with model organism database curators at MGI, it turns out that people’s data are often not included in the database specifically because the world experts on mice can’t tell which mouse is being used in the paper. The nomenclature of transgenic mice is somewhere between an art form and black magic; however, including a simple stock number seems like a fairly simple solution to the problem. The nomenclature authorities for mice, rats, worms, and flies happen to have convenient forms to ask for help or to name a new critter (the list is available here).

How should we identify research objects?
We need a set of unique identifiers, like ORCID iDs or social security numbers, for research resources. NIF has created a web page for authors to find these in one reasonably convenient location. Searching for information that authors should already have, such as the catalog number for an antibody or a stock number for an organism, should give a result in the appropriate tab, and a rather large “cite this” button will appear with the appropriate citation style.
A more detailed set of instructions and the search box can be found here: www.scicrun.ch/resources
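As an illustration of the kind of text that ends up in a methods section, here is a small sketch that assembles a resource citation from its parts. The exact wording produced by the “cite this” button may differ, and the vendor catalog number and RRID shown are hypothetical placeholders.

```python
# Sketch only: assemble a methods-section style resource citation from its parts.
# The catalog number and RRID below are hypothetical placeholders, and the exact
# format produced by the "cite this" button on scicrunch.org may differ.
def methods_citation(name: str, vendor: str, catalog: str, rrid: str) -> str:
    """Return a resource citation string with vendor, catalog number, and RRID."""
    return f"{name} ({vendor} Cat# {catalog}, RRID:{rrid})"

print(methods_citation("anti-GFAP polyclonal antibody", "Millipore", "AB0000", "AB_000000"))
# anti-GFAP polyclonal antibody (Millipore Cat# AB0000, RRID:AB_000000)
```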

So if you are considering submitting a paper over the next few months to one of the journals below, you will be asked to include a unique identifier for your research objects. We certainly hope that this is reasonably easy to accomplish, and we eagerly await a time when we will be able to ask the question: which antibody did Paz et al. actually use, and on which transgenic mouse?

If you would like to see whether the antibodies you used in any of your previous papers can also be annotated with these unique identifiers, we will be happy to help. Our friends at antibodies-online are giving away coffee mugs and t-shirts to help get people interested in doing so. To annotate your paper, you can go through their survey at:
http://www.antibodies-online.com/resource-identification-initiative/ This will earn you a free mug, and the data will be included in the Antibody Registry for posterity.

List of Participating Journals:
Annals of Neurology
Brain
American Journal of Human Genetics
Behavioral and Brain Functions
Biological Psychiatry
BMC Neurology
BMC Neuroscience
Brain and Behavior
Brain Cell Biology
Brain Structure & Function
Cell
Cerebral Cortex
Developmental Neurobiology
Frontiers in Human Neuroscience
Frontiers in Neuroinformatics
Hippocampus
J. Neuroscience
Journal of Comparative Neurology
Journal of Neuroinflammation
Journal of Neuroinformatics
Journal of Neuroscience Methods
Molecular Brain
Molecular Pain
Nature
Nature Neuroscience
Neural Development
Neuroimage
Neuroscience

Where is your data now?

Posted on December 20th, 2013 in Anita Bandrowski, Essays, Interoperability

An interesting report out of UBC finds that there is very little data left after a few years.

Vines et al. (2014) point out that after articles are published, data are lost at a rate of about 17% per year.
This means that my experience with my data on those Zip disks from the 1990s is probably a fairly common one among researchers. I guess I just have to accept that I will never actually look at that old data, and other people who might have found it useful will never find it.
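As a rough back-of-the-envelope illustration (assuming a constant 17% annual loss, which simplifies what Vines et al. actually measured), the fraction of datasets still available after n years is 0.83^n, so only about half survive four years and just a few percent survive twenty:

```python
# Back-of-the-envelope decay: assuming a constant 17% annual loss rate,
# the fraction of datasets still available after n years is 0.83**n.
for years in (1, 2, 5, 10, 20):
    remaining = 0.83 ** years
    print(f"after {years:2d} years: ~{remaining:.0%} of datasets still available")
```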

Is it OK that all of this data is lost? Well, it has always been that way, but should it be? We now have the ability to share data sets with each other in unprecedented ways, almost for free. The granting agencies are requiring us to do it, and we are getting a lot of push from the media. Will this be enough to change the way that my colleagues and I think about data? I certainly hope so.

If you would like a place to put your data, please share it with any of the standard repositories in your field, or, if you can’t find one, share it with NIF.

Draft Declaration of Data Citation Principles, community comments are being sought

Posted on November 22nd, 2013 in Anita Bandrowski, Interoperability, News & Events

NIF is proud to support this important effort by members of FORCE11 (the future of scholarly communications) and requests that the NIF community comment.

Announcing the “Draft Declaration of Data Citation Principles.” The Data Citation Synthesis Group, 40 individuals from more than 25 organizations, developed these draft principles over the past 9 months and now welcomes feedback and comments from the community. Feedback received by the end of 2013 will be reviewed and incorporated into the final principles. Once the final principles are published, a mechanism will be put in place for worldwide endorsement.

Thanks for your input.


Elsevier and the Neuroscience Information Framework Work Together to Improve Reporting of Research in Neuroscience Literature

Posted on November 7th, 2013 in Anita Bandrowski, Data Spotlight, Interoperability

I am very excited to share the following press release with the NIF community.

 

Elsevier recommends that authors follow the Minimal Data Standards

Amsterdam, November 7, 2013 – Elsevier, a world-leading provider of scientific, technical and medical information products and services, announces its collaboration with the Neuroscience Information Framework (NIF) by incorporating the Minimal Data Standards across four of its neuroscience journals.

Minimal Data Standards are a set of recommendations developed by NIF, the most comprehensive portal of available web-based resources in the field of neuroscience, to facilitate resource identification in published neuroscience articles. One of the big challenges that neuroscientists face today is that research findings reported in the literature often lack sufficient details to enable reproducibility of methodology or reuse of data. With the launch of the Minimal Data Standards, NIF aims to address this issue.

Elsevier is one of the first scholarly publishers to adopt the Minimal Data Standards guidelines. Initially, four Elsevier journals will take part in the pilot: Brain Research, Experimental Neurology, Journal of Neuroscience Methods, and Neurobiology of Disease. These journals will incorporate the guidelines into their article submission process, recommending that authors include gene/genome accession numbers, species-specific nomenclature, antibody identifiers, and software details in the methods sections of their articles. More Elsevier neuroscience journals will join the initiative as the pilot develops further in 2014.

Prof. Maryann Martone, Professor of Neuroscience at the University of California San Diego, and Executive Director of the Future of Research Communications and e-Scholarship (FORCE11), said, “Scientific reproducibility starts with materials and methods. We are pleased to work with Elsevier to help neuroscientists make their methods more understandable for not only humans but also machines. This pilot is a step towards changing the way we write papers to take advantage of 21st century technology for searching and linking across vast amounts of information.”

Michael Osuch, Publishing Director for Neuroscience & Psychology at Elsevier said, “With our support for the Minimal Data Standards, we aim to make it easier for the community to identify the key resources used to produce the data in published studies. Neuroscience is a highly multi-disciplinary field with thousands of relevant web-based resources and data repositories. Direct linking to all of them would have been impossible without NIF’s capacity to serve as a central portal.”

Supporting the NIF in rolling out the Minimal Data Standards pilot, with the aim of developing better and more accurate resource identification within the neuroscience literature, falls within the scope of the Article of the Future, Elsevier’s ongoing program to improve the format of the scientific article.
# # #


About the Neuroscience Information Framework

An initiative of the NIH Blueprint for Neuroscience Research, the Neuroscience Information Framework (NIF) advances neuroscience research by enabling discovery of and access to public research data and tools worldwide through an open source, networked environment. In addition to giving access to over 200 neuroscience-relevant databases and data sets, NIF hosts millions of annotations on the literature, which include information about the reagents used in a paper, links to data, and comments about the data or arguments presented in the paper.

About Elsevier
Elsevier is a world-leading provider of scientific, technical and medical information products and services. The company works in partnership with the global science and health communities to publish more than 2,000 journals, including The Lancet and Cell, and close to 20,000 book titles, including major reference works from Mosby and Saunders. Elsevier’s online solutions include ScienceDirect, Scopus, SciVal, Reaxys, ClinicalKey and Mosby’s Suite, which enhance the productivity of science and health professionals, helping research and health care institutions deliver better outcomes more cost-effectively.

A global business headquartered in Amsterdam, Elsevier employs 7,000 people worldwide. The company is part of Reed Elsevier Group plc, a world leading provider of professional information solutions. The group employs more than 30,000 people, including more than 15,000 in North America. Reed Elsevier Group plc is owned equally by two parent companies, Reed Elsevier PLC and Reed Elsevier NV. Their shares are traded on the London, Amsterdam and New York Stock Exchanges using the following ticker symbols: London: REL; Amsterdam: REN; New York: RUK and ENL.

Media contact
Shamus O’Reilly
Publisher Neuroscience
Elsevier
+44 1865 843651
s.oreilly@elsevier.com

Resource Identification Guidelines – now at Elsevier

Posted on September 6th, 2013 in Anita Bandrowski, Curation, Interoperability, NIFarious Ideas

The problem of reproducibility of results has been attributed by many groups to scientists having very large data sets and highlighting interesting, yet most likely statistically anomalous, findings, along with other scientific no-nos such as reporting only positive results.

Our group has been working to improve the reporting of methods and reagents, and I am happy to report that these ideas are finding resonance.

In a group sponsored by FORCE11, researchers, reagent vendors, and publishers have been meeting to discuss how best to accomplish better reporting across the literature, and both the NIH and publishers themselves are now becoming interested in its success. The latest and greatest evidence of this can be found on the Elsevier website as a guideline to authors; this will soon be followed by a pilot project to be launched at the Society for Neuroscience meeting with over 25 journals and most major publishers.

Of course, there is no reason to wait for an editor to ask you to put in catalog numbers for reagents or stock numbers for transgenic animals. These are things we should be trained to do in graduate school as good practice for reporting our findings.

We seem to be getting ready to change (or change back) to more rigorous methods reporting, which should strengthen the recently eroded credibility of the scientific enterprise. I, for one, hope that the message communicated will be: “scientists don’t hide problems, even endemic ones; we examine them and find workable solutions.”