Archive for the ‘Data Spotlight’ Category

If at first you do succeed: Publish a Replication Report with #RRIDs anyway!

Posted on January 30th, 2017 in Anita Bandrowski, Data Spotlight, Force11, News & Events, NIFarious Ideas | No Comments »

Science is the act of trying and trying again, whether or not we confirm what we think should be happening.

Begley and Ellis, in their 2012 paper from Amgen, stated that only about 11% of cancer studies were replicable, sending shockwaves through the scientific community for years. However, the authors did not give the scientific community all of the data that showed the replicates.

This week in eLife, the Center for Open Science and a cohort of great ‘re-do-ers’ have just published the first batch of studies that are replicates of influential cancer studies, attempting to confirm or deny what the original study claimed. We at the RRID initiative have noted that the original studies often lacked identifying information for the reagents they used, as is alluded to by some of the replication attempts. These simple omissions make replication much more difficult, something that the ‘re-do-ers’ struggled with.

This is really a monumental step, and we will wait for the final publications to determine whether these rigorous and fully transparent attempts also fall in the 11% replication level claimed by Begley and Ellis. So far, some of the replicates show trends in the same direction reported by the original study authors, though no replication attempt has panned out exactly the same way as the original paper. We certainly need to wait and see for the rest of the reports, but I am personally heartened that the original authors are engaged in the replication, commenting on these reports and attempting to understand their own data and the new data.

The immortal Aristotle was once reported to say “Quality is not an act, it is a habit.” I think that if he were alive today he would be very interested in these developments and would implore us to look at ourselves, and if we did not like what we saw, he would call on us to change for the better. While none of my papers will likely be the target of this kind of scrutiny, I do hope that the methods and results will stand up in the long term. This is a call to action for all of us, to be more precise and to do better delivering on the promise of science for the patients who deserve our very best attention and very best methodology.

NeuroMorpho announces 10 years and 50K Neuron Reconstructions

Posted on September 6th, 2016 in Anita Bandrowski, Data Spotlight, News & Events | No Comments »

We are excited to announce that almost exactly 10 years after the original launch, NeuroMorpho.Org passed the milestone of 50,000 reconstructions in the September 1, 2016 release of Version 7.0.


This major update includes 12,693 additional reconstructions from 35 new datasets. The new data added in these last 6 months equal the total amount accumulated in the first seven years.

This release also introduces several new functionalities, including:
(1) bibliography documenting data re-use from nearly 500 citations;
(2) ontology-smart searches by species, brain regions, neuron types, and experimental conditions;
(3) DOI minting capability for each article-associated dataset; and
(4) API enabling object-oriented access to data and metadata.

The literature coverage database was also updated to include publications through August 2016. Please visit the What’s new page for details on the added data and other updates. We appreciate any and all feedback and comments.

We are continuously grateful to all the data contributors who freely share their hard-won tracings with the community.

Sincerely,
The NeuroMorpho.Org team

A STAR is Born, Indeed

Posted on August 26th, 2016 in Anita Bandrowski, Data Spotlight, News & Events, NIFarious Ideas | No Comments »

News on the RRID front is encouraging!


We have been very busy adding new journals over the last year. It is wonderful whenever we see a new journal with an RRID, especially when the instructions to authors are updated and you know that this is a serious effort from the editors.

More recently, RRIDs are being typeset into journals by groups such as BMC, eLife (structured methods), Elsevier, and Cell Press journals, improving the syntax of the identifiers and allowing journals to link to databases from articles if they choose to do so.

However, a step further has just been undertaken by an entire journal group. Cell Press has just restructured its methods section to make it “STAR: Structured Transparent Accessible Reporting”-compliant. This of course includes RRIDs!

http://dx.doi.org/10.1016/j.cell.2016.08.021

The idea is that authors create a list of research resources in a table, helping to keep track of all the “ingredients one needs to replicate the study”; this echoes the NIH language of Rigor and Transparency. This will be a real boon for reproducible science!

Some papers using the new format are already out from Cell:

http://www.sciencedirect.com/science/article/pii/S009286741631011X

http://www.sciencedirect.com/science/article/pii/S0092867416309953

http://www.sciencedirect.com/science/article/pii/S0092867416309321

We LOVE structured methods!

Protect Yourself from Zombie Papers

Posted on April 25th, 2016 in Anita Bandrowski, Data Spotlight, NIFarious Ideas | No Comments »

Another fun flier to post around the department.

Zombification of papers: the inability to use or validate information in the paper.
How can we stop this terrible plague on the scientific literature? – RRIDs help get the key biological reagents identified and authenticated.

Feel free to print this fun flier and post it on your office door!


What is the identity of your Cell Line?

Posted on April 12th, 2016 in Data Spotlight, Inside NIF, News & Events | No Comments »

The SciCrunch portals now contain a data source that will help people figure out whether their cell lines have been reported to be contaminated, and the Resource Identification Portal at SciCrunch will start asking authors to check this source at the time of publication.


Members of the International Cell Line Authentication Committee (ICLAC) have been working with ExPASy on Cellosaurus, a comprehensive data registry for cell lines and cell line information. Cellosaurus assigns a cell line identifier to each cell line, cross-links these identifiers to products available at any of the ~20 cell line stock centers that make them available, and adds notes where concerns have been raised about a cell line.

NIH has recently announced a set of reproducibility principles that target cell line authentication as an important part of research reproducibility, and expects that most grants, starting in May 2016, will include a new attachment that explains the authentication of key biological resources, including cell lines.

For these reasons, we are proud to announce that the Resource Identification Portal will now include Cellosaurus, the core database of annotated cell lines, and we hope that authors begin to identify their cell lines by RRID in the coming months, helping to keep track of this key biological resource.

Do you have high performance computing needs for your computational work?

Posted on April 11th, 2016 in Data Spotlight | No Comments »

The Neuroscience Gateway (http://www.nsgportal.org) allows neuroscientists to use commonly available computational and imaging tools such as NEURON, NEST, FreeSurfer, etc. on supercomputers free of charge. The online portal provides an easy-to-use interface to upload models or input files, run simulations, and retrieve results. This NSF-funded project is available to all researchers, and the application form to access NSG is at

https://www.nsgportal.org/reg/reg.php

For any questions, please email nsghelp-at-sdsc.edu

Integrated Annotation just added the 7-million-th record

Posted on February 27th, 2015 in Anita Bandrowski, Data Spotlight, News & Events | No Comments »

Yes we do have annotations!

What can we do with these annotations?

* When you are reading a paper, would you like to know if the data you are looking at has been stored somewhere?

* Would you like to know if someone figured out what antibody the authors used?

* What about the mouse described in the paper, is there additional information in MGI?

The integrated annotation view is an aggregate of any database included in NIF that contains the PubMed Identifier.

In over 50 databases there are citations containing PubMed Identifiers, a reference for a particular data record. While each database is different, there are some themes. Records may include reagents used in the paper like AddGene plasmids, data that is stored somewhere like ModelDB computational models, or they may include a set of values that were extracted from the paper like BioNumbers.

Through a software tool called the LinkOut Broker, we submit these data to PubMed (unless the database does this already) as an annotation that says this paper is referenced in a particular database. However, these citations are not searchable in PubMed, so we have made the integrated annotation view to allow NIF users to search these same annotations.
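To make the idea of an "aggregate by PubMed Identifier" concrete, here is a minimal Python sketch. It is not NIF's actual implementation: the record layout (`database`, `record_id`, `pmid`), the function names, and the example identifiers are all hypothetical placeholders, and only the database names come from the post above.

```python
from collections import defaultdict


def integrate_annotations(records):
    """Group database records by the PubMed Identifier (PMID) they cite.

    `records` is an iterable of dicts with the hypothetical keys
    'database', 'record_id' and 'pmid'. Records without a PMID are
    skipped, since they cannot be tied to a paper.
    """
    by_pmid = defaultdict(list)
    for record in records:
        pmid = record.get("pmid")
        if pmid is None:
            continue
        by_pmid[str(pmid)].append((record["database"], record["record_id"]))
    return dict(by_pmid)


def annotations_for(view, pmid):
    """Look up every (database, record_id) pair known for one paper."""
    return view.get(str(pmid), [])


# Placeholder records; the PMIDs and record ids are made up for illustration.
example_records = [
    {"database": "AddGene", "record_id": "plasmid-001", "pmid": "11111111"},
    {"database": "ModelDB", "record_id": "model-042", "pmid": "11111111"},
    {"database": "BioNumbers", "record_id": "bn-7", "pmid": "22222222"},
    {"database": "ModelDB", "record_id": "model-099", "pmid": None},
]

view = integrate_annotations(example_records)
print(annotations_for(view, "11111111"))
```

In this sketch a reader looking up one paper gets back every database that cites it, which is the question the bullet points above are asking.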

However, we know that people read papers in many places, in PDF readers and online, so we have started working with several groups, including a team at Science Direct, to push the data into the places where the readers are. We are proud to work with the Elsevier Antibody App team, who created an application visible in Science Direct in all Elsevier papers that have an antibody annotated in antibodyregistry.org.

An example paper from Experimental Neurology can be viewed here http://www.sciencedirect.com/science/article/pii/S0014488614003896

Did you know? The IMPC maintains a large list of predicted mouse gene phenotypes

Posted on February 16th, 2015 in Anita Bandrowski, Data Spotlight, News & Events | No Comments »

The Monarch project (monarchinitiative.org), together with the NIF project, has brought in many sources containing a wealth of phenotype information; these are now available from NIF and many of the SciCrunch portals.

The International Mouse Phenotyping Consortium is one of these sources; it creates, curates, and maintains targeted knockout mutations in embryonic stem cells for 20,000 known and predicted mouse genes. These phenotypes are available through several views showing the variant phenotypes.

What can be learned from phenotype data? Phenotype is a superset of disease, so this data can be instrumental in figuring out whether a better model for the disease you are studying exists and which traits are associated with each organism. A worm researcher may not be aware that a fly mutation expresses the same phenotype, but perhaps does so as a result of a different genotype or knockout.

 

Check out other sources of phenotype data also available:

WormBase provides anatomical and genetic information of C. elegans and related research nematodes. This Worm:VariantPhenotypes view curates the relationship between an allele and a phenotype, where the allele can be a genetic or RNAi-induced change. 100.00% (543,874 Results)

Online Mendelian Inheritance in Man (OMIM) curates human genetic diseases from the literature. The OMIM:VariantPhenotype view describes the curated relationships between genes, allelic variants (if available), and diseases/traits. 100.00% (28,706 Results)

WormBase provides anatomical and genetic information of C. elegans and related research nematodes. The GeneExprLoc view shows the localization of gene expression in C. elegans anatomy. 100.00% (72,346 Results)

OMIM is a human-curated authoritative source of information about disease to gene connections. The DiseaseGeneAssociation view is organized by the OMIM phenotype/disease identifiers, and lists all genes and text annotated to a given disease or phenotype. 100.00% (4,809 Results)

HPO annotations provide annotations of human phenotypes and diseases. This phenotype to gene view is the associations between a phenotype and its putative causative gene based on the link between a gene and its known involvement in a disease. 100.00% (284,441 Results)

The Mouse Phenome Database is a project at the Jackson Laboratory, which characterizes mouse studies based on the types of measurements that are made in each study. This MeasurementDefinitions view shows the curated mappings of the assay measurements to the relevant phenotype, trait, and anatomy terms that are measured. 100.00% (14,765 Results)

The HPO group provides annotations of phenotypes of human diseases, linked to OMIM, Orphanet, and DECIPHER.    100.00% (116,600 Results)

Online Mendelian Inheritance in Animals (OMIA) is a data set describing phenotype relationships with individual breeds and genes. This BreedPhenotypes view curates species and breed-specific-phenotype relationships for non-model organisms. 100.00% (15,516 Results)

Animal Quantitative Trait Loci Database collects and provides publicly available trait mapping data, i.e. QTL (phenotype/expression, eQTL), candidate gene and association data (GWAS), and copy number variations (CNV) mapped to livestock animal genomes to facilitate locating and comparing discoveries within and between species. Additional information regarding QTL data can be found at the Animal QTL Database FAQ.   100.00% (28,751 Results)

The ZFIN Genotype-Phenotype View  contains Genotype-to-Phenotype mappings in ZFIN, with experimental-environmental context. This Genotype-Phenotype view is a combination of intrinsic (organismal) and extrinsic (experimental/morphant) genotypes, in the context of environmental conditions. The effective genotypes are extracted and built from ZFIN genotype-phenotype data following the GENO genotype ontology model as developed by the Monarch Initiative. 100.00% (85,118 Results)

FlyBase is a database of genetic and molecular data for D. melanogaster and other Drosophila species. Flybase:Phenotypes are the curated links for phenotypes of the flies of a specified genotype, in a specified environment, attributed to a publication. 100.00% (275,697 Results)

The International Mouse Phenotyping Consortium creates, curates, and maintains targeted knockout mutations in embryonic stem cells for 20,000 known and predicted mouse genes. The IMPC:MousePhenotypes view reports on the genotypes and associated phenotypes collected from a broad based primary phenotyping pipeline in all the major adult organ systems. All phenotype calls are found to be significant with a p-value < 1 × 10⁻⁴. 100.00% (7,156 Results)

Mouse Genome Informatics offered by Jackson Laboratory includes information on integrated genetic, genomic, phenotypic, and biological data of the laboratory Mouse. The MGI:Phenotypes view presents the curated relationships between genotypes and phenotypes. 100.00% (275,856 Results)

The NHGRI Elements of Morphology: Human Malformation Terminology is being developed by a group of international clinicians working in the field of dysmorphology to standardize terms used to describe human morphology, thereby increasing the utility of descriptions of human phenotype and facilitating reliable comparisons of findings among patients. 100.00% (400 Results)

The Mouse Phenome Database is a project at The Jackson Laboratory which collects and curates mouse strain survey data for behavior, physiology, and anatomy. Data are available for inbred and recombinant inbred strains, chromosome substitution strains, other classical panels, Collaborative Cross (CC) lines and Diversity Outbred (DO) populations. 100.00% (235 Results)

ClinVar aggregates information about sequence variation and its relationship to human health. The ClinVar:VariantPhenotypes view provides information on sequence alterations present in genes and the resulting phenotypes. For records listing more than one variation, data is presented with the assumption that the individual sequence alterations are in cis. 100.00% (458,639 Results)

The Mouse Phenome Database is a project at The Jackson Laboratory which collects and curates mouse strain survey data for behavior, physiology, and anatomy. Data are available for inbred and recombinant inbred strains, chromosome substitution strains, other classical panels, Collaborative Cross (CC) lines and Diversity Outbred (DO) populations. The MPD:StrainPhenotypes view computes the extreme outlier phenotypes (>2 s.d.) as compared to the overall mean for each assay, and maps the quantitative measurements to their qualitative phenotype. (The strains measured for each assay vary, and therefore the means computed may be drawn from collections of different strains.) 100.00% (8,605 Results)

The International Mouse Phenotyping Consortium creates, curates, and maintains targeted knockout mutations in embryonic stem cells for 20,000 known and predicted mouse genes. The IMPC:KnockoutPhenotypes view reports on the phenotypes collected from a broad based primary phenotyping pipeline in all the major adult organ systems. 100.00% (7,156 Results)
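As a rough illustration of the outlier rule mentioned for the MPD:StrainPhenotypes view in the list above (values more than 2 s.d. from the overall mean of an assay), here is a minimal Python sketch. It is an assumption-laden example, not the Mouse Phenome Database's code: the function name, the "high"/"low" labels, the use of the sample standard deviation, and the strain names and values are all hypothetical.

```python
from statistics import mean, stdev


def extreme_outliers(measurements, threshold=2.0):
    """Flag strains whose assay value lies more than `threshold` standard
    deviations from the mean of all strains measured in that assay.

    `measurements` maps a strain name to one numeric value. Returns a
    dict of strain -> "high" or "low"; non-outlier strains are omitted.
    Assumption: the sample standard deviation is used here.
    """
    values = list(measurements.values())
    if len(values) < 2:
        return {}
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return {}
    calls = {}
    for strain, value in measurements.items():
        z = (value - mu) / sd
        if z > threshold:
            calls[strain] = "high"
        elif z < -threshold:
            calls[strain] = "low"
    return calls


# Made-up values for one hypothetical assay across ten placeholder strains.
assay = {
    "strain_A": 10, "strain_B": 11, "strain_C": 9, "strain_D": 10,
    "strain_E": 12, "strain_F": 10, "strain_G": 11, "strain_H": 9,
    "strain_I": 10, "strain_J": 30,
}
print(extreme_outliers(assay))
```

Because the mean is computed only over the strains present in a given assay, the same strain could be an outlier in one collection and not in another, which is the caveat the MPD description raises.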

Worm brain in Robot body!

Posted on December 4th, 2014 in Anita Bandrowski, Data Spotlight | No Comments »

Well we have done it. Captured the imagination of media types!
Wish it sounded a little more like science and less like science fiction, but heck, this is a little bit of science posing as science fiction, right in our back yard.

In terms of projects that started at UCSD/CRBS, the Open Worm is a fantastic success. Dr. Larson, the creator of NeuroLex, founded the Open Worm, an open “hacker” community that asks “Can we fully simulate C. elegans?”, and funded it with Kickstarter (take that, NIH). Biomechanics, neurons, molecules….you name it. If you put these models together into one virtual organism, will the organism function as expected?

If there is ever an organism that is amenable to simulation, it is the worm (sorry Human Brain Project folks) and the argument goes that what we learn from the worm simulations we can start to apply to other simulated species.

Check out Stephen’s TED Talk:

Big Data vs Small Data: Is it really about size?

Posted on October 31st, 2014 in Anita Bandrowski, Curation, Data Spotlight, Inside NIF, Interoperability | No Comments »

We have been hearing for some time that when it comes to data, it is all about size. The bigger is better mantra has been all over the press, but is it really size that matters?

There are the so-called “Big Data” projects such as the Allen Brain Atlas, which generates data, sans hypothesis, over the whole brain for thousands of genes. This is great because the goal of the project is to generate consistent data and not worry about which disease will or will not be impacted by each data point. That may be a great new paradigm for science, but there are not many projects like this “in the wild”.

Most data being generated in the world of science can be considered small, i.e., would fit on a personal computer, and there are a LOT of labs out there generating this sort of data. So the question that we addressed in the recent Big Data issue of Nature Neuroscience is whether small data could organize to become big data. If such a thing is desirable, then what would be the steps to accomplish this lumping?

Here are the principles that we have extracted from working on NIF that we think will really help small data (from Box 2):

Discoverable. Data must be modeled and hosted in a way that they can be discovered through search. Many data, particularly those in dynamic databases, are considered to be part of the ‘hidden web’, that is, they are opaque to search engines such as Google. Authors should make their metadata and data understandable and searchable, (for example, use recognized standards when possible, avoid special characters and non-standard abbreviations), ensure the integrity of all links and provide a persistent identifier (for example, a DOI).

Accessible. When discovered, data can be interrogated. Data and related materials should be available through a variety of methods including download and computational access via the Cloud or web services. Access rights to data should be clearly specified, ideally in a machine-readable form.

Intelligible. Data can be read and understood by both human and machine. Sufficient metadata and context description should be provided to facilitate reuse decisions. Standard nomenclature should be used, ideally derived from a community or domain ontology, to make it machine readable.

Assessable. The reliability of data sources can be evaluated. Authors should ensure that repositories and data links contain sufficient provenance information so that a user can verify the source of the data.

Useable. Data can be reused. Authors should ensure that the data are actionable, for example, that they are in a format in which they can be used without conversion or that they can readily be converted. In general, PDF is not a good format for sharing data. Licenses should make data available with as few restrictions as possible for researchers. Data in the laboratory should be managed as if it is meant to be shared; many research libraries now have data-management programs that can help.