If a Tree Falls In The Digital Forest, Does It Make a Sound?

Posted on July 16th, 2010 in Curation, Essays, General information | No Comments »

By Anita Bandrowski, Ph.D.

Humanity began writing on stone and clay tablets, then moved to papyrus and paper, and now we write with electrons.  Are our media for information storage becoming flimsier, or is it better to search through piles of electrons than card catalogs?  How can we save the wonderful work that we are all paying for (in the form of government-funded research)?  Do database records hold the same value as published papers?  If so, how can we maintain them indefinitely?  Should there be a paper version of each database?  How can cloud computing and the linked data/open data initiatives help?  What is the role of libraries in this sort of data landscape?

In my own experience working on a semantic web project called the Neuroscience Information Framework (NIF) at the University of California San Diego, I noticed something strange that has happened to our society and that bears on these questions.  For several months my desk was housed among many others in one of those open workspaces (no cubicles) whose explicit goal is to improve communication between individuals.  One day there was an interruption in the wireless service in the building.  This interruption resulted in the inevitable frustration of “I can’t do what I was just doing,” but then a tremendous event occurred: the strange entities who had been toiling near me, and whose existence I acknowledged with a nod each morning, became real humans.  The amazing awakening resembled an episode of Star Trek in which the Borg, a half-machine, half-biological group fully integrated into the hive mind, suddenly lose connectivity to the hive and bumble around, very confused.  People all around me began waking up from the technology trance and started to act more like … people.  They greeted me, we exchanged opinions of the wireless service, and we met.

With my Borg experience in mind, questions about our deep dependence on technology crystallized.  What if the power went off at Wikipedia?  What if Google didn’t exist?  How would I find things?  How would I be able to work without Google Docs?  In this networked world, is it possible that we can’t survive without the collective?

The level of integration of online information and search systems with our lives has become very eerie, to say the least.

As scientists, do we have the same issues?  Can we no longer do research without PubMed?  A few years ago while at Stanford, a colleague and I were talking to an art historian, and the conclusion of the discussion was, “if it (a scientific paper or a piece of data) does not exist on the web, then it does not exist.”  This was quite contrary to the experience of the art historian, who apparently still did research in a physical building that contained actual papers, books, and non-digital versions of art.

So, then, who backs up the data on which we are becoming completely dependent?  When researchers move to a new university or pass on to the great beyond, what happens to the data stores that they maintained?  Do they take their data with them, setting up cloud computing operations?

The good news is that scientific data in databases, whether or not it is published on paper, is backed up and regularly checked for integrity at most sites.  Data and software tools are also replicated in so-called “mirrors,” which are essentially copies of the same data or software tools that serve a particular community.  Additionally, the National Library of Medicine copies and stores many of the significant databases in its systems, allowing researchers to access them and keeping a digital copy for posterity.  For example, the Gensat project data exist on Rockefeller servers, but a mirror of the data is also set up at NCBI (the electronic national library of medicine and the home of PubMed).
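As a rough illustration of what an integrity check between a primary copy and a mirror might look like, here is a minimal Python sketch.  It assumes the two copies can be read as bytes and compares SHA-256 digests.  The sample bytes and function names are made up for this example and do not describe how any particular site actually performs its checks.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def mirror_matches(primary: bytes, mirror: bytes) -> bool:
    """True when the mirror holds exactly the same bytes as the primary copy."""
    return sha256_digest(primary) == sha256_digest(mirror)


# Made-up sample content standing in for a database export.
primary_copy = b"gene,region,expression\nexample_gene,cortex,high\n"
mirror_copy = bytes(primary_copy)                            # faithful mirror
corrupted_copy = primary_copy.replace(b"high", b"hihg")      # simulated damage

print(mirror_matches(primary_copy, mirror_copy))
print(mirror_matches(primary_copy, corrupted_copy))
```

In practice a site would more likely store the digest alongside the data and recompute it on a schedule, so that silent corruption is noticed even when no mirror is at hand.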

This seems safe enough.  However, the directors of the National Institutes of Health are not always as willing to support databases indefinitely as they are to pay researchers to set them up.  So after five or ten years, when the funding runs out, what happens to all the data that researchers toiled for many years to gather?  Some of the data were published on paper; some were likely never published anywhere, or were pulled together from papers by raw human effort, as in the Ki database, which gathered the raw numbers for affinity between drugs and receptors from many publications.  Many databases contain that elusive negative data which is not considered worthy of publishing by the ‘peer reviewing’ crowd, but which may save other researchers tremendous time if they try to replicate an experiment that several others have already found did not work.  Some databases migrate to funded projects and are then maintained by other universities while the funding is in flux, but some simply vanish into the ether.  Should someone maintain them?

The experience of the private human genome project “Panther,” started by Craig Venter at Celera Inc, later Applera, later Applied Biosystems, later an unsupported project at the Stanford Research Institute, and now potentially rising from the ashes into a new project, shows that industrial data may have a similar or potentially an even more dire fate.

In recent years, several movements have swept data science.  One is the open data movement and another is the linked data movement.  Both bear on this issue of data maintenance.  The linked data movement (one of the buzzwords in the semantic web community) attempts to link all pieces of related information by formal relationships, sort of like playing an enormous game of “Six Degrees of Kevin Bacon” with scientific data.  Obviously, these data sets must be openly accessible for this to work, so the open data movement spurred the creation of huge datasets readable by anyone in the world.  These data sets include some of the most valuable biomedical data, such as OMIM and PubMed, but also include Wikipedia and other less-than-peer-reviewed data.  Many people in the open data world talk about their preferred ways of storing that data, such as “tuples” or graphs, but all this boils down to a few main ideas:

  1. A piece of data should persist in a reliable way, with a reliable address.
  2. A piece of data should be in a format that is readable by others.
  3. A piece of data should have a unique identifier, much like a social security number.
  4. A piece of data is not owned by anyone, but should be traceable to its origin.
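
To make the four ideas above concrete, here is a minimal Python sketch of a single record that carries a stable address, a widely readable serialization (JSON), a unique identifier, and a note of its origin.  Everything in it is hypothetical: the class, the field names, the identifier, the example.org address, and the sample values are assumptions chosen for illustration, not part of any open data standard or NIF system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataRecord:
    identifier: str  # idea 3: a unique identifier
    address: str     # idea 1: a reliable, persistent address
    origin: str      # idea 4: traceable to where it came from
    payload: dict    # the data itself

    def to_json(self) -> str:
        """Idea 2: serialize to a format that others can read."""
        return json.dumps(asdict(self), sort_keys=True)


# All values below are invented placeholders, not real measurements.
record = DataRecord(
    identifier="example:0001",
    address="https://example.org/records/0001",
    origin="Hypothetical Lab, 2010",
    payload={"drug": "drug_x", "receptor": "receptor_y", "ki_nM": 12.5},
)

text = record.to_json()
restored = DataRecord(**json.loads(text))  # another party can rebuild the record
print(restored == record)
```

The round trip through JSON is the point of the sketch: a record that survives being written out and read back by someone else, with its identifier and origin intact, satisfies all four ideas at once.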

Therefore, the open data community has a vested interest in making all data available for their systems to consume and compute, including the databases whose authors, or whose authors’ funding, have expired.

In the linked data model, to continue the ‘six degrees of Kevin Bacon’ analogy, the data graph would suffer if the record of a movie were wiped off the graph.  Would we still know that Tom Hanks was connected to Kevin Bacon if Apollo 13 were no longer a data link?  Probably, but the link would no longer be direct.
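
The effect of losing a record can be sketched in a few lines of Python.  The example builds a small co-star graph and uses breadth-first search to count the links between two actors, first with Apollo 13 present and then with it removed.  Only the Apollo 13 entry comes from the text above; the other film and actor are hypothetical placeholders added so that an indirect path exists.

```python
from collections import deque


def costar_graph(films):
    """Map each actor to the set of actors they share at least one film with."""
    graph = {}
    for cast in films.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(cast - {actor})
    return graph


def degrees_of_separation(films, start, goal):
    """Breadth-first search: fewest co-star links from start to goal, or None."""
    graph = costar_graph(films)
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        actor, hops = queue.popleft()
        if actor == goal:
            return hops
        for costar in graph[actor] - seen:
            seen.add(costar)
            queue.append((costar, hops + 1))
    return None


films = {
    "Apollo 13": {"Tom Hanks", "Kevin Bacon"},
    # Hypothetical placeholders, present only to provide a longer route.
    "Hypothetical Film A": {"Tom Hanks", "Hypothetical Actor X"},
    "Hypothetical Film B": {"Hypothetical Actor X", "Kevin Bacon"},
}
without_apollo = {title: cast for title, cast in films.items() if title != "Apollo 13"}

print(degrees_of_separation(films, "Tom Hanks", "Kevin Bacon"))           # 1
print(degrees_of_separation(without_apollo, "Tom Hanks", "Kevin Bacon"))  # 2
```

If the placeholder films were removed as well, the function would return None: the two actors would no longer be connected at all, which is the scenario that matters for sparsely linked scientific records.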

The problem with linked data disappearing is that the relationship between Aquaporin4 and Eric Nestler is less well established than the relationship between Tom Hanks and Kevin Bacon. Actually, a database of supplementary materials contains this connection (see Drug Dependent Gene database). Indeed, if the data are deposited inside of a database but are not central nodes of discourse they may disappear without a sound.  However, their inherent value may not be in their connectivity; it may instead be that they are valuable in a direction that few have pursued as a line of investigation, such as a promising lead for a therapeutic agent in a particular disease, or the piece of negative data that will spare another researcher a year of fruitless endeavor.

Linking Open Data (LOD) project map

The six degrees of online data sources

The stance of the Neuroscience Information Framework (NIF), as a member of the semantic web community, is that data should be preserved because it may be useful at a later time.  The larger question is: who will pay to preserve the data?  What is the role of libraries in an age where books are no longer made of paper, but are stores of knowledge with ‘a front end’ and ‘a back end’?  Will we have thousands of databases taking up room in library basements somewhere, where they can be accessed like so many other ‘collections,’ or will projects such as NIF be the keepers of these data because they can integrate the searching of the data across data structures?  Who will champion data preservation in the digital age?

NIH Meeting on Informatics for Data and Resource Discovery in Addiction Research

Posted on July 2nd, 2010 in News & Events | No Comments »

The NIH Meeting on Informatics for Data and Resource Discovery in Addiction Research will be held July 8–9, 2010, in Conference Rooms C and D of the Neuroscience Center Building in Rockville, Maryland. For more information, please visit: http://www.seiservices.com/nida/1014080/

Addiction research is amassing increasing amounts of complex data and creating greater numbers and types of research resources, ranging from software tools and chemical reagents to animal models, images, biomarkers, biological and behavioral assays, biomaterial repositories, specialized data sources, and web portals; however, most remain hidden in unstructured or semi-structured sources such as journal articles or web pages centered on particular laboratories, institutions or grants. Concurrently, the broader biomedical research community is developing additional tools and data which can also inform and advance addiction research. With over 1500 different databases alone that are useful to neuroscientists, how do addiction researchers find, query, compare, relate, and employ appropriate data and resources efficiently and effectively? Equally important, how do they collect, report and share their own data and resources to make them interoperable and discoverable beyond a single research paper or web posting? To foster knowledge growth in this complex environment, informaticians are turning to resource registries, data federation, semantic tools and other approaches to enable data and resource discovery and analyses, as well as hypothesis generation and testing.

Who Should Attend: Developers, generators, providers, and users of biomedical and biobehavioral research data and resources at the post-doctoral level and beyond. Investigators who represent a new cadre of addiction researchers enthusiastic about interconnecting data, knowledge and resources to advance substance abuse research are especially encouraged to attend.

This instructional workshop will address the following:

  • Introduction to tools and methods for discovering data and resources available to addiction researchers
  • Barriers to data and resource discovery and use
  • Application of best practices in structuring, identifying, presenting and reporting data and resources to enhance their discovery and interoperability
  • Enabling concept-based queries through community-adopted vocabularies
  • Case studies and lessons to be learned from major efforts in other areas of neuroscience research
  • Exciting new approaches for tying together statements made in scientific publications or on the Web to scientific evidence, biological terminologies, and knowledge bases, and to claims and counterclaims made by other researchers
  • Roundtable discussion of discoverability and potential interoperability of data and resources described in “speed-talks” by attendees

CCDB adds beautiful data set

Posted on June 25th, 2010 in News & Events | No Comments »

The CCDB just released three beautiful large-scale brain mosaics detailing the distribution of plasma membrane calcium-dependent ATPases in rat brain, published in the study by Kenyon et al. (Journal of Comparative Neurology, in press; electronic version available at DOI: 10.1002/cne.22439).  For more information or to view the data set, please visit http://ccdb.ucsd.edu/index.shtm.  You may also wish to try out our Web Image Browser (WIB), a new tool for browsing and annotating large microscopic imaging data sets through the web. The WIB allows users to turn different channels on and off and adjust contrast through a web browser.

NIF Webinar – June 15, 2010 / Topic: URLs and URIs

Posted on June 8th, 2010 in News & Events, Webinar Announcement | No Comments »

The Neuroscience Information Framework (NIF) hosts a Webinar series on topics focused on collaborating with NIF, getting involved in building the NIF vocabulary, and using NIF portal resources, as well as other appropriate NIF topics.

Our next NIF Webinar is scheduled for June 15th, 2010. Please join Dr. Jeffrey Grethe for an informational session on URLs and URIs and their purpose in NIF.  Below is information on how to join the online meeting and accompanying teleconference.

Date and Time: Tuesday, June 15, 2010 • 11:00-12:00 PST
Topic: URLs and URIs: Unique Identifiers, Why Are They Useful?
Presenters: Dr. Jeffrey Grethe
URL: http://connect.neuinfo.org/webinar
Dial-In (toll-free): 866-740-1260
Access Code: 8220739

There will be a discussion period involving the Shared Names Project and the NCBO BioPortal.


Live Session with NIF – June 4, 2010 / Topic: Introduction to NIF

Posted on June 2nd, 2010 in News & Events | 1 Comment »

An informative session and demonstration on the use of NIF will be held this Friday, June 4th, 2010. Please join us to learn strategies for finding software tools, reagents, knockout (KO) mice, and data!

Date and Time: Friday, June 4, 2010

11:30 AM – 1:00 PM Demonstration & Pizza
1:00 PM – 2:30 PM Open Discussion With NIF Team Members

Topic: Introduction to NIF
Location: UCSD Leichtag Biomedical Building, Room 107

This session is open to students, researchers, and educators!  Free pizza will be available for attendees.

View the NIF Flyer

We hope to see you there!

The Bioimage Informatics 2010

Posted on June 1st, 2010 in News & Events | No Comments »

Bioimage Informatics 2010 will be held from September 17 to 19 at the new Gates and Hillman Centers of the School of Computer Science at Carnegie Mellon University.

It will bring together researchers working at the intersection of bioimaging, computer science, engineering and biological sciences. Researchers will discuss applications of image analysis, computer vision, data mining, machine learning, visualization, and informatics methods to mechanistic analyses of biological systems. Discussion will also cover emerging technologies in biology that are leading to new and complex image data sets that require image informatics.

A call for abstracts (oral and poster presentations) is currently open; the deadline is July 1, 2010.

Abstracts for presentations, posters or software/hardware demonstrations related to all aspects of bioimage data mining and informatics are welcome. Appropriate topics include but are not limited to:

• Bioimage feature measurement, description, extraction, and selection
• Object segmentation and tracking in bioimages
• Object/pattern recognition and understanding in bioimages
• Bioimage ontology and related data mining
• Bioimage data visualization
• Learning models from bioimages
• Other bioimaging related techniques, including transmission, compression, registration, storage, database, etc.
• Tools/software for bioimage data processing and data mining
• Bioimage related biology, bioinformatics, and biomedicine applications, e.g. 3D protein structure reconstruction, gene regulatory network/pathway modeling, etc.
• Joint analysis using both bioimages and other data (e.g. sequences, microarray, protein interaction, etc.)

Abstract Submission: Interested participants should electronically submit an abstract via email no later than July 1, 2010 following the instructions at http://lane.compbio.cmu.edu/bii2010. The abstracts will be reviewed and the authors will be notified of abstracts selected for oral or poster presentation by July 30.

Registration Information: All participants, including invited speakers, will be expected to cover their own expenses and registration fees. A range of housing options will be available. The registration fee of $90 will include an opening mixer, coffee and tea during session breaks, lunches, and a closing cookout on Sunday evening. Optional social events, including one highlighting the history of robotics in Pittsburgh (home of the Robot Hall of Fame), are planned.

Important Dates

June 01, 2010: Opening of registration site.
July 01, 2010: Deadline for abstract submission.
July 30, 2010: Notification of acceptance.
August 17, 2010: End of early registration.
September 17, 2010: Meeting opens.

For more information, please visit http://lane.compbio.cmu.edu/bii2010.

German INCF Node Symposium

Posted on May 28th, 2010 in News & Events | 4 Comments »

The German INCF Node in Munich will be covering “Neuroinformatics: Linking Brain Research from Physiology to Models” at its next symposium, to be held on June 17.

The symposium will feature a number of exciting talks by renowned scientists, exploring how neuroinformatics can contribute to progress in neuroscience. We look forward to bringing together an interested audience from the INCF Nodes and the National Bernstein Network Computational Neuroscience to discuss issues of analysis, data sharing, and interactions between experimental and computational approaches in neurophysiology. In addition to the talks and discussion, there will be a poster session. All attendees are welcome to present a poster.

To register, please send a brief email to symposium2010@g-node.org. Please indicate if you would like to present a poster. Detailed information will be available at http://www.g-node.org/symposium2010.

——————————————
G-Node Inaugural Symposium

“Neuroinformatics: Linking Brain Research from Physiology to Models”

June 17, 2010, 9:00-13:00
Ludwig-Maximilians-Universität München
Biozentrum, Martinsried

Speakers:

Piotr Durka, University of Warsaw, Poland
Gaute Einevoll, Norwegian University of Life Sciences, Norway
Sten Grillner, Karolinska Institute, Sweden
Colin Ingram, University of Newcastle, UK
Mayank Mehta, UCLA, USA

For further information please visit
http://www.g-node.org/symposium2010

NIF Webinar – May 11, 2010 / Topic: Features of the New NIF 2.5

Posted on May 4th, 2010 in News & Events, Webinar Announcement | 2 Comments »

The Neuroscience Information Framework (NIF) hosts a Webinar series on topics focused on collaborating with NIF, getting involved in building the NIF vocabulary, and using NIF portal resources, as well as other appropriate NIF topics.

Our next NIF Webinar is scheduled for May 11th, 2010. Please join Dr. Anita Bandrowski, NIF Project Curator, for an informational session on the new features offered in NIF 2.5. If you missed the last Webinar on NIF 2.5, this is your chance! Below is information on how to join the online meeting and accompanying teleconference.

Date and Time: Tuesday, May 11, 2010 • 11:00-12:00 PST
Topic: Features of the New NIF 2.5
Presenters: Dr. Anita Bandrowski
URL: http://connect.neuinfo.org/webinar
Dial-In (toll-free): 866-740-1260
Access Code: 8220739

NIF 2.5 includes new services, incorporates new features into the search interface, and resolves known issues and bugs. Through this informational session, find out more about NIF cards, NIFSTD entity recognition, new literature, and improvements to search.  Learn more about services such as the Ontology Access Service, the SOLR Service, and the various API interfaces. New features to look forward to include concept-based search through autoexpansion, MyNIF capabilities to save searches, search results highlighted according to a semantic category in the NIF ontologies, and many more. Known issues regarding the scroll bar, browser compatibility, and search box behavior have been resolved, and we have even improved the performance and stability of NIF Literature and PubMed Results.

We look forward to seeing you there!

The Meaning of “Is”

Posted on April 16th, 2010 in Curation, Essays, General information | 1 Comment »

That’s an easy one, with all due respect to our former president.  As far as the NIF is concerned, “IS” is the inferior salivatory nucleus.  How do we know?

Perform a search in NIF and you will see various terms highlighted in the search results (the current highlighting color is brick red, but we are open to suggestions).   Hover over each of these highlighted terms and NIF will tell you what the term means to the NIF system.  If you hover over “IS,” NIF tells you it’s an anatomical structure. If you right-click on it and ask to see “IS” in the Neurolex, it will tell you that IS is an abbreviation for the inferior salivatory nucleus.  This new feature is an example of what is often called “entity recognition.”

In the formal world of knowledge representation, an entity is that which is perceived, known, or inferred to have its own distinct existence.  For NIF, entities are those things like organisms, cells, molecules, and techniques that define our domain.  These entities are represented in the NIF ontologies.  Each entity has its own identifier, sort of like a social security number, that uniquely identifies the entity.  This identifier is used to point different ways of saying the same thing to the same entity.  For example, NIF doesn’t care whether you call entity birnlex_2645 the IS, the inferior salivary nucleus, or Freddy, for that matter.  They are all (and always) the same thing.
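
A minimal Python sketch of this idea follows: several labels all resolve to one identifier.  The identifier birnlex_2645 and the labels come from this post; the dictionary layout, the function names, and the whitespace and case normalization are assumptions made for illustration and are not a description of how the NIF ontologies are actually stored or queried.

```python
def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


# One entity, many ways of saying it (labels taken from this post).
SYNONYMS = {
    "birnlex_2645": [
        "IS",
        "inferior salivatory nucleus",
        "inferior salivary nucleus",
        "Freddy",
    ],
}

# Invert the table so that any known label leads back to its identifier.
LABEL_TO_ID = {
    normalize(label): entity_id
    for entity_id, labels in SYNONYMS.items()
    for label in labels
}


def resolve(label: str):
    """Return the entity identifier for a label, or None if it is unknown."""
    return LABEL_TO_ID.get(normalize(label))


print(resolve("Freddy"))
print(resolve("Inferior  Salivatory Nucleus"))
print(resolve("hippocampus"))
```

Because everything downstream refers to the identifier rather than to any one label, adding a new synonym only means adding one more entry to the table.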

Unfortunately, the richness and complexity of our language make recognizing entities a tricky thing, as everyone who uses a search engine knows.  Not only can we call the same entity many things, but we can call many entities the same thing.  Chances are that the IS highlighted by NIF in the search results actually is not the inferior salivatory nucleus but the third person form of the verb “to be,” or perhaps it is the initial segment of an axon or the Institute for Science.  Right now, NIF doesn’t really know.
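
The reverse problem, one label standing for many things, can be sketched the same way.  The four senses of “IS” below are the ones named in the paragraph above; the category names and the functions are hypothetical and chosen only for illustration.  The sketch makes no attempt to choose among the senses, which is the hard part.

```python
# Candidate senses for a token, as (category, meaning) pairs.
# The category names are illustrative assumptions, not NIF categories.
SENSES = {
    "is": [
        ("anatomical structure", "inferior salivatory nucleus"),
        ("verb", "third person form of 'to be'"),
        ("cell part", "initial segment of an axon"),
        ("organization", "Institute for Science"),
    ],
}


def candidate_senses(token: str):
    """Return every known sense of a token, ignoring case."""
    return SENSES.get(token.lower(), [])


def is_ambiguous(token: str) -> bool:
    """A token is ambiguous when it has more than one candidate sense."""
    return len(candidate_senses(token)) > 1


for category, meaning in candidate_senses("IS"):
    print(f"{category}: {meaning}")
print(is_ambiguous("IS"))
```

A real entity recognizer would need context, such as the surrounding words or the type of source, to pick one sense; a lookup table like this can only report that a choice exists.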

In future releases of NIF, we will be working towards improving the accuracy of our entity recognition.  Why?  Because once we know that IS is a brain nucleus, we can find anything that is known about it:  its projections, its genes, the diseases in which it is affected.  A preview of what is coming can be seen in the NIF Cards.

IS Search

Search for IS with NIF Card

NIF cards for each entity can be viewed by right-clicking on the highlighted term and selecting “Show NIF card” from the menu. NIF cards are currently implemented only for anatomical structures and cells.

For now, however, we hope you will explore the new NIF and develop an appreciation for the difficulties of semantic search by seeing what NIF thinks the results mean.  You may be surprised!

NIF 2.5 Now Released

Posted on April 15th, 2010 in General information, News & Events | No Comments »

NIF Version 2.5 is now released.  Please visit the list of top features.

Please provide your valuable feedback by sending an email to support@neuinfo.org.  Thank you.