NIF Webinar – October 27, 2009 / Topic: Brede Tools

Posted on October 23rd, 2009 in News & Events, Webinar Announcement | No Comments »

Greetings! The next NIF Webinar is scheduled for October 27, 2009. Please join Dr. Finn Aarup Nielsen for an informative session about the Brede tools. Details below.

Date: October 27, 2009 / 11:00 – 12:00 noon PDT
Topic: Brede Tools
Presenter: Dr. Finn Aarup Nielsen
URL: http://connect.neuinfo.org/webinar
Dial-In (toll free): 866-740-1260
Access Code: 8220739

Description: This Webinar provides an introduction to the Brede tools. The tools consist of the Brede Toolbox, Brede Database, and Brede Wiki. The database and the wiki are collections of neuroimaging studies and their brain activations as well as taxonomies/ontologies. The toolbox is able to process and perform a range of meta-analyses with the data and present the results on the Web. This Webinar will demonstrate the database and the search facility as it appears on the Web.

Check out Nielsen’s journal article on Visualizing data mining results with the Brede tools!

We look forward to seeing you there.

NIF Social Gathering!

Posted on October 16th, 2009 in News & Events | No Comments »

Join us on Monday, October 19, 2009 from 6:30-9:00 pm for Neuroscience 2.0 – Networking data, tools, and people.

This exciting social gathering will be held at the Chicago Hilton in the Lake Erie Room (8th floor).

Refreshments will be served!

This social is co-sponsored by the International Neuroinformatics Coordinating Facility (INCF), Neuroscience Information Framework (NIF), and Whole Brain Catalog™.

Please join us! :)

NIF at Neuroscience 2009

Posted on October 16th, 2009 in News & Events | No Comments »

The Neuroscience Information Framework (NIF) is participating in Neuroscience 2009, the 39th annual meeting of the Society for Neuroscience, October 17–21, 2009, at McCormick Place in Chicago.

Be sure to visit NIF in booth #2103, where we will be demonstrating the newly-released NIF 2.0.

NIF is also co-hosting Neuroscience 2.0 – Networking data, tools, and people, a social gathering on Monday evening, October 19th, at 6:30 pm at the Chicago Hilton in the Lake Erie Room. Everyone is welcome.

We look forward to seeing you at Neuroscience 2009!

Neuroscience Satellite Event

Posted on October 15th, 2009 in News & Events | No Comments »

Today, NIF’s satellite event will include a workshop focused on two main topics.

  1. Introduction to data/resource sharing and discovery: This session will focus on resources available for helping you share data, data centric tools for collaborating with other researchers, and resources within the Neuroscience Information Framework for discovering and making resources known.
  2. Introduction to semantic frameworks (e.g. ontologies) being developed for the neuroscience community: This session will focus on (a) ontology/terminology resources such as NeuroLex, which provides a comprehensive collection of common neuroscience domain terminologies, and (b) spatial frameworks such as digital atlases.

Participants will also be introduced to community building tools such as Wikis. Presenters will introduce the wiki environment (i.e. MediaWiki) used by NeuroLex and Wikipedia.

* Participants are advised to bring a laptop in order to follow the workshop session.

It will be held at the Chicago Hilton Lake Erie Room (8th floor) from 1:00 – 6:00 p.m. See you there!

New “Related Abstract Search” Tool

Posted on October 15th, 2009 in News & Events | No Comments »

During the Neuroscience 2009 conference, the Lab of Neuroinformatics at RIKEN BSI will demonstrate the enhancements of their Web-accessible tool, “Related Abstract Search.”  For more information, please visit the NIF community news page.

Brede database Webinar – October 27, 2009

Posted on October 12th, 2009 in News & Events, Webinar Announcement | No Comments »

NIF will be attending this year’s Neuroscience 2009 conference in Chicago, IL, from October 17 – 21.  Because of that, there will be a slight delay in the NIF Webinar schedule.

The next NIF Webinar will be about the Brede database.

Date:  October 27, 2009

Time: 11:00 am – 12:00 noon PDT

Topic:  Brede database

For more information about our Webinars, please visit our Webinar calendar at http://www.neuinfo.org/index.shtm#webinars.

We look forward to seeing you there.

Professional vs. self-curation

Posted on October 12th, 2009 in Curation, Essays | No Comments »

Benefits and pitfalls of integration of two very different data types, by Dr. Anita Bandrowski, NIF Curator

Overview of NIF registration processes and the role of DISCO:
The Neuroscience Information Framework (NIF) project has a dynamic inventory of more than 2300 neuroscience-relevant resources. What makes that inventory dynamic is that NIF encourages resource providers to register their resources in our catalog of “all things neuroscience.” This process is not terribly involved for resource providers: they need only fill out basic information about their resource, such as the URL, name, description and keywords. In the near future, resource providers will also be able to take away a “DISCO” file, short for resource discovery. This file is maintained on the resource provider’s Web site, and the provider keeps the information in it current at the source. When a change is made, NIF is alerted through an automated agent that crawls the site periodically. Resource providers therefore do not need to send updated information to NIF or to any other system that indexes it; the updates are performed by the system. In this way, the NIF catalog is kept up-to-date without anyone having to visit each of the 2000+ sites currently listed.
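The periodic crawl described above boils down to noticing that a provider-hosted file has changed since the last visit. The following is a minimal sketch of that idea in Python, not NIF's actual crawler: it assumes the crawler has already fetched each file's bytes, and the function names and the example URL are hypothetical.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Return a stable fingerprint of a fetched description file."""
    return hashlib.sha256(content).hexdigest()


def detect_updates(seen: Dict[str, str], fetched: Dict[str, bytes]) -> List[str]:
    """Return the URLs whose content is new or differs from the last crawl.

    `seen` maps URL -> digest recorded on the previous crawl (hypothetical
    bookkeeping); it is updated in place so the next crawl compares against
    the latest version.
    """
    changed = []
    for url, content in fetched.items():
        new_digest = digest(content)
        if seen.get(url) != new_digest:
            changed.append(url)
            seen[url] = new_digest
    return changed
```

Comparing fingerprints rather than timestamps means the check does not depend on the provider's server reporting modification dates correctly.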

The process of provider registration is a good idea, and we are not the only ones to think of it.  Other projects in biomedical science essentially seek to accomplish the same goal.   Of these, the Biositemaps project, supported by the National Centers for Biomedical Computation, has advanced considerably towards implementing a similar technology.   NIF believes that if providers register their resources using one of these tools, then they should not have to do it again using a slightly different tool. Rather, the data generated by all tools should be accessible to all systems.  We have just completed an exercise in harvesting Biositemaps files into NIF and provide here our experience with and perspectives on the exercise.

Rationale for integration:

Tools such as Biositemaps and DISCO allow the people who know the most about their resources, i.e., those who created them, to describe those resources so that search engines can easily find them.  This “self description” is a great idea in theory, but in practice it may not work as intended.  The NIF project, a framework for resource description and discovery, has recently developed tools to harvest the descriptions from Biositemaps.  We believe that biomedical resources should be described in a consistent manner and made discoverable so that projects similar to NIF can present them to our user community.  During this exercise, we have come across several problems that were echoed by other projects attempting to do similar things.

At the outset, the Biositemaps initiative was created as a Google sitemaps-like database intended to point search engines to appropriate information about biological software and data sets. Biositemaps has a great deal of appropriate data about biologically relevant software tools, so NIF was highly interested in importing it. The data was especially enticing because resource providers can update it dynamically: if a particular software tool has a new version, the search systems are notified automatically of the update.

Metadata structure compatibility and vocabularies:
NIF has made a conscious decision to have a very simple metadata structure to alleviate problems, including the inappropriate use of metadata fields and the time intensiveness of both the curation effort and the training of curators. The original NIF developed a fairly comprehensive structure (still available at http://neurogateway.org; see also Gardner et al., 2008) that was populated by the resource providers themselves. These resource providers were mostly scientists who were building tools or databases. Many scientists are not metadata experts, and this led to a very inconsistent labeling of resources at the outset of the NIF project. The inconsistencies in annotation made searching for resources a very difficult task; furthermore, the complicated structure was not intuitive to the end user. The simple structure adopted by NIF alleviated the curation and search problems and also turned out to be quite useful for integrating lots of different metadata structures, including Biositemaps. The mapping of fields from Biositemaps to the NIF was very simple, taking only a few days to reconcile.

The most significant effort in achieving integration was mapping the resource types, e.g., database, software tools. Biositemaps populates the resource type from the Biomedical Resource Ontology (BRO: http://bioportal.bioontology.org/ontologies/39002), while NIF uses the NIFSTD resource ontology. These two efforts were developed independently but are now converging through a concerted effort of both groups. However, during this process, they continue to have some differences. For example, some classes exist in one ontology and not the other, e.g., core facility, which is explicitly labeled in the BRO but not in the NIF. Thus, if resource providers mark their resources as a core facility, the NIF can’t automatically ingest this information, and a human curator must intervene. Therefore, we have continued to align the BRO and NIFSTD as much as is humanly possible to alleviate the need for human intervention.
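The mapping problem can be pictured as a lookup table with an explicit fallback to a human. The sketch below is illustrative only: the labels in the table are invented stand-ins, not actual BRO or NIFSTD class names, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical label pairs for illustration; the real ontologies use their
# own class names and identifiers.
BRO_TO_NIF = {
    "Database": "database",
    "Software": "software tool",
}


def map_resource_type(bro_label: str) -> Tuple[Optional[str], bool]:
    """Map a source resource type to a target type.

    Returns (target_label, needs_curator). When no mapping exists, the
    target is None and the record is flagged for human review instead of
    being ingested automatically.
    """
    target = BRO_TO_NIF.get(bro_label)
    return target, target is None
```

Every class added to the table as the two ontologies are aligned is one fewer case that lands in a curator's queue.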

Data structure compatibility and scope:

Although harmonizing the metadata structure has taken some effort, it is a tamable exercise. The data within Biositemaps supplied by resource providers, however, is extraordinarily heterogeneous in quality. In about 200 out of 400 Biositemaps, the data are well formed, but the remaining records contain only partial information, including missing resource names or URLs, making it difficult to take all of the data in Biositemaps and import it into NIF in an automated fashion. The NIF registry database (like all databases) expects to see certain minimal data, including a name and a URL. When these items are not present, the database does not accept the record. Additional heterogeneity comes from the amount of descriptive text. NIF registry records prepared by curators have text of 3-6 paragraphs in most cases, but most Biositemaps resources describe themselves in a single sentence. NIF uses longer descriptions because we found out early in the project that longer descriptive text includes many words that NIF users search for but that may not be listed as keywords, making search through the NIF registry more effective. With minimal descriptions, it is unlikely that the NIF search interface would retrieve Biositemaps resources in a sea of NIF-curated resources. Finally, combining records that are already present in NIF with Biositemaps data presented some challenges to our system. Because we don’t yet have a universal way of assigning URIs to resources, resources tend to be cross-listed in many catalogs. For this reason, NIF is supporting the Common Naming Project (http://neurocommons.org/page/Common_Naming_Project). As NIF had already curated the listed resources, in many cases more thoroughly than the resource providers themselves, the process of reconciling and merging information was not straightforward.
To address the problems noted above, NIF has updated the registry data structure to accommodate two coexisting versions of each record: one is a storage bin for automated data and the other is the human-curated version. Any record that is publicly available in the NIF will be curated by a human, yet with automatic registration, the human curator will be prompted to review the site whenever an update occurs.
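As a rough illustration of the two ideas above (rejecting records that lack a name or URL, and keeping the harvested and curated versions side by side), here is a small Python sketch. The class, field, and function names are assumptions made for the example and do not reflect the actual NIF registry schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

REQUIRED_FIELDS = ("name", "url")  # the minimal data named in the text


def missing_fields(record: Dict[str, str]) -> List[str]:
    """List the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


@dataclass
class RegistryEntry:
    """One resource with two coexisting versions (hypothetical layout)."""
    harvested: Dict[str, str] = field(default_factory=dict)  # automated data
    curated: Optional[Dict[str, str]] = None                 # human-curated
    needs_review: bool = False

    def apply_harvest(self, record: Dict[str, str]) -> bool:
        """Store a harvested record; return False if it is rejected."""
        if missing_fields(record):
            return False
        if record != self.harvested:
            self.harvested = dict(record)
            self.needs_review = True  # prompt the curator on every update
        return True

    def public_view(self) -> Optional[Dict[str, str]]:
        """Only the curated version is ever shown publicly."""
        return self.curated
```

Keeping the harvested data in its own slot means an automated update can never overwrite a curator's longer description; it can only raise a flag.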

Resource characterization is a tricky problem, and it is difficult to know for a particular audience the correct way to represent a resource. For example, the Biositemaps entry for the I2B2 project (https://www.i2b2.org/; an NIH-funded National Center for Biomedical Computing containing a large amount of software resources) created individual Biositemaps for each plug-in to their software tools. This is an issue of scope. Because NIF’s curators as a policy do not divide resources to this extent, we consider most plug-ins to be a part of the software resource, not an individual resource (there are some exceptions, such as MATLAB libraries). For a project such as NIF, resources need to be well defined because trying to catalog every resource useful to neuroscientists can be a daunting task if a resource is too narrow, such as a plug-in. If we consider a resource to be appropriate for NITRC, a software library with hundreds of software applications, then it will take a curator some time to annotate this. However, if we consider each plug-in to each program a resource, the task becomes too large and is not likely to help users. On the other hand, if a user is looking for a very specific plug-in, then having access to each individually is likely to be useful.

To solve this scope problem, we have created a uniqueness criterion for the URL, meaning that if the URL is not unique among several Biositemaps “resources,” then the resource descriptions will be folded into one.  The solution is not perfect because unrelated resources could potentially have the same URL, but this strategy solved more problems than it created.
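The uniqueness criterion amounts to grouping harvested entries by URL and merging each group. A minimal sketch follows; the dictionary layout and the trailing-slash normalization are assumptions for the example, not a description of NIF's implementation.

```python
from typing import Dict, List


def normalize_url(url: str) -> str:
    """Treat URLs that differ only by surrounding space or a trailing
    slash as the same (an assumption made for this sketch)."""
    return url.strip().rstrip("/")


def fold_by_url(entries: List[Dict[str, str]]) -> List[dict]:
    """Fold entries that share a URL into one combined description."""
    folded: Dict[str, dict] = {}
    for entry in entries:
        key = normalize_url(entry["url"])
        if key not in folded:
            folded[key] = {"url": key, "names": [], "descriptions": []}
        folded[key]["names"].append(entry["name"])
        folded[key]["descriptions"].append(entry["description"])
    return list(folded.values())
```

As the text notes, the rule is imperfect: two unrelated resources hosted at the same URL would be merged by this function as well.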

Summary:
“Self registration” tools such as Biositemaps can be used to help human curators annotate a resource, including alerting curators that a resource has been created. However, while these tools can certainly help, we believe that these self-reporting tools do not replace trained human curators.

NIF Webinar – August 18, 2009 – 11:00-12:00 noon PDT

Posted on August 13th, 2009 in News & Events, Webinar Announcement | No Comments »

The next NIF Webinar will be held on August 18, 2009.  Please join Dr. Anita Bandrowski for our next informative session. Details follow.

Date: August 18, 2009 / 11:00 – 12:00 noon PDT
Topic: NIF Resources and Automated Resource Discovery Tools
Presenter: Dr. Anita Bandrowski
URL: http://connect.neuinfo.org/webinar
Dial-In (toll free): 866-740-1260
Access Code: 8220739
Description: The Neuroscience Information Framework (NIF) has been trying to include in its registry the sites of those resource providers that register with either DISCO (Yale) or Biositemaps (NIH). The idea is a nice one. If a Web site wishes to be easily discovered by Web crawlers (such as Google or NIF) and, consequently, by the people who search those results, the webmaster can obtain a little bit of HTML code from either Biositemaps or DISCO and place it on the Web site. This code tells crawlers what resources are present on the site. This is a great, simple idea: put the people who know their resource in charge of describing it. The devil, however, is always in the details. The interpretation of “what is a resource” can wreak havoc on this sort of automated resource discovery. Join us for this evolving discussion.

Four Things You Can Do to Make Your Database More Interoperable

Posted on June 25th, 2009 in Interoperability, News & Events | 2 Comments »

As part of the Neuroscience Information Framework (NIF), we provide access to data contained in databases and structured web resources (e.g. queryable web services), sometimes referred to as the deep or hidden web, that are independently maintained by resource providers around the globe. We believe that this federated model is the most practical way to provide our users with access to the latest data without NIF having to maintain a centralized resource. A federation model assumes that we can access each database or service and allow users to discover these resources through the NIF. It also lets us merge data from different databases, essentially mixing and matching results in a way that is useful to our users.

The NIF has been registering databases and structured web resources for just over a year, moving discussions of database interoperability from the theoretical to the practical realm. As new databases are created every day, we thought it would be useful to provide our perspective on this issue so that decisions can be made at the outset that would improve the likelihood that the database can interoperate with others later on.  We are not going to discuss the relative merits of database platforms, e.g., relational, XML, object-oriented.   Nor will we consider here whether RDF is the answer to all interoperability problems (but stay tuned).  Rather, here we will focus on our experience with integration of existing databases, most of which are relational.

What is interoperability?  We define it simply as: “the ability of a system or component to function effectively with other systems or components” (http://www.yourdictionary.com/interoperability).  Why would you as a resource provider want to become interoperable?  Here are 3 good reasons:

1)      To be found:  NIF is just one of many portals on the web. We specialize in scientific data, so we have tools that allow scientists to search for all genes expressed in a particular brain region, for example, but our problems are the same ones dealt with by all search portals:  where is the information that I want?  Usually, it is scattered across web pages, PDF files and databases, many of which cannot be searched effectively by search engines.  Academics and NIH are excellent at providing wonderful data, data models, and manuscripts describing them, but we at NIF have discovered that academics who create databases are usually not so good at marketing.  If you have just created a database, do you want others to find it and use it?  If you follow a few simple rules for your database or other type of data resource, your data will have a higher impact on the community.

2)      To be useful:  No matter how comprehensive a database you create, you will be capturing only a tiny fraction of information on biological systems.  That’s why we have so many databases out there.  NIF has identified over 1500 independent databases that are potentially useful to neuroscience and we find more every day.  A single individual would spend their life locating these resources and querying them;  NIF lets you query them all simultaneously and combine and compare information across them (or at least, we will let you do that in the future).

3)      To be helpful:  I hear scientists complain all the time that they can’t use microscope parts from one manufacturer on an instrument made by another manufacturer.  I myself complain when I can’t use someone else’s cell phone charger for my phone.  Yet, we as scientists are unwilling to work a little extra to make our data and databases interoperable. We understand that databases are developed for a specific purpose by a specific group to serve their needs.  We also understand that the financial and technological resources for creating and maintaining these resources widely differ.  But we also know that there are some practices which can make it more or less difficult for a resource like NIF to make the contents of a web accessible database available and usable.

The issue of database interoperability is a complex one, and some excellent frameworks and discussions are available (http://www.sei.cmu.edu/isis/guide/introduction/lcim.htm).  For the purposes of this discussion, we will simplify the issue and address interoperability at two levels: technical and data.

At the technical level, we have encountered several roadblocks to making data resources available through the NIF.  At the most basic level are issues of access, e.g., institutional firewalls, stability of access privileges and access methods.  At another level, we have encountered difficulties in efficiently accessing databases when the identifiers for individual records change upon update. We have noted that some databases and vocabularies use identifiers that get regenerated every time the resource is updated. This practice makes it very difficult for NIF to maintain appropriate indices and links. We recommend that identifiers be stable; if they are to be removed, they should be made obsolete rather than deleted.  Related to this is the use of sessions to retrieve data pages instead of stable URIs. Under this practice the application allows a user to access data only in a linear manner, i.e. the main page showing the cerebellum must be accessed before any of its subcomponents. Each session generates a temporary pointer or ‘session identifier’, which makes it difficult for a system such as NIF to make use of many of the specific data elements inside resources that use sessions extensively. To harness the power of all the data available in these resources, they need to be made available outside of their web interface.
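To make the identifier recommendation concrete, here is a small Python sketch of a store that never reuses or deletes an identifier and marks retired records as obsolete instead. The class name, the `REC:` identifier format, and the field names are hypothetical choices for the example.

```python
from typing import Dict, Optional


class RecordStore:
    """Toy record store with stable identifiers (illustrative only)."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}
        self._next_id = 1

    def add(self, data: str) -> str:
        """Assign a fresh identifier; identifiers are never regenerated."""
        record_id = "REC:%06d" % self._next_id
        self._next_id += 1
        self._records[record_id] = {
            "data": data,
            "obsolete": False,
            "replaced_by": None,
        }
        return record_id

    def retire(self, record_id: str, replaced_by: Optional[str] = None) -> None:
        """Mark a record obsolete instead of deleting it."""
        record = self._records[record_id]
        record["obsolete"] = True
        record["replaced_by"] = replaced_by

    def resolve(self, record_id: str) -> dict:
        """Old links still resolve, even for obsolete records."""
        return self._records[record_id]
```

An external index that stored an identifier before the record was retired can still resolve it and follow `replaced_by` to the current record, rather than hitting a broken link.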

The above issues touch upon the ability of NIF to issue queries against a remote database, generate a search index and return results.  Equally important is the ability for NIF to search the database effectively and provide comprehensible and useful results to NIF users.  In our experience, the lack of a standard terminology is one of the major impediments to effective search across databases. In our very first database federation exercise, we registered 3 databases that had data on neurons:  NeuronDB, Neuromorpho and CCDB.  If we look at the list of neuron names, we see that there were 3 variants on the same cell class:  Cerebellar Purkinje cell, Purkinje neuron, and Purkinje cell.  If a user typed “Purkinje”, results might not be specific to Purkinje neurons and might contain information related to Purkinje fibers.  However, if a user wanted specific information about “Purkinje neuron”, they would not retrieve records from all 3 databases.  Fortunately, we now have NeuroLex, a lexicon for neuroscience that maps all 3 of these terms as synonyms of each other and assigns a unique ID to the class.  When a query is issued from NIF, NIF attempts to autocomplete all terms from the NeuroLex vocabulary.  If we have the term, we automatically search for synonyms.  If a source uses any of the terms mapped to the ID by NeuroLex, the result will be returned.  If a source uses a custom abbreviation (PC) or a symbolic notation (Purkinje cell = 1), then special mapping of the source database will have to occur using our concept mapping tool.  Note that we are not considering here whether the meaning of Purkinje cell is the same across all of these sources.  Meaning is a more difficult issue and one which will be addressed in a future blog post.  But for now, just having a standard, non-symbolic term makes integration of databases a lot easier.
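The synonym expansion described above can be sketched in a few lines of Python. This is an illustration of the general technique, not the NIF query engine: the class ID `EX:0001`, the record layout, and the source names are made up, and only the three term variants come from the text.

```python
from typing import Dict, List

# Hypothetical lexicon: one made-up class ID mapped to its synonyms.
LEXICON: Dict[str, List[str]] = {
    "EX:0001": ["Purkinje cell", "Purkinje neuron", "Cerebellar Purkinje cell"],
}


def build_index(lexicon: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every lower-cased term to the ID of its class."""
    return {term.lower(): cid for cid, terms in lexicon.items() for term in terms}


def expand_query(term: str, lexicon: Dict[str, List[str]],
                 index: Dict[str, str]) -> List[str]:
    """Return all synonyms for a known term, or the term itself if unknown."""
    cid = index.get(term.strip().lower())
    if cid is None:
        return [term]
    return list(lexicon[cid])


def search(records: List[Dict[str, str]], terms: List[str]) -> List[Dict[str, str]]:
    """Return records whose cell type matches any of the terms exactly."""
    wanted = {t.lower() for t in terms}
    return [r for r in records if r["cell_type"].lower() in wanted]
```

A source that labels the same class with an abbreviation or a numeric code would not match any synonym here, which is why such sources need a separate, hand-built mapping.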

So here are our top 4 barriers to data federation in the NIF:

1)      Unstable identifiers:  Every time the database updates, the identifiers change and all pre-indexed links to those data records break;

2)      Access:  For increased utilization of the data, stable access needs to be provided either through a public connection to the database, a periodic dump of the database contents or through structured web services;

3)      Sessions: For general information, results and data should be accessible using a static (i.e. non-session-based or stateless) URL;

4)      Vocabulary:  Use a standard terminology and avoid symbolic notations where possible.


Welcome to the NIF!

Posted on March 30th, 2009 in General information, News & Events | No Comments »

We’re glad that you’re here. The Neuroscience Information Framework project is a community portal for neuroscience researchers (although everyone is welcome) that provides the means and access to find data, tools, materials and information to drive neuroscience discovery. NIF is supported by the NIH Blueprint consortium and is built for neuroscientists by neuroscientists, working through the neuroinformatics committee of the Society for Neuroscience and the International Neuroinformatics Coordinating Facility. NIF makes use of advances in search, knowledge engineering, text mining and human legwork to access the so-called “hidden web,” i.e., dynamic databases and other content that is not indexed by search engines. For years, people have been asking “Why can’t we have a Google for neuroscience?” Well, this is it! We maintain a custom web index built around neuroscience, a catalog of curated resources, access to many neuroscience databases, and a large literature corpus for neuroscience, all accessible simultaneously through a simple search interface. You may search these resources using the NIF vocabularies, an extensive vocabulary that covers many domains of neuroscience. Why is this important? We use these vocabularies to search for synonyms and related terms so that you can focus or expand your searches, e.g., not “Neuron” but 125 different types of neurons, classified by neurotransmitters and brain regions. As the NIF evolves and the vocabularies grow, we’ll use them more and more for searching and organizing results.

With the release of NIF 1.5, we have significantly upgraded NIF, both its contents and its look and feel, and we need your feedback! Is something missing from NIF? Let us know by recommending a resource. Are you building a database and would like to make use of the framework? Then visit our recommended best practices and use our vocabularies. Do you need help getting started with NIF? Then attend one of our on-line tutorials. Is it easy to use and useful to you? If not, help us make the NIF better by becoming a beta tester.

We want the NIF to be a community place for neuroscientists and those who want access to neuroscience data and tools. We’ll be making use of Wikis and other Web 2.0 tools so that you can not only come to the NIF to take information away but also contribute your expertise and leave some of your experience behind. We have already set up the NeuroLex Wiki to help us build the NIF vocabularies. The NIF Blog is a place where we will discuss neuroinformatics and information retrieval topics based on current trends and our experience building the NIF. If you have an opinion or experience you’d like to share, we’ll be happy to work with guest bloggers.