<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>NIF Blog &#187; Essays</title>
	<atom:link href="http://blog.neuinfo.org/index.php/category/essays/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.neuinfo.org</link>
	<description>Neuroscience Information Framework</description>
	<lastBuildDate>Fri, 16 Jul 2010 18:55:50 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>If a Tree Falls In The Digital Forest, Does It Make a Sound?</title>
		<link>http://blog.neuinfo.org/index.php/general-information/if-a-tree-falls</link>
		<comments>http://blog.neuinfo.org/index.php/general-information/if-a-tree-falls#comments</comments>
		<pubDate>Fri, 16 Jul 2010 18:36:56 +0000</pubDate>
		<dc:creator>NIF Blogger</dc:creator>
				<category><![CDATA[Curation]]></category>
		<category><![CDATA[Essays]]></category>
		<category><![CDATA[General information]]></category>

		<guid isPermaLink="false">http://blog.neuinfo.org/?p=327</guid>
		<description><![CDATA[By Anita Bandrowski, Ph.D.
Humanity began writing on stone and clay tablets, and then moved to papyrus, paper, and now we write with electrons.  Does it seem that our media for information storage is becoming more flimsy or is it better to search through piles of electrons than card catalogs?  How can we save the wonderful work [...]]]></description>
			<content:encoded><![CDATA[<p><strong>By Anita Bandrowski, Ph.D.</strong></p>
<p>Humanity began writing on stone and clay tablets, and then moved to papyrus, paper, and now we write with electrons.  Does it seem that our media for information storage are becoming more flimsy, or is it better to search through piles of electrons than card catalogs?  How can we save the wonderful work that we are all paying for (in the form of government-funded research)?  Do database records hold the same value as published papers?  If so, how can we maintain them indefinitely?  Should there be a paper version of each database?  How can cloud computing and the linked data and open data initiatives help?  What is the role of libraries in this sort of data landscape?</p>
<p>In my own experience, working on a <a href="http://www.w3.org/2001/sw/">semantic web</a> project called the <a href="http://www.neuinfo.org/nif/nifgwt.html?query=all">Neuroscience Information Framework</a> (NIF) at the University of California San Diego, I noticed something strange that has happened to our society that bears on these questions.  For several months my desk was housed among many others in one of those open workspaces whose explicit goal is to improve communication between individuals (no cubicles). One day there was an interruption in the wireless service in the building.  This interruption resulted in the inevitable frustration of &#8220;I can&#8217;t do what I was just doing,&#8221; but then a tremendous event occurred: these strange entities who had been toiling near me and whose existence I acknowledged with a nod each morning became real humans.  The amazing awakening resembled an episode of Star Trek where the Borg, a half-machine, half-biological group fully integrated into the hive mind, suddenly lost connectivity to the hive and bumbled around, very confused.  People all around me began waking up from the technology trance and started to act more like … people.  They greeted me, we exchanged opinions of the wireless services, and we met.</p>
<p>With my Borg experience in mind, questions of our deep dependence on technology crystallized.  What if the power went off at Wikipedia?  What if Google didn’t exist?  How would I find things?  How would I be able to work without Google Docs?  In this networked world, is it possible that we can’t survive without the collective?</p>
<p>The level of integration of online information and search systems with our lives has become very eerie, to say the least.</p>
<p>As scientists, do we have the same issues?  Can’t we do research without PubMed?  A few years ago while at Stanford, a colleague and I were talking to an art historian and the conclusion of the discussion was, “if it (a scientific paper or a piece of data) does not exist on the web, then it does not exist.”  This was quite contrary to the experience of the art historian, who apparently still did research in a physical building that contained actual papers, books, and non-digital versions of art.</p>
<p>So, then, who backs up the data that we are becoming completely dependent on?  When researchers move to a new university or pass on to the great beyond, what happens to the data stores that they maintained?  Do they take their data with them, setting up <a href="http://en.wikipedia.org/wiki/Cloud_computing">cloud computing</a> operations?</p>
<p>The good news is that scientific data in databases, whether or not it is published on paper, is backed up and regularly checked for integrity at most sites.  Data and software tools are also replicated in so-called “mirrors,” which are essentially copies of the same data or software tools that serve a particular community.   Additionally, the National Library of Medicine copies and stores many of the significant databases in its systems, allowing researchers to access them and storing a digital copy for posterity. For example, the Gensat project data exist on Rockefeller servers, but a mirror of the data is also set up at NCBI (the electronic national library of medicine and the home of PubMed).</p>
<p>This seems safe enough. However, the directors of the National Institutes of Health are not always as willing to indefinitely support databases as they are to pay researchers to set them up.   So after five or ten years when the funding runs out, what happens to all that data that researchers painstakingly toiled for many years to gather?  Some data was published on paper, some was likely not published anywhere or pulled together from papers by raw human effort such as the Ki database, which gathered the raw numbers from many publications for affinity between drugs and receptors.  Many databases contain that elusive negative data which is not considered worthy of publishing by the ‘peer reviewing’ crowd, but which may save other researchers tremendous time if they try to replicate an experiment that several others already found did not work.  Some databases migrate to funded projects and then are maintained by other universities while the funding is in flux, but some simply vanish into the ether.  Should someone maintain them?</p>
<p>The experience of the private human genome project “Panther,” started by Craig Venter at Celera Inc, later Applera, later Applied Biosystems, later an unsupported project at the Stanford Research Institute, and now potentially rising from the ashes into a new project, shows that industrial data may have a similar or potentially an even more dire fate.</p>
<p>In recent years, several movements have swept data science. One is the <a href="http://en.wikipedia.org/wiki/Open_science_data">open data movement</a> and another is the <a href="http://en.wikipedia.org/wiki/Linked_data">linked data movement</a>.  Both bear on this issue of data maintenance.  The linked data movement (one of the buzzwords in the semantic web community) attempts to link all pieces of related information by formal relationships, sort of like playing an enormous game of &#8220;Six Degrees of Kevin Bacon&#8221; with scientific data.  Obviously, these data sets must be openly accessible for this to work, so the open data movement spurred the creation of huge datasets readable by anyone in the world.  These data sets include some of the most valuable biomedical data, such as OMIM and PubMed, but also include Wikipedia and other less than peer-reviewed data.   Many people in the open data world talk about their preferred ways of storing that data, such as “tuples” or graphs, but all this boils down to a few main ideas:</p>
<ol>
<li>A piece of data should persist in a reliable way, with a reliable address.</li>
<li>A piece of data should be in a format that is readable by others.</li>
<li>A piece of data should have a unique identifier, much like a social security number.</li>
<li>A piece of data is not owned by anyone, but should be traceable to its origin.</li>
</ol>
<p>Therefore, the open data community has a vested interest in making all data available for their systems to consume and compute, including the databases whose authors, or whose authors’ funding, have expired.</p>
<p>In the model of linked data, as a ‘six degrees of Kevin Bacon’ analogy, the data graph would suffer if the record of a movie were to be wiped off the graph.  Would we still know that Tom Hanks was connected to Kevin Bacon if Apollo 13 was no longer a data link?    Probably, but the link would no longer be direct.</p>
<p>The problem with linked data disappearing is that the relationship between Aquaporin4 and Eric Nestler is less well established than the relationship between Tom Hanks and Kevin Bacon. Actually, a database of supplementary materials contains this connection (see <a href="http://www.neuinfo.org/nif/nifgwt.html?query=aqp4%20mcclung">Drug Dependent Gene database</a>). Indeed, if the data are deposited inside of a database but are not central nodes of discourse they may disappear without a sound.  However, their inherent value may not be in their connectivity; it may instead be that they are valuable in a direction that few have pursued as a line of investigation, such as a promising lead for a therapeutic agent in a particular disease, or the piece of negative data that will spare another researcher a year of fruitless endeavor.</p>
<div class="wp-caption aligncenter" style="width: 496px"><a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-07-14.html"><img title="LOD map" src="http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-07-14.png" alt="Linking Open Data (LOD) project map" width="486" height="364" /></a><p class="wp-caption-text">The six degrees of online data sources</p></div>
<p>The stance of the Neuroscience Information Framework (NIF) as a member of the semantic web community is that data should be preserved because it may be useful at a later time.  The larger question is who will pay to preserve the data?  What is the role of libraries in an age where books are no longer made of paper, but are stores of knowledge with a ‘front end’ and a ‘back end’?  Will we have thousands of databases taking up room in library basements somewhere, where they can be accessed like so many other ‘collections,’ or will projects such as NIF be the keepers of these data because they can integrate the searching of the data across data structures?  Who will champion data preservation in the digital age?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neuinfo.org/index.php/general-information/if-a-tree-falls/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Meaning of &#8220;Is&#8221;</title>
		<link>http://blog.neuinfo.org/index.php/general-information/the-meaning-of-is</link>
		<comments>http://blog.neuinfo.org/index.php/general-information/the-meaning-of-is#comments</comments>
		<pubDate>Fri, 16 Apr 2010 22:57:52 +0000</pubDate>
		<dc:creator>lee</dc:creator>
				<category><![CDATA[Curation]]></category>
		<category><![CDATA[Essays]]></category>
		<category><![CDATA[General information]]></category>

		<guid isPermaLink="false">http://blog.neuinfo.org/?p=250</guid>
		<description><![CDATA[THE MEANING OF "IS" - APPLYING SEMANTICS TO NIF 2.5 -- 

As far as the NIF is concerned, “IS” is the inferior salivatory nucleus. How do we know? Learn about how NIF deals with "entity recognition" in this exciting new blog post! ]]></description>
			<content:encoded><![CDATA[<p>That’s an easy one, with all due respect to our former president.  As far as the NIF is concerned, “IS” is the inferior salivatory nucleus.  How do we know?</p>
<p>Perform a search in NIF and you will see various terms highlighted in the search results (the current highlighting color is brick red, but we are open to suggestions).   Hover over each of these highlighted terms and NIF will tell you what the term means to the NIF system.  If you hover over “IS,” NIF tells you it’s an anatomical structure. If you right click on it and ask to see “IS” in the <a href="http://neurolex.org/wiki/Category:Inferior_salivatory_nucleus">Neurolex</a>, it will tell you that IS is an abbreviation for the inferior salivatory nucleus.  This new feature is an example of what is often called “entity recognition.”</p>
<p>In the formal world of knowledge representation, an entity is that which is perceived, known, or inferred to have its own distinct existence.  For NIF, entities are those things like organisms, cells, molecules, and techniques that define our domain.  These entities are represented in the <a href="https://confluence.crbs.ucsd.edu/display/NIF/NIF+Ontologies+and+Terminologies">NIF ontologies</a>.  Each entity has its own numerical identifier, sort of like a social security number, that uniquely identifies the entity.  This identifier is used to map the different ways of saying the same thing to the same entity.  For example, NIF doesn&#8217;t care whether you call entity birnlex_2645 the IS, the inferior salivary nucleus, or Freddy, for that matter.  They are all (and always) the same thing.</p>
<p>Unfortunately, the richness and complexity of our language makes recognizing entities a tricky thing, as everyone who uses a search engine knows.  Not only can we call the same entity many things, but we can call many entities the same thing.  Chances are that the IS highlighted by NIF in the search results actually is not the inferior salivatory nucleus but the third-person singular form of the verb “to be,” or perhaps it is the initial segment of an axon or the Institute for Science.  Right now, NIF doesn&#8217;t really know.</p>
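<p>The two halves of this problem, many labels for one entity and one label for many entities, can be sketched in a few lines of Python.  This is only an illustration and not NIF&#8217;s actual implementation: the identifier birnlex_2645 comes from the text above, while the other identifiers, the label table, and the function names are made up for the example.</p>
<pre>
```python
from collections import defaultdict

# Each entity has one identifier and many labels. birnlex_2645 comes from the
# post; every other identifier below is a hypothetical placeholder.
ENTITY_LABELS = {
    "birnlex_2645": ["IS", "inferior salivatory nucleus", "inferior salivary nucleus"],
    "example_0001": ["IS", "initial segment", "axon initial segment"],
    "example_0002": ["IS", "Institute for Science"],
}


def build_label_index(entity_labels):
    """Invert the entity table: lower-cased label to a set of identifiers."""
    index = defaultdict(set)
    for entity_id, labels in entity_labels.items():
        for label in labels:
            index[label.lower()].add(entity_id)
    return index


def recognize(term, index):
    """Return the sorted identifiers a term could refer to (empty if unknown)."""
    return sorted(index.get(term.lower(), set()))


LABEL_INDEX = build_label_index(ENTITY_LABELS)

# Different names, one entity.
print(recognize("inferior salivary nucleus", LABEL_INDEX))
# One name, several candidate entities: the lookup alone cannot choose.
print(recognize("IS", LABEL_INDEX))
```
</pre>
<p>Note that lower-casing the labels, a common trick for forgiving search, makes &#8220;IS&#8221; indistinguishable from the verb &#8220;is,&#8221; which is exactly the ambiguity described above.</p>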
<p>In future releases of NIF, we will be working towards improving the accuracy of our entity recognition.  Why?  Because once we know that IS is a brain nucleus, we can find anything that is known about it:  its projections, its genes, the diseases in which it is affected.  A preview of what is coming can be seen in the <a href="http://neuinfo.org/tutorials/general_search/nif2.5_newfeatures.shtm">NIF Cards</a>.</p>
<div id="attachment_254" class="wp-caption alignleft" style="width: 310px"><a href="http://blog.neuinfo.org/wp-content/uploads/2010/04/ISsearch.png"><img class="size-medium wp-image-254" title="ISsearch" src="http://blog.neuinfo.org/wp-content/uploads/2010/04/ISsearch-300x137.png" alt="IS Search" width="300" height="137" /></a><p class="wp-caption-text">Search for IS with NIF Card</p></div>
<p>NIF cards for each entity can be viewed by right clicking over the highlighted term and selecting “Show NIF card” from the menu.  NIF cards currently are only implemented for anatomical structures and cells.</p>
<p>For now, however, we hope you will explore the new NIF and develop an appreciation for the difficulties of semantic search by seeing what NIF thinks the results mean.  You may be surprised!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neuinfo.org/index.php/general-information/the-meaning-of-is/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Defining Adulthood</title>
		<link>http://blog.neuinfo.org/index.php/general-information/define-adult</link>
		<comments>http://blog.neuinfo.org/index.php/general-information/define-adult#comments</comments>
		<pubDate>Wed, 03 Feb 2010 16:00:12 +0000</pubDate>
		<dc:creator>Jade</dc:creator>
				<category><![CDATA[Curation]]></category>
		<category><![CDATA[General information]]></category>

		<guid isPermaLink="false">http://blog.neuinfo.org/?p=159</guid>
		<description><![CDATA[THE PROBLEM
Adulthood, like many terms we use for describing data, is a very poorly defined and a somewhat arbitrary concept. When does an organism become an adult? The answer in general would be “it depends on how you define adult.” In the highly charged world of scientific discourse, people may argue correctly that there is [...]]]></description>
			<content:encoded><![CDATA[<p><span style="text-decoration: underline;"><strong>THE PROBLEM</strong></span></p>
<p>Adulthood, like many terms we use for describing data, is a very poorly defined and somewhat arbitrary concept. When does an organism become an adult? The answer in general would be “it depends on how you define adult.” In the highly charged world of scientific discourse, people may argue correctly that there is no single definition of adult that would satisfy everyone, nor a magical time point at which adulthood occurs. The question for the Neuroscience Information Framework or any other group attempting to integrate data from many sources is not whether one group of definitions is correct, but rather whether such a concept is useful for comparing and understanding data.</p>
<p>To illustrate this point, MGI, the Mouse Genome Informatics project, which is <em>the</em> place to go for all things mouse (from mouse strains to ontologies and genes), does not define the term adult, because of the disagreement among scientists as to what constitutes the break between juvenile and adult mice (personal communication). Of course MGI does have the “adult brain ontology”, among other resources labeled with the term adult. So they use the term because it is useful and describes a set of organismal characteristics, but are unwilling to define the term due to the ambiguities in the definitions.</p>
<p>Other large datasets, such as the Allen Brain Atlas do not deal with these sorts of definitions; rather they take data only from postnatal day 55 animals, which they consider safely within the adult range.</p>
<p>In an ideal world, we would provide a standard set of organism attributes for every subject used, in a computable form, e.g., age, weight, sexual maturity. Anyone could then request data only from those subsets of animals that were comparable, e.g., between ages 30 days and 90 days and between 100 g and 200 g. Within a given resource, e.g., a database, one can easily set up such a system. However, for a system like NIF that searches across broad swaths of information contained in individual databases, XML files, HTML pages and text, it is currently impossible to provide such a universal computational service on the fly even for something that should be conceptually simple, e.g., representation of age (days, months, years, prenatal, embryonic etc). Never mind the fact that such information is not consistently available for a source.</p>
<p>A consideration of the literature shows that many times the only label for age is “adult” with no specifics provided.</p>
<p>For databases that take and analyze data from published work, like neuromorpho.org, the word adult is often the only age that accurately describes a particular data set. Automated systems recognize this term, but if the definition is not constant across sources, then “adult” is not a useful bucket for aggregating information. One source may have adult as starting at P21 while another at P30. Furthermore, automated systems would not be able to translate “P55” as adult, or “week 5” into adulthood, unless there was a definition that could be applied.</p>
<p><span style="text-decoration: underline;"><strong>DEFINITION OF ADULT</strong></span></p>
<p>The question is whether we can come up with a definition of adulthood that can be consistently applied. Most of the biological definitions of adulthood deal with either the readiness of an organism to reproduce (sexual maturity) or the notion that an animal is full-grown. Both definitions have inherent problems. For example, many species including male rats do not stop growing until death, making “full-size” only applicable when animals have reached their death. Similarly, sexual maturity may be defined as the onset of estrus, but can also be defined as the termination of ‘pubescence,’ a period of time that is difficult to assess in a rat or mouse.</p>
<p>Adding a little complexity to the problem is the relatively simple question of what is the day of birth. Scientists from various entrenched camps define postnatal day zero as the day of birth and others define it as postnatal day one. Neither group is incorrect, but anyone attempting to bring together data from various datasets (or publications) is required to spend a large amount of time attempting to understand whether the particular piece of data comes from an animal that is P5 or P4.</p>
<p>Due to the inherent problems in defining such a thing, the ontology community (a community concerned with establishing standards in discourse in scientific communication) and many researchers that build databases meant to compare data from various sources treat adulthood with caution. Nonetheless, as evidenced by its wide use, the concept of “adult” is useful and often stands alone as an important characteristic for defining data even though it is not well defined for any species.</p>
<p><span style="text-decoration: underline;"><strong>THE ARBITRARY BUT DEFENSIBLE SOLUTION</strong></span></p>
<p>The above-mentioned problems with defining adulthood are echoed and magnified in humans, because of a need to assess emotional maturity and readiness to take on the tasks of independent existence in a complex society.   The solution to determining what an adult human is has been strangely simple and boils down to a number.  Any parent of a teenager knows that there is no magical event that happens on the 18<sup>th</sup> birthday of a child, but for legal systems a hard cut-off is needed, so that treatment of criminal activities and rights bestowed on individuals are clearly defined.  Therefore in almost all advanced societies the legal adult is 18 years of age, whether or not they are emotionally ready to be one or whether or not the pubertal period has passed.</p>
<p>We suggest that a similar arbitrary but defensible cut-off date should be established and implemented for all research animals so that when age of animal is reported as “adult” we can, with some degree of certainty, compare data of one study to the thousands of other similar studies.</p>
<p>According to the work of Finlay and Darlington (Science, 268:1578-84) on the chronometry of species, the final important steps in brain development of mice occur 29.7 days after conception, or postnatal day 12 (birth is P0 in this case); estrus typically begins between postnatal day 25 and 40, and body growth is completed at about postnatal day 50.  So we can use the arbitrary date of postnatal day 50 as the definition of adult mouse, as this is a reasonable standard for an adult.  We will define the day of birth as postnatal day 0.  Mice between the ages of P0 and P24 will be termed juvenile, and mice between P25 and P49 will be termed early adult.</p>
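<p>Once such cut-offs are written down, translating a label like “P55” into a stage becomes a mechanical step that an automated system can apply.  The following Python sketch uses the mouse cut-offs proposed above; the function names and the <code>birth_is_p1</code> switch (for sources that count the day of birth as postnatal day one) are assumptions made for illustration, not part of NIFSTD.</p>
<pre>
```python
import re

# Cut-offs proposed in the post for mice, with the day of birth counted as P0.
EARLY_ADULT_START = 25
ADULT_START = 50


def parse_postnatal_label(label, birth_is_p1=False):
    """Turn a label such as 'P55' into a day count where birth is P0.

    birth_is_p1 is a hypothetical flag for sources that call the day of
    birth P1; their day numbers are shifted down by one.
    """
    match = re.fullmatch(r"[Pp](\d+)", label.strip())
    if match is None:
        raise ValueError(f"not a postnatal-day label: {label!r}")
    day = int(match.group(1))
    return day - 1 if birth_is_p1 else day


def mouse_stage(postnatal_day):
    """Map a postnatal day (birth = P0) to juvenile, early adult, or adult."""
    if postnatal_day >= ADULT_START:
        return "adult"
    if postnatal_day >= EARLY_ADULT_START:
        return "early adult"
    if postnatal_day >= 0:
        return "juvenile"
    raise ValueError("postnatal day cannot be negative")


for label in ("P12", "P25", "P49", "P55"):
    print(label, mouse_stage(parse_postnatal_label(label)))
```
</pre>
<p>The same label can land in a different bucket depending on the birth-day convention: “P50” from a source that counts birth as P1 is day 49 here, and so is still early adult.</p>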
<p><span style="text-decoration: underline;"><strong>IN CONCLUSION</strong></span></p>
<p>In the NIFSTD (Neuroscience Information Framework standard ontology) we will define arbitrary but defensible standards for mice and other common research species. This sort of standard is an important part of establishing a common framework for discussion; it does not necessarily reflect the absolute scientific truth.</p>
<p>The reason that we need a standard for age and many other such common terms is that we need to establish a point of reference, which will allow for accurate communication about results.  This is presumably the reason that the International System of Units (SI) was put in place, and we believe in the standardization of certain common variables in experiments for the sake of effective data analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neuinfo.org/index.php/general-information/define-adult/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Professional vs. self-curation</title>
		<link>http://blog.neuinfo.org/index.php/essays/professional-vs-self-curation</link>
		<comments>http://blog.neuinfo.org/index.php/essays/professional-vs-self-curation#comments</comments>
		<pubDate>Mon, 12 Oct 2009 16:53:22 +0000</pubDate>
		<dc:creator>lee</dc:creator>
				<category><![CDATA[Curation]]></category>
		<category><![CDATA[Essays]]></category>

		<guid isPermaLink="false">http://blog.neuinfo.org/?p=68</guid>
		<description><![CDATA[NIF's tireless curators continue to add valuable resources to NIF's registry and data federation.  New tools, however, allow "self registration" so that resource providers can work with NIF curators in presenting accurate and up-to-date content.  Find out about Biositemaps and learn how to do the DISCO at NIF! ]]></description>
			<content:encoded><![CDATA[<h3><em><strong>Benefits and pitfalls of integration of two very different data types</strong></em><em><strong>, by Dr. Anita Bandrowski, NIF Curator<br />
</strong></em></h3>
<p><em><strong>Overview of NIF registration processes and the role of DISCO:</strong></em><br />
The Neuroscience Information Framework (NIF) project has a dynamic inventory of more than 2300 neuroscience-relevant resources. What makes that inventory dynamic is that NIF encourages resource providers to register their resources in our catalog of &#8220;all things neuroscience.&#8221;  This process is not terribly involved for resource providers, as they only need to fill out basic information about their resource such as the URL, name, description and keywords.  In the near future, resource providers will also be able to take away a &#8220;DISCO&#8221; file, short for resource discovery.  This file is maintained on the resource providers&#8217; Web site.  Resource providers maintain the currency of information within this file at the source.  When a change is made, NIF is alerted to the change through an automated agent that crawls the site periodically.  In this way, resource providers do not need to provide updated information to NIF or any other system that indexes it.  The updates are performed by the system, and the NIF catalog is kept up-to-date without having to visit each of the 2000+ sites currently listed.</p>
<p>The process of provider registration is a good idea, and we are not the only ones to think of it.  Other projects in biomedical science essentially seek to accomplish the same goal.   Of these, the Biositemaps project, supported by the National Centers for Biomedical Computation, has advanced considerably towards implementing a similar technology.   NIF believes that if providers register their resources using one of these tools, then they should not have to do it again using a slightly different tool. Rather, the data generated by all tools should be accessible to all systems.  We have just completed an exercise in harvesting Biositemaps files into NIF and provide here our experience with and perspectives on the exercise.<br />
<em><br />
<strong>Rationale for integration:</strong></em><br />
Tools such as Biositemaps and DISCO allow the people who know the most about their resources, i.e., those who created them, to describe those resources so that search engines can easily find them.  This &#8220;self description&#8221; is a great idea in theory, but in practice it may not work as intended.  The NIF project, a framework for resource description and discovery, has recently developed tools to harvest the descriptions from Biositemaps.  We believe that biomedical resources should be described in a consistent manner and made discoverable so that projects similar to NIF can present them to our user community.  During this exercise, we have come across several problems that were echoed by other projects attempting to do similar things.</p>
<p>At the outset, the Biositemaps initiative was created as a Google sitemaps-like database that was intended to point search engines to appropriate information about biological software and data sets.  Biositemaps has a great deal of appropriate data about biologically relevant software tools. Because of this, NIF was highly interested in importing this data, which was especially enticing because the data was able to be dynamically updated by the resource providers, meaning that if the particular software tool has a new version, the search systems would be notified automatically of any update.</p>
<p><strong><em>Metadata structure compatibility and vocabularies:</em></strong><br />
NIF has made a conscious decision to have a very simple metadata structure to alleviate problems, including the inappropriate use of metadata fields and the time intensiveness of both the curation effort and the training of curators.  The original NIF developed a fairly comprehensive structure (still available at <a href="http://neurogateway.org">http://neurogateway.org</a>; see also Gardner et al., 2008) that was populated by the resource providers themselves.  These resource providers were mostly scientists who were building tools or databases.  Many scientists are not metadata experts, and this led to a very inconsistent labeling of resources at the outset of the NIF project.  The inconsistencies in annotation made searching for resources a very difficult task; furthermore, the complicated structure was not intuitive to the end user.  The simple structure adopted by NIF alleviated the curation and search problems and also turned out to be quite useful for integrating lots of different metadata structures, including Biositemaps.  The mapping of fields from Biositemaps to the NIF was very simple, taking only a few days to reconcile.</p>
<p>The most significant effort for achieving integration was mapping the resource types, e.g., database, software tools.  Biositemaps populates the resource type from the Biomedical Resource Ontology (BRO: <a href="http://bioportal.bioontology.org/ontologies/39002">http://bioportal.bioontology.org/ontologies/39002</a>), while NIF uses the NIFSTD resource ontology.  These two efforts were developed independently but are now converging through the concerted effort of both groups.  However, during this process, they continue to have some differences.  For example, some classes exist in one ontology and not the other, e.g., core facility, which is explicitly labeled in the BRO but not in the NIF.  Thus, if resource providers mark their resources as a core facility, the NIF can’t automatically ingest this information, requiring intervention by a human curator.  Therefore, we have continued to align the BRO and NIFSTD as much as is humanly possible to alleviate the need for human intervention.<br />
<em><strong><br />
Data structure compatibility and scope:</strong></em><br />
While the metadata structure harmonization has taken some effort, it is a tamable exercise.  The data within Biositemaps supplied by resource providers, however, is extraordinarily heterogeneous in quality.  In about 200 out of 400 Biositemaps, the data are well formed, but the remaining records contain only partial information, with missing resource names or URLs, making it difficult to take all of the data in Biositemaps and import it into NIF in an automated fashion.  The NIF registry database (like all databases) expects to see certain minimal data including a name and a URL. When these items are not present, the database does not accept the record.  Additionally, heterogeneity comes from the amount of descriptive text. NIF registry records prepared by curators have text of 3-6 paragraphs in most cases, but most Biositemaps resources describe themselves in a single sentence.  NIF uses longer descriptions because we found out early in the project that longer descriptive text includes many terms NIF users would search for that may not be listed as keywords, making search through the NIF registry more effective.  With minimal descriptions, it is unlikely that the NIF search interface would retrieve Biositemaps resources in a sea of NIF curated resources.  Finally, the issue of combining records that are already present in NIF with Biositemaps data presented some challenges to our system.  Because we don’t yet have a universal way of assigning URIs to resources, resources tend to be cross-listed in many catalogs.  For this reason, NIF is supporting the Common Naming Project (<a href="http://neurocommons.org/page/Common_Naming_Project">http://neurocommons.org/page/Common_Naming_Project</a>).  As NIF had already provided additional curation to the resources listed that was in many cases more thorough than that supplied by the resource providers, the process of reconciling and merging of information was not straightforward.</p>
<p>To address the problems noted above, NIF has updated the registry data structure to accommodate two versions of each record that coexist: one is a storage bin for automated data and the other is the human-curated version.  Any record that is publicly available in the NIF will be curated by a human, yet with automatic registration, the human curator will be prompted to review the site whenever an update occurs.</p>
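<p>The rules just described (reject a harvested record without a name or URL, keep the automated and curated versions side by side, and prompt a curator on every update) can be sketched as follows.  This is a minimal Python illustration, not the NIF registry schema: the field names, the <code>RegistryEntry</code> class, the <code>ingest</code> function, and the choice to key the registry by URL are all assumptions.</p>
<pre>
```python
from dataclasses import dataclass
from typing import Optional

# Minimal data the registry expects, as stated in the post.
REQUIRED_FIELDS = ("name", "url")


def missing_required(record):
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


@dataclass
class RegistryEntry:
    """Two coexisting versions of one resource (hypothetical layout)."""
    automated: dict                 # storage bin for harvested data
    curated: Optional[dict] = None  # human-curated version, shown publicly
    needs_review: bool = True       # prompts a curator to look at the entry

    def apply_update(self, harvested):
        """Store a newly harvested version and flag it for a curator."""
        self.automated = harvested
        self.needs_review = True


def ingest(harvested, registry):
    """Add or update a harvested record; return the list of missing fields."""
    problems = missing_required(harvested)
    if problems:
        return problems  # the record is not accepted
    key = harvested["url"].strip()
    if key in registry:
        registry[key].apply_update(harvested)
    else:
        registry[key] = RegistryEntry(automated=harvested)
    return []
```
</pre>
<p>An update never overwrites the curated version in this sketch; it only replaces the automated copy and raises the review flag, so the public record changes only after a human has looked at it.</p>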
<p>Resource characterization is a tricky problem, and it is difficult to know the correct way to represent a resource for a particular audience. For example, the I2B2 project (<a href="https://www.i2b2.org/">https://www.i2b2.org/</a>; an NIH-funded National Center for Biomedical Computing containing a large number of software resources) created individual Biositemaps for each plug-in to their software tools.  This is an issue of scope. Because NIF’s curators as a policy do not divide resources to this extent, we consider most plug-ins to be a part of the software resource, not an individual resource (there are some exceptions, such as MATLAB libraries).  For a project such as NIF, resources need to be well defined because trying to catalog every resource useful to neuroscientists can be a daunting task if a resource is too narrow, such as a plug-in.  If we consider a resource to be appropriate for NITRC, a software library with hundreds of software applications, then it will take a curator some time to annotate this.  However, if we consider each plug-in to each program a resource, the task becomes too large and is not likely to help users.  On the other hand, if a user is looking for a very specific plug-in, then having access to each individually is likely to be useful.</p>
<p>To solve this scope problem, we have created a uniqueness criterion for the URL, meaning that if the URL is not unique among several Biositemaps &#8220;resources,&#8221; then the resource descriptions will be folded into one.  The solution is not perfect because unrelated resources could potentially have the same URL, but this strategy solved more problems than it created.</p>
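<p>As a rough illustration of this uniqueness criterion, the Python sketch below folds harvested records that share a URL into a single description.  It is not NIF&#8217;s code: the record fields, the function names, and the light URL normalization (trimming whitespace and a trailing slash) are assumptions, and the plug-in names in the usage example are invented.</p>
<pre>
```python
def normalize_url(url):
    """Assumed normalization: ignore surrounding spaces and a trailing slash."""
    return url.strip().rstrip("/")


def fold_by_url(records):
    """Merge records that share a URL into one entry per URL.

    Names and descriptions of the folded records are kept as lists so a
    curator can still see what each original record said.
    """
    folded = {}
    for record in records:
        key = normalize_url(record["url"])
        entry = folded.setdefault(
            key, {"url": key, "names": [], "descriptions": []}
        )
        if record["name"] not in entry["names"]:
            entry["names"].append(record["name"])
        if record.get("description"):
            entry["descriptions"].append(record["description"])
    return list(folded.values())


harvested = [
    {"name": "Plug-in A", "url": "https://www.i2b2.org/", "description": "First plug-in."},
    {"name": "Plug-in B", "url": "https://www.i2b2.org", "description": "Second plug-in."},
    {"name": "Other tool", "url": "http://example.org/other"},
]
for entry in fold_by_url(harvested):
    print(entry["url"], entry["names"])
```
</pre>
<p>The weakness noted above is visible here as well: any two unrelated resources that happen to be published under the same URL would be folded together, so the merged entry still deserves a curator&#8217;s glance.</p>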
<p><em><strong>Summary:</strong></em><br />
&#8220;Self registration&#8221; tools such as Biositemaps can be used to help human curators annotate a resource, including alerting curators that a resource has been created. However, while these tools can certainly help, we believe that these self-reporting tools do not replace trained human curators.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.neuinfo.org/index.php/essays/professional-vs-self-curation/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
