Archive for the ‘Essays’ Category

We have A LOT of neuroscience information, and would like to share….

Posted on May 14th, 2013 in Curation, Inside NIF, Jonathan Cachat | No Comments »

Over the past 4 years, the Neuroscience Information Framework has systematically scanned the literature, internet and social buzz for all things neuroscience (& biomedical science). This tedious bookkeeping has resulted in the largest, most comprehensive catalog of neuroscience-relevant information ever amassed – with the added bonus of semantically enhanced search functions. And now, we would like to share it with you via myNIF…but before those details…

What do we mean by “neuroscience information”?

Neuroscience information includes data, resources, literature, grants, multimedia, social buzz, a lexicon and more.

Data: Over 140 independent databases (e.g. CCDB, Grants.gov, GENSAT) are deeply indexed and semantically mapped by NIF – representing over 400 million pieces of data. These data are considered part of the “hidden web”, not indexed by major search engines because doing so requires specialized database query statements for retrieving the data within, rather than on the surface of the pages surrounding the database. NIF has developed technologies to regularly re-crawl and update data content, index it, and provide simultaneous search within the contents of these databases. Moreover, data resulting from a search can be exported with a single click into standard data formats for subsequent analysis. This can simply save you time – if you need to know what types of serotonin receptors have been classified in zebrafish (Danio rerio), searching NIF for ‘zebrafish serotonin receptor’ provides results from authoritative data providers (HomoloGene, EntrezGene) which can be compared instantly, rather than visiting each site separately and comparing through notes, multiple windows, or several downloads. In addition to this primary information, the results also include related, and sometimes very helpful, information about zebrafish and serotonin – signaling pathways, antibodies, and grant information.

Resources: Need to find a software analysis package for microarray data? NIF can recommend 41 options, as well as 100+ unique organizations, centers, labs and websites that have similar interests. Looking for non-governmental funding of ALS research? Here are 7. What about a tissue bank with Alzheimer’s disease CNS tissue samples available for researchers? NIF is aware of around 88 worth a look. All of this is to convey that a resource is an object or entity, with a website, that provides potential value to neuroscience research or researchers. Importantly, this catalog of resources indexed by NIF is maintained at NeuroLex, a semantic MediaWiki website. It is analogous to Wikipedia, in that anyone can contribute their own or favorite resources, but it is endowed with capabilities permitting logical reasoning on relationships between data (e.g. list all GABAergic neurons).

How Do You Evaluate a Database

Posted on May 3rd, 2013 in Author, Essays, Force11, Maryann Martone, News & Events, NIFarious Ideas | 3 Comments »

by Maryann E Martone

I was speaking with a colleague recently who, like many of us, had experienced the frustration of trying to support his on-line resources. He has assembled a comprehensive on-line resource; it is used by the community and was used by others to publish their studies. It is not Genbank or EBI; it is one of the thousands of on-line databases created by individuals or small groups that the Neuroscience Information Framework and others have catalogued. My colleague has spent years on this resource, pored over hundreds of references and entered close to a million statements in the database. By many measures, it is a successful resource. But in the grant review, he was criticized for not having enough publications. I experienced the same thing in a failed grant for the resource that I had created, the Cell Centered Database. In fairness, that was not the most damning criticism, but it just seemed so very misplaced. I had succeeded in standing up and populating a resource, well before there was any thought of actually sharing data. People used the database and published papers on it, but apparently I should have been spending more time writing about it and less time working on it.

The problems of creating and maintaining these types of resources are well known and were discussed at Beyond the PDF2:  to be funded, you have to be innovative.  But you don’t have to be innovative to be useful.  To quote or paraphrase Carole Goble at the recent conference,  “Merely being useful is not enough.”

But presumably there is a threshold of perceived value where “merely being useful” is enough. I am thinking of the Protein Data Bank or PubMed. These resources are well funded and also well used but hardly innovative. I am guessing that many of the resources like the ones my colleague and I created were started with the hope that they would be as well supported and integral to people’s work as the PDB or PubMed. But the truth is, they are not in the same class. They are still valuable, though, and represent works of scholarship. We are now allowed to list them on our biosketch for NSF. So my question to you is: how do we evaluate these thousands of smaller databases?

Ironically, our peers have no trouble evaluating an article about our databases, but they have much more trouble evaluating the resource itself.  How does one weigh 30,000 curated statements against 1 article?  What level of page views, visits, downloads and citations make a database worthwhile?  If my colleague had published 10 papers, the reviewers wouldn’t have likely checked how often they were cited, particularly if they were recent.  What is the equivalent of a citation classic for databases?  If you don’t have the budget of NCBI, then what level of service can you reasonably expect from these databases?  I thought that the gold standard was a published study that utilized your database to do something else, by a group unconnected to you.  Grant reviewers found that unconvincing.  Perhaps I didn’t have enough? But how many of these do you need, relative to the size of your community,  and on what time frame should you expect them to appear?  Sometimes studies take years to publish.  Do they need to be from the community that you thought you were targeting (and whose institute may have funded your resource) or does evidence from other communities count?

So perhaps if we want to accept databases and other artefacts in lieu of the article, we should help define a reasonable set of criteria by which they can be evaluated.  Anyone care to help here?

What is the Cerebral Cortex?

Posted on January 14th, 2013 in Anita Bandrowski, Curation, Essays, Force11, Interoperability, News & Events | No Comments »

by Anita Bandrowski

This may seem a silly question, but let’s see if you are more like a fifth grader or more like me. It appears that a fifth grade class I recently interacted with can answer a question that I am having a lot of trouble with. They rattle off “the outside part of the brain”. True enough.
They can point to it; it’s the part that is “squiggly”. True enough.
“It is the part that thinks”. Ok, we can go with that answer.

So why are these fifth graders smarter than I am? Pun intended.


Why I started blogging-a scientist’s perspective

Posted on December 19th, 2012 in Essays, Force11, Maryann Martone, News & Events, NIFarious Ideas | 7 Comments »

by Maryann Martone

A recent post at the London School of Economics Social Science Impact blog on “Finding the time to blog” reminded me that I wanted to write a blog about why I started to blog. The use of social media and its proper place in academic communications is being discussed in many circles. Over at FORCE11, we aggregate quite a few blog feeds like the one from LSE where these issues are thoroughly covered. I wanted, however, to share a personal perspective. Like many scientists, I suspect, I was at first reluctant to blog. I did write a few posts for the NIF blog when we started it up, but then stopped because “It takes too much time”. Each blog took me several weeks before I was happy with it and, as is well advertised, blogs don’t count towards academic promotion, etc. So if I was going to spend that amount of time, I might as well spend it towards something that does count: writing papers, giving talks, training, teaching, networking and, oh, doing research. Besides, who would want to hear what I had to say?

Well, the astute reader might have noted that many of our rewarded activities involve someone (funders, conference organizers, students) actually paying to hear what we have to say. And, the astute reader might also note that a blog is a much more effective communication vehicle than most of these for accomplishing these tasks. I started to blog for real when I realized that a blog is my communication with the world. A lot of money has been invested in me as a vehicle for knowledge acquisition and integration. The more I share that with the world, the better I do my job. A blog is not a learned treatise which needs to carefully consider all angles, acknowledge all references in a specified format and go through rounds and rounds of editing to craft the language so as to offend nobody with unsupported statements. A blog is a written yet highly interactive version of the type of conversation I engage in every day with students, colleagues, audiences. It is my thoughts on a topic, developed over a lifetime of active inquiry, open to correction and discussion. You can believe them or not, just as you choose to believe them when I am speaking to you in an informal or formal setting.

But unlike these other forms of transient communication, where my words evaporate into the air, blogs live on the net. They are searched by Google, so they can be found easily. And they are living things, open to comment, discussion, updating. Once I realized what a blog could be, I could fire one off in a matter of minutes. Do I get some things wrong? Sure. But isn’t that why we communicate with each other in science, so we can try to put our thoughts in order in a way where flaws can be exposed? It was a magical moment when I read over a blog that I had posted earlier and realized that I had left out a part of the argument. Oh no! But then I just opened edit and put it in. But what if I misrepresent some part of an argument or forget to acknowledge someone? Isn’t that why we have peer review? Well, if you want peer review, just read the comments. Usually, someone will correct you if they care enough. And again, you can immediately acknowledge that input and modify your posting or post a new one. So rather than blogging taking me away from my job, I actually think it lets me do it better. It is a freeing form of communication. Scientists generally are interesting people, but you would never know it from the articles they produce. But you do when you get them talking. And that, imho, is what a blog should be: scientists talking for everyone’s benefit.

A Call to Science Bloggers

Posted on November 9th, 2012 in Force11, Interoperability, Jonathan Cachat, News & Events | 5 Comments »

With the growth of scientist participation in blogging and social networks, a considerable amount of worthwhile scientific chatter is unfolding online. Several prominent blogs have emerged; in fact, NIF is now indexing many of these sites (via RSS feeds), and they can be found in the Multimedia data type***.

In our continued effort to integrate and link data, NIF would like to create two-way links between your blog posts and the scientific articles they discuss through NIF Literature. For example, if users find an article in NIF Literature, we can provide links to blogs or tweets that have discussed this article, in addition to the current link to full text access options. Your site or blog would also be included in PubMed search results thanks to NIF’s automated LinkOut service.

However, it is currently very hard to achieve this goal and would require substantial manual curation efforts. In order to automate this process, we submit a few simple guidelines to the online science community.

1) Blogs and other long-form posts should always include related PubMed Identifiers (PMID) in citations. References can be in text, or placed together at the end of a post, but either way should include PMID:######## for all citations. This standardized format of ‘PMID:########’ was suggested by the BioDBcore and biosharing.org initiatives and we strongly support it.

  • This is a MindHacks post without any citation information at all (aside from a link to Nature) – this is the worst possible scenario, for the purposes of this article. It is a wonderful dialogue on this exciting article, but it is very unlikely that people reading the article will ever know that this post exists – unfortunate for everyone involved.
  • This Neuroskeptic post correctly included citation information, along with the PMID, at the end of the post (see Screenshot below).

2) Short-form posts should include PMIDs when possible, particularly if linked directly to the article. For example, a recent tweet here.

3) Be found – index with search engines including NIF. For more information about submitting your site, blog or resource to NIF check here or fill in the small form here.
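As a rough illustration of why the ‘PMID:’ convention in guideline 1 makes automated linking feasible, here is a minimal Python sketch of how an indexer might pull identifiers out of a post. The function name, the pattern and the sample text are assumptions made for illustration, not a description of NIF’s actual pipeline; the pattern assumes PMIDs of at most eight digits and tolerates an optional space after the colon.

```python
import re

# Assumed pattern: the literal prefix "PMID:", optional whitespace,
# then one to eight digits. Longer digit runs are deliberately not matched.
PMID_PATTERN = re.compile(r"\bPMID:\s*(\d{1,8})\b")


def extract_pmids(post_text):
    """Return the unique PMIDs cited in a post, in order of first appearance."""
    found = []
    for match in PMID_PATTERN.finditer(post_text):
        pmid = match.group(1)
        if pmid not in found:
            found.append(pmid)
    return found


# Hypothetical post text; 12345 is only a placeholder identifier.
example_post = (
    "A nice discussion of a recent paper (PMID: 10077578). "
    "See also PMID:10077578 and PMID:12345."
)
print(extract_pmids(example_post))  # ['10077578', '12345']
```

A post that only hyperlinks to the journal page gives such a script nothing to match, which is why posts without a PMID currently need manual curation.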

The internet was designed to enable a web of links between ideas, information and people. Following these simple guidelines will not only increase the connectivity between data; the resulting social and semantic links are also valuable to information creators. First, they create more opportunities for scientific exchange and feedback. Second, they provide additional avenues for calculating impact metrics, similar to those tracked by AltMetrics.org and the PLoS journals.

Do you have any other thoughts related to increasing data integration and interpretability? Share them here in the comments below!

***If you would like to have your blog or site included within the NIF index drop us a line – info@neuinfo.org

How to make the most annoying biological database

Posted on November 4th, 2012 in Anita Bandrowski, Force11, Interoperability, NIFarious Ideas, Uncategorized | No Comments »

Dear biological database owners,

We have attempted to let people know how to make databases more interoperable and discoverable, but this blog takes a very different approach to the idea. The ideas brought forward include making data silos and generating non-unique identifiers, and my current favorite is the 44-page getting started guide.

So, what is it that you will build next?

Those mean journals won’t publish my methods!

Posted on September 13th, 2012 in Essays, Force11, Maryann Martone, News & Events, NIFarious Ideas | 5 Comments »

The NIF team recently attended the Neuroinformatics Conference, held in Munich, Germany. The conference featured several lively discussions on the reproducibility problem in neuroscience (and neuroinformatics) and what should be done. Many in the audience complained that part of the problem is that the journals, especially the high impact ones (you know who you are), are cutting materials and methods further and further. Many calls were made to put pressure on the publishers, and NIF is certainly all for that. But thanks to our involvement in FORCE11, we put the question to the audience: “Why are you relying on the journals for this? If you think that you need detailed materials and methods, why aren’t you publishing them on the web? Your paper can still be in the journal, but why aren’t you making videos explaining your methods and posting them on YouTube or SciVee? Why aren’t you using wikis like OpenWetWare to make your detailed protocol available? Why aren’t you writing a blog about your paper, including more detailed methods? Why aren’t you putting your data into public repositories? Why aren’t you creating a video protocol using JoVE?” I think it’s time for the scientific community at large to start asking itself these questions. But more importantly, it is time for the scientific community to act. Scientists need to start cleaning their own house. We do not have to wait for the journals to allow us to make our science more reproducible. For the good of our respective fields, we should be doing this now. If you don’t like these venues, NIF would be happy to host your videos and protocols. NIF doesn’t care where something is, as long as we can link to it. And NIF will link your protocol/video/blog to your published article in PubMed too, using our LinkOut feature. These links are also featured in NIF Literature.
And don’t go telling us that if the journals don’t do it, then it won’t be part of the permanent record.  That is undoubtedly true.  But until the journals change, the materials and methods will continue to shrink.  Isn’t a short term solution better than no solution at all?

How to bury your academic writing (or should I write that book chapter?)

Posted on August 31st, 2012 in Essays, Force11, Maryann Martone, News & Events, NIFarious Ideas | No Comments »

A recent blog post by Dorothy Bishop on “How to bury your academic writing” came through this week that considers the question of the relative impact of book chapters vs published articles. She concluded that book chapters generated far fewer citations than published articles and attributed it to the fact that book chapters are generally behind a pay wall, often a fairly hefty one (the latter is my opinion, not hers). It prompted a follow up blog post by Pat Thompson “Is writing book chapters a waste of time?” in defense of writing book chapters. I don’t think that Ms. Bishop was saying that book chapters were a waste of time; indeed, she claimed that some of her best scholarly work was done as book chapters, as the medium allows for more speculation and creativity than journal articles. I too have found that to be true; some of my best works were book chapters, even though I was told early on in my academic career that book chapters were generally a waste of time and effort, as they did not count towards academic promotion (at least in the biomedical field). But they allowed me greater literary freedom than the typical biomedical article, and I was able to speculate and develop arguments without reviewers crying “foul!”. But even I can’t gain access to many of these chapters anymore, except as my original word files, unless I have a copy of the book around. So I concur with Ms. Bishop that writing book chapters is perfectly fine, but writing them on-line where they can be found and actually read would likely make them much more useful. There are a lot of interesting tools and models out there where this could be done, e.g., Wikibooks. I confess that very rarely in my career have I been tempted to answer one of the many invitations to edit a book. But if I did, I would strongly consider taking Ms. Bishop’s advice: “My own solution would be for editors of such collections to take matters into their own hands, bypass publishers altogether, and produce freely downloadable, web-based copy.”

Adapted from a piece I wrote at FORCE11.

Maryann Martone, Neuroscience Information Framework

Are You Sure You Published An Open Access Article?

Posted on August 24th, 2012 in Anita Bandrowski, Essays, Force11, News & Events, NIFarious Ideas | 3 Comments »

Thanks to PubMed Central, academics everywhere can read papers!!! This is a wonderful asset, allowing access for scientists from all places, even academic backwaters like Harvard, where libraries can no longer afford to subscribe to various journals (see the open letter from the Harvard libraries).

Seems like progress is truly fantastic in the open access world, or is it?

Peter Murray-Rust, a great proponent of the open access movement, posted a blog asking is-this-paper-open-access? Like the greenwashing of consumer goods, it looks like open access is just as tricky. If a journal article is published as open access, Peter points out that there are several questions that must be answered, including:

  • Can I post it on the web? For commercial use? For any use?
  • Is it Green? Or Gold? BOAI compliant? Or something else? How did you tell?
  • Is it gratis? Is it libre? If so what permissions have been relaxed?
  • Can I send someone a copy? Anyone? Or just a non-commercial?
  • Does its location affect whether it is Open Access?
  • Has someone paid for Open Access? Would their funders be satisfied?

So open access is not always open access. Our experiences in the Neuroscience Information Framework showed us that publishers seem to really hate robots. All the major publishers now seem to say nasty things like “no robots, this is a humans-only establishment”. Now, these are not the bad robots of science fiction, but information gathering robots that can download a paper, identify things in that paper, and then run interesting algorithms to figure out how those things are related. The process is called ‘text-mining’, and there have been some papers reporting that when this type of robotic reading is done on a large corpus of biomedical journal articles, powerful statistics can identify marine bacterial proteins as likely targets for some types of cancers. Indeed, these reading robots could be very useful, but access to the full text articles is largely denied to them; for example, see this piece in the Guardian.

What all of the text miners get (including NIF and many other projects) is the PMC Open Access Subset:

“The PMC Open Access Subset is a relatively small part of the total collection of articles in PMC. Articles in the PMC Open Access Subset are still protected by copyright, but are made available under a Creative Commons or similar license that generally allows more liberal redistribution and reuse than a traditional copyrighted work. Please refer to the license statement in each article for specific terms of use. The license terms are not identical for all articles in this subset.”

So as scientists what are we to do? For a start, perhaps publishing within the open access subset (PLoS is a good example) is a no-brainer.

So all I need is a number?

Posted on August 17th, 2012 in Curation, Force11, Interoperability, Maryann Martone | No Comments »

In the Neuroscience Information Framework (http://neuinfo.org), we often tout the importance of using unique identifiers rather than text strings as a way to ensure that search engines like NIF can mitigate the ambiguity associated with searching for strings. NIF provides access to the largest source of neuroscience information on the web, by providing simultaneous search over multiple databases, catalogs and literature databases. If you search for Ca2 in NIF, you will find information on calcium, the hippocampus and a gene called CA2. Unique identifiers can disambiguate among these by assigning unique handles to each; a sort of social security number for each thing that we want to talk about. Many groups are creating and promoting unique identifiers for all sorts of entities: people (e.g., ORCID) and articles (PubMed IDs), and they are very handy things. NIF itself has gotten into the business through its unique resource identifiers and antibody IDs.

So all I need is a number, right? Alas, no. Because numbers, like names, are not unique either. I just searched through NIF and found an antibody in the Beta Cell Consortium Database. There was a column for “people who are using this” with a reference of 10077578. Clicking on it took me to an article in PubMed, so clearly it is a PubMed ID. Great, I thought. I want to see who else references that paper in NIF. So I typed PMID:10077578 into the NIF search interface and was able to retrieve the article in the NIF literature database. But that’s not what I wanted. Most of the time, database providers don’t provide the prefix PMID; rather, they list just the numbers in a column labeled “Reference” or “Citation”. So I typed in 10077578 and got multiple hits in the data federation from several databases. Great, I thought. Here are other sources of information that are referencing this paper. Unfortunately, one was to Novus Biochemical antibody 100-77578, and one was to the gene Rumal_1324 (GeneID: 10077578).

So, clearly a number is not enough. Some sort of namespace is required, e.g., PMID:10077578 clearly tells me where I am to look. NIF should have known better and is working to resolve this glitch, by identifying each number with a prefix, and in time, a full URI (Uniform Resource Identifier, not an upper respiratory infection). The semantic web community has been working on these standards for a long time, and discussion of the URI is beyond this post. But this is yet another example of why we at NIF encourage resource providers to think globally about their data: are we producing our data in a form that makes it easier to link individual parts of our resource to other parts?
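
To make the collision concrete, here is a small Python sketch using the three records mentioned above. The record layout, the ‘NovusCatalog’ label and the digits-only matching are assumptions made for illustration (we are guessing that a search on bare digits ignores the hyphen in the catalog number); this is not how NIF’s data federation is actually implemented.

```python
from collections import defaultdict

# Records from the example above. "NovusCatalog" is a made-up namespace label,
# and the (namespace, local id, description) layout is hypothetical.
RECORDS = [
    ("PMID", "10077578", "article in PubMed"),
    ("GeneID", "10077578", "gene Rumal_1324"),
    ("NovusCatalog", "100-77578", "antibody catalog number"),
]


def digits_only(local_id):
    """Mimic a naive search that compares only the digits of an identifier."""
    return "".join(ch for ch in local_id if ch.isdigit())


def build_indexes(records):
    """Index the same records by prefixed identifier and by bare digits."""
    by_prefixed = {}
    by_digits = defaultdict(list)
    for namespace, local_id, description in records:
        prefixed = f"{namespace}:{local_id}"
        by_prefixed[prefixed] = description
        by_digits[digits_only(local_id)].append(prefixed)
    return by_prefixed, by_digits


by_prefixed, by_digits = build_indexes(RECORDS)

# The bare number is ambiguous: three unrelated things share the same digits.
print(by_digits["10077578"])
# ['PMID:10077578', 'GeneID:10077578', 'NovusCatalog:100-77578']

# The prefixed identifier resolves to exactly one record.
print(by_prefixed["PMID:10077578"])  # article in PubMed
```

The prefix does the same job a full URI would: it says which authority issued the number, so the lookup never has to guess.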