by Anita Bandrowski,

This may seem a silly question, but let’s see if you are more like a fifth grader or more like me. A fifth-grade class I recently interacted with can answer a question that I am having a lot of trouble with. They rattle off “the outside part of the brain”. True enough.
They can point to it; it’s the part that is “squiggly”. True enough.
“It is the part that thinks”. OK, we can go with that answer.

So why are these fifth graders smarter than I am? Pun intended.

Let me tell you a story about how I came to ask the question above. The real question I am trying to answer is “which genes are expressed in the cerebral cortex?”, and I am trying to answer it by doing a meta-analysis of existing data so I don’t have to sample the cortex from any more critters (or humans, for that matter, which is relatively hard to do).

To my delight, there are many groups of scientists and curators that gather and store a lot of this sort of experimental data, and as a data scientist, I am thrilled and tickled pink (if a scientist can experience such an emotion) that many other scientists are now making their data available to me. This change has been building in the scientific community, culminating in recent years in new data sharing guidelines handed down to us from Mount NIH for all publicly funded research. Great!

OK, so I now have this wonderful data, laboriously gathered by various hard-working souls. I load it into a tool like this one that the Neuroscience Information Framework makes available, I run my analysis (after figuring out how “true”, “weak-to-moderate” and “42” are all related to gene expression), and I determine that the major variable determining which genes are expressed is the database that the answer came from. Wow, not the answer I was looking for! It’s sort of like asking a million people their favorite color and determining that a person’s favorite color is most related to who asked the question. This just can’t be true!
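
To make that step concrete, here is a minimal Python sketch of the idea, not the actual analysis described above. The record layout, the database and gene names (db_a, gene_1 and so on), and the rule that any positive number or any graded label counts as "expressed" are all assumptions made up for illustration.

```python
from collections import defaultdict


def normalize_expression(value):
    """Map a heterogeneous expression annotation onto True, False or None.

    Assumption: any positive number and any graded label ("weak",
    "weak-to-moderate", ...) counts as expressed. Real databases would
    each need their own, documented rule.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value > 0
    text = str(value).strip().lower()
    if text in {"true", "yes", "present"}:
        return True
    if text in {"false", "no", "absent"}:
        return False
    if text in {"weak", "weak-to-moderate", "moderate", "strong"}:
        return True
    try:
        return float(text) > 0
    except ValueError:
        return None  # cannot interpret this annotation


def expressed_fraction_by(records, key):
    """Fraction of interpretable records called 'expressed', grouped by key."""
    counts = defaultdict(lambda: [0, 0])  # group -> [expressed, total]
    for record in records:
        call = normalize_expression(record["value"])
        if call is None:
            continue
        counts[record[key]][0] += int(call)
        counts[record[key]][1] += 1
    return {group: expressed / total
            for group, (expressed, total) in counts.items()}


# Hypothetical toy records; none of these values come from a real database.
records = [
    {"database": "db_a", "gene": "gene_1", "value": "true"},
    {"database": "db_a", "gene": "gene_2", "value": "weak-to-moderate"},
    {"database": "db_b", "gene": "gene_1", "value": "0"},
    {"database": "db_b", "gene": "gene_2", "value": "false"},
    {"database": "db_c", "gene": "gene_1", "value": 42},
    {"database": "db_c", "gene": "gene_2", "value": "n/a"},
]

by_database = expressed_fraction_by(records, "database")
by_gene = expressed_fraction_by(records, "gene")
print("by database:", by_database)
print("by gene:", by_gene)
```

In this toy data the expressed fraction swings from all to nothing depending on the database, while it varies much less from gene to gene. That is the uncomfortable pattern: the source of the answer explains more than the thing being asked about.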

So my question or my data is obviously wrong in some important way.

One important thing that is wrong is that anatomists disagree about the answer to the question that the kids thought was quite obvious. For example, is the olfactory bulb a part of the cortex? The answer matters a great deal, because the olfactory bulb is a very different structure from, say, the primary visual cortex: the bulbs express thousands of genes that register various odors, something that the rest of the cortex does not do.

If we look at all of those very hard-working individuals and ask them what the cortex is, the lines and borders they draw around the structure are remarkably different. We know this for some of them, so I have included their definitions (in terms of sub-parts and super-parts) below.

Looking at the table, we see that depending on the atlas (and associated database) the term cerebral cortex has very different sub-parts, such as the hippocampus, olfactory bulb, entorhinal and insular areas. Many don’t even agree on what the superstructure is, and some, like the Mouse Brain Library, don’t have the term at all. Yet nobody would argue that such a structure does not exist in the mouse; quite to the contrary, the fifth graders can point to it.
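
One way to put a number on this kind of disagreement is to treat each atlas's definition as a set of sub-parts and compare the sets. The Python sketch below does that with Jaccard overlap. The atlas names (atlas_a, atlas_b, atlas_c) and which sub-parts each one includes are invented for illustration; only the sub-part names themselves come from the text.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical definitions of "cerebral cortex"; these assignments are
# made up and do not describe any real atlas.
definitions = {
    "atlas_a": {"hippocampus", "olfactory bulb",
                "entorhinal area", "insular area"},
    "atlas_b": {"entorhinal area", "insular area"},
    "atlas_c": {"hippocampus", "insular area"},
}

for left, right in combinations(sorted(definitions), 2):
    overlap = jaccard(definitions[left], definitions[right])
    disputed = sorted(definitions[left] ^ definitions[right])
    print(f"{left} vs {right}: overlap={overlap:.2f}, disputed={disputed}")
```

The "disputed" list is the practical output: any gene expression record attached to one of those sub-parts will be counted as cortical by one source and not by the other.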

Now, before you run off thinking that this is some esoteric problem that only ontologists have to deal with, consider the following: these databases specify what they mean to a much greater degree than scientists do, but much of the data in the databases comes from primary research. A paper rarely includes enough information about the mental model the researcher is operating under (i.e., does the cerebral cortex, according to you, include the olfactory bulb?), and often the curator makes assumptions about what the researcher may mean by such a term because of whom they studied under or what they had published previously. In fact, even though some on Twitter apparently consider that methods sections may be too honest (see #overlyhonestmethods, it’s actually pretty funny), most data scientists are struggling to figure out what is meant by something as basic as the cerebral cortex.

The lack of agreement in science about some basic things, like how to cut up the brain so that “similar chunks” end up together under one label, is not new, but the proliferation of high-throughput biology and so-called “big data” is bringing these disagreements into focus as people try to make sense of the data that is generated. Will your physician know what your genome screen means if we can’t agree on where the sample was taken from? Should the scientific community decide that some brain parts are in fact immutable, like a hand or foot, so that data from one lab can be compared to the data from another lab? Well, some are trying, see and I hope that they succeed!