FISH.Link researchers from FWL, UoM and KCL came together on the 28th September to talk about requirements and data gathering. We used the data gathering exercise to help drive the requirements, and vice versa: our requirements also led us to look more closely at our data and to ask for more of it.
We wish to integrate various data about fresh water biology so that fresh water biologists can ask useful questions. In this first phase of FISH.Link, we wish to do some simple integration to see what we can do and how we might do it. A basic prototype is also a classic way of eliciting further requirements.
Very generally, our questions follow the form 'compare species in a range of locations that have various features'. Less abstractly, we might ask 'compare plant species in bodies of fresh water in Cumbria that are either still or running, above 1,000 metres, with a mean summer temperature of 3 degrees Celsius or higher'. Such a query (or one that makes more biological sense) would retrieve data that may answer a question in itself, but may also provide data for further analyses.
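To make the shape of such a query concrete, here is a minimal Python sketch. It assumes a hypothetical, already-integrated record per water body; the `WaterBody` class, its field names and the example values are illustrative placeholders, not drawn from any of the FISH.Link data sets. The query is a filter on location features followed by a per-location list of species to compare.

```python
from dataclasses import dataclass


# Hypothetical record shape: one row per water body, with the features the query filters on.
@dataclass(frozen=True)
class WaterBody:
    water_body_id: str
    county: str
    flow: str  # "still" or "running"
    altitude_m: float
    mean_summer_temp_c: float
    plant_species: frozenset[str]


def matching_water_bodies(bodies, county, flows, min_altitude_m, min_summer_temp_c):
    """Return the water bodies that satisfy every location feature in the query."""
    return [
        b for b in bodies
        if b.county == county
        and b.flow in flows
        and b.altitude_m > min_altitude_m
        and b.mean_summer_temp_c >= min_summer_temp_c
    ]


def compare_species(bodies):
    """Map each matching water body to its sorted plant species, ready for comparison."""
    return {b.water_body_id: sorted(b.plant_species) for b in bodies}


# Placeholder values only, not real measurements.
bodies = [
    WaterBody("wb-1", "Cumbria", "still", 1050.0, 3.5, frozenset({"species-x", "species-y"})),
    WaterBody("wb-2", "Cumbria", "running", 400.0, 9.0, frozenset({"species-z"})),
]
hits = matching_water_bodies(bodies, "Cumbria", {"still", "running"}, 1000.0, 3.0)
print(compare_species(hits))  # {'wb-1': ['species-x', 'species-y']}
```

The filter only works once the features (altitude, temperature, flow) sit alongside the species for each location, which is why integration comes first.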
We have various data sources:
* Major ions, lakes & tarns, 1982
* Major ions, Duddon & Windermere, 1982
* Tarns Database
* Ferry House Level Data
* Windermere temp
* Meto data for Cumbria
* RIVPACS
* BIOSYS
* National Park Population
In this first phase, we have decided to restrict ourselves to Cumbria, because our FWL partners know both the biology and the data for that area. We will expand later to check that what we are doing copes with more than just Cumbria.
To answer the types of question we want, we need to integrate two or more of these data sets. The data mix objects (species, geographical locations, bodies of water) with data about those objects (temperature, water level and so on) at various time points. All of these data are observations made with some sampling method.
We integrate on objects. Two data sets may share an object, such as a tarn, and say different things about it; integrating on that object brings all the data on that tarn together in one place. Ideally, we can then navigate or link through to other objects and expand or alter the query.
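Integrating on objects can be sketched as grouping records from several data sets under the identifier of the object they describe. This minimal Python version assumes every record already carries a shared `object_id` (a hypothetical field name); the data set names and values below are placeholders, not real FISH.Link data. Facts are kept per source rather than flattened, so it stays clear which data set said what.

```python
from collections import defaultdict


def integrate_on_object(*datasets):
    """Group records from several data sets by the object they describe.

    Each data set is a (source_name, records) pair; each record is a dict with a
    hypothetical 'object_id' key naming the shared object (e.g. a tarn).
    """
    merged = defaultdict(dict)
    for source_name, records in datasets:
        for record in records:
            obj = record["object_id"]
            # Keep each source's facts separate so we still know which data set said what.
            facts = {k: v for k, v in record.items() if k != "object_id"}
            merged[obj].setdefault(source_name, []).append(facts)
    return dict(merged)


# Placeholder data sets and values, for illustration only.
dataset_a = ("dataset_a", [{"object_id": "tarn-001", "calcium_mg_l": 1.2}])
dataset_b = ("dataset_b", [
    {"object_id": "tarn-001", "altitude_m": 450},
    {"object_id": "tarn-002", "altitude_m": 300},
])
result = integrate_on_object(dataset_a, dataset_b)
print(result["tarn-001"])
# {'dataset_a': [{'calcium_mg_l': 1.2}], 'dataset_b': [{'altitude_m': 450}]}
```

This only works when both data sets identify the tarn the same way; where they do not, a mapping between identifier schemes is needed first.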

A diagram was produced at this meeting and gives an overview of the data we have. One thing this exercise brought home was the observation-centric nature of all these data. Observations, of course, also bring in attribution; an observation was made by someone or some project that needs attribution.
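One way to capture this observation-centric view is to make the observation itself the unit of data, carrying the object, the time point, the sampling method and the attribution together. The Python sketch below is one possible shape under that assumption; the `Observation` class, its field names and the example values are hypothetical and are not taken from the meeting diagram.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One observed value about one object, with the context needed to trust and credit it."""
    object_id: str        # what was observed, e.g. a tarn or a species
    property_name: str    # what was measured, e.g. "water_temperature_c"
    value: float
    observed_on: date     # when the observation was made
    sampling_method: str  # how it was made
    attributed_to: str    # the person or project to credit


def attributions(observations, object_id):
    """List, without duplicates, who should be credited for the data held about one object."""
    return sorted({o.attributed_to for o in observations if o.object_id == object_id})


# Placeholder observations, not real measurements.
obs = [
    Observation("tarn-001", "water_temperature_c", 4.0, date(1982, 7, 1), "method-a", "project-a"),
    Observation("tarn-001", "calcium_mg_l", 1.2, date(1982, 7, 1), "method-b", "project-b"),
    Observation("tarn-002", "calcium_mg_l", 0.8, date(1982, 7, 2), "method-b", "project-b"),
]
print(attributions(obs, "tarn-001"))  # ['project-a', 'project-b']
```

Keeping attribution on every observation means that any integrated view can still report where each value came from.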
We can see that there are many object types upon which we might integrate:
* species;
* locations;
* bodies of water;
* and many more.
The next step is to check how each of these objects is identified. If all the data sets use a common scheme, then integration becomes easier; if not, then mappings will have to be made. This is why we are putting these kinds of (admittedly naive) queries to the data sets: to explore the data and how it works.
This initial exploration also helps us find out what our vocabulary needs are.
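Where data sets identify the same object in different ways, a mapping can be kept as a simple crosswalk from each data set's own identifier to a shared one. The Python sketch below assumes a hypothetical canonical identifier per object; the scheme names and identifiers are invented for illustration. It refuses conflicting mappings and returns `None` for identifiers that have not been mapped yet, which is itself useful when exploring the data.

```python
def build_crosswalk(pairs):
    """Build a lookup from (scheme, local_id) to a shared canonical identifier."""
    crosswalk = {}
    for scheme, local_id, canonical_id in pairs:
        key = (scheme, local_id)
        # The same local identifier must not point at two different objects.
        if key in crosswalk and crosswalk[key] != canonical_id:
            raise ValueError(f"conflicting mapping for {key}")
        crosswalk[key] = canonical_id
    return crosswalk


def to_canonical(crosswalk, scheme, local_id):
    """Translate a data set's own identifier; None means no mapping has been made yet."""
    return crosswalk.get((scheme, local_id))


# Hypothetical schemes and identifiers, for illustration only.
cw = build_crosswalk([
    ("scheme_a", "T17", "tarn-001"),
    ("scheme_b", "B-42", "tarn-001"),
])
print(to_canonical(cw, "scheme_b", "B-42"))  # tarn-001
print(to_canonical(cw, "scheme_b", "B-99"))  # None
```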