FISH.link is drawing to a close, and it's time to reflect on the outputs of the project. Our key output has been the development of tooling that extends the workflow identified in the FISHNet project, supporting the conversion of datasets to RDF. FISHNet allows datasets to be decorated with metadata, but not at a level of detail that supports annotation of individual columns. It can thus be hard to understand what a particular column in a dataset means. This is even more of an issue when datasets are to be processed by machine or linked together. As a simple example, datasets may well use terms such as "MASL", "m.a.s.l.", "Map_alt_m" and "altitude", all referring to the same measurement (height above sea level). This use of different terms is often due to established working practices within communities.
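To make the terminology problem concrete, here is a minimal sketch that maps the header variants mentioned above onto a single term. The synonym table, the canonical name `altitude_m` and the function `canonical_column` are illustrative assumptions, not part of the FISH.link tooling.

```python
# Hypothetical synonym table: header variants seen in different datasets,
# all of which mean "height above sea level".
ALTITUDE_SYNONYMS = {"masl", "m.a.s.l.", "map_alt_m", "altitude"}


def canonical_column(header: str) -> str:
    """Return an illustrative canonical name for a header, or the header unchanged."""
    key = header.strip().lower()
    if key in ALTITUDE_SYNONYMS:
        return "altitude_m"
    return header


headers = ["Site", "MASL", "m.a.s.l.", "Map_alt_m", "altitude"]
normalised = [canonical_column(h) for h in headers]
```

A hard-coded table like this does not scale across communities, which is one reason to attach the meaning to each column as metadata instead.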
The FISH.link tools (working title WaterColumn) have been developed to address these issues, allowing additional column-level information to be associated with a dataset. The tools have been developed specifically to work with spreadsheet data. Spreadsheets are commonly used to capture or record data, and there is a multitude of tools (commercial, free and open source) that can handle and manipulate formats such as CSV. A number of the datasets targeted during the project were already in spreadsheet form, or could easily be massaged into such a format. Our experience in other projects supporting scientists (for example, the development of tools such as RightField and Populous) reinforces this use of spreadsheets.
The tool allows a user to annotate a spreadsheet with column-level metadata that provides additional information about the data contained in each column, including the type of item, the type of data and any associated units. These annotations make use of terms from shared vocabularies. Annotated spreadsheets can then be converted to RDF using additional tooling (based on the open source XLWrap spreadsheet-to-RDF wrapper). The conversion process validates that the metadata conforms to the vocabulary. Converted datasets can then be uploaded to a triple store and queried via SPARQL or published as Linked Data.
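As a rough illustration of what a column annotation and its validation might look like, the sketch below models one annotation as a small record and checks it against a vocabulary. The vocabulary contents, the `ColumnAnnotation` class and the `validate` function are all hypothetical; the actual tools store annotations in the spreadsheet and validate them during conversion.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shared vocabulary: the allowed terms for each annotation field.
VOCABULARY = {
    "item_type": {"Lake", "Species", "Altitude"},
    "data_type": {"string", "integer", "decimal"},
    "unit": {"metre", "count", None},  # None means "no unit"
}


@dataclass(frozen=True)
class ColumnAnnotation:
    column: str                 # header as it appears in the spreadsheet
    item_type: str              # what kind of thing the column describes
    data_type: str              # how the values should be read
    unit: Optional[str] = None  # unit of measurement, if any


def validate(annotation: ColumnAnnotation) -> list:
    """Return a list of error messages; an empty list means the annotation conforms."""
    errors = []
    for field in ("item_type", "data_type", "unit"):
        value = getattr(annotation, field)
        if value not in VOCABULARY[field]:
            errors.append(f"{annotation.column}: {field}={value!r} is not in the vocabulary")
    return errors


good = ColumnAnnotation("MASL", "Altitude", "decimal", "metre")
bad = ColumnAnnotation("MASL", "Altitude", "decimal", "feet")
```

Here `validate(good)` returns an empty list, while `validate(bad)` reports the unrecognised unit.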
The tools are primarily aimed at working freshwater biologists without particular technical expertise. Such users are comfortable with the use of spreadsheet based tools, but would not be expected to be familiar with, for example, the tooling required to convert datasets to RDF.
Integration with the FISHNet repository has been achieved through the use of Fedora PIDs in the tooling.
The picture below shows the basic workflow supported by the tool.

- A spreadsheet is pulled from the repository in CSV, OpenOffice or Excel format. The tool adds empty dropdown selections and embeds the vocabulary in the sheet. The new Excel sheet is placed in the repository.
- A domain expert pulls the amended sheet from the repository and annotates the columns with appropriate vocabulary terms. A reviewer can review the process and enhance the vocabulary if necessary. The result is an annotated Excel sheet, which is pushed back to the repository.
- The annotated sheet is passed to the tool, which generates an XLWrap mapping file.
- XLWrap converts the annotated sheet to RDF using the mapping file. The RDF is added to the triple store.
- SPARQL queries against the triple store produce (integrated) tabular data for analysis.
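The conversion step in the workflow above can be pictured with a small stand-in. The sketch below does not use XLWrap or its mapping format; it only shows, in plain Python, how annotated columns might turn into RDF statements (written as N-Triples). The namespace, the sample data, the annotation table and the function name are assumptions made for illustration.

```python
import csv
import io

BASE = "http://example.org/fishlink/"  # hypothetical namespace

# Hypothetical annotations: spreadsheet header -> vocabulary term.
ANNOTATIONS = {"MASL": "altitude_m", "Richness": "speciesRichness"}

# Invented sample data standing in for an annotated sheet exported as CSV.
CSV_TEXT = """Site,MASL,Richness
lake1,120,14
lake2,455,9
"""


def rows_to_ntriples(csv_text, key_column, annotations):
    """Emit one N-Triples line per annotated cell, using the key column as subject."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}site/{row[key_column]}>"
        for header, term in annotations.items():
            # Escape backslashes and quotes so the literal stays well formed.
            value = row[header].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}{term}> "{value}" .')
    return lines


triples = rows_to_ntriples(CSV_TEXT, "Site", ANNOTATIONS)
```

Each annotated cell becomes one statement about its site, so columns from different sheets that share a vocabulary term end up under the same predicate.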
This triple store/RDF-based approach enabled us to integrate a number of data sources. The sources were chosen to support our prime case study, which focused on a paper by Jones, Li and Maberly investigating relationships between species richness and altitude. The analysis in the paper requires information from different data sources to be combined. SPARQL queries can now be used to pull out "wide tables" of data for further analysis. A key point to note here is that the analysis is still left in the hands of the scientist: the FISH.link infrastructure is simply about supporting the integration and extraction of the data.
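To show what a "wide table" means here, the following sketch pivots statements from two sources into one row per site. It works on plain Python tuples rather than a real triple store, and the data values, property names and the `wide_table` function are invented for illustration; in practice a SPARQL query would do this job.

```python
from collections import defaultdict

# Invented statements from two sources, reduced to (site, property, value).
altitude_source = [("lake1", "altitude_m", 120.0), ("lake2", "altitude_m", 455.0)]
richness_source = [("lake1", "species_richness", 14), ("lake2", "species_richness", 9)]


def wide_table(triples, columns):
    """Pivot (subject, property, value) triples into one row per subject."""
    by_subject = defaultdict(dict)
    for subject, prop, value in triples:
        by_subject[subject][prop] = value
    # A property missing for a subject becomes None in that row.
    return [
        (subject, *(props.get(col) for col in columns))
        for subject, props in sorted(by_subject.items())
    ]


table = wide_table(altitude_source + richness_source, ["altitude_m", "species_richness"])
```

The resulting rows pair altitude with species richness for each site, ready to be handed to whatever analysis tool the scientist prefers.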
In the final weeks of the project we hope to complete the setup of a repository at FBA containing the converted datasets. The tools will also be made available via the project web site.