The OpenKnowledge eHealth Use Case

Using the OpenKnowledge system to ease the re-use of algorithms written by biomedical and biological scientists working in the proteomics domain

By: George Anadiotis, Paolo Besana, David Dupplaw, Dietlind Geldoff, Frank van Harmelen, Spyros Kotoulas, Adrian Perreau de Pinninck, Dave Robertson and Ronny Siebes. June 2007.

General Description
The number of different proteins present in a given tissue (e.g. human liver cells) under given conditions (e.g. after intake of alcoholic beverages) can easily reach several hundred or more. Characterising this set of proteins, i.e. identifying as many of the proteins present as possible and considering what else is known about each of them, is crucial for biologists trying to understand the underlying regulation and adaptation of the biological system in question (here, the human liver). A technologically advanced strategy for characterising proteins on a large scale involves fragmenting the proteins and using mass spectrometric analysis to determine the amino acid sequence of each fragment. Biologists often refer to this technique as “proteomics”. To identify each protein, the fragments are compared with the sequences stored in centrally maintained databases. This is done either in-house or via WWW servers, and the chances of identification vary with the quality of the samples subjected to mass spectrometry. Given this limitation, proteomics experts express great interest in gaining access to data resulting from their colleagues’ research to help them with their own analyses. Interestingly, even data that some researchers discarded as being of no further use could be of interest to others. In this case study, we want to ease cooperation between researchers in the proteomics domain by developing a system that allows them to share, invoke and publish workflow descriptions and services in an open way.
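The identification step described above, in which fragment sequences are compared with database entries, can be illustrated with a minimal sketch. It assumes exact substring matching, which is a deliberate simplification: real search tools score inexact matches and work with noisy spectra. The function name, protein identifiers and sequences are invented for illustration and do not come from the OpenKnowledge system or any real database.

```python
def identify_fragments(fragments, database):
    """Map each fragment to the sorted IDs of database proteins containing it.

    Assumption: a fragment "matches" a protein when it occurs as an exact
    substring of the stored amino acid sequence.
    """
    hits = {}
    for fragment in fragments:
        hits[fragment] = sorted(
            protein_id
            for protein_id, sequence in database.items()
            if fragment in sequence
        )
    return hits


# Toy data, invented for this example (not real database entries).
database = {
    "PROT_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "PROT_B": "MALWMRLLPLLALLALWGPDPA",
}
fragments = ["AKQRQ", "LLPLL", "WWWWW"]

result = identify_fragments(fragments, database)
for fragment, protein_ids in result.items():
    print(fragment, "->", protein_ids or "no match")
```

A fragment with no match, such as the last one here, is the kind of result that a colleague's shared data or services might help to resolve.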

The problem
The pool of potentially available knowledge about proteomics on the Internet is huge. It is fed by the traditional Web, by application programs feeding data onto the Web, by Web services accessed through various forms of application interface, by devices that sense the physical environment, and so on. It is consumed in a wide variety of ways and by diverse mechanisms (and, of course, consumers may also be suppliers). Proteomics experts express great interest in gaining access to the knowledge resulting from their colleagues’ research to help them with their own analyses. Interestingly, even data that some researchers discarded as being of no further use could be of interest to others. Current solutions do not make it easy to re-use the knowledge provided by peers in the field.

The solution
The aspiration of the FP6-funded OpenKnowledge project is to allow knowledge to be shared freely and reliably, regardless of the source or consumer. Reliability here is interpreted as a semantic issue. The Internet is in the fortunate situation that physical and syntactic reliability have been solved to a satisfactory degree, making semantic reliability the main challenge. Semantic reliability means that the meaning ascribed to knowledge fed into the pool is preserved adequately for the purposes of its consumers.
Of course, such “open knowledge sharing” is an aspiration that we know to be unattainable in the strong sense, where all knowledge supplied can be consumed with perfect freedom and reliability. Globally consistent common knowledge is impossible to guarantee in an asynchronous distributed system. The good news is that only a small proportion of the pool of available knowledge will be of use to any given consumer, since each must have an upper limit on how much knowledge it can process. A pragmatic aim of open knowledge sharing, then, is to obtain knowledge appropriate to the activities in which each consumer wants to engage, while maintaining free and (adequately) reliable connections between suppliers and consumers. By building a system, we demonstrate that, in the proteomics scenario, it is possible to share workflows and services at very low cost to consumers and suppliers. The novelty of this system is that each interchange of knowledge is made in the context of the (shared) workflow descriptions. We then address the (unavoidable) tasks of ontology mapping, query routing, etc. using algorithms that are comparatively simple because they can (at no additional cost) use knowledge about the structure of the interaction and the ways in which it has been performed (successfully or unsuccessfully) within a peer group.

Key Benefits of Using Semantic Web Technology
Key benefits for the OpenKnowledge project include:
• Distributed localisation of experts, Web services and workflows by semantically enabled discovery algorithms
• A system in which people can develop, share and visualise, via different user interfaces, the semantic descriptions of the resources (experts, services and workflows)
• A shared point of access for all people interested in any field, including the proteomics domain
• Re-use of mappings between terminologies used in different domains, enabled by Semantic Web technology
