This Page is an Archive of an old discussion - June 11th 2010
Dear BioPAX community,
The excitement, interest and community around BioPAX have recently been growing, leading to an expansion of the requirements and goals for BioPAX. Now is a good time to state these explicitly, both for the long and short term, and to define in some detail the steps for getting from here to there.
An outcome of the recent (Nov 2005) BioPAX meeting in Tokyo was that there is enough interest in the community to pursue two different sets of requirements. One is facilitating exchange of existing pathway knowledge between data producers and consumers, and integration of pathway information from multiple sources into a more comprehensive whole. The other is providing a more computationally powerful representation of pathway knowledge that makes use of the rich sets of tools being developed in the semantic web community and facilitates logical computation, automatic reasoning and discovery. Both of these goals are highly valuable and closely related, and we should pursue both.
To fulfill the first requirement, we need an ontology-based common format for increasing the accessibility of existing pathway data for data exchange, integration, visualization and analysis. This data exchange (DX) track will provide a common data schema built on top of existing representations. This schema is already quite mature and can be completed for general use in the near future. To fulfill the second requirement, we need to develop the ability to use existing and future semantic web tools to reason over and query BioPAX pathway models. This second track, the semantic web (SW) track, promises to add powerful new uses to BioPAX, such as automatic reasoning that facilitates seamless, more robust and more efficient data integration. As these tools are still in dynamic development, we have to do more research to determine which features are currently supported by OWL and/or OWL tools, which tools will become available in the short and medium term, the level of effort for data providers to map to a modified representation, and a realistic time plan for implementation.
As there is a considerable difference between the two tracks in terms of timing and level of development, we can benefit from running two parallel tracks, provided both have clear goals and practical procedures for reaching the goals.
Here, as a next step in defining the process, we propose a work plan for the "DX" track.
A SW track workplan will also need to be developed and presented in an additional document.
Although this work plan is ambitious, we believe that it is feasible. By addressing roles, communication and decision making in some detail, we expect to achieve an effective and productive process.
Please let us know what you think and whether you will be able to play an active role over the next few months.
Chris Sander, Chris Lemer, Gary Bader, Emek Demir (6 December 2005)
Motivation and Scope
The next major goal of this work plan is to extend the capabilities of the existing BioPAX ontology to cover data available in signaling databases, as previously planned in the BioPAX roadmap. The scope is best focused on currently available types of data and on the desire to accommodate such data in the short term. The next release will by no means be final, and may eventually undergo substantial revision and/or be replaced by more expressive ontologies developed in the SW track.
This work plan aims to cover four additional types of biological phenomena:
- molecular states
- generic physical entities
- gene regulation
- genetic interactions.
Each of these phenomena will be covered in independent subproposals, each with its own time plan. The suggested overall deadline for the release candidate is March 2006.
As we are basing our decision making on short-term goals, stakeholders of this track are community members who need to provide and consume data in the short term. Any person can be a stakeholder in any of the proposals, provided that they meet the criteria and make the necessary commitment as defined below. We define two types of stakeholders:
Data Providers are parties who already have a substantial amount of data and are willing to export it to this level. Data providers should make a commitment to take part in the requirements phase by setting the requirements for their data and the testing phase by writing prototype exporters and giving feedback, within specified deadlines.
Data Consumers are parties who have existing tools/prototypes and are willing to import the available data. Data consumers should make a commitment to take part in the requirements phase by setting the requirements for their tools and the testing phase by writing importers and giving feedback, within specified deadlines.
We propose to collect requirements that can be implemented and tested within a three-month period. Thus it is essential that each stakeholder participate actively in the requirements and testing phases. A person/group can be both a data consumer and a data provider.
Participants and Roles
Any person who is a stakeholder or represents a stakeholder can be a participant. Each stakeholder should be represented by at least one participant for requirements and testing phases.
Roles define the extra responsibilities of participants in each proposal. There are four different roles. A participant can have more than one role, and each role is typically carried out by several participants. Roles are self-assigned on a voluntary basis; however, volunteers are expected to commit to a substantial amount of effort.
Editor: Write proposal, provide OWL implementation and worked examples.
Reviewer: Review each step of proposal and give feedback.
Domain Analyst: Analyze the represented database/tool and provide requirements.
Tester: Test the proposal by providing worked examples, exports, software prototypes.
We will use the proposal template in the BioPAX wiki as our development methodology for each proposal.
Requirements Selection and Prioritization
Major requirement: requested by at least one data consumer and two data providers. The proposals should attempt to cover all major requirements, prioritized based on the number of data providers and consumers stating this requirement.
Minor requirement: does not meet the above criteria. Only covered if there is no major objection, it does not conflict with a major requirement, and it is straightforward to implement.
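As an illustration only, the major/minor rule above can be written down as a small Python sketch. The `Requirement` class, the function names and the request counts are hypothetical and not part of the work plan. Ranking by the sum of providers and consumers is one possible reading of "prioritized based on the number of data providers and consumers"; the plan does not specify how the two counts are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Requirement:
    """A requested feature with hypothetical request counts."""
    name: str
    providers: int  # number of data providers requesting it
    consumers: int  # number of data consumers requesting it


def is_major(req: Requirement) -> bool:
    # Major: requested by at least one data consumer and two data providers.
    return req.consumers >= 1 and req.providers >= 2


def prioritize(reqs: List[Requirement]) -> List[Requirement]:
    # Assumption: priority is the total number of requesting stakeholders.
    major = [r for r in reqs if is_major(r)]
    return sorted(major, key=lambda r: r.providers + r.consumers, reverse=True)


if __name__ == "__main__":
    # Placeholder requirements with made-up counts, for illustration only.
    requests = [
        Requirement("requirement A", providers=2, consumers=1),
        Requirement("requirement B", providers=3, consumers=2),
        Requirement("requirement C", providers=1, consumers=4),
    ]
    for req in prioritize(requests):
        print(req.name, req.providers + req.consumers)
```

Under this reading, a requirement backed by many consumers but only one provider (requirement C here) remains minor, and would be covered only under the conditions stated for minor requirements.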
Implementation and Testing Strategy
After a requirements freeze, editors of each proposal work in short cycles, creating proposals and releasing them to the reviewers for feedback. The iteration ends when all reviewers are satisfied, and a beta is released to the testers. A proposal is further improved based on tester input, resulting in the first stable proposal. After all proposals are stabilized, they are merged and further tested, forming a release candidate, which is posted on the BioPAX.org website. After further general testing and review, and checking for backwards compatibility (possibly in a hackathon and F2F meeting), the candidate is released as the first stable version.