From BiopaxWiki

Frequently Asked Questions About BioPAX


  • Why does BioPAX use OWL (which is based on RDF)?
    • An early vote resulted in the decision to use OWL instead of XML Schema to represent BioPAX. The main reasons were:
      • The ability to describe a class hierarchy
      • The ability to use the same language for the ontology and the data exchange format
      • Compatibility with emerging technologies, such as the Semantic Web, that some users are interested in
      • There was a promise that many tools for OWL would quickly become available to match the existing tools for XML Schema. As of 2005, these tools were still maturing.
      • OWL has a standard XML syntax (RDF/XML), so XML tools would still be useful for parsing OWL
  • Why does BioPAX have a simple ontology using a limited number of classes instead of creating many specialized classes?
    • Early on, it was decided that BioPAX would not try to create many specialized classes for the following reasons:
      • It is difficult for a group of people to agree on how to classify things as you get into more specific classes, but it is usually relatively easy to get agreement on a general classification scheme. For instance, should small molecules be specialized into aromatic and aliphatic compounds, or into alcohols, phenols, carboxylates, etc.? Few people would disagree with the need to represent small molecules in BioPAX, but many would disagree with any one subclassification.
      • Fewer classes make BioPAX more general and useful to a wider range of users. For instance, including eukaryotic cellular locations directly in BioPAX would make it more difficult to use for those interested only in prokaryotic biology. On the other hand, if a single, simple cellular location class is used for all cellular locations (with another ontology referenced to describe them), BioPAX is equally useful to prokaryotic and eukaryotic biologists.
      • It is difficult to build, maintain and document a large class structure, so fewer classes mean faster development.
  • In the OWL community, it is possible to represent data using classes or instances. Why does BioPAX use instances for describing all data instead of classes?
    • Instances are more of a natural fit with most database and programming technologies and thus are more intuitive to most data providers and users.
    • Instances are much more compact to write out and are easier for humans and tools to read. See the class vs. instance representation example.
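To make the compactness point concrete, the sketch below writes one hypothetical fact ("protein45 is a protein named p53") both ways: as an OWL individual, and as an OWL class constrained by an `owl:hasValue` restriction, which is only one of several possible class-based encodings. The `bp` namespace URI, the `NAME` property and the identifiers are illustrative assumptions, not copied from a BioPAX release. The code simply parses both snippets with Python's standard `xml.etree.ElementTree` and counts their elements.

```python
import xml.etree.ElementTree as ET

# Shared namespace declarations; the bp namespace URI is an assumption.
NS_DECL = (
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" '
    'xmlns:owl="http://www.w3.org/2002/07/owl#" '
    'xmlns:bp="http://www.biopax.org/release/biopax-level2.owl#"'
)

# Instance style: one individual of class bp:protein carrying a NAME value.
INSTANCE_STYLE = f"""<rdf:RDF {NS_DECL}>
  <bp:protein rdf:ID="protein45">
    <bp:NAME>p53</bp:NAME>
  </bp:protein>
</rdf:RDF>"""

# Class style (hypothetical encoding): a new subclass of bp:protein whose
# members are required to have the NAME value "p53".
CLASS_STYLE = f"""<rdf:RDF {NS_DECL}>
  <owl:Class rdf:ID="protein45">
    <rdfs:subClassOf rdf:resource="http://www.biopax.org/release/biopax-level2.owl#protein"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.biopax.org/release/biopax-level2.owl#NAME"/>
        <owl:hasValue>p53</owl:hasValue>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def element_count(rdf_xml: str) -> int:
    """Count every element in the document, including the rdf:RDF root."""
    return sum(1 for _ in ET.fromstring(rdf_xml).iter())


if __name__ == "__main__":
    print("instance style:", element_count(INSTANCE_STYLE), "elements")
    print("class style:   ", element_count(CLASS_STYLE), "elements")
```

With these particular snippets the instance form needs 3 elements and the class form 7; the exact ratio depends on the encoding chosen, but the instance form stays closer to a database row or a programming-language object.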
  • Is an XML Schema or DTD available for BioPAX files?
    • No, but anyone is very welcome to develop one! BioPAX is described in OWL, which is an extension of RDF. Both RDF and OWL have a standard XML syntax (RDF/XML), but there are three main differences between OWL/RDF and classical XML (DTD- and XML Schema-based):
      • 1. RDF is a network (graph) by default, while typical XML is hierarchical by default. The RDF graph is tied together by RDF IDs, e.g. <bp:protein rdf:ID="protein45">. You define a protein once, assign it an RDF ID that is unique within the document, and reference it (e.g. rdf:resource="#protein45") every time you use it. In XML, by default you would define the protein again each time you use it. The tradeoff is that RDF is maximally non-redundant by default, while XML is maximally redundant by default; as a result, RDF is more difficult to read and edit, while XML is easier to read and edit. Any XML Schema for RDF must take into account the RDF ID attribute on most elements, which is relatively easy.
      • 2. OWL/RDF does not require any particular order for its elements, while XML Schema typically does (for example, through xs:sequence). An XML Schema describing RDF would therefore be limited to a single ordering and could not be used to validate RDF in general, though it would still be useful for generating RDF.
      • 3. OWL describes a number of types of logical restrictions on classes and properties that are not found in XML Schema (though many, such as cardinality, are). Conversely, XML Schema describes restrictions that are not available in OWL, e.g. regular expressions that attribute and element values must match to be valid.
      • An XML Schema for BioPAX would therefore either have to describe BioPAX RDF directly, taking the above points into account, or describe the classes, properties and restrictions of BioPAX in the normal XML Schema manner, with documents interconverted to and from OWL/RDF.
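Points 1 and 2 above can be sketched with nothing more than Python's standard `xml.etree.ElementTree`, which also shows that ordinary XML tools can read RDF/XML. The sketch assumes a hypothetical, simplified document in which one protein is defined once and referenced from two interactions; the `bp` namespace URI, the `PARTICIPANTS` and `NAME` property names, and the helper functions `index_by_id`, `resolve`, `triples` and `reversed_copy` are illustrative assumptions, and the document is not guaranteed to validate against any BioPAX level. The helpers follow `rdf:resource="#protein45"` back to the single `rdf:ID="protein45"` definition, then compare the set of triples before and after reversing the node order. This toy only handles top-level nodes carrying `rdf:ID`; a full RDF parser (for example, rdflib) is needed for real files.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level2.owl#"  # assumed namespace URI

# Hypothetical, simplified document: protein45 is defined once and referenced
# twice via rdf:resource="#protein45". The structure is illustrative only.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:protein rdf:ID="protein45">
    <bp:NAME>p53</bp:NAME>
  </bp:protein>
  <bp:interaction rdf:ID="interaction1">
    <bp:PARTICIPANTS rdf:resource="#protein45"/>
  </bp:interaction>
  <bp:interaction rdf:ID="interaction2">
    <bp:PARTICIPANTS rdf:resource="#protein45"/>
  </bp:interaction>
</rdf:RDF>"""

# ElementTree stores namespaced attribute names as "{namespace}local".
RDF_ID = f"{{{RDF}}}ID"
RDF_RESOURCE = f"{{{RDF}}}resource"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1]


def index_by_id(rdf_xml: str) -> dict:
    """Map each top-level node's rdf:ID to its element (hypothetical helper)."""
    return {node.get(RDF_ID): node for node in ET.fromstring(rdf_xml)}


def resolve(index: dict, ref: str) -> ET.Element:
    """Follow a same-document reference such as '#protein45' to its definition."""
    return index[ref.lstrip("#")]


def triples(rdf_xml: str) -> set:
    """Flatten top-level nodes into (subject, property, object) triples.

    Returning a set discards element order, mirroring RDF's order-independence.
    """
    result = set()
    for node in ET.fromstring(rdf_xml):
        subject = node.get(RDF_ID)
        result.add((subject, "type", local_name(node.tag)))
        for prop in node:
            ref = prop.get(RDF_RESOURCE)
            # A reference points at another node; otherwise take the literal text.
            value = ref if ref is not None else (prop.text or "").strip()
            result.add((subject, local_name(prop.tag), value))
    return result


def reversed_copy(rdf_xml: str) -> str:
    """Re-serialize the document with its top-level nodes in reverse order."""
    root = ET.fromstring(rdf_xml)
    children = list(root)
    for child in children:
        root.remove(child)
    root.extend(children[::-1])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    index = index_by_id(DOC)
    for interaction_id in ("interaction1", "interaction2"):
        ref = index[interaction_id].find(f"{{{BP}}}PARTICIPANTS").get(RDF_RESOURCE)
        print(interaction_id, "->", resolve(index, ref).find(f"{{{BP}}}NAME").text)
    print("same graph after reordering:", triples(DOC) == triples(reversed_copy(DOC)))
```

Both interactions resolve to the same single definition of protein45, and the triple set is unchanged by reordering. An XML Schema that fixed the node order with xs:sequence would typically accept only one of those two serializations, even though they describe the same graph.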
  • Is a UML model available for BioPAX files?