RDF-IDs in BioPAX


This page is not yet ready for prime-time.

Although the idea of this page is general, the examples are specific to BioPAX Level 2. For more information, see the L4Workgroups pages, such as the Semantic Web or Validation ones (Semantic_web/linking/CVs#URIs.26CVs, URI-CV_Discussion, etc.).

Main points I want to make here:

  1. RDF-IDs are required for all instances in OWL.
  2. They are not needed for data exchange, however.
    A. A data producer labels an xref with an RDF-ID; the data consumer throws that ID away when integrating the data into their own database.
  3. In early uses of BioPAX, RDF-IDs have not been stable (there was no need).
    A. E.g. a particular xref would likely get a different RDF-ID after every data conversion.
    B. Storing RDF-IDs for the numerous instances created during conversion to BioPAX would be a serious burden to users.
  4. Stable RDF-IDs are required for the Semantic Web.
    A. IDs need to be stable in order to create links between different pieces of information.
    B. As the Semantic Web develops, we will likely face increased demand for stable RDF-IDs.

Possible solutions:

  1. Create stable RDF-IDs for entities
    A. Entities (e.g. proteins, pathways, etc.) are likely to have primary keys in their native DBs
    B. Users could create an MD5 hash of each primary key and use it as the RDF-ID
  2. Subordinate utility class IDs to entity IDs
    A. E.g. if the primary ID is "protein1", a utility class ID could be "protein1.unificationXref.Swiss-Prot.P20425"
      1. Or the MD5 of that string
    B. Not sure if this would work in all cases...
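
The two proposals above could be sketched roughly as follows. This is only an illustration, not an agreed BioPAX convention: the function names (stable_rdf_id, utility_class_id) and the "id_" prefix are assumptions made up for this example. The prefix is there because an rdf:ID value has to be a valid XML name, which cannot start with a digit, while an MD5 hex digest often does.

```python
import hashlib


def stable_rdf_id(primary_key: str, prefix: str = "id_") -> str:
    """Derive a repeatable RDF-ID from a native database primary key.

    The same key always gives the same ID, so nothing has to be stored.
    The prefix (an assumption of this sketch) keeps the result a valid
    XML name even when the hex digest starts with a digit.
    """
    digest = hashlib.md5(primary_key.encode("utf-8")).hexdigest()
    return prefix + digest


def utility_class_id(entity_id: str, utility_class: str, db: str, accession: str) -> str:
    """Build a utility class ID subordinate to its owning entity's ID.

    The parts are joined with dots, following the example in the list above.
    """
    return ".".join([entity_id, utility_class, db, accession])


# Readable form of the xref ID from the example above
xref_id = utility_class_id("protein1", "unificationXref", "Swiss-Prot", "P20425")
print(xref_id)

# Hashed forms: fixed length, stable across repeated conversions
print(stable_rdf_id("protein1"))
print(stable_rdf_id(xref_id))
```

One case where the dotted form may not work cleanly: if a primary key or accession itself contains a dot, the readable ID can no longer be split back into its parts unambiguously. Hashing the whole string avoids that, but the ID is then no longer human-readable.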