1. Goal: refactor APWeb relatively quickly with the expressed intent that its content could be retrieved easily by both human and machine queries. For example, we may want to grab raw content as attributed annotations in iPG2P services.
2. Solution: there are many web service and semantic web service models. So instead of getting into all of that, here's a "quick and easy" 100% RESTful model:
2a. Refactor existing APWeb content so that each taxon's entry is a single web page. That basically follows the current inter-taxon organization.
2b. For intra-taxon organization (i.e., the mostly unstructured content on each page), standardize the web page outline into a template (i.e., standardized sections applicable to virtually all taxon entries).
2c. Associate a light-weight ontology w/ each section heading, sub-heading, etc. in the template. Call this the "Angiosperm Phylogeny Ontology" or something like that. (Note there is already an Ascomycete Phenotype Ontology (APO), so you might want to pick a name whose acronym is not yet in use. See http://bioportal.bioontology.org and www.obofoundry.org for portals to the many popular ontologies.)
2d. Fill in the template with content for each taxon
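One way to picture steps 2b-2d is a small data structure that holds the template and ties each section heading to an ontology term. This is only a sketch under stated assumptions: the ontology namespace URL and the section names other than "Name" are hypothetical placeholders, not part of the current APWeb content.

```python
from dataclasses import dataclass, field

# Hypothetical namespace and section list; the real values would come from
# the APWeb ontology once it has been defined.
ONTOLOGY_NS = "http://ml.apweb.org/ontology#"
TEMPLATE_SECTIONS = ("Name", "Description", "Distribution")


@dataclass
class TaxonEntry:
    """One taxon page: a fixed template whose sections map to ontology terms."""
    taxon: str
    sections: dict = field(default_factory=dict)

    def set_section(self, heading, text):
        # Reject headings that are not part of the standardized template.
        if heading not in TEMPLATE_SECTIONS:
            raise KeyError(f"unknown template section: {heading}")
        self.sections[heading] = text

    def term_uri(self, heading):
        # Each template heading corresponds to exactly one ontology term.
        return ONTOLOGY_NS + heading

    def missing_sections(self):
        # Template sections that still need content (step 2d).
        return [h for h in TEMPLATE_SECTIONS if h not in self.sections]
```

Listing the missing sections per taxon gives a simple progress check while the existing pages are being migrated into the template.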
3. Here's the key to making it easily retrievable by both humans and machines:
3a. Associate a unique URL w/ each taxon page of the form:
For example, if the <APWebDomain> is www.apweb.org (used here as an example; that's not the current domain name), then the page for Acorales is http://www.apweb.org/Acorales. Return content in HTML.
3c. On dereferencing the template sections of 3b (e.g., http://www.apweb.org/Acorales/Name), simply return the content in plain text. No HTML. For images, return the binary and set the MIME-type accordingly.
3d. To annotate with semantic mark-up, associate a unique URL with the major ontological terms following the above naming conventions but with a different subdomain; e.g., http://ml.apweb.org/Acorales/Name. Here, the subdomain "ml" is short for "markup language". The 'ml' and 'www' subdomains should be organizationally parallel.
The use of a separate subdomain simplifies your web server handling while avoiding polluting URLs with technology suffixes. (For example, using http://www.apweb.org/Acorales/Name.owl for .owl versions is perhaps attractive, but it breaks good practice because it binds an implied technology implementation to a URL.)
3e. On dereferencing 3d, return semantically annotated entries in OWL serialized as RDF/XML. The content should be identical to the plain text content returned in 3c; the only difference is that the content from the ml subdomain is valid RDF and OWL using the APWeb ontology. For the taxon-level URLs of 3a on the ml subdomain (e.g., http://ml.apweb.org/Acorales), the content is the concatenation of the various template sections, again in RDF/XML OWL rather than HTML.
3f. Put a nice home page on the package. Have the nav bar dereference the links above w/ HTTP GETs so that the user's browser displays RESTful URLs for each entry. Add a documentation page explaining all this so that the machine calls (simple RESTful HTTP GETs) are known to developers.
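The URL scheme of step 3 can be sketched as a single resolver that picks the representation from the subdomain and the path depth. This is a rough illustration, not a definitive design: the in-memory CONTENT dictionary, its placeholder text, the "apw" ontology namespace, and the function names are all assumptions, and the RDF/XML produced is a simplified stand-in for a proper OWL serialization.

```python
import html
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
APW_NS = "http://ml.apweb.org/ontology#"  # hypothetical ontology namespace

# Placeholder content standing in for the database; the text is invented.
CONTENT = {"Acorales": {"Name": "Acorales", "Description": "Placeholder text."}}


def to_rdf_xml(taxon, sections):
    """Serialize sections as one rdf:Description with a property per section."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("apw", APW_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": f"http://www.apweb.org/{taxon}"})
    for heading, text in sections.items():
        ET.SubElement(desc, f"{{{APW_NS}}}{heading}").text = text
    return ET.tostring(root, encoding="unicode")


def resolve(host, path):
    """Return (mime_type, body) for a GET request, or None if nothing matches."""
    parts = [p for p in path.split("/") if p]
    if not parts or len(parts) > 2 or parts[0] not in CONTENT:
        return None
    taxon, sections = parts[0], CONTENT[parts[0]]
    if len(parts) == 2:
        # Section-level URL, e.g. /Acorales/Name
        if parts[1] not in sections:
            return None
        sections = {parts[1]: sections[parts[1]]}
    subdomain = host.split(".")[0]
    if subdomain == "ml":
        # 3d/3e: same content, returned as RDF/XML
        return "application/rdf+xml", to_rdf_xml(taxon, sections)
    if subdomain != "www":
        return None
    if len(parts) == 2:
        # 3c: plain text, no HTML
        return "text/plain", sections[parts[1]]
    # 3a: whole taxon page as HTML
    body = "".join(f"<h2>{html.escape(h)}</h2><p>{html.escape(t)}</p>"
                   for h, t in sections.items())
    return "text/html", f"<html><body><h1>{html.escape(taxon)}</h1>{body}</body></html>"
```

For example, resolve("www.apweb.org", "/Acorales/Name") yields plain text, while the same path on "ml.apweb.org" yields RDF/XML. Because the two subdomains share one path layout, they stay organizationally parallel without any technology suffix in the URL.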
4. The back-end to support this is straightforward:
4a. Map the APWeb ontology into a standard RDBMS data schema.
4b. Have your web server map URLs into a servlet that makes database calls:
4b.i For high-level taxon page calls (e.g., http://www.apweb.org/Acorales), generate the page content during DB maintenance runs and simply serve up static pages.
4b.ii For individual template items (ontological terms), call into the DB w/ standard DB drivers at transaction time.
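The back-end of step 4 might look roughly like the sketch below. It swaps the servlet and production RDBMS for Python's built-in sqlite3 module purely for illustration; the table names, column names, and function names are hypothetical, not an existing APWeb schema.

```python
import html
import sqlite3


def build_db():
    """Create an in-memory schema: taxa, ontology terms, and per-taxon content."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE term  (id INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);
        CREATE TABLE entry (
            taxon_id INTEGER NOT NULL REFERENCES taxon(id),
            term_id  INTEGER NOT NULL REFERENCES term(id),
            content  TEXT NOT NULL,
            PRIMARY KEY (taxon_id, term_id)
        );
    """)
    return conn


def add_entry(conn, taxon, term, content):
    """Insert or update the content of one template section for one taxon."""
    conn.execute("INSERT OR IGNORE INTO taxon(name) VALUES (?)", (taxon,))
    conn.execute("INSERT OR IGNORE INTO term(label) VALUES (?)", (term,))
    conn.execute(
        "INSERT OR REPLACE INTO entry (taxon_id, term_id, content) "
        "SELECT t.id, m.id, ? FROM taxon t, term m "
        "WHERE t.name = ? AND m.label = ?",
        (content, taxon, term))


def fetch_section(conn, taxon, term):
    """4b.ii: look up a single template item at request time."""
    row = conn.execute(
        "SELECT e.content FROM entry e "
        "JOIN taxon t ON t.id = e.taxon_id "
        "JOIN term m ON m.id = e.term_id "
        "WHERE t.name = ? AND m.label = ?",
        (taxon, term)).fetchone()
    return row[0] if row else None


def render_static_page(conn, taxon):
    """4b.i: build the full taxon page once, during a maintenance run."""
    rows = conn.execute(
        "SELECT m.label, e.content FROM entry e "
        "JOIN taxon t ON t.id = e.taxon_id "
        "JOIN term m ON m.id = e.term_id "
        "WHERE t.name = ? ORDER BY m.id",
        (taxon,)).fetchall()
    body = "".join(f"<h2>{html.escape(label)}</h2><p>{html.escape(text)}</p>"
                   for label, text in rows)
    return f"<html><body><h1>{html.escape(taxon)}</h1>{body}</body></html>"
```

The split mirrors 4b.i and 4b.ii: render_static_page would run offline and write its output to files the web server serves directly, while fetch_section is the only query executed per request.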
5. Make the APWeb ontology publicly available in a form suitable for semantic web services; see me for various recs.
6. You're done; deploy and publish. You'll have complete flexibility to add more sophisticated (semantic) web services if/when appropriate.