Linked Data
May 19, 2010
"Conjuring" as a Linked Data design pattern
Everywhere I look there are blobs lying around that nobody quite knows what to name except "database", "record", or 1, 2, 3, 4, 5... Oh sure, some of these blobs are XML and they're indexed. Some of them are even accessible via a Web API using hackable query string parameters. Still, though, nobody's quite sure what to name the blobs. If you find yourself in a similar situation, try this:
http://example.org/database/1 (303 redirect to...)
http://example.org/database/1/ (2XX returning the blob)
Got those URIs working? Now go into the code for the latter and add this:
if ("application/rdf+xml".equals(request.getHeader("Accept"))) {
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#">
<owl:Thing rdf:about="http://example.org/database/1" />
</rdf:RDF>
} else {
blob
}
That unnamable thing you've been calling "1" now has a useful globally-unique HTTP identifier that establishes its presence on the Semantic Web as Linked Data.
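For anyone who wants to see the two URIs behave, here is a minimal Python sketch. It assumes a hypothetical `respond(path, accept)` handler that returns a (status, headers, body) tuple instead of talking to a real server. The handler name, the `BLOB` placeholder, its `application/xml` content type, and the simplistic substring check on the Accept header are all my assumptions for illustration.

```python
BASE = "http://example.org"
BLOB = "<record>the original blob</record>"  # placeholder for whatever the blob is

RDF = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:owl="http://www.w3.org/2002/07/owl#">\n'
    '  <owl:Thing rdf:about="{about}" />\n'
    '</rdf:RDF>\n'
)

def respond(path, accept=""):
    """Return (status, headers, body) for a request path and Accept header."""
    if not path.endswith("/"):
        # The slash-less URI names the thing itself: 303 to the document URI.
        return 303, {"Location": BASE + path + "/"}, ""
    if "application/rdf+xml" in accept:
        # The document URI describes the thing named by the slash-less URI.
        about = BASE + path.rstrip("/")
        return 200, {"Content-Type": "application/rdf+xml"}, RDF.format(about=about)
    # Assumed content type; use whatever the blob really is.
    return 200, {"Content-Type": "application/xml"}, BLOB
```

Calling `respond("/database/1")` gives the redirect, and `respond("/database/1/", "application/rdf+xml")` gives the RDF that names the thing.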
Now comes the conjuring trick I saw my colleague Andrew Houghton performing. Free your mind of the idea that this blob is one "thing" and write an elevator speech describing the use cases for the blobs. If your blob is XML, look at the element names to help you write the speech. Circle the important nouns in the speech and peer into the blob looking for such a thing. If you think you see one, append the noun to the latter URI with a "#" like so:
http://example.org/database/1/#noun1
http://example.org/database/1/#noun2
etc.
Now that these meaningful names have actionable HTTP identifiers, we need to announce their existence inside the <rdf:RDF> like so:
<owl:Thing rdf:about="http://example.org/database/1/#noun1" />
<owl:Thing rdf:about="http://example.org/database/1/#noun2" />
You've just conjured up meaningful things out of the blob and given them Linked Data names using semantically-rich HTTP URIs. Granted, some of the semantics implied in the URI aren't reflected in the RDF, but that is easily remedied. And if somebody says you are lying about the existence of noun7 for record 53, update your code and/or the blob to fix the bug (it's not a "lie" since it wasn't intentional). Apologize and explain to them that conjuring always has been and always will be an inexact science. ;-)
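The conjuring step itself is mechanical once the nouns are chosen. Here is a small Python sketch, assuming a hypothetical `conjure` helper and assuming the nouns are simple tokens that need no XML escaping. It builds the hash URIs and the RDF that announces them.

```python
def conjure(document_uri, nouns):
    """Build a hash URI for each noun and announce each one as an owl:Thing."""
    things = [f"{document_uri}#{noun}" for noun in nouns]
    lines = [
        '<rdf:RDF',
        '    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
        '    xmlns:owl="http://www.w3.org/2002/07/owl#">',
    ]
    lines += [f'  <owl:Thing rdf:about="{uri}" />' for uri in things]
    lines.append('</rdf:RDF>')
    return things, "\n".join(lines)

uris, rdf = conjure("http://example.org/database/1/", ["noun1", "noun2"])
```

Picking the nouns is still the inexact part; the code only does the bookkeeping.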
This "Conjuring" pattern is a trustworthy starting point for making the important things lurking inside your blobs discoverable and available for unexpected reuse.
March 20, 2010
"Information" is not very helpful
If you buy into the "information resource vs. non-information resource" conception of Linked Data, the answer to the question "is this identified thing an information resource?" is either "yes" (2xx), "maybe" (303 or hash URI), or "I don't know" (4xx).
The second answer changes if you buy into the "Web Document vs. Real World Object" conception of Linked Data: "is this identified thing a Web Document?" "yes" (2xx), "no" (303 or hash URI), or "I don't know" (4xx).
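The two conceptions can be laid side by side as a lookup table. This is only a sketch in Python; the names `INTERPRETATIONS`, `response_kind`, and `answer` are mine, and the table simply restates the answers given above.

```python
# How each conception answers its own question, keyed by the kind of response.
INTERPRETATIONS = {
    "information resource": {"2xx": "yes", "303 or hash URI": "maybe", "4xx": "I don't know"},
    "web document": {"2xx": "yes", "303 or hash URI": "no", "4xx": "I don't know"},
}

def response_kind(uri, status):
    """Bucket a dereferenced URI into the three cases discussed above."""
    if "#" in uri or status == 303:
        return "303 or hash URI"
    if 200 <= status < 300:
        return "2xx"
    if 400 <= status < 500:
        return "4xx"
    raise ValueError(f"status {status} is not covered by either conception")

def answer(conception, uri, status):
    """Look up the answer a given conception gives for this URI and status."""
    return INTERPRETATIONS[conception][response_kind(uri, status)]
```

The only cell that differs between the two rows is the middle one, which is the whole point.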
As a client looking at response codes, I can't really know which interpretation the Linked Data URI designer intended. Nevertheless, as coiners of Linked Data URIs, we need to choose one or the other (and hopefully not waffle and use both).
Which interpretation would you choose? Keep in mind that "document" can be modeled as an abstract concept and thus identified as a "Real World Object" using 303 links to a variety of "Web Document" representations.
March 19, 2010
Who invented "information"?
I've been ranting on the OpenURL listserv lately about the potential of Linked Data and occasionally making a fool of myself by referring to "information resource" when I really meant "non-information resource". What on earth was the person who originally coined the word "information" thinking? Didn't they know that Linked Data would come along someday and transform our understanding of reality based on the novelty of "non-information"? Couldn't they have invented a term signifying the negation of "information" instead so that people like me wouldn't get confused by people like me?
March 12, 2010
You can't argue with use cases...
You can't argue with use cases. That's a pity, so I started a list of things I like to argue with instead:
- the words (names) people use when talking about use cases
- how those named things need to be related to one another in order to satisfy a set of use cases
- how existing names and relationships can best be adapted to accommodate new use cases
- the value of extending, constraining, and instantiating existing abstractions to maximize unexpected reuse
- whether or not things should be named and behave on a wire
Hmm. What am I forgetting?
March 5, 2010
Hype Cycles, Efficient Frontiers, and Linked Data
In the technology domain, "things" follow a Hype cycle. I don't fully understand why, but something tells me this is unavoidable.
In the investing domain, the assumption is that every investment portfolio has a risk-return profile that can be optimized in line with an "efficient frontier". Again, I don't fully understand why, but something tells me this makes sense in spite of (or more likely especially in) these tough economic times.
Regardless of the times, it seems to me that these principles are somehow related. There must be an efficient frontier for any given hype cycle. It's not clear how anyone can measure this, but without having a handle on it we are making wild guesses about resource allocations.
OTOH, I don't want anyone to get the impression that I sit around thinking about the connection between hype and efficiency all day long. It's just an idea that popped into my head. What I sit around all day thinking about is ways of producing and consuming Linked Data efficiently. This is the hype cycle that I hope against hope is being managed efficiently.
December 2, 2009
Linked Data and Cool URI Patterns
MVC scaffolding frameworks like Grails and Ruby on Rails prove that the identity and behavior of Web resources can easily be generalized when they are based on a domain model. What Grails and presumably most other frameworks fail to account for is the fact that things named in a domain model identify real world objects. It is time to correct this oversight.
Recall that the primary things named in a domain model fall into a handful of categories: class, instance, attribute, relationship, operation, and the model itself. The Grails scaffold automatically provides HTTP URIs for Web document representations for some of these things. Here are examples using the default Grails URI mapping:
- Model Web Document
  - http://example.org/
- Class Web Document
  - http://example.org/{className}/{operationName}
- Instance Web Document
  - http://example.org/{className}/{operationName}/{instanceName}
[Beware that Grails names these path segment tokens based on analogous MVC concepts:
{className}={controller}
{instanceName}={id}
{operationName}={action}
Also beware that the default Grails URI patterns are deficient in other ways, but it is difficult to change them. As a result, the URI patterns below are reluctantly forced into the default mold.]
The first enhancement for Linked Data compliance is real world object identifier support for everything in the model. For some domain model categories 303 (See Other) redirect behavior is appropriate:
- Real World Model
  - http://example.org/{modelName}/rwo
- Real World Class
  - http://example.org/{className}/rwo
- Real World Instance
  - http://example.org/{className}/rwo/{instanceName}
These can be implemented by creating a special controller for the {modelName} and adding a new content-negotiable "rwo" action to it and the default scaffold controller. Real world object URIs for attributes and relationships can then piggy-back on Real World Class as hash URIs:
- Real World Attribute
  - http://example.org/{className}/rwo#{attributeName}
- Real World Relationship
  - http://example.org/{className}/rwo#{relationshipName}
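The URI patterns in this post can be expanded mechanically. Here is a rough Python sketch; the helper names `web_document_uri` and `rwo_uri` are hypothetical, and the `show` action in the usage note is just an example operation name.

```python
BASE = "http://example.org"

def web_document_uri(class_name=None, operation=None, instance=None):
    """Web Document URI in the default mold: /{className}/{operationName}/{instanceName}."""
    parts = [p for p in (class_name, operation, instance) if p]
    return BASE + "/" + "/".join(parts)

def rwo_uri(name, instance=None, member=None):
    """Real world object URI; `name` is a model name or a class name.

    `member` is an attribute or relationship name, appended as a hash.
    """
    uri = f"{BASE}/{name}/rwo"
    if instance:
        uri += f"/{instance}"
    if member:
        uri += f"#{member}"
    return uri
```

For example, `web_document_uri("person", "show", "1")` yields `http://example.org/person/show/1`, while `rwo_uri("person", member="employer")` yields `http://example.org/person/rwo#employer`.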
Now that real world object identifiers are defined for everything in the domain model, the only thing lacking is an RDF representation alongside the scaffold's HTML representation. This will be examined in a subsequent post.
December 2, 2009
Domain Modeling and Linked Data
As a principle of object-oriented design, most things worth naming in a domain fall into one of several categories: class, instance, data type, attribute, relationship, or operation. Here is a Grails (domain) class example illustrating the patterns (excluding instance):
class Person {
    String name
    Organization employer
    String toString() {
        "${name}"
    }
}
Whether we realize it or not, the names assigned in this way form an ontology that uniquely identifies everything in the domain. Also note that MVC frameworks like Grails automatically inject all domain classes with machine-level create, read, update, and delete (CRUD) operations. This accounts for the naming and persistence of instances as well.
Now that we know how to systematically name everything of interest in a domain, we need to realize that every one of those things identifies a real world object (RWO). Conversely, every RWO of interest in a domain can and should be named according to object-oriented principles.
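To make "systematically name everything" a little more concrete, here is a sketch that walks a domain class and sorts its names into categories. It uses Python dataclasses as a stand-in for the Grails domain class above; the `name_things` helper is hypothetical, as is the rule that a field typed as another domain class counts as a relationship.

```python
from dataclasses import dataclass, fields

@dataclass
class Organization:
    name: str

@dataclass
class Person:
    name: str
    employer: Organization

def name_things(cls, domain_classes):
    """List the names a domain class contributes, grouped by category."""
    class_names = {c.__name__ for c in domain_classes}
    names = {"class": cls.__name__, "attribute": [], "relationship": []}
    for f in fields(cls):
        # A field typed as another domain class is a relationship;
        # anything else is treated as a plain attribute.
        type_name = getattr(f.type, "__name__", f.type)
        kind = "relationship" if type_name in class_names else "attribute"
        names[kind].append(f.name)
    return names
```

Operations and instances are left out of this sketch; the point is only that the names are already sitting in the model waiting to be collected.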
The next trick is to take this machine-local ontology and project it onto the Web as Linked Data. Grails scaffolds provide a glimpse of how this can be automated by creating a parallel controller like so:
class PersonController {
    def scaffold = true
}
This effectively forces Grails to inject the domain class with globally-unique HTTP URI identifiers and CRUD behaviors. Unfortunately, the default Grails scaffold only provides HTTP URIs for Web documents that represent the real world objects. In contrast, Linked Data requires separate HTTP URIs for the RWOs themselves. The other problem is that it does not automatically provide an RDF representation.
Both problems can be solved by customizing the scaffold Controller with new actions to support content-negotiable RWO URIs and scaffold views that produce RDF. Details will be examined in subsequent posts.
Jeff
June 5, 2009
Revisiting the Union of httpRange-14 and Linked Data
PeteJ provided some really interesting feedback to my class diagram reconciling Linked Data and httpRange-14. To start, Pete says:
It certainly isn't true that a 303 redirect allows me to conclude that the identified resource is a "Real World Object".
I agree that HTTP 303 See Other does not allow clients to "conclude" that the URI identifies a RWO. This requires evidence of the server's intent which is currently expensive to obtain. One of the goals of Linked Data Architecture (LDA) is to make this intent more transparent and discoverable, but that is a different story.
What 303 does allow, though, is for clients to "assume" that the URI identifies a RWO. The Web Document at the other end of the 303 must contain information of substance about the (assumed) RWO in at least a tangential way or else the server wouldn't have bothered to redirect the client there. Granted, this is a weak form of "aboutness" but it is "aboutness" nonetheless and can be strengthened by the server over time as appreciation grows of the significance of Real World Object URIs.
The same principles apply to the use of hash URIs to identify Real World Objects.
Moving on to some of Pete's other points, he gives an example of a ringbinder on his desk. This Real World Object is easily transformed into a Web resource by assigning it an HTTP URI that returns a 303 redirect to another URI that identifies the corresponding "information resource". These are two different resources (related by "aboutness") and in his example it would be semantically sensible for the latter (information) URI to return 404 Not Found without compromising the Web identity of the RWO. If and when information about the RWO becomes available (e.g. an OCR PDF or MARCXML representation), the information resource can be updated and the identity of the RWO URI will be strengthened.
On the topic of a Web document that describes another Web document, I think this is an interesting use case that deserves a blog entry of its own.
On the topic of a URI identifying an HTML document that returns "gobbledygook", we need to accept httpRange-14's decision that every Web document is an information resource and understand that information is invariably about something. The fact that the information resource is nonsensical doesn't justify ignoring httpRange-14 and saying it isn't really an information resource after all, nor does the fact that the Real World Object it presumably describes is currently anonymous and nonsensical. Anonymous and nonsensical Real World Objects exist in abundance and we could start to make headway on some of them if we went to the trouble of assigning them names (HTTP URIs) and returning whatever information we have regardless of its apparent incoherence.
On another topic, I agree that a Web document may describe several RWOs, but nevertheless the Web document as a whole describes a RWO in its own right, namely the aggregation of those RWOs. The fact that nobody bothered to assign an HTTP URI to this aggregation (presumably with 303 behavior in this case) doesn't mean the RWO doesn't exist.
As subtle and philosophical as these issues may sound to others, I am pretty sure Pete and I would agree that discussing them and hammering out a common understanding is vital to leveraging the Web with maximum efficiency.
Jeff
June 5, 2009
AtomPub, AtomPub, AtomPub
In the beginning, HTTP defined a create, read, update, and delete (CRUD) model for managing and using Web resources. The fact that these operations map to HTTP methods named POST, GET, PUT, and DELETE obfuscates things a little, but not significantly. And although it may not be obvious, anything can be identified as a resource on the Web. More will be said on the significance of this below.
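The CRUD-to-method mapping is small enough to write down. Here is a toy Python sketch, assuming an in-memory `Collection` class of my own invention. It simplifies real HTTP in one important way: the client supplies the member name on create, whereas a real server would typically assign the new URI itself.

```python
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}
HTTP_TO_CRUD = {method: op for op, method in CRUD_TO_HTTP.items()}

class Collection:
    """An in-memory stand-in for a collection of Web resources."""

    def __init__(self):
        self.members = {}

    def handle(self, method, name, body=None):
        """Apply an HTTP method to a named member; return (status, body)."""
        op = HTTP_TO_CRUD[method]
        if op != "create" and name not in self.members:
            return 404, None
        if op == "read":
            return 200, self.members[name]
        if op == "delete":
            del self.members[name]
            return 204, None
        self.members[name] = body  # create or update
        return (201 if op == "create" else 200), body
```

Nothing in the class cares what kind of thing a member is, which is the sense in which anything can be managed this way.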
Since a CRUD model is integral to the HTTP specification, the need for AtomPub may seem counterintuitive. In fact, the AtomPub CRUD model does not replace the HTTP CRUD model; it just explains how to apply it to Web resources that exist in the context of "collections". This need originated in the blogosphere where developers wanted to manage Web documents known as "blog entries" in conformance with the HTTP CRUD model. Fortunately, the AtomPub specification did not couple tightly with this type of Web document and developers are slowly realizing it can be used to manage any collection of Web documents.
[As an aside, a create, update, and delete model also lies at the heart of SRU Update. The key difference is that AtomPub chose to build on Web standards whereas SRU Update chose to build on SOAP standards. If you are wondering if SRU Update is for you, consider the fact that the SRU Update CrUD model will end up getting tunneled over HTTP despite the competing models. Why add the overhead?]
Getting back to an earlier point, people are realizing that AtomPub is effective at managing collections of Web documents other than blog entries. What may not be obvious is that there are other types of Web resources besides the familiar Web documents we all know and love. Specifically, Linked Data tells us that Real World Objects can also be identified as Web resources. Just as AtomPub accommodated the mental migration from blog entries to Web documents, it is equally effective for managing collections of Real World Objects. If it's not clear how this can be done, start with this earlier blog entry and watch this blog space for followups.
May 15, 2009
How Do Domain Modeling and Atom(Pub) Fit Together?
In various forums, I have claimed the benefits of using domain modeling (DM) to design RESTful Web resources, but explaining why is surprisingly difficult. Efforts so far have been geared towards the domain modeling crowd, but there are so many bells, whistles, and misunderstandings of DM that the simple beauty is easily missed. An appreciation of the Model-View-Controller (MVC) pattern helps immensely, but if you're like me a clear understanding of MVC is hard to come by. Last night I was thinking that an RDF route might be more efficient. Today an AtomPub perspective seems more likely. For people who are unfamiliar with all of the above, other routes need to be mapped out. I would claim, though, that the situation is analogous to the "Ah ha!" transition from the service-oriented to resource-oriented perspective. Atom(Pub) is probably the easiest route to this realization, so the shortest path to appreciating DM probably lies in this same direction.
The first thing to note is that a collection in Atom(Pub) maps directly to a "class" in DM. There is no need to subclass in DM to accomplish this relationship. Likewise, a member in Atom maps directly to an "instance" in DM. Again, no subclassing is necessary.
The discussion becomes easier if we set aside the create, update, and delete capabilities of AtomPub for the moment. We can do that because AtomPub mostly just tells us how the URIs we need to support the Atom Syndication Format should react if we throw POST, PUT, and DELETE at them instead of GET.
Given this simplification, an Atom Feed is merely a representation of a class (collection) and an Atom Entry is merely a representation of an instance (member). It's just another representation that could/should be negotiable from a Generic Document that it shares with the text/html, text/xml, application/json, or any other representation that the server thinks is worth supporting. From the (MVC) controller POV, Atom is just another type of view (representation format) that it needs to produce from the model. Creating a class in DM that subclasses Atom would be equivalent to subclassing text/html or application/json. It's not necessary.
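Here is a minimal Python sketch of "Atom is just another view". The model is a plain dictionary, each view is a function, and the hypothetical `render` controller picks one by media type. The view names, the `person` data, and the use of the instance URI as the Atom id are assumptions for illustration, and the Atom entry is deliberately incomplete.

```python
import json
from xml.sax.saxutils import escape

person = {"id": "alice", "name": "Alice"}  # one instance (member) of the model

def as_json(instance):
    return json.dumps(instance, sort_keys=True)

def as_html(instance):
    return f"<html><body><h1>{escape(instance['name'])}</h1></body></html>"

def as_atom_entry(instance):
    # Deliberately incomplete: a valid Atom entry also needs <updated> and an author.
    return (
        '<entry xmlns="http://www.w3.org/2005/Atom">'
        f"<id>http://example.org/person/{escape(instance['id'])}</id>"
        f"<title>{escape(instance['name'])}</title>"
        "</entry>"
    )

VIEWS = {
    "application/json": as_json,
    "text/html": as_html,
    "application/atom+xml;type=entry": as_atom_entry,
}

def render(instance, media_type):
    """The controller picks a view by media type; the model is untouched."""
    return VIEWS[media_type](instance)
```

Adding Atom meant adding one function and one dictionary entry; the `person` model did not change and nothing was subclassed.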
May 15, 2009
Linked Data and httpRange-14 Concepts and Relationships
There is some understandable confusion about the relationship between concepts in Linked Data and httpRange-14. This class diagram may help:

Note that unlike httpRange-14, a Linked Data interpretation requires additional, but relatively lightweight, assumptions on the Architecture of the World Wide Web. Here are at least some of them:
- Clients can unambiguously classify a resource as a Real World Object (RWO) either by the presence of a hash in the HTTP URI (deduced prior to the HTTP request) or an HTTP 303 See Other status in the response (deduced after the HTTP request). In comparison, an HTTP 200 OK can be used to unambiguously classify the resource as a Web Document/Information Resource.
- This implies that Real World Object vs. Web Document/Information Resource are disjoint.
- Every Real World Object needs to have at least one associated Web Document/Information Resource that contains information about it.
- Every Web Document describes one RWO. The fact that some of these RWOs are mash-ups doesn't change this fact. Traditional understandings of the Web commonly fail to identify these Real World Objects with HTTP URIs.
- The Web Document returned by dereferencing a RWO URI is expected to contain information about the RWO. Linked Data/httpRange-14 do not split hairs between data and metadata. It is all just information and information is invariably about something in the real world.
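The first assumption in the list above can be sketched as a client-side check. This is a Python illustration with a hypothetical `classify` function; it returns `None` for anything the assumptions do not cover.

```python
from urllib.parse import urlsplit

def classify(uri, status=None):
    """Classify a resource under the Linked Data assumptions listed above.

    Returns "real world object", "web document", or None when undecided.
    """
    if urlsplit(uri).fragment:
        return "real world object"   # known before any request is made
    if status == 303:
        return "real world object"   # known only after the response
    if status == 200:
        return "web document"
    return None
```

Note that the hash case needs no network traffic at all, while the 303 case costs a request before the client knows what it is dealing with.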
One of the beauties of embracing the Linked Data assumptions as a matter of policy is that they benefit others who may not even be aware of their significance.
Jeff
April 28, 2009
Using AtomPub to Discover Linked Data
In our blog entry introducing the Linked Data Architecture (LDA), we suggested the complementary nature of Linked Data, AtomPub, and domain modeling. The jargon used in these different systems is uncoordinated but the concepts are easily mapped. For example, a "class" in domain modeling maps to a "collection" in AtomPub and an "instance" maps to a "member".
The harmonization of AtomPub and domain modeling provides a powerful model for discovering high-resolution resources that are suitable for "unexpected reuse". Here is a feed of the "person" class/collection with a single "instance/member" named "alice".
<feed xmlns="http://www.w3.org/2005/Atom">
<id>urn:uuid:D9A1E60A-5F8A-4361-8CC6-E28EFD66B3AE</id>
<title>Person Instances</title>
<author><name /></author>
<updated>2009-04-28T00:00:00.0Z</updated>
<link rel="self" type="application/atom+xml"
href="http://example.org/view/person/instances.atom" />
<link rel="related" title="Real World Class"
href="http://example.org/person" />
<entry>
<id>urn:uuid:FC3921ED-BD3F-4479-AE48-73AAA38EE19E</id>
<title>Alice</title>
<author><name /></author>
<published>2009-04-28T00:00:00.0Z</published>
<updated>2009-04-28T00:00:00.0Z</updated>
<link rel="edit"
type="application/atom+xml;type=entry"
href="http://example.org/person/alice/entry.atom" />
<link rel="edit-media" type="application/rdf+xml"
href="http://example.org/person/alice/entry.datum" />
<link rel="alternate" type="application/xhtml+xml"
href="http://example.org/person/alice/default.html" />
<link rel="alternate" type="text/n3"
href="http://example.org/person/alice/default.n3" />
<link rel="alternate" type="text/turtle"
href="http://example.org/person/alice/default.ttl" />
<link rel="related" title="Real World Instance"
href="http://example.org/person/alice" />
<content type="application/rdf+xml"
src="http://example.org/person/alice/default.rdf" />
<summary type="text" />
</entry>
</feed>
Since the concept of domain model "class" maps to Atom "collection", we should expect them to appear as such in an Atom Service Document. Here is an example:
<service xmlns="http://www.w3.org/2007/app"
xmlns:atom="http://www.w3.org/2005/Atom">
<workspace>
<atom:title>My Domain Model</atom:title>
<collection href="http://example.org/person">
<atom:title>Person</atom:title>
<accept>application/rdf+xml</accept>
</collection>
</workspace>
</service>
Note that in order to accommodate Linked Data, the collection/@href refers to a Real World Class URI. This URI should do an HTTP 303 See Other redirect to a Class Generic Document from which the Atom feed Class Web Document shown above can be negotiated.
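On the client side, discovering the Real World Class URIs from a service document takes only a few lines. Here is a sketch using Python's standard `xml.etree.ElementTree`; the `real_world_classes` function name is hypothetical and the embedded document repeats the example above.

```python
import xml.etree.ElementTree as ET

NS = {"app": "http://www.w3.org/2007/app", "atom": "http://www.w3.org/2005/Atom"}

SERVICE = """\
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>My Domain Model</atom:title>
    <collection href="http://example.org/person">
      <atom:title>Person</atom:title>
      <accept>application/rdf+xml</accept>
    </collection>
  </workspace>
</service>
"""

def real_world_classes(service_xml):
    """Map each collection title to its href (a Real World Class URI)."""
    root = ET.fromstring(service_xml)
    return {
        c.findtext("atom:title", namespaces=NS): c.get("href")
        for c in root.iterfind("app:workspace/app:collection", NS)
    }
```

A client would then dereference each href, follow the 303, and negotiate for the Atom feed.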
Ideally, the server root will act as a Model Generic Document for the model from which the Atom Service Model Web Document can be negotiated. The LDA resource categories for "model" are shown here:
| Domain Modeling Concept | LDA Category Name |
| Model | Real World Model |
| | Model Generic Document |
| | Model Web Document |
The HTTP behaviors for these LDA categories can be found here.
Jeff Young
Andrew Houghton
April 20, 2009
The Union of Domain Modeling, Linked Data, and AtomPub
[2009-04-28 - Replaced the "model" path segment with "view" to reflect the semantics better.]
[2009-04-20 - Added some extra examples for Class Web Document and Instance Web Document to provide additional clues.]
The union of Domain Modeling, Linked Data, and AtomPub results in a complementary set of use cases. To summarize, a domain model is a conceptual model of a system that can be described by visual representations called "class diagrams". Meanwhile, Linked Data establishes the Web identity of Real World Objects as distinct from traditional Web Documents that say something about the Real World Object. And finally, AtomPub establishes HTTP/1.1-compliant create, read, update, and delete (CRUD) operations for Web resources in general.
There are important synergies between these use cases. In essence, domain models identify resources according to semantically-rich categories like "model", "class", "attribute", "datatype", "instance" and "operation". The resources that are identified by the domain model can then be mapped mechanically to globally-unique HTTP URIs and those resources can then be managed with HTTP/AtomPub CRUD operations and discovered by an Atom Service Document and Atom Feeds. My colleague Andrew Houghton and I have identified a limited set of resource categories that are capable of supporting all of the Web application use cases that we can imagine. These use cases include Linked Data and Semantic Web support. We are calling this union of Domain Modeling, Linked Data, and AtomPub, the "Linked Data Architecture" (LDA).
Of the resource categories identified, six are fundamental and their behaviors are self-contained:
| Domain Modeling Concept | LDA Category Name |
| Class | Real World Class |
| | Class Generic Document |
| | Class Web Document |
| Instance | Real World Instance |
| | Instance Generic Document |
| | Instance Web Document |
The first set of resource categories map directly to classes in a domain model, while the second set of resource categories map to instances of those classes in the domain model. Each resource category has a distinctive set of HTTP/AtomPub behaviors as expressed in the LDA Fundamental Behaviors table. These resource categories can be mapped onto consistent URI patterns. The following URI patterns are descriptive and designed for maximum "hackability", but are not prescriptive. Other URI patterns could be created, but the resource category behaviors associated with them remain unaffected:
| Domain Modeling Concept | LDA Category Name | Example URI |
| Class | Real World Class | http://example.org/person |
| | Class Generic Document | http://example.org/person/ |
| | Class Web Document | http://example.org/view/person/about.html http://example.org/view/person/schema.xsd http://example.org/view/person/sru http://example.org/view/person/feed.atom etc. |
| Instance | Real World Instance | http://example.org/person/alice |
| | Instance Generic Document | http://example.org/person/alice/ |
| | Instance Web Document | http://example.org/person/alice/about.html http://example.org/person/alice/home.html http://example.org/person/alice/photo.jpg http://example.org/person/alice/about.json http://example.org/person/alice/foaf.rdf http://example.org/person/alice/entry.atom etc. |
If you imagine truncating the example URIs above you can start to imagine the use cases for some of the other categories which we will discuss in a later article. The information presented here is still in a raw form, but several people have expressed an interest. We are confident enough of the efficacy and consistency of these six categories to start making LDA available.
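Since the example URI patterns are mechanical, they can be generated. Here is a Python sketch with a hypothetical `lda_uris` helper that follows the descriptive patterns in the table; as noted, other patterns would work just as well.

```python
BASE = "http://example.org"

def lda_uris(class_name, instance=None, view=None):
    """Return the Real World, Generic Document, and (optional) Web Document URIs."""
    path = f"{BASE}/{class_name}" + (f"/{instance}" if instance else "")
    uris = {"real world": path, "generic document": path + "/"}
    if view:
        # Class views live under /view/ in the example table; instance views do not.
        prefix = path if instance else f"{BASE}/view/{class_name}"
        uris["web document"] = f"{prefix}/{view}"
    return uris
```

For example, `lda_uris("person", "alice", "foaf.rdf")` returns the three Alice URIs from the table, and `lda_uris("person", view="feed.atom")` returns the class-level ones.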
Jeff Young
Andrew Houghton
April 17, 2009
Hash URIs: The Other Real World Object Identifier
A few days ago, I blogged about a basic example of Linked Data that avoided references to RDF and the Semantic Web. The next day, I blogged about how RDF fit in with Linked Data. In the latter, I said:
"The Hash URI solution is difficult to explain and limited in functionality. Don't worry about choosing this option unless you're a geek."
I stand by that, but there are a godawful number of smart geeks in the world and some of them may be wondering about the fuss. Furthermore, there are legitimate use cases for Hash URIs, particularly in the context of smallish ontologies relative to, say, Dewey. Naturally, the Semantic Web community cares deeply about these use cases.
To recap, Real World Object URIs based on the 303 See Other should return information about the Real World Object and only that object (give or take a domain banner and some Google Ads). If it returns information about some other Real World Object, something is wrong and you should complain to the owner of the Real World Object URI.
In contrast, the Hash URI solution allows finitely many Real World Object URIs to share a common Web Document. The Semantic Web community loves this solution because they don't have to perform an HTTP GET request for every single Real World Object the way they would with the 303 solution (actually two requests per object, given the redirect).
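The request-count argument is easy to see in code. In this Python sketch the ontology URIs are made up for illustration; `urldefrag` from the standard library strips the fragment, which is what an HTTP client does before sending the request.

```python
from urllib.parse import urldefrag

def documents_to_fetch(rwo_uris):
    """Distinct Web Document URIs a client must GET for a set of hash URIs."""
    return {urldefrag(uri).url for uri in rwo_uris}

# Hypothetical small ontology published with hash URIs.
ontology = [
    "http://example.org/ontology#Person",
    "http://example.org/ontology#Organization",
    "http://example.org/ontology#employer",
]

hash_requests = len(documents_to_fetch(ontology))  # one shared document
requests_with_303 = 2 * len(ontology)              # redirect plus document, per object
```

Three Real World Objects cost one GET with hash URIs, versus six requests if each had its own 303 URI.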
I suspect that the Semantic Web community will want to split the Web Document hair so they can talk about "object fragments" or some such, but unless you're a Semantic Web geek, don't worry about this. From the HTTP perspective, it's just a Web Document.
April 15, 2009
How Does RDF Fit In with Linked Data?
In my last blog entry "A Basic Linked Data Example", I explained Linked Data without appealing to RDF or the Semantic Web. As Ed Summers points out in his comment, though,
"it's important not to dismiss RDF as some semantic web pipe dream"
I absolutely agree. Here is my attempt at explaining the connection between the two.
Cool URIs for the Semantic Web suggests two ways to identify a Real World Object (e.g. me).
- Hash URIs
- 303 URIs
The Hash URI solution is difficult to explain and limited in functionality. Don't worry about choosing this option unless you're a geek.
The 303 URI solution is much easier to explain as illustrated in my last blog entry. To recap, it is just an HTTP redirect from the URI for a Real World Object to a URI containing information about that Real World Object. This information is transmitted back to the client as a Web Document. All the information that is known about the Real World Object (which I will collectively call "the datum") can be represented in a variety of ways such as HTML, XML, JPEG, and/or... wait for it... some sort of RDF document! Each individual representation can include as much or as little of the datum as needed to satisfy the client's use case.
The fact that these different representations contain different subsets and formats of the datum is irrelevant. It's all just information about the Real World Object expressed in Web Document form. Each of these Web Document representations, including RDF, has a unique URI somewhere (indicated in the HTTP Content-Location header), but the Generic Document makes it possible to obtain (aka negotiate) any of these directly from a common URI. If an RDF-aware agent wants an RDF representation, make them negotiate for it from the Generic Document URI just like browsers negotiate for HTML. If they want to do SPARQL queries on the RDF graph for the entire universe, let them do it on the client-side for now. :-)
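Negotiation at the Generic Document might look something like this. It is a simplified Python sketch: the `negotiate` function and the `VARIANTS` table are hypothetical, wildcards such as `*/*` are ignored, and a real server would lean on its framework's content negotiation instead.

```python
# Hypothetical variants of one Generic Document, keyed by media type.
VARIANTS = {
    "text/html": "http://example.org/person/alice/about.html",
    "application/rdf+xml": "http://example.org/person/alice/foaf.rdf",
    "application/json": "http://example.org/person/alice/about.json",
}

def negotiate(accept, default="text/html"):
    """Pick a variant for an Accept header, honoring q-values but not wildcards."""
    best, best_q = default, 0.0
    for item in accept.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type in VARIANTS and q > best_q:
            best, best_q = media_type, q
    # Content-Location tells the client the URI of the specific representation.
    return {"Content-Type": best, "Content-Location": VARIANTS[best]}
```

An RDF-aware agent sending `Accept: application/rdf+xml` gets the RDF variant; a browser gets HTML from the very same URI.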
But don't let the Semantic Web community brainwash you into believing you MUST provide an RDF representation before you're allowed to call your URIs "Linked Data". Creating a Real World Object URI that does a 303 redirect to a Generic Document URI where clients can negotiate for HTML and XML, for example, is good enough to call it Linked Data. Even if your Generic Document URIs don't support the delivery of RDF representations directly, the Real World Object URIs you are providing allow your resources to be used "sensibly" on the Semantic Web.
Jeff
April 14, 2009
A Basic Linked Data Example
Most discussions of Linked Data happen in the context of the Semantic Web, which places it beyond the grasp of most mere mortals (e.g. me). Here is an example that doesn't require omniscience.
Imagine that I create an HTTP URI for me: not my home page or my online photo, but me. In Linked Data this is called a Real World Object identifier. To illustrate, I will use http://purl.org/jyoung. [Strictly speaking I am telling a little white lie, but as soon as the new PURL server is installed I can fix this dishonesty.]
If you click on my Real World Object URI your web browser will "redirect" you to a different URI that contains information about me. This other URI is called a Web Document. If you actually do this, you will observe this URI in the address bar: http://www.oclc.org/research/staff/young.htm.
There is a vital difference between these two URIs. One identifies me as a living breathing (so far) person in the "real world". The other identifies a document containing information about me. The fact that there are many Web Documents that contain information about me is a different issue. The fact that I may have many Real World Object identifiers is yet another separate issue. Don't let these details confuse you. I am not a Web Document and vice versa. No RDF mumbo-jumbo is needed to understand this or make it true.
[Now for the little white lie. The Real World Object identifier I used (http://purl.org/jyoung) returns an HTTP 302 Found redirect to the other URI. What it needs to return instead is an HTTP 303 See Other. The reason has to do with the strict interpretation of these codes. The old PURL server does not support HTTP 303 but the new one will.]
April 14, 2009
"Tweets" and Thoughts
I just started using the Twitter service, and I am impressed. Each "tweet" of 140 characters or less answers the simple question "What are you doing?" If you set aside mundane answers like "going to the store for a loaf of bread", there is something more interesting going on. More often than not (?), what we are doing is thinking. According to this interpretation, a tweet is a concise representation of an HTTP URI-identified thought. A blog entry serves this same thought identification function but typically delivers a more verbose representation of that thought.
Getting back to my normal mode of incomprehensibility, it should be possible to identify a single thought with an HTTP URI and allow clients to content-negotiate for a tweet representation or a blog entry representation, depending on their level of interest in that thought. These two types of representations need to be better integrated into blog software and Twitter clients.
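Here is a rough Python sketch of what that negotiation might look like. It assumes, purely for illustration, that the tweet representation is served as text/plain and the blog entry as text/html; neither media type choice comes from Twitter or any blog software, and the function names are made up.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        if not media_type:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(header, representations):
    """Pick the media type of the representation the client prefers."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in representations:
            return media_type
        if media_type == "*/*":
            return next(iter(representations))
    return None


# Hypothetical representations of one thought.
THOUGHT = {
    "text/plain": "A tweet is a concise representation of a thought.",
    "text/html": "<p>A longer blog entry about the same thought...</p>",
}
```

A client that is only mildly interested could send "Accept: text/plain" and get the tweet; a client that wants the full story could send "Accept: text/html".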
Jeff
April 13, 2009
Does Everything = Data?
kidehen posted a tweet related to #linkeddata that I want to examine, but I can't do it within Twitter's 140-character limit. He says:
"Ground zero for me: data is everything, and everything is data. To make data useful each datum needs to have Identity."
I disagree with the first sentence because boiling everything down to data seems to conflate two categorically different kinds of things: Real World Objects and Web Documents.
I basically agree with the second sentence. The reason is that we are looking at two sides of the same coin: the thing itself (the RWO), and information about the thing (the datum). One side can't sensibly exist without the other. This datum can be represented in a variety of ways, which is the tie-in to Web Documents and the "old web". With this in mind, I might rephrase the 2nd sentence like so:
"To make a Real World Object's identity reusable it, and thus its datum, needs to have an actionable identifier on the Web."
April 12, 2009
Amendment: Strengthening Identity on the Web
[Changed 2009-04-13 to use 302 Found instead of 307 Temporary Redirect.]
A couple of days ago, I blogged about 3 ways to strengthen identity on the Web. The 3rd way is unconventional and requires an example.
The general idea is that a client should be able to navigate/negotiate to representations of a resource that are hosted under a different domain. Here's an example in the context of Linked Data:
In principle, then, any Web Document on the distributed Web that says something *about* alice as a Real World Object could be reached from a single URI and using conventional HTTP behaviors to automate the process.
April 10, 2009
3 Ways to Strengthen Identity on the Web
If you have an HTTP URI for something in the Real World, there are three ways to strengthen its identity on the Web via its content-negotiable Generic Document URI:
- Improve the quality of the content in any given negotiable representation (which is the conventional solution)
- Add variant negotiable representations that are hosted locally (which can be more effective than most developers realize)
- Add variant negotiable representations that are hosted remotely (which seems to be a novel idea that deserves more consideration)
HTTP/1.1 supports all of these options. The 3rd option assumes a level of trust and cooperation with the remote host, but this is reasonable in many situations.
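To illustrate the 2nd and 3rd options, here is a small Python sketch of a Generic Document that knows about both local and remote variants. Everything in it is an assumption on my part: the variant table, the remote.example.net host, the Atom variant, and the choice to answer with a 302 Found redirect when the chosen variant lives on another domain.

```python
# Hypothetical variants of the Generic Document
# http://example.org/Person/alice/
VARIANTS = {
    # hosted locally
    "text/html": "http://example.org/Person/alice/about.html",
    "application/rdf+xml": "http://example.org/Person/alice/foaf.rdf",
    # hosted remotely (assumed host and media type)
    "application/atom+xml": "http://remote.example.net/feeds/alice.atom",
}

LOCAL_PREFIX = "http://example.org/"


def resolve(media_type):
    """Return (status, uri) for the variant matching the media type."""
    uri = VARIANTS.get(media_type)
    if uri is None:
        return 406, None  # Not Acceptable: no such variant
    if uri.startswith(LOCAL_PREFIX):
        # Served directly; the URI would go in a Content-Location header.
        return 200, uri
    # The variant lives elsewhere, so send the client there.
    return 302, uri
```

Adding a new representation, local or remote, is then just a new row in the table.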
April 2, 2009
Linked Data and Search Engines
Within the last 7 weeks, a number of major search engines have announced their support for canonical URIs. This ties in very nicely with the growing interest in Linked Data. Specifically, a Real World Object URI producing a 303 redirect to a Generic Document is ideal for use as a canonical URI. Here is an example of the pattern:
| http://example.org/Person/alice | Real World Object (the "canonical URI") |
| http://example.org/Person/alice/ | Negotiable Generic Document |
| http://example.org/Person/alice/about.html | Web Document (HTML) |
| http://example.org/Person/alice/foaf.rdf | Web Document (foaf:Person) |
In the <head> section of the about.html document you can then add this:
<link rel="canonical" href="http://example.org/Person/alice" />
This same canonical URI would be the one used as the value of the rdf:about in the foaf.rdf representation:
Even if you don't understand or care about RDF, this pattern makes sense, and can accommodate the addition of an RDF representation later when the need becomes clearer.
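For anyone who wants to see the two pieces side by side, here is a minimal Python sketch that produces both the canonical link for about.html and a tiny foaf.rdf document whose rdf:about carries the same URI. The function names and the foaf:name value "Alice" are my own illustrative assumptions; the namespaces are the standard RDF and FOAF ones.

```python
import html
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# The Real World Object URI doubles as the canonical URI.
CANONICAL = "http://example.org/Person/alice"


def canonical_link(uri):
    """Build the <link> element for the <head> of about.html."""
    return '<link rel="canonical" href="%s" />' % html.escape(uri, quote=True)


def foaf_person(uri, name):
    """Build a minimal RDF/XML document describing a foaf:Person."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("foaf", FOAF_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    # rdf:about carries the same URI as the canonical link.
    person = ET.SubElement(
        root, "{%s}Person" % FOAF_NS, {"{%s}about" % RDF_NS: uri}
    )
    ET.SubElement(person, "{%s}name" % FOAF_NS).text = name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(canonical_link(CANONICAL))
    print(foaf_person(CANONICAL, "Alice"))
```

The point is simply that both representations name the same thing with the same URI.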
March 29, 2009
What are the Authoritative URIs for Linked Data Concepts?
There are three vital concepts needed to understand Linked Data:
- Real World Object
- Generic Document
- Web Document
Ironically, these three concepts seem to lack authoritative Real World Object URIs. Let's take the concept of Real World Object itself as an example. There are a number of Web Documents that contain information about Real World Objects, but the only plausible URI I could find that might identify the concept of a Real World Object is http://www.w3.org/TR/cooluris/#semweb. Presumably this is not the URI the W3C would like us to use, though, since an RDF representation is not negotiable from this location.
Without Real World Object URIs for these concepts, the semantics of the things we want to assert will be messed up.
March 28, 2009
Sensible Linked Data URI Patterns
[Before we get into the meat of this blog entry, I want to assert that the relationship between Linked Data and domain modeling patterns is more important than URI patterns. The explanation for this jumble of words will need to wait for another day, though.]
For now, let's imagine that a single easy set of URI patterns for Linked Data can be prescribed for arbitrary use cases. (Hash URIs for Real World Objects are considerably less functional, so they are ignored in this analysis.) These are the most important patterns:
http://{domainName}/{className}/{instanceName} (Real World Object identifier returning a 303 redirect to...)
http://{domainName}/{className}/{instanceName}/ (Generic Document identifier that negotiates a representation identified by...)
http://{domainName}/{className}/{instanceName}/{operationName} (Web Document that contains information about the Real World Object.)
Here are some examples that we can compare with alternatives:
domainName = www.example.com
className = Person
instanceName = alice
operationName = about.html and foaf.rdf
http://www.example.com/Person/alice (Real World Object)
http://www.example.com/Person/alice/ (Generic Document)
http://www.example.com/Person/alice/about.html (Web Document)
http://www.example.com/Person/alice/foaf.rdf (Web Document)
Compare these for sensibility, hackability and extensibility to the equivalent examples used in the Cool URIs for the Semantic Web document:
http://www.example.com/id/alice (Real World Object)
http://www.example.com/doc/alice (Generic Document)
http://www.example.com/doc/alice.html (Web Document)
http://www.example.com/doc/alice.rdf (Web Document)
The use cases for the truncated forms of these alternative patterns don't align, but if we examined them closely I think most people would agree that the former pattern has superior discoverability capabilities. These URI patterns aren't adequate for use cases like searching the Person class or discovering the set of available classes, but they cover the basics.
There is a connection with domain modeling here, but that explanation will have to wait for another day also.
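The {domainName}/{className}/{instanceName} patterns at the start of this entry are regular enough to sketch in a few lines of Python. The function names and the dictionary keys are hypothetical; the URIs they produce are the www.example.com ones listed above.

```python
from urllib.parse import urlsplit


def linked_data_uris(domain_name, class_name, instance_name, operation_names):
    """Expand the three URI patterns for one instance."""
    base = "http://%s/%s/%s" % (domain_name, class_name, instance_name)
    return {
        "real_world_object": base,
        "generic_document": base + "/",
        "web_documents": [base + "/" + op for op in operation_names],
    }


def classify(uri):
    """Guess which of the three kinds of resource a URI identifies."""
    segments = urlsplit(uri).path.split("/")
    # "/Person/alice" splits into ["", "Person", "alice"]
    if len(segments) == 3 and segments[1] and segments[2]:
        return "Real World Object"
    if len(segments) == 4 and segments[1] and segments[2]:
        # A trailing slash leaves an empty final segment.
        return "Generic Document" if segments[3] == "" else "Web Document"
    return "unknown"


if __name__ == "__main__":
    uris = linked_data_uris(
        "www.example.com", "Person", "alice", ["about.html", "foaf.rdf"]
    )
    print(uris["real_world_object"], classify(uris["real_world_object"]))
    print(uris["generic_document"], classify(uris["generic_document"]))
    for uri in uris["web_documents"]:
        print(uri, classify(uri))
```

Being able to tell the three kinds of resource apart from the path alone is part of what I mean by hackability.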
March 26, 2009
What Does Your OpenID Really Identify?
An OpenID is a secured URL that can be used for authentication and access control. I created a couple of OpenIDs today using popular providers and was surprised to realize that they don't identify me as a Real World Object. Instead, they identify a Web Document that may provide information about me. If the difference isn't clear, see my Linked Data: A Love Story blog entry.
If anyone knows of an OpenID provider that can establish my identity as a Real World Object according to Linked Data principles, I would be interested to hear about it.
March 25, 2009
Linked Data: A Love Story
Linked Data
Do you love me?
The Prince
Yes, I love your home page Web Document.
Linked Data
Ok, but do you love me?
The Prince
Yes, I love your Generic Document and all the information about you.
Linked Data
Ok, but do you love me?
The Prince
Yes, I love your Real World Object.
And so the Prince and Linked Data lived happily ever after.
The end.
March 25, 2009
Amplifying Linked Data
My colleague Andrew Houghton and I have been thinking a lot about Linked Data lately and we have observed a few patterns that deserve broader consideration:
- The key concepts defined by Linked Data are Real World Objects, Generic Documents, and Web Documents. The Linked Data literature seems to give Real World Objects and Web Documents equal footing, but the role of Generic Document is currently understated because all resources in the universe effectively fall into one of these three categories.
- The relationship between Linked Data and Domain Modeling deserves more attention. You can start to see glimpses of this intersection in FOAF, but in fact all Web applications can and should be domain modeled and those domains/models/classes/instances/operations reused to minimize redundancies and maximize efficiencies in systems.
- The interrelationships between Linked Data, domain models and AtomPub deserve more attention. If we consider these aspects in terms of the Model-View-Controller or Model-View-Presenter patterns, we should find that it is possible to construct a general-purpose controller capable of supporting any domain model and any set of use cases.
This last point brings us back to something Andy and I blogged about 9 months ago:
- Everything is a resource
- All operations are CRUD
The significance of these statements became clearer in the context of Linked Data, but this brief blog entry admittedly leaves too much unexplained and unjustified. We will continue to work on that, but hopefully these ideas will be thought-provoking in the meantime.
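As a very rough illustration of "everything is a resource, all operations are CRUD", here is a toy Python controller. It is not the general-purpose controller we have in mind, just a sketch under simplifying assumptions: resources live in an in-memory dictionary keyed by URI, and POST creates a resource at the URI it is sent to (AtomPub would POST to a collection instead). The class and method names are hypothetical.

```python
class ResourceController:
    """Toy controller: every resource is a dict entry, every operation is CRUD."""

    CRUD_BY_METHOD = {
        "POST": "create",
        "GET": "read",
        "PUT": "update",
        "DELETE": "delete",
    }

    def __init__(self):
        self.store = {}  # URI -> representation

    def handle(self, method, uri, body=None):
        """Dispatch an HTTP method to the matching CRUD operation."""
        operation = self.CRUD_BY_METHOD.get(method)
        if operation is None:
            return 405, None  # Method Not Allowed
        return getattr(self, operation)(uri, body)

    def create(self, uri, body):
        if uri in self.store:
            return 409, None  # Conflict: already exists
        self.store[uri] = body
        return 201, body

    def read(self, uri, body):
        if uri not in self.store:
            return 404, None
        return 200, self.store[uri]

    def update(self, uri, body):
        if uri not in self.store:
            return 404, None
        self.store[uri] = body
        return 200, body

    def delete(self, uri, body):
        if uri not in self.store:
            return 404, None
        del self.store[uri]
        return 204, None
```

Nothing in the controller knows what a Person is; the domain model would supply that, which is the whole idea.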