Monday 10 December 2007

Object 'PID's and UUID, why not?

Handles, DOIs... schemes to provide unique, persistent identifiers. But what's the one flaw that unites all of these schemes?

They only work for as long as the people involved want them to.

If the money dries up behind the Handle resolver, what then? What happens to attempts to assign the same handle to different items? What about duplication?

So, step one is to acknowledge that there is no perfect way to uniquely identify something. Step two is making do with something that is less than perfect.

Which is where I started thinking about UUIDs. From the page:

A UUID is essentially a 16-byte (128-bit) number. The number of theoretically possible UUIDs is therefore 2^(16*8) = 2^128 = 256^16 or about 3.4 × 10^38. This means that 1 trillion UUIDs would have to be created every nanosecond for 10 billion years to exhaust the number of UUIDs.
So, it's fair to say that there are plenty of these ids to go around.
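For anyone who wants to check the arithmetic in that quote, here is a quick sketch in Python. It assumes a year of 365.25 days, which is my choice and not something the quoted page specifies; the variable names are mine too.

```python
# Size of the 128-bit UUID space, written the three ways the quote gives it.
total_uuids = 2 ** (16 * 8)
assert total_uuids == 2 ** 128 == 256 ** 16

# Assumption: a year is 365.25 days, i.e. 31,557,600 seconds.
SECONDS_PER_YEAR = 31_557_600
NANOSECONDS_PER_SECOND = 10 ** 9

# One trillion UUIDs per nanosecond, kept up for ten billion years.
uuids_per_nanosecond = 10 ** 12
years = 10 ** 10
created = uuids_per_nanosecond * NANOSECONDS_PER_SECOND * SECONDS_PER_YEAR * years

print(f"possible UUIDs: {total_uuids:.3e}")
print(f"created:        {created:.3e}")
print(f"fraction used:  {created / total_uuids:.2f}")
```

Under that assumption the two numbers land in the same ballpark, which is all the quote is claiming.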

But if we randomly assign these ids to anything, what is the likelihood of an id being assigned twice? I am lazy and loath to do the calculations myself, but luckily I don't have to. The bottom line is that when 70,368,744,177,664 (2^46) ids have been randomly assigned, the chance of any two of them being the same is about one in 2.5 billion.

I like those odds.
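If you are less lazy than me, the odds can be estimated with the usual birthday-problem approximation, p ≈ 1 - exp(-n^2 / 2N). The sketch below assumes randomly generated (version 4) UUIDs, which carry 122 random bits because 6 of the 128 bits are fixed for the version and variant. The function name and defaults are mine, purely for illustration.

```python
import math

def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * N)), with N = 2^random_bits."""
    space = 2 ** random_bits
    exponent = n_ids * n_ids / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

# Chance of at least one duplicate after 2^46 random assignments.
p = collision_probability(2 ** 46)
print(f"probability of at least one collision: {p:.3e}")
print(f"roughly one in {1 / p:,.0f}")
```

With these assumptions the estimate comes out at a little over one in two billion, the same order of magnitude as the figure quoted above. I would guess the gap is down to rounding in the quoted number, but that is only a guess.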

In my Fedora-centric view, this means that objects and related URLs go from:

ora:909 -> http://ora.ouls.ox.ac.uk:8080/fedora/get/ora:909

to:

ora:0ddfa057-d673-4ed3-9186-e141c50bf58f -> http://ora.ouls.ox.ac.uk:8080/fedora/get/ora:0ddfa057-d673-4ed3-9186-e141c50bf58f

So, we now have something that is citable and unique all by itself. It needs no scheme or organising body or agency to remain unique.
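Minting one of these ids takes a couple of lines with Python's standard uuid module. This is only a sketch: the "ora" namespace and the base URL are copied from the example above, and make_pid and pid_to_url are hypothetical helper names of mine, not anything Fedora provides.

```python
import uuid

# Assumption: base URL taken from the example in this post.
FEDORA_GET_URL = "http://ora.ouls.ox.ac.uk:8080/fedora/get/"

def make_pid(namespace: str = "ora") -> str:
    """Return a pid of the form 'namespace:<random version 4 UUID>'."""
    return f"{namespace}:{uuid.uuid4()}"

def pid_to_url(pid: str) -> str:
    """Build the URL for a pid by appending it to the base URL."""
    return FEDORA_GET_URL + pid

pid = make_pid()
print(pid, "->", pid_to_url(pid))
```

No registry or resolver is consulted at any point; the uniqueness rests entirely on the randomness of uuid4.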

It's not very human readable though, is it? But, in all seriousness, when was the last time you typed in an address by hand to go directly to a resource? Should I be embarrassed to admit that I find myself typing things as trivial as 'google maps' into my address bar on occasion, because I know that it sends it to Google as a search and provides me with results I can click on?

For something that needs to be permanent, to be citable, and to be resolvable, I think UUIDs work as object ids. And as for the more human-focused URLs, URLs that can be read in a mobile browser or in an email perhaps: what's wrong with the semi-permanent URLs from services such as tinyurl.com?

3 comments:

David said...

this is good stuff Ben!

t00m said...

Thanks for this post.

But what if you want to share your objects in a network. Which is the best prefix id?

I mean, is it right to put a prefix like your email (or better, your foaf:mbox_sha1sum)?

Ben O'Steen said...

@toom, you might want to look at this in which I go into uuids and Fedora pids in more detail.