OpenID Vendor Lock-In (sort of)

Continuing my previous post about OpenID and Vendor Lock-In: a reader of this blog named Andrew commented on that post about a problem he had with MyOpenID.com and Zooomr. He raises some valid points, which I want to highlight here (some of his other points, I think, can be easily fixed or are actually non-issues). You can also read my complete answer to Andrew here.

Before discovering the whole idea and notion of OpenID, Andrew registered with Zooomr. Zooomr accounts are actually OpenID accounts, so every Zooomr user also gets an OpenID identity that can be used on other OpenID-enabled sites.

Zooomr delegates management of these OpenID identities to MyOpenID.com through its affiliate program. (UPDATE: Apparently, this is not true. For some reason I thought that was the case, but it is not.)

Once Andrew got to know OpenID, he wanted to truly own his identity by using his own domain (either through delegation or by running his own server, whatever he chooses). But now he could not bring that identity to Zooomr, since Zooomr has no notion of multiple OpenID identities tied to the same Zooomr account.

In fact, he is tied to his Zooomr-issued OpenID identity for as long as he uses Zooomr, and since he has a Pro4Life account, that identity will never die.

In my previous post I suggested a couple of ideas to avoid OpenID vendor lock-in. I now want to add one more:

  • Sites should support associating multiple OpenID identities with a single account, so that a user can add, remove and switch the identity used to access that account on that site.

Jyte and claimID, for example, already let you associate multiple OpenID identities with a single account. You can then log in to these sites with any one of the OpenID identities you have associated with your account.
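
To make the idea concrete, here is a minimal sketch (in Python, with hypothetical names and URLs) of the bookkeeping such multi-identity support boils down to; a real site would keep this in its database and verify each login through the usual OpenID flow first:

```python
# Hypothetical multi-identity bookkeeping: several OpenID URLs, one account.

class Account:
    def __init__(self, username, first_identity):
        self.username = username
        self.identities = {first_identity}  # a user may hold several OpenID URLs

    def add_identity(self, openid_url):
        self.identities.add(openid_url)

    def remove_identity(self, openid_url):
        if len(self.identities) > 1:  # keep at least one way into the account
            self.identities.discard(openid_url)

accounts_by_identity = {}  # OpenID URL -> Account

def attach(account, openid_url):
    account.add_identity(openid_url)
    accounts_by_identity[openid_url] = account

def login(openid_url):
    # Any associated identity resolves to the same account.
    return accounts_by_identity.get(openid_url)

andrew = Account("andrew", "http://andrew.zooomr.example/")
accounts_by_identity["http://andrew.zooomr.example/"] = andrew
attach(andrew, "http://andrew.example.com/")  # his own domain, added later
assert login("http://andrew.example.com/") is andrew
```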

Could not run/locate “i386-pc-linux-gnu-gcc”

I run Gentoo Linux on my home machine, and after upgrading GCC (and subsequently the whole toolchain) I wanted to compile a Perl-related library, crypt-rsa.

When I tried to emerge it, it failed with the following error:

Could not run/locate “i386-pc-linux-gnu-gcc”

After searching around I found this thread on the Gentoo forums with some instructions on how to handle this issue, but it didn’t help much.

One of the posts on that thread suggested re-emerging the offending package (if you can find it). Since I was trying to compile something Perl-related, I figured Perl itself might be the problem.

I re-emerged Perl and, surprise surprise, it worked, so I thought I’d share it with the world.
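
For reference, this is roughly the command involved (--oneshot just keeps Perl out of the world file; if a different package is the offender on your machine, re-emerge that one instead):

```sh
# Rebuild Perl in place, then retry the emerge that originally failed
emerge --oneshot dev-lang/perl
```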

OpenID, Trust, Vendor Locking and Delegation

There is a lot going on around OpenID these days, and a number of claims are being raised that prevent greater adoption of OpenID by users.

One of these claims concerns trust and vendor lock-in: how can I trust a certain OpenID vendor? After all, anyone gaining access to my OpenID account gains access to every site I’ve signed in to (or up for) using OpenID.

This is a legitimate concern, and it reminds everyone how Microsoft Passport (now Windows Live ID) never quite succeeded as a single-vendor, non-transferable identity.

One of the key elements of OpenID is that it is decentralized: no single body controls it. But if a user signs up with a certain OpenID vendor, they are essentially locked into that vendor unless they have the skills (and a domain) that allow them to set up delegation.

Delegation is exactly the thing that makes these claims go away, since it gives the power back to the user. The underlying OpenID vendor still supplies the service, but everything MUST go through the user’s domain to reach the vendor, which allows the user to change vendors without being locked in.

The problem with delegation, however, is that it requires a certain amount of preparation. You either need your own site/blog and to add the necessary <head> tags (shown below), or you need to use a service like FreeYourID.com (I’ve previously written about it here), which gives you a URL composed of your name (using the .name domain).
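
For reference, delegation itself is just two <link> tags in your page’s <head>; the endpoint and identity URLs below are placeholders for whatever your actual provider uses:

```html
<head>
  <!-- Your provider's OpenID endpoint -->
  <link rel="openid.server" href="https://www.myopenid.com/server" />
  <!-- The identity your provider actually knows you by -->
  <link rel="openid.delegate" href="http://yourname.myopenid.com/" />
</head>
```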

The problem with the FreeYourID.com solution is that it is only one .name vendor providing this service. Although they are responsible for the whole .name TLD, it is still a sort of vendor lock-in. If all .name providers supported such a service, things would look much better.

To sum things up, a possible answer to the claims about OpenID, trust and vendor lock-in is simply to highlight the benefits of delegation and to provide all the technical means needed to make it as easy as possible.

Below are a few ideas I’ve thought about (some are more wishful thinking, since they don’t depend on the OpenID community alone) which might make things easier for everyone:

  • OpenID support for .name domains, available from all .name providers
  • Built-in delegation support in blogging platforms, including hosted ones such as WordPress.com, Blogger, TypePad and the rest (for WordPress blogs hosted on your own server/domain, you can use my OpenID Delegation plugin :-) )
  • Support for migrating existing accounts on existing sites to an OpenID account, allowing users to consolidate their various accounts across sites into one OpenID identity.
  • Support for migrating accounts between OpenID vendors, including support in the OpenID spec for detecting a permanent redirection and performing the necessary fix-up (similar to a permanent redirect in HTTP; see the sketch after this list).
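
That last idea is pure speculation on my part (nothing like it is in the OpenID spec today), but a rough consumer-side sketch in Python shows how small the fix-up could be:

```python
# Speculative sketch: if the user's stored identity URL now answers with a
# redirect, adopt the final URL as the new identifier. A real implementation
# would honor only permanent (301) redirects and would re-verify the new URL
# through the usual OpenID discovery steps.
import urllib.request

def refreshed_identity(stored_url):
    resp = urllib.request.urlopen(stored_url)  # follows redirects
    final_url = resp.geturl()
    if final_url != stored_url:
        print(f"identity moved: {stored_url} -> {final_url}")
    return final_url
```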

Technology is supposed to make things easier for everyone and lower the barrier to participation, so that everyone, regardless of their skills, can use technology for their benefit. Let’s lower the participation barrier for OpenID and let everyone claim their own identity.

Online Life Feed

After reading Grant Robertson’s post, “Taming your own river of news”, I decided to use Yahoo Pipes to create my online life feed (it sounds better than “Eran’s river of news”, don’t you think?).

You can check it out here.

Basically I aggregate the feeds from this blog, my Advanced .NET debugging blog, my Yedda questions, my Yedda answers, my del.icio.us links and my Flickr photostream.

These feeds cover most of the content I’m generating or contributing to (at least the parts that have a feed). If I remember other feeds I contribute to and forgot to add, I’ll update the pipe.
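
If you want roughly the same merge without Pipes, here is a minimal sketch using the Python feedparser library; the URLs are placeholders for your own feeds:

```python
# Minimal "river of news" sketch using feedparser (pip install feedparser).
import feedparser

FEEDS = [
    "http://example.com/blog/feed",
    "http://example.com/photos/feed",
]

entries = []
for url in FEEDS:
    entries.extend(feedparser.parse(url).entries)

# Newest first; entries without a parseable date sort last.
entries.sort(key=lambda e: e.get("published_parsed") or (0,), reverse=True)

for e in entries[:20]:
    print(e.get("title", "(untitled)"), "-", e.get("link", ""))
```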

I’m quite sure the rest of the features Grant wanted, like being able to group items by year/date, source and topic, are probably best left to the various RSS readers (mostly the desktop ones).

Go on and create your own online life feed and share it with everyone! :-)

FreeYourID.com

I’m probably the last person to talk about this, but Scott Kveton posted on his blog that his company, JanRain, and GNR (which manages the .name top-level domain) have partnered to deliver a solution that gives you a .name URL as well as built-in OpenID delegation support.

Check the details at the FreeYourID.com site.

You get a 90-day free trial, after which it costs $10.95/year.

You’ll get a forwarding email address of the form yourFirstName@yourLastName.name (if it’s available), as well as a site of the form www.yourFirstName.yourLastName.name. You can forward that site to whatever page you wish.

The best part is that you automagically get to use this URL (which is rather easy to remember, duh!) as your OpenID URL on any OpenID-enabled site.

The OpenID provider for this service is, of course, JanRain’s own MyOpenID.

I don’t know how much similar services for the .name domain (minus the OpenID support, of course) cost per year, but I think this is one of the cheaper ones.

The only thing I can add to the discussion in the comments section on Scott’s post is that if GNR enabled people already using a .name solution to migrate to this new service, that would really get things going. Oh, and they should probably also offer an email box (which might make this solution cost a bit more, but I think it’s worth it), because the few .name users I know of have a real email box attached to their .name solution.

I don’t think I’ll need a .name solution, since I own sandler.co.il, which is more than fine by me, but this is great for anyone who doesn’t want to mess too much with setting up domains, sites and the rest.

Yahoo Pipes, Microformats and Extendability

I think Yahoo Pipes is really cool. The main attraction is its slick user interface and ease of use.

I just created a pipe that takes all of Yedda’s Recent Questions and translates them into French using Babelfish, and it took less than 5 minutes.

I do have a few ideas that I think would make Yahoo Pipes into something even more interesting:

  • Accept Regular HTML pages
  • Have a built-in Microformats parser
  • Support for more complex pipe scripting (perhaps in the form of JavaScript)
  • Support for state saving (or at least something limited, such as the ability to compare against the previous version of the page/feed you are piping)

Accept Regular HTML pages
Currently, Yahoo Pipes (at least as far as I’ve figured out) accepts only feeds (Atom, RDF, RSS, etc.). The other building blocks that work with Yahoo Search, Google Base and Flickr eventually output a feed into Yahoo Pipes. The ability to retrieve a page instead of a feed and manipulate it would make things a lot more interesting and would allow VERY interesting mashups and ideas.
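
To illustrate what a “fetch a page” building block would have to do, here is a small Python sketch that turns an ordinary HTML page into feed-like items (just link titles and URLs, standard library only; the URL is a placeholder):

```python
# Sketch: turn a plain HTML page into feed-like items, the kind of input
# a hypothetical "fetch page" Pipes module could provide.
import urllib.request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            title = "".join(self._text).strip()
            if title:
                self.items.append({"title": title, "link": self._href})
            self._href = None

html = urllib.request.urlopen("http://example.com/").read().decode("utf-8", "replace")
collector = LinkCollector()
collector.feed(html)
for item in collector.items:
    print(item["title"], "->", item["link"])
```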

Built-In Microformats parser
If Yahoo Pipes accepted regular pages, a built-in Microformats parser would let people extract the various types of structured information stored as Microformats on those pages, creating richer and more interesting capabilities for Yahoo Pipes.
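
As a taste of what such a parser would do, here is a simplified Python sketch that pulls the formatted name (“fn”) out of hCard markup; a real Microformats parser handles far more (proper vcard scoping, void tags, the other vocabularies):

```python
# Sketch: extract the formatted name ("fn") from hCard microformats.
# hCard marks a contact with class="vcard" and the name with class="fn";
# for brevity this trusts "fn" without checking the enclosing "vcard".
from html.parser import HTMLParser

class HCardNames(HTMLParser):
    def __init__(self):
        super().__init__()
        self.names = []
        self._depth = 0   # > 0 while inside an element with class "fn"
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1          # a tag nested inside the fn element
        elif "fn" in classes:
            self._depth = 1
            self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.names.append("".join(self._buf).strip())

parser = HCardNames()
parser.feed('<div class="vcard"><span class="fn">Jane Doe</span></div>')
print(parser.names)  # ['Jane Doe']
```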

Pipes Scripting
Adding custom scripting abilities to Yahoo Pipes would make it really great and would allow a burst of innovation and interesting compositions with Yahoo Pipes. Of course, this feature is the most complex one, from both a development and a security standpoint, since running 3rd-party code on your own servers is always problematic. But I’m sure the fine people at Yahoo can contain that.

One idea that comes to mind is writing such scripts in JavaScript, so that execution of a script over a page is contained within a JavaScript environment and can only touch the input being parsed.
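
Pipes has no such feature, so purely to illustrate the shape of it, here is a sketch (in Python rather than JavaScript, with made-up names; nothing here is a real Yahoo Pipes API) of a module that hands feed items to a user script and takes back the transformed result:

```python
# Hypothetical custom-script module for a pipe: the host passes feed items
# to the user script, the script transforms them, the host takes the result.

def user_script(items):
    """Example user-supplied logic: keep items mentioning OpenID,
    and uppercase their titles."""
    out = []
    for item in items:
        if "openid" in item["title"].lower():
            out.append({**item, "title": item["title"].upper()})
    return out

def run_pipe_module(items, script):
    # In a real sandbox, 'script' would run with no filesystem or network
    # access, seeing only the items handed to it.
    return script(items)

items = [{"title": "OpenID delegation tips", "link": "http://example.com/1"},
         {"title": "Unrelated post", "link": "http://example.com/2"}]
print(run_pipe_module(items, user_script))
```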

State Saving
State saving would allow users to create more complex pipes that are aware of changes. The simplest form is comparing against the previous version of the page/feed, letting the pipe author decide what to output.

An interesting example pipe that uses some of the ideas above would be one that watches a certain hardware vendor’s driver page (most driver vendors don’t offer a feed I can subscribe to in order to know when newer versions of a driver appear, and the like). The pipe would extract the current version and date from the page and compare them against the previous version of that page stored at Yahoo. If something changed, it would add an item to the pipe’s feed saying that a new version exists, etc.
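
Here is a rough Python sketch of that watcher idea, run as a standalone script rather than a pipe; the URL and version pattern are made up and would need adjusting per vendor page:

```python
# Sketch of the driver-page watcher: extract a version string from a page,
# compare it to the last one we saw, and emit an "item" only on change.
import re
import urllib.request

STATE_FILE = "last_version.txt"
PAGE_URL = "http://example.com/drivers.html"   # placeholder

def current_version():
    html = urllib.request.urlopen(PAGE_URL).read().decode("utf-8", "replace")
    match = re.search(r"Version\s+([\d.]+)", html)  # made-up pattern
    return match.group(1) if match else None

def last_seen():
    try:
        with open(STATE_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None

version = current_version()
if version and version != last_seen():
    print(f"New driver version found: {version}")  # would become a feed item
    with open(STATE_FILE, "w") as f:
        f.write(version)
```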

What do you think? Will this work? Would you be interested in such things?

Bad Text and Part of Speech Tagging – Background

I’ve recently become fascinated with some aspects of Natural Language Processing (NLP), having worked on some of them at my day job.

One of the key steps in getting a computer program to understand natural language is called Part-of-Speech tagging (POS tagging, or POST).

Basically, in the POS tagging phase, the computer assigns a part of speech (noun, verb, adjective, etc.) to each word of the given text, allowing it to figure out what the text is about and to perform later analysis on it.
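
To make this concrete, here is what tagging looks like with NLTK, one of many freely available Python toolkits (just an illustration, not the tooling I use at work):

```python
# POS tagging with NLTK (pip install nltk; the downloads fetch the tokenizer
# and tagger models on first use).
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

text = "The quick brown fox jumps over the lazy dog"
tokens = nltk.word_tokenize(text)
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ...]
# (exact tags depend on the tagger model)
```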

The POS step is crucial, since its output is used by the rest of the reasoning process that works out what the text is about; every mistake at this stage gets dragged onward, pushing the end result way off target.

The problem with most POS taggers (see a list of most of the free ones here) is that they assume the text you are trying to tag is grammatically correct and (hopefully) free of spelling mistakes. Proper casing (upper and lower case of words and letters) is also important for distinguishing various parts of speech. The other type of POS tagger performs unsupervised learning and can be trained to work with various types of text.

The problems begin when the text is not grammatically correct, contains spelling mistakes, is incorrectly punctuated, or has casing that is missing or used wrongly.

These problems are most common on the Internet and stem from various causes:

  • English is not the native tongue of a large portion of Internet users, making grammar, spelling and punctuation mistakes a bit more common.
  • A portion of today’s young Internet users (and I’m not being judgmental here) use a lot of Internet shortcuts, improper casing and loose grammar.

The big challenge is to still be able to understand what the text is about in spite of these problems.

Since the mistakes vary from person to person (and possibly from group to group, which might make things easier; I haven’t done or seen research on that yet), pre-training your POS tagger is not very useful, since the error rate will be quite high. Running an unsupervised learning algorithm on each such text would be time-consuming and might return strange results, given the number of error types that can appear in the text.

Handling one sentence or a set of keywords, as search engines do, is relatively easy compared to figuring out what a whole block of text (a couple of sentences, a paragraph or even a set of paragraphs) is about.

I’ve been experimenting with various techniques for extracting more meaningful results from badly formed English text. Some of them are not POS tagging in the traditional sense (i.e., tagging each and every word in the text), but rather ways of finding the most interesting words in a block of text, words that might imply what the text is really about.

The goal of my experimentation is to develop an algorithm that outputs a set of words or phrases in various combinations, which will later allow me, using a co-occurrence database containing statistics about different words, to output the main areas the supplied text talks about.
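
As a toy stand-in for that co-occurrence database, here is a small Python sketch: build pairwise counts from a corpus, then score the words of a new text by how strongly they associate with each other (all the data here is made-up example input, not my actual algorithm):

```python
# Toy co-occurrence table and word scoring.
from collections import Counter
from itertools import combinations

def cooccurrences(sentences):
    counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts

corpus = [
    "the camera battery drains fast",
    "my camera battery will not charge",
    "battery life on this camera is short",
]
table = cooccurrences(corpus)

# Score each word in a new text by its total association with the other
# words present; the top scorers hint at what the text is about.
text = "camera battery died again"
words = text.lower().split()
scores = {
    w: sum(table[tuple(sorted((w, other)))] for other in words if other != w)
    for w in words
}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# [('camera', 3), ('battery', 3), ('died', 0), ('again', 0)]
```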

In later posts I’ll try to describe the various algorithms I’ve been experimenting with, which should improve the odds of understanding the main subject of a grammatically improper, badly spelled and wrongly cased block of text.

If any of you who actually read this post know people working on this subject (or on similar aspects of it) who could point me to some interesting articles, please leave a comment.

Feel free to leave a comment if you’d like to discuss some of these aspects.

Help find Jim Gray

If you don’t already know: Jim Gray, a computer scientist and Turing Award winner, disappeared at sea on January 28th, 2007, while sailing his boat solo on a trip to the Farallon Islands near San Francisco.

His friend Werner Vogels, Amazon’s CTO, has harnessed Amazon’s Mechanical Turk to get people to search a set of satellite images for anything interesting. Images that users mark as worth further investigation will be treated as such.

You can join in and help from here.

I’ve started to help a bit, but there are a couple of things that I think are fairly easy to do and could help greatly:

  1. Image tiles that are completely blank (usually a side effect of the alignment post-processing of satellite imagery) should not be turned into HITs. It’s easy to check (see the sketch after this list) and would save unnecessary clicks from the people helping out.
  2. I think the tiles could be a bit bigger, covering more ground in a single HIT. Bigger tiles might also let people spot wreckage formations, which (God forbid) would indicate that something has happened.
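
For the first point, a blank-tile check really is cheap; here is a sketch using the Python Imaging Library (PIL/Pillow), with a variance threshold that would need tuning on real tiles:

```python
# Sketch: flag effectively blank tiles before they become HITs. A tile whose
# pixel values barely vary carries no information worth a human look.
from PIL import Image, ImageStat

def is_blank(path, threshold=2.0):   # threshold is a guess; tune on real data
    img = Image.open(path).convert("L")      # grayscale is enough here
    stddev = ImageStat.Stat(img).stddev[0]   # pixel standard deviation
    return stddev < threshold

if is_blank("tile_0042.png"):                # hypothetical tile filename
    print("skip: nothing to review in this tile")
```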

In addition, if the satellite images were released, I’m sure more than a few people have the knowledge and code to help identify some of the objects automatically (I know I have more than a few pieces of code for identifying various shapes of various sizes in an image).

This could give the effort a bigger boost.

I just hope Jim will be found in time.

UPDATE (2007/02/05 16:53 IST): I’m not sure if I’m supposed to publish this, but the directory from which Mechanical Turk serves the various images sits on a server where you can get the satellite images, broken into tiles, in a big zip file. There are ~100 satellite images there. Perhaps some of you (and maybe me, if I can find some time) can download it, stitch it back into one picture and run various analysis tools on it.

You can grab the images from here.

I hope it will help find Jim quicker.