Crawling to the people

Yaniv let the cat out of the bag about some of our ideas for making other parts of the search and its relevant data open, free and accessible to all of us.

I thought I’d add some background and my own thoughts on the subject.

First, we iterated on the idea a couple of times while we were in that place where you have a solution (or several) and are looking for a problem to solve.

It all started from this post by Jeremie Miller. Jeremie, being the good guy that he is, was thinking about creating standards and protocols to make the crawling, processing and sharing of data for search and search engines public, free and accessible. Neither Yaniv nor I are in Jeremie’s loop, and we have no idea what he is up to (but you can count on it to be interesting, that’s for sure); still, we talked about it a bit and it sank in.

We both liked the idea of having the raw data accessible as well as being able to run custom post processors that can make something useful out of it so that no one is tied to whatever logic and algorithms the crawler writer enforces.

Then came the announcement from Kevin Burton about spinn3r, a service that reuses the web index of the Blogosphere crawled by TailRank’s crawler and allows you (and everyone else) to use that crawled data.

This information also sank in, and today at lunch (which did take quite a while :-) ) we started to brainstorm about it a bit more seriously.

This could really open up search and drive innovation from the bottom up, giving a lot of people access to APIs and capabilities that were previously available only to big companies. This is a platform on which something very interesting could be built.

We would love to hear your comments.

Amazon Recommendations, Big Giant Collection Books, Reprints and New Editions

I really like Amazon. I really like Amazon’s recommendations, and ever since I entered most of my books into Amazon I have been getting really good recommendations.

There is one thing that bothers me, though.

I recently made a big order from Amazon that included two books I was long overdue in owning and reading: “The Long Dark Tea-Time of the Soul” and “Dirk Gently’s Holistic Detective Agency”, both by Douglas Adams.

After the purchase, Amazon’s recommendations started to offer me other Douglas Adams books such as “Mostly Harmless”, “So Long, and Thanks for All the Fish” and “The Restaurant at the End of the Universe”.

I had previously told Amazon that I already own “The Ultimate Hitchhiker’s Guide to the Galaxy”, which is one large book containing all 5 of the Hitchhiker’s Guide novels (3 of them are the books mentioned above).

Since I own a book that includes those books, I would have figured that Amazon would know that and handle it much as they handle situations in which a book is reprinted or has a newer edition (usually with minor changes or no changes at all). The recommendation engine doesn’t handle this, probably because it doesn’t take into account that this one book is a collection of other books.

In addition, because of the “The Hitchhiker’s Guide to the Galaxy” movie the series has been reprinted, so there are newer editions out there, which is probably one of the reasons I see these books again.

It’s not that uncommon to have such a book that contains multiple titles that were previously published separately as part of a series. For example, I also own “The Great Book of Amber: The Complete Amber Chronicles”, which is one big book that contains the 10 books in the Amber series by Roger Zelazny (luckily I haven’t told Amazon about that, so I’m not getting recommendations to buy the same books again).

Perhaps Amazon should take a look at such collection books, and also handle reprints and newer editions in a different way.

For example, for books I read for pleasure (as opposed to technical books, which often have newer editions that do change and add things), I would expect not to see reprints and the like by default, unless I specifically opted in through my settings.

For technical/reference books I would like, by default, to see newer editions, because these new editions (usually) add and update information, and in most cases it’s important to stay up-to-date or at least know that there is a newer edition.

For paperback vs. hardcover editions, Amazon seems to handle it well and does understand that if I have the paperback edition I don’t need the hardcover edition recommended to me, and vice versa. I can only assume they implemented it by saving some kind of reference between these books, so perhaps they should add a new type of reference/link for books that are a collection of other books, and other such links to handle the rest of the things I’ve mentioned above.
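
To make the collection link idea concrete, here is a minimal Python sketch. It is not how Amazon actually stores or filters anything; the `Book` class, its `contains` field and the `filter_recommendations` function are all hypothetical names invented for illustration. The only assumption is that a collection book carries a link to the titles it bundles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Hypothetical catalog entry, invented for this sketch."""
    title: str
    # Titles bundled inside this book (empty for an ordinary book).
    contains: frozenset = frozenset()


def covered_titles(owned):
    """Return every title the reader effectively owns, expanding collections."""
    covered = set()
    for book in owned:
        covered.add(book.title)
        covered.update(book.contains)
    return covered


def filter_recommendations(candidates, owned):
    """Drop candidates the reader already has, directly or inside a collection."""
    covered = covered_titles(owned)
    return [title for title in candidates if title not in covered]


ultimate = Book(
    "The Ultimate Hitchhiker's Guide to the Galaxy",
    frozenset({
        "The Hitchhiker's Guide to the Galaxy",
        "The Restaurant at the End of the Universe",
        "Life, the Universe and Everything",
        "So Long, and Thanks for All the Fish",
        "Mostly Harmless",
    }),
)
candidates = ["Mostly Harmless", "The Salmon of Doubt"]
remaining = filter_recommendations(candidates, [ultimate])
```

With this kind of link in place, "Mostly Harmless" would be dropped from the candidates because the owned collection already covers it. The same link type could presumably be reused for reprints and newer editions.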

What do you say? Am I the only book maniac/Amazon maniac/Recommendation maniac out there that thinks about this? :-)

Amazon Checkout Interface – Group to as few shipments as possible

I recently ordered a couple of books from Amazon.

When I reached the checkout screen I, obviously, chose to group my shipments into as few as possible. I then saw that the order was grouped into two shipments: one book would ship the next day and the other 4 would ship only on the 20th of March, almost two months later!

This was a bit strange considering the fact that Amazon showed that all books were in stock.

I figured there is probably a book or two causing the delay of the whole shipment, so I switched to the “ship as soon as the books are available” option and saw that one book (one book alone) caused the delay of the whole shipment.

I removed it (with great sorrow – it will wait for the next batch of Amazon books from my wish list), set the “group to as few shipments as possible” and everything was in one big happy shipment.

I wonder what other customers who are a bit less proficient with computers would have done. I’m guessing one of 3 options:

  1. Order and not notice that it will take two months for the shipment to come
  2. Select the option to send things as soon as they are available and pay a bit more
  3. Cancel the shipment and go elsewhere

Why doesn’t Amazon add a check so that, if the shipment will take longer than it should, it alerts the user and tells him/her which item is causing the delay? It shouldn’t be that hard to check something along the lines of

if (scheduledShipmentDate > DateTime.Now.AddMonths(1)) {
    // alert the user and point out the item that is causing the delay
}
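
A slightly fuller version of that check, as a rough Python sketch. It has nothing to do with Amazon's real checkout code: the function name, the 30-day threshold, the item names and the dates are all made up for illustration.

```python
from datetime import date, timedelta


def find_delaying_items(ship_dates, today, max_wait=timedelta(days=30)):
    """Return the names of items expected to ship more than max_wait after today.

    ship_dates maps an item name to its estimated ship date (hypothetical data).
    """
    deadline = today + max_wait
    return sorted(name for name, ship_date in ship_dates.items()
                  if ship_date > deadline)


# Illustrative order: one item is available almost two months later than the rest.
order = {
    "Book A": date(2007, 1, 26),
    "Book B": date(2007, 1, 26),
    "Book C": date(2007, 3, 20),
}
late_items = find_delaying_items(order, today=date(2007, 1, 25))
if late_items:
    print("These items delay your grouped shipment:", ", ".join(late_items))
```

The point is that the checkout page could name the late item instead of leaving the customer to find it by switching shipping options.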



Sometimes it’s the little things that tick me off. I’m a great fan of Amazon and it’s really the only place I can get almost any book I can think of, but sometimes a man’s got to post on his blog when a man’s got to post on his blog.

Amazon E-Commerce Web Service API

I’ve recently experimented with Amazon’s E-Commerce Service.

In general, it’s a very complete API giving you access to almost every piece of information including titles, images, prices (and historical prices) that Amazon stores.

There were two things that were a bit problematic, in my opinion, which I think should be addressed.

The first thing is the ItemSearch method. This method allows you to search for items that match a set of criteria.
I needed to find a few books according to some keywords I got as input.
After looking in the documentation, I started to use the “Keywords” property. The nice thing about it is that you simply give it a list of words (separated by spaces) and it will return the results.

The problem with it is that it’s not a “smart” search. It says it will try to treat these keywords as keywords or phrases. I searched on two keywords, “saw” and “deck”, so I entered “saw deck” and got nothing.

It took me a while to search through the API and find out that what I really wanted was the “Power” property, which allows entering a more sophisticated search phrase such as “subject:(saw or deck)”.
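
For what it’s worth, building such a phrase from a list of keywords is simple. The Python sketch below only assembles the search phrase and a dictionary of request parameters; it does not call Amazon. The helper names are invented, and the parameter names other than “Power” and the ItemSearch operation (“Operation”, “SearchIndex”, the “Books” index) are assumptions that should be checked against the WSDL and documentation.

```python
from urllib.parse import parse_qs, urlencode


def build_power_query(field, terms, operator="or"):
    """Build a phrase such as 'subject:(saw or deck)' from a list of keywords."""
    joined = f" {operator} ".join(terms)
    return f"{field}:({joined})"


def item_search_params(power_query, search_index="Books"):
    """Assemble request parameters (names are assumptions, not verified)."""
    return {
        "Operation": "ItemSearch",
        "SearchIndex": search_index,
        "Power": power_query,
    }


query = build_power_query("subject", ["saw", "deck"])
params = item_search_params(query)
# URL-encode the parameters so they can be appended to a request URL.
encoded = urlencode(params)
```

Here `query` is the same phrase as in the example above, and `encoded` is a query string that could be appended to whatever endpoint and credentials the service requires.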

This is really annoying and VERY unintuitive. How can I actually know that a property named “Power” is for the advanced search?!

Another issue that troubled me is related to the structure of the API. It seems there is a specific attribute for a lot of the things Amazon is exposing. Perhaps there is a place for a more generic version of this web service, one that would give a user a more “generic” object or data representation of all the various items Amazon has, so that Amazon (and its users) would not have to change the various item-specific structures whenever new types of items are added.
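
To show what I mean by a “generic” representation, here is a small Python sketch. It is purely illustrative and does not reflect Amazon’s actual schema; the `GenericItem` class, its fields and the sample attribute names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class GenericItem:
    """One shape for every kind of item; type-specific data lives in attributes."""
    item_id: str
    item_type: str
    attributes: Dict[str, Any] = field(default_factory=dict)

    def get(self, name, default=None):
        """Look up an attribute without assuming the item type defines it."""
        return self.attributes.get(name, default)


# Made-up identifiers and values, for illustration only.
book = GenericItem("item-1", "Book", {"Title": "Mostly Harmless", "Author": "Douglas Adams"})
tool = GenericItem("item-2", "Tool", {"Title": "Circular Saw", "BladeSize": "7.25 in"})

for item in (book, tool):
    print(item.item_type, item.get("Title"), item.get("Author", "n/a"))
```

A new item type then only means new attribute names, not a new structure that every client has to learn. The trade-off is that clients lose the compile-time checking that item-specific structures give them.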

Look at the WSDL and the API samples yourself and tell me what you think.

GoogleWorld the new Web and privacy

Whether it is Gmail, Google Base, Google Video, Google Answers, Froogle, Google Blog Search, Google Book Search, Google Maps or Google Toolbar, Google seems to be conquering the world by offering a lot of services in many diverse areas.

(You can get a good review of the various Google Services here)

With your Google Account (which is also your Gmail address), Google can also track a specific person and learn things about him/her: what he/she searched for, shopped for, is interested in, etc.

Actually, according to this, Google also learns a lot about you even without having a Google Account.

The main problem with Google is that they are not actually showing the users what they are doing with this information.

Yes, they have a privacy policy. Yes, they claim they are “not evil”, and to some degree I believe them, but I really want to know what is being done with the information being gathered about me.

Let me take Amazon as an example. When I buy things at Amazon they save it in their database. They also encourage me to fill in a wish list or even mark products that I already own so they will be able to offer me products that I’m interested in.

In addition to that, when they recommend something to me they always tell me why this product was offered to me and I can directly see and understand what they did with the information they gather about me and the information I have supplied them.


Google will soon hit the privacy wall hard, and as more sites of the “forbidden word” kind start gathering more and more information about people and their doings, I think it’s time for Google and the rest of the world to start actually showing people what is being done with this information.

A good start would be to do what Amazon does and tell you why things have been recommended.