Vertical Search or Meta Search
I wrote before that vertical search apps are built on top of the core search apps. Google has taught us quite a lot about what goes into a successful search app. Google validated that aggregation can be very successful. The siloed model of building proprietary networks and keeping people on the network is not the only way to succeed. In fact, Google proved that aggregation can be not only successful, but also scalable and high margin.
However, a complete search app needs a few other things.
First of these is Comprehensiveness. What good is aggregation if I still need to go to a few other sites? Comprehensiveness doesn’t really mean that you should have every possible item, but there should be enough coverage that 90%+ of users do not need to use another site 99% of the time. These numbers are not really scientific, but you get the idea. None of the so-called travel vertical searches are comprehensive. They may be aggregating all the booking engines, but booking is just one part of my travel planning. They leave out the bulk of travel queries: reviews, location selection, hotel selection, etc. I consider TripAdvisor more of a travel vertical search than any of the above.
Then there is Relevance. Since early on, the problem on the web hasn’t been about finding results. It’s always been about finding the ‘right’ results. Google’s competitive advantage has been PageRank more so than anything else. The key is to come up with scalable algorithms for relevance that work across a breadth of queries. Clearly, hiring editors to hand-tune search results for each term is not scalable.
Let me talk a bit more about relevance. None of the vertical search apps out there does relevance well. Product search matches purely on text. I can refine based on other attributes, but the first set of results presented to me is nothing more than a simple text match. For example, Froogle shows “digital camera” as a sample query. When I click on it, it shows ‘Canon PowerShot A95’ as the most ‘relevant’ result for my query. I can’t really tell what makes that more or less relevant than any other camera on that list. Real-life queries may be more specific, but this illustrates the core issue: relevance is mostly text match.
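To make the text-match point concrete, here is a minimal sketch of a pure text-match scorer. It is not how Froogle or any real product search works; the function names and the second product title are made up for illustration. It only shows why a broad query cannot separate one camera from another.

```python
def text_match_score(query: str, title: str) -> float:
    """Fraction of query terms that appear as whole words in the title."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(title.lower().split())
    return sum(term in words for term in terms) / len(terms)


def rank(query: str, titles: list[str]) -> list[str]:
    """Order titles by text-match score, best first.

    sorted() is stable, so titles with equal scores keep their input
    order. With pure text match, the 'top' result among tied items is
    just whichever happened to come first in the feed.
    """
    return sorted(titles, key=lambda t: text_match_score(query, t), reverse=True)


# Illustrative titles; the second one is a hypothetical product.
titles = [
    "Canon PowerShot A95 digital camera",
    "Acme Z100 digital camera",
    "Camera bag",
]
ranked = rank("digital camera", titles)
```

Both cameras get the same score for “digital camera”, so nothing in the text says which one deserves the top slot. Any real ordering has to come from some other signal.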
Folks in the search world always hear the words structured, semi-structured, unstructured, etc. The general consensus is that structured data is better for searching and refining. However, structured data always leads us to the data normalization problem. Everyone has structured data, and they all structure it differently. This is by far the most painful problem I have encountered in dealing with various feeds. It seems simple enough to be solved by standards, yet it isn’t. It repeats in almost all verticals.
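Here is a rough sketch of what normalization looks like in practice, assuming two hypothetical feeds (“feed_a” and “feed_b”) that name their fields differently and format prices differently. The feed names, field names, and the `Listing` schema are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """The common schema every feed record is mapped into (hypothetical)."""
    title: str
    price_usd: float
    source: str


# Hypothetical per-feed mappings: common field -> the feed's own field name.
FIELD_MAPS = {
    "feed_a": {"title": "name", "price_usd": "price"},
    "feed_b": {"title": "ProductTitle", "price_usd": "Cost"},
}


def parse_price(raw) -> float:
    """Accept values like 299.5, "299.50" or "$1,299.00"."""
    if isinstance(raw, (int, float)):
        return float(raw)
    return float(str(raw).replace("$", "").replace(",", "").strip())


def normalize(record: dict, source: str) -> Listing:
    """Map one raw feed record into the common schema."""
    mapping = FIELD_MAPS[source]
    return Listing(
        # Collapse runs of whitespace in the title.
        title=" ".join(str(record[mapping["title"]]).split()),
        price_usd=parse_price(record[mapping["price_usd"]]),
        source=source,
    )


a = normalize({"name": "Canon  PowerShot A95", "price": "$1,299.00"}, "feed_a")
b = normalize({"ProductTitle": "Canon PowerShot A95", "Cost": 299.5}, "feed_b")
```

The code itself is trivial. The pain is that every new feed needs its own mapping and its own set of parsing quirks, and that work never ends.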
There are extensions to the normalization problem, like identifying the “product” or “item” uniquely, or deduping. For example, how can you tell it’s the same job that got posted on craigslist and HotJobs? Or that it’s the same item by the same seller for sale on both Yahoo! Auctions and eBay?
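One naive way to approach deduping is to reduce each listing to a fingerprint and group listings that share it. The sketch below assumes each listing carries a title and a poster name; the field names and the sample data are hypothetical, and real feeds would need much fuzzier matching than this.

```python
import re
from collections import defaultdict


def canon(text: str) -> str:
    """Lowercase, drop punctuation, and sort the unique words."""
    return " ".join(sorted(set(re.findall(r"[a-z0-9]+", text.lower()))))


def fingerprint(title: str, poster: str) -> tuple:
    """A crude identity key: word order, case, and punctuation are ignored."""
    return (canon(title), canon(poster))


def group_duplicates(listings: list[dict]) -> dict:
    """Return fingerprint -> list of sites, for fingerprints seen more than once."""
    groups = defaultdict(list)
    for item in listings:
        groups[fingerprint(item["title"], item["poster"])].append(item["site"])
    return {key: sites for key, sites in groups.items() if len(sites) > 1}


# Made-up sample postings.
listings = [
    {"site": "craigslist", "title": "Senior Java Developer", "poster": "Acme Corp."},
    {"site": "hotjobs", "title": "Java Developer, Senior", "poster": "ACME Corp"},
    {"site": "hotjobs", "title": "QA Engineer", "poster": "Acme Corp"},
]
dups = group_duplicates(listings)
```

This catches the easy cases only. A reworded title or an abbreviated company name slips right through, which is exactly why deduping is hard.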
I personally believe that if a vertical solves the relevance problem well, then it doesn’t really need to solve deduping.


