Search Engines
Search engines come in several types. The ones we are all most familiar with, internet search engines, scour the web for data and catalog it to help us find the web pages that provide the content we are looking for.
The second kind is the internet search engine applied to the end-user desktop. These are classified as desktop search engines, and they help users search their local machines by keyword for anything from documents to music files and emails.
The one we are going to use in this context is the enterprise search engine, which is essentially a stripped-down version of a desktop search engine with extensions that enable application developers to harness the power of the search engine.
Before we look at how we will use search engines, let us briefly understand how they work.
A search engine is fundamentally about indexing various key words against a content link, so that when a user searches using any of those key words, the mapped content can be presented to the user.
For instance, suppose we have a book:
"Fifty ways to make a sandwich" by Danny Shrill, a fast cooking guide for working men and women, Penguin Books
A search engine will index many of the key words in the context of the book, such as:
- Author name: Danny Shrill
- Publisher: Penguin Books
- Subject: Sandwich
- Category: cooking and fast food
- Published: 2009
- Type: Paperback
The accuracy of the search engine depends on the relevance of the key words indexed by the engine.
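To make the key-word-to-content mapping concrete, here is a minimal sketch of an inverted index in Python. This is an illustration of the general idea, not any particular engine's API; the book identifier `book-001` is an invented example.

```python
from collections import defaultdict

# Minimal inverted index: maps each key word to the set of
# content identifiers (here, book IDs) that carry it.
index = defaultdict(set)

def add_to_index(content_id, keywords):
    """Index every key word against the content identifier."""
    for word in keywords:
        index[word.lower()].add(content_id)

# Index the sandwich book from the example above.
add_to_index("book-001", ["Danny Shrill", "Penguin Books", "Sandwich",
                          "cooking", "fast food", "2009", "Paperback"])

def search(keyword):
    """Return the identifiers mapped to a key word, if any."""
    return index.get(keyword.lower(), set())

print(search("sandwich"))  # {'book-001'}
```

Real engines add tokenization, stemming and relevance scoring on top of this basic structure, but the core mapping is the same.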
Search Engines and enterprise applications
Search engines may not be relevant for enterprise application searches across the board. Their use should be restricted to cases that meet the following criteria:
- Where search performance SLAs are stringent (expected response times are very low)
- Where data volumes are very high
- Where users may not be able to provide accurate data points for the search
Let us approach this with an example. Let us assume that we are building a product search for amazon.com. When a user searches for a product on amazon.com, the search has to fulfill the following criteria:
- It has to be really fast
- It has to be accurate in retrieving relevant products
- It has to be flexible when users misspell what they are looking for
- It has to provide a good set of suggestions when users cannot find what they want
Let us apply all of what we learned in the previous parts and quickly summarize the steps:
- Create a denormalized table structure
- Use a threaded searching mechanism for burst searching
- Build a key word catalog
- Integrate an indexing mechanism
A key word catalog is a dictionary of terms a user will typically use for searching. The key words can be classified and grouped depending on need. Let us now attempt to build a key word catalog for our exercise.
Keyword catalog for amazon products
- Name
- Type: books, music, ebook, audiobook
- Genre: fiction, self help, pop, rock
- Author/writer
- Played by
- Supporting cast
- Publisher
- Year published
- Media: hard bound, paperback, ebook, download, dvd
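One simple way to represent such a catalog in code is a dictionary mapping each classification to its known terms. The names and values below are illustrative, taken from the list above, not a standard schema:

```python
# Illustrative keyword catalog: classifications mapped to example terms.
keyword_catalog = {
    "type":  ["books", "music", "ebook", "audiobook"],
    "genre": ["fiction", "self help", "pop", "rock"],
    "media": ["hard bound", "paperback", "ebook", "download", "dvd"],
}

def classify(term):
    """Find which classifications a search term falls under."""
    term = term.lower()
    return [cls for cls, terms in keyword_catalog.items()
            if any(term == t.lower() for t in terms)]

print(classify("ebook"))  # ['type', 'media']
```

Note that a term like "ebook" legitimately belongs to more than one classification, which is why grouping key words by classification pays off when refining searches later.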
Once this has been built, the next step is to integrate the search engine to index and execute the searches.
Integrating the search engine
Search engine integration has two parts to it.
- Indexing
- Searching
Indexing the product database can be done in multiple ways. The simplest is to index whenever product data is modified in the system: any time a new product is added, or existing product information is changed, the search engine is updated with its content.
This ensures that the search engine indexes are always up to date. Let us look a bit closer at how indexing is done.
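A sketch of this index-on-modify approach is shown below. The `SearchEngineStub` class and its `index()` method are stand-ins I have invented for whatever indexing call your chosen engine exposes; the point is only that the save path and the index update happen together:

```python
class SearchEngineStub:
    """Stand-in for a real engine's indexing API (hypothetical)."""
    def __init__(self):
        self.documents = {}

    def index(self, doc_id, keywords):
        # Re-indexing the same identifier simply overwrites the old entry.
        self.documents[doc_id] = keywords

engine = SearchEngineStub()

def save_product(product_id, name, keywords, db):
    """Persist the product, then refresh the search index in the same flow."""
    db[product_id] = {"name": name, "keywords": keywords}
    engine.index(product_id, keywords)  # index on every create/modify

db = {}
save_product("p-42", "Fifty ways to make a sandwich",
             ["sandwich", "cooking", "paperback"], db)
```

In production you would typically push the index update asynchronously (e.g. via a queue) so a slow engine never blocks the transaction, but the synchronous form above is easier to reason about.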
Most search engines refer to index entries as a compilation of books. Each book is a catalog entry comprising two parts. There are many other meta parts, but we will restrict our discussion to these two to keep things simple.
- Key words
- Identifier
So, as presented above, if we ensure that the search engine indexes content every time a transaction is created or modified, the engine will use its algorithms to hash and store the indexes for fast retrieval.
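The two-part book entry described above can be captured in a few lines. This is a conceptual model only; actual engines store considerably more metadata per entry:

```python
from dataclasses import dataclass, field

# A "book" in the engine's index: just the key words plus the
# identifier that points back at the transactional record.
@dataclass
class Book:
    identifier: str                       # e.g. the product's primary key
    keywords: set = field(default_factory=set)

entry = Book(identifier="p-42", keywords={"sandwich", "cooking"})
```

The identifier is what ties the engine's world back to yours: searches resolve to identifiers, and identifiers resolve to rows in your database.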
Searching
The next part of integration is the search process itself. When a user performs a search, the search program should first use the search engine API and ask the search engine to return the KeyIDs that match the criteria.
Search engine APIs are fairly simple and enable searches to be done by providing a list of key words, optionally along with their classifications.
Search engines also provide extended APIs to search with emphasis on certain classifications, and to return search results against desired matching criteria to further refine results.
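The idea of emphasising certain classifications can be illustrated with a toy weighted search. The field weights, documents and scoring below are all invented for illustration; real engines expose this through boost parameters rather than hand-rolled loops:

```python
# Illustrative field weights: matches in "name" count more than in "genre".
weights = {"name": 3.0, "author": 2.0, "genre": 1.0}

documents = {
    "p-1": {"name": "rock climbing guide", "author": "a. smith", "genre": "sport"},
    "p-2": {"name": "hits album", "author": "b. jones", "genre": "rock"},
}

def weighted_search(term):
    """Score each document by the weight of every field the term appears in."""
    scores = {}
    for doc_id, fields in documents.items():
        score = sum(w for f, w in weights.items() if term in fields.get(f, ""))
        if score:
            scores[doc_id] = score
    # Higher score first: matches in more important fields rank higher.
    return sorted(scores, key=scores.get, reverse=True)

print(weighted_search("rock"))  # ['p-1', 'p-2']
```

Here "rock" in a product's name outranks "rock" as a genre, which is usually what users expect.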
The key identifiers returned by the search engines are then used in transactional searches to retrieve the data from the denormalized data structure in the database.
Since the database results are retrieved by directly specifying the key identifier, the retrieval will be the fastest.
- Query the search engine with the search criteria
- The search engine returns books matching the criteria in the specified order of relevance
- The application forms a DB query with the key identifiers in those books
- The database returns the relevant rows based on the primary keys specified
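The four steps above can be sketched end to end. The toy in-memory index and database below stand in for a real engine and a real denormalized table; the product IDs and names are invented:

```python
# Toy engine index: keyword -> key identifiers, in relevance order.
inverted = {"sandwich": ["p-42", "p-77"]}

# Toy denormalized product table, keyed by primary key.
database = {
    "p-42": {"name": "Fifty ways to make a sandwich"},
    "p-77": {"name": "Sandwich press"},
    "p-99": {"name": "Rock anthology"},
}

def query_engine(keyword):
    # Steps 1-2: the engine returns matching key identifiers by relevance.
    return inverted.get(keyword.lower(), [])

def fetch_rows(key_ids):
    # Steps 3-4: form the DB query with the identifiers; primary-key
    # lookups are the fastest retrieval path the database offers.
    return [database[k] for k in key_ids if k in database]

results = fetch_rows(query_engine("Sandwich"))
print([r["name"] for r in results])
# ['Fifty ways to make a sandwich', 'Sandwich press']
```

Note that the relevance ordering comes from the engine, and is preserved when the rows are fetched, so the database never has to sort by relevance itself.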
Let us quickly take stock of this approach vis-a-vis the checklist we put together in the first part.
| Consideration | Compliance | Remarks |
|---|---|---|
| Search must be fast | Complies | Much improved performance over the earlier approach |
| Search must be accurate | Complies | |
| Search must use minimal system resources | Complies | Much better than the earlier DB-only approach |
| Search must avoid redundant queries | Complies | Still executes redundant queries when paginating |
| Search must provide current data | Complies | Only the identifiers are cached; the data is retrieved directly from the database |
| Pagination must be fast | Complies | Optimized cache access |
| Must facilitate on-demand sorting | Complies | Requires requerying the DB unless a sorting API is available in the programming language, e.g. LINQ |
| Must facilitate on-demand result filtering | Complies | |
| Must be multilingual friendly | Complies | |
| Solution must be cluster friendly | Complies* | Subject to support from the caching solution |
Closing comments
From a solution perspective, we have looked at many of the options available to us for building efficient searches. However, we have only scratched the surface, and many of the solutions we have adopted have deeper tuning options that allow us to exploit their features further and deliver even better searches.
I will close the subject with this part, hoping that this series has provided an impetus for you to embark on your own discovery process and evolve a solution that best fits your needs.