Caching Search Results
One of the weakest links in the earlier solution was that we had to query the database each time the user hit 'Next' or 'Previous' in the search results. This cost us dearly in both performance and system resources.
If we could cache the search results, we could avoid querying the database repeatedly and instead access the data quickly from the cache. Before we look at caching solutions, let us first look at what we hope to gain by using a cache.
When using a cache,
- The database tables should not be repeatedly searched for the same content upon pagination
- Page content should be quickly accessible without scanning the entire result set
- Entries should be purged automatically based on some criteria
- Responses should be fast
- The cache should be cluster friendly when used in a multi-server production environment
Some of the available caching options include:
- Session with state server (.NET)
- View state (for .NET)
- ASP.NET cache (for .NET)
- JCS (Apache/Java)
- Velocity (Microsoft)
- JBoss Cache(Java)
All of the above solutions use their own approach to caching, but provide a very simple interface. We will not delve into the nuances and differences of each solution; we will simply assume that one of these options has been used.
So, how does caching help? At a high level, the solution involves
- Retrieving the search results with a normal SQL query against the database
- Converting the results into a form that is easily serialized, and saving them in the cache
- Retrieving the data from the cache each time the user paginates
Most caching solutions use the ubiquitous key-value pair approach at a fundamental level. There are higher levels of distribution and control using regions and other such concepts; however, at its core each solution retains this simple key-value model.
Content can be stored in cache using simple code such as shown below. Note that the exact syntax may vary depending on the caching solution used.
Cache objCache = CacheFactory.GetCache("SearchCache");
objCache.put("MyCustomerSearch", objByteArraySearchResults);
The above statements show how a handle to the cache container named 'SearchCache' is obtained and a new cache entry is made against the key "MyCustomerSearch". The value passed is the serialized form (byte array) of the search result collection created by the search query. Other options, such as converting the results to XML, can also be considered.
To retrieve the results from the cache, we execute the simple command
objByteArraySearchResults = (Byte[])objCache.get("MyCustomerSearch");
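The store-and-retrieve cycle above can be sketched as follows. Since the `Cache`/`CacheFactory` API is specific to the chosen caching product, a plain `HashMap` is used here as a stand-in cache container; the serialization logic is what carries over:

```java
import java.io.*;
import java.util.*;

public class CacheSerializationDemo {
    // Serialize any serializable object (e.g. a search result list) to a byte array
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Deserialize the byte array back into the original object
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> cache = new HashMap<>();     // stand-in for "SearchCache"
        ArrayList<String> results = new ArrayList<>(Arrays.asList("row1", "row2", "row3"));

        cache.put("MyCustomerSearch", toBytes(results)); // store serialized results
        @SuppressWarnings("unchecked")
        ArrayList<String> restored =
                (ArrayList<String>) fromBytes(cache.get("MyCustomerSearch"));
        System.out.println(restored);                    // [row1, row2, row3]
    }
}
```

A real caching product would replace the `HashMap` calls with its own `put`/`get`, but the byte-array round trip stays the same.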
Unique Key
In a real-world multi-user scenario, the above example will fail because the key is not unique, and the results will most likely be corrupted. To ensure uniqueness in a multi-user scenario, we will generate a unique key. There are different ways to do this; one such way is shown below.
SearchName+UserName+SessionID+timestampofsearchexecution
For instance, the Customer search key would look like
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112
This key is computed dynamically and will remain unique for most general requirements. It is very important that this key is stored in the page context / page scope (for example, as a hidden variable), as it is needed every time the cache is accessed.
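Such a key could be built as follows. This is a minimal sketch: the method and parameter names are illustrative, and the session ID would come from your own session object rather than being passed as a plain string:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class SearchCacheKey {
    // Builds a key of the form SearchName_UserName_SessionID_Timestamp,
    // matching the pattern used in the article's example key.
    static String buildKey(String searchName, String userName, String sessionId) {
        String timestamp = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS"));
        return searchName + "_" + userName + "_" + sessionId + "_" + timestamp;
    }

    public static void main(String[] args) {
        // e.g. CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112
        System.out.println(buildKey("CustomerSearch", "JohnDoe", "LJSL0SDF9SD797D7F"));
    }
}
```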
Reality Check
We now have a basic search solution using a caching approach. Let us check how it performs against our considerations.
| Consideration | Compliance | Remarks |
|---|---|---|
| Search must be fast | Complies | |
| Search must be accurate | Complies | |
| Search must use minimal system resources | Partly Complies | Much better than the earlier DB-only approach |
| Search must avoid redundant queries | Complies | Pagination no longer triggers repeated queries |
| Search must provide current data | Partly Complies | Cached results do not reflect updates that happen after the search is executed |
| Pagination must be fast | Partly Complies | Avoids redundant queries, but the full result set is loaded and deserialized on every page access |
| Must facilitate on-demand sorting | Complies | Requires re-querying the DB unless a sorting API is available in the programming language, e.g. LINQ |
| Must facilitate on-demand result filtering | Complies | |
| Must be multi-lingual friendly | Complies | |
| Solution must be cluster friendly | Complies* | Subject to support from the caching solution |
As one can see from the table above, there have been significant improvements in performance over the earlier approach, achieved by avoiding redundant database queries and by keeping the search data close to the point of use.
This works well for searches that involve large rows but medium-sized result sets of up to a few hundred rows.
Caching - Extended
The above section concluded with a compliance checklist that does not indicate full compliance for fast pagination. The reason is that every time a user navigates to a page, the entire result set is retrieved from the cache and deserialized before the search program can extract the rows required to display that page.
When the result set is fairly large, loading and deserializing it in full carries a significant cost.
Instead, if we could organize the cache in such a way that we could directly load only the content we need, without loading the entire result set, it would provide a considerable performance advantage over the basic caching approach.
To achieve this we can use indexed caching.
Before delving into the details, let us step back and understand the requirement for pagination.
Search results are mostly displayed in a table / grid. Pagination is done using simple Next and Previous buttons; in some cases, direct navigation to a specific page is provided through links or similar options.
The number of rows displayed per page (the page size) is also fairly consistent in most cases, with some flexibility where a user can choose between a set of values such as 10, 50, or 100.
With this as a basis let us look at how we can evolve a solution that will help us effectively use caching to make navigation faster.
The number of rows per page is a multiple of 10, as can be seen from the page size options:
10x1 = 10
10x5 = 50
10x10 = 100
Splitting the cache
The solution will now be based on splitting the search results and storing them as separate cache entries, each containing a portion of the search results. Let us call the number of rows we store per cache entry a block. Each cache entry will be keyed with its own block number in addition to the key defined earlier.
So, the syntax for the key will now become
SearchName+UserName+SessionID+TimestampOfSearchExecution+"_"+BlockNumber
Let us discuss this with the above example.
Option 1 - greatest common divisor approach
In this approach we store only as many rows per cache entry as the greatest common divisor of the supported page sizes, which here is also the smallest page size: ten rows per cache entry. So, a search that returns 110 rows will produce 11 blocks as shown below
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_1 - Rows 1-10
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_2 - Rows 11-20
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_3 - Rows 21-30
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_4 - Rows 31-40
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_5 - Rows 41-50
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_6 - Rows 51-60
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_7 - Rows 61-70
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_8 - Rows 71-80
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_9 - Rows 81-90
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_10 - Rows 91-100
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_11 - Rows 101-110
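The splitting described above can be sketched as follows. As before, a `Map` stands in for the real cache container, and the key prefix is shortened for readability; only the block layout is the point here:

```java
import java.util.*;

public class BlockedCache {
    // Split search results into fixed-size blocks and store each block
    // under baseKey + "_" + blockNumber (block numbers start at 1).
    static Map<String, List<String>> splitIntoBlocks(
            String baseKey, List<String> results, int rowsPerBlock) {
        Map<String, List<String>> cache = new LinkedHashMap<>();
        int blockNumber = 1;
        for (int i = 0; i < results.size(); i += rowsPerBlock) {
            int end = Math.min(i + rowsPerBlock, results.size());
            cache.put(baseKey + "_" + blockNumber, new ArrayList<>(results.subList(i, end)));
            blockNumber++;
        }
        return cache;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 110; i++) rows.add("Row " + i);

        Map<String, List<String>> cache = splitIntoBlocks("CustomerSearch", rows, 10);
        System.out.println(cache.size());                         // 11 blocks
        System.out.println(cache.get("CustomerSearch_7").get(0)); // Row 61
    }
}
```

In production the block values would be serialized to byte arrays before being handed to the cache, as shown earlier.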
So, when the page size is set to 10 rows per page, the cache key can be computed by directly appending the page number to the key.
However, when the page size is set to 20 rows per page, the following computation is used to retrieve the results directly.
BlocksToRetrievePerPage = RowsPerPage / RowsPerBlock
In this case the above formula would translate to
BlocksToRetrievePerPage = 20 / 10 = 2
The next parameter required is the starting block to generate the key so that we can directly navigate to the cache block.
BlockToStart = (RowToStartFrom / RowsPerBlock) + 1 (using integer division)
When the page size is set to 20 and the user clicks Next while on the third page, the next page starts at row 61:
BlockToStart = (61 / 10) + 1 = 7
So BlockToStart is block 7; as we can see from the block list above, the 7th block begins with the 61st row.
The pseudocode to process the cache is given below
- BLOCKS_RETRIEVED = 0
- WHILE BLOCKS_RETRIEVED < BlocksToRetrievePerPage
- RETRIEVE FROM CACHE THE ENTRY WITH KEY (SearchName+UserName+SessionID+TimestampOfSearchExecution+"_"+BlockToStart)
- ROWSTODISPLAY = ROWSTODISPLAY + CACHE_ENTRY_RETRIEVED
- BlockToStart = BlockToStart + 1
- BLOCKS_RETRIEVED = BLOCKS_RETRIEVED + 1
- END WHILE
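This pagination logic translates to roughly the following Java. It is a sketch against the `Map` stand-in cache from earlier; the starting-block formula is written as `(rowToStartFrom - 1) / rowsPerBlock + 1`, which agrees with the article's formula for page-start rows (1, 11, 21, ...) and also stays correct for arbitrary rows:

```java
import java.util.*;

public class BlockPagination {
    // Retrieve the rows for one page by reading only the blocks that cover it.
    static List<String> getPage(Map<String, List<String>> cache, String baseKey,
                                int rowToStartFrom, int rowsPerPage, int rowsPerBlock) {
        int blocksToRetrieve = rowsPerPage / rowsPerBlock;          // e.g. 20 / 10 = 2
        int blockToStart = (rowToStartFrom - 1) / rowsPerBlock + 1; // integer division
        List<String> rowsToDisplay = new ArrayList<>();
        for (int retrieved = 0; retrieved < blocksToRetrieve; retrieved++) {
            List<String> block = cache.get(baseKey + "_" + blockToStart);
            if (block != null) rowsToDisplay.addAll(block);
            blockToStart++;
        }
        return rowsToDisplay;
    }

    public static void main(String[] args) {
        // Build an 11-block cache of 110 rows, 10 rows per block
        Map<String, List<String>> cache = new HashMap<>();
        for (int b = 1; b <= 11; b++) {
            List<String> block = new ArrayList<>();
            for (int r = (b - 1) * 10 + 1; r <= Math.min(b * 10, 110); r++) {
                block.add("Row " + r);
            }
            cache.put("CustomerSearch_" + b, block);
        }
        // Page size 20, next page after the third page starts at row 61 -> blocks 7 and 8
        List<String> page = getPage(cache, "CustomerSearch", 61, 20, 10);
        System.out.println(page.get(0) + " .. " + page.get(page.size() - 1)); // Row 61 .. Row 80
    }
}
```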
In this approach, the number of blocks to be accessed for varying page sizes is as given below
- Page size 10: blocks accessed per page is 1
- Page size 20: blocks accessed per page is 2
- Page size 50: blocks accessed per page is 5
- Page size 100: blocks accessed per page is 10
Option 2 - Optimizing the block access
To push the envelope further, we can strike a balance between the number of blocks loaded per page and the size of data loaded per page access.
For instance, if we change the rows per block to 50 in the above example, we would end up with only three cache entries
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_1 - Rows 1-50
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_2 - Rows 51-100
CustomerSearch_JohnDoe_LJSL0SDF9SD797D7F_20091023233029112_3 - Rows 101-110
In this approach, the number of block accesses made for varying page sizes is shown below for a result set of 110 rows
- Page size 10: blocks accessed per page is 1 (block 1)
- Page size 20: blocks accessed per page is 1 (block 1)
- Page size 50: blocks accessed per page is 1 (block 1)
- Page size 100: blocks accessed per page is 2 (blocks 1 and 2)
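These counts follow from a simple ceiling division of the page size by the block size, assuming pages align to block boundaries (which holds here because 50 is a multiple of every supported page size). A quick check:

```java
public class BlockAccessCount {
    // Blocks touched per page = ceil(rowsPerPage / rowsPerBlock),
    // assuming page boundaries align with block boundaries.
    static int blocksAccessed(int rowsPerPage, int rowsPerBlock) {
        return (rowsPerPage + rowsPerBlock - 1) / rowsPerBlock;
    }

    public static void main(String[] args) {
        int rowsPerBlock = 50; // Option 2 layout
        for (int pageSize : new int[] {10, 20, 50, 100}) {
            System.out.println("Page size " + pageSize + " -> "
                    + blocksAccessed(pageSize, rowsPerBlock) + " block(s)");
        }
    }
}
```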
While both options are very powerful, the choice between them is really up to the designer. Let us look at the compliance checklist again.
| Consideration | Compliance | Remarks |
|---|---|---|
| Search must be fast | Complies | |
| Search must be accurate | Complies | |
| Search must use minimal system resources | Complies | Much better than the earlier DB-only approach |
| Search must avoid redundant queries | Complies | Pagination no longer triggers repeated queries |
| Search must provide current data | Partly Complies | Cached results do not reflect updates that happen after the search is executed |
| Pagination must be fast | Complies | Optimized cache access |
| Must facilitate on-demand sorting | Complies | Requires re-querying the DB unless a sorting API is available in the programming language, e.g. LINQ |
| Must facilitate on-demand result filtering | Complies | |
| Must be multi-lingual friendly | Complies | |
| Solution must be cluster friendly | Complies* | Subject to support from the caching solution |
We have looked at making effective use of caching in this section. In the next part we will look at pushing the performance envelope further.