Right in the middle of the madding crowd: Designs for making your searches faster

In this part we will look at a few basic solutions that can be considered. Before we dive into the solution, let us quickly put together a design considerations chart that we can use for compliance verification

Consideration	Compliance
Search must be fast
Search must be accurate
Search must use minimal system resources
Search must avoid redundant queries
Search must provide current data
Pagination must be fast
Must facilitate on demand sorting
Must facilitate on demand result filtering
Must be multi-lingual friendly

Denormalized data structures with Persisted Views / Materialized Views
Denormalized tables are flattened tables that contain all the columns from all the tables that need to be queried to retrieve a search result. Using denormalized tables can greatly speed up queries as it reduces the number of joins one has to perform in a query.
let us quickly look at a sample denormalized structure.

The image shows a simple customer table and a set of related tables used to maintain the customer address information. Let us consider a scenario where we are developing a customer search using these tables.
The query would look like
'

SELECT c1."FIRST NAME", c1."MIDDLE NAME", c1."LAST NAME", c2.ADDRESS1, c2.ADDRESS2, c3.NAME, c4.NAME
FROM
CUSTOMER c1,
ADDRESS c2,
STATE c3,
COUNTRY c4
WHERE
c2.CUSTOMER_ID = c1.ID AND
c3.ID = c2.STATE_ID AND
c3.COUNTRY_ID = c4.ID and.... '

This is a fairly simple scenario and this uses four tables to execute a search. Imagine more complex situations where more such tables could be involved, the queries would get only more complex.
A multi-user search involving such or more complex queries running on even a small volume database could lead to lower performance.

To make this faster, we can consider using views. Look at the above snapshot again and at the CUSTOMER_VIEW entity shown in darker shade of gray. This entity has all the columns from all the individual physical entities. Denormalized view as the name states will house redundant data. However, it greatly simplifies the query required to perform the search. Persisted views in sharp contrast with traditional views are not resource hogs.
Traditional views are created on a per-query basis where a view is created for every query run on the view on which the user query is executed. This will result not only in a proliferation of views at any instant, but also serious resource hog by the views.
To avoid this we will consider using persisted views. I will briefly touch upon the concept of persisted views and will strongly recommend googling the topic for detailed study.
Imagine a persisted view to be a real table that is also a view that can be periodically synchronized with the actual set of tables.
In our example from the screenshot above, imagine CUSTOMER_VIEW to be an actual table that physically exists on the database that is periodically refreshed from its source tables.
This offers the best of both worlds

The proliferation of runtime views are avoided
Data is synchronized automatically at specified intervals
The search query can also be greatly simplified.

Before we look into the search let us quickly look at the script to create the view shown above. The script shown here is for Oracle

'CREATE MATERIALIZED VIEW CUSTOMER_VIEW
REFRESH FAST START WITH SYSDATE
NEXT  SYSDATE + 1/24
WITH PRIMARY KEY AS
SELECT
c1."FIRST NAME", c1."MIDDLE NAME", c1."LAST NAME",
c2.ADDRESS1, c2.ADDRESS2, c3.NAME, c4.NAME
FROM 
CUSTOMER c1,
ADDRESS c2,
STATE c3,
COUNTRY c4
WHERE 
c2.CUSTOMER_ID = c1.ID AND
c3.ID = c2.STATE_ID AND
c3.COUNTRY_ID = c4.ID '

The above query creates a materialized view and refreshes its data every hour which is fine given that customer data rarely changes by the hour.

Note: SQL Server provides views with SCHEMA BINDING which is similar to materialized views.

Let us look at the query we would write to retrieve the same data from the CUSTOMER_VIEW
SELECT c1."FIRST NAME", c1."MIDDLE NAME", c1."LAST NAME", c1.ADDRESS1, c1.ADDRESS2, c1.NAME, c1.NAME
FROM
CUSTOMER_VIEW c1,
WHERE
<.... search criteria>
As you can see the query has greatly simplified.

Let us quickly verify the check list to see how this solution fares

Search must be fastSearch must be accurateSearch must use minimal system resourcesSearch must avoid redundant queriesSearch must provide current dataPagination must be fastMust facilitate on demand sorting facilitate on demand result filteringMust be multi-lingual friendly

Consideration	Compliance	Remarks
Search must be fast	Complies
Search must be accurate	Complies
Search must use minimal system resources	Partly Complies
Search must avoid redundant queries	Does not comply	still executes redundant queries when paginating
Search must provide current data	Partly Complies	Persistent view frequency must be tuned appropriately
Pagination must be fast	Does not comply	Redundant querying makes it slow
Must facilitate on demand sorting	Complies
Must facilitate on demand result filtering	Complies
Must be multi-lingual friendly	Complies

We can observe from the checklist that this solution is only slightly better than the easiest route we discussed in part 1.
However, this is a pretty good solution provided,
1. The tables being queried do not undergo changes frequently
2. It is acceptable to provide near current data
3. Volumes are not extremely high

For instance, product catalogs, customer master data are good candidates for this approach as they fulfill most of the above-mentioned criteria.

In the forthcoming parts, we will look at solutions that attempt to address other considerations in the checklist.

Right in the middle of the madding crowd

Wednesday, October 21, 2009

Designs for making your searches faster - Part 2

0 comments:

Post a Comment

Blog Archive

Categories

Ads Banner

My Blog List

Site Info

Followers