Consideration | Compliance |
Search must be fast | |
Search must be accurate | |
Search must use minimal system resources | |
Search must avoid redundant queries | |
Search must provide current data | |
Pagination must be fast | |
Must facilitate on demand sorting | |
Must facilitate on demand result filtering | |
Must be multi-lingual friendly | |
Denormalized data structures with Persisted Views / Materialized Views
Denormalized tables are flattened tables that contain all the columns from all the tables that need to be queried to retrieve a search result. Using denormalized tables can greatly speed up queries as it reduces the number of joins one has to perform in a query.
let us quickly look at a sample denormalized structure.
The image shows a simple customer table and a set of related tables used to maintain the customer address information. Let us consider a scenario where we are developing a customer search using these tables.
The query would look like
'SELECT c1."FIRST NAME", c1."MIDDLE NAME", c1."LAST NAME", c2.ADDRESS1, c2.ADDRESS2, c3.NAME, c4.NAME
FROM
CUSTOMER c1,
ADDRESS c2,
STATE c3,
COUNTRY c4
WHERE
c2.CUSTOMER_ID = c1.ID AND
c3.ID = c2.STATE_ID AND
c3.COUNTRY_ID = c4.ID and....
This is a fairly simple scenario and this uses four tables to execute a search. Imagine more complex situations where more such tables could be involved, the queries would get only more complex.
A multi-user search involving such or more complex queries running on even a small volume database could lead to lower performance.
To make this faster, we can consider using views. Look at the above snapshot again and at the CUSTOMER_VIEW entity shown in darker shade of gray. This entity has all the columns from all the individual physical entities. Denormalized view as the name states will house redundant data. However, it greatly simplifies the query required to perform the search. Persisted views in sharp contrast with traditional views are not resource hogs.
Traditional views are created on a per-query basis where a view is created for every query run on the view on which the user query is executed. This will result not only in a proliferation of views at any instant, but also serious resource hog by the views.
To avoid this we will consider using persisted views. I will briefly touch upon the concept of persisted views and will strongly recommend googling the topic for detailed study.
Imagine a persisted view to be a real table that is also a view that can be periodically synchronized with the actual set of tables.
In our example from the screenshot above, imagine CUSTOMER_VIEW to be an actual table that physically exists on the database that is periodically refreshed from its source tables.
This offers the best of both worlds
- The proliferation of runtime views are avoided
- Data is synchronized automatically at specified intervals
- The search query can also be greatly simplified.
'CREATE MATERIALIZED VIEW CUSTOMER_VIEWThe above query creates a materialized view and refreshes its data every hour which is fine given that customer data rarely changes by the hour.
REFRESH FAST START WITH SYSDATE
NEXT SYSDATE + 1/24
WITH PRIMARY KEY AS
SELECT
c1."FIRST NAME", c1."MIDDLE NAME", c1."LAST NAME",
c2.ADDRESS1, c2.ADDRESS2, c3.NAME, c4.NAME
FROM
CUSTOMER c1,
ADDRESS c2,
STATE c3,
COUNTRY c4
WHERE
c2.CUSTOMER_ID = c1.ID AND
c3.ID = c2.STATE_ID AND
c3.COUNTRY_ID = c4.ID '
Note: SQL Server provides views with SCHEMA BINDING which is similar to materialized views.
Let us look at the query we would write to retrieve the same data from the CUSTOMER_VIEW
SELECT c1."FIRST NAME", c1."MIDDLE NAME", c1."LAST NAME", c1.ADDRESS1, c1.ADDRESS2, c1.NAME, c1.NAME
FROM
CUSTOMER_VIEW c1,
WHERE
<.... search criteria>
As you can see the query has greatly simplified.
Let us quickly verify the check list to see how this solution fares
Search must be fastSearch must be accurateSearch must use minimal system resourcesSearch must avoid redundant queriesSearch must provide current dataPagination must be fastMust facilitate on demand sorting facilitate on demand result filteringMust be multi-lingual friendly
Consideration | Compliance | Remarks |
Search must be fast | Complies | |
Search must be accurate | Complies | |
Search must use minimal system resources | Partly Complies | |
Search must avoid redundant queries | Does not comply | still executes redundant queries when paginating |
Search must provide current data | Partly Complies | Persistent view frequency must be tuned appropriately |
Pagination must be fast | Does not comply | Redundant querying makes it slow |
Must facilitate on demand sorting | Complies | |
Must facilitate on demand result filtering | Complies | |
Must be multi-lingual friendly | Complies | |
We can observe from the checklist that this solution is only slightly better than the easiest route we discussed in part 1.
However, this is a pretty good solution provided,
1. The tables being queried do not undergo changes frequently
2. It is acceptable to provide near current data
3. Volumes are not extremely high
For instance, product catalogs, customer master data are good candidates for this approach as they fulfill most of the above-mentioned criteria.
In the forthcoming parts, we will look at solutions that attempt to address other considerations in the checklist.
0 comments:
Post a Comment