Monday, November 23, 2009

Google, the Skynet in the making

Alright, Skynet is a wee bit old. Let's say the Matrix in the making.
I started thinking about it (finally) when I heard that Google was doing away with local storage and promoting cloud-only storage for their new OS, Chrome OS!
Now, for those who woke up after a late-night slosh fest, Google has announced a spanking new OS (well, it's actually a smartly repackaged Ubuntu) that has only the browser for an interface. Blazing fast, highly interactive and all the other usual Googly stuff.

Now, coming back to the case in point: cloud storage. I'm just not sold on the idea of storing everything of mine in the cloud. Hey, I need some space that I'd like to call my own without Big Brother watching! Well, if you are one of those types, let me say that I'd at least like to think that no one is watching what is in my personal space.
The concept of cloud storage for personal data goes against the very idea of personal space, even though Google would claim the highest levels of security. I just don't buy it. Hey, call me old fashioned, but that's me. And I'm sure I am not alone.
Then I downloaded their distro and tried it out. Well, you boot up and there it is, the Chrome browser in all its glory. That's it!
It feels strange for a moment, given that we are used to watching the desktop gizmos pop up one after the other with long whirs of the disks. That apart, the browser on the kernel is all there is to Chrome OS!
The so-called apps are essentially links to Gmail, Yahoo Mail, Hotmail, Google Docs, Picasa Web, Blogspot, Hulu and other websites. Nothing 'new' here.
That's when it hits you. My fear of losing my private space is already null and void.
I have already lost it. Google already has my vitals! Hook, line and sinker!
They know me through my emails and my tagged pictures. They know my private activities through my use of GPS-enabled Google Maps on my mobile. They know my personal tastes from all my searches! They know my family and friends and their whole network through Orkut and Picasa. They know my business and financial details by automatically parsing unprotected attachments in the emails that they store as docs! They know my political standpoint from my blogs, such as this one! They also eavesdrop on my chats and store them for me! They know my appointments and schedule from my calendar usage! They know where I live from all the map searches I have made!
The only thing they may (and that's one mighty MAY!) not know already is my bank balance. Otherwise I'm afraid I've been had! I'm afraid we have all been had! The world has been effectively entrapped by this giant who we all fervently hope is noble.
I cannot help but wonder whether Google is the Skynet/Matrix in the making. Will it soon overthrow its own bosses and take over the world?
If my paranoia is just that, paranoia, and Google is actually an angel in corporate garb, it's still a nuclear bomb!
I shudder to think what would happen if all that data fell into the wrong hands. What if a smart terrorist hacks into Google and starts making use of all the data?

Friday, November 20, 2009

ChromeFrame Offline Install

I had a tough time installing ChromeFrame behind my proxy as downloading exes was banned. Even if you manage to get the 500 KB setup, it again tries to download another installer. So, I pulled some favours and managed to work through the firewall restrictions and fiddle with the URL.
Here is a link that allows you to get the offline installer:
http://tiny.cc/chromefrm_offline

Trapped with Internet Explorer 6.

Working on a product for three long years, building Web 2.0 software with all the bells and whistles that any sane human would care for, is taking its toll on me.
With so many browsers (IE, Firefox, Chrome, Safari, Opera), it's quite an ask just to ensure everything works the way it should on all platforms!
Now, after all the effort, we connect with our customers to roll the new platform out. We demo the product, they like it all, and it looks like the only thing left to do is sign the contract and cash the cheque.
Just when things could not get any better, we hit an air pocket. Their users are on IE6 and cannot move from that platform. The reason? It would cost them more to install IE 7 on all their users' machines than the annual contract for using our software! Grrrrrrrrr...
The reason for the gnashing of teeth is that our product does not 'officially' support IE6. It's such an antiquated browser that any sane person fears it! It's like living in Baghdad! You could! But you'd rather not if you care for your life!
With all its security holes and non-compliance with standards, I wonder why anyone would still use IE6 in an age where users are spoilt for choice of good browsers that are also light years ahead of it.
Well, one could understand if home users are still stuck with the IE6 that came with the Windows pre-installed on their desktop and just don't care to upgrade.
I just cannot fathom why an enterprise user would not. Any sane network administrator would see the higher security risk in retaining IE 6 when there are better choices.

Nevertheless, my customers are stuck with IE 6 for the foreseeable future, and I do not want to miss out on my cheque for a reason as lame as this.
So, what do I do now?
Citrix anybody?
Install Citrix servers with IE 7 or IE 8 and allow users to access the app through a supported browser?
Not a good idea: it's overkill and not scalable (cost-wise) for tens of thousands of users.

Change code to support IE 6
Quick fix from http://divitodesign.com/css/let-ie6-behave-like-ie7/: some good folks have spent quality time writing scripts that make IE 6 behave much like IE 7.
This definitely works and is a very easy solution to integrate into the mainstream application.
However, we found that our product had style elements that still broke the site.

Chromeframe
I had read about Google developers doing a very smart fix when they were struggling to make Google Wave work in IE 6. They turned the Google Chrome browser into a plug-in called ChromeFrame and made it run from within IE 6, much like Flash. This way, pages running on ChromeFrame technically ran on the Chrome browser even though it was running from within an IE browser. This ensured that all users who were still stuck with IE6 could run contemporary web sites without having to upgrade their browsers, whatever their reasons.
http://code.google.com/chrome/chromeframe/
The only change I had to make was to ensure a meta tag is included in all my pages! I just had to change my intercepting filter/HTTP handler and insert this meta tag into the response stream for all my requests.
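For reference, the tag in question is <meta http-equiv="X-UA-Compatible" content="chrome=1">. Below is a minimal sketch of how that interception could look as an ASP.NET HttpModule; the module name is my own, and instead of rewriting the response body it emits the equivalent X-UA-Compatible response header, which ChromeFrame also honours. Register it under httpModules in web.config.

// Minimal sketch of an ASP.NET HttpModule that opts every response into ChromeFrame.
// The module name is illustrative.
using System;
using System.Web;

public class ChromeFrameModule : IHttpModule
{
    public void Init(HttpApplication context)
    {
        context.PreSendRequestHeaders += delegate(object sender, EventArgs e)
        {
            // Equivalent to the <meta http-equiv="X-UA-Compatible" content="chrome=1"> tag.
            HttpResponse response = ((HttpApplication)sender).Response;
            response.AppendHeader("X-UA-Compatible", "chrome=1");
        };
    }

    public void Dispose() { }
}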
I tried this out and boy, was I impressed. It works like a dream. Of course, users would have to install the ChromeFrame plug-in, but hey, they are not changing their 'browsers'! Trust me, this is not an argument I like, but it works for some users!

I'm happy as long as my users are happy! Thank you Google, you saved me tons of work.

Wednesday, November 4, 2009

Multicasting in C#

This has been one topic that had me going around in circles for quite some time. Not that it was difficult to find sources that helped me; it was the sheer difference in approaches that really frustrated me.
I'm writing this hoping that it saves someone a lot of headache when they try multicasting. I'm going to refrain from explaining what multicasting is; there are excellent resources available on the internet on the topic. I will focus on the problem and the resolution here.

Problem Statement
Whenever a cache entry is updated on a node in the network, that node must notify its peers of the update. This ensures all nodes have current cache data.

Product perspective
The nodes are the different web/app servers in the load-balanced/clustered environment, plus a central cache server that keeps track of all updates to the cache entries.

High level solution
Multicasting will be used to transmit cache updates (all the app servers will do this) through a multicast group channel (something like a radio frequency), and anyone listening in on this channel (the central cache server in this case) will be able to pick up the transmissions.
The cache server, in turn, transmits a 'pulse' signal. The peers look for this signal to make sure that the central cache service is alive and not down.

Problem with multicast code found online
As stated in the beginning, many sources available online provide many different ways of multicasting. The problem I faced with most of the code was that it did not allow two different processes on the same machine to transmit into the multicast group.

Sending
So here is the code that finally allowed two processes to transmit into the same multicast group from the same machine.


private void send(string strMessage)
{
    // Requires using System.Net; using System.Net.Sockets; and using System.Text;
    // A plain UDP socket is enough for sending; no group join or bind is required.
    Socket server = new Socket(AddressFamily.InterNetwork,
        SocketType.Dgram, ProtocolType.Udp);
    // IP address and port number are personal choices, subject to multicast addressing rules.
    // The port must match the one the listeners are bound to (5001 below).
    IPEndPoint iep = new IPEndPoint(IPAddress.Parse("224.237.248.237"), 5001);
    byte[] data = Encoding.ASCII.GetBytes(strMessage);
    try
    {
        server.SendTo(data, iep);
    }
    finally
    {
        server.Close();
    }
}

Note: this code allows you to send messages without requiring elaborate steps like joining the group, socket binding and so on, as suggested by some resources on the net. This cleared a major bottleneck for me.
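As a usage sketch, the central cache server could broadcast its pulse by calling the send method above from a timer. The payload text and the one-second interval below are arbitrary choices of mine, not anything mandated by the design.

// Illustrative pulse broadcaster for the central cache service; reuses send() above.
private System.Threading.Timer pulseTimer;

private void StartPulse()
{
    pulseTimer = new System.Threading.Timer(
        state => send("PULSE"),       // transmit a pulse into the multicast group
        null,                         // no state object required
        TimeSpan.Zero,                // start immediately
        TimeSpan.FromSeconds(1));     // and repeat every second
}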

Listening
Receiving messages is mostly a consistent affair and many resources on the net are spot on when it comes to the code required to do this.
However, coming to our requirement, if we need to support multiple processes listening in on the same multicast group, there is no easy way out.
First, let me present the fundamental code to listen in on a multicast group.

private void Listen()
{
    // Requires using System.Net; using System.Net.Sockets; and using System.Text;
    // Bind to the multicast port; by default only one process per machine can bind
    // to it this way, which is the limitation discussed below.
    UdpClient listener = new UdpClient(5001);
    IPEndPoint groupEndPoint = new IPEndPoint(IPAddress.Parse("224.237.248.237"), 5001);
    string strMsgRcvd = string.Empty;
    try
    {
        // Join the multicast group so transmissions to it are delivered to this socket.
        listener.JoinMulticastGroup(IPAddress.Parse("224.237.248.237"));
        while (true)
        {
            byte[] bytes = listener.Receive(ref groupEndPoint);
            strMsgRcvd = Encoding.ASCII.GetString(bytes, 0, bytes.Length);
            // do what you must with the message received.
        }
    }
    catch (Exception e)
    {
        // log the error and proceed.
    }
    finally
    {
        listener.Close();
    }
}


The above code will allow only one process to bind to the socket. Whenever another process attempts to bind to the same port 5001, .NET throws an exception saying that only one process can bind to this socket.

There is no quick way to overcome this difficulty, and it matters in our case since we wanted to allow more than one process on a machine to listen in on the pulse signal from the cache service. Without the pulse signal, the peer processes will not know whether there is a live central cache service to receive and process the outgoing messages sent by the peer nodes.

Here is an approach by which we can overcome this difficulty. I'm not insisting this is the best way, but it is one way of working around the problem.
Using a Pulse file
The above listening code is written into an independent program/Windows service that writes the live status of the central cache service to a file in the local file system at a pre-determined location. Let us call this file a pulse file.
Let this pulse file have the name pulse.txt
The listener will overwrite a '1' into this pulse.txt whenever it receives a pulse signal. This way the pulse.txt file is constantly 'touched', with the same value '1' overwritten into it.
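In code, the 'touch' can be as small as the helper below, called from the receive loop; the file path is an illustrative assumption and would normally come from configuration.

// Hypothetical helper called whenever a pulse message arrives.
// Overwriting the same value refreshes the file's last-write timestamp, which is
// what the peer processes actually check. Requires using System.IO;
private void TouchPulseFile()
{
    string pulsePath = @"C:\cache\pulse.txt"; // illustrative path; read it from configuration in practice
    File.WriteAllText(pulsePath, "1");
}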
The independent processes that wish to be aware of the pulse status of the central service look for the pulse.txt file at the pre-determined location and check its last-updated timestamp property (alternatively, instead of '1', the timestamp itself could be written into the file).

Once the last-updated timestamp is retrieved from the file, the process determines the status of the remote cache service based on its tolerance. For instance:

// toleranceInSeconds is the configured tolerance for a missed pulse (my naming)
if ((DateTime.Now.Ticks - ObjLastAccessedDateTime.Ticks) / TimeSpan.TicksPerSecond < toleranceInSeconds)
{
    // central service is alive, so carry on
}
else
{
    // central service is not alive so throw an exception
    throw new Exception("Please check whether the central cache service is running");
}

This way we have a system in which more than one process can transmit and receive on the same multicast group on a single machine.

Footnote: this allowed me to run multiple dedicated web apps on the same box with different resource allocations (memory/CPU etc.) to prioritize clients' requirements while sharing common infrastructure services.

Monday, November 2, 2009

Designs for making your searches faster - Part 5

In the previous parts we looked at different ways of speeding up application-based transaction searches. In this part we will look at pushing the boundaries to an extreme limit, without compromising application data integrity, through the use of search engines.
Search Engines
Search engines come in different types. The ones we are all familiar with, the internet search engines, scour the web for data and catalog it to help us find the web pages that provide the content we look for.
The second kind is the internet search engine applied to the end-user desktop. These are classified as desktop search engines, and they help users search their local desktops by keywords for anything from documents to music files and emails.
The one we are going to use in this context is an enterprise search engine, which is essentially a stripped-down version of a desktop search engine with extensions that enable application developers to harness the power of the search engine.
Before we look at how we will use search engines, let us quickly and briefly understand how they work.
A search engine is fundamentally about indexing various keywords against a content link so that when a user searches using any of those keywords, the mapped content can be presented to the user.
For instance, suppose we have a book called "Fifty ways to make a sandwich" by "Danny Shrill", a fast-cooking guide for working men and women, published by Penguin Books.

A search engine will index many of the keywords in the context of the book, such as:
  • Author name: Danny Shrill
  • Publisher: Penguin Books
  • Subject: Sandwich
  • Category: Cooking and fast food
  • Published: 2009
  • Type: Paperback
When a user searches for books using any one or more of the keywords shown above, this book will be returned as a possible candidate for what the user is searching for.

The accuracy of the search engine depends on the relevance of the keywords that are indexed by the engine.

Search Engines and enterprise applications
Search engines may not be relevant for enterprise application searches across the board. Their use should be restricted to cases that meet the following criteria:
  1. Where search performance SLAs are very tight (expected response times are very low)
  2. Where data volumes are very high
  3. Where users may not be able to provide accurate data points for the search
A high-level approach to using search engines
Let us approach this with an example. Let us assume that we are building the product search on amazon.com. When a user searches for a product on amazon.com, the search has to fulfil the following criteria:
  1. It's got to be really fast
  2. It's got to be accurate in retrieving relevant products
  3. It's got to be flexible when users make mistakes in spelling what they are looking for
  4. It's got to provide a good set of suggestions when users cannot find what they want
How do we go about using a search engine that blends with a classic relational database system for retrieving results?
Let us apply all of what we learned in the previous parts and quickly summarize the steps:
  1. We will create a denormalized table structure
  2. We will use a threaded searching mechanism for burst searching
To use a search engine, the following activities have to be done:
  1. Build keyword catalog
  2. Integrate indexing mechanism
Build keyword catalog
A keyword catalog is a dictionary of terms a user will typically use for searching. The keywords can be classified and grouped depending on need. Let us now attempt to build a keyword catalog for our exercise.
Keyword catalog for amazon.com products
  1. Name
  2. Type: books, music, ebook, audiobook
  3. Genre: fiction, self-help, pop, rock
  4. Author/writer
  5. Played by
  6. Supporting cast
  7. Publisher
  8. Year published
  9. Media: hardbound, paperback, ebook, download, DVD
This is a brief collection of keywords by which a prospective amazon.com customer could look for products. It can also be reverse-engineered from the criteria in your basic/advanced search screens.
Once this has been built, the next step is to integrate the search engine to index and execute the searches.
Integrating the search engine
Search engine integration has two parts to it.
  1. Indexing
  2. Searching
Indexing
Indexing the product database can be done in multiple ways. The simplest way is to index whenever product data is modified in the system. This way, whenever new product data is added or existing product information is modified, the search engine is updated with the content.
This ensures that the search engine indexes are up to date. Let us look a bit closer at how indexing is done.
Most search engines refer to index entries as a compilation of book(s). Each book is a catalog entry comprising two parts. There are many other meta parts, but we will restrict our discussion to these two to keep things simple.
  1. Keywords
  2. Identifier
Keywords, as we saw earlier, are a grouped collection of terms typically used by users to search. An identifier is a unique identifier, such as a primary key, that the search engine returns as the result of a search.
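To make the book structure concrete, here is a hypothetical facade that I will use for the rest of this discussion; the interface and method names are mine and do not belong to any particular engine's API.

// Hypothetical search-engine facade: a book is a set of classified keywords
// mapped to a unique identifier. Requires using System.Collections.Generic;
public interface ISearchEngine
{
    // Index a book: classified keywords mapped to a unique identifier (e.g. a primary key).
    void Index(string identifier, IDictionary<string, string> keywords);

    // Return the identifiers of the books matching the criteria, ordered by relevance.
    IList<string> Search(IDictionary<string, string> criteria);
}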
So, as presented above, if we ensure that the search engine indexes content every time a transaction is created or modified, the engine will use its own complex algorithms to hash and store the indexes for fast retrieval.
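Using that hypothetical facade, indexing our sandwich book whenever it is created or modified could look something like the sketch below; the identifier is made up, and searchEngine is an instance of the facade above.

// Illustrative indexing call placed in the create/modify path of the product.
private void IndexBook(ISearchEngine searchEngine)
{
    var keywords = new Dictionary<string, string>
    {
        { "name", "Fifty ways to make a sandwich" },
        { "author", "Danny Shrill" },
        { "publisher", "Penguin Books" },
        { "category", "Cooking and fast food" },
        { "media", "Paperback" }
    };
    searchEngine.Index("BOOK-12345", keywords); // "BOOK-12345" stands in for the row's primary key
}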

Searching
The next part of the integration is the search process itself. When a user performs a search, the search program should first use the search engine API and ask the engine to return the key identifiers that match the criteria.
Search engine APIs are fairly simple and enable searches to be done by simply providing a list of keywords, optionally along with their classifications.
Search engines also provide extended APIs to search with emphasis on certain classifications and to return results with the desired matching criteria, to further refine the results.
The key identifiers returned by the search engine are then used in transactional searches to retrieve the data from the denormalized data structure in the database.
Since the database rows are retrieved by directly specifying the key identifiers, the retrieval will be the fastest.

  1. Query the search engine with the search criteria
  2. The search engine returns books matching the criteria in order of relevance
  3. The application forms a database query with the key identifiers in those books
  4. The database returns the relevant rows based on the primary keys specified
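Putting those four steps together, a sketch of the search path could look like the method below; the ProductSearch table and ProductId column names are illustrative, and a real implementation would parameterize the query rather than concatenating identifiers.

// Illustrative search path: ask the engine for key identifiers, then hit the
// denormalized table directly by primary key.
// Requires using System.Collections.Generic; and using System.Linq;
private string BuildProductQuery(ISearchEngine searchEngine, string author)
{
    // Steps 1 & 2: the engine returns matching identifiers in relevance order.
    IList<string> ids = searchEngine.Search(
        new Dictionary<string, string> { { "author", author } });

    // Step 3: form the database query with the returned key identifiers.
    return string.Format(
        "SELECT * FROM ProductSearch WHERE ProductId IN ({0})",
        string.Join(",", ids.Select(id => "'" + id + "'").ToArray()));
    // Step 4: execute the query; the database returns the rows by primary key.
}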
This approach has been shown to provide the fastest results, as it relies on a purpose-built search component to retrieve them. However, it comes with its own cost of maintaining the search indexes in a cluster-friendly location.

Let us quickly take stock of this approach vis-a-vis the checklist we put together in the first part.

Consideration | Compliance | Remarks
Search must be fast | Complies | Much improved performance over the earlier approach
Search must be accurate | Complies |
Search must use minimal system resources | Complies | Much better than the earlier DB-only approach
Search must avoid redundant queries | Complies | Still executes redundant queries when paginating
Search must provide current data | Complies | Only the identifiers are cached; the data can be retrieved directly from the database
Pagination must be fast | Complies | Optimized cache access
Must facilitate on-demand sorting | Complies | Requires requerying the DB unless a sorting API is available in the programming language, e.g. LINQ
Must facilitate on-demand result filtering | Complies |
Must be multi-lingual friendly | Complies |
Solution must be cluster friendly | Complies* | Subject to support from the caching solution

Closing comments
From a solution perspective, we have looked at many of the options available to us for building efficient searches. However, we have only scratched the surface, and many of the solutions we have adopted have deeper tuning options that allow us to further exploit their features to deliver better searches.
I will close this subject with this part, hoping that the series has provided an impetus for you to embark on your own discovery process and evolve a solution that best fits your needs.
 
