Thursday, November 18, 2010

First look at Google BigQuery

This post is mainly about the Google BigQuery API, a significant step towards offering a database as a service in the cloud. The BigQuery API is designed to analyse huge amounts of data in very little time. The interface of the service is quite straightforward and involves HTTP calls to get data from the tables.
Since Google BigTable is a NoSQL-oriented database, BigQuery does not support JOINs, but it does support many other features such as grouping, ordering, ranges, rejection and so on.


Overview:

The BigQuery API provides only a basic querying mechanism to end users. There is no functionality for manipulating data once it has been uploaded into the BigTable structure: you can only append more data or run SELECT queries against it. This makes the BigQuery engine more of an analytics tool over trillions of rows of data.

BigTable is part of Google's storage servers and can be used without any additional hardware on your side. Moreover, the API is exposed via simple HTTP and JSON, which are easy to integrate into any application. One can leverage these benefits to cut down on the hardware costs of setting up high-performance systems for analysis.

The whole process of carrying out analysis on some dataset involves two basic steps:

1. Importing data into the BigTable structure (through Google Storage and an import job)
2. Running the queries against that data

Importing the data:

In order to analyse your data, follow these steps to port it into the Google BigTable system:

1. Ensure that you have a Google account with Google Storage and BigQuery enabled.
2. Upload the data in CSV format to Google Storage using its storage manager HTTP interface. Header information is not required; the columns of the CSV only need to match the structure of the table specified in the subsequent steps.
3. Use the cURL tool to get the authorization key, which is used in subsequent calls to the HTTP service. The command to execute is:

curl -X POST -d accountType=GOOGLE -d Email=@gmail.com -d Passwd= -d service=ndev -d source=google-bigquery-manualimport-1 -H "Content-Type: application/x-www-form-urlencoded" https://www.google.com/accounts/ClientLogin

This call will return an authorization key in a format similar to:

DQAAAK8AAADlegSF4xMNrCgUrxFv3NFCYbBp4KCpdhOCOioIfJ7txXrRB9FYcvDAv7ooGvmaWoDCj_
SPHolxr9Q6gK8-cd4SR_QkIv3QcM_pQlVADtMWtv63hqzdz16KMZ1tDizeyOIe-vJNenAYqrjX4xVSTH2F
_5Fy9fc1mXxL77BtKXFMlV0xqbX9swpqXXhk_Zqvw8K0fbBE2o6kkU3FW7NDN8IXLrhqC_VEEaKXGAD3-1JIyg

Keep a note of the key, as it will be used in the following steps. From here on, I will refer to it as AUTH_KEY.
4. Create the table using the following command:

curl -H "Authorization: GoogleLogin auth=AUTH_KEY" -X POST -H "Content-type: application/json" -d '{"data": {"fields": [ {"type": "string", "id": "name", "mode": "REQUIRED"}, {"type": "integer", "id": "age", "mode": "REQUIRED"}, {"type":"float", "id": "weight", "mode": "REQUIRED"}, {"type": "boolean", "id": "is_magic", "mode": "REQUIRED"} ], "name": "/testtable" }}' 'https://www.googleapis.com/bigquery/v1/tables'

bucket_name is the name of the bucket where you want the reference to your table to be created. Buckets are essentially folders inside Google Storage.
5. After you have successfully added the table, the import process needs to be triggered to port the uploaded CSV data into the BigTable structure. The following command does this:

curl -H "Authorization: GoogleLogin auth=AUTH_KEY" -X POST -H "Content-type: application/json" -d '{"data": {"sources": [{"uri": "/.csv"}]}}' 'https://www.googleapis.com/bigquery/v1/tables/%2Ftesttable/imports'

* Please note that testtable is the name of the table that you created in step 4.
* The response to this kind of request is a JSON string of the format: {"data":{"kind":"bigquery#import_id","table":"/testtable","import":""}}
* import_id is useful for getting the current state of the import process, including any errors it has encountered. Any error fails the import process and rolls everything back.
6. To check the status of your import process, run the following command:

curl -X GET https://www.googleapis.com/bigquery/v1/tables/%2Ftesttable/imports/ -H "Authorization: GoogleLogin auth=AUTH_KEY"
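
The curl calls in steps 4 and 5 can also be assembled programmatically. The following is a minimal sketch in JavaScript that only builds the request descriptions (URL, headers, JSON body) seen above and does not send them. The helper names buildHeaders, buildCreateTableRequest and buildImportRequest are hypothetical and not part of any Google library, and the bucket and file names in the usage lines are placeholders.

```javascript
// Base URL taken from the curl commands above.
var BASE_URL = "https://www.googleapis.com/bigquery/v1/tables";

// Headers shared by both POST calls; authKey is the AUTH_KEY from step 3.
function buildHeaders(authKey) {
  return {
    "Authorization": "GoogleLogin auth=" + authKey,
    "Content-type": "application/json"
  };
}

// Step 4: describe the table. `fields` is an array of
// {type, id, mode} objects, as in the curl example.
function buildCreateTableRequest(authKey, tableName, fields) {
  return {
    method: "POST",
    url: BASE_URL,
    headers: buildHeaders(authKey),
    body: JSON.stringify({ data: { fields: fields, name: tableName } })
  };
}

// Step 5: trigger the import of an uploaded CSV file.
// The table name is URL-encoded, so "/" becomes "%2F".
function buildImportRequest(authKey, tableName, sourceUri) {
  return {
    method: "POST",
    url: BASE_URL + "/" + encodeURIComponent(tableName) + "/imports",
    headers: buildHeaders(authKey),
    body: JSON.stringify({ data: { sources: [{ uri: sourceUri }] } })
  };
}

// Usage with placeholder names (hypothetical bucket "mybucket").
var createRequest = buildCreateTableRequest("AUTH_KEY", "mybucket/testtable", [
  { type: "string", id: "name", mode: "REQUIRED" },
  { type: "integer", id: "age", mode: "REQUIRED" }
]);
var importRequest = buildImportRequest("AUTH_KEY", "mybucket/testtable", "mybucket/data.csv");

console.log(createRequest.url);
console.log(importRequest.url);
```

Keeping the request construction in small functions like these makes it easy to see that the table name appears twice: once as plain text in the JSON body when the table is created, and once URL-encoded in the path when the import is triggered.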

Querying the data:

After the data has been successfully ported, you can query it using BigQuery. Download the bqshell tool and build the code with all the required dependencies. The tool runs on Python and comes with a detailed "how-to" for installing it.

This tool has a login console in which you specify your username and password, after which you can query the sample datasets or your own datasets created using cURL. It parses the JSON responses returned by the BigQuery API and displays them as SQL-style output. Alternatively, you can use cURL calls to query the database and inspect the JSON response yourself.

Conclusion:

* The BigQuery API performs well in scenarios where huge amounts of data need to be processed. Running a query on almost 28 million rows uploaded in a test data set returns a response in just 2-3 seconds, including the time to post the request and receive the response.

* Google itself provides sample data sets, one of which contains 60 billion records; queries against it execute in 4-8 seconds. For more details, please visit the following link: http://code.google.com/apis/bigquery/docs/dataset-mlab.html

* One drawback is that there is no way to insert data other than through the upload and import job process. Nor can you delete a record from a table once it has been inserted: to change the data, you have to delete the table and repeat the complete import process. It is worth mentioning that if you only need to add data to an existing table, you can do so by running another import job for a CSV file containing the additional data.


Tuesday, September 28, 2010

jQuery in Action!

Learn about 2 different jQuery plugin patterns (pattern A and pattern B). By the end of this tutorial, you should be able to grasp the basics of writing custom jQuery plugins. jQuery books as well as a few online tutorials were used as references in writing this tutorial.

Tutorials for writing jQuery plugins are plentiful on the Internet, as are jQuery books. When I was first learning to write jQuery plugins, one difficulty I had with tutorials written by other people was that they merely explained what code needed to be implemented in order to make a jQuery plugin function. This is often enough, but I like to be thorough. For example, some of the basic questions one may have when beginning to learn how to write a jQuery plugin are:
# What is the difference between using $.myfunction and $.fn.myfunction?
# What does the dollar sign mean?
# What does the jQuery function jQuery.extend do and how is it used?
# How do I initialize my jQuery plugin and pass it function parameters?
# How do I provide default values for initialization parameters, and how do I override them?

If you are curious about how the jQuery interface lets you add your own code to the existing framework, this tutorial is for you!

In this jQuery plugin tutorial I will walk you through the process of writing your own jQuery plugins. This process is the same every time you decide to create a new plugin. Once you understand how your JavaScript code can be integrated into the main jQuery object you will no longer need documentation and will be able to focus on programming plugins.

Objects in JavaScript

As you may already know, JavaScript is an object-oriented language. It would be improper to describe jQuery to someone who had not yet studied objects in JavaScript. The basic philosophy of object design in JavaScript is that when you declare (or define) a function, that function can by default act as a class and be instantiated as an object. We can use the new operator to create multiple instances of the same class. Instead of the traditional keyword class, JavaScript uses the keyword function. It is interesting to note that such a function can serve both as a class from which other classes inherit (through what is known as prototype inheritance) and as an ordinary function that we can call later.
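
To make this concrete, here is a small sketch that runs without jQuery. The names Animal, Dog and add are made up for illustration; the point is only that the function keyword serves as a constructor, as the base of a prototype chain, and as a plain callable function.

```javascript
// A function declaration doubles as a constructor ("class").
function Animal(name) {
  this.name = name;
}

// Methods shared by all instances live on the prototype.
Animal.prototype.describe = function () {
  return this.name + " is an animal";
};

// A second constructor that inherits from Animal through the prototype chain.
function Dog(name) {
  Animal.call(this, name); // reuse the parent's initialization
}
Dog.prototype = Object.create(Animal.prototype);
Dog.prototype.constructor = Dog;
Dog.prototype.speak = function () {
  return this.name + " says woof";
};

// The same keyword also defines an ordinary function that is simply called.
function add(a, b) {
  return a + b;
}

// The new operator creates separate instances.
var rex = new Dog("Rex");
var tom = new Animal("Tom");

console.log(rex.describe());        // "Rex is an animal" (inherited method)
console.log(rex.speak());           // "Rex says woof"
console.log(rex instanceof Animal); // true
console.log(typeof tom.speak);      // "undefined" (Animal has no speak method)
console.log(add(2, 3));             // 5
```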

The main jQuery object

The main jQuery object is simply defined with the keyword function as an object whose identifier name is $. For a deeper insight into what this really means, I have written another tutorial, "What does the dollar sign mean?" Be sure to read it if you're still confused about the dollar sign identifier notation.

The jQuery object is equivalent to a global function, such as window.$. You see, in JavaScript, when you extend the window object (that must exist in all browsers by design), or in other words, attach new member functions with the dot (.) to the window object, it means that you will be able to call that object either by window.myobj(); or simply myobj(); You are allowed to do so because every function you attach to the window object can also be called from the global scope in your script. Internally, the jQuery object is created like so:


var jQuery = window.jQuery = window.$ = function(selector, context)
{
    // ...
    // other internal initialization code goes here
};

This precise declaration allows us to refer to the main jQuery object by the identifier name jQuery, or by the dollar sign ($). You need to know that jQuery, window.jQuery, window.$ or simply $ can be used interchangeably because, as the declaration in the code block above tells us, they all refer to the same object in memory. Please note the selector and context parameters of the jQuery function object. The selector is usually a string that selects a number of elements from the DOM. It can also be the this object (a self-reference). The jQuery selector parameter accepts the same values you would expect to use in a CSS style definition. For example, consider the following jQuery object call:


// Select all elements of class: "someClass" and
// apply a red border to all of them
jQuery(".someClass").css("border", "1px solid red");

// Select an element with id: someId and insert dynamic html into it
jQuery("#someId").html("So Bold!");

// Exactly the same as above, but using $
$("#someId").html("So Bold!");

This is an example of how powerful a short jQuery statement can be. You don't need to worry about writing your own document.getElementById functions that tend to clutter the code. With just one line of code, jQuery selects all elements of the requested type by scanning the entire DOM and applies the desired effect. Many people use jQuery just for the intuitive element-selector functionality, but if you like this so far, you are going to love the rest of the features at your command when you work with the jQuery framework. It's also worth noting that jQuery takes care of cross-browser compatibility.
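
The earlier point that jQuery, window.jQuery and window.$ are one and the same object, and that anything attached to window can be called without the prefix, can be checked with a few lines of plain JavaScript. This is a sketch only: globalThis stands in for window so that it also runs outside a browser, and the names miniQuery, $mq and myobj are hypothetical.

```javascript
// globalThis is the global object; in a browser it is the same as window.
var root = globalThis;

// Chained assignment: all three names end up pointing at one function,
// mirroring the shape of the jQuery declaration.
var miniQuery = root.miniQuery = root.$mq = function (selector, context) {
  return "selected: " + selector;
};

console.log(miniQuery === root.miniQuery); // true
console.log(root.miniQuery === root.$mq);  // true
console.log(root.$mq(".someClass"));       // "selected: .someClass"

// A function attached to the global object can be called with or
// without the prefix.
root.myobj = function () {
  return "called myobj";
};
console.log(root.myobj()); // "called myobj"
console.log(myobj());      // "called myobj"
```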

jQuery Plugin Entry Point

I would like to begin with an example of jQuery plugin code as commonly written, followed by an explanation of what it means. But because I am targeting absolute jQuery beginners who may not have enough experience with JavaScript, I'd like to clarify a few things first. When learning a new language or a framework such as jQuery, you need to understand where the entry point of your plugin program is. Traditionally, for years prior to jQuery, some JavaScript programmers liked to execute their crucial code in the window.onload function as illustrated below:


// Override the onload event
window.onload = function()
{
    // the page finished loading, do something here...
};

This code actually replaces any existing handler for the window's onload event. All this means to us is that our code will be executed as soon as the page has finished loading. This makes sense because pages sometimes take time to load, or the download is split into segments by the browser. We would not want to run JavaScript code that depends on a page that is still being loaded.

jQuery also makes use of the window.onload event internally, but first it checks whether the entire DOM (document object model) has been loaded, which is what really matters. It is not enough for jQuery to know that the page has been loaded; we must ensure that the DOM has been fully constructed. This is achieved by listening for the DOMContentLoaded event in most browsers, but we don't need to worry about this at all because jQuery takes care of it for us internally. To provide this functionality, jQuery gives us a method called ready, which is called on a jQuery object wrapping the document. When writing a jQuery plugin, we use the ready function to check whether we are 'ready' to execute our plugin code. Please note that this is not yet the plugin code itself; it is the entry point of our plugin. You can think of it as jQuery's version of the window.onload function:



// Define the entry point
$(document).ready(function()
{
    // The DOM (document object model) is constructed
    // We will initialize and run our plugin here
});
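
jQuery's own implementation deals with more browser quirks, but the idea behind ready can be sketched in a few lines. This is a simplified illustration and not jQuery's actual source. The helper name whenDomReady is hypothetical, and it takes the document as a parameter so that it can be exercised outside a browser.

```javascript
// Run `callback` once the DOM of `doc` has been constructed.
function whenDomReady(doc, callback) {
  if (doc.readyState !== "loading") {
    // The DOM is already built, so run the callback right away.
    callback();
  } else {
    // Otherwise wait for the browser to announce that the DOM is complete.
    doc.addEventListener("DOMContentLoaded", function handler() {
      doc.removeEventListener("DOMContentLoaded", handler);
      callback();
    });
  }
}

// In a browser this would be called with the real document object.
if (typeof document !== "undefined") {
  whenDomReady(document, function () {
    console.log("DOM is ready");
  });
}
```

Unlike assigning to window.onload, this approach does not replace a handler that other code may already have installed, and it does not wait for images and other resources to finish downloading.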


Let's suppose for a moment that we are writing a plugin called Shuffle. Assuming the plugin provides two separate functions, one for initialization and one for execution of the base code, the code might look something like the following:



// One way to initialize plugin code
$(document).ready(function()
{
    jQuery.Shuffle.initialize( "monalisa.jpg", 5, 8, 67, 1500);
    jQuery.Shuffle.run();
});


More often than not, the code above can be adapted. Is it required to use this specific format for initializing and executing our plugin? No. Personally, because I come from a C and C++ background writing computer games, I like to separate the initialization and execution function calls as shown above.

Internal definition of a jQuery plugin

The coding style is entirely up to you but also depends on what you are trying to accomplish. It is easy to think that a plugin must be written, initialized and executed in a certain way all the time but this is simply not true. The reason you see many different styles and syntactical differences in jQuery plugin code is that the programmers are trying to accomplish different things. Additional knowledge of JavaScript Inheritance and function closures may help here quite a bit.

First let's talk a little about the following syntax seen in many a plugin written by jQuery programmers. It is very easy to get confused about it especially for programmers with only intermediate knowledge of JavaScript. So, this may seem quite ambiguous at first.



(function($){ ... })(jQuery);


What is going on here? First of all, in a real-world scenario, the three dots would be replaced with the actual code that we would like to be executed. Here we see the definition of an anonymous function (often loosely referred to as a function closure) that takes a parameter named with the dollar sign ($). The function is wrapped in parentheses, but why? The function in this example is anonymous, so there is no name by which to refer to it; by wrapping it in parentheses, JavaScript syntax lets us treat the function we just created as a value and use it on the spot. We could have added a dot (.) after the parenthesized function definition and called a member function that this function object supports. We could then add another dot at the end of that statement as well, and call yet another member function on the object returned by the previous call. This chaining feature of the JavaScript language is common to many other scripting languages, for example Perl, because it keeps code compact.

So, by wrapping an anonymous function in parentheses we can refer to that function without using its name, which we could not do anyway because the function has no name! Furthermore, not only can we refer to a function that has no name, we can also call that function in the same statement that created it. And that's exactly what's going on here. The nameless function is defined, wrapped in parentheses, and then immediately called with jQuery as its argument. This is only one example of the function closure usage that you will see throughout jQuery and other advanced JavaScript code.
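
The define-and-call-in-one-statement form can be tried without jQuery at all. In the sketch below, the object named library is a hypothetical stand-in for the jQuery object, and the global $ is deliberately set to something else to show that the parameter inside the function is independent of it.

```javascript
// Define an anonymous function and call it in the same statement.
var sum = (function (a, b) {
  return a + b;
})(2, 3);
console.log(sum); // 5

// A stand-in for the global jQuery object (hypothetical, for illustration).
var library = { name: "library" };

// Some other code has claimed the short name $ for itself.
var $ = "taken by another framework";

// Inside the function, the parameter $ refers to whatever was passed in,
// regardless of what the outer $ currently holds.
var seenInside = (function ($) {
  return $.name;
})(library);

console.log(seenInside); // "library"
console.log($);          // "taken by another framework"
```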

Why would someone do such a thing? There are several reasons. It keeps the code short and, more importantly, keeps the dollar sign identifier local to the function instead of relying on the global scope of the script. The real reason for doing so is the case where you are using multiple frameworks that each use the dollar sign function object in the global scope of the program (which is a very bad design idea in itself). But when would we see that happen in practice? That depends on the circumstances and the developer's choices. Since in this tutorial we are not using any additional frameworks or outside code, this obscure and sometimes confusing syntax is not necessary and the chances of creating a conflict are zero. When learning, it is sometimes a good idea to simplify. Let's take a look at a very basic plugin construction idea:
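
A minimal sketch of such a plugin might look like the following. The plugin name highlight and its border option are hypothetical and chosen only for illustration; $.fn, $.extend, each and css are the standard jQuery facilities. The installer takes the jQuery object as a parameter purely so that the file also loads where jQuery is absent.

```javascript
// Attach a hypothetical "highlight" plugin to a jQuery-like object.
function installHighlight($) {
  // $.fn is the prototype shared by every jQuery selection, so after
  // this assignment $(".someClass").highlight() becomes available.
  $.fn.highlight = function (options) {
    // Caller-supplied options override the defaults.
    var settings = $.extend({}, $.fn.highlight.defaults, options);

    // Apply the effect to each selected element and return the
    // selection so that further calls can be chained.
    return this.each(function () {
      $(this).css("border", settings.border);
    });
  };

  // Default values, exposed so that users can change them globally.
  $.fn.highlight.defaults = { border: "1px solid red" };
}

// Install on the real jQuery object when it is present on the page.
if (typeof jQuery !== "undefined") {
  installHighlight(jQuery);
}
```

With jQuery loaded, a call such as $(".someClass").highlight() would use the default border, while $(".someClass").highlight({ border: "2px dashed blue" }) would override it for that call only.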





Final Thoughts

Well, this tutorial has already taken several hours to write and edit. I hope I have pointed out some of the things that make jQuery plugin development clearer for you. Have fun building jQuery plugins!

VAMSI