Wednesday, April 1, 2015

Moore's Law, Cloud Computing and DW/BI

In this post, I explore recent DW/BI trends in the context of Moore's law and cloud computing. Let us first look at what these terms mean.

Moore's Law:


Moore’s law states that the number of transistors per square inch on an integrated circuit doubles roughly every two years. This is often loosely taken to mean that processor speed, or the overall processing power of computers, doubles every two years as well. This can be seen in the picture on the left. Since 1971, processing power and transistor counts have kept increasing, and this trend is expected to continue.
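The doubling rule can be worked through with a little arithmetic. The sketch below is only an illustration: the function name and the two-year default are my own choices, and the starting figure of about 2,300 transistors in 1971 (the Intel 4004) is an assumed reference point, not a number taken from the picture.

```python
def projected_transistors(base_count, base_year, target_year,
                          doubling_period_years=2.0):
    """Project a transistor count, assuming it doubles every fixed period."""
    elapsed_years = target_year - base_year
    doublings = elapsed_years / doubling_period_years
    return base_count * 2 ** doublings


# Assumed starting point: roughly 2,300 transistors in 1971.
for year in (1971, 1973, 1981, 1991):
    print(year, round(projected_transistors(2300, 1971, year)))
```

Ten doublings (twenty years) multiply the count by 1,024, which is why a small-sounding rule produces such a steep curve.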


Cloud Computing:


Cloud computing is the delivery of on-demand computing resources over the Internet on a pay-for-use basis. It relies on sharing computing resources rather than having local servers or personal devices handle applications.







From the above definitions we can see that Moore's law and cloud computing both bear on how companies meet their processing power and storage needs. If a company buys its own hardware and relies on Moore's law to keep pace with its needs, it owns its storage and processing capacity but runs the risk of leaving a lot of that capacity unused. On the other hand, if it uses cloud computing, it rents the capacity and pays only for what it uses, although the cloud can be the more expensive option. Both approaches therefore have their pros and cons, and each company needs to decide which one to adopt to handle its growing data storage needs.
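The own-versus-rent trade-off above can be sketched as a simple break-even calculation. Everything here is hypothetical: the function names, the fixed monthly cost of owned hardware and the cloud price per unit are made-up figures for illustration, not real vendor prices.

```python
def monthly_cost_owned(fixed_monthly_cost):
    """Owned hardware costs the same whether it is used or idle."""
    return fixed_monthly_cost


def monthly_cost_cloud(units_used, price_per_unit):
    """Cloud capacity is billed only for the units actually used."""
    return units_used * price_per_unit


def break_even_units(fixed_monthly_cost, price_per_unit):
    """Usage level at which renting costs as much as owning."""
    return fixed_monthly_cost / price_per_unit


# Hypothetical figures: 10,000 per month to own, 0.25 per unit in the cloud.
owned = monthly_cost_owned(10_000)
for units in (10_000, 40_000, 60_000):
    cloud = monthly_cost_cloud(units, 0.25)
    cheaper = "cloud" if cloud < owned else "owned"
    print(units, cloud, cheaper)
```

Under these assumed numbers, renting is cheaper below 40,000 units a month and owning is cheaper above it, so the decision depends on how much of the owned capacity would really be used.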

Now, let us see how the classical data warehouse has evolved due to the advent of cloud computing.


The Elastic Data Warehouse



The Snowflake Elastic Data Warehouse is a new data warehouse built for the cloud. Its architecture separates data storage from compute, which lets it take advantage of the elasticity, scalability and flexibility of the cloud. It is a relational database with full support for standard SQL. As a result, Snowflake gives analysts self-service access to data and lets organizations take advantage of the tools and skills that they already have.

This elastic data warehouse provides:
  • Data warehousing as a service. With Snowflake's data warehouse, analysts can focus on getting value from data rather than on managing hardware and software.
  • Multidimensional elasticity. Elastic scaling makes it possible to load and query data simultaneously because every user and workload can have exactly the resources needed, without contention.
  • Single service for all business data. Analysts can query structured and semi-structured data in a single system without compromise.


Amazon Redshift



Amazon Redshift is a petabyte-scale data warehouse solution that makes it simple to analyze a company's data using existing business intelligence tools. Amazon Redshift is fast, cheap, secure, fully managed and scalable.

It delivers fast query performance by using columnar storage technology to improve I/O efficiency and parallelizing queries across multiple nodes. Amazon Redshift uses standard PostgreSQL JDBC and ODBC drivers, allowing companies to use a wide range of familiar SQL clients. With a few clicks of the AWS Management Console or a simple API call, it is easy to change the number and type of nodes in the cloud data warehouse as the company's performance or capacity needs change.

Personally, I believe it is better to pay for what we use than to own data storage and processing power that is rarely used. So, I think cloud computing is the way forward for building and scaling data warehouses.


References:

http://www.ibm.com/cloud-computing/us/en/what-is-cloud-computing.html
http://www.webopedia.com/TERM/C/cloud_computing.html

Thursday, March 5, 2015

Presentation and Visualization Methods


Visualization of data differs from one business process to another. While the information for one process may be best explained in a tabular format, it might be too cumbersome to represent the data for a different process in the same way. It may be simpler and more visually appealing to represent it in a graphical format instead. In this post, I discuss three business vignettes, their important metrics and the best visualization techniques to represent them, so that business users can easily understand the data and make decisions.

Insurance

Wikipedia defines insurance as an equitable transfer of the risk of a loss, from one entity to another in exchange for payment. The payment here is the monthly, quarterly or yearly premium we pay for the insurance policy. When we face a loss, we can make a claim to the insurance company to mitigate it. Some key metrics associated with the insurance process are the claims ratio, average cost per claim and customer satisfaction.

One useful metric that insurance companies use is the claims ratio, which takes the value of claims incurred in a period and divides it by the premium earned for the same period. It's important to note that insurance is the business of managing risks and, to do that well, the insurer needs a thorough understanding of this metric. If the ratio is higher than expected, further investigation is required to find out the reason (e.g. fraud).
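As a minimal sketch, the claims ratio and the "higher than expected" check could be computed per quarter as follows. I am assuming here that claims are measured as the amount incurred in the period; the function names, the quarterly figures and the expected ratio of 0.75 are all hypothetical.

```python
def claims_ratio(claims_incurred, premium_earned):
    """Claims incurred in a period divided by premium earned in that period."""
    if premium_earned <= 0:
        raise ValueError("premium_earned must be positive")
    return claims_incurred / premium_earned


def quarters_to_investigate(quarterly_figures, expected_ratio):
    """Return the quarters whose claims ratio exceeds the expected value."""
    return [
        quarter
        for quarter, (claims, premium) in quarterly_figures.items()
        if claims_ratio(claims, premium) > expected_ratio
    ]


# Hypothetical figures: (claims incurred, premium earned) per quarter.
figures = {
    "Q1": (600_000, 1_000_000),
    "Q2": (650_000, 1_000_000),
    "Q3": (900_000, 1_000_000),
    "Q4": (700_000, 1_000_000),
}
print(quarters_to_investigate(figures, expected_ratio=0.75))
```

With these made-up numbers only Q3 stands out, which is the kind of quarter that would warrant a closer look.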

We can present this information using a spline (smoothed line) chart that shows the value for every quarter. Alternatively, we can present it in a pivot table. However, I feel the combo chart presented below, a combination of a spline and a bar chart, is the best way to present the comparison of claims and premiums together with the claims ratio. This presentation is visually appealing and easily comprehensible for a business user.





Healthcare

Some important metrics that healthcare providers use to improve their operational efficiency and patient care are: time to healthcare services, average length of stay, ER waiting times and current ER occupancy.

The Average Length of Stay KPI measures how long, on average, patients stay in a hospital after being diagnosed with a condition. This metric can vary with the type of condition a patient is diagnosed with. Breaking it down this way can help you analyze why the average length of stay is high or low. Is it influenced by hospital-acquired infections, or by excellent healthcare service? The metric helps answer these questions and guide changes or improvements.
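The breakdown by condition can be sketched as a simple group-and-average. The function name, the conditions and the stay lengths below are hypothetical sample data, used only to show the shape of the calculation.

```python
from collections import defaultdict


def average_length_of_stay(stays):
    """Average days in hospital per condition.

    `stays` is an iterable of (condition, days) pairs, one per patient stay.
    """
    totals = defaultdict(lambda: [0, 0])  # condition -> [total days, stay count]
    for condition, days in stays:
        totals[condition][0] += days
        totals[condition][1] += 1
    return {
        condition: total_days / count
        for condition, (total_days, count) in totals.items()
    }


# Hypothetical sample data.
sample_stays = [
    ("pneumonia", 6),
    ("pneumonia", 4),
    ("fracture", 3),
    ("fracture", 5),
    ("fracture", 1),
]
for condition, avg_days in average_length_of_stay(sample_stays).items():
    print(condition, avg_days)
```

The resulting condition-to-days mapping is exactly what a chart arranged by medical condition and days would plot.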


We can represent this measure in a pie chart, a pivot table or a bar chart arranged by medical condition and days. However, this information is best presented in a horizontal bar chart as shown below due to its simple and clear representation.



Customer relationship management

Customer relationship management (CRM) is a system for managing a company’s interactions with current and future customers and involves the various stages of negotiations with the customer. Some metrics that are crucial in a CRM are: sales pipeline, customer satisfaction, unconverted leads and sales orders.

Sales Pipeline measures future inbound revenue and the quality of the pipeline. For example, it helps us answer questions such as: what types of deals are the salespeople working on, and are those the right deals? More importantly, how do they measure up to the company's revenue target? Is the sales staff focusing on the more profitable deals?
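Two of these questions, what types of deals are in the pipeline and how they measure up to the revenue target, can be sketched with simple totals. The function names, deal types, amounts and target below are hypothetical, and dividing the pipeline by the target is just one simple way to compare the two.

```python
def pipeline_by_deal_type(deals):
    """Total open pipeline value per deal type.

    `deals` is an iterable of (deal_type, amount) pairs.
    """
    totals = {}
    for deal_type, amount in deals:
        totals[deal_type] = totals.get(deal_type, 0) + amount
    return totals


def pipeline_coverage(deals, revenue_target):
    """Total pipeline value divided by the revenue target."""
    total = sum(amount for _, amount in deals)
    return total / revenue_target


# Hypothetical open deals and revenue target.
open_deals = [
    ("new business", 120_000),
    ("renewal", 80_000),
    ("new business", 50_000),
]
print(pipeline_by_deal_type(open_deals))
print(pipeline_coverage(open_deals, revenue_target=200_000))
```

A coverage value above 1 means the open deals add up to more than the target, although not every deal in the pipeline will close.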


We can analyze the sales pipeline by using one pie chart per month or by using a line chart. However, I feel it is better to present this information in a tabular format as shown below to keep it simple for the user.




In conclusion, there is no single data visualization technique that can be employed across industries. Information needs to be displayed to business users in a form that is simple, clear and easily understandable for them.

References:
http://www.claricent.com/categories/press-releases/
http://www.crmsearch.com/crm-kpi.php