Thursday, February 19, 2015

Understanding Structured vs Big Unstructured Data




The buzzwords in the business world today are big data and analytics. So, what is big data? It is the massive amount of data generated every second of every day, driven in large part by the rise and popularity of social media networks. To understand the hype around big data analytics, we need to understand the challenges involved and which kinds of data matter to organizations today. To do this, let us start with the difference between the two kinds of data that are prevalent and relevant to companies: structured and unstructured data.

Structured vs. Unstructured Data in Enterprises




Structured Data

Structured data is any data that is neatly organized in a format users can readily understand, typically as rows and columns in a relational database. It is easy for businesses to handle and to extract useful information from. However, it may not provide as many insights as the other variant, unstructured data. Typical examples are tables in a relational database or in a desktop database such as Microsoft Access.
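To make this concrete, here is a minimal sketch of structured data: a small table with fixed columns in an in-memory SQLite database, using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical. Because every row follows the same predefined schema, a single SQL query is enough to answer a business question.

```python
import sqlite3

# Hypothetical "orders" table: every row has the same, predefined columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.5), ("alice", 30.0)],
)

# Because the schema is known in advance, one SQL query answers a business question:
# how much has each customer spent?
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 150.0), ('bob', 75.5)]
conn.close()
```

The same question asked of a pile of e-mails or scanned receipts would first require extracting the customer and amount from each document, which is exactly the difficulty described next.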

Unstructured Data

On the other side of the spectrum is messy, unorganized data, called unstructured data. It is very difficult to handle with traditional databases because it does not follow a fixed, unambiguous format and takes up a lot of space. Much of the recent explosion of unstructured data has come from social media platforms such as Facebook and Twitter, which capture user interactions and behavior. This unstructured data is often like a treasure that has yet to be discovered: it can yield game-changing insights that structured data alone cannot offer. Examples of unstructured data include e-mails, videos, images, and audio files, all of which lack a predefined structure.

Present State and Volume of Data

Now, let us look at the volume of structured and unstructured data in organizations. As you can see in the image below, unstructured data accounted for about 90% of enterprise data in 2014, while only about 10% was structured. It is also expected that, in enterprises, unstructured data will grow at a much faster rate than structured data, and this rapid explosion of unstructured data is what is commonly called big data.





Enterprises are looking for ways to leverage this massive amount of data and glean insights about their customer base in order to gain a competitive edge.

Can a data warehouse support this much sought-after unstructured data? The answer is no. While a data warehouse is very good with structured data, it cannot handle unstructured data effectively: extracting intelligence from an enormous and growing volume of data with no specific format is difficult. As a result, organizations are turning to Hadoop-based tools to handle and analyze unstructured, or big, data.
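To give a feel for how Hadoop-style tools process text that has no fixed schema, here is a minimal sketch of the MapReduce programming model (map, shuffle, reduce) applied to a word count. It runs in plain Python on a single machine and does not use Hadoop's actual APIs; the function names and sample posts are hypothetical. In a real Hadoop cluster, the framework would run many map and reduce tasks in parallel across machines and perform the shuffle for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one raw, unstructured document.
    for word in document.lower().split():
        yield word.strip(".,!?"), 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework does between phases.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Chain the three phases; here they run sequentially in one process.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


posts = ["Loving the new phone!", "The new phone battery is bad.", "new store opened"]
counts = word_count(posts)
print(counts["new"], counts["phone"])  # 3 2
```

The appeal of this model for big data is that neither the map nor the reduce step needs a predefined table schema, and both can be split across many machines as the data grows.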

Limitations of Data Warehousing in Analyzing Unstructured Data

While a data warehouse is very good at storing structured data and giving users access to the intelligence in it, it cannot handle unstructured data directly, due to the following limitations:
  • Lack of structure: unstructured data cannot be placed directly in a data warehouse because it lacks organization. Even if we manage to store it, little intelligence can be derived from it without further processing. For example, we cannot tell whether a tweet is positive or negative from its raw, unprocessed form, and storing such a post in a data warehouse is of little use if we cannot analyze it.
  • Volume of unstructured data: As we saw earlier, about 80-90% of data in organizations is unstructured, and this share keeps rising. Data warehouses were not built to handle such a large, constantly increasing volume of data.
  • Real-time data: Some big data sources feed data continuously in real time, and data warehouses, which are typically loaded in periodic batches, are not designed to handle such real-time feeds.
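The tweet example in the first limitation can be sketched as follows. This is a minimal illustration, assuming a tiny hand-made list of positive and negative words (hypothetical, and far cruder than real sentiment analysis). Each raw tweet is first processed into a fixed-shape row; only then does it become useful for SQL-style analytics. An in-memory SQLite table stands in for the warehouse here.

```python
import sqlite3

# Hypothetical, hand-picked word lists; real systems use trained sentiment models.
POSITIVE = {"love", "loving", "great", "happy"}
NEGATIVE = {"bad", "hate", "broken", "slow"}


def to_row(tweet_id: int, text: str) -> tuple:
    # Turn one raw tweet into a fixed-shape record: (id, text, score, label).
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return tweet_id, text, score, label


tweets = ["Loving the new phone!", "Battery is bad and the app is slow.", "Store opens at 9"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet_sentiment (id INTEGER, text TEXT, score INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO tweet_sentiment VALUES (?, ?, ?, ?)",
    [to_row(i, t) for i, t in enumerate(tweets, start=1)],
)

# Once the text has been processed into columns, ordinary SQL analytics works.
summary = conn.execute(
    "SELECT label, COUNT(*) FROM tweet_sentiment GROUP BY label ORDER BY label"
).fetchall()
print(summary)  # [('negative', 1), ('neutral', 1), ('positive', 1)]
conn.close()
```

The processing step, not the storage step, is where the value comes from; at big data volumes that processing is what Hadoop-style tools are typically used for.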

Future Data Warehousing Trends

Many experts feel that the Hadoop platform and data lakes are the future, and that they will give enterprise data warehouses tough competition and may eventually replace them.

Others strongly disagree with this view and argue that a combination of Hadoop, enterprise data warehouses, and relational databases is the way forward. They feel that while the Hadoop platform is ideal for dealing with big data, transactional and other structured data are best handled in a data warehouse. They propose the following modern architecture, which handles the different data types by combining newer technologies such as Hadoop and in-memory processing.

 Modern Architecture



I strongly feel that the combination of these powerful technologies is the future of business intelligence and analytics in organizations, and that the enterprise data warehouse is here to stay!

