Understanding Big Unstructured Data vs. Structured Relational Data
Big data and analytics are the buzzwords of the business world today.
So, what is big data? It is the massive amount of data generated every
second of every day, driven in large part by the rise and popularity of
social media networks. To understand the hype around big data analytics,
we need to know what the challenges are and which kinds of data matter to
organizations today. To do that, let us first look at the difference
between the two kinds of data that companies deal with: structured and
unstructured data.
Structured Vs Unstructured Data in Enterprises
Structured Data
Structured data is any data that is neatly organized
and available in an easily understood format, typically as rows and columns in
a relational database. It is easy for businesses to handle and to extract useful
information from. However, it may not provide as many insights as the other
variant, unstructured data. Typical examples are data held in
a relational database, Microsoft Access, etc.
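To make the "rows and columns" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `orders`, its columns, the helper `total_spend_by_customer`, and the sample rows are all made up for illustration; the point is only that once data has a fixed schema, a single query can summarize it.

```python
import sqlite3

def total_spend_by_customer(rows):
    """Load (customer, amount) rows into an in-memory table and total them per customer."""
    conn = sqlite3.connect(":memory:")
    # A fixed schema: every row has exactly these two columns.
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data, not taken from any real system.
sample_orders = [("alice", 20.0), ("bob", 15.5), ("alice", 4.5)]
print(total_spend_by_customer(sample_orders))
```

Because every row follows the same schema, the database can group and sum without any preprocessing, which is what makes structured data easy to analyze.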
Unstructured Data
At the other end of the spectrum is the messy,
unorganized data known as unstructured data. It is very difficult to handle
with traditional databases, because it does not adhere to a fixed,
unambiguous format and takes up a lot of space. The explosion of unstructured
data came with the growth of social media such as Facebook and Twitter, where
it captures user interactions and behavior. Often, this unstructured
data is like a treasure yet to be discovered: it can yield
game-changing insights that structured data cannot offer. Examples of
unstructured data include e-mails, videos, images, and audio files, all of
which lack a definite structure.
Present State and Volume of Data
Now, let us look at the volume of structured and
unstructured data in organizations. As the image below shows, in 2014
unstructured data accounted for about 90% of the space occupied, while
only 10% of data was structured. In enterprises, the volume of unstructured
data is also expected to grow much faster than that of structured data, and
this rapid explosion of unstructured data is what is
called big data.
Enterprises are looking for ways to leverage this
massive amount of data and glean insights about their customer
base in order to gain a competitive edge.
Can a data warehouse handle this much sought-after
unstructured data? The answer is no. While a data warehouse
is very good with structured data, it cannot handle unstructured data effectively.
Extracting intelligence from an enormous and growing volume of data with no
specific format is difficult. Organizations are turning to Hadoop-based tools to
handle and analyze unstructured data, or big data.
Limitations of Data warehousing in analyzing data types
While a data warehouse is very good at storing
structured data and giving users access to intelligence from that data, it
cannot handle unstructured data directly, due to the following limitations:
- Lack of structure: Unstructured data cannot be placed directly in a data warehouse because it lacks organization. Even if we manage to store it, no intelligence can be derived from it in that form. For example, we cannot tell whether a tweet is positive or negative from its raw, unprocessed form, and storing the post in a data warehouse is useless if we cannot use it for analytics.
- Volume of unstructured data: As we saw earlier, about 80-90% of the data in organizations is unstructured, and that share is constantly rising. Data warehouses were not built to handle such a large, constantly growing volume of data.
- Real-time data: Some big data sources feed data continuously in real time, and data warehouses do not have the capability to handle such real-time feeds.
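The tweet example in the first limitation can be sketched in a few lines of Python. This is a toy illustration, not a real sentiment model: the word lists, the function name `tweet_to_row`, and the output fields are all assumptions made for this sketch. It shows that raw text has to be processed into fixed fields before a warehouse-style table can do anything useful with it.

```python
import re

# Tiny, hand-picked word lists; a real system would use a trained model.
POSITIVE_WORDS = {"love", "great", "happy"}
NEGATIVE_WORDS = {"hate", "terrible", "slow"}

def tweet_to_row(tweet_id, text):
    """Turn free-form tweet text into a structured row with a sentiment label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    # Fixed fields like these are what a relational table can store and query.
    return {"tweet_id": tweet_id, "sentiment": label, "score": score}

print(tweet_to_row(1, "I love the new phone, great camera!"))
print(tweet_to_row(2, "Terrible battery and slow updates"))
```

The raw text on its own carries no queryable sentiment field; only after this kind of processing step does each tweet become a row with columns that analytics can use.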
Future Data warehousing Trends
Many experts feel that the Hadoop platform and data lakes are
the future, and that they will give enterprise data warehouses tough
competition and might eventually replace them.
Others strongly disagree with this view and argue
that a combination of Hadoop, the enterprise data warehouse, and relational
databases is the way forward. They feel that while the Hadoop platform is ideal
for dealing with big data, transactional and other structured data are best
handled in a data warehouse. They propose the following modern architecture,
which handles the different data types and brings in newer technologies such
as Hadoop and in-memory processing.
Modern Architecture
I strongly feel that the combination of these
powerful technologies is the future of business intelligence and analytics in
organizations, and that the enterprise data warehouse is here to stay!