A Big Data Guide for Noobs
If you’ve used the internet at all in the past decade or so, you have probably come across the term "big data" quite a few times. Naturally, your interest may have been piqued, and you may be wondering what exactly big data is and why everybody is making such a big deal out of it. If that is the case, you’ve come to the right place, my friend.
What is Big Data?
As its name suggests, big data is a huge collection of data that is drawn from a vast range of sources and keeps growing at an exponential rate. This, however, is a watered-down definition. In reality, big data refers to data sets so large that the data management tools we have traditionally used for analytics can no longer cope, and newer technologies have sprung up to keep up with the scale. Given the breadth of the subject, there is a lot of ground for us to cover. So let’s get started.
The Big Data Scale
With all the hype around the size of big data, you might be wondering how big big data really is. It is estimated that around 2.5 quintillion bytes of data are generated every day. That is 25 followed by 17 zeros (2.5 × 10^18). If that is hard to take in, it is around 2.5 billion gigabytes. That is an impressively eerie amount of data, especially considering that a healthy percentage of it could come from the silly banter in the YouTube comments section. Anyway, we are not here to talk about one of the millennial generation’s favorite hobbies, but about the data that governs almost everything we do today.
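The conversion above is plain arithmetic and can be checked in a few lines of Python. This is only a sketch: it assumes decimal (SI) units, where one gigabyte is 10^9 bytes, and the constant and function names are made up for illustration.

```python
# Assumption: decimal (SI) units, where 1 GB = 10**9 bytes.
BYTES_PER_GIGABYTE = 10**9

# The daily estimate quoted above: 2.5 quintillion bytes (2.5 x 10**18).
# Written as an integer to avoid floating-point rounding.
DAILY_BYTES = 25 * 10**17


def bytes_to_gigabytes(n_bytes: int) -> int:
    """Convert a byte count to whole decimal gigabytes."""
    return n_bytes // BYTES_PER_GIGABYTE


daily_gigabytes = bytes_to_gigabytes(DAILY_BYTES)
print(f"{daily_gigabytes:,} GB per day")  # prints "2,500,000,000 GB per day"
```

With binary units (1 GiB = 2^30 bytes) the figure would come out somewhat smaller, so the "2.5 billion" only holds under the decimal assumption.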
Types of Big Data
Big Data can be classified into three major types.
1. Structured Data
This is data that has a fixed format and length. Structured data usually consists of numbers, dates, and strings. It is obtained from a myriad of sources, including machine-generated data, human-generated data, and sensor-based data. Experts estimate that around 20 percent of the data available today can be classified as structured. This is also the data that is processed most thoroughly and from which the most value is derived.
2. Unstructured Data
Unlike structured data, this type of data has no uniform format and is not organized into fields. It includes text and multimedia content such as Word documents, audio files, video files, and other documents. This is the most abundant form of data. It is estimated that unstructured data accounts for around 80 to 90 percent of the data generated by organizations today. While this form of data is quite difficult to analyze, doing so can yield vital, actionable insights that could be leveraged to keep up with the competition.
3. Semi-structured Data
This type of data has some structural properties, but it is not organized in a relational database the way structured data is. XML and NoSQL data are good examples of semi-structured data. Such data is easier to analyze than unstructured data, which makes it easier to draw insights from it.
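To make the three types more concrete, here is a rough Python sketch that handles one tiny sample of each. The sample records, field names, and the keyword check are invented for illustration and do not come from any real dataset. JSON is used here as a stand-in for semi-structured data simply because Python can parse it out of the box.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_csv = "id,name,signup_date\n1,Asha,2020-01-15\n2,Ben,2020-02-03\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but records need not share a schema.
semi_structured_json = (
    '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "ben@example.com"}]'
)
records = json.loads(semi_structured_json)

# Unstructured: free text with no fields at all; even a simple question
# needs text processing rather than a column lookup.
unstructured_text = "Loved the product, but shipping took far too long."
mentions_shipping = "shipping" in unstructured_text.lower()

print(rows[0]["name"])            # field lookup by column name
print(sorted(records[1].keys()))  # keys differ from record to record
print(mentions_shipping)
```

Notice how the amount of guesswork grows from top to bottom: the structured rows can be queried by column name, the semi-structured records have to be checked for which keys they carry, and the free text needs some form of text analysis before it says anything.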
Other Characteristics of Big Data that you Need to be Aware of
One of the most defining aspects of big data is its variety. From text to audio and video, data comes in many forms. Handling such a wide variety of data on such a large scale is a messy, tedious affair. Even storing it requires a wider range of storage tools to match the nature of the data, let alone its scale. Maintaining any sort of consistency across big data is little more than a fantasy. Especially when you consider that each browser, platform, and web page produces its own specific data, the varied nature of the data really exacerbates things. Processing this data can therefore seem like an insurmountable task, and it comes with several challenges. One of the most significant is the loss of vital parts of each piece of data during processing and analysis, which defeats the very purpose of big data. While traditional computing methods fail at the first whiff of such data, agile-based technologies have proven to be quite effective when it comes to analyzing and processing big data. There are several programs to this end that should be explored. Perhaps we could cover those in another article.
Given the large volume of data that floats around the internet these days, velocity, meaning the speed at which data is generated and processed, plays a key role in making anything out of it. The idea behind big data has always been to make use of every bit of data available. And given how dense the competition is, companies have to get their big data sorted out fast. Keeping track of the frequency, or velocity, of data generation allows companies to gain insights into how quickly their data is growing and how fast it is being relayed to its various destinations.
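One simple way to keep track of velocity is to measure how many bytes arrive per second over a time window. The sketch below assumes events are available as (timestamp in seconds, size in bytes) pairs. The function name and the sample events are hypothetical, and a real pipeline would more likely use a streaming or monitoring tool than a hand-rolled function.

```python
from typing import Iterable, Tuple


def bytes_per_second(events: Iterable[Tuple[float, int]]) -> float:
    """Average ingest rate over the time span covered by the events.

    Each event is a (timestamp_seconds, size_bytes) pair.
    Returns 0.0 when the events do not span a positive amount of time.
    """
    events = list(events)
    if not events:
        return 0.0
    start = min(t for t, _ in events)
    end = max(t for t, _ in events)
    elapsed = end - start
    if elapsed <= 0:
        return 0.0
    total = sum(size for _, size in events)
    return total / elapsed


# Hypothetical sample: three events arriving over ten seconds.
sample = [(0.0, 500), (4.0, 1500), (10.0, 1000)]
rate = bytes_per_second(sample)
print(rate)  # prints 300.0
```

Comparing this rate across successive windows (hour to hour, day to day) gives a rough picture of how fast the data is growing.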
Volume is the most significant attribute of big data. Today there are many sources of data, and together they generate an unfathomable amount of it. Whether a collection of data even counts as ‘big data’ depends on its volume. So this is a crucial attribute to take into consideration before delving deeper into the data.
Big data is undoubtedly a technology that has been revolutionizing various enterprise verticals for nearly a decade now. As newer technologies are developed to process and analyze big data, the insights that are driving millions of businesses today will become more valuable and more productive.