Big data is at the core of every industry’s growth, from politics to social media and everything in between. It helps us predict trends, visualise data and learn more about the world in which we live.
In this article we will cover the following:
- What is Big Data?
- How is Big Data made?
- Why is Big Data important and who uses Big Data?
- How can Big Data be used in business?
- Five V’s of Big Data.
Clive Humby has been working with Big Data ever since the inception of his data science company dunnhumby in 1989. He believes that data is so powerful and valuable in the modern age that he calls it “the new oil”.
Data sure is pretty valuable in terms of business development. But what makes Big Data so big?
Big Data is the key to understanding trends within your business and mapping business objectives to real world activities.
What is Big Data?
Dictionary.com defines Big Data as:
“Data sets, typically consisting of billions or trillions of records, that are so vast and complex that they require new and powerful computational resources to process: Supercomputers can analyze big data to create models of global climate change.”
Traditionally, it was commonly accepted that a small, carefully chosen data set gives the most accurate results. If I were allowed only 30 samples of data to produce a result, I would of course choose the 30 most accurate samples I could get my hands on.
With Big Data the approach is different. It is now generally accepted that data volume can trump careful sample selection, and that pushing large volumes of data into Big Data algorithms can give more valuable results.
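To illustrate why volume helps, here is a minimal simulation sketch. It assumes the data is a noisy measurement of some true value (the numbers 50.0 and 10.0 are made up for illustration) and compares the typical error of an average taken from 30 samples with one taken from 3000 samples. It only covers random noise; it does not show that volume can fix systematically biased data.

```python
import random
import statistics

def mean_abs_error(sample_size, trials=200, true_mean=50.0, spread=10.0, seed=42):
    """Average absolute error of the sample mean over many simulated samples."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    errors = []
    for _ in range(trials):
        # Draw one noisy sample of the requested size.
        sample = [rng.gauss(true_mean, spread) for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

small = mean_abs_error(30)
large = mean_abs_error(3000)
print(f"30 samples:   typical error {small:.3f}")
print(f"3000 samples: typical error {large:.3f}")
```

The larger sample should land much closer to the true value, because the error of an average shrinks roughly with the square root of the number of samples.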
How is Big Data generated?
Let’s take a simple example of going to the movies. You will search for a movie on your phone, generating data through your search history. When you read reviews of that movie you may ask a question or leave a comment, which is sent as a post to a database.
Whilst at the movies you may post on Instagram or Twitter to update your followers about the movie. Social media sites generate a vast amount of data that is unstructured and varies in size and context, which makes it a prime example of Big Data.
Every tweet and every hashtag you post is stored and replicated in a database.
The database administrator will need to ensure that the data processing checks the veracity (trustworthiness) of the data.
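What such a veracity check looks like depends entirely on the data. As a rough sketch, assuming each post is a dictionary with hypothetical fields user_id, text and created_at (a timestamp string without a time zone), a basic sanity filter could look like this:

```python
from datetime import datetime

def is_plausible(post, now=None):
    """Return True if a post record passes some basic sanity checks."""
    now = now or datetime.now()
    # Required fields must be present and non-empty.
    if not post.get("user_id") or not post.get("text"):
        return False
    # The timestamp must parse as an ISO 8601 string and must not lie in the future.
    try:
        created = datetime.fromisoformat(post["created_at"])
    except (KeyError, TypeError, ValueError):
        return False
    return created <= now

posts = [
    {"user_id": "u1", "text": "Great movie!", "created_at": "2019-06-01T20:15:00"},
    {"user_id": "", "text": "spam", "created_at": "2019-06-01T20:16:00"},
    {"user_id": "u2", "text": "Loved it", "created_at": "not a date"},
]
clean = [p for p in posts if is_plausible(p)]
print(len(clean))  # 1
```

Real pipelines would add checks that suit their own data, such as duplicate detection or range checks on numeric fields.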
Who uses Big Data?
Just about everyone!
Here are some examples:
- Social media, e.g. Instagram
- Clickstream analysis: what a shopper puts in a basket
- Internet search history, e.g. Google Trends
- Retail: recording sales and transactions
- Retail: mining sales and transactions
- Banking: money and trends
- Education: grades and student count
- Medical: patient records
Consumers can also be producers and vice versa, as shown by the Retail example above.
Why is Big Data important?
Social media sites store your posts so that you can look back on them at the touch of a button. Can you imagine how much data is being stored for this feature alone?
Take Facebook, for example: it has 2.38 billion users and reportedly ingests 105 terabytes of data each half hour.
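To put the half-hourly figure into perspective, here is a quick back-of-the-envelope calculation. It takes the 105 terabytes quoted above at face value and assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).

```python
TB_PER_HALF_HOUR = 105            # figure quoted in the article
HALF_HOURS_PER_DAY = 48
SECONDS_PER_HALF_HOUR = 30 * 60

tb_per_day = TB_PER_HALF_HOUR * HALF_HOURS_PER_DAY               # 5040 TB
pb_per_day = tb_per_day / 1000                                    # about 5.04 PB
gb_per_second = TB_PER_HALF_HOUR * 1000 / SECONDS_PER_HALF_HOUR   # about 58.3 GB

print(f"{tb_per_day} TB per day, roughly {pb_per_day:.2f} PB")
print(f"roughly {gb_per_second:.1f} GB every second")
```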
“Without Big Data, you are blind and deaf and in the middle of a freeway.” – Geoffrey Moore.
Geoffrey Moore is right about the speed at which data is growing: if you do not move with it, you will surely be left behind or, in this analogy, knocked down.
Almost everything we do online generates Big Data and it is essential to business in more ways than one:
- Businesses can examine their Big Data and base business decisions on it.
- Shopping sites can automatically generate coupons based on how the user is interacting with the site.
- Advertising companies can mine information about people to target them with relevant ads within seconds.
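As a sketch of the coupon idea, assume a shopping site records clickstream events as dictionaries with a product_id and an action (both field names are hypothetical). A very simple rule could offer a coupon for any product the user has viewed several times without buying:

```python
from collections import Counter

def coupon_candidates(events, min_views=3):
    """Return product IDs viewed at least min_views times but never purchased."""
    views = Counter(e["product_id"] for e in events if e["action"] == "view")
    bought = {e["product_id"] for e in events if e["action"] == "purchase"}
    return sorted(p for p, n in views.items() if n >= min_views and p not in bought)

events = [
    {"product_id": "shoes", "action": "view"},
    {"product_id": "shoes", "action": "view"},
    {"product_id": "shoes", "action": "view"},
    {"product_id": "hat", "action": "view"},
    {"product_id": "hat", "action": "purchase"},
]
print(coupon_candidates(events))  # ['shoes']
```

Real systems are likely to use far richer signals, but the principle of turning raw interaction data into a business action is the same.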
From customer satisfaction and personal happiness to business profitability, Big Data has a strong hold on today’s business world.
Types of Big Data available
There are many types of data available. Here are a few examples:
- IoT e.g. smart cars
- Publicly available data sets e.g. birth rates
- Social media e.g. Instagram
What qualifies as Big Data?
Big Data is any data set that becomes too big for our normal relational databases to manage.
Big Data can be used by small or large organizations, depending on the use case. For example, a company that deals with travel bookings might receive thousands of bookings a minute, leading to more data than can comfortably be stored relationally. A larger company might also receive thousands of posts a minute on its social media campaign.
There are many different types of data within Big Data, and most of it is unstructured.
Data often arrives in semi-structured formats such as XML or JSON, but may also be plain text, such as email. Common types include:
- Geospatial (location)
- Audio (voice)
- Strings (letters)
- Blobs (images)
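To show how several of these types can sit together in one record, here is a small sketch that parses a made-up social media post in JSON. The field names and values are invented for illustration.

```python
import json

raw = """
{
  "user": "movie_fan_42",
  "text": "Just watched a great film!",
  "hashtags": ["movies", "fridaynight"],
  "location": {"lat": 53.3498, "lon": -6.2603},
  "image": null
}
"""

post = json.loads(raw)

# Fields can be nested, optional, or differ from one record to the next,
# which is why this kind of data does not fit neatly into fixed table columns.
lat = post["location"]["lat"]
has_image = post.get("image") is not None
tags = post.get("hashtags", [])

print(post["user"], lat, has_image, tags)
```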
Storage of Big Data
NoSQL databases were introduced to deal with the volume of Big Data and the unstructured way in which it has to be stored. These databases are built with scalability in mind.
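One common way to scale out is to spread records across several machines (shards) based on their key. The sketch below is a simplified illustration of that idea using a hash and a modulo; it is not how any particular NoSQL product works, and the user names are made up.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Distribute a handful of hypothetical user records over four shards.
shards = {i: [] for i in range(4)}
for user in ["alice", "bob", "carol", "dave", "erin"]:
    shards[shard_for(user, 4)].append(user)

for shard_id, users in shards.items():
    print(shard_id, users)
```

Because the hash is stable, the same key always maps to the same shard, so a lookup only needs to contact one machine. A plain modulo scheme has the drawback that changing the number of shards moves most keys, which is one reason production systems tend to use more elaborate partitioning schemes.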
Google is reportedly able to process 20 petabytes of data a day, stored in its Bigtable database.
Because data volumes keep growing, we continually need to process larger amounts of data in less time.
Consistency of Big Data
Consistency matters less for much of Big Data than it does in a traditional relational database, where it remains extremely important.
Take your bank balance, for example: it is very important that it is consistent and that you always get the correct value. However, it will not be the end of the world if you cannot immediately find the post on Facebook where you tagged your friend in a cute cat video. Relaxing consistency in this way can allow anomalies such as the lost update problem, in which two concurrent writes to the same record leave one silently overwritten by the other.
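To make the lost update problem concrete, here is a minimal sketch. It uses a plain dictionary as a stand-in for a data store and simulates two clients that both read a like counter before either of them writes it back. The store, the key and the helper function are all hypothetical.

```python
def add_like(store, post_id, snapshot):
    """Write back a like count computed from a previously read snapshot."""
    store[post_id] = snapshot + 1

store = {"cat_video": 10}

# Two clients read the same value before either one writes.
seen_by_a = store["cat_video"]
seen_by_b = store["cat_video"]

add_like(store, "cat_video", seen_by_a)  # store now holds 11
add_like(store, "cat_video", seen_by_b)  # writes 11 again, so one like is lost

print(store["cat_video"])  # 11, although two likes were added
```

For a like counter this is a tolerable glitch. For a bank balance it would not be, which is why systems that handle money rely on stronger consistency guarantees.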
The five V’s of Big Data
Volume is said to be the internet scale of the data. This is the size of the data: how much are we dealing with?
Variety describes the different types of data we are dealing with. There are many types of data; the usual ones are strings and numbers, but there are now also audio, video and even geospatial data. Around 80% of stored data is said to be unstructured, while only 5 to 10% is structured.
Veracity is the trustworthiness of the data. It is hard to gauge just how accurate our Big Data is, considering all of the different factors that influence it, e.g. typos. In some cases the volume of the data can compensate for the lack of accuracy.
Value is what the data is worth to us. It is important to know how much value our Big Data holds, although this is sometimes hard to measure when we have such a large volume.
Velocity is said to be the real-time speed at which data travels. Examples include clickstreams and live weather reports.
Big Data Tools
Google was an early innovator in Big Data and introduced a programming model called MapReduce, which inspired today’s open-source offering, Hadoop.
Hadoop is open source and uses a network of many computers to work with large datasets and perform large-scale computations. It stores and distributes data across the cluster with the Hadoop Distributed File System (HDFS) and processes it using the MapReduce model.
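The MapReduce model itself is easy to sketch on a single machine. The example below counts words in two small documents using a map step, a grouping (shuffle) step and a reduce step. It is a toy illustration of the model in plain Python, not Hadoop code; on a real cluster each step would run in parallel across many computers.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine the values for one key into a single result."""
    return key, sum(values)

documents = ["big data is big", "data is the new oil"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(counts["big"], counts["data"])  # 2 2
```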
There are many tools to deal with Big Data and help you visualize your data. One nice example is Google BigQuery together with Google Data Studio. In BigQuery your data is stored in a data warehouse that works well with denormalized tables and allows fast and efficient querying. Amazon has similar offerings with Athena and Amazon QuickSight. Data Studio is essentially Google’s version of Tableau.
I wish I could tell you that is a wrap and that this is all there is to Big Data. However, the world of Big Data is ever expanding, and we must be willing to move with the data and learn new ways to manage, store and visualize it to better serve the use case at hand.
How can you see Big Data impacting your life?
Some nice reading to follow on: Big data insights