The Four V’s of Big Data [INFOGRAPHIC]
Imagine all the information you alone generate each time you swipe your credit card, post to social media, drive your car, leave a voicemail, or visit a doctor. Now try to imagine your data combined with the data of all humans, corporations, and organizations in the world! From healthcare to social media, from business to the auto industry, humans are now creating more data than ever before.
To help us talk about “big data,” IBM data scientists break it down into four dimensions: volume, velocity, variety, and veracity. Here’s some information about each so you can better understand the fundamentals.
Volume: Scale of Data
Big data is big. An estimated 2.5 quintillion bytes (about 2.3 billion gigabytes) of data are created every day. By 2020, we are expected to be creating 40 zettabytes (43 trillion gigabytes) of information, 300 times the amount of data that existed in 2005. Why are we producing so much data? For starters, 6 billion of the world’s 7 billion people now have cell phones. As infrastructure becomes increasingly available and affordable, cell phone use such as text messaging is bound to keep growing rapidly.
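The unit conversions above are easy to verify. Here is a minimal sketch, assuming binary units (1 GB = 2^30 bytes, 1 ZB = 2^70 bytes), which is the convention that matches the parenthetical figures:

```python
# Sanity-check the volume figures above.
# Assumes binary units: 1 GB = 2**30 bytes, 1 ZB = 2**70 bytes.
BYTES_PER_GB = 2 ** 30
BYTES_PER_ZB = 2 ** 70

daily_bytes = 2.5e18                    # 2.5 quintillion bytes created per day
daily_gb = daily_bytes / BYTES_PER_GB
print(f"{daily_gb / 1e9:.1f} billion GB per day")   # ~2.3

bytes_2020 = 40 * BYTES_PER_ZB          # 40 zettabytes projected by 2020
gb_2020 = bytes_2020 / BYTES_PER_GB
print(f"{gb_2020 / 1e12:.0f} trillion GB")          # ~44, near the cited 43
```

Using decimal SI units (1 GB = 10^9 bytes) instead would give 2.5 billion gigabytes per day, so the figures depend on which convention is assumed.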
The amount of information being collected is so huge that traditional database management tools are being overwhelmed. The need for new ways of storing and processing big data helps explain the demand for more data scientists. By 2015, the U.S. will see 1.9 million new IT jobs; 4.4 million will be created globally.
Velocity: Analysis of Streaming Data
The sheer velocity at which we create data today is a major driver of big data. The New York Stock Exchange alone captures one terabyte of trade information during each session. Each time you drive, sensors in your car monitor items like fuel level and tire pressure; a modern car uses close to 100 such sensors. For this reason, the computer systems in cars are becoming more advanced so they can process the large amounts of sensor data collected every minute. By 2016, there are projected to be 18.9 billion network connections – almost 2.5 for every person on Earth. As we continue to create more data, we will need more methods to monitor and analyze it, too.
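The connections-per-person figure follows from simple division. A quick back-of-the-envelope check, assuming a 2016 world population of roughly 7.5 billion (an assumed figure, not one given in the article):

```python
# Back-of-the-envelope check on the velocity figures above.
# The population value is an assumption for illustration, not from the article.
connections_2016 = 18.9e9   # projected network connections by 2016
population_2016 = 7.5e9     # assumed world population in 2016
per_person = connections_2016 / population_2016
print(f"{per_person:.2f} network connections per person")   # ~2.52
```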
Variety: Different Forms of Data
As technology moves into more realms of our lives, big data is taking on a greater variety of forms. As of 2011, the global volume of healthcare data alone was estimated at 150 exabytes (161 billion gigabytes). As hospitals continue to adopt electronic medical record systems, this number can only increase. By 2014, there will be an estimated 420 million wearable, wireless health monitors in use, continuously recording data about our bodies that was never before tracked so extensively. And then there’s everyday internet consumption: currently, we watch over 4 billion hours of video on YouTube and share 30 billion pieces of content on Facebook each month. Remember, only a portion of the world currently has reliable internet access; imagine how heavy our internet use will be once more of the world comes online.
Veracity: Uncertainty of Data
Data scientists will also have their work cut out for them keeping big data organized. As data currently stands, it’s hard to know which information is accurate and which is out of date. This is why one in three business leaders does not trust the information used to make decisions. What’s more, poor data quality costs the U.S. economy around $3.1 trillion each year, giving data scientists a huge incentive to establish systems that maintain the veracity of data.
If organized and used correctly, big data can help us spot business trends, prevent diseases, and combat crime, among other things. As humans continue to create more data in their daily lives, the work of data scientists will become that much more important and useful.