What are examples of big data?

Big data


1. Definition

Big data refers to amounts of data that are

  • too big,
  • too complex,
  • changing too quickly or
  • too weakly structured

to be evaluated with manual and classic methods of data processing. The more traditional German term is Massendaten ("bulk data"). Big data is also often used as an umbrella term for digital technologies that are held responsible, in technical terms, for a new era of digital communication and processing and, in social terms, for social upheaval. Big data basically stands for large amounts of digital data, but also for their analysis and evaluation.

2. The 5 V’s in Big Data

2.1 Dimensions

In explanations of big data, “big” refers to three dimensions:

  • Volume: the scope or size of the data
  • Velocity: The speed with which the amount of data is generated and transferred
  • Variety: Range of data types and sources

This definition is often extended by two further Vs:

  • Value: the potential of big data for socio-economic development
  • Veracity: data quality problems such as inconsistent and incomplete data, latency and ambiguity

These extensions stand for entrepreneurial added value and for ensuring data quality.

As a buzzword, the term “big data” is subject to continuous change. In addition to the data itself, it is often used to describe the complex of technologies that are used to collect and evaluate such amounts of data.

2.2 Origin of data

The collected data can come from various sources:

  • ranging from all kinds of electronic communication,
  • through data collected by authorities and companies,
  • to the records of various surveillance systems.

3. Areas and users of big data

Big data can also include areas that were previously considered private. The desire of industry and certain authorities to have free access to this data, to analyze it better and to use the knowledge gained inevitably comes into conflict with the protected personal rights of the individual. One way out is to anonymize the data before it is exploited, if not already before it is evaluated.

The classic users of big data methods are the providers of social networks and search engines. Today, the analysis, collection and processing of large amounts of data is commonplace in many areas. Such data is generally used to pursue corporate goals or for national security. So far, large sectors, companies and fields of application in business, market research, sales and service management, medicine, public administration and intelligence services have used digital methods of data collection for their own purposes. The recorded data is meant to be developed further and put to beneficial use. Data collection mostly serves group-oriented business models, as well as trend research in social media and advertising analysis, in order to identify forward-looking, profitable developments and to invest on the basis of these forecasts.
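Anonymizing data before it is evaluated, as suggested above, can be sketched in Python. This is a minimal illustration under stated assumptions: the record fields (`user_id`, `name`, `age`, `city`, `purchase`), the helper functions and the secret key are all hypothetical. Note that replacing an identifier with a keyed hash (HMAC-SHA256) is strictly speaking pseudonymization rather than full anonymization, and the remaining fields can still allow re-identification when combined.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept separate from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize_id(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize_record(record: dict) -> dict:
    """Drop the name, pseudonymize the ID and coarsen the age before analysis."""
    return {
        "user": pseudonymize_id(record["user_id"]),
        "age_band": generalize_age(record["age"]),
        "city": record["city"],  # kept, but may itself act as a quasi-identifier
        "purchase": record["purchase"],
    }


# Hypothetical raw records as they might arrive from an application.
raw = [
    {"user_id": "alice@example.com", "name": "Alice", "age": 34, "city": "Kiel", "purchase": 19.99},
    {"user_id": "bob@example.com", "name": "Bob", "age": 57, "city": "Lübeck", "purchase": 5.00},
]
prepared = [anonymize_record(r) for r in raw]
for row in prepared:
    print(row)
```

The same user always maps to the same pseudonym, so purchases can still be grouped per person without the analysts seeing who that person is.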

4. Background

Bulk data typically grows exponentially. According to calculations from 2011, the volume of data generated worldwide doubles every two years. This growth is driven mainly by the increasing machine generation of data, e.g. through logs of telecommunication connections (Call Detail Record, CDR) and web access (log files), and automatic capture by RFID readers, cameras, microphones and other sensors. Big data also occurs in the financial industry (financial transactions, stock market data), in the energy sector (consumption data) and in healthcare (prescriptions). Large amounts of data are also generated in science, e.g. in geology, genetics, climate research and nuclear physics. The IT industry association Bitkom described big data as a trend in 2012. For very large data sets, keeping everything in storage is uneconomical and therefore not possible. In that case only metadata is saved, or the evaluation starts at the same time as, or shortly after, the data is created.
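Evaluating data as it is created, while keeping only a compact summary instead of the raw values, can be illustrated with a running summary. The sketch below assumes a numeric sensor stream; the readings and the `sensor_stream` generator are hypothetical, and it uses Welford's online algorithm for the mean and variance, which is one common technique for this and not something the text prescribes.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Summary statistics updated one value at a time (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)  # uses the already updated mean
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def sensor_stream():
    """Hypothetical stand-in for readings arriving from a sensor."""
    for reading in [21.5, 22.0, 21.8, 23.1, 22.4]:
        yield reading


summary = RunningSummary()
for value in sensor_stream():
    summary.update(value)  # the raw reading can be discarded after this call
print(summary.count, round(summary.mean, 2), summary.minimum, summary.maximum)
```

Because the summary occupies constant memory no matter how many readings pass through, the raw values never need to be stored.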

5. More meanings of big data

Big data primarily describes the processing of large, complex and rapidly changing amounts of data. As a buzzword, the term is also used in the mass media for

  • the increasing surveillance of people by secret services, including in Western states, for example through data retention
  • the violation of customers' personal rights by companies
  • the increasing lack of transparency in data storage due to delocalization (cloud computing)
  • industry's wish to gain a competitive advantage from the available data
  • the automation of production processes (Industry 4.0, Internet of Things)
  • the non-transparent automation of decision-making processes in software
  • the use of new technologies instead of standard software (especially in companies with conservative IT, often by using software as a service to circumvent internal IT restrictions)
  • the development of in-house software solutions ("in-house IT") instead of using "off-the-shelf" software from external companies
  • advertising based on data about internet and mobile phone usage
  • the organization of cooperation in the context of people analytics projects

even if this involves neither large nor complex amounts of data.

6. Big data examples

In research, by linking large amounts of data and statistical evaluations, new knowledge can be gained, especially in disciplines in which a lot of data was previously evaluated by hand. Companies hope that the analysis of Big Data will provide opportunities to gain competitive advantages, to generate savings potential and to create new areas of business. Government agencies hope for better results in criminology and the fight against terrorism. Examples of expected benefits include:

  • Real-time evaluation of web statistics and adaptation of online advertising measures (although this has been done for years)
  • Better, faster market research
  • Detection of irregularities in financial transactions (Fraud detection)
  • Introduction and optimization of an intelligent energy consumption control (Smart metering)
  • Recognizing connections in medical diagnostics
  • Real-time cross-selling and upselling in e-commerce and brick-and-mortar retail
  • Development of flexible billing systems in telecommunications
  • Creation of movement profiles by secret services with programs such as Boundless Informant
  • Data access and analysis of spatiotemporal raster data in science and industry, for example according to the Open Geospatial Consortium standard Web Coverage Service
  • Predicting epidemics
  • Improvements in working conditions for employees, such as reducing burnout rates, through data-based change projects
  • Finding specialists through data-supported web analyses
  • Processing of data from weather satellites and other scientifically used sensors

However, analyzing customer data is not automatically big data: many marketing applications are really about "small data" analytics.

7. Big data processing

Classic relational database systems, as well as statistical and visualization programs, are often unable to process such large amounts of data. For big data, new types of data storage and analysis systems are used that work in parallel on up to hundreds or thousands of processors or servers. The challenges include:

  • Processing of many records
  • Processing of many columns within a data record
  • Fast import of large amounts of data
  • Immediate querying of imported data (real-time processing)
  • Short response times (latency and processing time) even for complex queries
  • Ability to process many simultaneous queries (Concurrent queries)
  • Analysis of various types of information (numbers, texts, images, ...)

The development of software for processing big data is still at an early stage. The MapReduce approach is well known and is used in open source software (Apache Hadoop and MongoDB) as well as in some commercial products (Aster Data, Greenplum, etc.).
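The MapReduce idea can be sketched as a word count in plain Python. This is a single-process simulation of the three phases (map, shuffle, reduce), not the API of Apache Hadoop or of any other product named above; the function names are hypothetical, and the map calls that a cluster would distribute across many servers simply run one after another here.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    # On a cluster, each map call would run on the node holding its split;
    # in this sketch they run sequentially in one process.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data big value", "data volume velocity variety", "big velocity"]
result = map_reduce(splits)
print(result)
```

The design choice behind MapReduce is that map and reduce are independent per split and per key, so a framework can run them in parallel on many machines without the programmer coordinating the servers.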

8. Criticism of Big Data

8.1 Vague term and hype

The term “big data” is used for any kind of data processing, even when the data is neither large, complex nor rapidly changing and can easily be processed with conventional techniques. This increasing dilution of the term means that it is turning into a meaningless marketing term and, according to many forecasts, will lose much of its value within the next few years (the “trough of disillusionment” in the hype cycle).

8.2 Missing standards

"Big data" is criticized mainly because data collection and evaluation are often driven by technical considerations: for example, the technically easiest way of collecting the data is chosen, and the evaluation is limited to what can readily be processed. Basic statistical principles, such as that of a representative sample, are often neglected. The social researcher Danah Boyd made the following criticisms:

  • Larger amounts of data are not necessarily better data
  • Not all data is equally valuable
  • “What” and “why” are two different questions
  • Care should be taken with interpretations
  • Just because it's available doesn't mean it's ethical

For example, one researcher found that people have no more than about 150 friendships (Dunbar's number); this was then introduced as a technical limit in social networks, on the false assumption that acquaintances labeled "friends" reflect real friendships. Certainly not everyone would name all of their Facebook friends as friends in an interview: the term “friend” on Facebook merely signals a willingness to communicate.

8.3 Lack of substance in the evaluations

Another critical approach deals with the question of whether big data means the end of all theory. In 2008, Chris Anderson, then editor-in-chief of the magazine Wired, described the credibility problem facing every scientific hypothesis and every model once living and non-living systems can be analyzed simultaneously in real time. Correlations are becoming more important than causal explanations, which can often only be verified or falsified later.

8.4 Lack of regulation

The Schleswig-Holstein data protection officer Thilo Weichert warns: "Big data opens up possibilities of informational abuse of power through manipulation, discrimination and informational economic exploitation - combined with the violation of basic human rights."

