
Open source training:

Apache Hadoop

Big data processing with Apache Hadoop

The analysis of extensive company data yields insights into relationships that often remain hidden. The sheer diversity of the recorded data is a frequent source of problems; on the other hand, this diversity is a special opportunity, provided the flood of data is managed efficiently.

Tools and methods for systematic data analysis (data mining) have existed for a long time. But with unstructured content such as texts in blogs or on websites, or documents in a distributed CMS, you quickly run into their limits. Database servers are the optimal solution in many scenarios, but they too reach clear limits as soon as scalability and reliable processing of unstructured data are required. The strengths of the Apache Hadoop cluster system lie particularly in its scalability on inexpensive standard hardware and its flexible options for integration into many existing IT systems.

Our goal is to make it easier for you to get started with big data processing. You can install the tools yourself or download a preconfigured distribution from the Internet, e.g. from Cloudera Inc.

But what comes after that? That is exactly what we introduce in our hands-on seminar. We discuss specific application examples and show you which methods process them efficiently. Using practical examples, we work out which tools are useful in the Hadoop environment, which types of tasks they suit, and how you can efficiently transfer existing data into the system. Afterwards you will be able to decide which of your tasks can be solved with the MapReduce approach, and you can tackle an interesting new topic yourself: extracting new information from your existing data!
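To make the MapReduce approach concrete: a job consists of a map phase that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group. The following plain-Python sketch imitates that dataflow for the classic word-count job; it uses no Hadoop API at all and is purely illustrative, with made-up sample documents.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big tools", "data tools scale"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'scale': 1}
```

On a real cluster, the map and reduce functions run in parallel on many machines, and the shuffle moves data over the network; the logic per phase, however, stays this simple.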


2 days, € 945.00 + 19% VAT = € 1,124.55

A full 8 hours per day, complete basic equipment of original literature, free internet access everywhere, rental notebook, full board, drinks (special types of wine are billed separately), pastries, home-baked cakes, sauna, supporting program.

Additional or reduced services on request:

Surcharge for overnight stay in a twin room (large, comfortable room): € 59.00 + 7% VAT = € 63.13 per night
Surcharge for overnight stay in the Linuxhotel flat share: € 83.00 + 7% VAT = € 88.81 per night
Surcharge for single room (subject to availability, please book in good time): € 129.00 + 7% VAT = € 138.03 per night
Discount if you do not take full board: -€ 29.41 + 19% VAT = -€ 35.00 per day
Price reduction if you do not take part in the supporting program: -€ 8.40 + 19% VAT = -€ 10.00 per evening

Tax deductibility * Cancellation conditions


Let us know your preferred date


Jörn Kuhlenkamp is a research associate at TU Berlin and specializes in system management and the development of distributed, scalable systems, especially database systems, in cloud environments. Through his scientific work at the Karlsruhe Institute of Technology (KIT), TU Berlin, and international industrial research centers such as the IBM T.J. Watson Research Center, he has gained excellent theoretical and practical knowledge of running systems in the Apache Hadoop environment.

For several years, Jörn Kuhlenkamp has published international research on scalable distributed systems in cloud environments, and he gives lectures worldwide that advance the state of the art in this area.

Participation requirements

Basic knowledge in:

If you are unsure about this, we will be happy to advise you by e-mail or by phone * (you can reach Mr. Martin Gerwinski or Ms. Laura Trinowitz on weekdays from 9 a.m. to 5 p.m. on +49 201 8536-600).

Course content


  • Areas of application for Apache Hadoop
  • Design goals and further developments
  • The Apache Hadoop Ecosystem

Basic calculation models and basic services

  • Single iteration jobs: Apache Hadoop MapReduce
  • Multiple iteration jobs: Apache Spark
  • Coordination in distributed systems: Apache Zookeeper
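The difference between the two job types can be illustrated with PageRank, a classic multiple-iteration computation. The sketch below is plain Python, not the Spark or MapReduce API; the graph and damping factor are invented illustrative values. Each pass of the loop corresponds to one job iteration: with MapReduce, every pass would write its result to HDFS and reread the graph from disk, whereas Spark keeps the working data cached in memory across iterations, which is why it suits such jobs far better.

```python
# Tiny 3-page link graph: each page lists the pages it links to
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 for page in links}
damping = 0.85

for _ in range(50):  # 50 passes = 50 job iterations
    # Each page distributes its current rank evenly over its outlinks
    contribs = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            contribs[target] += share
    # Standard PageRank update with damping
    ranks = {p: (1 - damping) + damping * c for p, c in contribs.items()}

print({p: round(r, 3) for p, r in ranks.items()})
```

The total rank mass stays constant across iterations; only its distribution over the pages converges to a fixed point.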

Storage systems

Storage systems manage the data on which computations run, provide access to it, and persist results. Get to know relevant, highly available and scalable storage systems and their differing properties, which provide basic services for executing jobs.
  • Hadoop Distributed File System (HDFS)
  • Apache Cassandra
  • Apache HBase

Job specification

To specify jobs quickly and without errors, you can use a large number of frameworks that operate at higher levels of abstraction than MapReduce or provide implementations for specific application domains. Practical examples show which framework suits which kind of problem.
  • SQL: Apache Hive
  • Data flows: Apache Pig
  • Calculations on graphs: Apache Giraph
  • Machine learning: Apache Mahout
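To illustrate what such an abstraction layer does: a declarative query like `SELECT dept, AVG(salary) FROM staff GROUP BY dept`, which Hive would compile into one or more MapReduce jobs, boils down to the map/shuffle/reduce steps sketched below in plain Python. The table name, columns, and figures are invented for the example; no Hive API is used.

```python
from collections import defaultdict

# Invented sample data standing in for a table staff(dept, salary)
staff = [("sales", 40000), ("sales", 50000), ("it", 60000)]

# Map: emit (group key, value) pairs — here (dept, salary)
mapped = ((dept, salary) for dept, salary in staff)

# Shuffle: group all values by key
groups = defaultdict(list)
for dept, salary in mapped:
    groups[dept].append(salary)

# Reduce: apply the aggregate function (AVG) to each group
averages = {dept: sum(vals) / len(vals) for dept, vals in groups.items()}
print(averages)  # {'sales': 45000.0, 'it': 60000.0}
```

The point of frameworks like Hive or Pig is precisely that you write the one-line query or data flow and the framework generates and schedules these phases for you.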

Resource Negotiation

To run several jobs reliably and in parallel on one cluster, their execution must be coordinated and the resources allocated to each job must be managed.
  • Apache Hadoop YARN
  • Apache Mesos


Deployment

Hadoop clusters can be deployed and operated in different environments or used as a hosted service.
  • Hosted Service: AWS Elastic MapReduce, Google Cloud Dataproc
  • IaaS Deployment: AWS EC2

Cluster management and tools

Get to know tools and techniques to deploy, operate and optimize a Hadoop cluster.
  • Monitoring
  • Performance tuning
  • High availability
  • Security


  • If you like, you can arrive by 10 p.m. the day before and use the evening to talk shop by the fireplace or in the park.
  • On the course days, from 9 a.m. to 6 p.m. (with two coffee breaks and one lunch break): around 60% training and 40% exercises. Naturally, every participant works along with the trainer, often hands-on at the notebook we provide.
  • Afterwards, dinner and opportunities for shop talk, excursions, and much more. We create an atmosphere in which experts can exchange ideas freely. If you prefer not to, you will not be pressed into anything and will be left in peace at all times.