23 March, 2017
JADS organizes successful Hadoop Workshop for students
Text by Kai-Tao Yang.
On March 14th, 2017, the DataLab of JADS invited five pre-master students to a full-day workshop on Hadoop. Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. In this workshop, data engineers from the DataLab of JADS first taught the students, step by step, how to set up a Hadoop cluster using laptops, routers, and Ethernet cables. The students then built a Hadoop cluster from two laptops and added a third laptop to the cluster without affecting the existing machines. Finally, several test examples (e.g., grep, wordcount) were executed successfully on the cluster. The documents used in this workshop can be downloaded here.
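For readers unfamiliar with the wordcount example mentioned above, the following is a minimal single-machine sketch of the map, shuffle, and reduce steps that such a job goes through. It is not the code used in the workshop and does not call Hadoop at all; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on one machine (hypothetical helper)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in groups.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read files from the cluster.
    sample_lines = ["hadoop runs on a cluster", "a cluster runs hadoop jobs"]
    for word, total in sorted(word_count(sample_lines).items()):
        print(word, total)
```

On a real cluster the map and reduce steps run in parallel on different machines and the framework handles the grouping in between; the sketch only shows the logic of each step.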
These five students were invited to this workshop because they were the winners of a Jupyter presentation on February 7th. Following the success of this workshop, the DataLab of JADS is planning to organize a larger-scale Hadoop workshop that will be open to all JADS students (including pre-masters, masters, and PDEng trainees). Integrating Spark with Hadoop will be an additional part of that future workshop.
Feedback from the participating students
“I knew that the content of the workshop was going to be really technical; however, Kai Tao turned it into a playful way of learning and I really enjoyed it. Unfortunately, I was not able to do a lot, because my laptop could not install Ubuntu. However, I really recommend this workshop to everyone who is interested in Hadoop or Spark. Take into account that setting up Hadoop is difficult, so be open to a lot of technical stuff that will blow your mind. But hey, do it for the team :)”
By Simoes Anderson
“The workshop gave us some insight into the data engineering aspect of data science. Although it was highly technical, you managed to explain the concepts and principles in a concrete and concise manner. Therefore, in a way, it was different from what we are used to in the normal curriculum, technically speaking. However, that was what made it so useful. Seeing as Hadoop, and by extension Spark, are a must-have in a data scientist's profile, the workshop was definitely of added value. If I had to give one pointer, it would be that some of the content might be too difficult for those who don't have a technical (specifically computer science) background or at the very least some experience with Ubuntu. I'm looking forward to any workshops that follow!”
By Maarten Grootendorst
“I thought the workshop was good, but towards the end I started to feel a bit lost. Next time, maybe it is better to send the participants either a schedule or an overview just so they can feel a little more prepared as things move along. Other than that I thought it was good.”
By Vincent Anthony Lingle-Munos
“The Hadoop workshop was a very nice experience of how advanced data systems work at a low level. Personally, I really liked the technical depth of the workshop!”
By Bart Verhaegh