As announced elsewhere a few days ago, the next HUGUK (Hadoop User Group UK) meetup will be held on the 3rd of June from 18:30 at the new Skills Matter eXchange. It will feature two main talks:

“Introduction to Sqoop” by Aaron Kimball

-- Synopsis --

This talk introduces Sqoop, the open source SQL-to-Hadoop tool. Sqoop helps users perform efficient imports of data from RDBMS sources to Hadoop's distributed file system, where it can be processed in concert with other data sources. Sqoop also allows users to export Hadoop-generated results back to an RDBMS for use with other data pipelines.

After this session, attendees will understand how databases and Hadoop fit together, and how to use Sqoop to move data between these systems. The talk will suggest best practices for integrating Sqoop and Hadoop into your data processing pipelines. We'll also cover some deeper technical details of Sqoop's architecture, and take a look at upcoming items on Sqoop's development roadmap.

-- Bio --

Aaron Kimball has been working with Hadoop since early 2007. Aaron has worked with the NSF and several universities, both in the US and internationally, to advance education in the field of large-scale data-intensive computing. He helped create and deliver academic course materials first used at the University of Washington (and later adopted by many other academic institutions), as well as Hadoop training materials used by several industry partners. Aaron has also worked as an independent consultant focusing on Hadoop and Amazon EC2-based systems. At Cloudera, he continues to actively develop Hadoop and related tools, and also focuses on training and user education. Aaron holds a B.S. in Computer Science from Cornell University and an M.S. in Computer Science and Engineering from the University of Washington.

“Hive at Last.fm” by Tim Sell

-- Synopsis --

This talk is about using Hive in practice. We will go through some of the specific use cases for which Hive is currently being used at Last.fm, highlighting its strengths and weaknesses along the way.

-- Bio --

Tim Sell is a Data Engineer at Last.fm who works with Hive and Hadoop on a daily basis.

As usual, we'll try to provide some free beer at the end, and anyone is welcome to give a short lightning talk after the main presentations.

The registration page should be up early next week, so please watch this space and register once the link gets published here.

AlexMc said... 3 June 2010 at 22:50

This was excellent. I'm really glad I came!
