Great post on databases and map/reduce
Anand Rajaraman has a great post on Datawocky with an overview of the various approaches to data analysis using Map/Reduce, and the ways in which this paradigm is bridged with RDBMSes by AsterData and...
Building an Inverted Index with Hadoop and Pig
Note: For some reason, this post appears to be pretty popular. Here's the thing. This was the first thing I wrote when learning Pig. Literally -- I wrote it down the evening I sat down to play with...
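As a rough illustration of the technique named in the title, here is a minimal in-memory Python sketch of how an inverted index splits into a map phase (emit `(word, doc_id)` pairs) and a reduce phase (collect the document ids for each word). It is not the Pig script from the post; the function names and sample documents are hypothetical, and tokenization is simply lowercasing plus whitespace splitting.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (word, doc_id) pairs, as a Hadoop mapper would (hypothetical helper)."""
    for word in text.lower().split():
        yield word, doc_id

def reduce_phase(pairs):
    """Group the pairs by word and collect each word's sorted, de-duplicated doc ids."""
    postings = defaultdict(set)
    for word, doc_id in pairs:
        postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}

def build_inverted_index(docs):
    """Run the map phase over every document, then reduce all emitted pairs."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return reduce_phase(pairs)

docs = {"d1": "pig runs on hadoop", "d2": "hadoop runs map reduce"}
index = build_inverted_index(docs)
print(index["hadoop"])  # ['d1', 'd2']
print(index["pig"])     # ['d1']
```

On a real cluster the grouping between the two phases is done by Hadoop's shuffle rather than by an in-memory dictionary.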
Free Hadoop Training
Cloudera has made training videos, screencasts, and exercises from their basic Hadoop training available on the web. Check it out here: http://www.cloudera.com/hadoop-training-basic They even include...
Presentation on Apache Pig at Pittsburgh Hadoop User Group
Ashutosh and I presented at the Pittsburgh Hadoop User Group on Apache Pig. The slide deck goes through a brief intro to Pig Latin, then jumps into an explanation of the different join algorithms, and...
GROUP operator in Apache Pig
I’ve been doing a fair amount of helping people get started with Apache Pig. One common stumbling block is the GROUP operator. Although familiar, as it serves a similar function to SQL’s GROUP...
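A common source of the confusion is the shape of GROUP's output: in Pig, `GROUP visits BY user` yields one tuple per distinct key, holding the key (in a field named `group`) and a bag of every input tuple with that key, and aggregation is a separate step such as `FOREACH grouped GENERATE group, COUNT(visits);`. The minimal Python sketch below models only that output shape in memory; the `visits` relation and the `pig_group` helper are hypothetical, and rows are plain dicts.

```python
def pig_group(relation, key_field):
    """Model Pig's GROUP ... BY: one (group, bag) tuple per distinct key.

    Unlike SQL's GROUP BY, nothing is aggregated here; each bag keeps
    every input row that shares the key. Dicts preserve insertion order,
    so groups appear in the order their keys are first seen.
    """
    groups = {}
    for row in relation:
        groups.setdefault(row[key_field], []).append(row)
    return list(groups.items())

visits = [
    {"user": "alice", "url": "a.com"},
    {"user": "bob", "url": "b.com"},
    {"user": "alice", "url": "c.com"},
]

grouped = pig_group(visits, "user")
# Aggregation is a separate step, like FOREACH ... GENERATE group, COUNT(...)
counts = [(key, len(bag)) for key, bag in grouped]
print(counts)  # [('alice', 2), ('bob', 1)]
```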
Pig, HBase, Hadoop, and Twitter: HUG talk slides
I presented tonight at the Bay Area Hadoop User Group, talking briefly about Twitter’s use of Hadoop and Pig. The slides are on Scribd.
Upcoming Features in Pig 0.8: Dynamic Invokers
Pig release 0.8 is scheduled to be feature-frozen and branched at the end of August 2010. This release has many, many useful new features, mostly addressing usability. In this series of posts, I will...