These lines are really critical in your driver code:
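import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;

Configuration conf = new Configuration();
// compress the intermediate map output with gzip
conf.setBoolean("mapred.compress.map.output", true);
conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);
// compress the final job (reduce) output with gzip
conf.setBoolean("mapred.output.compress", true);
conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);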
The idea is simple: compress your outputs so that there is less I/O. This matters most when we are dealing with a large amount of intermediate data, since the map output has to be written to disk and shuffled to the reducers.
So, do remember these lines!
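If you want a rough feel for how much gzip can save, here is a minimal sketch using only the JDK's java.util.zip, not Hadoop's GzipCodec. The class name GzipSizeDemo and the sampleRecords helper are made up for illustration, and the "word<TAB>1" records are only an assumption about what a word-count style mapper might emit. Real savings depend entirely on your data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipSizeDemo {
    // Gzip-compress a byte array in memory and return the compressed bytes.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Build hypothetical "key<TAB>value" records, similar in shape to
    // what a word-count mapper might write as intermediate output.
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("word").append(i % 50).append('\t').append(1).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRecords(100_000);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + packed.length);
    }
}
```

Repetitive text like this shrinks a lot, and every byte saved is a byte that does not have to hit the disk or cross the network. The trade-off is the extra CPU time spent compressing and decompressing.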