Re: No Reducer scenarios

Re: No Reducer scenarios

萝卜丝炒饭
Hi Nair,
Do you know which class it is? I tried to find it but failed. I know NewDirectOutputCollector is used to write tmp files.


---Original---
From: "☼ R Nair (रविशंकर नायर)"<[hidden email]>
Date: 2017/1/30 13:32:04
To: "dev"<[hidden email]>;"user"<[hidden email]>;"user"<[hidden email]>;
Subject: No Reducer scenarios

Dear all,


1) When we don't set the reducer class in the driver program, IdentityReducer is invoked.

2) When we call setNumReduceTasks(0), no reducer, not even IdentityReducer, is invoked.

Now, in the second scenario, we observed that the output files are named in part-m-xx format (instead of part-r-xx format), which indicates map output. But we know that the output of a map is always written to intermediate files on the local file system. So which class is responsible for taking these intermediate map outputs from the local file system and writing them to HDFS? Does this class perform the write only when setNumReduceTasks is set to zero?

Best, Ravion
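
The part-m-xx versus part-r-xx naming mentioned above can be illustrated with a small model. This is only a sketch and not Hadoop code: the class PartFileNames and the method partFileName are hypothetical names, and the five-digit zero padding is an assumption based on the file names typically seen with the newer mapreduce API (for example part-m-00000).

```java
import java.util.Locale;

// Illustrative model only: this is NOT Hadoop code. It mimics the naming
// pattern part-<m|r>-<partition>, where 'm' marks a file written by a map
// task and 'r' a file written by a reduce task.
final class PartFileNames {
    private PartFileNames() {}

    // Assumption: the partition number is zero-padded to five digits.
    static String partFileName(boolean writtenByMapTask, int partition) {
        char taskType = writtenByMapTask ? 'm' : 'r';
        return String.format(Locale.ROOT, "part-%c-%05d", taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(partFileName(true, 0));   // map-only job output
        System.out.println(partFileName(false, 3));  // reducer output
    }
}
```

So a part-m file in the job output directory suggests that the file was written by a map task itself, not by a reducer.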
RE: No Reducer scenarios

Praveen Mothkuri

In this case, the output of the map tasks goes directly to the distributed file system, to the path set by FileOutputFormat.setOutputPath(JobConf, Path). Also, the framework does not sort the map outputs before writing them out to HDFS.
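
The behaviour described above can be sketched as a choice made once per map task, based on the number of reduce tasks. The code below is a plain-Java model, not Hadoop source: MapOutputRouting, Collector, DirectCollector, BufferingCollector and chooseCollector are all hypothetical names. As far as I understand, the real decision is made inside Hadoop's MapTask and NewDirectOutputCollector is the class used in the zero-reducer case, but please treat that as an assumption and check it against the source of your Hadoop version.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only, not Hadoop source. All type names are hypothetical.
final class MapOutputRouting {
    private MapOutputRouting() {}

    interface Collector {
        void collect(String key, String value);
    }

    // Stands in for writing records straight to the job's final output path.
    static final class DirectCollector implements Collector {
        final List<String> finalOutput = new ArrayList<>();

        @Override
        public void collect(String key, String value) {
            // Records are kept in arrival order: no sort, no local spill.
            finalOutput.add(key + "\t" + value);
        }
    }

    // Stands in for intermediate map output held locally for the reducers.
    static final class BufferingCollector implements Collector {
        final List<String> localIntermediate = new ArrayList<>();

        @Override
        public void collect(String key, String value) {
            localIntermediate.add(key + "\t" + value);
        }
    }

    // With zero reduce tasks the map output is the final output.
    static Collector chooseCollector(int numReduceTasks) {
        return numReduceTasks == 0 ? new DirectCollector() : new BufferingCollector();
    }

    public static void main(String[] args) {
        Collector c = chooseCollector(0);
        c.collect("b", "2");
        c.collect("a", "1");
        System.out.println(((DirectCollector) c).finalOutput);
    }
}
```

In this model there is no separate step that copies map output from local disk to HDFS in the zero-reducer case: the records never take the local intermediate route at all.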

From: Praveen Mothkuri
Sent: Friday, February 03, 2017 5:14 PM
To: '萝卜丝炒饭'; R Nair (रविशंकर नायर); dev; user; user
Subject: RE: No Reducer scenarios

 


From: 萝卜丝炒饭 [[hidden email]]
Sent: Friday, February 03, 2017 12:05 PM
To: R Nair (रविशंकर नायर); dev; user; user
Subject: Re: No Reducer scenarios

 

