
HadoopPigUdfScheme

Source: 动视网  Editor: 小采  Date: 2020-11-09 13:22:56

Hadoop Pig UDF schema

When a UDF returns a tuple containing more than one field, you must specify a schema for it. Without a schema, Pig treats the multiple fields as a single field.

 register myudfs.jar;
 A = load 'student_data' as (name: chararray, age: int, gpa: float);
 B = foreach A generate flatten(myudfs.Swap(name, age)), gpa;
 C = foreach B generate $2;
 D = limit B 20;
 dump D;

This script results in the following error, caused by line 4 (C = foreach B generate $2;):

java.io.IOException: Out of bound access. Trying to access non-existent column: 2. Schema {bytearray,gpa: float} has 2 column(s).

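This is because Pig is only aware of two columns in B, while line 4 requests the third column of the tuple. (Column indexing in Pig starts at 0.) To make the extra columns visible, the UDF has to declare its output schema by overriding outputSchema.

The UDF below declares such a schema. It takes four input parameters (platform, mac, imei, openudid) and returns two. On Android the ukey is generated from mac and imei; on iOS it is generated from mac and openudid.

The returned tuple is (platform, ukey).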

 package kload;

 import java.io.IOException;

 import org.apache.pig.EvalFunc;
 import org.apache.pig.data.DataType;
 import org.apache.pig.data.Tuple;
 import org.apache.pig.data.TupleFactory;
 import org.apache.pig.impl.logicalLayer.schema.Schema;

 /**
  * Translates mac, imei and openudid into a (platform, ukey) tuple.
  */
 public class KoudaiFormateUkey extends EvalFunc<Tuple> {
     private String ukey = null;
     private String platform = null;

     @Override
     public Tuple exec(Tuple input) throws IOException {
         if (input == null || input.size() == 0)
             return null;
         try {
             // Reset per-row state so values from a previous row cannot leak
             // into this row when getUkey() returns early.
             this.platform = null;
             this.ukey = null;
             String rawPlatform = (String) input.get(0);
             String mac = (String) input.get(1);
             String imei = (String) input.get(2);
             String openudID = (String) input.get(3);
             this.getUkey(rawPlatform, mac, imei, openudID);
             if (this.platform == null || this.ukey == null) {
                 return null;
             }
             Tuple output = TupleFactory.getInstance().newTuple(2);
             output.set(0, this.platform);
             output.set(1, this.ukey);
             return output;
         } catch (Exception e) {
             throw new IOException("Caught exception processing input row ", e);
         }
     }

     // Sets this.platform and this.ukey; returns the ukey, or null if none can be built.
     private String getUkey(String platform, String mac, String imei, String openudID) {
         String tmpStr = null;
         String ukey = null;
         int pType = -1;
         if (platform == null) {
             return null;
         }
         tmpStr = platform.toUpperCase();
         if (tmpStr.indexOf("IPHONE") != -1) {
             this.platform = "iphone";
             pType = 1001;
         } else if (tmpStr.indexOf("ANDROID") != -1) {
             this.platform = "android";
             pType = 1002;
         } else if (tmpStr.indexOf("IPAD") != -1) {
             this.platform = "ipad";
             pType = 1003;
         } else {
             this.platform = "unknow";
             pType = 1004;
         }
         switch (pType) {
             case 1001:
             case 1003:
                 // iOS devices: mac + openudid
                 if (mac == null && openudID == null) {
                     return null;
                 }
                 ukey = String.format("%s_%s", mac, openudID);
                 break;
             case 1002:
                 // Android devices: mac + imei
                 if (mac == null && imei == null) {
                     return null;
                 }
                 ukey = String.format("%s_%s", mac, imei);
                 break;
             case 1004:
                 // Unknown platform: use every identifier available
                 if (mac == null && imei == null && openudID == null) {
                     return null;
                 }
                 ukey = String.format("%s_%s_%s", mac, imei, openudID);
                 break;
             default:
                 break;
         }
         if (ukey == null || ukey.length() == 0) {
             return null;
         }
         this.ukey = ukey.toUpperCase();
         return this.ukey;
     }

     @Override
     public Schema outputSchema(Schema input) {
         try {
             // Describe the two fields exec() actually returns, rather than
             // copying the first two input fields (platform, mac).
             Schema tupleSchema = new Schema();
             tupleSchema.add(new Schema.FieldSchema("platform", DataType.CHARARRAY));
             tupleSchema.add(new Schema.FieldSchema("ukey", DataType.CHARARRAY));
             return new Schema(new Schema.FieldSchema(
                     getSchemaName(this.getClass().getName().toLowerCase(), input),
                     tupleSchema, DataType.TUPLE));
         } catch (Exception e) {
             return null;
         }
     }
 }
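
The key derivation rule in getUkey can be checked locally without a Pig installation. The class below is a minimal, Pig-free sketch of that rule, not part of the UDF itself: the names UkeySketch, normalizePlatform and deriveUkey are hypothetical, and it uses Locale.ROOT for upper-casing (an assumption; the UDF above uses the default locale).

```java
import java.util.Locale;

/** Pig-free sketch of the ukey derivation, for quick local checks (hypothetical helper). */
class UkeySketch {

    /** Maps a raw platform string to the normalized name used by the UDF. */
    static String normalizePlatform(String platform) {
        if (platform == null) {
            return null;
        }
        String upper = platform.toUpperCase(Locale.ROOT);
        if (upper.contains("IPHONE")) return "iphone";
        if (upper.contains("ANDROID")) return "android";
        if (upper.contains("IPAD")) return "ipad";
        return "unknow"; // spelling kept from the UDF above
    }

    /** Builds the upper-cased ukey, or returns null when no identifier is usable. */
    static String deriveUkey(String platform, String mac, String imei, String openudid) {
        String normalized = normalizePlatform(platform);
        if (normalized == null) {
            return null;
        }
        String key;
        switch (normalized) {
            case "iphone":
            case "ipad":
                // iOS devices: mac + openudid
                if (mac == null && openudid == null) return null;
                key = String.format("%s_%s", mac, openudid);
                break;
            case "android":
                // Android devices: mac + imei
                if (mac == null && imei == null) return null;
                key = String.format("%s_%s", mac, imei);
                break;
            default:
                // Unknown platform: use every identifier available
                if (mac == null && imei == null && openudid == null) return null;
                key = String.format("%s_%s_%s", mac, imei, openudid);
                break;
        }
        return key.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(deriveUkey("Android 4.1", "aa:bb", "860001", null));
        System.out.println(deriveUkey("iPhone OS", "aa:bb", null, "udid42"));
    }
}
```

Note that, as in the UDF, a key is still produced when only one of the required identifiers is present; the missing one is rendered as the literal text NULL in the key.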
