
1 Tutorial

Programming Pig, by Alan F. Gates

http://ofps.oreilly.com/titles/9781449302641/index.html

2 Development

Problem : ERROR 1000: Error during parsing with parameter

http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur

The script is run with a parameter on the command line: pig -file xxx.pig -param timep=201110271717

The script uses the parameter in the comparison startTime >= (long)$timep, and parsing fails with:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. For input string: "201110271717"

Fix:

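Quote the parameter in the script line, startTime >= (long)'$timep', so that Pig does not parse $timep directly as an integer. After substitution the value is a chararray constant, which the (long) cast then converts to a long.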
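Why the unquoted form fails: in Pig, an integer constant without an L suffix is an int, and Java's int is a signed 32-bit type. The short Python check below compares the parameter value against that range; the helper name fits_java_int is hypothetical and exists only for illustration.

```python
# Java's int is a signed 32-bit type. Pig treats an integer constant with no
# "L" suffix as an int, so the literal has to fall inside this range.
JAVA_INT_MIN = -(2 ** 31)
JAVA_INT_MAX = 2 ** 31 - 1


def fits_java_int(literal):
    """Return True if the decimal string `literal` fits in a Java int (hypothetical helper)."""
    value = int(literal)
    return JAVA_INT_MIN <= value <= JAVA_INT_MAX


if __name__ == "__main__":
    timep = "201110271717"  # the value passed with -param timep=...
    print(JAVA_INT_MAX)          # 2147483647
    print(fits_java_int(timep))  # False: too large for an int
```

The value is about 201 billion, far above the int maximum of 2,147,483,647, so parsing it as an int fails. A Java long (maximum 9,223,372,036,854,775,807) holds it easily, which is why casting the quoted value to long works.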


Problem : Unexpected internal error. Failed to create DataStorage

The Hadoop version bundled with Pig differs from the Hadoop version running on the cluster.

Fix:

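Set HADOOP_HOME in Pig's execution environment and put the cluster's Hadoop jars on the classpath together with the Pig jar built without Hadoop, e.g.: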

export HADOOP_HOME=/home/ubuntu/hadoop-0.20.205.0
java -cp pig-0.11.0-SNAPSHOT-withouthadoop.jar:$HADOOP_HOME/hadoop-core-0.20.205.0.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf:...

Problem : UDF tasks fail in cluster mode

java.io.IOException: Deserialization error: could not instantiate 'org.apache.pig.scripting....

Fix:

Register UDF scripts using a relative path rather than an absolute one, e.g.:

Register 'test.py' using jython as myfuncs;
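For reference, a registered script like test.py could look like the minimal sketch below. The function scale and its schema are hypothetical. Pig's Jython engine provides the outputSchema decorator when the script is registered; the try/except fallback is an assumption added only so the same file can be imported by a plain Python interpreter for local testing.

```python
# test.py: hypothetical Jython UDF script, registered in Pig with
#   Register 'test.py' using jython as myfuncs;
# and called, for example, as myfuncs.scale(x, 2.0) inside a FOREACH.

try:
    outputSchema  # defined by Pig when the script is loaded via jython
except NameError:
    # Assumed local stand-in so the file also runs outside Pig:
    # it only records the declared schema on the function.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("val:double")
def scale(x, factor):
    """Multiply x by factor; Pig passes nulls as None, so propagate them."""
    if x is None or factor is None:
        return None
    return float(x) * factor
```

Because the fallback runs only when outputSchema is undefined, the same file works under Pig and in a local unit test.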

Problem : Explicitly setting the schema with TOBAG(…) as bag{tuple(val:double)} yields {(NULL)}

Fix:

Wrap the arguments in TOTUPLE before passing them to TOBAG: TOBAG(TOTUPLE(...)) as bag{tuple(val:double)}
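If a UDF is acceptable, another route is to build the bag in Python and declare its schema on the function. This is a hedged sketch, not the fix recorded above: to_double_bag is a hypothetical name, and the sketch assumes Pig's Jython type mapping, in which a returned list of tuples becomes a bag. As with any registered script, the outputSchema fallback is only an assumed stand-in for running outside Pig.

```python
# Hypothetical Jython UDF that returns a bag of single-field tuples with an
# explicit schema, as an alternative to casting the result of TOBAG.

try:
    outputSchema  # defined by Pig when the script is loaded via jython
except NameError:
    # Assumed local stand-in so the file also runs outside Pig.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("vals:bag{t:tuple(val:double)}")
def to_double_bag(*values):
    """Return one single-field tuple per argument, converted to float (None stays None)."""
    return [(None if v is None else float(v),) for v in values]
```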

Problem : Pig UDF jar fails on Amazon EMR

ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc

Fix:

Add the Pig jar to the classpath by copying the pig-*-amzn.jar file to /home/hadoop/lib.