Problem : ERROR 1000: Error during parsing with parameter
The script is run with a parameter:
pig -file xxx.pig -param timep=201110271717
and a line in the Pig script uses that parameter:
startTime >= (long)$timep
Pig then fails with:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. For input string: "201110271717"
Fix:
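Quote the parameter in the Pig script line:
startTime >= (long)'$timep'
Without the quotes, Pig parses the substituted value 201110271717 as an integer literal, and it is larger than the maximum int (2147483647). With the quotes, the value is a string that is then cast to long.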
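The quoted form can be tried end to end with a small script. The sketch below is illustrative only: the file name filter_by_time.pig, the input path 'events', and the relation names are hypothetical, and only the FILTER expression and the -param value come from the case above. It also checks in the shell that the value is larger than the maximum 32-bit int, which is the likely reason the unquoted form fails.

```shell
#!/bin/sh
# Hypothetical script: file name, input path and relation names are made up
# for illustration; only the FILTER expression comes from the fix above.
# The quoted heredoc delimiter ('EOF') stops the shell from expanding $timep,
# so the placeholder is left for Pig's own parameter substitution.
cat > filter_by_time.pig <<'EOF'
events = LOAD 'events' AS (startTime:long);
recent = FILTER events BY startTime >= (long)'$timep';
DUMP recent;
EOF

timep=201110271717
int_max=2147483647   # largest signed 32-bit integer

# Record whether the value would fit in an int.
if [ "$timep" -gt "$int_max" ]; then
  fits_in_int=no
else
  fits_in_int=yes
fi
echo "timep=$timep fits in int: $fits_in_int"

# To run it (needs pig on the PATH, so it is left commented out here):
# pig -file filter_by_time.pig -param timep="$timep"
# Adding -dryrun to the command should write the substituted script
# instead of executing it, which helps when checking the quoting.
```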
Problem : Unexpected internal error. Failed to create DataStorage
Cause: the Hadoop version bundled with Pig differs from the version running on the Hadoop cluster.
Fix:
Set HADOOP_HOME in the Pig execution environment and run the Pig jar built without Hadoop against the cluster's own Hadoop jars. E.g.,
export HADOOP_HOME=/home/ubuntu/hadoop-0.20.205.0
java -cp pig-0.11.0-SNAPSHOT-withouthadoop.jar:$HADOOP_HOME/hadoop-core-0.20.205.0.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf:...
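Before launching Pig this way, it can help to confirm that a Hadoop core jar really exists under HADOOP_HOME. A minimal sketch, assuming the jar follows the hadoop-core-*.jar naming seen in the command above; the helper name find_hadoop_core_jar is hypothetical and not part of Pig or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print the first hadoop-core-*.jar found in a directory.
# Returns non-zero if the directory holds no such jar.
find_hadoop_core_jar() {
  for jar in "$1"/hadoop-core-*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Falls back to the example path from these notes if HADOOP_HOME is unset.
hadoop_home="${HADOOP_HOME:-/home/ubuntu/hadoop-0.20.205.0}"

if core_jar=$(find_hadoop_core_jar "$hadoop_home"); then
  echo "Using Hadoop core jar: $core_jar"
else
  echo "No hadoop-core-*.jar under $hadoop_home; check HADOOP_HOME" >&2
fi
```

Comparing the version in the jar name with the output of `hadoop version` on the cluster shows whether the two match.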
Problem : UDF tasks fail in cluster mode
java.io.IOException: Deserialization error: could not instantiate 'org.apache.pig.scripting....
Fix:
Use a relative path when registering UDF scripts:
Register 'test.py' using jython as myfuncs;
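One way to keep the path relative is to place the UDF file next to the Pig script and start Pig from that directory. A sketch under those assumptions: the directory name pig_udf_demo, the script name use_udf.pig, the input file input.txt, and the UDF helloworld are all hypothetical; only the Register line comes from the fix above.

```shell
#!/bin/sh
workdir=pig_udf_demo
mkdir -p "$workdir"

# A trivial Jython UDF. Pig makes the outputSchema decorator available
# to Jython UDF scripts, so no import is needed for it.
cat > "$workdir/test.py" <<'EOF'
@outputSchema("greeting:chararray")
def helloworld():
    return 'Hello, World'
EOF

# The Pig script registers the UDF by its bare file name (a relative path).
cat > "$workdir/use_udf.pig" <<'EOF'
Register 'test.py' using jython as myfuncs;
lines = LOAD 'input.txt' AS (line:chararray);
greeted = FOREACH lines GENERATE myfuncs.helloworld(), line;
DUMP greeted;
EOF

# Run from the directory that holds both files so 'test.py' resolves
# (needs pig on the PATH, so it is left commented out here):
# (cd "$workdir" && pig -file use_udf.pig)
```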
Problem : Unable to explicitly set the schema on TOBAG(…) as bag{tuple(val:double)}; the result is {(NULL)}
Fix:
Wrap the argument in TOTUPLE before passing it to TOBAG:
TOBAG(TOTUPLE(...)) as bag{tuple(val:double)}
Problem : Pig UDFs jar on Amazon EMR
ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc
Fix:
Add the Pig jar to the classpath: copy the pig-*-amzn.jar file to /home/hadoop/lib.
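The copy step might look like the following. This is only a sketch: the helper name install_pig_jar and the source directory are hypothetical, since these notes do not say where the pig-*-amzn.jar lives on the node; only the destination /home/hadoop/lib comes from the fix above.

```shell
#!/bin/sh
# Hypothetical helper: copy every pig-*-amzn.jar from src_dir into dest_dir.
# Returns non-zero if no matching jar was copied.
install_pig_jar() {
  src_dir=$1
  dest_dir=$2
  status=1
  for jar in "$src_dir"/pig-*-amzn.jar; do
    if [ -f "$jar" ]; then
      cp "$jar" "$dest_dir"/ && status=0
    fi
  done
  return "$status"
}

# Example call on an EMR node; /path/to/pig is a placeholder for the
# directory that actually contains the jar:
# install_pig_jar /path/to/pig /home/hadoop/lib
```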