Hive - WebHCat (REST API for HCatalog)

About

WebHCat (or Templeton) is a REST API for HCatalog.

WebHCat provides a service that you can use to run Hadoop MapReduce (or YARN), Pig, and Hive jobs, or to perform Hive metadata operations, through an HTTP (REST-style) interface.

As a REST interface for remote job execution, WebHCat translates job submission requests into YARN applications and returns a status derived from the YARN application status.

Management

DDL

WebHCat DDL Resources

Check

On an Azure HDInsight cluster:

curl -u "admin:{HTTP PASSWD}" "https://{CLUSTERNAME}.azurehdinsight.net/templeton/v1/status?user.name=admin"
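The check can also be scripted. The sketch below assumes that a healthy server answers the status resource with a JSON body containing "status":"ok", as described in the WebHCat reference; the function name webhcat_status_ok is hypothetical.

```shell
# Hypothetical helper: reads a WebHCat status response body on stdin and
# succeeds only if it contains "status":"ok" (the assumed healthy answer).
webhcat_status_ok() {
    grep -q '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage (same placeholders as in the curl command above):
# curl -s -u "admin:{HTTP PASSWD}" \
#   "https://{CLUSTERNAME}.azurehdinsight.net/templeton/v1/status?user.name=admin" \
#   | webhcat_status_ok && echo "WebHCat is up"
```

Any other body, for example an HTML error page returned by a gateway, makes the helper fail, so it can be used directly in an if statement.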

Log

The logs are in the /var/log/webhcat directory:

webhcat.log is the log4j log to which the server writes. It is rolled over daily, generating files named webhcat.log.YYYY-MM-DD.
webhcat-console.log is the stdout of the server process, captured when it is started.
webhcat-console-error.log is the stderr of the server process.
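Because of the daily rollover, the file to read depends on the day being investigated. A minimal sketch of that naming rule follows; the function name webhcat_log_path and the example date are hypothetical.

```shell
# Hypothetical helper: print the path of the WebHCat log for a given day.
# $1 is a date in YYYY-MM-DD form; without an argument the current log is used.
webhcat_log_path() {
    log_dir=/var/log/webhcat
    if [ -n "${1:-}" ]; then
        echo "$log_dir/webhcat.log.$1"
    else
        echo "$log_dir/webhcat.log"
    fi
}

# Usage: page through the log of a specific day (the date is only an example).
# less "$(webhcat_log_path 2024-01-15)"
```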

Connection

To list the network connections to and from WebHCat:

netstat | grep 30111

30111 is the port WebHCat listens on. The number of open sockets should be less than 10.
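To compare the socket count against that threshold, the netstat output can be counted instead of read by eye. The sketch below assumes Linux-style netstat output where addresses are printed as address:port; the function name count_webhcat_sockets is hypothetical.

```shell
# Hypothetical helper: count netstat lines read from stdin that involve
# port 30111 (the port WebHCat listens on, per the text above).
count_webhcat_sockets() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps scripts
    # running under "set -e" while still printing 0.
    grep -c ':30111[[:space:]]' || true
}

# Usage: warn when the count reaches the threshold of 10 mentioned above.
# n=$(netstat -n | count_webhcat_sockets)
# [ "$n" -lt 10 ] || echo "WebHCat has $n open sockets" >&2
```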

Support

BadGateway (502 status code)

This is a generic message from gateway nodes.

Is the service running? See Check.
Is the YARN queue full? Open the webhcat.log file and search for “queued job”.
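The log search can be done from the command line. A minimal sketch, assuming the default log location given in the Log section; the function name count_queued_jobs is hypothetical.

```shell
# Hypothetical helper: count the lines mentioning "queued job" in a WebHCat
# log file. $1 is the log path; it defaults to the current webhcat.log.
count_queued_jobs() {
    # "|| true" keeps the helper from failing when grep finds no match.
    grep -c 'queued job' "${1:-/var/log/webhcat/webhcat.log}" || true
}

# Usage: show the count, then the most recent matching lines.
# count_queued_jobs
# grep 'queued job' /var/log/webhcat/webhcat.log | tail -n 20
```

A count that keeps growing between two runs suggests that submitted jobs are waiting in the queue rather than being executed.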

Documentation / Reference

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference