Azure - Cluster (HDInsight Cluster)


About

A cluster of computers.

Note: this page is a duplicate of Azure - HDInsight (Microsoft's Hadoop).

Azure Cluster

Structure

Templates

Template reference

Default Storage Account

Each cluster has a storage account associated with it. It is referred to as the default storage account.

An HDInsight cluster and its default storage account must be located in the same Azure region.
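The co-location rule can be checked before provisioning. The following is a minimal sketch, assuming that region names may arrive either as display names ("West Europe") or as short names ("westeurope"). The function names are hypothetical and are not part of any Azure SDK.

```python
def normalize_region(region: str) -> str:
    """Reduce a region label to a comparable form (lowercase, no spaces)."""
    return region.replace(" ", "").lower()


def same_region(cluster_region: str, storage_region: str) -> bool:
    """Return True when the cluster and its default storage account are co-located."""
    return normalize_region(cluster_region) == normalize_region(storage_region)
```

A provisioning script could call `same_region(...)` and stop early instead of waiting for the deployment to fail.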

Version

See the supported HDInsight versions documentation (supported-hdinsight-versions).

Sizing

The default node configuration and virtual machine sizes for clusters are listed on the page "What are the Hadoop components and versions available with HDInsight?".

Directory

  • Log: /var/log

Type

  • Interactive Query: A Hadoop cluster that provides Low Latency Analytical Processing (LLAP) functionality to improve response times for interactive queries.
  • Hadoop: A Hadoop cluster that is tuned for batch processing workloads. Uses HDFS, YARN resource management, and a simple MapReduce programming model to process and analyze batch data in parallel.
  • Spark: Apache Spark has built-in functionality for working with Hive.
  • HBase: HiveQL can be used to query data stored in HBase.
  • Microsoft R Server: also known as Microsoft Machine Learning Server.

Only the following cluster types support the Enterprise Security Package:

  • Hadoop (HDInsight 3.6 only)
  • Spark
  • Interactive Query

Application

By default, the cluster comes with:

Port

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-port-settings-for-services

Management

HostName

Azure HDInsight using an Azure Virtual Network

Azure provides name resolution for Azure services that are installed in a virtual network.

The cluster nodes can communicate directly with each other, and other nodes in HDInsight, by using internal DNS names. Example of internal DNS names assigned to HDInsight worker nodes:

  • wn0-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net
  • wn2-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net
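The internal names above follow a visible pattern: a short role prefix, a node number, a dash, a cluster-derived label, then the internal domain. A minimal sketch of a parser for that pattern follows. It assumes that every internal name has this shape, which is inferred only from the two examples above. `NodeName` and `parse_internal_name` are hypothetical names.

```python
import re
from typing import NamedTuple


class NodeName(NamedTuple):
    role: str    # leading letters of the host label, e.g. "wn"
    index: int   # node number within that role
    domain: str  # everything after the first dot


# Assumed shape: <role letters><number>-<label>.<domain>
_HOST_RE = re.compile(r"^(?P<role>[a-z]+)(?P<index>\d+)-[^.]+\.(?P<domain>.+)$")


def parse_internal_name(fqdn: str) -> NodeName:
    """Split an internal DNS name into role prefix, node index and domain."""
    match = _HOST_RE.match(fqdn.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized internal DNS name: {fqdn!r}")
    return NodeName(match["role"], int(match["index"]), match["domain"])
```

For example, `parse_internal_name("wn2-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net").index` gives the worker number, which is handy when grouping log files per node.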

Load balancer

  • clusterName.azurehdinsight.net translates to clusterName.Region.cloudapp.azure.com

Create (Provision)

Using Azure Data Factory, you can create HDInsight clusters on demand, and configure a TimeToLive setting to delete the clusters automatically.
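As an illustration of the TimeToLive setting, the sketch below builds a partial definition of a Data Factory on-demand HDInsight linked service as a Python dictionary. The property names (`HDInsightOnDemand`, `clusterType`, `clusterSize`, `timeToLive`, `linkedServiceName`) are given from memory of the Data Factory documentation and should be verified there. All values are placeholders, and the definition deliberately omits other properties a real linked service needs (subscription, credentials, resource group).

```python
import json


def format_time_to_live(minutes: int) -> str:
    """Format an idle time in minutes as the HH:MM:SS string used for timeToLive."""
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def on_demand_linked_service(name: str, storage_linked_service: str,
                             ttl_minutes: int, cluster_size: int = 1) -> dict:
    """Build a partial on-demand HDInsight linked service definition (placeholders only)."""
    return {
        "name": name,
        "properties": {
            "type": "HDInsightOnDemand",
            "typeProperties": {
                "clusterType": "hadoop",
                "clusterSize": cluster_size,
                # The cluster is deleted after staying idle for this long.
                "timeToLive": format_time_to_live(ttl_minutes),
                "linkedServiceName": {
                    "referenceName": storage_linked_service,
                    "type": "LinkedServiceReference",
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    definition = on_demand_linked_service("OnDemandHDInsight", "DefaultStorage", 15)
    print(json.dumps(definition, indent=2))
```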

Note: Cluster creation (Provisioning)

Metastore

  • Verify that the database grants access to Azure services in the firewall of the SQL server.
  • Verify that the database grants access to the VNet where the cluster is created.

Azure Sql Server Firewall

When you create a metastore for Hive or Oozie, don't use dashes, hyphens, or spaces in the database name. This can cause the cluster creation process to fail.
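The naming rule can be checked before the cluster is created. The following is a minimal sketch. The function name is hypothetical, and treating en and em dashes as "dashes" is an assumption added on top of the rule stated above.

```python
def is_safe_metastore_db_name(name: str) -> bool:
    """Return True if the name contains no hyphen, dash or whitespace character."""
    if not name:
        return False
    # "-" is the ASCII hyphen; \u2013 and \u2014 are the en dash and em dash.
    forbidden = "-\u2013\u2014"
    return not any(ch in forbidden or ch.isspace() for ch in name)
```

With this check, `hivemetastore` and `hive_metastore` pass, while `hive-metastore` and `hive metastore` are rejected.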

User creation example:

-- Create a database user and a schema that it owns (run inside the metastore database)
CREATE USER hi_hive WITH PASSWORD = 'the pwd';
CREATE SCHEMA hi_hive AUTHORIZATION hi_hive;
-- Allow the user to connect and to create tables and views
GRANT CONNECT TO hi_hive;
GRANT CREATE TABLE TO hi_hive;
GRANT CREATE VIEW TO hi_hive;
-- Make the new schema the default one for the user
ALTER USER hi_hive WITH DEFAULT_SCHEMA = hi_hive;

-- Role memberships, following:
-- https://social.technet.microsoft.com/wiki/contents/articles/7662.use-sql-azure-database-as-a-hive-metastore.aspx
EXEC sp_addrolemember 'db_ddladmin', 'hi_hive';
EXEC sp_addrolemember 'db_datawriter', 'hi_hive';
EXEC sp_addrolemember 'db_datareader', 'hi_hive';

Delete

Data is stored in Azure Storage, so a cluster can be safely deleted without losing the data.

Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.

With the Azure CLI 1.0 (from the documentation):

azure hdinsight cluster delete clusterName
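The command above uses the older Azure CLI 1.0 syntax. As a sketch, the same deletion can be scripted around the current Azure CLI, assuming the `az hdinsight delete` command with its `--name`, `--resource-group` and `--yes` options (check `az hdinsight delete --help` for your installed version). The function names and the resource group are hypothetical.

```python
import subprocess
from typing import List


def build_delete_command(cluster_name: str, resource_group: str,
                         skip_prompt: bool = True) -> List[str]:
    """Build the argument list for deleting an HDInsight cluster with the az CLI."""
    if not cluster_name or not resource_group:
        raise ValueError("cluster name and resource group are required")
    command = ["az", "hdinsight", "delete",
               "--name", cluster_name,
               "--resource-group", resource_group]
    if skip_prompt:
        command.append("--yes")  # do not ask for confirmation
    return command


def delete_cluster(cluster_name: str, resource_group: str) -> None:
    """Run the deletion; raises CalledProcessError if the CLI reports a failure."""
    subprocess.run(build_delete_command(cluster_name, resource_group), check=True)
```

Building the argument list separately from running it keeps the command easy to log or review before anything is deleted.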

Template

Azure - Template (Resource): https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-linux-ssh-password

Cost

Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.

Version

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning

Support

Failed to start Hive Metastore due to metastore schema initialization error

Deployment failed. Correlation ID: 6d6465b6-8727-409a-aaa5-4f754112ee1c. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "HiveMetastoreSchemaInitializationFailedErrorCode",
        "message": "Failed to start Hive Metastore due to metastore schema initialization error. If you are using a custom Hive metastore, please run 'Hive Schema Tool' against your metastore to check for possible issues with metastore configuration."
      }
    ]
  }
}

Verify that your Azure SQL Server firewall allows inbound connections from the subnet of your cluster.
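When deployments are scripted, the nested error codes in the output above can be extracted automatically. A minimal sketch follows, assuming the output has the same shape as the example: free text, then one JSON object with an `error` entry and optional nested `details`. The function name is hypothetical.

```python
import json
from typing import List


def extract_error_codes(output: str) -> List[str]:
    """Return all error codes found in a deployment failure message, outermost first."""
    start = output.index("{")  # the JSON object follows the free-text prefix
    payload, _ = json.JSONDecoder().raw_decode(output, start)

    codes: List[str] = []

    def walk(error: dict) -> None:
        # Record this level's code, then descend into nested details.
        if "code" in error:
            codes.append(error["code"])
        for detail in error.get("details", []):
            walk(detail)

    walk(payload.get("error", {}))
    return codes
```

A script can then react to `HiveMetastoreSchemaInitializationFailedErrorCode` specifically, for example by printing a reminder to check the SQL server firewall.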




