About
An HDInsight cluster is a cluster of computers.
!!! duplicate of Azure - HDInsight (Microsoft's Hadoop) !!!
Structure
Templates
Default Storage Account
Each cluster depends on either:
- an Azure Storage account,
- or an Azure Data Lake account.
This account is referred to as the default storage account.
An HDInsight cluster and its default storage account must be located in the same Azure region.
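The co-location rule can be checked before provisioning. The sketch below is only an illustration: the helper names are hypothetical, and it assumes region labels differ only by case and spacing (for example "West Europe" versus "westeurope").

```python
def normalize_region(name: str) -> str:
    """Normalize a region label such as 'West Europe' to 'westeurope'."""
    # Assumption: display names and short names differ only by case and spaces.
    return "".join(name.split()).lower()


def is_co_located(cluster_region: str, storage_region: str) -> bool:
    """Return True when the cluster and its default storage share a region."""
    return normalize_region(cluster_region) == normalize_region(storage_region)
```

Calling `is_co_located("West Europe", "westeurope")` compares the normalized labels, so a display name and a short name for the same region are treated as equal.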
Version
Sizing
Directory
- Log: /var/log
Type
- Interactive Query: A Hadoop cluster that provides Low Latency Analytical Processing (LLAP) functionality to improve response times for interactive queries.
- Hadoop: A Hadoop cluster that is tuned for batch processing workloads. Uses HDFS, YARN resource management, and a simple MapReduce programming model to process and analyze batch data in parallel.
- Spark: Apache Spark has built-in functionality for working with Hive.
- HBase: HiveQL can be used to query data stored in HBase.
- Microsoft R Server: also known as Microsoft Machine Learning Server.
Only the following cluster types support the Enterprise Security Package:
- Hadoop (HDInsight 3.6 only)
- Spark
- Interactive Query
Application
By default, the cluster comes with:
- YARN
Port
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-port-settings-for-services
Management
HostName
Azure HDInsight using an Azure Virtual Network
Azure provides name resolution for Azure services that are installed in a virtual network.
The cluster nodes can communicate directly with each other, and other nodes in HDInsight, by using internal DNS names. Example of internal DNS names assigned to HDInsight worker nodes:
- wn0-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net
- wn2-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net
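The two example names above suggest a layout of role, node index, cluster prefix, and DNS suffix. The sketch below splits a name into those parts. The pattern is inferred from these two examples only and is not a documented naming contract; the `NodeName` type and `parse_node_name` function are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example names: <role><index>-<prefix>.<suffix>
_NODE_RE = re.compile(
    r"^(?P<role>[a-z]+)(?P<index>\d+)-(?P<prefix>[^.]+)\.(?P<suffix>.+)$"
)


class NodeName(NamedTuple):
    role: str    # "wn" in the worker node examples
    index: int   # node number within the role
    prefix: str  # shortened cluster name
    suffix: str  # remaining internal DNS suffix


def parse_node_name(fqdn: str) -> Optional[NodeName]:
    """Split an internal node name into its parts, or return None if it does not match."""
    match = _NODE_RE.match(fqdn.strip().lower())
    if match is None:
        return None
    return NodeName(match["role"], int(match["index"]), match["prefix"], match["suffix"])
```

For the first example name, this yields role `wn`, index `0`, and prefix `hdinsi`.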
Load balancer
- clusterName.azurehdinsight.net translates to clusterName.Region.cloudapp.azure.com
Create (Provision)
Using Azure Data Factory, you can create HDInsight clusters on demand, and configure a TimeToLive setting to delete the clusters automatically.
Note: Cluster creation (Provisioning)
- Azure CLI (1.0!)
Metastore
- Verify that the SQL server firewall allows Azure services to access the database.
- Verify that the database grants access to the VNet where the cluster is created.
When you create a metastore for Hive or Oozie, don't use dashes, hyphens, or spaces in the database name. This can cause the cluster creation process to fail.
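The naming rule above can be checked before provisioning. This is a minimal sketch: the function name is hypothetical, and it covers only the characters named in the note (hyphens, dashes, and whitespace), not every rule the database engine enforces.

```python
import re

# Characters the note warns about: hyphen-minus, Unicode dashes, any whitespace.
_FORBIDDEN = re.compile(r"[-\u2010-\u2015\s]")


def is_safe_metastore_db_name(name: str) -> bool:
    """Return True if the name is non-empty and free of dashes, hyphens, and spaces."""
    return bool(name) and _FORBIDDEN.search(name) is None
```

A name such as `hive_metastore` passes, while `hive-metastore` and `hive metastore` are rejected.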
User creation example:
CREATE USER hi_hive WITH PASSWORD = 'the pwd';
CREATE SCHEMA hi_hive AUTHORIZATION hi_hive;
GRANT CONNECT TO hi_hive;
GRANT CREATE TABLE TO hi_hive;
GRANT CREATE VIEW TO hi_hive;
ALTER USER hi_hive WITH DEFAULT_SCHEMA = hi_hive;
-- https://social.technet.microsoft.com/wiki/contents/articles/7662.use-sql-azure-database-as-a-hive-metastore.aspx
EXEC sp_addrolemember 'db_ddladmin', 'hi_hive';
EXEC sp_addrolemember 'db_datawriter', 'hi_hive';
EXEC sp_addrolemember 'db_datareader', 'hi_hive';
Delete
Data is stored in Azure Storage, so a cluster can be safely deleted without losing the data.
With the Azure CLI 1.0 (from the documentation):
azure hdinsight cluster delete clusterName
Template
Azure - Template (Resource): https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-linux-ssh-password
Cost
Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.
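A small calculation makes the argument concrete. All prices below are hypothetical placeholders, not Azure list prices, and the 730-hour month is an averaging assumption.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(cluster_rate_per_hour: float, cluster_hours: float,
                 storage_per_month: float) -> float:
    """Cluster charges accrue per hour of existence; storage is billed regardless."""
    return cluster_rate_per_hour * cluster_hours + storage_per_month


# Hypothetical prices, for illustration only.
CLUSTER_RATE = 2.0   # currency units per cluster hour
STORAGE_COST = 20.0  # currency units per month for the default storage account

always_on = monthly_cost(CLUSTER_RATE, HOURS_PER_MONTH, STORAGE_COST)
on_demand = monthly_cost(CLUSTER_RATE, 160, STORAGE_COST)  # cluster kept for 160 hours
print(f"always on: {always_on:.2f}, on demand: {on_demand:.2f}, "
      f"saved: {always_on - on_demand:.2f}")
```

With these placeholder numbers the storage charge is the same in both cases, and the whole difference comes from the hours during which the cluster exists.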
Version
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning
Support
Failed to start Hive Metastore due to metastore schema initialization error
Deployment failed. Correlation ID: 6d6465b6-8727-409a-aaa5-4f754112ee1c. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "HiveMetastoreSchemaInitializationFailedErrorCode",
        "message": "Failed to start Hive Metastore due to metastore schema initialization error. If you are using a custom Hive metastore, please run 'Hive Schema Tool' against your metastore to check for possible issues with metastore configuration."
      }
    ]
  }
}
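The useful part of this message is the nested detail code rather than the generic top-level one. The sketch below pulls both out of the text. It assumes the message has the same shape as the one above (free text followed by a single JSON object that runs to the end), and `extract_error_codes` is a hypothetical helper name.

```python
import json
from typing import List


def extract_error_codes(message: str) -> List[str]:
    """Return the top-level error code followed by any nested detail codes."""
    start = message.find("{")  # the JSON payload starts at the first brace
    if start == -1:
        return []
    # Assumption: nothing follows the JSON object in the message.
    payload = json.loads(message[start:])
    error = payload.get("error", {})
    codes = [error["code"]] if "code" in error else []
    codes += [d["code"] for d in error.get("details", []) if "code" in d]
    return codes
```

Applied to the message above, the last entry of the returned list is the detail code that points at the metastore.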
Verify that your Azure SQL Server firewall allows inbound connections from the subnet of your cluster.
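A quick first check is whether the SQL endpoint is reachable at all from a machine in the cluster's subnet. The sketch below attempts a plain TCP connection. Port 1433 is the default SQL Server port; the host name in the usage note is a hypothetical placeholder. A successful connection shows only that the network path is open: server-level firewall rules are, as far as I know, evaluated at login, so this test is necessary but not sufficient.

```python
import socket


def can_connect(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

For example, `can_connect("myserver.database.windows.net")` could be run from a cluster head node, replacing the placeholder with the real server name.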