
Azure - HDInsight (Microsoft's Hadoop)

About

Azure HDInsight is a cluster distribution of the Hadoop components from the Hortonworks Data Platform (HDP).

This page duplicates Azure - Cluster (HDInsight Cluster).

It bundles several open-source frameworks, including:

Component | Doc | Link
Jupyter | Doc | https://CLUSTERNAME.azurehdinsight.net/jupyter

Template

Post-install script

Install components at the end of cluster creation with a script action.

Example: install Hue

Add an edge node. See Edge Node.
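A script action is a Bash script that HDInsight runs on the node types you select. When one script has to behave differently per node (for example, install Hue only on the edge node), it can branch on the hostname. The helper below is a hypothetical sketch: the function name `node_role` is made up, and it assumes the HDInsight hostname prefixes `hn` (head node), `wn` (worker node), `zk` (ZooKeeper node) and `ed` (edge node).

```shell
#!/usr/bin/env bash
# Hypothetical script-action helper: map a short hostname to a node role.
# Assumes HDInsight hostname prefixes: hn = head, wn = worker,
# zk = ZooKeeper, ed = edge node.
node_role() {
  # $1: short hostname such as "hn0-myhdi"; defaults to the local host
  local host="${1:-$(hostname -s)}"
  case "$host" in
    hn*) echo "headnode" ;;
    wn*) echo "workernode" ;;
    zk*) echo "zookeepernode" ;;
    ed*) echo "edgenode" ;;
    *)   echo "unknown" ;;
  esac
}
```

Inside a script action, a line such as `[ "$(node_role)" = edgenode ] || exit 0` would then skip every node except the edge node.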

Admin

# List the root of the HDFS namespace
hdfs dfs -D "fs.default.name=hdfs://mycluster/" -ls /
# Report capacity and DataNode status
hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report
# Check filesystem health; sample output:
hdfs fsck -D "fs.default.name=hdfs://mycluster/" /
Connecting to namenode via http://hn0-ha.ax.internal.cloudapp.net:30070/fsck?ugi=hdsshadm&path=%2F
FSCK started by hdsshadm (auth:SIMPLE) from /10.40.35.148 for path / at Thu Jan 17 16:04:30 UTC 2019
................................Status: HEALTHY
 Total size:    1200 B
 Total dirs:    139
 Total files:   32
 Total symlinks:                0 (Files currently being written: 15)
 Total blocks (validated):      15 (avg. block size 80 B)
 Minimally replicated blocks:   15 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          5
 Number of racks:               1
FSCK ended at Thu Jan 17 16:04:30 UTC 2019 in 12 milliseconds


The filesystem under path '/' is HEALTHY
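`hdfs fsck` prints a human-readable report, so a monitoring or cron script has to parse it. The two functions below are a hypothetical sketch (`fsck_is_healthy` and `fsck_field` are not part of Hadoop) that assume the report layout shown above: a final `... is HEALTHY` line and `Name: value` fields.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that parse an `hdfs fsck` report read from stdin.
# Assumes the layout of the sample report: "Name:  value ..." lines and a
# closing "The filesystem under path '...' is HEALTHY" line.

# Exit status 0 when the report declares the path HEALTHY.
fsck_is_healthy() {
  grep -q "is HEALTHY"
}

# Print the first token after "Name:" for the given field name,
# e.g. fsck_field "Under-replicated blocks" prints "0".
fsck_field() {
  awk -F':' -v key="$1" '{
    name = $1
    sub(/^[ \t]+/, "", name)          # drop the leading indentation
    if (name == key) {
      split($2, parts, " ")           # value is the first word after ":"
      print parts[1]
      exit
    }
  }'
}
```

For example, `hdfs fsck -D "fs.default.name=hdfs://mycluster/" / | fsck_field "Under-replicated blocks"` would print `0` for the report above.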

# Force the NameNode out of safe mode
hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -safemode leave
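Forcing the NameNode out of safe mode only makes sense when it is actually in it. A hedged sketch that checks first, assuming `hdfs dfsadmin -safemode get` prints a `Safe mode is ON` or `Safe mode is OFF` line per NameNode (one per head node on an HA cluster); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: leave safe mode only if a NameNode reports it ON.
# Assumes `-safemode get` prints "Safe mode is ON|OFF ..." per NameNode.

HDFS_URI="hdfs://mycluster/"   # nameservice used in the commands above

# Exit status 0 when any line of the report (stdin) says safe mode is ON.
safemode_is_on() {
  grep -q "Safe mode is ON"
}

leave_safemode_if_needed() {
  if hdfs dfsadmin -D "fs.default.name=${HDFS_URI}" -safemode get | safemode_is_on; then
    hdfs dfsadmin -D "fs.default.name=${HDFS_URI}" -safemode leave
  else
    echo "NameNode not in safe mode; nothing to do"
  fi
}

# Usage on a head node:
#   leave_safemode_if_needed
```

Alternatively, `hdfs dfsadmin -safemode wait` blocks until the NameNode leaves safe mode on its own, which avoids forcing it out before the DataNodes have reported their blocks.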

SSH

See SSH Connection.

ssh -i ~/.ssh/myPrivatekey -p 22 sshuser@CLUSTERNAME-ssh.azurehdinsight.net # Primary HeadNode
ssh -i ~/.ssh/myPrivatekey -p 23 sshuser@CLUSTERNAME-ssh.azurehdinsight.net # Secondary HeadNode
ssh [email protected]
ssh sshuser@wn0-myhdi # Worker node 0, from inside the cluster (e.g. from a head node)
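Typing the ports and key each time is error-prone, so an `~/.ssh/config` entry can hold them. The generator below is a hypothetical sketch (`hdi_ssh_config` is a made-up name). It assumes the public endpoint `CLUSTER-ssh.azurehdinsight.net`, the user `sshuser`, OpenSSH 7.3 or later for `ProxyJump`, and that the same key is authorized on the worker nodes, which are not exposed publicly and are reached through the primary head node.

```shell
#!/usr/bin/env bash
# Hypothetical generator for an ~/.ssh/config snippet for one cluster.
# Assumptions: endpoint CLUSTER-ssh.azurehdinsight.net, user "sshuser",
# OpenSSH >= 7.3 (ProxyJump), same key authorized on worker nodes.

# Print one Host stanza. Args: alias, port, cluster, key file.
_hdi_host() {
  printf 'Host %s\n' "$1"
  printf '    HostName %s-ssh.azurehdinsight.net\n' "$3"
  printf '    Port %s\n' "$2"
  printf '    User sshuser\n'
  printf '    IdentityFile %s\n' "$4"
}

hdi_ssh_config() {
  local cluster="$1"
  local key="$2"
  if [ -z "$key" ]; then
    key='~/.ssh/myPrivatekey'   # literal "~": ssh expands it when reading the config
  fi
  _hdi_host "${cluster}-hn0" 22 "$cluster" "$key"   # port 22: primary head node
  _hdi_host "${cluster}-hn1" 23 "$cluster" "$key"   # port 23: secondary head node
  # Worker nodes: tunnel through the primary head node.
  printf 'Host wn*\n'
  printf '    User sshuser\n'
  printf '    IdentityFile %s\n' "$key"
  printf '    ProxyJump %s-hn0\n' "$cluster"
}
```

For example, `hdi_ssh_config myhdi >> ~/.ssh/config` would let `ssh myhdi-hn0` and `ssh wn0-myhdi` work from a workstation. The `Host wn*` pattern assumes a single cluster in the config file; with several clusters, narrow it (e.g. `wn*-myhdi`).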