Impala is a SQL for low-latency data warehousing on a Massively Parallel Processing (MPP) Infrastructure.
Cloudera’s Impala is an implementation of Google’s Dremel. Dremel relies on massive parallelization.
Similar to an MPP data warehouse, queries in Impala originate at a client node. This query is then sent to *every* data storage node which stores part of the dataset. Results are partially aggregated locally, then combined on the client machine.
Impala’s query response time is seconds, instead of minutes with MapReduce.
Impala is an interactive-speed SQL query engine that runs on the Hadoop infrastructure. Impala make all the data in the Hadoop Distributed File System (HDFS) and Apache HBase database tables accessible for real-time querying.
Impala is well-suited to executing SQL queries for interactive exploratory analytics on large datasets.