To see how admission control works for particular queries, examine the profile output for the query. This
information is available through the
PROFILE statement in impala-shell
immediately after running a query in the shell, on the queries page of the Impala
debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
level 2). The profile output contains details about the admission decision, such as whether the query was
queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
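For instance, a minimal sketch of checking the admission decision in impala-shell (the table name is hypothetical, and the comments describe illustrative lines to look for rather than exact output):

```sql
-- Run a query, then request its profile in the same impala-shell session.
SELECT COUNT(*) FROM web_logs;  -- hypothetical table
PROFILE;
-- In the profile output, look for the admission-related details:
-- the admission result (admitted immediately, queued, or rejected),
-- the resource pool the query was assigned to, and the estimated
-- versus actual per-node memory usage.
```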
Remember that the limits imposed by admission control are "soft" limits.
The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
concurrently, then throughput could decrease due to queries spilling to disk or contending for resources;
or queries could be cancelled if they exceed the
MEM_LIMIT setting while running.
In impala-shell, you can also specify which resource pool to direct queries to by setting the
REQUEST_POOL query option.
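As a sketch, these query options can be set per session in impala-shell (the pool name is hypothetical and must match a pool defined in your configuration):

```sql
-- Direct subsequent queries in this session to a specific resource pool.
SET REQUEST_POOL=reports;
-- Optionally cap per-node memory for queries in this session; queries that
-- exceed this limit while running can be cancelled.
SET MEM_LIMIT=2g;
```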
The statements affected by the admission control feature are primarily queries, but also include statements
that write data such as
CREATE TABLE AS SELECT. Most write
operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
memory due to buffering intermediate data before writing out each Parquet data block. See
Loading Data into Parquet Tables for instructions about inserting data efficiently into Parquet tables.
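For example, a CREATE TABLE AS SELECT statement that writes Parquet is subject to admission control like any query (a sketch; the table names are hypothetical):

```sql
-- Writing Parquet buffers intermediate data before each data block is
-- written out, so this statement can need substantial memory on each node.
CREATE TABLE logs_parquet STORED AS PARQUET
AS SELECT * FROM logs_text;
```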
Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session are also queued so that they are processed in the correct order:
-- This query could be queued to avoid out-of-memory at times of heavy load.
select * from huge_table join enormous_table using (id);
-- If so, this subsequent statement in the same session is also queued
-- until the previous statement completes.
drop table huge_table;
If you set up different resource pools for different users and groups, consider reusing any classifications you developed for use with Sentry security. See Enabling Sentry Authorization for Impala for details.
For details about all the Fair Scheduler configuration settings, see
Fair Scheduler Configuration, in particular the tags used
to map users and groups to particular resource pools (queues).