Different MySQL engines implement count(*) in varied ways

Implementation of count(*) in MySQL

Different MySQL engines implement count(*) in varied ways:

MyISAM

  • MyISAM engine stores the total row count of a table on the disk. When executing count(*), it directly returns this number, ensuring high efficiency.

InnoDB

  • InnoDB’s approach is more complex. When executing count(*), it has to read each row of data from the engine and count accumulatively.
    • InnoDB organizes data in an index-oriented way. The leaf nodes of the primary key index tree represent the data, while the leaf nodes of a non-primary index tree store the primary key values. Given this structure, non-primary index trees are typically much smaller than primary ones. For operations like count(*), traversing any index tree would yield the same logical result. Hence, MySQL’s optimizer chooses the smallest tree for traversal to minimize data scanning.

Using Caching Systems for Counting

A popular approach is to use a caching system like Redis to store the row count of a table:

  • Every time a row is inserted into the table, the Redis counter is incremented by 1.
  • Every time a row is removed, the counter is decremented by 1.
    However, caching systems might miss updates or produce logically imprecise counts due to concurrency issues.

Storing Count in the Database

The logic is similar to the caching system approach.

Variations of count Usage

  • count(primaryKeyID): InnoDB will traverse the entire table and return each row’s ID value to the server layer. The server layer then checks if the ID is non-null and accumulates the count.
  • count(1): InnoDB traverses the whole table without fetching values. The server layer just counts rows without checking values, making count(1) faster than count(primaryKeyID).
  • count(columnName):
    • If the column is defined as NOT NULL, the value from each row is fetched and checked.
    • If the column is nullable, the engine checks if the value is non-null before accumulating.
  • count(*) is optimized not to fetch all field values. It simply counts rows without checking for null values.

In terms of efficiency: count(columnName) < count(primaryKeyID) < count(1) ≈ count(*). It’s recommended to use count(*) whenever possible.