Implementation of count(*) in MySQL
Different MySQL engines implement count(*) in varied ways
Implementation of count(*) in MySQL
Different MySQL engines implement count(*) in varied ways:
MyISAM
- MyISAM engine stores the total row count of a table on the disk. When executing
count(*), it directly returns this number, ensuring high efficiency.
InnoDB
- InnoDB’s approach is more complex. When executing
count(*), it has to read each row of data from the engine and count accumulatively.- InnoDB organizes data in an index-oriented way. The leaf nodes of the primary key index tree represent the data, while the leaf nodes of a non-primary index tree store the primary key values. Given this structure, non-primary index trees are typically much smaller than primary ones. For operations like
count(*), traversing any index tree would yield the same logical result. Hence, MySQL’s optimizer chooses the smallest tree for traversal to minimize data scanning.
- InnoDB organizes data in an index-oriented way. The leaf nodes of the primary key index tree represent the data, while the leaf nodes of a non-primary index tree store the primary key values. Given this structure, non-primary index trees are typically much smaller than primary ones. For operations like
Using Caching Systems for Counting
A popular approach is to use a caching system like Redis to store the row count of a table:
- Every time a row is inserted into the table, the Redis counter is incremented by 1.
- Every time a row is removed, the counter is decremented by 1.
However, caching systems might miss updates or produce logically imprecise counts due to concurrency issues.
Storing Count in the Database
The logic is similar to the caching system approach.
Variations of count Usage
- count(primaryKeyID): InnoDB will traverse the entire table and return each row’s ID value to the server layer. The server layer then checks if the ID is non-null and accumulates the count.
- count(1): InnoDB traverses the whole table without fetching values. The server layer just counts rows without checking values, making
count(1)faster thancount(primaryKeyID). - count(columnName):
- If the column is defined as
NOT NULL, the value from each row is fetched and checked. - If the column is nullable, the engine checks if the value is non-null before accumulating.
- If the column is defined as
- count(*) is optimized not to fetch all field values. It simply counts rows without checking for null values.
In terms of efficiency: count(columnName) < count(primaryKeyID) < count(1) ≈ count(*). It’s recommended to use count(*) whenever possible.
All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.