MySQL Trailing Whitespace Problem
I recently had a run-in with a very interesting MySQL “issue”. I place issue in quotes because it is not a bug, and it’s well documented. However it was not something I came across before and the behavior is not intuitive to me.
Without getting into too much detail about the specific use-case, the issue boiled down to how MySQL handles trailing whitespace on the varchar datatype. A basic synopsis is that if you have the values “foo” and “foo ” (note the trailing space), certain operations will treat the values as the same.
Say you have this table:
CREATE TABLE `catTable` (
`id` int(10) unsigned NOT NULL AUTOINCREMENT,
`category` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `category` VALUES (NULL, 'foo'), (NULL, 'foo'), (NULL, 'foo'), (NULL, 'foo'), (NULL, 'foo ');
Note that the last value has a trailing space. As a programmer, I would expect the ‘foo’ and ‘foo ‘ values to be seen as different. And at least MySQL recognizes that their character lengths are different, for when you run this query:
SELECT category, CHAR_LENGTH(category) FROM catTable;
You get the following result set:
| category | CHAR_LENGTH(category) |
-------------------------------------
| foo | 3 |
| foo | 3 |
| foo | 3 |
| foo | 3 |
| foo | 4 |
We noticed an issue in generating a select box off of these categories and comparing the selected value with each value that was returned by the category list. We get the category list by running this query:
SELECT category, CHAR_LENGTH(category), COUNT(*) rowCount
FROM `catTable`
GROUP BY category
which returns the following resultset:
| category | CHAR_LENGTH(category) | rowCount |
------------------------------------------------
| foo | 4 | 5 |
So at first glance, it doesn’t seem like anything is wrong. However with PHP, ‘foo’ and ‘foo ‘ ARE different. So if the ‘foo ‘ is the value returned, and you try to compare a selected value of ‘foo’, it will fail. Yes, you can trim all values that come out of the database (which we did in this case), but I was looking for a reason that MySQL would treat these values as the same.
For reference, my expected resultset would have been:
| category | CHAR_LENGTH(category) | rowCount |
------------------------------------------------
| foo | 4 | 1 |
| foo | 3 | 4 |
Also note that the CHAR_LENGTH in the select is to show the difference in the values. It is not part of the normal query.
I mentioned earlier that this behavior in MySQL is not a bug and is well-documented. With the help of the recently opened DBA StackExchange site, I was pointed in the right direction.
From the MySQL documentation:
All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces
Thanks to the question over at DBA stackexchange, the MySQL solution to get the expected resultset is using the BINARY operator in the GROUP BY clause. This will treat the values as different because of the trailing space.
SELECT category, CHAR_LENGTH(category), COUNT(*) rowCount
FROM `catTable`
GROUP BY BINARY category
This was an interesting problem with the difference in MySQL’s character collations versus binary string comparisons in PHP with regards to how they treat trailing whitespace. Hopefully others will be spared this misinterpretation in the future.


[...] the issue boiled down to how MySQL handles trailing whitespace on the varchar datatype [...]