My first MySQL conference will be post on May 26th, 2011. I am excited for the lineup of talks scheduled for the one-day event.

Reviewing the schedule of events, I am thrilled at all the choices. Here are my MySQL Picks for each time slot:

  • 10:15am – Breaking Down Query Latency With DTrace (Brendan Gregg)
  • 11:00am – MySQL High Availability With Pacemaker (Yves Trudeau)
  • 11:45am – MySQL Replication Alternatives: Pros & Cons (Darpan Dinker)
  • 1:30pm – Measuring Scalability and Performance With TCP (Baron Schwartz)
  • 2:15pm – Preventing MySQL Emergencies (Espen Braekken)
  • 3:00pm – Efficiently Monitoring MySQL with Nagios (Sheeri Cabral)
  • 3:30pm – How NoSQL Complements MySQL (Matt Yonkovit)
  • 4:15pm – Improvements in MySQL 5.5 and 5.6 (Peter Zaitsev)
Share:
Picture of a bass fish

A hooked fish

I sat in the boat with a friend from my college years. We slowly approached a quiet cove and began preparing our rods. The two of us baited our hooks and prepared to make our first casts of the day. One second our bobbers touched the water, the next there was a sharp pull on our lines. A moment later, we looked at our catches: two large bass!

And then I woke up. I remembered that I am not a fisher. Sure, I’ve gone bass fishing and have dabbled briefly in fly-fishing. But in general, that is not one of my hobbies.

I lay there for a few minutes more, thinking about the dream. Why was I suddenly dreaming about fishing? From the few times I have been fishing, I do know that rarely do you make such a great catch two minutes after setting up shop. It is just not how it works, in my limited experience.

This thought made me realize the importance of the dream. It wasn’t about fishing at all, but a lesson to apply to a new project we have started at work. It is not going to “just happen”. It is going to take some time, some hard work.

Have a dream, and then set your energy to accomplishing the dream.

Share:

Today I am going to explore MySQL’s Binary Collations. About a week ago, I wrote a post about an issue I was having with trailing whitespace not showing up as a different value in a GROUP BY query as I expected.

As a brief recap, this is because the MySQL character set I was using was PADSPACE, which means that all trailing whitespace is ignored in comparisons. The solution was to use the BINARY operator.

This issue led me to do more investigation into MySQL character sets and collations. I wanted to see if there were a way to make trailing spaces be significant by changing the column’s character definition, instead of using the BINARY operator. Reading through the MySQL manual on the BINARY operator did not encourage me:

BINARY (operator) also causes trailing spaces to be significant….The BINARY attribute in character column definitions has a different effect.

Ok, so defining the column as BINARY does not make trailing spaces significant. Instead, it changes the collation of the column to the _bin collation of the character set. As an example, if your column is defined as character set ‘utf8′, then defining the column ‘char(10) BINARY’ changes the collation to to ‘utf8_bin’. So what does that mean?

All nonbinary strings have a character set and collation. Examples of character sets are ‘utf8′ and ‘latin1′. They define the type of characters allowed in the column. Each character set has multiple collations associated only with that character set (utf8_general, utf8_bin). These collations define the rules for comparing and sorting the characters in the parent set.

The _bin collation will order the column based on the value of the string on the binary code values of the characters. These characters can be multi-byte (as in the case of utf8), and in this respect makes the _bin collation different from binary strings stored in BINARY, VARBINARY and BLOB column types.

‘_bin’ collations will still allow the lettercase to be modified (for example, calling LOWER() on a _bin collation will change ‘AA’ to ‘aa’) but ordering the values will ignore the lettercase. This ability to modify the lettercase of a _bin collation differs from a binary string, which does not allow lettercase conversion.

Unfortunately, ‘_bin’ collations are still considered nonbinary strings, making this rule apply:

Nonbinary strings have PADSPACE behavior for all collations, including _bin collations

Trailing spaces are still insignificant in comparisons: ‘a ‘ = ‘a’.

So while I was not able to avoid using the BINARY operator to make trailing spaces significant by changing the column definition, I did learn that _bin collation will allow me to sort the values of a column in a case-insensitive manner, while still maintaining lettercase significance for comparisons (‘a’ is not equal to ‘A’).

For more reading on the _bin collation and the difference from binary strings, there is some good information on the MySQL manual page.

Share:

I recently had a run-in with a very interesting MySQL “issue”. I place issue in quotes because it is not a bug, and it’s well documented. However it was not something I came across before and the behavior is not intuitive to me.

Without getting into too much detail about the specific use-case, the issue boiled down to how MySQL handles trailing whitespace on the varchar datatype. A basic synopsis is that if you have the values “foo” and “foo ” (note the trailing space), certain operations will treat the values as the same.

Say you have this table:

CREATE TABLE `catTable` (
  `id` int(10) unsigned NOT NULL AUTOINCREMENT,
  `category` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `category` VALUES (NULL, 'foo'), (NULL, 'foo'), (NULL, 'foo'), (NULL, 'foo'), (NULL, 'foo ');

Note that the last value has a trailing space. As a programmer, I would expect the ‘foo’ and ‘foo ‘ values to be seen as different. And at least MySQL recognizes that their character lengths are different, for when you run this query:

SELECT category, CHAR_LENGTH(category) FROM catTable;

You get the following result set:

| category  | CHAR_LENGTH(category) |
-------------------------------------
| foo       |   3                   |
| foo       |   3                   |
| foo       |   3                   |
| foo       |   3                   |
| foo       |   4                   |

We noticed an issue in generating a select box off of these categories and comparing the selected value with each value that was returned by the category list. We get the category list by running this query:

SELECT category, CHAR_LENGTH(category), COUNT(*) rowCount 
FROM `catTable`
GROUP BY category

which returns the following resultset:

| category  | CHAR_LENGTH(category) | rowCount |
------------------------------------------------
| foo       |   4                   |   5      |

So at first glance, it doesn’t seem like anything is wrong. However with PHP, ‘foo’ and ‘foo ‘ ARE different. So if the ‘foo ‘ is the value returned, and you try to compare a selected value of ‘foo’, it will fail. Yes, you can trim all values that come out of the database (which we did in this case), but I was looking for a reason that MySQL would treat these values as the same.

For reference, my expected resultset would have been:

| category  | CHAR_LENGTH(category) | rowCount |
------------------------------------------------
| foo       |   4                   |   1      |
| foo       |   3                   |   4      |

Also note that the CHAR_LENGTH in the select is to show the difference in the values. It is not part of the normal query.

I mentioned earlier that this behavior in MySQL is not a bug and is well-documented. With the help of the recently opened DBA StackExchange site, I was pointed in the right direction.

From the MySQL documentation:

All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces

Thanks to the question over at DBA stackexchange, the MySQL solution to get the expected resultset is using the BINARY operator in the GROUP BY clause. This will treat the values as different because of the trailing space.

SELECT category, CHAR_LENGTH(category), COUNT(*) rowCount 
FROM `catTable`
GROUP BY BINARY category

This was an interesting problem with the difference in MySQL’s character collations versus binary string comparisons in PHP with regards to how they treat trailing whitespace. Hopefully others will be spared this misinterpretation in the future.

Share:

When I first started setting up this site using WordPress as a CMS, I reviewed some existing code for themes and plugins. Among the garbled mess of code, I found a pearl: the glob function

The glob function, in a nutshell, provides a means of retrieving a list of files that match a pattern. A quick example would be if you had a directory of images that were named based on a unique identifier (a uid), but there were multiple images per UID:

So, now you want to get all the pictures for ‘myid’. Sure, you can loop through the directory using your favorite directory iterator functionality (readdir, the SPL, etc) and test the filename against ‘myid’. Or, you can use the glob function:

     <?php
       $files = glob('/path/to/imageDir/myid*.jpg');
     ?>

Now you have your list of files to do with as you please. The asterisk (*) tells glob to match 0 or more instances of any character (except /).

The other wildcard that glob accepts is the question mark (?). This tells glob to match any single character. So for instance, using our directory structure, if you know the numbers are single digit (less than 10 pictures per UID). You could use this string to get the same filelist:

    <?php
       $files = glob('/path/to/imageDir/myid-?.jpg');
    ?>

The one drawback of glob that I care about is that it is not recursive. It won’t reach into any directories it finds and get a list of matching files. There are, however, various implementations for recursive functionality listed in the comments of the php manual entry.

If you know the depth of the directories, you can add extra asterisks for glob to search. Here’s an example:

We created multiple sizes for each image. How do we get the list of all ‘myid’ files? Simple:

    <?php
      $files = glob('imageDir/*/myid*.jpg');
    ?>

This has been a quick look at the glob function. The manual entry does have useful information, such as which flags the function can handle. There are also notes on limitations on the systems that can use the function.

Share:

As a first attempt at WordPress plugin development, I created OrangePants Archive (op-archive for short).

One thing I’ve been nervous about is updating wordpress to a new version. Staring me in the face are the words Before you upgrade anything, make sure you have backup copies.

Well, it’s a simple matter to ssh into my company’s box and create the appropriate backups (tarball of the blog folder and a mysqldump of the database files). But my goal here is to be able to do all that from the comfort of the wordpress admin

Before someone mentions it: yes, there are other plugins already developed to perform just these tasks. But this is my first attempt at developing a plugin for wordpress, so originality was not really my goal. I’m familiarizing myself with the process of creating a plugin, acceptable coding patterns, and various functionalities.


UPDATE

OP Archive is now up on the WordPress Plugin Directory, and has some semblence of functionality. Go into your wordpress admin under the ‘Tools’ submenu. There will be a panel for ‘OP Archive’. Click it, click ‘Backup Now’ and wait a moment. Soon you will get some feedback about your new archive. Archives are located under the ‘wp-content/uploads/op-archives’ directory. Note: You do need to make sure the uploads directory is writeable by wordpress.

Once you’ve run the backup successfully, you’ll see a listing of archives. You can click the link of the .tar.gz file to download it. The .sql file only ones up a text file in your browser.

I will be adding functionality to delete selected archives, but for now, you have to manually delete them out of your uploads directory.

Please, leave me some comments on the plugin and how it can be improved. This is my first plugin, so I’m looking for ‘wp coding standards’ and advice on backwards compatibility issues. I will gladly take new feature requests though!


Planned Features

  • *Compressed file generation that does not depend on `exec` function.
  • *Ability to choose your type of compression (.gzip, tar.gz, etc)
  • *Ability to ftp your compressed files to remote location.
  • *Remove archives from the admin panel
Share:

“Apathy can be overcome by enthusiasm, and enthusiasm can only be aroused by two things: first, an ideal, with takes the imagination by storm, and second, a definite intelligible plan for carrying that ideal into practice.” – Arnold J. Toynbee

I’m not entirely sure who Arnold J. Toynbee is, but this quote sums up the entire point of this website: An ideal that I am developing to begin removing apathy from the dictionary of my life.

I’ve recently relocated, giving me a chance to take stock of my life thus far. What I see, I don’t like. The past few years have been amazing; marriage and the birth of two sons. But…well, let’s just say I’ve been coasting. I’m nearing my thirties, so perhaps this is just an early mid-life crisis. But I can honestly say, I never have had any long-term goals; no dreams. This is sad really, but I’ve truly been apathetic towards life.

But now I’m starting to dream for my family. What can I do to provide them with more than the bare necessities of existence? The answer is simple in theory: I can stop coasting. Don’t balk at learning new things. Broaden my horizons.

And so I created OrangePants. To freely share knowledge that I gain in my field of Web Development and Database Administration.

By the way, Arnold J. Toynbee was a British historian who wrote a twelve-volume collection called A Study of History.

Share: