Category Archives: Mac OS X

Adding median() to MySQL

For long-tailed data, the median is often more appropriate than the mean for characterizing central tendency. MySQL (and most relational databases) don’t have a built-in aggregate function for median like they do for mean, which MySQL calls avg(). Of course, smart people on the Internet have written User-Defined Functions (UDFs) that add this functionality.

Originally, someone posted this collection of statistics-related UDFs for MySQL 4, but they don’t work with MySQL 5.1, the current stable version. However, someone updated the median() function to make it compatible with version 5. The instructions at that link work for MySQL on the Mac with a couple modifications.

Here’s how to make it work:

  1. Download udf_median.cc.
  2. You probably need to have installed Apple’s Developer Tools (Xcode, etc.) to compile the UDF. See below, though — if you are running exactly the same version of MySQL as me, you can just download my compiled version.

    If you do have Xcode installed, you can open Terminal, navigate to udf_median.cc’s directory, and compile it with this command:

    g++ -o udf_median.so -shared -I/usr/local/mysql/include udf_median.cc
  3. Put the resulting shared library (udf_median.so) into MySQL’s lib/plugin directory:

    sudo cp udf_median.so /usr/local/mysql/lib/plugin/
  4. Connect to MySQL, probably as root. Run this query to load the function:

    CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';

If you’re running the 64-bit Intel (aka x86_64 or x64) Mac version of MySQL 5.1.40, you can also download my compiled udf_median.so and skip ahead to the cp step.

64-bit RMySQL on Mac OS X 10.6

I’ve recently started working with R on 64-bit Macs with enough RAM to make a 64-bit binary worth using. You can download R compiled for the x86_64 platform here. (It will install a new GUI application called R64.app. Make sure you run that one.) But when I went to load one of the modules I use the most, RMySQL, which allows you to load data frames directly as the result of a MySQL query, I found that the x86_64 binary version from CRAN didn’t work. The solution?

  1. Download and install MySQL. Make sure you get the x86_64 version. As of this writing, it’s labeled “Mac OS X 10.5 (x86_64).” You don’t need to install the startup item for this to work, just the main package.
  2. Download the source package of RMySQL.
  3. Open a terminal and navigate to the directory where the RMySQL tar.gz file was downloaded.
  4. Run the following command:

    R CMD INSTALL --configure-args='--with-mysql-dir=/usr/local/mysql' \
    RMySQL_0.7-4.tar.gz

That command is correct with the current version of RMySQL (0.7-4) and the default install location for MySQL (/usr/local/mysql). Change those values as appropriate if you have a different version or different install location.