BTA Blog Rotating Header Image

July, 2011:

Amazon Simple Storage Service (S3)

I have been using Amazon Simple Storage Service (S3) for several years to provide off-site backup and synchronizing files on my numerous systems.  It is a cloud-based service that provides storage for a very reasonable cost ($0.14 per GB/Month as of 7/19/2011).  Access to files on S3 are through web service interfaces (REST, SOAP, and BitTorrent) that are not for general use.  So most access is through some end-user program or service.  Many services have been built that use S3 including Netflix, Tumblr, reddit, and SmugMug with the list growing rapidly.

To use this service, I use a variety of tools including command-line utility called s3cmd, on my Linux systems to synchronize directories and upload/download files.  Synchronizing a directory is as simple as:

s3cmd sync /home/dirk/data s3://bta-bucket/dirk/

This will create a “directory” data in the S3 “directory” bta-bucket/dirk; upload the files that have changed or do not exist on S3 and store file metadata date/time information to make all this possible.  The only problem I have run into is s3cmd’s get and put commands do not use this file metadata to set the file modification time.  This prevents using get and put where you have used the sync command, because the file modification time will not be saved (in S3 headers) or set (on local PC) with get and put.  So to restore one file to a directory AND set the file modification time set during the sync upload use the following command:

s3cmd sync s3://bta-bucket/dirk/data/somefile.ods /home/dirk/data/somefile.ods

The local file date should match the original file’s modification date/time.  Note: the date/time shown by the s3cmd ls command is the upload date/time, NOT the file modification date/time.  The file metadata is stored in the S3 metadata headers with the Key  = “x-amz-meta-s3cmd-attrs” with a Value similar to: “uid:1000/gname:dirk/uname:dirk/gid:1000/mode:33188/mtime:1302623148/atime:1311007748/ctime:1302627027” to hold the file information.

Other S3 utilities save file metadata in different and incompatible ways, so be careful in choosing your S3 backup software and remember that changing to another utility may cause problems with synchronization based on the file modification date/time.

For Windows users, there is a free client called DragonDisk that uses S3 for file storage.  I have not used it, so this is not a recommendation.

MySQL and UTF-8

I recently had a problem in a web application that I created where the UTF-8 characters were not interpreted correctly by browsers.  The biggest issue was this did not happen in all instances of presenting these UTF-8 strings.

After tracing the strings through several libraries, I found the culprit.  The following SQL statement was the source of the text.

SELECT ID, CONCAT(sDescription, ' (', ID, ')') FROM ProdFamily;

Where ID is a integer key for the table and sDescription is a varchar column.  The result of the CONCAT function is a “binary string” because of the integer column as one of the operands.  The end result is that this “binary string” was treated differently than a regular UTF-8 string and characters outside the normal ASCII ones were not display correctly.  To fix this issue the CAST function must be added to set the type of the ID column to string as follows.

SELECT ID, CONCAT(sDescription, ' (', CAST(ID AS CHAR), ')') FROM ProdFamily

See MySQL CONCAT or CAST function documentation for more information.