SQLite and Upper(Special Characters)

While working on the upcoming dictionary update to MusicTools, I had several dozen new terms containing special characters, for example, Accusé and Andächtig. Because of certain challenges getting the dictionary properly transcribed for the original release, I had decided to simply have my transcriber substitute the special characters with their non-special equivalent, so the above two words would have appeared as Accuse and Andachtig.

For the new release, I decided I really needed to render the words as accurately as possible, and it ended up being a non-trivial exercise. The dictionary itself is imported into SQLite from a text file that I maintain with BBEdit. To preserve the special characters, it’s simply a matter of telling BBEdit to save the file in UTF-8 format. So far, so good. Here’s a sample of the import file:

Accrescendo,It., ak-kre-SHEN-do,Increasing; augmenting in tone and power.
Accusé,Fr.,Marked, stressed, emphasized.
Achromatic,,Diatonic.
After import, a sample query shows we’re still in good shape:

sqlite> select definition from terms where term="Accusé";
Marked, stressed, emphasized.
So, we rebuild the app with the new music dictionary, look up Accusé, and we get…

No joy. Looking at the code we find:


-(void) readDatabase:(MusicTerm *)mt word:(NSString *)w {

NSString *term;
NSString *pronunciation;
NSString *definition;
sqlite3 *database;

w = [w uppercaseString];
NSString *query = [[NSString alloc] initWithFormat:@"select term,definition,pronunciation from terms where upper(term) = \"%@\"",w];
sqlite3_stmt *statement;

if (sqlite3_prepare_v2(database, [query UTF8String], -1, &statement, NULL) == SQLITE_OK) {
    sqlite3_reset(statement);
    if (sqlite3_step(statement) == SQLITE_ROW) {
        term = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 0)];
        definition = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 1)];
        pronunciation = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 2)];
    } else {
        definition = @"Definition not available in Dictionary";
        pronunciation = @"";
    }
}

If you’ve been down this road with special characters, you’re probably already smiling. The basic idea is to take the term we’ve been passed in w, uppercase it to avoid case sensitivity issues, and then search for a match with a query that effectively looks like:

select term,definition,pronunciation from terms where upper(term) = w

Turns out this works just fine when special characters aren’t involved. But let’s try this search from the command line:

sqlite> select definition from terms where upper(term)="ACCUSÉ";
sqlite>

Nada. After about 20 minutes of head-scratching, I decide to try this:

sqlite> select upper(term) from terms where term="Accusé";
ACCUSé
sqlite>

Hmmm…it seems that SQLite upper() function apparently ignores special characters – it had been uppercasing everything except our é, causing our dictionary query to fail. With the mystery solved, the fix is easy. We eliminate uppercasing of the search term and in the WHERE clause of query statement:


// w = [w uppercaseString]; commented out here for clarity, but removed from the actual code 

NSString *query = [[NSString alloc] initWithFormat:@"select term,definition,pronunciation from terms where term = \"%@\"",w];
sqlite3_stmt *statement;

if (sqlite3_prepare_v2(database, [query UTF8String], -1, &statement, NULL) == SQLITE_OK) {
    sqlite3_reset(statement);
    if (sqlite3_step(statement) == SQLITE_ROW) {
        term = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 0)];
        definition = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 1)];
        pronunciation = [NSString stringWithUTF8String:(char *)sqlite3_column_text(statement, 2)];
    } else {
        definition = @"Definition not available in Dictionary";
        pronunciation = @"";
    }
}

Building and analyzing the results, we find…joy:


Back in my Perl days I had to sweat case sensitivity virtually all the time. Maybe I should I left that old habit die?

Please note: I reserve the right to delete comments that are offensive or off-topic.

  • Yar

    Hi. I’ve had a similar problem.
    What I did was:
    1. creating a string with all the national characters replaced with uupercases (here in Polish):
    String utf8Uppercases = pSearchString.replace(‘ś’, ‘Ś’).replace(‘ą’, ‘Ą’).replace…….
    2. passing the 2 strings to my SQLite query like this:
    “select ….  from… where…            ” and (upper(“+COL_NAME+”) like upper(?)”+            ”  or upper(“+COL_NAME+”) like upper(?)”+ …

    mDb.rawQuery(query, new String[]{“%”+pSearchStr+”%”, “%”+pUppercaseStr+”%”};

    • http://scottkantner.com Scott Kantner

      Yar,

      Thanks for that.  I may experiment a bit along those lines!

      //Scott