Using Unicode in SQL Server

Hi I'm have big problems trying to save Unicode information into my MS SQL Server 2000 Desktop Engine.

Alright, so the problem I'm having now is I'm working with MS SQL 7.0 and PHP5. I've trying to get information to save into MS SQL, however it is storing it all weird. When I save the information and try to view it in Enterprise Manager it shows it has random characters but for some weird reason I can output that information on the page correctly when I do a SELECT query.

Here's an example of my problems.
< php
$insQ
= "INSERT INTO my_tblBusiness (
pkID,
fkbusinessID,
ntitle,
ndescr
) VALUES (
'1',
'1',
N'主要提供韓式菜餚的',
N'主要提供素食餐點或提供素菜菜餚的餐廳'
)"
;

$DB->Query($insQ);

/**
* When I look at this information using Enterprize Manager
* it is showing
* Title: "a﹐ e| a a e “a e e c e a 3"
* Descr: "a﹐ e| a a c’ e£ e e a –a a c’ e e e c e a 3i "
*/
>


But to make everything that little bit more weird, when I output it it shows correctly.

< php
$DB
->Query("SELECT
ntitle,
ndescr
FROM my_tblBusiness"
);

while(
$rs = $DB->fetch_assoc()){
echo
'Title:'.$rs['ntitle'].'<br />';
echo
'Descr:'.$rs['ndescr'];

}
/**
* Outputs in the Browser:
* Title: 主要提供韓式菜餚的
* Descr: 主要提供素食餐點或提供素菜菜餚的餐廳
*/
>


Now here's the kicker. When I do the same INSERT with SQL Query Analyzer, it puts in it the DB correctly. AND when I look at using Enterprise Manager it shows "主要提供韓式菜餚的" and "主要提供素食餐點或提供素菜菜餚的餐廳".

Alright, so does any know how I can get PHP to be able to insert proper Unicode information into my MS SQL 7.0 DB



Answer this question

Using Unicode in SQL Server

  • Rahul Ohri

    INSERT INTO ylu_tblbusiness2 (
    fkbusinessID,
    ntitle,
    ndescr,
    lang
    ) VALUES (
    '28',
    N'a﹐ e| a a a3°a e e c e a 3i ',
    N'a﹐ e| a a a3°a e e c e a 3i ',
    'tch'
    )

    it is returning the weird characters

  • Stut

    I've never used PHP, and know little about it - only what I can infer from what you've included as your code sample.

    I'm wondering whether you're accepting the data from a web-form/page, and inserting it, or hard-coding some fields (eg. your product description) in your insert statements.

    There should be no problem with using Chinese_Taiwan_Stroke_90_BIN as your encoding with regard to other language sets - this is a unicode binary encoding, and will only affect the ordering of the characters. There's no default encoding that works across all East-Asian languages, or Unicode-encoding, so you just need to pick the one that most closely matches what you want.

    Did you try doing a trace on your database inserts Did you end up with unicode characters, or scambled ASCII scrap




  • capyu

    You need to use DateTime (or smalldatetime). When you just set a time in this field, it'll just use the default date (1900-01-01 00:00:00), which you can ignore.

    Have a look here for details:
    http://msdn2.microsoft.com/en-US/library/ms187819(SQL.90).aspx
    Also see the following for details on Time Formats:
    http://msdn2.microsoft.com/en-US/library/ms187642(SQL.90).aspx

    Here's a simple TSQL example - you just ignore the date part of the values:

    declare @OpenTime datetime
    declare
    @CloseTime datetime
    set
    @OpenTime = '9:00:00'
    set @CloseTime = '17:30:00'
    select (@CloseTime - @OpenTime) as OpenForHours


  • Vytas

    Man I wish the answer as that simple. It is storing "a﹐ e| a a a3°a e e c e a 3".

    The thing is, if I copy the exact Query into SQL Query Analyzer execute, it shows up correctly in the DB AND shows correctly in Enterprize Manager. I wish Microsoft would release a quick fix rather than a book about this problem.

  • Rick.H

    Alright, so have a LONG search.. I've found a solution, appearently this is only on Windows platforms (I dont know that for sure, just what a heard). Alright... here's the break down.

    There are two main types of Unicode... UTF-8 and UCS-2LE. The most popular unicode is UTF-8, however MSSQL doesn't understand it, instead it uses UCS-2LE.

    When you send a query using PHP, it sends it unicoded as UTF-8, and MSSQL cannot understand it, and does not accept it as valid characters. So, in result there must be a converstion between UTF-8 and UCS-2LE. I've breifly tried using vicon to change the incoding from utf-8 to ucs-2le, but ran into problems when ever you introduced any punucution characters ( .!@#$%^&*()_+{}|:"<> ), so I found an articule on bugs.php.net, which suggested this using the COM class, and use ADODB.Connection. This way it would do the translation from utf-8 to ucs-2le automatically. Here's the PHP code..

    < php
    $conn
    = new COM("ADODB.Connection", NULL, CP_UTF8) or die("Cannot start ADO");
    $conn->Open("Driver={SQL Server};Server={MYSERVER};Database=MYDB;UID=MYUSERNAME;PWD=MYPWD;");

    $insQ = "INSERT INTO tblcrap(
    ntitle
    ) VALUES (
    N'主要提供泰式菜餚的餐廳'
    )"
    ;
    $rs = $conn->Execute($insQ);
    >


    Now this reacts a lot different from the normal mssql_connect etc, functions you're used too. Also (and this is NOT confirmed), but I hear there might be a little bit of a memory leak when using this.

    So, just to recap. If you're NOT using foreign characters, use the good ol' fashion mssql functions provided by PHP. Otherwise, you're going to have to use the COM class, for now. If anyone knows different, please tell me.

    One thing to note though. Using the COM class in PHP is 5x !!SLOWER!! than using mssql_query. So, if your queries to not require the storing or retrieval of Unicode information, continue to use the mssql_query function.

  • PhilipLanier

    sorry I'm pretty new to MS SQL, not sure how to do a trace. How do I do that

  • AndrewB

    damn... I guess I'll try posting the same problem on PHP's website. While I've got ya here.

    I've immigrated my website from MySQL to MSSQL. MySQL has a Data type called TIME. I was wondering if MSSQL has a similar Data Type I'm using the column to store the opening and closing times of businesses "20:00:00". The reason for this column is that I want to be able to search for businesses that are open right now.

  • Simon_E

    A 'trace' captures all of the SQL commands that are issued, just before they are executed against the database, allowing a review of what's happening. The tool used to trace SQL is called SQL Profiler. To access SQL Profiler:

    Start -> Programs -> Microsoft SQL Server 2005 -> Performance Tools -> SQL Server Profiler.

    The default database trace should capture your SQL without much configuration. Follow these steps from within SQL Profiler:
    1. File -> New Trace
    2. Connect to your database when the dialog appears (best to connect with sa - System Administrator - privileges to ensure nothing's missed). To do so, just use the Windows account that you used to install SQL server - it should have admin privilieges by default.
    3. When the Trace properties dialog appears, select "Standard" as the "Use the template" option drop-down list. Then, click "Run".
    4. Depending on the use of your database, all sorts of SQL commands being executed start to appear in the trace list.
    5. Execute your PHP page that inserts your data, then return to the trace screen, and press the pause ( ) or stop (■) button.
    6. Review your data to find the statement that inserts your data. You'll notice that when you select a line in the top section, details of the SQL code executed appears in the bottom pane. Use this to determine whether the SQL statement executed contains Unicode chinese characters, or garble in another encoding.
    7. Notice that under Tools-> Options from within the Profiler, you can change the font -> Use an appropriate font (eg. Arial Unicode MS) to ensure your characters are displayed correctly.



  • Eburon

    Sorry, man been really busy with work and stuff. Here's the link..

    http://bugs.php.net/bug.php id=18169

  • Frank Winter

    Interesting. Can you give the actual link for the article that you found

    It seems that there is some sort of code page issue. I haven't done all the reading, but it seems to relate to the byte order in which the unicode characters are stored. Here's a link from the WIKIpedia Code page article: http://en.wikipedia.org/wiki/Code_page:
    I assume the "LE" refers to Little Endian, and "BE" refers to Big endian.

    I'll have to do some more reading on this when I get some time...


  • StevenR2

    From what you've said above, it seems to me that if you can successfully retrieve the data from the database, and it ouputs your Chinese correctly, then you've:
    • correctly set your DB columns to be nvarchar or ntext
    • correctly referencing unicode strings using N'string contents' -type statements
    I believe the issue here is simply the font that you are using to display the data from within Enterprise Manager. I've uninstalled Enterprise Manager myself, as I've moved from SQL 2000 to SQL 2005, but I do remember that it is possible to change the font. Have a look through the menu options in SQL Server Enterprise Manager, find the option to change the font, and set it to a font that contains glyphs for your characters (Arial Unicode MS is a good one).


  • simon mcinnes

    Then you've got an issue with different encodings somewhere. The database will just store what you input into it - you can either store single, or double-byte characters. The collation on your fields will also make a difference - check the collation on your table by issuing the following command:

    sp_help my_tblBusiness

    Check the collation in the second result set for the two columns in which you are attempting to store unicode characters. Perhaps try using Chinese_Taiwan_Stroke_90_BIN.

    Also check that the data being inserted into the database is correct. Do a trace on your database, set the font in the Trace tool to a unicode-enabled font (Arial Unicode MS), and then run your code that performs the insert. Ensure that your code is being inserted as unicode, and not as BIG5 or another encoding.

    I would suggest that there's some automatic character translation going on somewhere - you just need to find out where it is.



  • debeerBiz

    The only problem with using Chinese_Taiwan_Stroke_90_BIN is that what happens when I use try to insert Japanese, Persian, and Simiplied Chinese into the DB

    I remember reading something somewhere about UCS-2 encoding on PHP side. Do you know what that is

  • Curt Carpenter

    There you go - so it's not SQL Server, it's your client program. You'll need to review the method by which you accept the data from the client (web page), and insert it, and track down the point at which it's being (incorrectly) encoded. Once you've done that, try to insert it again, and if problems persist, check the trace data again.


  • Using Unicode in SQL Server