Bulk Insert Unicode

Good day,

We are using bulk insert with a formatfile to load a text file into sqlexpress. One field in the text file contains non-ascii (unicode) charaters and the corresponding database field is nvarchar.

When the record and row are specified in the format file as:

<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400" COLLATION="Latin1_General_CI_AS"/>
<COLUMN SOURCE="23" NAME="FullName" xsi:type="SQLNVARCHAR"/>

the value "Boca Cura" gets imported as "Boca CurA!". This is true even with datafiletype set to widenative or widechar, and/or codepage set to raw or acp.

When the field is specfied in the record section of the format file as NCharTerm, the bulk insert terminates immediately with:

Msg 4863, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
Bulk load data conversion error (truncation) for row 2, column 23 (FullName).

This is true regardless of which bulk insert options specified.

What could be going wrong and how can we address it

Thanks in advance...



Answer this question

Bulk Insert Unicode

  • Stan Lin

    Here is a C# example:

    // utf8 in, ansi out

    StreamReader inStream = File.OpenText(@"E:\Data\DEM\Panama\panama_canal_final.txt");

    FileStream outStream = File.Create(@"E:\Data\DEM\Panama\panama_canal_fsubset.txt");

    Encoding ansi = Encoding.Default;

    string inLine;

    byte[] outBytes;

    while (true)

    {

    inLine = inStream.ReadLine();

    if (inLine == null)

    break;

    outBytes = ansi.GetBytes(inLine + "\r\n");

    outStream.Write(outBytes, 0, outBytes.Length);

    }

    inStream.Close();

    outStream.Close();


  • guykirs

    Hi knightEknight!

    We solved this with MS using a developer support incident. I should have updated this post when the information was fresh! Just went back through my notes and the readme for the project, and the solution was to convert the input file from UTF8 text encoding to ANSI text encoding. This is a Save As option in Notepad and most text editors. If you have a large file and need some C# code to do this, let me know....


  • Deepak Puri

    Thanks! My input comes from various sources and I've pretty much concluded the same thing. I'll just have to find a way to convert all the input to ANSI. - Regards
  • Amit Banerjee - MSFT

    Instead of converting your files to ANSI as suggested you could also fix your delimiter problem for Unicode files. The TERMINATOR should be '\t\0' instead of just '\t' for tabs and '\r\0\n\0' for the row delimiter if you are using standard CR+LF.

    Regards,

    Lars



  • lagu2653

    Here's my bulk format file, which reads (row by row) everything from a Unicode-encoded file:

    < xml version="1.0" >

    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <RECORD>

    <FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="\r\0\n\0"/>

    </RECORD>

    <ROW>

    <COLUMN SOURCE="1" NAME="row" xsi:type="SQLNVARCHAR"/>

    </ROW>

    </BCPFORMAT>

    Hope that helps. The target table has an nvarchar column into which the data is inserted.

    Regards,

    Lars



  • JF G.P.

    I am having about the same problem with improrting a Unicode file using BULK INSERT. This file was made by SQL Server and the person providing the file has not been able to get it to output to ANSI.

    Upon input, changing the terminator to '\r\0\n\0' as suggested above did not work for me. Setting DATAFILETYPE to 'char' did not work, and setting CODEPAGE to 850 did not work.

    Converting the file to ANSI did work, but this is to be an automated system, so I need a way to either have the file exported as ANSI, or else have a way to tell BULK INSERT to do the conversion.

    Any other suggestions, anyone


  • Sorinuc

    Hi,

    I see it's been a while since you posted this, but I have just run into this same issue today and I was wondering if you were able to resolve it gracefully

    regards


  • Bulk Insert Unicode