We are using bulk insert with a formatfile to load a text file into sqlexpress. One field in the text file contains non-ascii (unicode) charaters and the corresponding database field is nvarchar.
When the record and row are specified in the format file as:
<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400" COLLATION="Latin1_General_CI_AS"/>
<COLUMN SOURCE="23" NAME="FullName" xsi:type="SQLNVARCHAR"/>
the value "Boca Cura" gets imported as "Boca CurA!". This is true even with datafiletype set to widenative or widechar, and/or codepage set to raw or acp.
When the field is specfied in the record section of the format file as NCharTerm, the bulk insert terminates immediately with:
Msg 4863, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
Bulk load data conversion error (truncation) for row 2, column 23 (FullName).
This is true regardless of which bulk insert options specified.
What could be going wrong and how can we address it
Thanks in advance...

Bulk Insert Unicode
Stan Lin
Here is a C# example:
// utf8 in, ansi out StreamReader inStream = File.OpenText(@"E:\Data\DEM\Panama\panama_canal_final.txt"); FileStream outStream = File.Create(@"E:\Data\DEM\Panama\panama_canal_fsubset.txt"); Encoding ansi = Encoding.Default; string inLine; byte[] outBytes; while (true){
inLine = inStream.ReadLine();
if (inLine == null) break;outBytes = ansi.GetBytes(inLine +
"\r\n");outStream.Write(outBytes, 0, outBytes.Length);
}
inStream.Close();
outStream.Close();
guykirs
Hi knightEknight!
We solved this with MS using a developer support incident. I should have updated this post when the information was fresh! Just went back through my notes and the readme for the project, and the solution was to convert the input file from UTF8 text encoding to ANSI text encoding. This is a Save As option in Notepad and most text editors. If you have a large file and need some C# code to do this, let me know....
Deepak Puri
Amit Banerjee - MSFT
Instead of converting your files to ANSI as suggested you could also fix your delimiter problem for Unicode files. The TERMINATOR should be '\t\0' instead of just '\t' for tabs and '\r\0\n\0' for the row delimiter if you are using standard CR+LF.
Regards,
Lars
lagu2653
Here's my bulk format file, which reads (row by row) everything from a Unicode-encoded file:
<
xml version="1.0" ><
BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> RECORD> FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="\r\0\n\0"/></
RECORD><
ROW> COLUMN SOURCE="1" NAME="row" xsi:type="SQLNVARCHAR"/></
ROW></
BCPFORMAT>Hope that helps. The target table has an nvarchar column into which the data is inserted.
Regards,
Lars
JF G.P.
Upon input, changing the terminator to '\r\0\n\0' as suggested above did not work for me. Setting DATAFILETYPE to 'char' did not work, and setting CODEPAGE to 850 did not work.
Converting the file to ANSI did work, but this is to be an automated system, so I need a way to either have the file exported as ANSI, or else have a way to tell BULK INSERT to do the conversion.
Any other suggestions, anyone
Sorinuc
Hi,
I see it's been a while since you posted this, but I have just run into this same issue today and I was wondering if you were able to resolve it gracefully
regards