Newbie-What is the proper way to handle special characters in XML?

I'm having a problem with single quotes in my XML document. I have Xml that resembles something like below:

< xml version="1.0" encoding="utf-8" >
<root>
<MyText>Text contain's a single quote</MyText>
</root>


This content resides in a file and when I extract the xml and pass it to a stored procedure, instead of the entire xml being sent to the stored procedure, it is being truncated because of the single quote in the word "contain's" and is causing the stored procedure to error out.

What is the best way to handle this The xml above is being created via the XmlSerializer class. Is there a way to instruct the serializer to encode all of these values as it serializes the content to file Thanks for the help!


Answer this question

Newbie-What is the proper way to handle special characters in XML?

  • Alejo Mejia

    Yeah I pretty much figured I could do a search and replace on the characters, but my main question was is there a way to instruct the XmlSerializer to do this for me as it creates the xml

    Seems like I remember reading somewhere that you should substititute single quotes with "&quote;" (or something like that) and less than or greater than with "&lt;" (or something like that). Does this ring a bell with anyone

  • no.a.m

    Bo146
    You should not be using Sql queries to pass parameters to stored procedures or Sql statements

    You should instead be using SqlParameters - this way you dont ever have to worry about sql injections, escaping characters or etc



    SqlCommand TmpCommand = new TmpCommand("MyStoredProcedure", DbConnection);
    TmpCommand.CommandType = SqlCommandType.StoredProcedure
    TmpCommand.Parameters.Add(new SqlParameter("@ParameterName", (Object) Value));
    TmpCommand.ExecuteNonQuery();

    This should solve most problems that you have regarding escaping characters. This makes your application more reliable and more secure because
    the parameters are put in a seperate buffer. A buffer that is only used for data. And it just grabs it and shoves it into the database in the appropriate spots. Much more secure, probably a bit faster and definitely less of a headache once you get used to doing it.

  • Zau

    http://www.w3.org/TR/REC-xml/ section 2.4 talks about character entities, but I don't think this will solve your problem.

    It sounds like you are building up a SQL string using string concatenation, which is why you are running into the problem with the single quote. XML entities will not solve this for all cases since the set of characters that are significant for SQL and XML are different. To do this safely you should use parameters to pass the data to your stored procedure.

    Here is an example of using parameters from .Net with SqlClient:

    http://msdn.microsoft.com/library/default.asp url=/library/en-us/cpref/html/frlrfsystemdatasqlclientsqlparametercollectionclasstopic.asp



  • gamer36

    In case it helps, here is what the xml looks like. I put in a break point in the code and this is the value of doc.OuterXml

    < xml version="1.0" >
    <utility xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <name>LogImporter</name>
    <storenumber>SSG</storenumber>
    <computername>S-JHARRIS-D</computername>
    <processes>
    <Process polled="false">
    <status>Failed</status>
    <version>1.0.0.0</version>
    <starttime>2006-05-23T10:03:34.2505975-07:00</starttime>
    <endtime>2006-05-23T10:03:35.4068845-07:00</endtime>
    <messages>
    <message>
    <messagetext>Processing file: C:\StoreXml\ArchiveSnapShot.xml</messagetext>
    <messagelabel/>
    </message>
    </messages>
    <errors>
    <error>
    <errnumber>0</errnumber>
    <errmessage>A .NET Framework error occurred during execution of user defined routine or aggregate 'ParseAndSaveXmlLog':
    System.NullReferenceException: Object reference not set to an instance of an object.
    .</errmessage>
    </error>
    </errors>
    </Process>
    </processes>
    </utility>

  • Brian Cecile

    In your case just replacing quote with two qoutes will to the trick. (str.Replace("'","''");

    I recommend you to look web for SQL Injection attacks.



  • Ark Man

    Thanks to both of you guys for helping me through this. It wasn't the single quotes after all but it was being truncated because the varchar size was not using MAX.

  • Amjosp

    Also if it's an XmlDocument there are the propoerties OuterXml/InnerXml and OuterText/InnerText

    You should notice that the OuterXml/InnerXml will give you everything escaped. I am notr sure what xml classes you are using so you may need to do abit of hunting and reading.

  • DevInstinct

    The quotes should not (and do not for me) cause any issues if you are using stored procedures. The only thing I can see it being truncated for is if the field is not long enough and usually it throws an exception stating it would be truncated .

    How are you verifying that the data is being truncated
    Have you ensured that all of the data types in the db AND in the stored procedure parameter are big enough to accomodate the text

  • Kamran

    What is the error you are seeing

    In your stored procedure what types are the parameters declared as

    What is the Stored procedure doing Anything interesting sp_execute SQL Dynamically constructing SQL

    I don't see a way to control the entitization in the XML serializer, but if the XML is getting parsed, then it is as entitized as it needs to be for parsing. You could use Replace to furthur entitize it.



  • Sanjeev Agrawal

    WHOA NELLY!!!

    I appreciate the responses guys but we are definitely going down the wrong road here. To give you a better idea of whats going on I've posted the code below.
    .
    .
    .
    SqlCommand cmd = new SqlCommand("dbo.ParseAndSaveXmlLog", CurrentConnection);

    cmd.CommandType = CommandType.StoredProcedure;

    foreach ( string fileName in files )
    {
    XmlDocument doc = new XmlDocument();
    doc.Load(fileName);
    cmd.Parameters.Clear();
    cmd.Parameters.AddWithValue("@XmlData", doc.OuterXml);
    cmd.ExecuteNonQuery();
    }


    So as you can see, I'm already using parameters, I'm not worried about sql injection as this is not a web app just a simple script. The script is failing when I call ExecuteNonQuery because the xml is being truncated by the single quote. So to sum it up:

    • The file I am reading in contains Xml
    • The file was created using the XmlSerializer class

    My question is this:
    • Is there a way to instruct the XmlSerializer class to automatically encode all special characters
    OR
    • Is there any class in the framework that I can use to automatically encode all special characters (barring the use of the Replace() )

  • Newbie-What is the proper way to handle special characters in XML?