How to check if your sitemap.xml file is UTF-8 encoded?

  • Question

  • I'm having problems with Live Search indexing my company's site. I've read through a bunch of threads here, and a lot of people seem to have the same problem and then miraculously fix it because "my file wasn't UTF-8 encoded and had funky line breaks".

     

    My question is, how the heck do you check your file encoding and line breaks? Please don't suggest some sort of Java/C++ code that I can run to check it... I don't have the required programming resources and I'm not proficient in programming.

     

    Is there a tool / utility to check and ensure proper encoding?

    Thanks,

    Dan

     

    EXTRA INFO:

    • I created the sitemap using xml-sitemaps.com (like everyone else), but I'm still having problems.
    • I use Dreamweaver CS3 and when trying to save the sitemap.xml file (I have NOT changed the actual file I downloaded from xml-sitemaps.com, mind you, I'm just checking DW's save options) there are 2 additional file options... "Unicode Normalization Form" with options (None, C, D, KC, KD) and "Include Unicode Signature (BOM)" with a simple checkbox (yes/no).  Can I overwrite the file with any of these options to ensure proper UTF-8 encoding?
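The "Unicode Normalization Form" option controls how accented characters are stored as code points (for example, é as one precomposed character in form C, or as e plus a combining accent in form D); it does not choose the byte encoding. A minimal sketch using Python's standard unicodedata module, with made-up sample strings, illustrates that for a sitemap containing only ASCII characters every form leaves the text unchanged. This is Python's normalization, not Dreamweaver's save routine.

```python
import unicodedata

def normalization_changes(text: str) -> dict:
    """For each Unicode normalization form, report whether it would alter the text."""
    return {form: unicodedata.normalize(form, text) != text
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

# Sample strings (made up for illustration).
ascii_url = "<loc>http://www.example.com/page.html</loc>"
accented = "caf\u00e9"  # 'café' with a precomposed é (U+00E9)

print(normalization_changes(ascii_url))
# {'NFC': False, 'NFD': False, 'NFKC': False, 'NFKD': False}
print(normalization_changes(accented))
# {'NFC': False, 'NFD': True, 'NFKC': False, 'NFKD': True}
```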
    Wednesday, March 12, 2008 8:41 PM

Answers

  • I think I may have found the answer to my own question.

     

    If you go to http://validator.w3.org/

     

    And enter your sitemap.xml file, it will tell you the file encoding.  In my case, it IS encoded in us-ascii!  Not quite sure how that happened, but at least I know how to test the status now.  I will try to resave the file in UTF-8 and see if everything gets cleared up.
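For anyone who can run a short script, a rough local version of that check can be sketched in Python, assuming you have a downloaded copy of the file. The hypothetical classify_encoding helper looks for the UTF-8 BOM, then tests whether every byte is plain ASCII, then whether the bytes decode as UTF-8. A file containing only ASCII characters is byte-for-byte valid UTF-8 as well, so a us-ascii result is not by itself a sign of a broken file. This sketch only inspects the bytes; the validator may also take the HTTP headers or the XML declaration into account.

```python
def classify_encoding(data: bytes) -> str:
    """Rough classification of raw file bytes (a heuristic, not a full charset detector)."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 with BOM"
    try:
        data.decode("ascii")
        return "us-ascii (also valid utf-8)"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8 (no BOM)"
    except UnicodeDecodeError:
        return "not utf-8"


if __name__ == "__main__":
    import sys
    # Usage (file name is an example): python classify.py sitemap.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", classify_encoding(f.read()))
```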

     

    UPDATE:

    I can't seem to change the file encoding. My sitemap.xml file always shows up as US-ASCII. I googled around for another sitemap.xml file to test and found this...

     

    http://legalindexes.indoff.com/sitemap.xml

     

    ...which is recognized as UTF-8. 

     

    How do I re-encode my sitemap file?

    Wednesday, March 12, 2008 8:54 PM

All replies

  • You can examine the contents of a sitemap.xml file by opening a command prompt window and typing debug sitemap.xml, then d to display a block of 128 bytes, ? for help, and q to quit.

     

    I use Microsoft Notepad and do "save as" with the encoding selected to be UTF-8.

    The first few bytes of the saved file are   EF BB BF 3C 3F 78    ...<?x

    The first 3 bytes, EF BB BF, are the UTF-8 Byte Order Mark (BOM) signature.
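Where debug is not available, a similar first-bytes view can be sketched in Python. This is an illustration, not a reimplementation of debug: the hypothetical hex_head helper prints the first n bytes as hex pairs, followed by a column in which non-printable bytes appear as dots. The sample bytes are the start of the Notepad-saved file quoted above.

```python
def hex_head(data: bytes, n: int = 16) -> str:
    """Format the first n bytes as hex pairs plus a printable-ASCII column."""
    head = data[:n]
    hex_part = " ".join(f"{b:02X}" for b in head)
    # Bytes outside the printable ASCII range (32..126) are shown as '.'
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in head)
    return f"{hex_part}    {text_part}"

sample = b"\xef\xbb\xbf<?xml version"
print(hex_head(sample, 6))  # EF BB BF 3C 3F 78    ...<?x
```

To inspect a real file, read it in binary mode (open(path, "rb")) and pass the bytes to hex_head.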

     

    The end of each line is signalled by 2 bytes, 0D 0A (carriage return, line feed).
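To see which line endings a file actually uses, the bytes can be counted directly. In this minimal sketch (the helper name is hypothetical), CR LF pairs (0D 0A) are counted first, and lone LF (0A) or lone CR (0D) bytes are whatever remains. A generated file that mixes styles would show non-zero counts in more than one bucket.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CR LF pairs, lone LF bytes, and lone CR bytes in raw file data."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # every CR LF pair contains one LF
    cr_only = data.count(b"\r") - crlf  # every CR LF pair contains one CR
    return {"CRLF": crlf, "LF": lf_only, "CR": cr_only}

print(line_ending_counts(b"<urlset>\r\n<url>\n</url>\r\n"))
# {'CRLF': 2, 'LF': 1, 'CR': 0}
```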

     

    If you use a software package to generate your sitemap.xml, why not open the result in Notepad and then use Save As, selecting UTF-8 encoding?

     

    Including the BOM is technically optional for UTF-8 files, but it may be that the MSN sitemap decoder requires the BOM at the front of the file. I don't know.
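If re-saving through Notepad is not convenient, the same kind of re-encoding can be sketched in Python. This is an illustration, not what Notepad does internally. It assumes you know the file's current encoding (the hypothetical source_encoding parameter, UTF-8 by default, since a US-ASCII file is already valid UTF-8), uses Python's utf-8-sig codec to prepend the EF BB BF BOM when add_bom is true, and rewrites every line ending as CR LF (0D 0A) to match the Notepad output described above. The output file name in the usage comment is made up.

```python
def to_utf8(data: bytes, source_encoding: str = "utf-8",
            add_bom: bool = True, newline: str = "\r\n") -> bytes:
    """Decode data, normalize line endings, and re-encode as UTF-8 (optionally with BOM)."""
    text = data.decode(source_encoding)
    if text.startswith("\ufeff"):
        text = text[1:]  # drop any existing BOM so we never write two
    # Collapse CR LF and lone CR to LF, then expand to the requested style.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if newline != "\n":
        text = text.replace("\n", newline)
    # The utf-8-sig codec writes the EF BB BF signature before the encoded text.
    return text.encode("utf-8-sig" if add_bom else "utf-8")


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python to_utf8.py sitemap.xml  -> writes sitemap.utf8.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        out_path = path.rsplit(".", 1)[0] + ".utf8.xml"
        with open(out_path, "wb") as f:
            f.write(to_utf8(data))
```

Setting add_bom=False produces UTF-8 without the signature, which corresponds to leaving Dreamweaver's "Include Unicode Signature (BOM)" box unchecked.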

     

    Best regards, Eric.

    Thursday, March 13, 2008 2:58 PM