[xwiki-users] support for google sitemaps and webmaster tools? (and why do xwiki RDF's give "unsupported file format"?)
Has anybody figured out a way to get XWiki to generate a Google sitemap as an alternative to RSS feeds? (See http://www.google.com/webmasters/tools/docs/en/protocol.html .) Also, video sitemaps ( http://www.google.com/support/webmasters/bin/answer.py?answer=80472&cbid=sw9... ) would be useful for indexing XWiki attachments and media on media search engines ( http://video.google.com/ ). Video sitemaps are predicated on mRSS:

*Using an mRSS feed as a Video Sitemap*

Google supports mRSS <http://search.yahoo.com/mrss>, an RSS module that supplements the element capabilities of RSS 2.0 <http://cyber.law.harvard.edu/rss/rss.html> to allow for more robust media syndication. If you publish an mRSS feed for the video content on your site, you can submit the feed's URL as a Sitemap. For detailed information on creating an mRSS feed, including samples and best practices, please see the Media RSS specification <http://search.yahoo.com/mrss>. Google also supports RSS 2.0 using enclosures tags for video content and thumbnail urls.

Sitemaps seem to use their own protocol:

*XML Sitemap Format*

The Sitemap Protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped <https://www.google.com/webmasters/tools/docs/en/protocol.html#escaped>. The file itself must be UTF-8 encoded.

A sample Sitemap that contains just one URL and uses all optional tags is shown below. The optional tags are in italics (here: <lastmod>, <changefreq>, and <priority>).

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>

The Sitemap must:

- Begin with an opening <urlset> tag and end with a closing </urlset> tag.
- Include a <url> entry for each URL as a parent XML tag.
- Include a <loc> child entry for each <url> parent tag.

Niels
http://nielsmayer.com

PS: Google sitemap support sounds like a good entry-level GSoC project. :-) Probably just a big hack to existing code in Main.WebRss?xpage=rdf and Main.BlogRss?xpage=rdf. Probably a 1 day project for someone that knows how....

PPS: Why does http://nielsmayer.com/xwiki/bin/view/Main/WebRss?xpage=rdf work in many places and display correctly in Firefox 3, but fail when used to generate a sitemap in Google Webmaster Tools? (Meanwhile, Roller blogger's Atom feed succeeds):

    roller/NielsMayer/feed/entries/atom      Atom Feed   5 hours ago   OK       32
    xwiki/bin/view/Main/BlogRss?xpage=rdf    --          5 hours ago   Errors   --
    xwiki/bin/view/Main/WebRss?xpage=rdf     --          5 hours ago   Errors

The reported problem coming from XWiki's xpage=rdf feeds:

*Unsupported file format*
Your Sitemap does not appear to be in a supported format. Please ensure it meets our Sitemap guidelines and resubmit. Help <http://www.google.com/support/webmasters/bin/answer.py?answer=35738&hl=en>
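For illustration only, the Sitemap rules quoted above (a <urlset> root, one <url> per page, a mandatory <loc> child, entity-escaped values, UTF-8 output) can be sketched in a few lines of Python. This is not XWiki code and not the XmlSitemapGeneratorSnippet; the function name build_sitemap and the shape of the input dictionaries are assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Build a Sitemap document as UTF-8 bytes.

    `entries` is a hypothetical input shape: an iterable of dicts with a
    required 'loc' key and optional 'lastmod', 'changefreq', 'priority'.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is mandatory; a missing key raises KeyError.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    # ElementTree entity-escapes text content (e.g. & becomes &amp;).
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
         "changefreq": "monthly", "priority": 0.8},
        {"loc": "http://www.example.com/view?a=1&b=2"},
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

The xml_declaration argument to ET.tostring needs Python 3.8 or later. A real generator would take the page list and modification dates from the wiki itself, which this sketch does not attempt.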
Niels Mayer wrote:
PPS: why does http://nielsmayer.com/xwiki/bin/view/Main/WebRss?xpage=rdf work in many places, display correctly in firefox 3, but when used to generate a sitemap in google webmaster tools, it fails. (Meanwhile roller blogger's RSS feed succeeds):

    roller/NielsMayer/feed/entries/atom      Atom Feed   5 hours ago   OK       32
    xwiki/bin/view/Main/BlogRss?xpage=rdf    --          5 hours ago   Errors   --
    xwiki/bin/view/Main/WebRss?xpage=rdf     --          5 hours ago   Errors
The reported problem coming from xwiki's xpage=rdf feeds:
*Unsupported file format*
Just an idea... maybe they expect the ".xml" extension?

Jerome
_______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs
On Thu, Mar 12, 2009 at 1:44 PM, Jerome Velociter <[email protected]> wrote:
The reported problem coming from xwiki's xpage=rdf feeds:
*Unsupported file format*
Just an idea... maybe they expect the ".xml" extension ?
Thanks for the hint. The other feeds don't have an .xml extension; however, I believe your answer is close to the problem. No matter what I did in my setup, I was getting:

<?xml version="1.0" encoding="ISO-8859-1" ?>
The roller blog atom feed that *does* work correctly w/ google sitemaps returns:
<?xml version="1.0" encoding='utf-8'?>
I fixed this issue by running java with -Dfile.encoding=UTF-8 (note that the lowercase setting suggested in http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Performances seems incorrect?). When that alone didn't work, I also added "-Djavax.servlet.request.encoding=UTF-8 -DjavaEncoding=UTF-8", which had been suggested for solving this problem for other Tomcat users. I now run java with the following options:

    -server -Xms160m -Xmx1024m -XX:PermSize=160m -XX:MaxPermSize=320m -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8 -DjavaEncoding=UTF-8 -Djava.awt.headless=true

I also saw other suggestions to set LANG="en_US.UTF-8" in the Tomcat launch script. I'm not sure which of my changes "did" it, but I believe that the following two steps, which I'd forgotten or skipped in http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Encoding , caused the correct encoding to be used:

(1) WEB-INF> diff web.xml.~1~ web.xml
    23c23
    < <param-value>ISO-8859-1</param-value>
    ---
    > <param-value>UTF-8</param-value>

(2) WEB-INF> diff xwiki.cfg.~2~ xwiki.cfg
    29c29
    < xwiki.encoding=ISO-8859-1
    ---
    > xwiki.encoding=UTF-8
With all of the above now reconfigured, I now get the correct output for http://nielsmayer.com/xwiki/bin/view/Main/WebRss?xpage=rdf :

<?xml version="1.0" encoding="UTF-8" ?>

I'll find out whether changing the encoding fixes the *"Unsupported file format"* error when passing an XWiki RDF feed to Google Webmaster Tools ( http://www.google.com/webmasters/tools/docs/en/about.html ).

-- Niels
http://nielsmayer.com

PS: Why not just have xwiki.cfg's default be 'xwiki.encoding=UTF-8', and likewise have web.xml's default for com.xpn.xwiki.web.SetCharacterEncodingFilter's 'encoding' be UTF-8? These encoding errors often go unnoticed, and they are probably causing a number of configuration errors, and perhaps also bug reports that aren't entirely valid because they stem from encoding issues.
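The check described above (reading the encoding named in a feed's XML declaration) can also be scripted. The following is a minimal sketch in Python, using the three declarations quoted in this message as sample input; the function name declared_encoding is made up for illustration, and the sample bodies after each declaration are placeholders, not real feed content.

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration,
# accepting either single or double quotes around the value.
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']'
)


def declared_encoding(document: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration (upper-cased),
    or None if the document has no declaration or names no encoding."""
    match = _XML_DECL.match(document[:200])
    if match is None:
        return None
    return match.group(1).decode("ascii").upper()


if __name__ == "__main__":
    samples = {
        "xwiki before": b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n<placeholder/>',
        "xwiki after": b'<?xml version="1.0" encoding="UTF-8" ?>\n<placeholder/>',
        "roller atom": b"<?xml version=\"1.0\" encoding='utf-8'?>\n<placeholder/>",
    }
    for label, document in samples.items():
        print(label, "->", declared_encoding(document))
```

This only reports what the document claims about itself. It does not verify that the bytes are actually valid in that encoding, nor does it look at the HTTP Content-Type header, which may name a different charset.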
Partially answering my own question... there is some support for Sitemaps in XWiki:
http://code.xwiki.org/xwiki/bin/view/Snippets/XmlSitemapGeneratorSnippet

The above might not even be necessary, since Google builds sitemaps out of RSS feeds as well. However, that is not working with XWiki's RSS, only with another application's "Atom" format feed.

I have three XWiki feeds that work in my Firefox browser:
http://nielsmayer.com/xwiki/bin/view/Blog/GlobalBlogRss?xpage=plain
http://nielsmayer.com/xwiki/bin/view/Main/TagsRss?xpage=plain
http://nielsmayer.com/xwiki/bin/view/Main/WebRss?xpage=plain

Yet when given to Google Webmaster Tools <https://www.google.com/webmasters/tools/docs/en/about.html>, these feeds turn up errors, even though I corrected the encoding of the files to UTF-8 (see previous message):

    roller/NielsMayer/feed/entries/atom             Atom Feed   Mar 14, 2009   OK       32
    xwiki/bin/view/Blog/GlobalBlogRss?xpage=plain   --          16 hours ago   Errors   --
    xwiki/bin/view/Main/TagsRss?xpage=plain         --          Mar 14, 2009   Errors   --
    xwiki/bin/view/Main/WebRss?xpage=plain          --          12 hours ago   Errors   --

The specific error continues to be:

- *Unsupported file format*
  Your Sitemap does not appear to be in a supported format. Please ensure it meets our Sitemap guidelines and resubmit. Help <http://www.google.com/support/webmasters/bin/answer.py?answer=35738&hl=en>

Any further suggestions? Is there a way to get XWiki to output Atom instead of RSS? Perhaps Google will accept Atom-based feeds but not RSS? The docs indicate it handles RSS feeds.

Niels
http://nielsmayer.com
Hi Niels,

Niels Mayer wrote:
Is there a way to get XWiki to output 'atom' instead of 'RSS'? Perhaps google will accept atom-based feeds but not RSS? The docs indicate it handles RSS feeds.
See http://tinyurl.com/cps7oa and http://tinyurl.com/d3gvvc .

Hope this helps,
Marius
Niels Mayer wrote:
The specific error continues to be:
- *Unsupported file format* Your Sitemap does not appear to be in a supported format. Please ensure it meets our Sitemap guidelines and resubmit. Help <http://www.google.com/support/webmasters/bin/answer.py?answer=35738&hl=en>
Any further suggestions??
XWiki outputs RSS 1.0. Google expects RSS 2.0. The differences between the two are major: they are better seen as two different formats than as two versions of the same format.
Is there a way to get XWiki to output 'atom' instead of 'RSS'? Perhaps google will accept atom-based feeds but not RSS? The docs indicate it handles RSS feeds.
-- Sergiu Dumitriu http://purl.org/net/sergiu/
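One quick way to see the distinction drawn above between RSS 1.0 and RSS 2.0 is to look at a feed's root element: RSS 1.0 is an RDF document with an rdf:RDF root, RSS 2.0 has an un-namespaced rss root carrying a version attribute, and Atom has a feed root in the Atom namespace. The sketch below is a rough Python illustration under those assumptions; the function name feed_format and the inline sample documents are invented for this example, and it says nothing about which formats any particular service accepts.

```python
import xml.etree.ElementTree as ET

# Root element (in ElementTree's {namespace}tag notation) -> format label.
ROOT_FORMATS = {
    "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF": "RSS 1.0 (RDF)",
    "{http://www.w3.org/2005/Atom}feed": "Atom",
    "{http://www.sitemaps.org/schemas/sitemap/0.9}urlset": "XML Sitemap",
}


def feed_format(document: bytes) -> str:
    """Classify a feed by its root element only (no further validation)."""
    root = ET.fromstring(document)
    if root.tag == "rss":
        # RSS 0.9x/2.0 use an un-namespaced <rss version="..."> root.
        return "RSS " + root.get("version", "(no version attribute)")
    return ROOT_FORMATS.get(root.tag, "unknown root: " + root.tag)


if __name__ == "__main__":
    rss1 = (b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
            b'xmlns="http://purl.org/rss/1.0/">'
            b'<channel rdf:about="http://www.example.com/"/></rdf:RDF>')
    rss2 = b'<rss version="2.0"><channel/></rss>'
    atom = b'<feed xmlns="http://www.w3.org/2005/Atom"/>'
    for name, doc in (("rss1", rss1), ("rss2", rss2), ("atom", atom)):
        print(name, "->", feed_format(doc))
```

Fetching a feed's bytes (for example with urllib.request) and passing them to feed_format would show which of these families a given URL belongs to before submitting it anywhere.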
participants (4)
- Jerome Velociter
- Marius Dumitru Florea
- Niels Mayer
- Sergiu Dumitriu