Hi Kaya,
Yes, if you don't use a front-end web server (i.e. Apache or nginx), you should
put robots.txt directly into the webapps/ROOT directory of Tomcat (provided
Tomcat listens on port 80). After that, you can simply test your setup by
trying to open http://youdomain.org/robots.txt in a browser. If you can't find
it this way, bots won't find it either.
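If you prefer to script that check rather than use a browser, here is a minimal Python sketch. It only assumes the standard library; the function names robots_url and robots_is_reachable are made up for this example, and youdomain.org is just the placeholder domain from above.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site):
    """Return the robots.txt URL at the root of the given site."""
    # A leading slash makes urljoin discard any path already in `site`,
    # because crawlers only look for robots.txt at the root of the host.
    return urljoin(site, "/robots.txt")


def robots_is_reachable(site, opener=urlopen):
    """Return True if robots.txt at the site root answers with HTTP 200."""
    try:
        with opener(robots_url(site), timeout=10) as response:
            return response.status == 200
    except (URLError, OSError):
        # DNS failure, refused connection, 404, timeout: bots would fail too.
        return False


# Usage (placeholder domain, replace with your own):
#   print(robots_is_reachable("http://youdomain.org/"))
```

The opener parameter is only there so the function can be exercised without a network connection; in normal use you can leave it out.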
Concerning the Disallow directives, it is your choice which content you let
the bots index. My advice would be to make an inventory of the spaces and
actions you don't want indexed.
You could take this one as an example:
http://cdlsworld.xwiki.com/robots.txt
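To check a set of Disallow rules before deploying them, something like the following Python sketch could help. The rules below are only an illustration of the "inventory" idea (two actions plus the Photos space from your question), not a recommended or official XWiki configuration, and build_parser is a helper name invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block two actions and one space for all bots.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /xwiki/bin/edit/
Disallow: /xwiki/bin/export/
Disallow: /xwiki/bin/view/Photos/
"""


def build_parser(robots_txt):
    """Parse robots.txt content given as a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Paths chosen for the example; adapt them to your own wiki.
for path in (
    "/xwiki/bin/view/Main/WebHome",
    "/xwiki/bin/view/Photos/Album",
    "/xwiki/bin/edit/Main/WebHome",
):
    verdict = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(path, "->", verdict)
```

Keep in mind that robots.txt is only honoured by well-behaved crawlers, so this tells you what polite bots will skip, not what bad robots will do.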
Finally, it's funny that you're asking whether bots could harass
your server, because almost everyone wants them (except for bad robots) to
come and index their websites :-)
Anyway, I don't think robots will account for a significant amount of traffic.
But the users who find your content through search engines will ;-) I
guess that's what you want.
Regards,
--
Guillaume Fenollar
XWiki SysAdmin
Tel : +33 (0)1.83.62.65.97
2012/1/8 Kaya Saman <kayasaman(a)gmail.com>
Hi,
in the XWiki documentation for the robots.txt file it says to put it in
the web server configuration:
http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Performances#HRobots.txt
On Tomcat, where would this go? Directly in the webapps/ROOT/ directory?
Also, regarding the directives used, it says:
# It could be also usefull to block certain spaces from crawling,
# especially if this spaces doesn't provide new content
Should the /xwiki/bin/view/Photos/ portion also be excluded?
Just as a last thing, what kind of performance benefits would be gained
by stopping crawlers?
I am imagining: CPU, RAM, network bandwidth...
Regards,
Kaya
_______________________________________________
users mailing list
users(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/users