On 01/08/2012 03:53 PM, Guillaume Fenollar wrote:
Hi Kaya,
Yes, if you don't use any front web server (e.g. Apache or nginx), you should
put robots.txt directly into the /ROOT directory of Tomcat (if it listens
on port 80). After that, you can simply test your setup by trying to fetch
http://youdomain.org/robots.txt. If you can't find it this way, bots won't
find it either.
Thanks for the response Guillaume!
I found a site:
http://www.frobee.com/robots-txt-check
which actually tests the compliance of robots.txt files, and it seems mine are
fine.
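
For a quick local check alongside an online validator, the fetch test described above can be sketched in a few lines of Python. This is only a minimal sketch: the helper names robots_url and robots_reachable are made up for illustration, and the placeholder domain is the one from the quoted advice.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt URL at the root of the given site."""
    return urljoin(site, "/robots.txt")


def robots_reachable(site: str, timeout: float = 10.0) -> bool:
    """Return True if robots.txt answers with HTTP 200 at the site root."""
    try:
        with urlopen(robots_url(site), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # HTTPError (e.g. 404), URLError and timeouts all derive from OSError.
        return False
```

Calling robots_reachable("http://youdomain.org") performs the same test a bot would: if it returns False, crawlers will not find the file either.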
Concerning the disallow directives, it is your choice what to let the bots
index. My advice would be to make an inventory of the spaces
and actions you don't want indexed.
You could take this one as example:
http://cdlsworld.xwiki.com/robots.txt
I took a look at it and will compare it to the example from the XWiki site.
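
When comparing rule sets, it can help to check which URLs a given set of Disallow lines actually blocks. The sketch below uses Python's standard urllib.robotparser for that. The rules and URL paths in it are illustrative assumptions, not the contents of either robots.txt mentioned above, and is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block edit and export actions, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /xwiki/bin/edit/
Disallow: /xwiki/bin/export/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


view_url = "http://youdomain.org/xwiki/bin/view/Main/WebHome"
edit_url = "http://youdomain.org/xwiki/bin/edit/Main/WebHome"
print(is_allowed(ROBOTS_TXT, "Googlebot", view_url))
print(is_allowed(ROBOTS_TXT, "Googlebot", edit_url))
```

Running the inventory of unwanted spaces and actions through a check like this shows whether each one is really covered before the file goes live.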
Finally, it's funny you're asking whether bots could harass
your server, because almost everyone wants them (except for bad robots) to
come and index their websites :-)
Anyway, I don't think robots will account for a noticeable amount of traffic.
But the users who find your content through search engines will ;-) I
guess that's what you want.
It's not that I don't want things to be indexed or viewed, but I am getting
a strange issue on one of my XWiki sites. Whenever I load the site,
i.e. start Tomcat, the memory usage is really low (~600MB); then after a
while the CPU starts working a little (~10%) and the memory consumed
by the process jumps up to 1.6GB. There's not much on that site to
begin with. My wiki site has more information, images, etc.
than this site, which is my www site, yet the www site is consuming far
more memory.
I'm not really sure how to even begin debugging this. I have both
webalizer and awstats running on my Squid reverse proxy in front of
Tomcat. So far awstats, which has been running since the beginning (3rd
Jan this year), shows nearly 9000 hits :-S, a lot of which come from
Googlebot.
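
To put a number on the crawler share, the access log can be tallied by user agent. The sketch below assumes the proxy writes a combined-style log where the user agent is the last quoted field on each line. That is an assumption, since Squid's native log format does not include it. The sample lines are made up for illustration and are not taken from my logs.

```python
import re
from collections import Counter

# The user agent is assumed to be the last double-quoted field on the line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(lines, bots=("Googlebot", "bingbot")):
    """Count hits per named bot; every other parseable line counts as 'other'."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if match is None:
            continue  # skip lines without a trailing quoted field
        agent = match.group(1).lower()
        for bot in bots:
            if bot.lower() in agent:
                counts[bot] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Made-up sample lines in combined log format.
SAMPLE_LOG = [
    '192.0.2.10 - - [08/Jan/2012:15:53:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [08/Jan/2012:15:53:05 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [08/Jan/2012:15:54:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; FreeBSD amd64)"',
]

print(count_bot_hits(SAMPLE_LOG))
```

Feeding it the real log, for example count_bot_hits(open(path)) with path pointing at the access log, gives a rough split between crawler and other traffic that can be compared with the awstats figures.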
That was my only issue.
The URLs of both sites are here:
http://www.optiplex-networks.com
http://wiki.optiplex-networks.com
and footprints are shown here:
  PID JID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
51547  22 www       46  44    0  3545M  1590M ucond  1   6:04  0.00% java
28878  14 www       49  44    0  3544M   404M ucond  0   3:47  0.00% java
with JID 14 being the wiki site and JID 22 being the www site.
Regards,
Kaya