Over the last couple of weeks we have had some minor issues again, but no
more near-complete wipes. In the meantime, I switched on cookie logging in our
Apache proxy and enabled authentication logging in our XWiki as follows:
Added to /etc/xwiki/classes/logback.xml:
  <logger name="com.xpn.xwiki.user.impl.xwiki.XWikiAuthServiceImpl" level="info"/>
  <logger name="com.xpn.xwiki.user.impl.LDAP.XWikiLDAPAuthServiceImpl" level="info"/>
Unfortunately, the cookie and authentication logs did not give us any new
clues. The Cookie header of the Google crawler's requests is empty, including
for the critical edits of pages that are protected against guest access.
One really spooky incident from this week:
one of us got an e-mail notification that the AdminSheet
(XWiki.XWikiPreferences) had been edited by himself (with his admin
account). This time, however, I couldn't find any request for
XWikiPreferences in the request log, and according to the auth log, no one
was logged in. But during this edit, the Google crawler was active on our
site.
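To double-check the request log around a window like this, a small script can list every crawler request that hit something other than a read-only action, or that touched XWikiPreferences at all. This is a minimal sketch, assuming the proxy writes Apache's "combined" log format, that wiki URLs follow XWiki's default /xwiki/bin/<action>/<Space>/<Page> layout, and that the crawler identifies itself with "Googlebot" in its User-Agent. The function name and the set of read-only actions are my own illustrative choices, not anything XWiki defines.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" log format (assumption about the proxy's LogFormat):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: default XWiki URL layout /xwiki/bin/<action>/<Space>/<Page>
ACTION_RE = re.compile(r'^/xwiki/bin/(?P<action>[^/]+)/')

# Illustrative list of actions that only read content; anything else
# (edit, save, delete, rollback, ...) is flagged for a closer look.
READ_ONLY_ACTIONS = {"view", "download", "skin", "ssx", "jsx", "viewattachrev"}


def suspicious_crawler_hits(lines, agent_marker="Googlebot"):
    """Yield (time, method, action, raw_path, status) for crawler requests
    that used a non-read-only action or mention XWikiPreferences."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m or agent_marker not in m.group("agent"):
            continue  # unparsable line, or not the crawler
        path = urlsplit(m.group("path")).path  # drop the query string
        a = ACTION_RE.match(path)
        action = a.group("action") if a else None
        if (action and action not in READ_ONLY_ACTIONS) or "XWikiPreferences" in path:
            yield (m.group("time"), m.group("method"), action,
                   m.group("path"), m.group("status"))
```

It could be run over the log with something like `for hit in suspicious_crawler_hits(open("access.log")): print(hit)`, where the file name is a placeholder for wherever the proxy writes its request log.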
jerem wrote:
For my wiki, we used to have a crawler like this (sniffing out every link
there is); then, because of the time wasted trying to retrieve pages and
check rights, I wrote small scripts to provide an indexing page to the
crawler.
We considered preventive measures like this, but in my opinion it doesn't
feel right to leave such a hole in the site. What if other crawlers don't
fetch the sitemap, or random visitors get the chance to delete content under
some special circumstances? At the moment, the crawler traffic isn't much of
a performance problem for us. A precomputed sitemap for our site is on my
agenda, but it is not urgent. The crawler issue itself, however, is giving
me headaches.
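For the precomputed sitemap, the generation step itself is small. A minimal sketch using only Python's standard library, producing a document in the sitemaps.org 0.9 format; the base URL and the (space, page) list are placeholders, and obtaining the list of guest-viewable pages from the wiki is deliberately left out. It also assumes the default /xwiki/bin/view/<Space>/<Page> URL layout.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace defined by the sitemaps.org protocol, version 0.9
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, pages):
    """Return a sitemap document (as a str) with one <url><loc> per page.

    `pages` is an iterable of (space, page) name pairs. Only view URLs
    are emitted; no edit/save/delete URLs ever appear in the output.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for space, page in pages:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Assumption: default /xwiki/bin/view/<Space>/<Page> URL layout;
        # quote() percent-encodes spaces and other unsafe characters.
        loc.text = f"{base_url.rstrip('/')}/xwiki/bin/view/{quote(space)}/{quote(page)}"
    return ET.tostring(urlset, encoding="unicode")
```

Listing only view URLs keeps the crawler's starting points harmless, but it does not close the hole described above: a crawler can still follow links found on the pages themselves.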
Maybe we should try upgrading to a newer XWiki version, if only to get rid
of the LDAP extension. But without pinning down the real cause of these
issues, I will still have a bad feeling about it.
We aren't even able to reproduce the problem, which makes analyzing it
nearly impossible for us.
I would be grateful for any new hints and tips on things to try. At the
moment we are really at a loss.
--
View this message in context:
http://xwiki.475771.n2.nabble.com/severe-trouble-with-web-crawlers-tp744216…
Sent from the XWiki- Users mailing list archive at
Nabble.com.