Why not simply use robots.txt? It is made for this, no?
I noticed xwiki.org lagging and the Java process using a lot of CPU. RAM usage was high too, but that didn't seem to be a major issue.
I noticed this: - - [07/Jan/2012:21:13:00 +0100] "GET /xwiki/monitoring?part=graph&graph=sql34bb8b9d9f525e0790dab487491d120d4bc685cd&period=annee HTTP/1.1" 500 398 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
Since it seems you can trigger things like a heap dump or a garbage collector run with plain GET requests, that page should probably be made difficult to access.
One idea that comes to mind would be to use mod_rewrite to change the URL when no auth cookie is set, so that any logged-in user can view the page but a roving bot can't.
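
A rough sketch of that idea, in case it helps the discussion. The cookie name "authcookie" is only a placeholder (I have not checked which cookie XWiki actually sets on login), the path is taken from the log line above, and instead of changing the URL this variant simply answers 403. The leading slash in the pattern assumes the rules live in the server or virtual host config rather than in an .htaccess file.

```apache
# Sketch only: "authcookie" is a hypothetical cookie name, replace it
# with whatever cookie the wiki really sets for logged-in users.
RewriteEngine On

# Condition: the request carries no cookie named "authcookie"
RewriteCond %{HTTP_COOKIE} !(^|;\s*)authcookie= [NC]

# Then refuse access to the monitoring page with 403 Forbidden
RewriteRule ^/xwiki/monitoring - [F,L]
```

Note that this only checks that a cookie is present, not that it is valid, so it keeps crawlers out but is not real access control.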
infra mailing list