tangor

msg:4152633 | 12:52 am on Jun 15, 2010 (gmt 0) |
A four month old site with a half-million pages sounds like a dynamic site. Sounds like slim content is being found and lately (after Mayday) Google is not sending love...
|
manof

msg:4152635 | 1:22 am on Jun 15, 2010 (gmt 0) |
tangor, you are right, it is a dynamic site, and I also suspect the Mayday update. But I have found some sites similar to mine that are doing well, and their approach is much the same as ours. My site and several similar sites currently have an Alexa rank between 1000 and 1500.
|
aspdesigner

msg:4152657 | 1:55 am on Jun 15, 2010 (gmt 0) |
I can give you a quick answer on this topic... | "Googlebot has been continuously crawling some pages that obviously does not belong to my site:" |
| Some of those URLs being scanned appear to be warez- and hacker-related! We have seen hackers scanning our sites while pretending to be Googlebot, but actually probing for URLs of vulnerable software they could exploit to take control of our servers and turn the sites into virus distributors! Check the IP addresses for "Googlebot" here: http://ws.arin.net/whois If it says something about "RIPE" (likely), then try entering it here: http://www.ripe.net/ You may be surprised to see where that "Googlebot" actually originated! ;)
|
tedster

msg:4152660 | 2:01 am on Jun 15, 2010 (gmt 0) |
That is a very bizarre crawling pattern. It sounds very much like a technical bug on either your side or on Google's. Googlebot requesting just the home page three or four times an hour but no other URL requests? If your home page shows straight HTML links to the internal pages, then it makes no sense - unless you've got an accidental robots.txt disallow rule, or a robots meta-tag nofollow directive on the page by mistake.
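To rule out the accidental blocking mentioned above, a quick offline check can help. The sketch below is a minimal example, assuming you paste in the text of your robots.txt and your home page HTML; it uses Python's standard `urllib.robotparser` to ask whether the user-agent `Googlebot` may fetch a URL, and `html.parser` to look for a `robots` or `googlebot` meta tag containing `nofollow` or `none`. The function names are hypothetical, and `urllib.robotparser` does not interpret every rule exactly as Google does (for example, it does not implement Google's `*` and `$` path wildcards), so treat a "clean" result as a hint rather than proof.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch("Googlebot", url)


class _RobotsMetaFinder(HTMLParser):
    """Collect directives from every <meta name="robots"> or <meta name="googlebot"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {k: (v or "") for k, v in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.extend(
                d.strip().lower() for d in attr_map.get("content", "").split(",")
            )


def has_nofollow(html: str) -> bool:
    """Return True if the page carries a robots meta 'nofollow' or 'none' directive."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return "nofollow" in finder.directives or "none" in finder.directives
```

For example, `googlebot_allowed(open("robots.txt").read(), "http://example.com/some/internal/page")` (file name and URL hypothetical) should return `True` if nothing in robots.txt blocks that internal page.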
|
manof

msg:4152664 | 2:15 am on Jun 15, 2010 (gmt 0) |
aspdesigner, thank you. I am sure this is the real Googlebot, because the IPs are 66.249.*.*.
|
manof

msg:4152668 | 2:31 am on Jun 15, 2010 (gmt 0) |
tedster, thank you. I've tested it in Google Webmaster Tools, and it shows that Googlebot can access any page allowed by the robots.txt rules. I also think it looks like a technical bug, but the only change I made beforehand was adding a new server; nothing else was changed. Setup:

          [Proxy Server]
         /      |       \
    [Web1]   [web2]   [New web3]
|
seoN00B

msg:4152674 | 2:54 am on Jun 15, 2010 (gmt 0) |
| A four month old site with a half-million pages sounds like a dynamic site. Sounds like slim content is being found and lately (after Mayday) Google is not sending love... |
| You mean dynamic sites like these get less love from Google?
|
tedster

msg:4152686 | 3:22 am on Jun 15, 2010 (gmt 0) |
Yes, if many of the dynamic pages have a low level of valuable content, i.e. stub pages or information available from many other sites.
|
aspdesigner

msg:4152697 | 4:03 am on Jun 15, 2010 (gmt 0) |
| "I am sure this is true Googlebot, because their IPs are 66.249.*.*." |
| Even the warez/hacker ones? With wildcards in the URLs? I don't recall seeing Google do that before! If it was the REAL Googlebot, then the other possibility is that Google actually found links to your site pointing to those locations, which might mean that someone had gained control of your server and was using it as a warez distribution point. I had that happen to one of my clients years back: his hosting company hadn't kept up with all the latest security updates, and hackers had broken in and stashed several hundred megabytes of warez on his site! That "/usenext/#######/..." here - | /usenext/1213093/*+*+5.0.15+*+Full+Version.exe.html |
| is particularly suspicious. That's a warez download page, and it appears they were using numbered sub-directories, perhaps erasing the old ones to avoid detection. I would take a SERIOUS look at your server, including looking for any hidden/system sub-directories or files. Also, see if you can find out who is linking to those locations on your site (try Webmaster Tools and Yahoo "linkdomain:").
|
manof

msg:4152724 | 5:07 am on Jun 15, 2010 (gmt 0) |
aspdesigner, thank you. I am sure that my servers have not been hacked or abused. The full URL is: "GET /usenext/1213093/Digital+Patrol+5.0.15+Cracked+Full+Version.exe.html HTTP/1.1" 404 |
| The IPs in "66.249.*.*" belong to the real Googlebot.
|
tedster

msg:4152853 | 11:50 am on Jun 15, 2010 (gmt 0) |
| I am sure this is true Googlebot, because their IPs are 66.249.*.* |
| Note that just doing a reverse DNS look-up for the IP address is not enough to verify googlebot. See [webmasterworld.com...] The dark forces can spoof a reverse DNS ;(
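The usual way to close that gap is a forward-confirmed reverse DNS check: look up the PTR name for the IP, require it to end in googlebot.com or google.com, then resolve that name forward and require it to return the original IP. A spoofed PTR record fails the second step, because whoever controls the PTR for their own IP block does not control Google's forward DNS. A minimal Python sketch, assuming IPv4 and the standard `socket` resolver functions; the function name is hypothetical, and the resolvers are injectable only so the logic can be tested without network access.

```python
import socket
from typing import Callable

# Domains Google uses for Googlebot reverse DNS (assumed here; check Google's current docs).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], tuple] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for an IPv4 address claiming to be Googlebot."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False  # no usable PTR record
    # Step 1: the PTR name must sit under a Google-controlled domain.
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    # Step 2: the name must resolve back to the same IP; a spoofed PTR cannot fake this.
    return ip in forward_ips
```

Called as `is_verified_googlebot("66.249.71.148")`, it performs live DNS lookups, so the answer depends on what DNS returns at the time you run it.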
|
manof

msg:4153306 | 12:57 am on Jun 16, 2010 (gmt 0) |
tedster, thank you. This is a recent log entry, and it's the real Googlebot:
66.249.71.148 - example.com - [15/Jun/2010:04:24:46 -0400] "GET /projects/sfnet_pys60/downloads/pys60/1.4.0/pys60-1.4.0_src.zip?use_mirror=jaist HTTP/1.1" 404 |
|
|
seoisabusiness

msg:4153494 | 9:50 am on Jun 16, 2010 (gmt 0) |
Hi, we are seeing something similar: [webmasterworld.com...]
|
manof

msg:4153502 | 10:28 am on Jun 16, 2010 (gmt 0) |
seoisabusiness, Thank you. It's very useful.
|
manof

msg:4153653 | 3:08 pm on Jun 16, 2010 (gmt 0) |
PS: These are not pages that really exist; they are just automatically generated search result pages. Recent logs:
66.249.71.148 - example.com - [16/Jun/2010:07:20:34 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:32:28 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:42:17 -0400] "GET /my_brothers_wife_barbara_mori HTTP/1.1" 200 3179
66.249.71.148 - example.com - [16/Jun/2010:07:44:41 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:58:58 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:08:41 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:19:49 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:20:44 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:23:43 -0400] "GET /zip/ml15.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:18 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:18 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:19 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:19 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:20 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:20 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:21 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:21 -0400] "GET /Downloads/Security-Privacy/Password-Managers/72323.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:56 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:25:01 -0400] "GET /setup.exe HTTP/1.1" 200 2664
66.249.71.148 - example.com - [16/Jun/2010:08:32:44 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:44:52 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:56:03 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:56:56 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:07:38 -0400] "GET /setup.exe HTTP/1.1" 200 2685
66.249.71.148 - example.com - [16/Jun/2010:09:08:08 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:08:55 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:09:27 -0400] "GET /setup.exe HTTP/1.1" 200 2621
66.249.71.148 - example.com - [16/Jun/2010:09:17:23 -0400] "GET / HTTP/1.1" 200 4987 "-" "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
66.249.71.148 - example.com - [16/Jun/2010:09:20:51 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:30:16 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:31:28 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:32:53 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:38:21 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:22 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:22 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:23 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:44:32 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:50:12 -0400] "GET /zip/ml15.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:18 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:19 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:19 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:20 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:21 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:21 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:22 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:22 -0400] "GET /Downloads/Security-Privacy/Password-Managers/72323.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:56:42 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:08:22 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:13:59 -0400] "GET /setup.exe HTTP/1.1" 200 2578
66.249.71.148 - example.com - [16/Jun/2010:10:20:20 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:32:25 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:33:51 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:10:36:22 -0400] "GET /setup.exe HTTP/1.1" 200 2674
66.249.71.148 - example.com - [16/Jun/2010:10:43:33 -0400] "GET /my_brothers_wife_barbara_mori HTTP/1.1" 200 2987
66.249.71.148 - example.com - [16/Jun/2010:10:44:20 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:56:33 -0400] "GET / HTTP/1.1" 200 4987
|
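To see the pattern in a log like this at a glance (home-page hits per hour versus repeated 404s on paths that never existed on the site), a small summary script can help. A minimal Python sketch, assuming each entry follows the layout shown above (`IP - vhost - [time] "request" status size`, which looks like a custom format rather than Apache's standard combined log); `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the excerpt: IP - vhost - [time] "METHOD path PROTO" status size ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) - (?P<host>\S+) - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Count hits per (status, path) and home-page requests per hour of the logged time."""
    by_path = Counter()
    home_per_hour = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip anything that does not fit the assumed layout
        by_path[(int(m["status"]), m["path"])] += 1
        if m["path"] == "/":
            ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
            home_per_hour[ts.strftime("%Y-%m-%d %H:00")] += 1
    return by_path, home_per_hour
```

Usage, with a hypothetical file name: `with open("access.log") as f: by_path, per_hour = summarize(f)`, then `by_path.most_common(10)` lists the most-requested (status, path) pairs. Anything after the size field (such as the user-agent on the Googlebot-Mobile line) is simply ignored.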
|
|
tedster

msg:4153665 | 3:27 pm on Jun 16, 2010 (gmt 0) |
Let's combine these two very similar threads - discussion continues here: [webmasterworld.com...]
|
|