
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


Googlebot only crawling my home page now

     
10:18 pm on Jun 14, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


Hi guys,

My site launched 4 months ago. In the last 15 days, Googlebot was crawling 300,000-500,000 pages every day.

But since 08/06/2010, Googlebot has usually been crawling only my home page, once every 10-20 minutes:

"GET / HTTP/1.1" 200

Occasionally, Googlebot also repeatedly crawls 3-4 other pages, such as:

............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............


Googlebot has also been continuously requesting some pages that obviously do not belong to my site:

"GET /include/setup.exe HTTP/1.1" 404
"GET /author/*_klayiv_*/pisma_*/download.*.prc.zip HTTP/1.1" 404
"GET /usenext/1213093/*+*+5.0.15+*+Full+Version.exe.html HTTP/1.1" 404
"GET /download/projects/vcpp/*_screen.zip HTTP/1.1" 404

Then I changed my site's IP. Googlebot fetched robots.txt, but it still crawls only my home page:

"GET /robots.txt HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............


I've checked Google Webmaster Tools and did not find anything abnormal.

The number of pages Google has indexed has dropped somewhat, indexing over the last 24 hours is zero, and my site's traffic has dropped by 1/3.

When I search for my site's name in Google, my site is first.

When I search for "site:example.com" in Google, the domain root, example.com, is not listed first.


Why?

Any help appreciated, thanks.

[edited by: tedster at 10:44 pm (utc) on Jun 14, 2010]
[edit reason] switch to example.com - it cannot be owned [/edit]

12:52 am on June 15, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6907
votes: 379


A four month old site with a half-million pages sounds like a dynamic site. Sounds like slim content is being found and lately (after Mayday) Google is not sending love...
1:22 am on June 15, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


tangor, you are right, it is a dynamic site, and I also suspect the Mayday update.

But I've found some sites similar to mine that are in good shape, and they operate in much the same way.

My site and some of these similar sites currently have Alexa ranks between 1000 and 1500.
1:55 am on June 15, 2010 (gmt 0)

Full Member

10+ Year Member

joined:Jan 1, 2003
posts:212
votes: 0


I can give you a quick answer on this topic...

"Googlebot has been continuously crawling some pages that obviously does not belong to my site:"

Some of the URLs being requested appear to be warez- and hacker-related!

We have seen hackers scan our sites while pretending to be Googlebot, requesting non-existent URLs of vulnerable software they could exploit to take control of our servers and turn the sites into virus distributors!

Check the IP addresses of that "Googlebot" here:

http://ws.arin.net/whois

If the result mentions "RIPE" (likely), try entering the IP here:

http://www.ripe.net/

You may be surprised to see where that "Googlebot" actually originated! ;)
2:01 am on June 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


That is a very bizarre crawling pattern. It sounds very much like a technical bug on either your side or on Google's. Googlebot requesting just the home page three or four times an hour but no other URL requests?

If your home page shows straight HTML links to the internal pages, then it makes no sense - unless you've got an accidental robots.txt disallow rule, or a robots meta-tag nofollow directive on the page by mistake.
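A minimal sketch of the two checks tedster mentions, using only the Python standard library: `urllib.robotparser` tests whether a given robots.txt text lets a Googlebot user agent fetch a URL, and `html.parser` looks for a `nofollow` (or `none`) directive in a robots or googlebot meta tag. The function and class names are hypothetical, and you would feed in your own live robots.txt and home page HTML. Note that `RobotFileParser` only does plain prefix matching, so it may not agree with Googlebot on wildcard rules.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """True if the given robots.txt text lets a Googlebot user agent fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", url)


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}  # HTMLParser lower-cases attribute names
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def meta_says_nofollow(html: str) -> bool:
    """True if a robots/googlebot meta tag contains nofollow or none."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"nofollow", "none"})
```

In use, you would run `googlebot_allowed` over a few internal URLs linked from the home page; a `False` for any of them would point to the kind of accidental disallow rule described above.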
2:15 am on June 15, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


aspdesigner,

Thank you.

I am sure this is true Googlebot, because their IPs are 66.249.*.*.
2:31 am on June 15, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


tedster,

Thank you.

I've tested it in Google Webmaster Tools, and it shows that Googlebot can visit any page the robots.txt rules allow.

I also think it looks like a technical bug, but before this started I only added a new server and changed nothing else.

Diagram:

     [Proxy Server]
    /      |      \
[Web1]   [Web2]   [New Web3]
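Since the only change was adding a backend behind the proxy, one hedged check is whether every backend serves the same robots.txt, with the same status code, when requested directly. The sketch below assumes each backend is reachable at its own address (you supply these; none are given in the thread) and sends the public Host header so name-based virtual hosting still picks the right site. The function names are hypothetical. Home page bodies can legitimately differ on a dynamic site, so body differences there need a human look rather than an automatic verdict.

```python
import hashlib
import urllib.error
import urllib.request
from collections import defaultdict
from typing import Dict, List, Tuple

Response = Tuple[int, bytes]  # (HTTP status, body)


def fetch(base_url: str, path: str, host: str, timeout: float = 10.0) -> Response:
    """GET base_url + path from one backend, bypassing the proxy.

    The public Host header is sent so virtual hosting still selects the
    right site. base_url is a backend address you supply."""
    req = urllib.request.Request(base_url + path, headers={"Host": host})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status
        return err.code, err.read()


def group_responses(responses: Dict[str, Response]) -> Dict[Tuple[int, str], List[str]]:
    """Group backend names by (status, SHA-256 of body).

    One group means every backend answered identically; more than one
    means at least one backend serves something different."""
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for name, (status, body) in responses.items():
        groups[(status, hashlib.sha256(body).hexdigest())].append(name)
    return dict(groups)
```

For example, `group_responses({name: fetch(url, "/robots.txt", "example.com") for name, url in backends.items()})`, where `backends` is your own mapping of backend names to addresses, should return a single group if the new server matches the old ones.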
2:54 am on June 15, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 22, 2010
posts:85
votes: 0


A four month old site with a half-million pages sounds like a dynamic site. Sounds like slim content is being found and lately (after Mayday) Google is not sending love...


You mean dynamic sites like these get less love from Google?
3:22 am on June 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Yes, if many of the dynamic pages have a low level of valuable content, i.e. stub pages or information available from many other sites.
4:03 am on June 15, 2010 (gmt 0)

Full Member

10+ Year Member

joined:Jan 1, 2003
posts:212
votes: 0


"I am sure this is true Googlebot, because their IPs are 66.249.*.*."

Even the warez/hacker ones? With wildcards in the URLs? I don't recall seeing Google do that before!

If it was the REAL Googlebot, then the other possibility is that Google actually found links to your site pointing to those locations!

Which might mean that someone had gained control of your server, and was using it as a warez distribution point?

I had that happen to one of my clients years back: his hosting company hadn't kept up with all the latest security updates, and hackers had broken in and stashed several hundred MB of warez on his site!

That "/usenext/#######/..." URL here:

/usenext/1213093/*+*+5.0.15+*+Full+Version.exe.html

is particularly suspicious. That's a warez download page, and it appears they were using numbered sub-directories, perhaps erasing the old ones to avoid detection.

I would take a SERIOUS look at your server, including looking for any hidden/system sub-directories or files.

Also, see if you can find out who is linking to those locations on your site! (Try Webmaster Tools, and Yahoo's "linkdomain:" operator.)
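A rough sketch of the server check suggested above: walk the document root and list files with archive or executable extensions, or files inside hidden (dot-prefixed) directories, together with their modification times. The extension set and the function names are assumptions to adapt to your own site; a hit is only something to inspect by hand, not proof of a compromise.

```python
import os
import time
from typing import List, Tuple

# Assumption: extensions a normal page-serving site would not host.
# Adjust this set to match what your site legitimately serves.
SUSPICIOUS_EXT = {".exe", ".zip", ".rar", ".prc"}


def is_suspicious(rel_path: str) -> bool:
    """Flag a path (relative to the web root) that has a suspicious
    extension or lies inside a hidden, dot-prefixed directory."""
    parts = rel_path.replace("\\", "/").split("/")
    in_hidden_dir = any(p.startswith(".") and p not in (".", "..") for p in parts[:-1])
    ext = os.path.splitext(parts[-1])[1].lower()
    return in_hidden_dir or ext in SUSPICIOUS_EXT


def find_suspicious(web_root: str) -> List[Tuple[str, str]]:
    """Walk web_root and return (relative path, modification time) pairs
    for every flagged file, sorted by path."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, web_root)
            if is_suspicious(rel):
                hits.append((rel, time.ctime(os.path.getmtime(full))))
    return sorted(hits)
```

Running `find_suspicious` on your real document root (its path is specific to your server) and reading the modification times can show whether anything appeared around the time the odd requests started.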
5:07 am on June 15, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


aspdesigner,

Thank you.

I am sure that my servers have not been hacked or abused.

The full URL is:

"GET /usenext/1213093/Digital+Patrol+5.0.15+Cracked+Full+Version.exe.html HTTP/1.1" 404


The IPs "66.249.*.*" are the real Googlebot.
11:50 am on June 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I am sure this is true Googlebot, because their IPs are 66.249.*.*


Note that just doing a reverse DNS lookup on the IP address is not enough to verify Googlebot. See [webmasterworld.com...]. The dark forces can spoof reverse DNS ;(
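The verification Google itself describes is a two-step DNS check: reverse-resolve the IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. The forward step is what defeats a spoofed PTR record, since only the real owner of the domain controls it. A minimal Python sketch follows (IPv4 only, as written); the function names are hypothetical, and the resolver parameters exist only so the logic can be tested without network access.

```python
import socket
from typing import Callable, List

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """PTR lookup: IP -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Reverse-resolve ip, require a Google crawler hostname, then
    forward-resolve that hostname and require it to point back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    # A spoofed PTR record can claim any name; the forward lookup,
    # controlled by the domain owner, is what confirms it.
    try:
        return ip in forward(host)
    except OSError:
        return False
```

Calling `is_real_googlebot("66.249.71.148")` with the default resolvers performs live DNS lookups, so its answer depends on the DNS records at the time you run it.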
12:57 am on June 16, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


tedster,

Thank you.

This is a recent log entry; it's the real Googlebot:


66.249.71.148 - example.com - [15/Jun/2010:04:24:46 -0400] "GET /projects/sfnet_pys60/downloads/pys60/1.4.0/pys60-1.4.0_src.zip?use_mirror=jaist HTTP/1.1" 404
9:50 am on June 16, 2010 (gmt 0)

New User

5+ Year Member

joined:June 15, 2010
posts:10
votes: 0


Hi, we are seeing something similar: [webmasterworld.com...]
10:28 am on June 16, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


seoisabusiness,

Thank you.

It's very useful.
3:08 pm on June 16, 2010 (gmt 0)

New User

5+ Year Member

joined:June 14, 2010
posts:12
votes: 0


PS:
/setup.exe

This file does not actually exist; it's just an automatically generated search results page.

Recent logs:


66.249.71.148 - example.com - [16/Jun/2010:07:20:34 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:32:28 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:42:17 -0400] "GET /my_brothers_wife_barbara_mori HTTP/1.1" 200 3179
66.249.71.148 - example.com - [16/Jun/2010:07:44:41 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:58:58 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:08:41 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:19:49 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:20:44 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:23:43 -0400] "GET /zip/ml15.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:18 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:18 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:19 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:19 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:20 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:20 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:21 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:21 -0400] "GET /Downloads/Security-Privacy/Password-Managers/72323.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:56 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:25:01 -0400] "GET /setup.exe HTTP/1.1" 200 2664
66.249.71.148 - example.com - [16/Jun/2010:08:32:44 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:44:52 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:56:03 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:56:56 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:07:38 -0400] "GET /setup.exe HTTP/1.1" 200 2685
66.249.71.148 - example.com - [16/Jun/2010:09:08:08 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:08:55 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:09:27 -0400] "GET /setup.exe HTTP/1.1" 200 2621
66.249.71.148 - example.com - [16/Jun/2010:09:17:23 -0400] "GET / HTTP/1.1" 200 4987 "-" "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
66.249.71.148 - example.com - [16/Jun/2010:09:20:51 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:30:16 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:31:28 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:32:53 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:38:21 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:22 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:22 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:23 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:44:32 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:50:12 -0400] "GET /zip/ml15.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:18 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:19 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:19 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:20 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:21 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:21 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:22 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:22 -0400] "GET /Downloads/Security-Privacy/Password-Managers/72323.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:56:42 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:08:22 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:13:59 -0400] "GET /setup.exe HTTP/1.1" 200 2578
66.249.71.148 - example.com - [16/Jun/2010:10:20:20 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:32:25 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:33:51 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:10:36:22 -0400] "GET /setup.exe HTTP/1.1" 200 2674
66.249.71.148 - example.com - [16/Jun/2010:10:43:33 -0400] "GET /my_brothers_wife_barbara_mori HTTP/1.1" 200 2987
66.249.71.148 - example.com - [16/Jun/2010:10:44:20 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:56:33 -0400] "GET / HTTP/1.1" 200 4987
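To put numbers on a log like this, a short script can parse each line, measure the gaps between successive home page requests, count which URLs returned 404, and total requests per day (which would help quantify the drop from the 300,000-500,000 pages a day mentioned at the start of the thread). The regular expression is an assumption based on the lines quoted here, where the virtual host sits between the IP and the timestamp; it skips those fields lazily, so it should also accept the standard Common Log Format. The function names are hypothetical, and the script counts every matching line, so filter to verified Googlebot IPs first if the log mixes in other clients.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# Assumed layout, taken from the log lines in this thread:
#   IP - vhost - [16/Jun/2010:07:20:34 -0400] "GET / HTTP/1.1" 200 4987
# (?:\S+ )+? lazily skips whatever fields sit between the IP and the timestamp.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) (?:\S+ )+?\[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

Record = Tuple[datetime, str, int]  # (timestamp, path, status)


def parse(lines: Iterable[str]) -> List[Record]:
    """Parse matching log lines; lines that do not match are skipped."""
    records: List[Record] = []
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            records.append((ts, m.group("path"), int(m.group("status"))))
    return records


def home_page_gaps_minutes(records: List[Record]) -> List[float]:
    """Minutes between consecutive requests for "/" (records in log order)."""
    times = [ts for ts, path, _ in records if path == "/"]
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


def paths_with_status(records: List[Record], status: int) -> Counter:
    """How often each path was requested with the given status code."""
    return Counter(path for _, path, s in records if s == status)


def requests_per_day(records: List[Record]) -> Counter:
    """Request count per calendar date (in the log's own time zone)."""
    return Counter(ts.date() for ts, _, _ in records)
```

For example, `paths_with_status(parse(open("access.log")), 404).most_common(10)` would list the most frequently requested missing URLs; `access.log` is a placeholder for your real log file.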

3:27 pm on June 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Let's combine these two very similar threads - discussion continues here: [webmasterworld.com...]