Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Googlebot only crawling my home page now
manof




msg:4152590
 10:18 pm on Jun 14, 2010 (gmt 0)

Hi guys,

My site launched 4 months ago. In the last 15 days, Googlebot had been crawling 300,000-500,000 pages every day.

But since 08/06/2010, I find that Googlebot usually crawls only my home page, every 10-20 minutes:

"GET / HTTP/1.1" 200

Occasionally, Googlebot also repeatedly crawls 3-4 other pages, such as:

............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............
"GET / HTTP/1.1" 200
"GET /my_*_*_*_mori HTTP/1.1" 200
"GET / HTTP/1.1" 200
............


Googlebot has also been continuously requesting some URLs that obviously do not belong to my site:

"GET /include/setup.exe HTTP/1.1" 404
"GET /author/*_klayiv_*/pisma_*/download.*.prc.zip HTTP/1.1" 404
"GET /usenext/1213093/*+*+5.0.15+*+Full+Version.exe.html HTTP/1.1" 404
"GET /download/projects/vcpp/*_screen.zip HTTP/1.1" 404

Then I changed my site's IP. Googlebot fetched robots.txt, but it continues to crawl only my site's home page:

"GET /robots.txt HTTP/1.1" 200
"GET / HTTP/1.1" 200
............
............
............


I've checked Google Webmaster Tools and did not find anything abnormal.

The number of pages indexed by Google has dropped somewhat, the count indexed in the past 24 hours is zero, and my site's traffic has dropped by a third.

When I search for my site name in Google, my site is first.

When I search "site:example.com" in Google, the domain root, example.com, is not the first result.


Why?

Any help appreciated, thanks.

[edited by: tedster at 10:44 pm (utc) on Jun 14, 2010]
[edit reason] switch to example.com - it cannot be owned [/edit]

 

tangor




msg:4152633
 12:52 am on Jun 15, 2010 (gmt 0)

A four month old site with a half-million pages sounds like a dynamic site. Sounds like slim content is being found and lately (after Mayday) Google is not sending love...

manof




msg:4152635
 1:22 am on Jun 15, 2010 (gmt 0)

tangor, you are right, it is a dynamic site, and I also suspect Mayday.

But I found that some sites similar to mine are in good condition, and they operate in much the same way we do.

My site and some other similar sites currently have Alexa ranks between 1000 and 1500.

aspdesigner




msg:4152657
 1:55 am on Jun 15, 2010 (gmt 0)

I can give you a quick answer on this topic...

"Googlebot has also been continuously requesting some URLs that obviously do not belong to my site:"

Some of those URLs being scanned appear to be warez- and hacker-related!

We have seen hackers scanning our sites pretending to be Googlebot, but searching for non-existent URLs...of vulnerable software they could exploit to take control of our servers, and turn the sites into virus distributors!

Check the IP addresses for "Googlebot", here -

http://ws.arin.net/whois [ws.arin.net]

If it says something about "RIPE" (likely), then try entering it here -

http://www.ripe.net/ [ripe.net]

You may be surprised to see where that "Googlebot" actually originated! ;)

tedster




msg:4152660
 2:01 am on Jun 15, 2010 (gmt 0)

That is a very bizarre crawling pattern. It sounds very much like a technical bug on either your side or on Google's. Googlebot requesting just the home page three or four times an hour but no other URL requests?

If your home page shows straight HTML links to the internal pages, then it makes no sense - unless you've got an accidental robots.txt disallow rule, or a robots meta-tag nofollow directive on the page by mistake.
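Both of those mistakes can be checked offline. Here is a minimal sketch, assuming you have saved a copy of your robots.txt and your home page HTML as strings. The function and class names (`googlebot_allowed`, `RobotsMetaFinder`, `page_has_nofollow`) are made up for illustration, and Python's standard-library robots.txt parser does not necessarily match Googlebot's own rule matching in every detail, so treat the result as a hint rather than a verdict.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if this robots.txt text lets user-agent 'Googlebot' fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> or "googlebot" tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def page_has_nofollow(html: str) -> bool:
    """Return True if a robots meta tag on the page says nofollow (or none)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "nofollow" in finder.directives or "none" in finder.directives
```

If `googlebot_allowed` returns False for an internal URL, or `page_has_nofollow` returns True for the home page, that alone could explain a crawl that never gets past the home page.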

manof




msg:4152664
 2:15 am on Jun 15, 2010 (gmt 0)

aspdesigner,

Thank you.

I am sure this is the real Googlebot, because its IPs are 66.249.*.*.

manof




msg:4152668
 2:31 am on Jun 15, 2010 (gmt 0)

tedster,

Thank you.

I've tested it in Google Webmaster Tools, and it shows that Googlebot can visit any page allowed by the robots.txt rules.

I also think it looks like a technical bug, but the only thing I did beforehand was add a new server; I made no other changes.

Demo:

       [Proxy Server]
      /       |       \
[Web1]     [Web2]     [New Web3]

seoN00B




msg:4152674
 2:54 am on Jun 15, 2010 (gmt 0)

A four month old site with a half-million pages sounds like a dynamic site. Sounds like slim content is being found and lately (after Mayday) Google is not sending love...


You mean dynamic sites like these get less love from Google?

tedster




msg:4152686
 3:22 am on Jun 15, 2010 (gmt 0)

Yes, if many of the dynamic pages have a low level of valuable content - i.e. stub pages or information available from many other sites.

aspdesigner




msg:4152697
 4:03 am on Jun 15, 2010 (gmt 0)

"I am sure this is the real Googlebot, because its IPs are 66.249.*.*."

Even the warez/hacker ones? With wildcards in the URLs? - I don't recall seeing Google do that before!

If it was the REAL Googlebot, then the other possibility is that Google actually found links to your site pointing to those locations!

Which might mean that someone had gained control of your server, and was using it as a warez distribution point?

I had that happen to one of my clients years back - his hosting company hadn't kept up with all the latest security mods, and hackers had broken in and stashed several hundred meg of warez on his site!

That "/usenext/#######/..." here -

/usenext/1213093/*+*+5.0.15+*+Full+Version.exe.html

is particularly suspicious. That's a warez D/L page, and it appears they were using #'d sub-directories, perhaps erasing the old ones to avoid detection.

I would take a SERIOUS look at your server, including looking for any hidden/system sub-directories or files.

Also, see if you can find out who is linking to those locations on your site! (try Webmaster Tools, and Yahoo "linkdomain:")

manof




msg:4152724
 5:07 am on Jun 15, 2010 (gmt 0)

aspdesigner,

Thank you.

I am sure that my servers have not been hacked or abused.

The full URL is:

"GET /usenext/1213093/Digital+Patrol+5.0.15+Cracked+Full+Version.exe.html HTTP/1.1" 404


The IPs "66.249.*.*" are the real Googlebot.

tedster




msg:4152853
 11:50 am on Jun 15, 2010 (gmt 0)

I am sure this is the real Googlebot, because its IPs are 66.249.*.*


Note that just doing a reverse DNS look-up for the IP address is not enough to verify googlebot. See [webmasterworld.com...] The dark forces can spoof a reverse DNS ;(
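The usual way to close that gap is a forward-confirmed reverse DNS check: look up the hostname for the IP, make sure it sits under a Google crawler domain, then resolve that hostname forward and confirm it returns the same IP. Here is a rough sketch, not an official tool. The function name `is_real_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are my own invention for illustration, and the accepted suffixes (`.googlebot.com`, `.google.com`) are an assumption you should check against Google's current guidance.

```python
import socket


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for an IP claiming to be Googlebot."""
    # Default to real DNS queries; callers can inject fakes for offline testing.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record (socket.herror and socket.gaierror subclass OSError).
        return False

    # Step 1: the PTR name must be under a Google crawler domain (assumed list).
    name = host.lower().rstrip(".")
    if not name.endswith((".googlebot.com", ".google.com")):
        return False

    # Step 2: the name must resolve back to the original IP. A spoofed PTR
    # record fails here, because the attacker does not control Google's zone.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_real_googlebot("66.249.71.148")` with no extra arguments performs live DNS lookups, so the answer depends on what DNS returns at the time you run it.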

manof




msg:4153306
 12:57 am on Jun 16, 2010 (gmt 0)

tedster,

Thank you.

This is a recent log entry; it's the real Googlebot:


66.249.71.148 - example.com - [15/Jun/2010:04:24:46 -0400] "GET /projects/sfnet_pys60/downloads/pys60/1.4.0/pys60-1.4.0_src.zip?use_mirror=jaist HTTP/1.1" 404

seoisabusiness




msg:4153494
 9:50 am on Jun 16, 2010 (gmt 0)

Hi, we see something similar: [webmasterworld.com...]

manof




msg:4153502
 10:28 am on Jun 16, 2010 (gmt 0)

seoisabusiness,

Thank you.

It's very useful.

manof




msg:4153653
 3:08 pm on Jun 16, 2010 (gmt 0)

PS:
/setup.exe

This file does not really exist; the URL just returns an automatically generated search results page.

Recent logs:


66.249.71.148 - example.com - [16/Jun/2010:07:20:34 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:32:28 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:42:17 -0400] "GET /my_brothers_wife_barbara_mori HTTP/1.1" 200 3179
66.249.71.148 - example.com - [16/Jun/2010:07:44:41 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:07:58:58 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:08:41 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:19:49 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:20:44 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:23:43 -0400] "GET /zip/ml15.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:18 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:18 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:19 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:19 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:20 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:20 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:21 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:21 -0400] "GET /Downloads/Security-Privacy/Password-Managers/72323.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:24:56 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:08:25:01 -0400] "GET /setup.exe HTTP/1.1" 200 2664
66.249.71.148 - example.com - [16/Jun/2010:08:32:44 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:44:52 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:56:03 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:08:56:56 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:07:38 -0400] "GET /setup.exe HTTP/1.1" 200 2685
66.249.71.148 - example.com - [16/Jun/2010:09:08:08 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:08:55 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:09:27 -0400] "GET /setup.exe HTTP/1.1" 200 2621
66.249.71.148 - example.com - [16/Jun/2010:09:17:23 -0400] "GET / HTTP/1.1" 200 4987 "-" "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
66.249.71.148 - example.com - [16/Jun/2010:09:20:51 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:30:16 -0400] "GET /images/about_contactpage/download_pdf/CV-davidwettergren2009.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:31:28 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:32:53 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:38:21 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:22 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:22 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:38:23 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:44:32 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:09:50:12 -0400] "GET /zip/ml15.zip HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:18 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:19 -0400] "GET /Downloads/Business/Inventory-Barcoding/34802.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:19 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/57777.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:20 -0400] "GET /Downloads/Business/Databases-Tools/34576.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:21 -0400] "GET /Downloads/Security-Privacy/Encryption-Tools/66295.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:21 -0400] "GET /Downloads/Business/Vertical-Market-Apps/57665.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:22 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:51:22 -0400] "GET /Downloads/Security-Privacy/Password-Managers/72323.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:09:56:42 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:08:22 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:13:59 -0400] "GET /setup.exe HTTP/1.1" 200 2578
66.249.71.148 - example.com - [16/Jun/2010:10:20:20 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:32:25 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:33:51 -0400] "GET /Downloads/Security-Privacy/Password-Managers/31972.exe HTTP/1.1" 404 1202
66.249.71.148 - example.com - [16/Jun/2010:10:36:22 -0400] "GET /setup.exe HTTP/1.1" 200 2674
66.249.71.148 - example.com - [16/Jun/2010:10:43:33 -0400] "GET /my_brothers_wife_barbara_mori HTTP/1.1" 200 2987
66.249.71.148 - example.com - [16/Jun/2010:10:44:20 -0400] "GET / HTTP/1.1" 200 4987
66.249.71.148 - example.com - [16/Jun/2010:10:56:33 -0400] "GET / HTTP/1.1" 200 4987
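A log in this format can be tallied quickly to see how the requests split between the home page, the odd 404 download URLs, and everything else. Below is a minimal sketch that assumes each line looks like the ones above (IP, host, timestamp, request, status, optional size). The names `LOG_LINE` and `summarize` are hypothetical, and lines that do not match the pattern are simply skipped.

```python
import re
from collections import Counter

# Matches lines such as:
# 66.249.71.148 - example.com - [16/Jun/2010:07:20:34 -0400] "GET / HTTP/1.1" 200 4987
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) - (?P<host>\S+) - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3})(?: (?P<size>\d+))?'
)


def summarize(lines):
    """Count requests per status code and per (path, status) pair."""
    by_status = Counter()
    by_path = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # Not in the assumed format; ignore it.
        status = match.group("status")
        by_status[status] += 1
        by_path[(match.group("path"), status)] += 1
    return by_status, by_path


if __name__ == "__main__":
    sample = [
        '66.249.71.148 - example.com - [16/Jun/2010:07:20:34 -0400] "GET / HTTP/1.1" 200 4987',
        '66.249.71.148 - example.com - [16/Jun/2010:07:42:17 -0400] "GET /my_brothers_wife_barbara_mori HTTP/1.1" 200 3179',
        '66.249.71.148 - example.com - [16/Jun/2010:08:19:49 -0400] "GET /Downloads/Business/PIMs-Organizers/36710.exe HTTP/1.1" 404 1202',
    ]
    statuses, paths = summarize(sample)
    print(statuses)
    for (path, status), count in paths.most_common():
        print(count, status, path)
```

Running it over the whole log file instead of the three sample lines would show at a glance which 404 paths keep coming back and how often the home page is refetched.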


tedster




msg:4153665
 3:27 pm on Jun 16, 2010 (gmt 0)

Let's combine these two very similar threads - discussion continues here: [webmasterworld.com...]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved