Strange Googlebot behavior

Grabbing same page over and over in one hit

   
6:54 pm on May 28, 2013 (gmt 0)



Hello,

I am used to seeing Googlebot index my pages. Its hits usually look something like this:

66.249.73.17 - - [22/May/2013:10:08:53 -0700] "GET /direcctory/sample.html HTTP/1.1" 301 231 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

But only recently I've begun seeing this odd behavior. Is this the new normal for Googlebot?

66.249.73.17 - - [22/May/2013:10:08:56 -0700] "GET /https://example.com/https://example.com/https://example.com/https://v.com/https://example.com/direcctory/sample.html/example.com/direcctory/sample.html/example.com/https:/example.com/direcctory/sample.html/example.com/direcctory/sample.html/example.com/https:/example.com/https:/example.com/direcctory/sample.html/example.com/direcctory/sample.hl/example.com/https:/example.com/direcctory/sample.hl/example.com/direcctory/sample.hl/example.com/https:/example.com/https:/example.com/https:/example.com/direcctory/sample.hl/example.com/direcctory/sample.hl/example.com/https:/example.com/direcctory/sample.hl/example.com/direcctory/sample.hl/example.com/https:/example.com/https:/example.com/direcctory/sample.hl/example.com/direcctory/sample.hl/example.com/https:/example.com/direcctory/sample.hl/example.com/direcctory/sample.hl HTTP/1.1" 301 276 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

My logs are full of this repetitious behavior.

-- Grandma
11:41 pm on May 28, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Was that your own thread title or a moderator's? What you're quoting is not "the same page over and over", it's some kind of garbage link. Wait a couple of days and you may not find the source in GWT. (Or is it only Bing that does this with 404s? I forget.)

:: trying to figure out if it was my browser that changed every occurrence of "html" into "h{trademark symbol}l" or did it happen earlier :) ::
2:25 pm on May 29, 2013 (gmt 0)



Hi Lucy,
I don't know why those little th's appeared but I'll try it again now. Here is a sample of Googlebot's odd indexing behavior (why all the repetitions?). I have not seen it again for several days now.
66.249.73.nn - - [23/May/2013:01:11:44 -0700] "GET /https://example.com/https://example.com/https://example.com/https://example.com/https://example.com/directory/same_page.html/example.com/directory/same_page.html/example.com/https:/example.com/directory/same_page.html/example.com/directory/same_page.html/example.com/https:/example.com/https:/example.com/directory/same_page.html/example.com/directory/same_page.html/example.com/https:/example.com/directory/same_page.html/example.com/directory/same_page.html/example.com/https:/example.com/https:/example.com/https:/example.com/directory/same_page.html/example.com/directory/same_page.html/example.com/https:/example.com/directory/same_page.html/example.com/directory/same_page.html/example.com/https:/example.com/https:/example.com/directory/same_page.html/example.com/directory/same_page.html/example.com/https:/example.com/directory/same_page.html/example.com/directory/same_page.html HTTP/1.1" 301 291 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
7:08 pm on May 29, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24



Right now you don't know whether Googlebot itself has the hiccups, or whether it's following a link from some hiccupy third party.

You don't have to 'adopt' all wrong requests. Sometimes "ignore it and it will go away" really is the best fix ;)

And, er, when I said "may not" I meant "may". Careless editing.
8:13 pm on May 29, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would be concerned. If that page doesn't exist, the server should respond with a 404 or 410, not a 301.

I think you have a faulty redirect somewhere, most likely having to do with port 443. What happens when you follow "/https://example.com/"?
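
To see how widespread this is, one option is to scan the access log for redirects issued in response to paths that have a URL scheme embedded in them. The Python below is only a rough sketch: it assumes the log uses the Apache "combined" format shown in the first post, and the names `parse_line`, `is_garbage_path`, and `suspicious_redirects` are made up for illustration.

```python
import re

# Assumed layout: Apache "combined" log format, as in the lines quoted above.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def is_garbage_path(path):
    """True when the requested path contains an embedded scheme,
    such as "/https://example.com/..." or ".../https:/example.com/..."."""
    return re.search(r"/https?:/", path) is not None


def suspicious_redirects(lines):
    """Yield parsed entries where a garbage path was answered with a redirect."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] in (301, 302) and is_garbage_path(entry["path"]):
            yield entry


if __name__ == "__main__":
    # "access.log" is a placeholder; point it at the real log file.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for hit in suspicious_redirects(handle):
            print(hit["time"], hit["host"], hit["status"], hit["path"][:80])
```

If this turns up hits, the next thing to check is where each 301 points, since that would show which redirect rule is catching these requests.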
9:25 pm on May 29, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24



I didn't even notice the 301. Is it followed by another request? If not, the punch line may be "Oh! I didn't realize that 'example.com/blahblah/https://example.com/blahblah/https://example.com/blahblah' is the same page as 'www.example.com/blahblah/https://example.com/blahblah/https://example.com/blahblah'. I'll just scratch it off the list then."

On shared hosting I don't know if it's possible to find out more about the original request. The two obvious variables are: with/without leading www in hostname, and http vs. https protocol.

Does the form
[example.com...]
by itself lead to a real (non-redirected) page?
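
The two variables mentioned above (with or without www, http or https) give four forms of any one URL to try. Here is a small Python sketch that lists them, so each can be requested by hand and its status code compared. The function name `host_scheme_variants` is invented for this example, and it assumes the hostname has no port or userinfo attached.

```python
from urllib.parse import urlsplit, urlunsplit


def host_scheme_variants(url):
    """Return the four scheme/hostname combinations of a URL:
    http and https, each with and without a leading "www."."""
    parts = urlsplit(url)
    host = parts.netloc
    # Strip one leading "www." so both forms can be rebuilt from the bare name.
    bare = host[4:] if host.startswith("www.") else host
    variants = []
    for scheme in ("http", "https"):
        for name in (bare, "www." + bare):
            variants.append(urlunsplit((scheme, name, parts.path, parts.query, "")))
    return variants


if __name__ == "__main__":
    for variant in host_scheme_variants("http://example.com/directory/same_page.html"):
        print(variant)
```

Ideally exactly one of the four answers with a 200 and the other three redirect straight to it in a single hop.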
4:57 pm on May 31, 2013 (gmt 0)



I only noticed those Googlebot requests on one day. They all came from the same IP. The [example.com...] form does lead to a real page. It is not a redirect. However, there are some redirects in my htaccess file that involve port 443. My host put them there when I first moved my site to his VPS server. All my old links were https and he changed everything to use http://example.com. It used to be [example.com....] But those redirects did not work. I had to wait for all the old https links to expire before anyone could find me again. (Took about a month.) Most of those old links are gone now. But the (non-working) htaccess redirects are still there. I assume I can remove them. I've asked my host about them. Waiting for his reply.
-- Grandma