What's Gigablast up to?

trying to spider with 'languages'

malachite

8:31 am on Oct 2, 2007 (gmt 0)

10+ Year Member



Until a few days ago, I very rarely saw Gigabot. It read robots.txt, saw it was unwelcome and went away happy. Now I'm getting regular visits throughout the day, attempting to spider old, 301ed URLs with some sort of 'language' query. Requests take the format:

example.com/arabic.php?u=www.example.com/obsolete page/

with 'arabic' replaced in quick succession by german, french, spanish or some other language. The IP belongs to Gigablast.

The site is English only, and since the bot is now completely ignoring robots.txt, it's eating 403s. Has anyone else seen this odd activity?
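
For anyone who wants to pick these requests out of their own logs, here is a minimal Python sketch of a matcher for the pattern described above. The function name `is_language_probe` and the `LANGUAGES` set are my own illustrative choices, and the set only holds the languages named in this thread; the full list the bot uses is unknown.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Languages mentioned in this thread only; the bot's real list is unknown (assumption).
LANGUAGES = {"arabic", "german", "french", "spanish", "korean"}

# Matches a top-level script such as /arabic.php and captures the name part.
_SCRIPT = re.compile(r"^/([a-z]+)\.php$")


def is_language_probe(request_target: str) -> bool:
    """Return True if the request looks like /<language>.php?u=<anything>."""
    parts = urlsplit(request_target)
    match = _SCRIPT.match(parts.path.lower())
    if not match or match.group(1) not in LANGUAGES:
        return False
    # The requests described here always carry a 'u' query parameter.
    return "u" in parse_qs(parts.query, keep_blank_values=True)


if __name__ == "__main__":
    for target in (
        "/arabic.php?u=www.example.com/obsolete page/",
        "/korean.php?u=http://www.example.com/page.html",
        "/index.php?u=x",
    ):
        print(target, is_language_probe(target))
```

Feeding it the request path from each access-log line should be enough to count how often these hits occur, or to decide which requests to answer with a 403 or 410.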

[edited by: volatilegx at 11:28 pm (utc) on Oct. 2, 2007]
[edit reason] for examples, please use example.com [/edit]

tbogus

1:28 pm on Oct 19, 2007 (gmt 0)

10+ Year Member



I've seen this too, along with URLs like http://example.com/korean.php?u=http://www.example.com/page.html

Does anybody know what's going on and how to avoid it?

It seems to be slightly confusing Google...

volatilegx

2:33 am on Oct 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, tbogus :)

How is it affecting Google? I don't get it.

tbogus

1:26 pm on Oct 22, 2007 (gmt 0)

10+ Year Member



Google is somehow registering these URLs (with the language.php extension) as URLs that are in our index but cannot be found as valid URLs.

My initial thought is that Google is somehow indexing Gigablast, which has the valid URL, but Google cannot correctly process the link.

As a result, I have about two hundred URLs that Google says don't exist. We've seen in the past that these invalid URLs affect our Google ranking.

Most of all - I'm curious as to why and how Google is registering these URLs as 'valid' site URLs...

Ahh - the joys of SEO...

Tim.