Forum Moderators: Robert Charlton & goodroi


Webmaster Tools 'Web Crawl' and bogus query string links

         

doughayman

12:11 am on Jun 9, 2008 (gmt 0)

10+ Year Member



Hi,

I tried to embed this issue in another topic and it got lost, so I decided to start a thread on it.

In Google WT Web Crawl, I have successfully blocked various URLs through my judicious use of ROBOTS.TXT "disallow" clauses. Specifically, I have a ROBOTS.TXT disallow clause of:

Disallow: /*?

I added this to my ROBOTS.TXT file since I was getting Google crawl entries of:

www.domain.com/index.htm (which I want crawled & indexed)

AND

www.domain.com/index.htm?ref=someotherdomain.com

My ROBOTS.TXT disallow clause above successfully blocked this latter URL.

This latter URL is not referenced on my site, and is not found anywhere on the Net (yet). Additionally, I do not offer any sort of subaffiliate program, so this link means nothing to me.
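For context, here is a minimal sketch of how that rule sits in a ROBOTS.TXT file. I'm showing it under a Googlebot user-agent section, since the * wildcard is a Googlebot extension rather than part of the original robots.txt standard (the exact user-agent grouping in my file is assumed here):

```
User-agent: Googlebot
Disallow: /*?
```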

So, here are some questions that I have on this:

1) Where is Google finding this link, and what does it mean? Does it refer to someone trying to create a scraper site that isn't yet indexed?

2) If I didn't block this 2nd URL via ROBOTS.TXT, and it got crawled and indexed by Google, would this create an opportunity for a duplicate content penalty with the 1st link mentioned above? [NOTE: that is my thought, and why I decided to block it in ROBOTS.TXT]

3) Assuming that this is somehow tied to malicious acts, is there any sort of means to alert Google to this?

Thanks in advance,

Doug

tedster

2:52 pm on Jun 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's my take:

1. It can be very hard to know where Google gets a URL. It might come from toolbar data, direct submission, someone else's page that was once indexed but no longer is, a cloaked page, a Google test of how your domain handles this "invented" URL -- probably more, too!

2. Yes, there's an opportunity for trouble in the SERPs - especially if this happens a lot.

3. Handling it technically as you did is the best thing to do.

doughayman

3:05 pm on Jun 9, 2008 (gmt 0)

10+ Year Member



Thanks for your input, Ted.

One concern I have, though, is that by blocking:

www.domain.com/index.htm?ref=someotherdomain

am I also effectively blocking:

www.domain.com/index.htm ?

The reason I ask is that the block of the 1st link above first occurred on June 4th of this month, and that is when I saw a precipitous drop in traffic (I have another thread dedicated to that issue).

tedster

3:08 pm on Jun 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could avoid that potential problem by using a 301 redirect that drops the query string, and then removing the Disallow rule from robots.txt.
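On Apache, one way to sketch that is with mod_rewrite in .htaccess -- the trailing "?" in the substitution is what drops the query string on the redirect. This is just an illustration for the index.htm case in this thread; adjust the pattern for your own server and URLs:

```apache
RewriteEngine On
# Only fire when a query string is present
RewriteCond %{QUERY_STRING} .
# 301 to the clean URL; the trailing "?" strips the query string
RewriteRule ^index\.htm$ /index.htm? [R=301,L]
```

That way www.domain.com/index.htm?ref=someotherdomain.com permanently redirects to www.domain.com/index.htm, consolidating everything onto the one URL instead of relying on robots.txt to hide the variants.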

doughayman

3:46 pm on Jun 9, 2008 (gmt 0)

10+ Year Member



Yes, good idea!