I tried to embed this issue in another topic and it got lost, so I decided to start a thread on it.
In Google WT Web Crawl, I have successfully blocked various URLs through judicious use of ROBOTS.TXT "disallow" clauses. Specifically, I have a ROBOTS.TXT disallow clause of:
Disallow: /*?
I added this to my ROBOTS.TXT file since I was getting Google crawl entries of:
www.domain.com/index.htm (which I want crawled & indexed)
AND
www.domain.com/index.htm?ref=someotherdomain.com
My ROBOTS.TXT disallow clause above successfully blocked this latter URL.
This latter URL is not referenced on my site, and is not found anywhere on the Net (yet). Additionally, I do not offer any sort of subaffiliate program, so this link means nothing to me.
So, here are some questions that I have on this:
1) Where is Google finding this link, and what does it mean? Does it point to someone trying to create a scraper site that isn't yet indexed?
2) If I didn't block this 2nd URL via ROBOTS.TXT, and it got crawled and indexed by Google, would this create an opportunity for a duplicate content penalty with the 1st link mentioned above? [NOTE: that is my thought, and why I decided to block it in ROBOTS.TXT]
3) Assuming that this is somehow tied to malicious acts, is there any means of alerting Google to this?
Thanks in advance,
Doug
1. It can be very hard to know where Google gets a URL. It might come from toolbar data, direct submission, someone else's page that is not indexed now but once was, a cloaked page, a Google test of how your domain handles this "invented" URL -- probably more, too!
2. Yes, there's an opportunity for trouble in the SERPs - especially if this happens a lot.
3. Handling it technically as you did is the best thing to do.
One concern I have, though, is that by blocking:
www.domain.com/index.htm?ref=someotherdomain
am I also effectively blocking:
www.domain.com/index.htm ?
The reason I ask is that the block of the 1st link above first took effect on June 4th of this month, and that is when I saw a precipitous drop in traffic (there is another thread dedicated to that issue).
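For anyone who wants to check this kind of thing offline, here is a rough Python sketch of how a wildcard rule like "Disallow: /*?" could be tested against both URLs. It assumes the wildcard semantics Google documents for robots.txt ("*" matches any run of characters, a trailing "$" anchors the end of the URL, and the pattern is matched from the start of the path). The function names robots_pattern_to_regex and is_disallowed are made up for this illustration and are not part of any Google tool, and the sketch ignores Allow rules and rule precedence.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters,
    a trailing '$' anchors the end of the URL, and matching always
    starts at the beginning of the path.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))

def is_disallowed(path: str, pattern: str = "/*?") -> bool:
    """Return True if the path (including any query string) matches the pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None

if __name__ == "__main__":
    for path in ("/index.htm", "/index.htm?ref=someotherdomain.com"):
        verdict = "blocked" if is_disallowed(path) else "allowed"
        print(path, "->", verdict)
```

Under those assumptions the rule only matches paths that contain a literal "?", so the plain /index.htm would not be caught by it. The real test is still what Google WT reports for each URL.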