Bad Links generated by scraper sites

If anyone has posted on this, apologies. I didn't see it.

I discovered another Google indexing problem (doesn't seem to happen with MSN, Yahoo/Inktomi or others) generated by content theft.

Googlebot suddenly started trying to crawl nonexistent links. To give you an idea, let's assume a directory structure:
www.widgets.com
www.widgets.com/blue
www.widgets.com/blue/us
www.widgets.com/blue/france
www.widgets.com/red
www.widgets.com/red/italy
www.widgets.com/red/canada

Google tried to crawl links such as www.widgets.com/blue/us/red/italy/blue/france/widget_order.html

Obviously this page didn't exist, BUT, presumably because of the Apache lookback function, the request would ultimately return a 200 and resolve to the correct "www.widgets.com/widget_order.html"
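For anyone who wants to spot these requests in their own logs, here is a minimal sketch in Python. It assumes you can list the paths your site really serves; KNOWN_PATHS and is_bogus are made-up names for illustration, and the paths are just the example structure above.

```python
from urllib.parse import urlsplit

# Hypothetical list of real paths, taken from the example structure above.
KNOWN_PATHS = {
    "/", "/blue", "/blue/us", "/blue/france",
    "/red", "/red/italy", "/red/canada", "/widget_order.html",
}

def is_bogus(url, known=KNOWN_PATHS):
    """Return True if the URL's path is not one the site actually serves."""
    # Drop a trailing slash so "/blue/us/" and "/blue/us" compare equal;
    # an empty path means the site root.
    path = urlsplit(url).path.rstrip("/") or "/"
    return path not in known

crawled = "http://www.widgets.com/blue/us/red/italy/blue/france/widget_order.html"
print(is_bogus(crawled))
print(is_bogus("http://www.widgets.com/widget_order.html"))
```

A request flagged this way that still comes back with a 200 in the access log is the kind of thing described above. Note the URLs need a scheme (http://) for urlsplit to separate the host from the path.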

I went crazy trying to figure out the source and finally traced it to a site stealing content. Because the page it stole (not the widgets.com index page) did not have a base href tag, all the relative links on the stolen copy became screwed up. I don't know if a base href would have solved it (the bad links, that is, not the theft), but I took the drastic step of deleting my original page and using the Google removal tool to get it out of the index. Note that Googlebot is still trying to crawl these links, as the removal is still pending.
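To illustrate why a missing base href matters, here is a rough sketch using Python's urljoin, which resolves relative links the standard way. The scraper URL, the page names and the resolve helper are all made up for the example; it does not reproduce the exact chained paths Googlebot requested, only the general mechanism.

```python
from urllib.parse import urljoin

def resolve(page_url, href, base_href=None):
    """Resolve a relative link the way a crawler would.

    Without a base href, the link is resolved against the URL of the page
    it appears on; with one, against the base href instead.
    """
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)

original = "http://www.widgets.com/blue/index.html"    # assumed original page
stolen = "http://scraper.example/copies/page1.html"    # hypothetical scraper copy

# On the original page, the relative link points where it should.
print(resolve(original, "france/"))
# → http://www.widgets.com/blue/france/

# On the stolen copy, the same link resolves against the scraper's URL.
print(resolve(stolen, "france/"))
# → http://scraper.example/copies/france/

# With a base href carried along in the copied HTML, it resolves correctly again.
print(resolve(stolen, "france/", base_href="http://www.widgets.com/blue/"))
# → http://www.widgets.com/blue/france/
```

So in principle an absolute base href (or absolute links) on the original page would keep a verbatim copy's links pointing at the right URLs, assuming the scraper leaves the tag in place.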

Point is, I don't know how Google interpreted this (duplicate content? redirects?) but it's another example of other sites being able to screw your PR and SERPs.

bobmark
