
Apache Web Server Forum

    
Interesting seo htaccess lesson?
roshaoar
msg:4661770
10:10 am on Apr 9, 2014 (gmt 0)

I find this quite interesting.

So my personal site has been getting a LOT of randoms trying out various guesses, presumably in order to post comment spam or find weaknesses to exploit. Stuff like /wp, /admin, /blog, /wordpress. I believe that these are probably largely automated but who knows, maybe there's some real guy behind some of them.

So I did what I thought was a smart move: redirected and 401'd them ("unauthorised").

Then, sometime later, I noticed all these links from what looks like a hacked WordPress site (really ugly site, 1000s of spammy seo keyword pages, just gross) pointing into /article/someurl

Now interesting, I did immediately disavow them in google, so maybe that saved my butt, but I've just realised that /article was one of the directories that I was "disallowing". So to Google spidering the links it found on the spammy seo site I could imagine that this looked pretty dodgy.

Anyhow, I took off all those 401 blocks and am just letting them 404 now... kind of wondering whether to resubmit in google. Hm!

Certainly an interesting lesson, though, about how competitors can (I guess) use your htaccess files against you. Powerful stuff, htaccess.

On another note, boy I wish there was a way I could report that hacked wordpress site to google, it's so obviously being used for nefarious purposes.

 

phranque
msg:4661827
3:23 pm on Apr 9, 2014 (gmt 0)

redirected and 401'd them

"and"?
there shouldn't be a redirect preceding any 4XX response.

I took off all those 401 blocks and am just letting them 404 now

or is it still a redirect to a 404?

/article was one of the directories that I was "disallowing"

maybe not if the initial response was a 301/302.
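
To make the point about redirects concrete, here is a minimal .htaccess sketch that answers probe requests with a 4XX status directly, with no 301/302 first. It assumes mod_rewrite is enabled, and the path list is only illustrative (taken from the probes mentioned in the first post); it is not the original poster's actual configuration. If the paths simply don't exist, Apache already returns a plain 404 with no rule at all.

```apache
# Sketch only: assumes mod_rewrite is available and this file sits in the
# site's document root. The path list is illustrative, not from a real config.
RewriteEngine On

# Respond with 404 straight away for the probed paths.
# The "-" substitution means "do not rewrite the URL", so no external
# redirect is sent before the error status.
RewriteRule ^(wp|wordpress|admin|blog)(/|$) - [R=404,L]

# Alternative: use [G] instead to answer "410 Gone",
# or [F] to answer "403 Forbidden".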


Now interesting, I did immediately disavow them in google, so maybe that saved my butt, but I've just realised that /article was one of the directories that I was "disallowing". So to Google spidering the links it found on the spammy seo site I could imagine that this looked pretty dodgy.

... kind of wondering whether to resubmit in google. Hm!

...
On another note, boy I wish there was a way I could report that hacked wordpress site to google, it's so obviously being used for nefarious purposes

you should ask all the google questions in the Google SEO News and Discussion [webmasterworld.com] forum.

lucy24
msg:4661909
9:10 pm on Apr 9, 2014 (gmt 0)

Interesting. So your malign robot planted links to nonexistent pages on the off chance that they would turn out to exist, because spidering and link-planting are different activities. And, if I'm reading it right, google doesn't know that the pages don't exist, because they're inside disallowed directories.

How big are the directories? If they contain vast numbers of pages a Disallow may be the best approach. Otherwise search engines will spend their whole crawl budget on places that will never be indexed. But if there are only a few (real) pages in the directory, why not let search engines crawl and feed them individual <noindex> instead?

Casual firsthand experience suggests that when a page is marked <noindex>, requests for the page drop back pretty sharply. They'll ask a few times and then back-burner it. Especially if the page was never in the index in the first place, so they're not losing anything.
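
One way to serve a noindex without editing each page is an X-Robots-Tag response header set from the directory's .htaccess. This is a rough sketch that assumes mod_headers is enabled; the filename pattern is hypothetical. It only has an effect if the pages are crawlable, since a crawler that is Disallowed in robots.txt never fetches the page and so never sees the header.

```apache
# Sketch only: place in the .htaccess of the directory holding the few real
# pages. Assumes mod_headers is enabled; the \.html$ pattern is hypothetical.
<IfModule mod_headers.c>
  <FilesMatch "\.html$">
    # Ask search engines to crawl the page but keep it out of the index.
    Header set X-Robots-Tag "noindex"
  </FilesMatch>
</IfModule>
```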

roshaoar
msg:4661933
11:00 pm on Apr 9, 2014 (gmt 0)

Yeah, it's naughty. I'm having difficulty seeing it as anything other than an attempt to give this site a bad rep: scummy links on a scummy site pointing at articles that Google couldn't tell didn't exist, because of me trying to be too clever with .htaccess. My bad... but I'm thankful I disavowed the whole domain. A case where disavow really is a good call.
