Forum Moderators: Robert Charlton & goodroi

Is a link that gives a 403 (access denied) response 'valid/existing'?

         

Selen

4:18 pm on May 28, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have a question which I cannot find a clear answer for. Let's assume there is a spammer (there are actually hundreds of them) who links to my site using banned/adult keywords in anchor text. In .htaccess I use the code below (which works correctly and gives a 403 - access denied to visitors who follow links from example.com to my website):

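## SITE REFERRER BANNING
RewriteEngine on

# Match any request whose Referer header contains example.com (case-insensitive)
RewriteCond %{HTTP_REFERER} example\.com [NC]
# Answer with 403 Forbidden; "-" means the URL itself is not rewritten
RewriteRule .* - [F]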


The question is - does Google still consider a link that gives 403-access denied response published on example.com as a valid/real link? Or does it not count it as a link? I just don't know if I'm wasting my time by denying bad domains referring to my site (I also use the disavow tool).

JD_Toims

8:09 pm on May 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The question is - does Google still consider a link that gives 403-access denied response published on example.com as a valid/real link?

Yes, they do, because GoogleBot doesn't send a referrer header, so they have no clue about the 403.

All you're doing is blocking any real people who might visit for whatever reason -- I'd take it down personally, because even if you're not what a person is looking for right now, they might want something you have or be interested in what your site presents in the future, so why not let them see what's on your site?

Selen

8:19 pm on May 28, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Ok.. so I guess I'll have to stop doing it. I thought that specifically blocking these bad domains would give a strong signal to Google that we don't have anything to do with these links.

aakk9999

10:30 pm on May 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



a strong signal to Google that we don't have anything to do with these links

Why not disavow them?

not2easy

1:28 am on May 29, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The question is - does Google still consider a link that gives 403-access denied response published on example.com as a valid/real link?

My first question is, where are you seeing these referers' domains? If in GWT, download their list, try the Webmaster World Free Tool here: [freetools.webmasterworld.com...] for checking disavow links, and submit a disavow.

If these spammy referers are showing up in your logs, I would first try to find out whether the "link" exists. Seeing these in your logs does not mean that any actual link exists. Most referer spam I have seen across several sites is not coming from where it claims to be. So first do a whois lookup on the IP that dropped the referer in your logs and determine whether it belongs to an actual ISP or to a hosting company. A hosting server is not sending you human traffic; it only runs spambots set up to plant artificial links. I've seen spammy referer links from half a dozen different countries all from one IP address. Those are safe to block in htaccess without any chance of blocking real people.
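To spot the "many referers from one IP" pattern described above, something like the following rough sketch may help. It assumes your logs are in Apache's "combined" format; the sample lines, the IP addresses, and the threshold of three domains are all invented for illustration, and the whois lookup itself still has to be done separately.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumption: Apache "combined" log format. Lines that do not match are skipped.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"$'
)

# Invented sample data (documentation-range IPs, .example domains).
SAMPLE_LINES = [
    '203.0.113.7 - - [29/May/2014:01:00:00 +0000] "GET / HTTP/1.1" 403 199 "http://spam-a.example/page" "Mozilla/5.0"',
    '203.0.113.7 - - [29/May/2014:01:00:05 +0000] "GET / HTTP/1.1" 403 199 "http://spam-b.example/" "Mozilla/5.0"',
    '203.0.113.7 - - [29/May/2014:01:00:09 +0000] "GET / HTTP/1.1" 403 199 "http://spam-c.example/x" "Mozilla/5.0"',
    '198.51.100.20 - - [29/May/2014:01:02:00 +0000] "GET /a.html HTTP/1.1" 200 512 "http://blog.example/post" "Mozilla/5.0"',
    '198.51.100.20 - - [29/May/2014:01:02:30 +0000] "GET /b.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]


def referer_domains_by_ip(lines):
    """Map each client IP to the set of referer hostnames it sent."""
    domains = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        # A missing referer is logged as "-", which has no hostname.
        host = urlparse(match.group("referer")).hostname
        if host:
            domains[match.group("ip")].add(host.lower())
    return domains


def suspicious_ips(lines, min_domains=3):
    """Return IPs that claimed at least `min_domains` different referer domains."""
    by_ip = referer_domains_by_ip(lines)
    return {
        ip: sorted(hosts)
        for ip, hosts in by_ip.items()
        if len(hosts) >= min_domains
    }


if __name__ == "__main__":
    for ip, hosts in suspicious_ips(SAMPLE_LINES).items():
        print(ip, hosts)
```

Any IP this flags is only a candidate; the whois check on that IP is what tells you whether it is a hosting server or an ISP.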

Selen

2:21 am on May 29, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



My first question is, where are you seeing these referers' domains?

These are domains, found using the disavow links check, that link to our site with adult keywords in the hope of triggering a penalty.

Yes, I use the disavow tool (which seems to do the job well) - I just thought that denying these bad domains via .htaccess would help even more. Apparently I was incorrect ;).

tangor

3:15 am on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the page exists on YOUR site and is indexed by the SEs... okay. If the 403 is sent because you've banned access from that referer, will it count? Seems like a disavow to me... and that's the way I've done it for years, since before GWT even created the Disavow Tool.

lucy24

4:21 am on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Seems like a disavow to me...

But it's a disavow that google could never find out about. The googlebot does occasionally send a referer, but I've only ever seen it in requests for supporting files that give the html page as referer. And for full credibility it would also have to wear a human UA and use some random IP.

:: detour to check current behavior ::

Has this been on the increase or did I just never notice? About 1/5 of all Googlebot css requests in recent months have included a referer (always the appropriate one, as far as I can tell). Also about 1/3 of the .js and-- this is striking-- all of the .ttf. (Granted, the numbers for the last are not large, so it may be a fluke.)

tangor

4:27 am on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But it's a disavow that google could never find out about.


How not? If the end URL is allowed at one place and one place only (the originating site) and any SE (including G) can index it, that's a valid page. If links to that page from some sites end as 403s what can be deduced from that? Looks like a disavow, just hard coded by the webmaster, no G involvement required.

JD_Toims

4:57 am on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If links to that page from some sites end as 403s what can be deduced from that?

Nothing -- People/bots only get the 403 if a referrer header is sent -- GoogleBot doesn't send one, so Google *never* gets the 403 for any link to your site from any specific site, which means you aren't sending any "ranking" or "disavow" message to Google or Bing or any other bot/SE that doesn't send a referrer header.

The only thing you're doing is sending a message to *people* who try to visit your site with a browser that does send a referrer header [almost all by default] -- And what you're saying is your site is off-limits to *people* who try to visit from certain sites -- Google, Bing and many other search engines will *not* get the message.
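A minimal sketch of the point above, assuming the rule from the first post. This is not Apache itself, just a Python stand-in for the RewriteCond/RewriteRule pair, and the function name and sample URLs are made up for illustration:

```python
import re
from typing import Optional

# Stand-in for: RewriteCond %{HTTP_REFERER} example\.com [NC]
BLOCKED_REFERER = re.compile(r"example\.com", re.IGNORECASE)


def status_for(referer: Optional[str]) -> int:
    """Return the status the referer rule would give for one request."""
    # With no Referer header, the variable is empty and the condition cannot match.
    if BLOCKED_REFERER.search(referer or ""):
        return 403
    return 200


if __name__ == "__main__":
    # A browser following the spam link sends the linking page as Referer.
    print("browser via spam link:", status_for("http://example.com/spam-page.html"))
    # A crawler requesting the same URL without a Referer header.
    print("crawler, no referer:  ", status_for(None))
```

The same URL answers differently depending only on the header the client chooses to send, so a client that sends none never sees the block.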

JD_Toims

5:19 am on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Addition:

Sorry if I sounded harsh tangor, I wasn't trying to. What I was stating is just a blunt reality.

The only major bot I've seen send a referrer [misspelled as referer] header was Slurp, and it only happened 3 times before I figured out what was going on. My site had been framed, and because of the way it was framed, when Slurp requested the page on the framing site, the request that reached my site had Slurp as the user-agent with a valid Yahoo! IP, but the referrer sent was the page on the framing site.

I promptly put in a frame-buster for regular visitors, plus a check to see if Slurp, MSN/Bingbot or Googlebot with a valid IP sent a referrer header. If they did, I changed the contents of the page to nothing much more than a link to my site by only displaying: If you would like to see this page, [link]click here[/link].
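Roughly, the check described above could look like the sketch below. This is not the original code: the user-agent tokens, the function names, and the `ip_verified` flag are assumptions, and verifying that an IP really belongs to the search engine is left to the caller.

```python
from typing import Optional

# Assumed user-agent substrings for the bots named in the post.
MAJOR_BOT_TOKENS = ("slurp", "msnbot", "bingbot", "googlebot")


def is_major_bot(user_agent: str) -> bool:
    """True if the user-agent string claims to be one of the listed bots."""
    ua = user_agent.lower()
    return any(token in ua for token in MAJOR_BOT_TOKENS)


def body_for(user_agent: str, ip_verified: bool, referer: Optional[str],
             full_page: str, page_url: str) -> str:
    """Serve only a link when a verified major bot arrives with a referer."""
    if is_major_bot(user_agent) and ip_verified and referer:
        return ('If you would like to see this page, '
                f'<a href="{page_url}">click here</a>.')
    return full_page


if __name__ == "__main__":
    print(body_for(
        "Mozilla/5.0 (compatible; Yahoo! Slurp)",  # hypothetical UA string
        True,
        "http://framing-site.example/frame.html",
        "<html>full page</html>",
        "http://www.my-site.example/page.html",
    ))
```

Ordinary visitors and bots that send no referer still get the full page; only the verified-bot-plus-referer case gets the stub.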

tangor

5:38 am on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And my apologies to all and sundry: links I disavow, WHETHER REFERER (sic) OR ACTUAL LINK, are treated the same: 403, and I do that without GWT. Sorry for the confusion in terminology... I did misspeak, though the intent, I hope, is clear.

There's some traffic from some sites that I do not entertain.

phranque

5:43 am on May 29, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I just don't know if I'm wasting my time by denying bad domains referring to my site

you should make this decision based on any potential live visitors referred from these domains.
as has been made clear here (and i agree) the 403 won't be seen by the search engines since there is no Referer header sent.

JD_Toims

5:44 am on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Cool and I hope I didn't sound too harsh at all.
If I did, my apologies.



There are actually a couple of good reasons bots don't bother checking the response they get based on a referrer header:

1.) It's too processor intensive.
2.) It could easily overload a site's server.

If bots checked a page's response based on a referrer header and the page had 1,000 inbound links, they'd have to request the page 1,000 times, once per referrer, instead of requesting it once without the referrer header to see what they got.

If they had 1,000 pages with 1,000 inbound links each, they'd have to make 1,000,000 requests compared to 1,000 requests without checking the response based on referrer.

If they had 1,000 pages with 10,000 inbound links each, they'd have to make 10,000,000 requests compared to 1,000 requests without checking the response based on referrer.

They could easily overload their processing power or a server having to make that many more requests to determine the response a user receives from each inbound link to a page, so they don't send a referrer header when they request a page.
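The arithmetic above can be sketched in a few lines. This assumes every page has the same number of inbound links and that a bot would need one request per (page, referrer) pair; the function name is just for illustration.

```python
def requests_needed(pages: int, inbound_links_per_page: int, per_referer: bool) -> int:
    """Count requests for one crawl of `pages` pages."""
    if per_referer:
        # One request per page per linking page, each with a different Referer.
        return pages * inbound_links_per_page
    # One plain request per page, no Referer header.
    return pages


if __name__ == "__main__":
    for pages, links in [(1, 1_000), (1_000, 1_000), (1_000, 10_000)]:
        plain = requests_needed(pages, links, per_referer=False)
        checked = requests_needed(pages, links, per_referer=True)
        print(f"{pages:,} pages, {links:,} links each: "
              f"{plain:,} plain vs {checked:,} per-referrer requests")
```

The per-referrer count grows with the number of inbound links, while the plain count depends only on the number of pages.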

Selen

4:26 pm on May 29, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



you should make this decision based on any potential live visitors referred from these domains.
as has been made clear here (and i agree) the 403 won't be seen by the search engines since there is no Referer header sent.

I was thinking more about it.. having this 403-access denied code in .htaccess may actually have a hidden benefit: if the bad/attacking sites have a lot of broken links (I think a link that gives a 403 server response is considered broken?) then these attacking sites are shooting themselves in the foot. At least in theory.

lucy24

5:04 pm on May 29, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You said at the beginning:
links to my site using banned/adult keywords in anchor text

This implies that the links actually exist, and you've personally seen them. Have you ever-- in actual fact, not just hypothetically-- had human visitors via these links? As far as I can remember, gwt shows anchor text ("how your content is linked" with no distinction between external and in-site linking), but they don't say what specific text is associated with what specific link; it's aggregated.

<topic drift>
Would it trigger any alarms if a search engine found multiple occurrences of external linking text that doesn't match up with words actually used on the site itself-- or vice versa?
</topic drift>

Selen

5:41 pm on May 29, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, the links exist, I have a list of them. They are mostly posted on hacked blogs or unmoderated forums. I doubt there are many human visitors, but these links are being found and indexed in Google web search. I find them using some 'who links to my site' tools.

Anchor text is totally unrelated to the site's theme (ie. they use vulgar / adult keywords as anchor text).