Forum Moderators: Robert Charlton & goodroi


External links in WMT point to non-existent forum posts

         

acemi

5:50 pm on Mar 9, 2009 (gmt 0)

10+ Year Member



I have a niche forum (running since 2004) that had fallen into a state of limbo, and I have recently spent some time updating and tweaking it to be ready for the spring/summer rush.

A part of my efforts was a more rigorous robots.txt file, nofollow tags for external links and more attention to page titles and meta tags. I implemented these major changes between 25 Feb and 1 March.

Google has been dropping some of the old unwanted pages that went 404 (I confirmed the headers). Pages previously indexed but now blocked by robots.txt are going greybar on the toolbar. The crawl rate is way up and pages are being reindexed faster than I anticipated. New posts are making it into the index within a few hours. Visitor traffic has risen almost 50% since 1 March (a small sample but the curve is looking steeply up at the moment).

Going through the WMT list of anchor text in external links to forum posts is horrifying, though. There are well over 150 phrases relating to < topics > where there is no remotely similar content anywhere on my forums. A Google search turns up these links with the anchor text reflected in WMT, but every link is to a non-existent post.

These posts may have been made at some point, but would have been caught in a spam trap and would never have been published. I would empty the spam filter of hundreds of trapped posts every week.

The top search queries list in WMT is accurate and the AdSense targeting is great. Despite the content being somewhat related to MFA-targeted weight loss ads, I don't have a problem with ads outside my niche.

What effects could this list of anchor text be having on my site? Is there any way to get G to drop them?

[edited by: tedster at 7:56 pm (utc) on Mar. 9, 2009]
[edit reason] make topics generic [/edit]

tedster

7:54 pm on Mar 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If those URLs get a 404 response from your server, then there is no problem. The WMT report is there for your information, but it does not reflect a ranking problem. (In some cases certain 404 backlinks would matter to the webmaster.)
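One way to confirm what these URLs actually return is to request them and look at every status line in the response chain. The PHP sketch below assumes `get_headers()` with its default settings, which follows redirects and returns one status line per hop, so a 302 => 404 chain shows up as two codes. The function names are hypothetical.

```php
<?php
// Pull the numeric status code out of each status line returned by
// get_headers(). With redirects, there is one status line per hop.
function status_codes(array $headerLines): array
{
    $codes = [];
    foreach ($headerLines as $key => $line) {
        // Status lines sit at integer keys and look like "HTTP/1.1 404 Not Found";
        // ordinary headers ("Location: ...") do not match the pattern.
        if (is_int($key) && is_string($line)
            && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Network helper: request the URL and return its status chain,
// or an empty array if the request fails outright.
function fetch_status_codes(string $url): array
{
    $headers = @get_headers($url);
    return $headers === false ? [] : status_codes($headers);
}
```

For a placeholder URL such as `http://www.example.com/viewtopic.php?t=99999`, a 302 => 404 setup should come back as two codes ending in 404, while phpBB's default "topic not found" page should come back as a single 200.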

acemi

9:27 pm on Mar 9, 2009 (gmt 0)

10+ Year Member



Thanks tedster

What I can't figure out is why anyone would want to set up links like these. They are mostly from (dofollow) spam posts on blogs, many of them not in English, with a higher-than-normal proportion of .hu sites.

acemi

11:01 am on Mar 18, 2009 (gmt 0)

10+ Year Member



If those urls get a 404 response from your server, then there is no problem.

I have been trying to solve the redirect problems for these URLs.

The forum software, phpBB, is not set up to give any error response for topics or posts that do not exist. Instead, it serves a page with a 200 response informing the user that the topic cannot be found.

The forum's native redirect function produces a 302 response. I have managed to use this redirect to point to an interim (bogus) URL, which in turn triggers the 404. Due to my limited knowledge of PHP, this is as far as I have been able to go.

I did try a one-step redirect to my (custom) 404.php but got a 200 header response!

Could I leave this redirect chain (302 => 404) as is, or must I provide the 404 to the initial request?

Thanks

JS_Harris

11:35 am on Mar 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A Google search turns up these links with the anchor text reflected in WMT, but every link is to a non-existent post

That may not be the case. The posts may very well exist, but Google is not recording all of the posts' URI parameters, so the recorded URLs return a 404. It is also possible that the site linking to yours is using .htaccess to block IP ranges, perhaps including entire countries, which also results in 404 pages. Sometimes a host will do this without informing the website owner; it is rare, but it happens.

Try the links through a proxy located in the same country as their TLD.
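For the proxy check, PHP's HTTP stream wrapper accepts a `proxy` option in a stream context, and `get_headers()` takes a context as its third argument (PHP 7.1 and later). A minimal sketch; the helper names are hypothetical and the proxy address is a placeholder to be replaced with a proxy in the relevant country.

```php
<?php
// Build a stream context that sends HTTP requests through a proxy.
function proxy_context(string $proxy)
{
    return stream_context_create([
        'http' => [
            'proxy' => $proxy,           // e.g. 'tcp://203.0.113.10:8080' (placeholder)
            'request_fulluri' => true,   // many HTTP proxies expect the full URI in the request line
            'timeout' => 15,             // seconds; an arbitrary choice for this sketch
        ],
    ]);
}

// Fetch only the response headers for $url via the proxy.
// Returns false if the request fails.
function headers_via_proxy(string $url, string $proxy)
{
    return @get_headers($url, false, proxy_context($proxy));
}
```

Comparing `headers_via_proxy()` with a direct `get_headers()` call for the same linking page would show whether the linking site treats visitors differently by location.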

acemi

12:08 pm on Mar 18, 2009 (gmt 0)

10+ Year Member



I have found the links by doing a search for:
"a word or phrase in the external links to your site" + mysite.com

These phrases are from GWT >> Statistics >> What Googlebot sees.

The links I have found are all in spam posts or comments in neglected forums or blogs. The posts consist of keywords and links only. It seems that a bot was posting unwanted content in the form of graphics and adverts, and then linking to these posts from elsewhere.

Posts containing this unwanted material were probably submitted to my forums, but never made it through the spam filters. These blocked posts would have been assigned a topic number, and these are the topic numbers I am trying to block requests for.

tedster

6:09 pm on Mar 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could I leave this redirect chain (302 => 404) as is, or must I provide the 404 to the initial request?

302 => 404 is fine. It's 302 => 200 that can cause problems when the URL is supposed to be 404.
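Tedster's rule can be expressed as a small check over the status codes seen for one URL: any number of redirect hops is acceptable as long as the chain ends in a 404, while a chain ending in 200 is the problem case. This is a sketch under that reading; the function name and the list of redirect codes are assumptions, not part of any tool mentioned in the thread.

```php
<?php
// Redirect statuses that may appear before the final response.
const REDIRECT_CODES = [301, 302, 303, 307, 308];

// True if a URL that should not exist ends up as a real 404,
// with only redirects along the way (e.g. 302 => 404).
function is_hard_not_found(array $codes): bool
{
    if ($codes === []) {
        return false; // nothing fetched, nothing to judge
    }
    $final = end($codes);
    foreach (array_slice($codes, 0, -1) as $hop) {
        if (!in_array($hop, REDIRECT_CODES, true)) {
            return false; // a non-redirect response before the end is unexpected
        }
    }
    return $final === 404;
}
```

Under this check, `[302, 404]` and `[404]` pass, while `[302, 200]` and `[200]` (phpBB's default "topic not found" page) do not.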

acemi

6:30 pm on Mar 18, 2009 (gmt 0)

10+ Year Member



Thanks Tedster

I can now safely file this one as done and focus on some other SE aspects in need of attention.

g1smd

10:29 pm on Mar 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** I did try a one-step redirect to my (custom) 404.php but got a 200 header response! ***

If you directly serve a file, or redirect to it using an external redirect, then that file will always be "found" and will always return a 200 OK response because the file "404.php" does exist.

You can alter the script so that it sends an HTTP/1.0 404 Not Found status header, which stops it from sending a 200 OK response back.

You should not redirect to your error handler. You should rewrite URL requests so that they are handled by it; that way the 404 response is served for, and 'at', the originally requested URL.

Modify the page that says that a topic cannot be found, so that it directly returns a 404 status code in the HTTP header.
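A minimal sketch of handling the request rather than redirecting to the error handler, assuming a plain PHP script: set the status, then `include` the custom error page so its content is served at the originally requested URL. `serve_error_page()` is a hypothetical helper, not part of phpBB.

```php
<?php
// Serve an error page for the URL that was requested, without redirecting.
// $errorPage is a filesystem path, e.g. __DIR__ . '/404.php' (hypothetical).
function serve_error_page(string $errorPage, int $status = 404): void
{
    // The status must be set before any body output is sent.
    http_response_code($status);
    // include runs the page inside the current request, so the client sees
    // the 404 at the original URL instead of a redirect to the error page.
    include $errorPage;
}
```

In the not-found branch this would be called as `serve_error_page(__DIR__ . '/404.php'); exit;`. Because no `Location` header is sent, 404.php is never requested on its own and cannot answer with its own 200.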

acemi

11:09 pm on Mar 18, 2009 (gmt 0)

10+ Year Member



I have been trying to get some help with returning a 404 response, but to no avail, and the PHP scripting is beyond me.

I got as far as returning a 302, which in turn results in the 404. I would still prefer to go the direct route as you suggest, g1smd, and will persevere in my attempts to get support for returning the 404 directly at the requested URL.

In the interim, at least, the URLs in question should begin to be deindexed as they are not found.

g1smd

12:26 am on Mar 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It should be as simple as altering the script so that when there is no content to return it simply uses:

<?php header("HTTP/1.0 404 Not Found"); ?>

to signal that the URL does not exist.
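Applied to a topic view, the change g1smd describes might look like the sketch below. The `$topics` array stands in for the forum's database lookup and `view_topic()` is a hypothetical name; in phpBB the equivalent place would be wherever the "topic not found" message is produced. `http_response_code(404)` (PHP 5.4 and later) is an equivalent way to set the same status.

```php
<?php
// Hypothetical topic view: $topics stands in for the forum's database lookup.
function view_topic(array $topics, int $topicId): string
{
    if (!isset($topics[$topicId])) {
        // No such topic: send the 404 status line before any page output.
        // There is no redirect, so the 404 applies to the requested URL itself.
        header('HTTP/1.0 404 Not Found');
        return 'The topic you requested does not exist.';
    }
    return $topics[$topicId];
}
```

The returned text is still rendered as a normal page for human visitors; only the status code changes, from 200 to 404.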

acemi

1:09 am on Mar 19, 2009 (gmt 0)

10+ Year Member



The forum uses a template system and the redirect code I am using to prevent the generic "Topic not Found" page with a 200 response looks like this:

$user->setup();
$redirect_url = append_sid("{$board_root_path}BOGUS-URL.$phpEx");
redirect($redirect_url);

The global redirect() function, which issues a 302 redirect, is defined in another file, and I can see no way to change it for this one use only.

I think getting into the code here is off topic for this forum and it might be an idea to ask for help in the PHP Server Side Scripting forum.