Forum Moderators: Robert Charlton & goodroi
A part of my efforts was a more rigorous robots.txt file, nofollow tags for external links and more attention to page titles and meta tags. I implemented these major changes between 25 Feb and 1 March.
Google has been dropping some of the old unwanted pages that went 404 (I confirmed the headers). Pages previously indexed but now blocked by robots.txt are going greybar on the toolbar. The crawl rate is way up and pages are being reindexed faster than I anticipated. New posts are making it into the index within a few hours. Visitor traffic has risen almost 50% since 1 March (a small sample but the curve is looking steeply up at the moment).
Going through the WMT list of anchor text in external links to forum posts is horrifying. There are well over 150 phrases relating to < topics > for which there is no remotely similar content anywhere on my forums. A Google search turns up these links with the anchor text reflected in WMT, but every link is to a nonexistent post.
These posts may have been made at some point, but would have been caught in a spam trap and would never have been published. I would empty the spam filter of hundreds of trapped posts every week.
The top search queries list in WMT is accurate and the AdSense targeting is great. Despite the content being somewhat related to MFA-targeted weight loss ads, I don't have a problem with ads outside of my niche.
What effects could this list of anchor text be having on my site? Is there any way to get G to drop them?
[edited by: tedster at 7:56 pm (utc) on Mar. 9, 2009]
[edit reason] make topics generic [/edit]
If those URLs get a 404 response from your server, then there is no problem.
I have been trying to solve the redirect problems for these URLs.
The forum software, phpBB, is not set up to give any error response for topics or posts that do not exist. Instead a page with a 200 response informs the user that the topic cannot be found.
The forum's native redirect function results in a 302 response. I have managed to add this redirect so that it points to an interim (bogus) URL that in turn triggers the 404. Due to my ignorance of PHP this is as far as I have been able to go.
I did try a one-step redirect to my (custom) 404.php but got a 200 header response!
Could I leave this redirect chain (302 => 404) as is, or must I provide the 404 to the initial request?
Thanks
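One way to see what a crawler receives for the chain described above is to list the status codes in the order they come back. This is only a minimal PHP sketch: it assumes the raw response header lines have already been collected (for instance with PHP's get_headers(), which follows redirects by default), status_chain() is a hypothetical helper name, and the sample headers are illustrative rather than captured from the forum.

```php
<?php
// Hypothetical helper: extract the HTTP status codes, in order, from a list
// of raw response header lines (one status line per hop in a redirect chain).
function status_chain(array $headerLines): array
{
    $codes = [];
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.1 302 Found".
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Illustrative sample only: a 302 to an interim URL that then answers 404.
$sample = [
    'HTTP/1.1 302 Found',
    'Location: /BOGUS-URL.php',
    'HTTP/1.1 404 Not Found',
];

print_r(status_chain($sample));
```

A request that is answered with a 404 at the original URL would show up as a single entry rather than two.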
"A Google search turns up these links with the anchor text reflected in WMT but every link is to a nonexistent post"
That may not be the case. The posts may well exist, but Google may not be recording all of the posts' URI parameters, so the request returns a 404. It is also possible that the site linking to yours uses .htaccess to block IP ranges, perhaps including entire countries, which also results in 404 pages. Sometimes a host will do this without informing the website owner; it is rare, but it does happen.
Try the links through a proxy local to the TLD they have.
These phrases are from GWT >> Statistics >> What Googlebot sees.
The links I have found are all in spam posts or comments in neglected forums or blogs. The posts consist of keywords and links only. It seems that a bot was probably posting unwanted content in the form of graphics and adverts, and then linking to these posts from elsewhere.
Posts containing this unwanted material were probably submitted to my forums, but never made it through the spam filters. These blocked posts would have been assigned a topic number and these are the topic numbers I am trying to block requests for.
If you directly serve a file, or redirect to it using an external redirect, then that file will always be "found" and will always return a 200 OK response because the file "404.php" does exist.
You can alter the script so that it sends the status itself, i.e. header("HTTP/1.0 404 Not Found"), which stops it sending a 200 OK response.
You should not redirect to your error handler. You should rewrite URL requests so that they are handled by it; that way the 404 response is returned for, and 'at', the originally requested URL.
Modify the page that says that a topic cannot be found, so that it directly returns a 404 status code in the HTTP header.
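As a rough illustration of that suggestion, the sketch below sends the 404 status line from the same request that prints the "not found" message, with no redirect involved. It is plain PHP and not phpBB's own code: not_found_status_line() and topic_not_found() are hypothetical names, the message text is made up, and where this would hook into the forum software is left open. The header has to be sent before any page output.

```php
<?php
// Hypothetical helper (not part of phpBB): build the status line for a
// missing topic, reusing the protocol version of the current request.
function not_found_status_line(array $server): string
{
    $protocol = $server['SERVER_PROTOCOL'] ?? 'HTTP/1.0';
    return $protocol . ' 404 Not Found';
}

// Hypothetical helper: send the 404 status at the requested URL, then
// return the escaped message to display to the visitor.
function topic_not_found(string $message, array $server): string
{
    // header() only works before any output has been sent.
    if (!headers_sent()) {
        header(not_found_status_line($server), true, 404);
    }
    return htmlspecialchars($message);
}

// Illustrative call; the wording of the message is an assumption.
echo topic_not_found('The requested topic does not exist.', $_SERVER);
```

The visitor still sees the friendly message, but the response code at the originally requested URL is 404 instead of 200 or 302.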
I got as far as returning a 302 which in turn results in the 404. I would still prefer to go the direct route as you suggest g1smd, and will persevere in my attempts to get support to enable the 404 from the URL request.
In the interim, at least, the URLs in question should begin to be deindexed as they are not found.
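// Interim workaround: send requests for missing topics to a bogus URL that itself returns 404.
$user->setup();
$redirect_url = append_sid("{$board_root_path}BOGUS-URL.$phpEx");
redirect($redirect_url); // the forum's redirect() answers with a 302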
The global "function redirect", which does a 302 redirect, is defined in another file and I can see no way to manipulate it for this use only.
I think getting into the code here is off topic for this forum and it might be an idea to ask for help in the PHP Server Side Scripting forum.