
Google SEO News and Discussion Forum

    
Most of my pages show 404 status in access log
epmaniac




msg:4248884
 10:32 am on Jan 4, 2011 (gmt 0)

my website lost 80% of its traffic on oct 21... i viewed the access log, only to find that the majority of the urls which bots like googlebot, bing, yahoo etc. are requesting return a 404 'not found' status... but despite using xenu and other tools, and also searching the code manually, i couldn't find the origin of these 404 errors... all i know is that these errors are caused by rewriting, but i can't seem to detect them anywhere... is there some way to find out the origin of broken links on the web?

 

goodroi




msg:4249018
 6:14 pm on Jan 4, 2011 (gmt 0)

having large amounts of 404 errors can hurt rankings. a 404 error that happens from broken links and missing pages is a bad quality signal to google.

404 errors can also happen when a user has a typo while they manually enter the url into a browser and other random mistakes. you can ignore those random mistakes.

if you see a consistent pattern of 404 errors then either build a page or redirect the requests to a pre-existing page. this will keep users happy and boost the quality signals going to google.

it is helpful to your internal link popularity score when you increase the number of pages on your site. when in doubt just build out a new page and interlink it with your old content.

tedster




msg:4249035
 6:51 pm on Jan 4, 2011 (gmt 0)

I'd suggest using the "Fetch as googlebot" utility in the Labs section of Webmaster Tools to check out some of the suspect URLs. It's a definite concern if these URLs work when you check them manually or with Xenu, but googlebot requests for them show a 404 in your log. It could be a sign that your site was hacked.

epmaniac




msg:4249067
 7:57 pm on Jan 4, 2011 (gmt 0)

tedster,

i tried to find the broken links with xenu, but it was of no use... the references are all correct, i have checked the individual files of code also

i don't know what to do. could you guide me more about the hacked issue?

tedster




msg:4249121
 9:30 pm on Jan 4, 2011 (gmt 0)

If your server has been hacked, then there are all kinds of games that will serve googlebot something different than a regular user would get. In fact, sometimes a hacker might install a script that is buggy and doesn't do what they thought it would.

You need to explore those 404 URLs individually, at least some of them, and see what happens to a browser and what happens to googlebot. Xenu is probably too broad a brush. For checking individual URLs, I'd probably use Firefox with the LiveHTTPHeaders add-on.
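A rough sketch of that per-URL check in Python, for anyone who would rather script it than click through a browser add-on. The User-Agent strings are illustrative values and `compare_user_agents` is a made-up helper name, not part of any tool mentioned in this thread. Note that it only spoofs the User-Agent header, so it will not catch a hack that keys on googlebot's IP address; "Fetch as googlebot" is still needed for that.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings (assumptions); check current values before relying on them.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 6.1; rv:3.6) Gecko/20100101 Firefox/3.6",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def fetch_status(url, user_agent):
    """Return the final HTTP status for url when requested with user_agent.

    urlopen follows redirects, so this is the status after any 301/302 hops.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code


def compare_user_agents(url, fetch=fetch_status):
    """Fetch url once per User-Agent and flag whether the statuses differ."""
    statuses = {name: fetch(url, agent) for name, agent in USER_AGENTS.items()}
    statuses["differs"] = statuses["browser"] != statuses["googlebot"]
    return statuses
```

Calling `compare_user_agents("http://www.site.com/folder/ultra-widget.htm")` on a suspect URL returns a small dict; a `differs` value of True would be the thing to investigate.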

Chris_R




msg:4249150
 10:22 pm on Jan 4, 2011 (gmt 0)

Are you showing 404 errors for "pages" that exist on your site?
If so - that is a different problem.

If these pages aren't pages you intended to create - you can see the linking pages for some of these in Google Webmaster Tools.

You should run all the pages that are coming back 404 (from both your logs and GWT) through Xenu. You can feed them in as a text file instead of manually trying to find the links. You should also run a sample of those through the Fetch as Googlebot tool, as Ted mentioned.
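One possible way to build that text file from the raw log, as a sketch only. It assumes the server writes Apache's combined log format (the thread just says "access log"), and the crawler substrings in `CRAWLER_HINTS` are illustrative guesses rather than a complete list. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: Apache combined log format.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative substrings used to recognise crawler User-Agents (assumption).
CRAWLER_HINTS = ("googlebot", "bingbot", "msnbot", "slurp")


def crawler_404_paths(lines):
    """Count 404 responses per requested path, for crawler requests only."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["status"] != "404":
            continue
        agent = match["agent"].lower()
        if any(hint in agent for hint in CRAWLER_HINTS):
            counts[match["path"]] += 1
    return counts


def format_url_list(counts, host):
    """Return one absolute URL per line, most frequently requested first."""
    return "\n".join(f"http://{host}{path}" for path, _ in counts.most_common())
```

Reading the log with `open("access.log")`, passing the file object to `crawler_404_paths`, and writing the result of `format_url_list(counts, "www.site.com")` to a file gives a plain list of URLs, one per line.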

aristotle




msg:4249203
 1:59 am on Jan 5, 2011 (gmt 0)

It sounds to me like a server problem. A hack or an accidental malfunction.

epmaniac




msg:4249251
 6:18 am on Jan 5, 2011 (gmt 0)

hi tedster, goodroi, chris, and aristotle

i guess you guys are thinking that the problem is that the pages are giving 404 to bots but 200 to users... that is not the case

i have checked with fetch as googlebot, xenu and livehttpheaders....and all legitimate links are returning status 200

the problem is:

we can't seem to FIND the links on the site which are broken and are returning 404 (bots are accessing urls which are not supposed to be on our site, for example: www.site.com/ultra-widget.htm... instead of www.site.com/folder/ultra-widget.htm...). is there any way to find the origin?
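One place the origin sometimes shows up is the Referer field of the access log. The sketch below groups 404 requests by path and tallies the referring pages. It assumes Apache's combined log format, and `referers_for_404s` is a hypothetical helper name. Crawlers often send no Referer at all, so the useful hits tend to be ordinary visitors who followed the same bad link.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: Apache combined log format.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referers_for_404s(lines):
    """Map each 404 path to a Counter of the Referer values seen for it."""
    referers = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["status"] != "404":
            continue
        referer = match["referer"]
        # "-" (or an empty string) means the client sent no Referer header.
        if referer and referer != "-":
            referers[match["path"]][referer] += 1
    return referers
```

If a 404 path never appears with a referer, that is weak evidence the URL is not linked from any page and is being produced some other way, such as a rewrite rule or a script.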

tedster




msg:4249388
 3:50 pm on Jan 5, 2011 (gmt 0)

Now I understand better - it sounds like you are seeing the same kind of thing we're discussing in this thread: Webmaster Tools - again with the anomalies [webmasterworld.com]

There are anomalies in WebmasterTools from time to time that we don't understand. If you've done due diligence and cannot see the evidence on your site, then there's little more you can do on that angle.

So keep your focus on the traffic loss itself - dig out which URLs have lost traffic and on what keywords. There should be some patterns emerging in that research. There is another thread that seems to apply to your case - October 2010 ranking drops [webmasterworld.com]

epmaniac




msg:4249664
 3:48 am on Jan 6, 2011 (gmt 0)

@ tedster,

no tedster, this is not the issue with wmt, WMT is showing no 404s at all (weird wmt)

i found these issues when i used different server log readers like WebLogExpert and DeepLogAnalyzer

we can't seem to FIND the links on the site which are broken and are returning 404 (bots are accessing urls which are not supposed to be on our site, for example: www.site.com/ultra-widget.htm... instead of www.site.com/folder/ultra-widget.htm...)

tedster




msg:4249665
 3:53 am on Jan 6, 2011 (gmt 0)

Your opening post talked about "bots". Are you talking about crawler traffic requesting URLs that are 404, or search engine referrals being sent to 404 URLs?

epmaniac




msg:4249666
 4:01 am on Jan 6, 2011 (gmt 0)

crawler

i am talking about 80-90% of the crawler requests shown in different server log analyzers returning 404s (and these are the links i can't seem to find anywhere on the site)

please help

tedster




msg:4249668
 4:06 am on Jan 6, 2011 (gmt 0)

Just off-hand, it sounds like you're using a CMS and some URL_rewrite configuration has been scrambled. But your problem is a precise technical issue, so precise use of technical vocabulary is important for anyone to give you useful feedback.

Do you mean you can't find those URLs on your site, or that you can't find any links on your site that point to those URLs?

epmaniac




msg:4249670
 4:20 am on Jan 6, 2011 (gmt 0)

can't find any links on the site that point to those URLs

Chris_R




msg:4249697
 6:09 am on Jan 6, 2011 (gmt 0)

Keep in mind - as tedster says, the terminology is important - that I keep reading things like "can't find the links on the site which are broken" and "requesting 404s". I am not trying to be nitpicky, but just trying to make sure you understand:

1) There doesn't have to be a link to something for you to get a 404 error. There could be a mod rewrite issue like Tedster mentions - and that would probably be in some sort of .htaccess file. There isn't going to be any "links" per se in a case like this, but line(s) of code in a config file somewhere.

2) Pages don't return 404 errors (in general). A request for a nonexistent page does cause this to occur. Same for nonexistent files. It is possible to have a totally working website that tries to pull something from the client side (such as a missing CSS file) that will show up as a 404 every time. There can also be a call to the back end (like a database) that also will return a 404. This can sometimes have no real ill effect on the visitor (but should be fixed anyway). Relative links can screw this up as well.

You seem to be able to see (based on your comments about the folders) some sort of pattern to the URL. Are you seeing 200 OK for all the files that page requests as well, using LiveHTTPHeaders (including the CSS, js,...)?
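To illustrate the point about relative links: the same relative href resolves to different URLs depending on whether the page's own URL ends in a slash. This is a small sketch using Python's standard `urljoin`; the `www.site.com` URLs are the placeholder examples from earlier in the thread, and whether this is what is actually happening on the site is only a guess.

```python
from urllib.parse import urljoin

# A relative link as it might appear in a page's HTML: <a href="ultra-widget.htm">
href = "ultra-widget.htm"

# Page served at a URL that ends in a slash: the link stays inside the folder.
with_slash = urljoin("http://www.site.com/folder/", href)

# Same page reached without the trailing slash: the last path segment
# ("folder") is treated as a file name and dropped before resolving.
without_slash = urljoin("http://www.site.com/folder", href)

print(with_slash)     # http://www.site.com/folder/ultra-widget.htm
print(without_slash)  # http://www.site.com/ultra-widget.htm
```

A crawler that reaches the folder page under both forms of the URL would request both targets, and only one of them exists.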

epmaniac




msg:4250149
 4:45 am on Jan 7, 2011 (gmt 0)

yes, i am seeing 200 ok for all the files

today, i explicitly 301 redirected a few of the top 404 pages to relevant content pages, and i am seeing a particular thing

in putty, googlebot, yahoobot, bingbot are all making lots of connections (up to 100) now... compared to the 1 or 2 connections they previously used to make

was it due to the fact that they were being redirected to 404s and they used to stop crawling?

also... it's been almost 24 hours since i made the changes, and i still don't see many pages that have gotten indexed in google search's LAST 24 HOUR FILTER.

do i have to wait longer than that?

speedshopping




msg:4250215
 10:24 am on Jan 7, 2011 (gmt 0)

Hi epmaniac - we too are seeing googlebot requests return 404 errors, as the bot is trying to access the rewritten URL in the log instead of the raw URL. Can I ask what type of 404 you are getting? Our error is 404 11 0, so I would be interested in what yours is. We too had mass traffic loss, but more recently.

Chris_R




msg:4250312
 3:30 pm on Jan 7, 2011 (gmt 0)

I think you have some redirect problems.

Not everyone links to your site the way you want them to - some will not use the www

Your site does not correctly do the redirect for this - when I try to go to:

example.com/wholesale/blahblah/12345/blah-blah-blah-blah.htm

I should be redirected to:

www.example.com/wholesale/blahblah/12345/blah-blah-blah-blah.htm

instead I am redirected to:

www.example.com/index.php/blahblah/12345/blah-blah-blah-blah.htm?page=pdisp&subcat=oxide&product_id=12345&primary_keyword=blah-blah-blah-blah

Which then attempts to pull the file:

http://www.example.com/wholesale/blahblah/12345/undefined

which returns a 404 - you will need to use a tool that allows you to see the background traffic like live http headers and not xenu for this.

I have sent you the actual urls
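A sketch of how a redirect chain like that can be traced hop by hop from a script, in case it helps. `head` and `trace_redirects` are hypothetical helper names, and the approach (one HEAD request per hop, redirects not followed automatically) is just one reasonable way to do it. It only sees the redirects themselves; the follow-up request for `.../undefined` is made by the page after it loads, so that one still needs a browser tool such as LiveHTTPHeaders.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=head, max_hops=10):
    """Follow redirects manually and return a list of (url, status) hops."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Running `trace_redirects` on the non-www form of a URL and printing each hop shows where the chain first leaves the path you expected.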

tedster




msg:4250804
 9:51 pm on Jan 8, 2011 (gmt 0)

I noticed that you asked the same question in the IIS forum [webmasterworld.com] - and that thread seems to have a definitive answer:

Ocean 1000:

The error is that the url was double escaped. And by default IIS7 doesn't allow that to be processed. See the following KB article for more details.

HTTP Error 404.11 URL_DOUBLE_ESCAPED [support.microsoft.com]

speedshopping




msg:4250807
 9:58 pm on Jan 8, 2011 (gmt 0)

Hi Tedster, yeh sorry for double posting, wanted to get perspective from a crawling and technical aspect - we believe the 404 11 errors are coming from Google image searches (when the image is displayed in the top frame, the second frame tries to render the page the image sits on - when it renders a page that has a %20 in it, Google double escapes by encoding the %, which then renders the page as /keyword%2520keyword/ instead of /keyword%20keyword/).

We have 150,000 sitemap pages with the %20 inserted and it doesn't seem to have problems though, so we are still baffled as to the reasons for the traffic loss.
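The %20 to %2520 effect is easy to reproduce: escaping an already-escaped path encodes the percent sign itself. Here is a small Python sketch; `is_double_escaped` is a hypothetical helper for spotting such paths in a log, and it is only a loose approximation of whatever check the server performs.

```python
import re
from urllib.parse import quote, unquote

PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def is_double_escaped(path):
    """True if decoding path once still leaves a percent-escape behind."""
    return bool(PERCENT_ESCAPE.search(unquote(path)))


once = quote("/keyword keyword/")  # the space becomes %20
twice = quote(once)                # the % of %20 becomes %25

print(once)   # /keyword%20keyword/
print(twice)  # /keyword%2520keyword/
print(is_double_escaped(once), is_double_escaped(twice))  # False True
```

Filtering the logged request paths through `is_double_escaped` would show how many of the 404.11 hits actually fit this pattern.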

epmaniac




msg:4254089
 5:29 am on Jan 17, 2011 (gmt 0)

hi all,

sorry, i was away for a few days,..

@ speedshopping

i don't know much about 404 error types, all i can see in my access logs are 404 errors with http 1.1 404 309

let me know if it helped and you solved your problem ? :)

@tedster and chris

thank you guys for helping me out, you guys are life savers and your tips and pointers to guys like us is measurable in GOLD,.. you dont know what you guys are doing, your contribution to WW and us is priceless!

with LiveHTTPHeaders i was able to detect a few file requests which were giving 404 statuses... also i solved a few of the redirection issues in .htaccess
...i worked on page load time, and made my site a lot faster... crawling by googlebot has shot up drastically, though i am still to see a regain of traffic

but the original problem remains.......

i am still seeing links like

www.site.com/cement.htm in my access log

whereas the original link should be
www.site.com/wholesalers-cement/

also, we are seeing a majority of links like

www.site.com/a.htm
www.site.com/b.htm
.
.
.
www.site.com/z.htm


all these links should be

http://www.site.com/sitemap-otherproducts/a.htm
http://www.site.com/sitemap-otherproducts/b.htm
http://www.site.com/sitemap-otherproducts/c.htm
.
.
.
http://www.site.com/sitemap-otherproducts/z.htm

i have checked .htaccess for possible redirection issues, but havent found anything

these links are crawled the most times by googlebot,yahoo and bing bot.......i dont know from where they are accessing these links :(

-

Edited to disable autolinking and fix url display

[edited by: Robert_Charlton at 6:20 am (utc) on Jan 17, 2011]

aakk9999




msg:4254196
 2:27 pm on Jan 17, 2011 (gmt 0)

i am still seeing links like

www.site.com/cement.htm in my access log

whereas the original link should be
www.site.com/wholesalers-cement/



Having redirects/rewrites set up in .htaccess does not mean that you will not see the original (dynamic) URLs in your access logs. If they were previously exposed to search engines, they will still keep requesting them periodically.

The question is - after your .htaccess fixes, what is the server response to such requests? Do you still get a 404, or do you now get a 301 redirect response to such requests?

If you are still getting a 404, then I would check .htaccess again. If you are getting a 301 redirect to the correct URL, then this is a correct response and there is nothing else to do.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved