404's and spiders

         

jk321

3:06 pm on Oct 3, 2000 (gmt 0)



There's one thing (among many) that I just can't figure out.

A while ago, I rearranged the directory structure in my site and moved a few pages. I also renamed everything to lower case, like I should have done initially.

So as not to present visitors to my travel-related site with 404's, I modified the form at my ISP's "Site Control Panel" to send surfers to the index.html page instead of just giving them the standard "404 file not found" page. The index.html page is a very good "site map" type page.
What I really was attempting to do was give the SE's an accurate picture of my site as it is now. However...

Now I'm getting spidered by Google, AV, etc., and the logs show their spiders requesting a no-longer-available page and the server returning the index.html page. Just like it's supposed to.

My question is: How are the SE's viewing this? Are they counting this as spam because they are requesting xxx.htm and getting index.html? Or are they just dropping the no-longer-available page like I want them to?

Thanks,
JK

DaveAtIFG

5:37 pm on Oct 3, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WmW JK!

>My question is: How are the SE's viewing this? Are they counting this as spam because they are requesting xxx.htm and getting index.html? Or are they just dropping the no-longer-available page like I want them to?

Your question is several questions! :)

>Are they counting this as spam because they are requesting xxx.htm and getting index.html?
A definite maybe on this one... Seriously, if you are using a meta tag to redirect your visitors and it redirects quickly, some SEs may view it as spam. I've had good luck with a five second (or longer) delay. If you are redirecting using another technique, it will depend on the technique.

>Or are they just dropping the no-longer-available page like I want them to?
If you have a reasonable delay in the redirect, most SEs are just dropping the page. Keep in mind, each reindexes at its own rate, so a page may be dropped by one SE within a few weeks while another may take months, or never drop it...

oilman

5:41 pm on Oct 3, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It sounds more like he is using a custom htaccess file or some other type of server-side redirect. Custom htaccess is the way to go if you want to get old links dropped out of SEs (at least that's how I would do it).

jk - who is your host? drop it to me via local email. Maybe I can dig up some info on how they are doing this for you.

littleman

6:12 pm on Oct 3, 2000 (gmt 0)



"the logs show their spiders requesting a no-longer-available page"
This sounds like the server is setting a 404 header. If that is the case, then the pages will most likely be dropped. Load up the page in MSIE5 and see if you get the standard 404 error page.

If you are not sending out the 404 header information then there is a good chance that the pages, being duplicates, will look like spam to the SEs.
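
To make the three possible outcomes concrete, here is a rough sketch in Python. The function name `diagnose` and the wording of the results are made up for illustration; it only restates the guesses in this thread about how a spider might react and is not a description of how any real SE works.

```python
def diagnose(status, body, index_body):
    """Guess how a spider might treat a dead URL (illustrative only).

    status:     HTTP status code the server returned for the dead URL
    body:       response body returned for the dead URL
    index_body: response body of the site's home page
    """
    if status in (404, 410):
        # The server admits the page is gone.
        return "gone: page will most likely be dropped"
    if status in (301, 302):
        # The server points the spider at a different URL.
        return "redirect: spider is sent to another URL"
    if status == 200 and body == index_body:
        # No error header and the same content as the home page.
        return "duplicate: dead URL looks like a copy of the index page"
    return "unclear"
```

The third case is the one to worry about: the old URL answers with a normal 200 and the same content as the index page.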

tedster

7:37 pm on Oct 3, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>I modified the form at my ISP's "Site Control Panel" to send surfers to the index.html page instead of just giving them the standard "404 file not found" page.

Since this is a function that your ISP offers, it would be good to ask their tech support exactly HOW they achieve the 404 handling. It does sound like they are using htaccess.
One clue you can get is by requesting the page through your browser. If the location window still shows the original URL you typed in, it probably is a change to the htaccess file. In this case the SEs may begin to assume they are seeing lots of pages that duplicate your index. Your index might be buried or dropped for a while on some engines -- Ink has been particularly fierce about duplicate pages.

If the URL in the browser's location window changes to index.htm, then some other kind of redirect is happening. This might result in your old page being dropped because it is a fast redirect. But your index page will probably stay put, and the old URL doesn't exist anymore so it should be dropped, right?

I had exactly this situation last spring with a client, where the index page was being served through htaccess as a custom 404 error page. They got buried in the rankings for a while. I created a custom site map page for the 404 page, instead of simply mapping back to the index. Things have now recovered pretty well -- it took about 8 weeks to bounce back.
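
For anyone who wants to see the idea in miniature, here is a sketch of a "site map as the 404 page" setup using Python's built-in `http.server`. This is not how the client's host or any ISP control panel does it; the page contents, `Handler`, `fetch_status`, and `run_demo` are all invented for the example. The point is only that the dead URL gets a helpful body while the status stays 404.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content, for illustration only.
PAGES = {"/": b"<html>home page</html>"}
SITE_MAP_404 = b"<html>Page not found. Site map: <a href='/'>home</a></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        status = 200
        if body is None:
            # Dead URL: friendly site map body, but still a 404 status.
            body, status = SITE_MAP_404, 404
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def fetch_status(port, path):
    """Return the status code the local demo server sends for path."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status

def run_demo():
    """Start the demo server, request a live and a dead URL, return both statuses."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    result = (fetch_status(port, "/"), fetch_status(port, "/old-page.htm"))
    server.shutdown()
    server.server_close()
    return result
```

Calling `run_demo()` requests the home page and a made-up dead URL (`/old-page.htm`) and returns the two status codes, so you can confirm the dead URL is not answered with a 200 or a redirect.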

jk321

8:50 pm on Oct 3, 2000 (gmt 0)



Thanks to all who replied!

Littleman
<<Load up the page in MSIE5 and see if you get the standard 404 error page. >>

Nope - it re-directs to "/". (I mistyped in the original message. When I filled out the form in the ISP's "Control Panel" I set it up to go to "www.xxx.com/" and not "index.html" as I originally typed.)

Tedster:
<<If the location window still shows the original URL you typed in, it probably is a change to the htaccess file.>>

It changes to "www.xxx.com/"

<<If the URL in the browser's location window changes to index.htm, then some other kind of redirect is happening. This might result in your old page being dropped because it is a fast redirect. But your index page will probably stay put, and the old URL doesn't exist anymore so it should be dropped, right?>>

Man, I sincerely hope so. That's what I was trying to accomplish. The index page is really the only one I care about.

Thanks again,
JK

jk321

10:38 pm on Oct 3, 2000 (gmt 0)




If it is an htaccess approach, would that indicate that only the old out-of-date pages will get dropped and the www.xxx.com/ will be OK, or will the whole thing probably get flushed?

jk

tedster

11:06 pm on Oct 3, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>If it is an htaccess approach, would that indicate that only the old out-of-date pages will get dropped and the www.xxx.com/ will be OK, or will the whole thing probably get flushed?<<

I think it's the other way around. Serving up the index page via htaccess can make the spider "think" that the index page is now duplicated on lots of other URLs, namely the addresses for your old, dead pages. This can threaten existing rankings for your index page.

In your actual case, because the URL does change, that's good news. You won't be seen as a site with lots of spammy duplicates. Only the out of date URLs should be threatened, if I'm reading the situation correctly.

oilman

11:51 pm on Oct 3, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I went and checked out the control panel setup for jk321's host. I am certain that it is htaccess. Through the control panel you can specify what pages you want to show up for the various errors.

JK - what you need to do is make a custom page and then point the 404 error to it. That way you will maintain the integrity of your index page.

littleman

12:46 am on Oct 4, 2000 (gmt 0)



jk321 -
I have a head checking script that should help us get to the bottom of this.
If you are game, go to [cgi-fun.hypermart.net], click on head and put in one of your redirected pages.
Look at the top line and tell us what you get.

Turn off JavaScript if you want to avoid those annoying pop-ups.
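
If you would rather run the check yourself, something along these lines should do roughly the same job. This is a minimal sketch and not the script linked above: the function name `head` is made up, and it relies on the fact that Python's `http.client` does not follow redirects, so a redirect is reported as-is instead of being silently followed.

```python
import http.client

def head(host, path, port=80):
    """Send a HEAD request and return (status, reason, location).

    http.client does not follow redirects, so a 302 shows up as a 302
    together with the Location header the server sent (or None).
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `head("www.example.com", "/old-page.htm")` (placeholder host and path) returns the status code and reason phrase from the top line, plus any Location header.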

jk321

1:27 am on Oct 4, 2000 (gmt 0)



Littleman:

302 Found
Connection: close
Date: Wed, 04 Oct 2000 02:21:59 GMT
Location: [XXXX.com...] (my site's home page)

uh-oh...

jk

littleman

3:10 am on Oct 4, 2000 (gmt 0)



Ok, that is a 302 redirect, which basically means 'the page you are looking for is temporarily at this other address.' I think, in most cases the bot will just ignore the original 302 page and index the page it points to. You should be ok.

Air

4:29 am on Oct 4, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey that's pretty cool littleman, you're cranking out some nice ones.

tedster

6:55 am on Oct 4, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>I think, in most cases the bot will just ignore the original 302 page and index the page it points to. You should be ok.<<

littleman,
I'm a bit confused (verhuddelt, as my Pennsylvania Dutch family would say). Shouldn't JK still create a custom page for the 302 redirect, instead of sending the spiders to the index page? Or else all those old URLs will look like they duplicate his index page -- and that would open up the chance of a deep burial for duplicate content on some SEs.

Am I missing something?

littleman

7:26 am on Oct 4, 2000 (gmt 0)



A 302 redirect (or a 301, which is a permanent redirect) is in a way like a meta refresh. It isn't actually the index page but will point the browser to it.

A good example is altavista.com and av.com. Av.com uses a 301 redirect to point to altavista.com.

Look at these two links:
The header and body for av.com [cgi-fun.hypermart.net]
And
The header and body for www.altavista.com [cgi-fun.hypermart.net]
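
As a quick reference, here is a small Python sketch of the status codes that have come up in this thread. The one-line readings are paraphrases of the standard HTTP meanings, and the names `MEANINGS`, `is_redirect`, and `describe` are just for this example; none of it says anything about how a particular SE reacts.

```python
# Plain-English readings of the status codes discussed in this thread.
MEANINGS = {
    200: "OK: the requested URL itself has content",
    301: "Moved Permanently: use the new URL from now on",
    302: "Found: temporarily at another URL, keep using the old one",
    404: "Not Found: nothing lives at this URL",
}

def is_redirect(status):
    """True for the two redirect codes mentioned above."""
    return status in (301, 302)

def describe(status):
    """Return the short reading for a status code, if it is listed."""
    return MEANINGS.get(status, "not covered here")
```

In both redirect cases the response only carries a Location header pointing elsewhere; the content of the target page is not served under the old URL.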

jk321

5:55 pm on Oct 12, 2000 (gmt 0)



Update: How spiders view htaccess re-directs for 404's.

That gurgling sound you heard on 10/10/00 was my site being flushed completely out of Direct Hit.

That's one down...

JK (scanning the "Help Wanted" ads)321

DaveAtIFG

6:48 pm on Oct 12, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



JK - I made the same mistakes on my first commercial site about two years back. It took about 3-6 months to recover rankings and traffic, and I felt like shooting myself at the time. Hang in there! And if you find any promising help wanted ads, remember, we share around here!

jayz

12:26 am on Oct 26, 2000 (gmt 0)



>>I think, in most cases the bot will just ignore the original 302 page and index the page it points to. You should be ok.<<

JK321 or others who can reply...

You mentioned you implemented a redirect for your 404 pages and saw your rankings take a dive.

Was this due to the redirect or because the SE's thought you were spamming with duplicate content?

My underlying question is really: Are 302 redirects ok? Will the pages still be spidered?

jk321

12:53 am on Oct 26, 2000 (gmt 0)



I made a mod to my ISP's "Control Panel."

I THINK what I did was to modify my htaccess file to re-direct 404's to my index page.

It was just bad luck that I got spidered the very next day by several SEs.

I got flushed from AV (#1 -> #61) and DirectHit (total wipe-out), but interestingly enough, I moved up to #2 in Google!

(Guess what just became my f-a-v-o-r-i-t-e SE)

JK321