Inktomi Slurp bot visiting my site...

     

Mahoney1

8:31 pm on Jan 8, 2004 (gmt 0)

10+ Year Member



Every day he gets robots.txt, then he goes straight for old HTML files (the old frames my site used to be built from; it hasn't used frames for 2 months). He gets a custom 404 with my index page, then leaves :/ I completely redid my site 2 months ago after 2 years; in those 2 years I never touched it once :( I made it just for something to do. Now I take pride in it and it's a hobby I really enjoy, so I would really like to see him index my site. My robots.txt includes:

User-agent: Slurp
Disallow:

Can anyone help me with this?

BTW, my site used to be in 5 frames... yes 5! :S When I made it I didn't have a clue, not that I have a clue now.
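For what it's worth, the two robots.txt lines quoted above should not be what keeps Slurp out: an empty Disallow means "nothing is disallowed". A minimal sketch of how that could be checked with Python's standard-library robots.txt parser follows. The helper name `allowed` and the test paths are made up for illustration, and Slurp's own parser may of course behave differently.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow:
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, only here to exercise the rules.
    for path in ("/index.html", "/old/frameset.html"):
        print(path, allowed("Slurp", path))
```

If this prints True for every path, the crawl problem lies somewhere other than robots.txt.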

dwilson

8:34 pm on Jan 8, 2004 (gmt 0)

10+ Year Member



What if you put a page up under the name of one of your old pages, just for Slurp? Either have it redirect to a new page or just have it link to one. Maybe a good place for a sitemap?

btw, welcome to WW!

Mahoney1

8:42 pm on Jan 8, 2004 (gmt 0)

10+ Year Member



Would that work? He already gets the 404 page with the index page, but he won't follow it. Anyway, why does he still follow the old index page? It must be the old index page he's following, as that's the one that contained the framesets :/ Why doesn't he see the new index page? The old page has been gone for months :/

dwilson

8:48 pm on Jan 8, 2004 (gmt 0)

10+ Year Member



I'm not going to promise that it will work. But it seems to me that it ought to.

Somewhere Slurp got it in his head that you had those pages. Where he learned it is irrelevant.

Now he visits /MyOldPage.html and gets a 404. He says, "Huh, no page here. I don't know where else to go. I'll leave."

If he gets a sitemap when he asks for /MyOldPage.html he'll at least know where else he can go. He ought to spider the site based on that.

I don't know what he'd do with a redirect. Someone else may have some experience with that.

Mahoney1

8:54 pm on Jan 8, 2004 (gmt 0)

10+ Year Member



Hmm, thanks, I'll try that. That bot sure needs some work though, 'cause he's as dumb as me :/

ncw164x

9:15 pm on Jan 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Slurp can be a bit stupid when it comes to pages which are not there any more. I still get requests for pages which were deleted nearly 3 years ago.

Now is that dumb or what? The only consolation is that Slurp still spiders the rest of the site.

ncw164x

Mahoney1

3:47 pm on Jan 9, 2004 (gmt 0)

10+ Year Member



Hmm, I did a little research and came across the .htaccess 301 redirect, i.e.: redirect 301 /oldpage.html [mynewpage.html...] The website says it's spider friendly; is that correct? I have added it to my .htaccess file, so I guess I'll just have to wait and see what happens :)

kanetrain

5:57 pm on Jan 9, 2004 (gmt 0)

10+ Year Member



Slurp has issues spidering redirects. He indexes the page you are trying to get rid of as if it were the page you are pointing to, so INK will list oldpage.html as if it had duplicate content of newpage.html. Because your old page is older in the Ink database, it will get preferential treatment and will bump your new page out of the SERPs.

He also has issues with CGI and other dynamic links, spidering them as if they were duplicate content. I'm sure the Ink staff is working to fix this, but in the meantime, it sure is annoying.

Slurp also has some serious issues with indexing old stuff and not getting rid of expired links/pages. I too have noticed this. I've got pages that have been gone for over a year still showing up in the SERPs. It's so frustrating! Ink's bot, in my opinion, is the hardest to deal with and the most buggy... but that's just my opinion... what do I know?

[edited by: kanetrain at 6:01 pm (utc) on Jan. 9, 2004]

ncw164x

6:00 pm on Jan 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey what do any of us know?

ncw164x

Mahoney1

11:31 am on Jan 10, 2004 (gmt 0)

10+ Year Member



Well, he came back last night, but went for some really old links I couldn't redirect :( Will have to wait longer.

sabai

5:33 pm on Jan 10, 2004 (gmt 0)

10+ Year Member



Mahoney1 - also, make sure that your custom 404 page is returning the correct "404 Not Found" header...

Mahoney1

12:25 am on Jan 11, 2004 (gmt 0)

10+ Year Member



Sorry, but what do you mean by "404 not found header"? I am totally new to this :)

Mahoney1

12:30 am on Jan 11, 2004 (gmt 0)

10+ Year Member



BTW, does anyone know how to use Redirect 301 /mydomain/some stuff/some pics.html? I can't get it to redirect when there are spaces in the URL. It will work if it's /mydomain/somestuff/somepics.html but not when there are spaces.

Thanks

sabai

4:18 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



what do you mean by "404 not found header"?

When a web page is sent, a bunch of stuff is sent before the HTML of the page: the HTTP response headers (completely different from the HTML head). One of the things sent is a status code. E.g. if a document is found, the status code sent will be 200, meaning that the page was found by the web server. If the page is not found, the server should return status code 404 so that the search engine, browser or whatever knows that the requested page could not be found. A common fault with custom 404 pages is to have them return a 200 (page found) instead of a 404 (not found).

If you are using PHP then put this at the very top of the error page...


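<?php
// Send the status line itself. "Status:" is a CGI-style header and
// may not be honoured under every server setup.
header("HTTP/1.0 404 Not Found");
?>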
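To see which status code an error page really sends, something like the following rough sketch could help. It is self-contained: it starts a throwaway local server that imitates both cases (an error page sent with 200, and one sent with a proper 404) and then asks for each the way a spider would. The handler, the `/soft.html` and `/missing.html` paths, and the function names are all invented for the demo. To test a real site you would only need `status_of()` pointed at one of your own dead URLs.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site with a custom error page."""

    def do_GET(self):
        body = b"<html><body>Sorry, that page is gone.</body></html>"
        # /soft.html imitates the common fault: error page sent with 200.
        status = 200 if self.path == "/soft.html" else 404
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def status_of(url):
    """Return the HTTP status code a client (or spider) would see."""
    # Empty ProxyHandler: ignore any proxy settings in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions


def run_demo():
    """Start the local demo server and return (soft_status, real_status)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        return status_of(base + "/soft.html"), status_of(base + "/missing.html")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

If a page that no longer exists comes back as 200, the custom error page is hiding the 404 from the spider.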

sabai

4:23 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



I can't get it to redirect when there are spaces in the URL

Try putting

/mydomain/some stuff/some pics.html
inside double quotes, and don't use spaces in your URLs in future :-)
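As a rough illustration of the quoting advice above, here is a small sketch that builds the .htaccess line with the old path in double quotes, and also shows the percent-encoded form in which a path with spaces travels over the wire. The helper `redirect_line` and the www.example.com target are hypothetical, so substitute your own new URL.

```python
from urllib.parse import quote


def redirect_line(old_path, new_url):
    """Build a Redirect 301 line, double-quoting the old path if it has spaces."""
    source = '"%s"' % old_path if " " in old_path else old_path
    return "Redirect 301 %s %s" % (source, new_url)


if __name__ == "__main__":
    old = "/mydomain/some stuff/some pics.html"
    # Hypothetical destination, for illustration only.
    new = "http://www.example.com/mydomain/somestuff/somepics.html"

    # Line to paste into .htaccess: the old path keeps its literal spaces.
    print(redirect_line(old, new))

    # How the same path appears in a request: spaces become %20.
    print(quote(old))
```

The quotes matter because the directive's arguments are separated by whitespace, so an unquoted space ends the old path early.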

Mahoney1

9:44 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



You're a genius, thanks :)

Regards
