
Paid Inclusion Engines and Topics Forum

Inktomi Slurp bot visiting my site...
Mahoney1




msg:22471
 8:31 pm on Jan 8, 2004 (gmt 0)

Every day, he gets robots.txt and then goes straight for old HTML files (the old frames my site used to be built from; it hasn't used frames for 2 months). He gets a custom 404 with my index page, then leaves :/ I completely redid my site 2 months ago, after 2 years in which I never touched it once :( I made it just for something to do. Now I take pride in it and it's a hobby I really enjoy, so I would really like to see him index my site. My robots.txt includes:

User-agent: Slurp
Disallow:

Can anyone help me with this?

BTW, my site used to be in 5 frames... yes, 5! :S When I made it I never had a clue, not that I have a clue now.
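
For anyone wanting to double-check the robots.txt rule quoted in this post: an empty Disallow line blocks nothing, so Slurp is allowed everywhere. Below is a minimal sketch using Python's standard urllib.robotparser. The two rule lines are taken from the post; the helper name slurp_may_fetch and the path /index.html are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rule lines quoted in the post.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow:
"""


def slurp_may_fetch(path, robots_txt=ROBOTS_TXT):
    """Return True if the rules let the user agent "Slurp" fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Slurp", path)


if __name__ == "__main__":
    # /index.html is a hypothetical path used only for illustration.
    print(slurp_may_fetch("/index.html"))
```

If this reports that the path is allowed, robots.txt is probably not what is keeping the bot away from the new pages.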

 

dwilson




msg:22472
 8:34 pm on Jan 8, 2004 (gmt 0)

What if you put up a page under the name of each of your old pages, just for Slurp? Either have it redirect to a new page or just have it link to one. Maybe a good place for a sitemap?

btw, welcome to WW!

Mahoney1




msg:22473
 8:42 pm on Jan 8, 2004 (gmt 0)

Would that work? He gets the 404 page with the index page, but he won't follow it. Anyway, why does he still follow the old index page? It must be the old index page he's following, as it contained the framesets :/ Why doesn't he see the new index page? The old page has been gone for months :/

dwilson




msg:22474
 8:48 pm on Jan 8, 2004 (gmt 0)

I'm not going to promise that it will work. But it seems to me that it ought to.

Somewhere Slurp got it in his head that you had those pages. Where he learned it is irrelevant.

Now he visits /MyOldPage.html and gets a 404. He says, "Huh, no page here. I don't know where else to go. I'll leave."

If he gets a sitemap when he asks for /MyOldPage.html he'll at least know where else he can go. He ought to spider the site based on that.

I don't know what he'd do with a redirect. Someone else may have some experience with that.

Mahoney1




msg:22475
 8:54 pm on Jan 8, 2004 (gmt 0)

Hmm, thanks, I'll try that. That bot sure needs some work though, 'cause he's as dumb as me :/

ncw164x




msg:22476
 9:15 pm on Jan 8, 2004 (gmt 0)

Slurp can be a bit stupid when it comes to pages which are not there any more. I still get requests for pages which were deleted nearly 3 years ago.

Now is that dumb or what? The only consolation is that Slurp still spiders the rest of the site.

ncw164x

Mahoney1




msg:22477
 3:47 pm on Jan 9, 2004 (gmt 0)

Hmm, I did a little research and came across the .htaccess 301 redirect, i.e. redirect 301 /oldpage.html [mynewpage.html...]. The website says it's spider friendly; is that correct? I have added it to my .htaccess file, so I guess I'll just have to wait and see what happens :)

kanetrain




msg:22478
 5:57 pm on Jan 9, 2004 (gmt 0)

Slurp has issues spidering redirects. He indexes the page you are trying to get rid of as if it were the page you are pointing to, so Ink will list oldpage.html as if it had duplicate content of newpage.html. Because your old page is older in the Ink database, it will get preferential treatment and will bump your new page out of the SERPs.

He also has issues with CGI and other dynamic links, treating them as duplicate content. I'm sure the Ink staff is working to fix this, but in the meantime, it sure is annoying.

Slurp also has some serious issues with indexing old stuff and not getting rid of expired links/pages. I too have noticed this. I've got pages that have been gone for over a year still showing up in the SERPs. It's so frustrating! Ink's bot, in my opinion, is the hardest to deal with and the most buggy... but that's just my opinion... what do I know?

[edited by: kanetrain at 6:01 pm (utc) on Jan. 9, 2004]

ncw164x




msg:22479
 6:00 pm on Jan 9, 2004 (gmt 0)

Hey what do any of us know?

ncw164x

Mahoney1




msg:22480
 11:31 am on Jan 10, 2004 (gmt 0)

Well, he came back last night, but went for some really old links I couldn't redirect :( Will have to wait longer.

sabai




msg:22481
 5:33 pm on Jan 10, 2004 (gmt 0)

Mahoney1 - also, make sure that your custom 404 page is returning the correct 404 not found header...

Mahoney1




msg:22482
 12:25 am on Jan 11, 2004 (gmt 0)

Sorry, but what do you mean by "404 not found header"? I am totally new to this :)

Mahoney1




msg:22483
 12:30 am on Jan 11, 2004 (gmt 0)

BTW, does anyone know how to use Redirect 301 with a path like /mydomain/some stuff/some pics.html? I can't get it to redirect when there are spaces in the URL; it works for /mydomain/somestuff/somepics.html but not when there are spaces.

Thanks

sabai




msg:22484
 4:18 pm on Jan 11, 2004 (gmt 0)

what do you mean by "404 not found header"?

When a web page is sent, a bunch of stuff is sent before the HTML of the page: these are the HTTP response headers (completely different from the HTML head). One of the things sent is a status code. E.g. if a document is found, the status code sent will be 200, meaning that the page was found by the web server. If the page is not found, the server should return status code 404 so that the search engine, browser, or whatever knows that the requested page could not be found. A common fault with custom 404 pages is that they return a 200 (found) status instead of a 404 (not found).

If you are using PHP then put this at the very top of the error page...

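<?php
// Must run before any output is sent. Depending on how PHP is hooked into
// the server, header("HTTP/1.0 404 Not Found") may be needed instead.
header("Status: 404 Not Found");
?>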
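
To see which status code a custom error page really sends, it helps to request a missing URL and look at the code rather than the page body. The sketch below is one rough way to do that with Python's standard library. The names fetch_status, DemoHandler and run_demo are hypothetical, and the tiny local server only stands in for a real site whose one real page is "/".

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_status(host, port, path):
    """Return the HTTP status code the server sends for path."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        # http.client does not follow redirects, so a 301 shows up as 301.
        return conn.getresponse().status
    finally:
        conn.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Local stand-in for a site whose only real page is '/'."""

    def do_GET(self):
        found = self.path == "/"
        # A correct custom 404 page keeps its friendly body but sends 404.
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<h1>Home</h1>" if found else b"<h1>Not here</h1>")

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Start the stand-in server and return (status for '/', status for a missing page)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        return (
            fetch_status("127.0.0.1", port, "/"),
            fetch_status("127.0.0.1", port, "/MyOldPage.html"),
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Pointing fetch_status at your own host (port 80) and one of the deleted page names should show whether the custom 404 page answers with 404 or, wrongly, with 200.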

sabai




msg:22485
 4:23 pm on Jan 11, 2004 (gmt 0)

I cant get it to redirect when theres spaces in the URL

Try putting /mydomain/some stuff/some pics.html inside double quotes, and don't use spaces in your URLs in future :-)
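
The quoting matters because Apache splits a directive's arguments on whitespace, so an unquoted path with spaces is read as several arguments. If there are many old pages to redirect, a small helper can write the lines and add the quotes where needed. This is only a sketch: redirect_line is a made-up name and www.example.com is a placeholder target.

```python
def redirect_line(old_path, new_url, status=301):
    """Build one Redirect line for an .htaccess file."""
    # A path containing whitespace has to be wrapped in double quotes,
    # otherwise Apache treats each piece as a separate argument.
    if any(ch.isspace() for ch in old_path):
        old_path = '"%s"' % old_path
    return "Redirect %d %s %s" % (status, old_path, new_url)


if __name__ == "__main__":
    # Placeholder target URL, for illustration only.
    print(redirect_line("/mydomain/some stuff/some pics.html",
                        "http://www.example.com/somepics.html"))
    print(redirect_line("/oldpage.html", "http://www.example.com/newpage.html"))
```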

Mahoney1




msg:22486
 9:44 pm on Jan 11, 2004 (gmt 0)

You're a genius, thanks :)

Regards

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved