Google Searching

crawls...but doesn't update...again

Visi

10:56 pm on Jan 3, 2003 (gmt 0)

10+ Year Member



For those that follow the search engines much more than I do...a question or two:)

Why should I allow Google to use excessive bandwidth (172,760 pages fetched in 90 days on a 1,000-page site) to list my pages, if in the search categories and directories, sites that are closed or haven't been updated for two years are listed ahead of me? Apparently Google gives little or no weight to a page's last update in its current searches, and only uses it to decide when to send out the freshbot?

Our site is updated on a regular basis, and is freshbotted daily. Interesting that old pages that have not been on the site for months are not being removed from the Google directory either. Based on the above, I have to wonder how effective the search algo really is.

At least in our category, results are "link rated", not rated by content, updates or freshness.

Any comments appreciated.

Visi

3:39 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



Somewhat surprised that there are no comments on this one?

To me this cuts to the heart of what search engines are doing on the web today: creating their own cached databases while using remote (our) servers to serve up the content. The results are only as they see them, and in our category, really skewed. As website administrators, we, not the search engines, are paying for that....ummm...service? through bandwidth. If I am paying for something, I would expect it to be in working order. Perhaps the benefits outweigh the costs, but having search engines continuously prowling around our sites is expensive, in bandwidth.

My files are dated....why not just delete the ones that are no longer there, and update the new ones from their dates? Seems easy? Isn't this how it is supposed to work? Why does Google freshbot a page when it hasn't changed? Because it might have changed? Save everyone some money, and only reindex changed pages. As a sidelight, what benefit is a "fresh" tag to our site? Think it entices the average person to visit?

Pages that cannot be found for over three months should be removed from listings...this is plain common sense. I keep hearing my peers say this is done, but I don't personally see it happening. Why should we have to compensate for poor search engine performance through programming?

Just some thoughts....from an early morning hangover:)

chiyo

3:43 pm on Jan 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



visi, I guess Google would say in their defence that you can use robots.txt to block their crawlers out completely or by directory.
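For reference, that blocking is done with a robots.txt file at the site root; a minimal sketch (the directory name here is made up):

```
# Keep all crawlers out of one bandwidth-heavy directory
User-agent: *
Disallow: /archive/

# Or shut Googlebot out of the whole site
User-agent: Googlebot
Disallow: /
```

Crawlers match the most specific User-agent record that applies to them, so the two records can coexist in one file.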

GoogleGuy also posted here about how you can reduce the number of crawler hits by letting the bot know when your pages were "last modified". A bit technical for me, but your server admin could maybe set it up for you. Do a site search for "last modified" and look for GoogleGuy's posts. Otherwise one of the more knowledgeable members here may be able to be more specific than me.

Visi

4:01 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



That's the quandary....we need them, and not only Google but the others too. I would just like to see a more effective way of doing it, with less hassle and lower cost for small webmasters. All the files carry a date with them, so why not just use it, and get rid of the old pages? Our error logs are filled with requests for the old pages (removed four months ago). I think Google's search engine today is taking the easy way out in building its own database, with no check for dead pages. I'll see what Ink does on the next update, since they are crawling through our site from their database, requesting all the pages previously in it. Just sitting here watching them request old pages. A different approach than Google's merry spidering? I can only hope so.

By the way...thanks for the input, chiyo.

[edited by: Visi at 4:02 pm (utc) on Jan. 4, 2003]

marek

4:02 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



visi, are you speaking about the Google Directory? That is just a replication of DMOZ and, AFAIK, doesn't depend on what Googlebot indexes. If you update your site regularly and Google indexes it, your new content should affect the search results, not the directory.

Visi

4:09 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



It is in the directory listing, and I understand that portion to some degree because of the DMOZ problems recently, but the old results are in the search function, under the "see pages from this site" request. The first 250 or so are updated; the rest are old, obsolete pages. Total pages listed is over 2,000, when the site is actually around 1,000 pages. It was completely redone four months ago. The new site is indexed, but the old site's pages just keep hanging around. On top of that, Google just keeps coming and coming.....don't mind getting indexed, but I'm not that quick at typing to update all those pages every few days....lol. You can see by the Googlebot visits over the last three months that they have been crawling and updating new pages, but not removing old ones :(

marek....it does update the search functions, thanks

marek

4:39 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



Well, why refuse the visitors that those old pages can bring? Replace them with server-side redirects, or set up a 404 page that directs visitors from those old pages to the new ones with the most relevant content.
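On an Apache server, both options are one-liners in .htaccess (the file names here are hypothetical):

```apache
# Server-side redirect: send a moved page's visitors (and its link
# credit) straight to the replacement page
Redirect 301 /old-products.html /products.html

# Custom error page for everything else that is gone
ErrorDocument 404 /notfound.html
```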

vitaplease

7:05 pm on Jan 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Visi,

an old, un-updated page can still be very topical and important to current thinking.

I guess the original "Google PageRank" PDF document by the Google founders hasn't been updated since then, but it is probably still being "quoted", i.e. voted for through recently added links, even now.

Google seems to show an interest in the age of links:

see the Honorable Mention for Laird Breyer in the Google programming contest:

[google.com...]

...suggested some modifications to take into account the "age" of each link to reduce Pagerank's tendency to bias against newly-created pages.

By the way, most members here cry murder the moment they are not visited by Googlebot daily...

BTW, will there be another programming contest this year?

Google did call it the annual contest...

aus_dave

9:32 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



vitaplease - I think the URL above should be winner.html ;).

Visi

11:34 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



Have added a 404 redirect...thanks

vitaplease, thanks (I think...lol) for the info.

Your comment about people being upset when Google doesn't visit every day is what I don't understand, I guess. If I am indexed, say, once or twice a month and maintain my rankings in the search functions, what is a daily visit doing for me other than using bandwidth? It doesn't go deep enough into the site to note most of the changes or additions, so what purpose does it serve other than keeping Google's cached copy fresh?

BigDave

11:56 pm on Jan 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google does honor your modified date, but you have to set up your server to honor If-Modified-Since requests from Google. It works like this:

1. Google requests a page.
2. You return the page with a "Last-Modified" date.
3. Google requests the page again, with "If-Modified-Since" set to the date you sent them.
4. If you have changed the file since then, you send them the new version.
5. If you haven't changed the file since then, you send them a "304 Not Modified".

Most web servers that are only serving up static content (regular HTML pages) already have this set up. If you are serving any sort of dynamic content, then it is your responsibility to handle it.
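That conditional-request step can be sketched in a few lines of Python; the function below is a toy decision rule, not any real server's code, using only the standard library's date helpers:

```python
from email.utils import formatdate, parsedate_to_datetime

def respond(file_mtime, if_modified_since):
    """Decide between a full 200 response and a 304 Not Modified.

    file_mtime -- the file's last-modified time (Unix seconds)
    if_modified_since -- the raw If-Modified-Since header value, or None
    """
    last_modified = formatdate(file_mtime, usegmt=True)
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # malformed date: fall through to a full response
        # HTTP dates have one-second resolution, so compare whole seconds.
        if since is not None and int(file_mtime) <= int(since):
            return 304, {"Last-Modified": last_modified}
    return 200, {"Last-Modified": last_modified}

# First fetch: no header yet, so the crawler gets the full page (200).
status, headers = respond(1041724800.0, None)
# Re-fetch with the date we handed out: nothing changed, so 304.
status_again, _ = respond(1041724800.0, headers["Last-Modified"])
print(status, status_again)  # 200 304
```

Apache and most static-file servers do exactly this comparison for you; it only becomes your job when a script generates the page.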

As for old pages, are they really off your server, or is it just that you no longer link to them? Have you tried following those links to see what happens?

Was there already a custom 404 page? If it was not returning a status of 404, you might just be telling Google that the page they requested still exists.

Oh yeah, I personally give a lot more credit to old links from old pages to other old pages. There is often a reason they have lasted so long.

vitaplease

2:39 am on Jan 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



aus_dave, you're right.. too late to edit, so here is the right one, because it is very much worth the read..

[google.com...]

Oh yeah, I personally give a lot more credit to old links from old pages to other old pages. There is often a reason that they lasted so long

BigDave, I agree, as long as the page giving the old links is still updated regularly.

When checking the inbound links of the competition, and asking for similar links from the sites linking to them, I often find totally abandoned pages: high PageRank, but "last updated 2000" stuff..

I often wonder if and when Google's algo will take that into consideration..

Visi

3:41 am on Jan 5, 2003 (gmt 0)

10+ Year Member



It is a custom 404...how can I check that it is sending the correct code? When I request the page I get the 404 page. The pages are no longer on the server in any form. The error logs and visitor logs say a 404 is being sent?

Thanks

BigDave

4:55 am on Jan 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When you are logged in to WebmasterWorld, click on the link for Control Panel at the top of the screen. Then click on the link for "Server Headers".

Enter the URL of one of the pages that you have removed.

The top line should be

HTTP/1.1 404 Not Found

If you have

HTTP/1.1 200 OK

then you are telling Google that the file still exists.
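The same header check can be sketched in Python if you would rather run it from your own machine. The little stand-in server below fakes a site where only "/" exists, so a removed path correctly answers 404; in practice you would point the HEAD request at your own dead URLs:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Stand-in for a real site: only "/" exists, everything else is 404."""
    def do_HEAD(self):
        self.send_response(200 if self.path == "/" else 404)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

def check_status(host, port, path):
    """Issue a HEAD request and return "status reason", e.g. "404 Not Found"."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    conn.close()
    return "%d %s" % (resp.status, resp.reason)

server = HTTPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

print(check_status("127.0.0.1", port, "/"))          # 200 OK
print(check_status("127.0.0.1", port, "/old-page"))  # 404 Not Found
server.shutdown()
```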

Visi

5:02 am on Jan 5, 2003 (gmt 0)

10+ Year Member



404 page is correct....one less problem:)

Thanks for how to do that:)

EasyCall

5:58 am on Jan 5, 2003 (gmt 0)

10+ Year Member



I'm using .htaccess to redirect 404 errors to my homepage, which works great, but when I tried your control panel suggestion, it gave me HTTP/1.1 302 Found and listed my index page. So from what you're saying, this means Googlebot thinks those pages still exist? How do I redirect 404 errors to my homepage and still serve a 404 status?
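Assuming Apache: giving ErrorDocument a full URL (with http://) makes Apache answer with a redirect, which is the 302 above. Give it a local path instead and Apache serves that page's content while keeping the 404 status (example.com is a placeholder):

```apache
# This form causes a 302 redirect, so the crawler never sees a 404:
# ErrorDocument 404 http://www.example.com/

# This form serves the homepage content with a real 404 status:
ErrorDocument 404 /index.html
```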