Forum Moderators: open
Why should I allow Google to use excessive bandwidth (172,760 pages in 90 days on a 1,000-page site) to list my pages, if sites that are closed or have not been updated for two years are listed ahead of me in the search categories or directories? Apparently Google applies little or no weight to the last webpage update in its current searches, using it only to decide when to send out the freshbot?
Our site is updated on a regular basis and is freshbotted daily. Interestingly, old pages that have not been on the site for months are not being removed from the Google directory either. Based on the above, I have to wonder how effective the search algo really is.
At least in our category, results are "link rated", not rated by content, updates, or freshness.
Any comments appreciated.
To me this cuts to the heart of search engine happenings on the web today. Search engines create their own cached databases, using remote (our) servers to serve up the content. The results are only as they see them, and in our category, really skewed. As website administrators, we, not the search engines, are paying for that....ummm...service? through bandwidth. If I am paying for something, I would expect it to be in functioning order. Perhaps the benefits outweigh the costs, but having search engines continuously prowling around our sites is expensive, in bandwidth.

My files are dated....why not just delete the ones that are no longer there, and update new ones based on their dates? Seems easy? Isn't this how it is supposed to work? Why does Google freshbot a page when it hasn't changed? Because it might have changed? Save everyone some money, and only reindex changed pages. As a sidelight, what benefit is a "fresh" tag to our site? Think it entices the average person to visit?
Pages that cannot be found for over three months should be removed from listings...this is plain common sense. I keep hearing my peers say this is done, but I don't personally see it happening. Why should we have to compensate for poor search engine performance through programming?
Just some thoughts....from an early morning hangover:)
Googleguy also posted here about how you can reduce the number of crawler hits by letting the bot know when your pages were "last modified". It's a bit technical for me, but your server admin could perhaps set it up for you. Do a site search for "last modified" and look for googleguy's posts. Otherwise one of the more knowledgeable members here may be able to be more specific than me.
[edited by: Visi at 4:02 pm (utc) on Jan. 4, 2003]
marek....does update search functions, thanks
An old, un-updated page can still be very topical and important to current thinking.
I guess the original "Google PageRank" PDF by the Google founders has not been updated since it was published, but it is probably still being "quoted", i.e. voted for, through recently added links.
Google seems to show an interest in the age of links:
see the Honorable Mention for Laird Breyer in the Google programming contest:
[google.com...]
...suggested some modifications to take into account the "age" of each link to reduce Pagerank's tendency to bias against newly-created pages.
By the way, most members here cry murder the moment Googlebot does not visit them daily...
BTW, will there be another programming contest this year?
Google did call it the annual contest...
vitaplease, thanks (I think...lol)....for the info.
Your comment about people getting upset when Google doesn't come every day is what I don't understand, I guess. If I am indexed, say, once or twice a month and maintain my rankings in the search functions, what purpose does visiting daily serve for me other than using bandwidth? It doesn't go deep enough into a site to note most of the changes or additions, so what purpose does it serve other than keeping Google's cached copy fresh?
Google requests a page.
You return the page with a "Last-Modified" date.
Google requests the page again, sending "If-Modified-Since" with the date that you sent them.
If you have changed the file since then, you send them the new version.
If you haven't changed the file since then, you send them a "304 Not Modified" instead.
Most web servers that are only serving up static content (regular HTML pages) already have this set. If you are serving up any sort of dynamic content, then it is your responsibility to handle this.
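The exchange described above can be sketched in code. This is a minimal Python illustration (not from the thread) of the decision a dynamic page has to make: the function name, signature, and dates are all hypothetical examples, and a real script would plug this into whatever framework serves the page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_response(last_modified, if_modified_since=None):
    """Decide whether to send the full page or a 304.

    last_modified: datetime (UTC) when the content last changed.
    if_modified_since: the If-Modified-Since header value, or None.
    Returns (status_code, headers) -- a 304 means "send no body".
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full 200
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304, headers  # Not Modified
    return 200, headers          # send the full page body

# Hypothetical example: page last changed Jan 1, crawler last saw it Jan 2,
# so the crawler gets a 304 and no page body is transferred.
changed = datetime(2003, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
status, _ = conditional_response(changed, "Thu, 02 Jan 2003 00:00:00 GMT")
```

The bandwidth saving is exactly the point made above: the 304 response carries headers only, not the page.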
As for the old pages, are they really off your server, or is it just that you no longer link to them? Have you tried following those links to see what happens?
Was there already a custom 404 page? If it does not return a status of 404, you might just be telling Google that the page it requested still exists.
Oh yeah, I personally give a lot more credit to old links from old pages to other old pages. There is often a reason that they lasted so long.
[google.com...]
Oh yeah, I personally give a lot more credit to old links from old pages to other old pages. There is often a reason that they lasted so long
BigDave, I agree, if the page giving the old links is still updated regularly.
When checking the inbound links to my competition and asking for similar links from the sites linking to them, I often find totally abandoned pages: high PageRank, but last updated in 2000.
I often wonder if and when Google's algo will take that into consideration.
Enter the URL of one of the pages that you have removed.
The top line of the response should read
HTTP/1.1 404 Not Found
if you have
HTTP/1.1 200 OK
then you are telling Google that the file still exists.
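The check described above is just a HEAD request for the page and a look at the status line. Here is a minimal Python sketch of the same check (the function name and the commented URL are hypothetical examples, not a real tool):

```python
import http.client

def status_of(host, path):
    """Return (status, reason) for a HEAD request -- the same thing a
    server header checker shows in its top line. host may include ":port"."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.reason
    finally:
        conn.close()

# For a properly removed page this should come back (404, 'Not Found');
# (200, 'OK') means the server still claims the page exists, and Google
# will keep it listed.
# status_of("www.example.com", "/removed-page.html")
```

If the removed pages come back 200 (for instance because a custom error page forgot to send the 404 status), that, not Google, is why they are still in the index.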