That would be useful to webmasters in determining what is actually seen by a spider. It would also be indispensable for detecting hijacked or cloaked pages. Google could put a message on the results page: "Is the page you are seeing different from this one? If so, please click here to send us a report." etc.
Just as the cached pages are sometimes very useful for seeing what *was* spidered, this tool would show what is *currently* spiderable. Nobody could raise copyright or legal issues over this tool the way excuses are made for using the "noarchive" tag. If you don't want the spiderable page there... simply don't post it.
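For what it's worth, the "noarchive" tag being argued over is just a robots meta tag in the page head. Its standard form, as Google documents it:

```html
<!-- Tells search engines not to show a cached copy of this page -->
<meta name="robots" content="noarchive">
<!-- Or, to keep only Google from caching it: -->
<meta name="googlebot" content="noarchive">
```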
What if all I want to do is feed Googlebot the text version of my site, because it's a big site and I don't want Gbot taking up my bandwidth by caching my copyrighted images and downloading useless tags it doesn't need?
That would be useful to webmasters in determining what is actually seen by a spider
Such a tool already exists: "view source" :)
The whole idea of showing what the spider sees is largely moot if the viewer doesn't understand HTML - and if you do understand it, you'll get just as much (if not more) out of view source as you would out of a "what does the spider see" option.
The general public doesn't (and shouldn't have to) give two hoots about differences between cached page and the actual page.
Yes, there is an educational potential for such a tool, but there are already existing programs to fill this niche - the rest of the potential uses would involve your competitors' pages and either filing spam reports or reverse engineering.
If you just want enough dirt to file a spam report, there are already a few methods out there for getting around imperfect cloaking - but if your competitor is smart enough to use a robust cloaking technique, then they're probably smart enough to leave you with very little useful evidence to feed into a spam report :)
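On those "methods to get around imperfect cloaking": the sloppy cloakers key only off the User-Agent string, so a rough check is to fetch the same URL twice, once as a browser and once announcing Googlebot's UA, and compare. A sketch, nothing more (the UA strings are the published ones; anyone keying off Google's actual IP ranges will slip right past this):

```python
import urllib.request

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request that announces the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch(url: str, user_agent: str) -> bytes:
    with urllib.request.urlopen(build_request(url, user_agent)) as resp:
        return resp.read()

def looks_cloaked(url: str) -> bool:
    """True if the copy served to 'Googlebot' differs from the browser copy.

    Only catches sloppy UA-based cloaking; a robust cloaker checking
    the requesting IP will serve the browser page both times.
    """
    return fetch(url, BROWSER_UA) != fetch(url, GOOGLEBOT_UA)
```

Which is exactly why any real version of this tool would have to call the page with the crawler's own IP, not just its user-agent.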
Msg#: 23679 posted 1:44 am on Apr 30, 2004 (gmt 0)
"What if all I want to do is feed Google bot the text version of my site"
Then that would be Google's call as to whether it was cloaking or spam. Nobody is forced to post their pages to the www. Google obeys the robots tag if someone prefers privacy over inclusion. The excuses are fairly weak in my opinion. I think it would be a great tool for the guys at the 'plex to detect cloakers. A good deal of the top pages now are cloaked. A fair part of those are hijacked sites.
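And if the worry is just bandwidth and image copying, the robots mechanism already covers it without any cloaking. A robots.txt sketch (Googlebot-Image is Google's real image crawler; the /images/ path is only an example):

```
# Keep Google's image crawler out of the image directory
User-agent: Googlebot-Image
Disallow: /images/
```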
"Such a tool already exists: "view source""
But as you know, cloakers disallow caching and Google obeys this. I think it would be indispensable in helping to detect hijacked pages. And while I agree the average Joe Surfer doesn't even know how to use regular Google the right way... there are enough amateur webmasters who would welcome such a tool. These mom-and-pop webmasters post a lot of pages to the net.
Isn't that Google's job? I'm sure they have tools to do such work.
Google already provides a cache, and has admitted that if a website excludes itself from that cache, it is considered more likely to be cloaking. As such we can presume Google takes a closer look, and so we don't have to.
If you see a cache that doesn't match the page, you can report it. If you see no cache at all, Google is already looking closer.
Unfortunately, these days it's also the webmaster's job to protect their work, and though there are ways to catch redirecting cloakers... (they are not always as clever as they think they are)... most mom-and-pop website builders either wouldn't know how, or wouldn't have the necessary server access, to do much about it. The tool would let them see *if* their page is being hijacked. *Or*, additionally, whether the page that continually tops them in the serps is really what it appears to be. Kinda like "streetlights" for the web.

The page would *have* to be called with the identical user-agent and IP as a regular crawler... A simple button on the page could trigger a note to Google engineers, and that data could only help them build a better mousetrap to remove these guys from the serps. Provided, of course, that Google actually still cares about the serps... I *think* they do.
There are several threads in here from webmasters who have lost well-established sites to these guys using nothing more than an ad-tracking script to redirect. Another example... I keep a site up for a local church; they aren't a "customer" and the site is non-profit. A porn site has been redirecting to this church's site. *If* someone looking for a church in my town eventually finds porn, what good has come of any of it? My work would be for naught; those surfers are just going to close the browser... of course they'll be mousetrapped and someone will get affiliate credit for all the popups. To me it is so clear... it's a problem.