Searching for HTML Source?

Is there a way to search Google's cache?

         

peterq

9:56 pm on Dec 15, 2004 (gmt 0)

10+ Year Member



Does anyone know of a way to search for HTML source code using Google's page caching or something similar? I couldn't find a reference to the cached page field in the Google API. If the data is there, it seems they'd offer to search it-- filtering out searches for actual tags of course.

Many thanks in advance of a response.

DerekH

12:05 am on Dec 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, peterq.

Let me bump this back up for you...

<< Does anyone know of a way to search for HTML source code using Google's page caching or something similar? I couldn't find a reference to the cached page field in the Google API. If the data is there, it seems they'd offer to search it-- filtering out searches for actual tags of course. >>

What do you mean by searching the HTML while filtering out the ACTUAL tags?
HTML without the tags is just the page content - that's why CSS and HTML coexist so well...

Do explain what you mean...
DerekH

peterq

3:58 pm on Dec 22, 2004 (gmt 0)

10+ Year Member



Thanks a ton for your reply! For example, if you look at the source of Google's cache of google.com:

[64.233.167.104...]

...I would like to be able to search for ".h{font-size: 20px;}" and have google.com come up as a result. I would like to be able to search the source code of the entire Google index if possible. I don't know whether anyone else keeps a cache of every page.

In the Google API, I see I can return the cache for a given page with a single request, but there is no method to include the cache in the search. I can only imagine what kind of resources such a search would take, but I wonder if anyone has seen such a thing.
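To make the idea concrete, here is a rough sketch of what a source search amounts to once you already have the cached HTML in hand. It assumes the pages were fetched one at a time beforehand and stored in a plain dict; the function names and the example.com URLs are made up for illustration and are not part of the Google API.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so formatting differences don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def search_sources(pages: Dict[str, str], snippet: str) -> List[str]:
    """Return the URLs whose raw HTML source contains the snippet.

    `pages` maps URL -> HTML source that was already retrieved
    (hypothetical data; no network access happens here).
    """
    needle = normalize(snippet)
    return [url for url, html in pages.items() if needle in normalize(html)]


# Made-up sample data standing in for cached copies of two pages.
pages = {
    "http://example.com/a": "<style>.h{font-size:   20px;}</style><p>hello</p>",
    "http://example.com/b": "<p>font-size: 20px is mentioned in the text only</p>",
}

matches = search_sources(pages, ".h{font-size: 20px;}")
print(matches)
```

This is a linear scan over every stored page, which is exactly why it would be so expensive across a whole index; a real service would presumably need an index built over the markup itself.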

Thanks again!

treeline

7:59 pm on Dec 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Boy, if you could do that, it would really help hackers writing viruses like the one attacking phpBB right now.

peterq

8:25 pm on Dec 22, 2004 (gmt 0)

10+ Year Member



Very good point-- a good reason for Google not to expose the cache.

Too bad those people are already doomed by the "powered by phpbb".

lizardx

8:39 pm on Dec 22, 2004 (gmt 0)

10+ Year Member



<< Too bad those people are already doomed by the "powered by phpbb". >>

You're not doomed; all you have to do is upgrade to 2.0.11, and it's an easy upgrade. With no board mods, it takes about 5 minutes. With mods it takes longer, of course, but only an hour or two.

fclark

3:51 pm on Dec 23, 2004 (gmt 0)

10+ Year Member



I asked about the same thing in this thread [webmasterworld.com] last month.

Basically, a new Google search operator along the lines of inurl: (e.g. inhtml:) would be really handy.
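Just to show how such a query might be read, here is a small sketch that splits a search string into source snippets and ordinary terms. The inhtml: operator is purely hypothetical (Google does not offer it), and split_query is an illustrative name, not an existing function anywhere.

```python
import re
from typing import List, Tuple

# Hypothetical operator syntax: inhtml:"quoted snippet" or inhtml:bareword.
# Ordinary terms may be quoted phrases or bare words.
TOKEN = re.compile(r'inhtml:"([^"]*)"|inhtml:(\S+)|"([^"]*)"|(\S+)')


def split_query(query: str) -> Tuple[List[str], List[str]]:
    """Return (source_snippets, ordinary_terms) parsed from a query string."""
    snippets: List[str] = []
    terms: List[str] = []
    for quoted_src, bare_src, quoted_term, bare_term in TOKEN.findall(query):
        if quoted_src or bare_src:
            snippets.append(quoted_src or bare_src)
        elif quoted_term or bare_term:
            terms.append(quoted_term or bare_term)
    return snippets, terms


snippets, terms = split_query('inhtml:".h{font-size: 20px;}" search engine')
print(snippets, terms)
```

The snippets would be matched against page markup and the remaining terms against visible content, in the same way inurl: restricts matching to the URL.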

kaled

4:40 pm on Dec 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would certainly like to be able to search HTML source to look for copyright infringement.

I would suggest source: rather than inhtml:

Kaled.

peterq

1:22 am on Dec 24, 2004 (gmt 0)

10+ Year Member



Ah, Kaled! Exactly why I asked. Google is already very useful for checking content copyright, but being able to check source code copyright would be a plus!