Forum Moderators: Robert Charlton & goodroi
Let's start a list of the ways sites can link, refer, or point URLs to your pages other than direct hrefs. Mainly, we are after ways that SEs such as Google may run into URLs and use what they find:
What else? Wow - blew through 20....
I will update as we go. Thanks to everyone who pitches in...
[edited by: Brett_Tabke at 3:57 pm (utc) on April 19, 2009]
I think we've covered most of the common ones and also the less common ones :) but a few that come to mind:
- Analytics data (toolbars, javascript data collection, etc.). OK, a visit isn't exactly a vote (but is a link?) but enough of the right visits is certainly saying something
- Domain registrations and DNS data - Google have access to newly created domains, and also to DNS files, which will contain references to website hosts - some more frequently than others
- OCR - Google have mentioned this a few times (in connection with PDFs, for instance, and many images contain "watermark" URLs and suchlike).
> widgets
Apps are interesting. I thought about it for a long time (such as Twitter apps), but I don't see a method of that URL getting into an SE bot. How could it?
> billboards
You know Dinkar, you may have something there. We know for a fact that Gbot can read text inside of graphics (Google Catalog proved that in spades). Do you think they are running OCR routines over the Google Street View maps? Then they could feed any hits back into Google Maps in order to increase the quality of maps somehow.
It is all in the community Greenleaves. Just trying to give back some of what I get.
> Receptional Andy
> Domain Registration and DNS data
Big time. Nice catch.
[edited by: Brett_Tabke at 8:21 pm (utc) on April 16, 2009]
Pretty much all of the form spidering I've seen is essentially manipulating GET parameters.
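As a rough illustration of what "manipulating GET parameters" could look like, here is a minimal sketch that enumerates the URLs a GET form can produce from the choices it offers. The function name, the form action, and the field values are all hypothetical, and this is not a claim about how any search engine actually does it.

```python
from itertools import product
from urllib.parse import urlencode


def enumerate_get_urls(action, fields):
    """Build every GET URL a form could submit.

    `action` is the form's action URL; `fields` maps each field name
    to the list of values offered for it (e.g. <select> options).
    Both are illustrative assumptions, not taken from a real crawler.
    """
    names = list(fields)
    urls = []
    # One URL per combination of offered values, in field order.
    for combo in product(*(fields[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(action + "?" + query)
    return urls
```

Each generated URL is an ordinary link as far as a crawler is concerned, which is why a form with a handful of select boxes can expose a lot of pages.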
URLs discovered in any other HTML attributes, like
<img onerror="http://www.example.com" lowsrc="http://www.example.com"/>
for a full list of all possible ones, just take a look at the HTML DTD
<table background="http://www.example.com">
<iframe src="http://www.example.com">
<base href="http://www.example.com"/>
... and many more
(incidentally, all of these can also be targets for XSS, but that's another topic)
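To make the attribute idea concrete, here is a minimal sketch using Python's standard html.parser to collect URLs from attributes other than a plain link href. The attribute set below is an assumed, non-exhaustive sample, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive set of attributes whose value is a URL.
# Event handlers such as onerror hold script rather than a URL,
# so this sketch leaves them out.
URL_ATTRS = {"href", "src", "lowsrc", "background", "action", "cite", "longdesc"}


class AttrUrlCollector(HTMLParser):
    """Records (tag, attribute, url) for every URL-bearing attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <base ... />.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.found.append((tag, name, value))


def collect_attr_urls(html):
    parser = AttrUrlCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

Feeding it the table, iframe, and base examples above should yield one entry per tag, in document order.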
URLs inferred by parentage:
http://www.example.com/dir/image.gif => http://www.example.com/dir/
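A minimal sketch of inferring parent URLs, assuming a crawler simply walks up the path one segment at a time (the function name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the directory URLs above `url`, nearest parent first.

    Query string and fragment are dropped, since they belong to the
    original resource rather than to its parent directories.
    """
    parts = urlsplit(url)
    # Strip a trailing slash so a directory URL yields its own parent.
    path = (parts.path or "/").rstrip("/")
    parents = []
    while path:
        # Cut off the last path segment and keep what is left.
        path = path.rsplit("/", 1)[0]
        parents.append(urlunsplit((parts.scheme, parts.netloc, path + "/", "", "")))
    return parents
```

For the image URL above this gives the /dir/ URL first and then the site root.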
URLs used in CSS:
body{background:url('http://www.example.com/image.gif');}
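Pulling those URLs out of a stylesheet can be sketched with a regular expression. This is a simplification that assumes well-formed url(...) tokens and does not handle escapes or comments; the names are made up for illustration.

```python
import re

# Matches url(...), with optional matching single or double quotes.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(css):
    """Return every URL referenced through url(...) in a CSS string."""
    return [match.group(2) for match in CSS_URL.finditer(css)]
```

Run against the body rule above, it returns the single image URL.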
URLs used in conversation via Instant Messaging and Chatrooms (assuming suspiciously that they are not very private)
Tweets and statuses and other user-entered content all over the www
brute-force scraping of tinyUrl et al
the URL set as your "home page" in your browser
*your* Browser History (yes, it is possible to scrape it using some nifty and unobtrusive JS+CSS techniques)
Is this just a "let's list the various link types" for fun thread, or are you implying that there is some value to some of the link acquisition methods described above?
Could you put a "*" next to the ones that you feel are most worthy of attempting to get (of course only the "moral" ones).
These types of links obviously cannot make up for good solid link building skills....but on the rainy days when no webmasters are wanting to reply to my link requests.....would one or two of these suffice to build up my link profile?
Kind of like how a weight lifter dude goes on vacation and there are no weights for him to work out with in the hotel he is staying at for 2 days....so he borrows his girlfriend's "thigh master" for a day or two?
[edited by: BaseballGuy at 2:59 am (utc) on April 17, 2009]
> OT baseball guy
This is a list of nontraditional ways to get a URL in front of GoogleBot or into the system. The value of doing so is a whole other discussion.
off topic - I cleaned 5-6 side topics in here. Feel free to start another thread with related issues.
And then Google Earth's .kml and .kmz files, particularly with respect to community applications like the google-earth-war-project.
(Both might be subsumed under 19) or 23), but shouldn't we treat each "format other than webpages" on its own?)
Image-Captchas!
Related: URLs may also be encoded as the "result" of those magical stereoscope images (if you definitely want to keep SEs out ;)
Gbot can read text inside of graphics (Google Catalog proved that in spades).
How about URL written on a graphic?
Do you think they are running OCR routines over the Google Street View maps? Then they could feed any hits, back into Google maps in order to increase the quality of maps somehow.
Honestly, I would not be surprised if G is doing this. However, if G is doing this and using it as a factor in ranking a site, I seriously worry that the Internet will become just like our real world (a site ranks by buying more offline ads and billboards, which are easily bought with $$$). There go the long tail markets, leaving only manipulation by the big sharks of the real world.
I am pretty much convinced that links in Gmail pass some juice.
Is there anybody who has noticed this?
How did you find it?