Forum Moderators: bakedjake
How long have they been around and where was the press?
I can read most of those screen shots just fine on a 22" monitor at 1600x1200.
Yah, Bill, but it's not the whole page. What you leave unmentioned is they only show five paragraphs out of 23 total paragraphs in the article about Obama/McCain. It's not the whole page; the user has to click through to the site to enjoy the entire article.
What you leave unmentioned is they only show five paragraphs out of 23 total paragraphs in the article about Obama/McCain.
You're sidestepping the fact that some pages are smaller than others, so it could easily show the entire content or a significant portion of it. Stop focusing on one particular page layout.
The point is that the snippet served up is now significantly larger than any other SE's, which is what I find unacceptable.
"Search Me" - good name; so easy for URL.
Way cool results presentation - like new Mac OS, or using cooliris.
So if they can iron out the inevitable wrinkles, avoid lawsuits from cranky webmasters, and enhance results, it looks to me like they should deserve a good share of the market.
[and Google is likely looking hard at this; not laughing their socks off as they likely did with Cuil]
If not smaller "snippets", some blurring of the images would be a good idea; maybe they could use something like a graded filter, so it is sharper at the top (header readily read, along with title; albeit not all sites share such a design) and then quickly becomes fuzzier.
So you can get a fair idea of what a page is like, but to read text or appreciate photos, you have to visit the page.
I've a 1600px wide monitor; there are snippets where the main text can be read with ease.
What if an established SE rolled out a search interface like searchme?
What if website owners could 'opt in' to receive full indexing & display as we see now at searchme?
Anyone concerned with copyright or other issues could stay opted out of their text & pix being scraped.
So now the 'opt in' sites would be nicely displayed (and likely clicked on more) & the other sites would be a blurry mess of text & x's where the images should be....
Would the owners of those sites have any legal recourse for being displayed in an unfriendly light?
I imagine that most people would go through to pages that appeal. The images on their image search do look rather too big but perhaps an automated watermark across the images might persuade users to visit the site. Definitely very powerful and intuitive. Excellent topic for WW.
So where is the robots.txt to remove my sites? If we all have to contact them it's going to get out of hand very quickly.
Then again, perhaps searchme will show them a thing or two. However, this fascination with a high-production SERPs interface really reminds me of the apparent operational path ask.com followed.
I have to ask how much of a copyright "infringement" is possible on a web page with one paragraph? Where copyright falls through the cracks is FAIR USE where it is impossible to quote a small article without quoting the entire article.
Think for example about a 5 line poem with less than 200 chars ( many haiku would be covered by this example ) ..One line, if reproduced by someone who did not create it ..can be considered to be a quote ..that would be about 20% ..obviously if you wrote 500 pages then 20% ( or 100 pages ) is no longer OK if used as a "quote" ..but then "fair use" isn't what search engines do ..what they do is aggregate lots of other people's original material ..brand it with their logo ..on their pages ..and call it their search ..and then pass it off as their product
"fair use" is for reviewers ( find me the review on any page of Google's SERPs or MS or searchme etc etc ) ..or educational establishments ..Not a single current existing search engine is an educational establishment ..
Practically, search engines are useful ..but they are not obligatory ( Brett took this entire forum out of search for a while ..not just "noarchive" ..but not spiderable "no bots" ..new people still found it :)..the old way via links from other sites and by word of mouth )..no google yahoo ms or anyone else needed ..just other webmasters saying "hey ..look there is this place I know" ..and making a link on their site ..
Search engines must obey the law ..and the protocols ..even Google's "noarchive" tag is an abomination which we should not have to use to prevent them scraping our entire pages ..into their publicly accessible cache ..
Since they had some very bad press over the level of branding that they placed on their "cache" of entire pages ..even though they have had judgements given ( for now in their favour ) by judges who don't know a pixel from a png ..and so shouldn't even be adjudicating on matters in which they are ignorant ..
Nevertheless Google has reduced the "branding" on the "cache" pages ..to quieten down some webmasters, as their actions definitely were evil.
When G began they were almost universally loved by webmasters ..they were new ..radically different ..and we all sang their praises to each other and to average joe and jane surfer ( most of whom were using alltheweb or altavista or Yahoo etc and who didn't know how easy it was for webmasters to manipulate what they, the average surfer, saw on page 1, 2, or 3 for almost any search term ) ..But G were not grateful ..and they treat webmasters and the law ( copyright or federal or countries' laws ) with ever greater disdain ..because they now have the money and the eyeballs to stifle access to dissent ..
So now they are talked of in terms of hate, fear, and dislike, almost in the same way that one once heard used exclusively for Microsoft ..who also don't really listen ..even those who make good money with G would like an alternative so all their eggs are not forced into the same basket, as G has a de facto monopoly on search in the west ..and is in bed with repression in the east ..one tweak of the algo can ruin you ..
Randy and his people's new search engine could be the beginnings of the re-balancing of search ..the breaking of G's monopoly ..and the use of the flash interface and the concept is as radical as google's "clean" page was then ..looks like Linux and 3D desktops and Spielberg all rolled into one ..and I repeat, I love the look ..And Randy is also here ( for now ) and discussing ..and apparently listening ( which is more than G's, Y's and MS's PR reps have done in a long while ..and yes, I include Mr Cutts in the ranks of SE PR spinners ) ..
So rather than fight amongst ourselves ..your sites may not be affected by screenshots and entire 1st and only paragraphs on page being shown ..but some of my sites are ..and one in particular, and it's not even a site that I have bothered with for years ..but it's the principle of it ..but then it might have been your site that was affected by some other aspect of the searchme model ..and I'd fight for you, not against you, if that were the case ..
Some sites are just 5 pages ..images and a little text ..I'm thinking of a site that sells flash nav etc that belongs to a member here ..Others may be selling ebooks on stopping smoking or gardening or whatever ..usually one or two para pages in that model ..and most adsense sites I see are sparse on text ..so should all the sites with less than 2000 words per page just go to the wall to let Randy and his friends make money ..and so that some here ..might ..yeah might ..get a little more traffic ..?
While Randy is here and listening we may be able to work something out with him and help us all along with him ..
Saying what amounts to "screw you jack my site has big pages so I'm ok !" isn't what WebmasterWorld has ever been about ..
I think Incredibill ( may have been someone else tho ) posted a piece recently quoting the lines that end with
"and then they came for me and there was no one to speak up for me " ..or some such ..
BTW ..DMCA doesn't require one to have large amounts of text ..nor do any of the international conventions ..
Ps ..Randy ..still think you have more in common with a directory than a search engine ..even though one does not have to apply for inclusion ..
[edited by: Leosghost at 3:24 pm (utc) on Sep. 29, 2008]
Further to the Mac/Safari issue, just wanted to let you know that I've checked, and my version of Flash player is MAC 9,0,124,0.
SearchMe.com doesn't work for me. (Mac OSX 10.4.11 and Safari 3.1.2 - Mac is a PowerPC rather than Intel.) All other Flash sites work for me: but I definitely can't see the SearchMe site unless I change browsers.
Appreciate that this thread is mainly concerned with other matters - but I'd guess it's important to know that some people may not be able to use your search engine.
Firefox comes "out of the box" with pop-up blocking switched on. If it doesn't work with the vanilla settings of the world's second most popular browser then it is seriously broken.
All it did was display screen shots which didn't link to anything although Firefox kept flashing up warning messages about pop-ups.
least mine does ..firefox 22.214.171.124 ..
I tried Firefox 3 ..it ran like molasses and took over the CPU and showed over 200,000 processes ..so I went back to an older one ..I'll wait to upgrade again to series 3 until someone convinces me it runs light and fast ..xp pro.
Yes, URLs could be more prominent (I'm not even noticing them just now).
Results presentation is useful if you'll check through only a few results (as is typical).
Less so if you want to skim down a list of lots of results, or use browser search to check through search engine results. Both are readily achieved with Google. An option to remove the images pane (as well as the option to make it smaller) might help here?
It seems that even when you have the list of results in the lower window, Firefox search only checks the window with images. It holds only images, so it can't find words.
Interesting: Google and these long results (as yet, no legal action notices in that thread!). I'd suggested Google would be watching but not laughing at Search Me.
Next we discussed snippet length and the other requests, but since there does not seem to be consensus on this among webmasters, we want to open this up to the webmaster community to recommend a series of new robots.txt directives that will cause our bot to do things like limit or eliminate snippets, blur images, etc. We want to give the webmaster control of it so we don't end up dictating it for everyone based on the demands of a few. So I want to throw it back to you all: what directives would you like to see added to robots.txt for visual search engines like searchme?
Want traffic? Take it from where it comes. Want traffic and somebody is willing to work with how that is developed, TALK TO THEM.
Me? Above the fold...EVERYTHING I GOT. Okay. More than that...have to think about it. What an incoming visitor to my site sees on first click is what searchme seems to be showing. Okay by me as far as the interesting searchme display goes.
Dang it, if anyone doesn't want to share/display this stuff then you're in the wrong business.
I took a look at searchme. Interesting. Didn't know "Charlotte" was searchme...blocked it some weeks back. Still listed...but not as strong as I might have been because of that (Randy... I'll let your bot back in...didn't honor robots.txt the first two weeks, hope that has changed).
Copyright infringement is one thing. Search engine is another...if the search engine does not serve the website.
You can't get listed if you don't let 'em...and you can't complain about the listing if it is done responsibly.
Not a rant, just a reminder we don't bite the hand that feeds us (too often).
One of the first decisions was to modify our bot to honor the NOARCHIVE directive in robots.txt.
I am not trying to be presumptuous and teach a search engine CEO what robots.txt is and what META data is, but just in case..., and it might be a good reference point for new webmasters
uhh, ohhh: NOARCHIVE is not part of robots.txt, but rather is a meta data element (tag) that is part of the page meta data, while robots.txt is site wide
the robots.txt protocol has established 'directives', although some search engines have expanded it a bit - such as use of pattern matching, etc.
META data tags also have a standardized set; however, there is tag data that is supported by some SEs and not by others (their initiatives)
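To make the robots.txt vs. META distinction concrete, here is a minimal sketch in Python of how a crawler might read per-page directives such as NOARCHIVE from the robots meta tag. The class and function names are made up for illustration, and this is not a description of how any particular search engine's bot works.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so normalise them.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html):
    """Return the set of lower-cased robots meta directives found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="NOARCHIVE, nofollow"></head></html>'
may_cache = "noarchive" not in robots_meta_directives(page)
```

The point of the sketch is that this check happens per page, after the page has been fetched, whereas robots.txt is consulted once per site before fetching anything.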
Good primer on which "robot controls" (robots.txt and META) are supported by the (current :) ) major search engines
Managing Robot's Access To Your Website [janeandrobot.com]
all about robots.txt - [robotstxt.org...]
W3C notes on robots.txt - [w3.org...]
(scroll down and there are notes on META data as well)
IETF rfc - [robotstxt.org...]
Google bot and NOARCHIVE
However, those individual pages may then have a directive to not index it, not archive it, etc, buried in the meta data. With nofollow and others there are quite a few possible combinations.
For Google, they show URLs blocked by robots.txt as URL-only entries in their SERP, which is often less than useful - as they are pages that the webmaster often/usually didn't want showing up at all. Google shows nothing at all for the URL when a page is blocked by robots meta noindex data.
Many sites have a bot-blocker that activates for anything that accesses stuff explicitly listed as "keep out" in the robots.txt file so beware of that.
Does your bot check robots.txt every time it goes to spider a site, just before starting, or does it cache the robots data for a while like some other bots - who then stumble into blocked areas, blocked only recently but after the last visit the bot made to check the robots file?
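The caching trade-off in that question can be sketched in a few lines of Python. This is only an illustration under assumed names: RobotsCache, the fetch callable and the one-hour default are all hypothetical, not anything searchme or another engine has published.

```python
import time


class RobotsCache:
    """Keeps each host's robots.txt body for at most max_age_seconds."""

    def __init__(self, fetch, max_age_seconds=3600, clock=time.time):
        self._fetch = fetch            # callable: host -> robots.txt text
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the expiry can be tested
        self._entries = {}             # host -> (fetched_at, body)

    def get(self, host):
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._max_age:
            # Still fresh: reuse the copy. Rules added since the last fetch
            # are invisible to the bot until this entry expires.
            return cached[1]
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body
```

A shorter max_age_seconds narrows the window in which a bot can stumble into a newly blocked area, at the cost of more requests for robots.txt.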
For robots.txt, what does your bot do when there is a section for "all" (User-agent: *) and a section especially for your bot? In that particular case, Google reads ONLY the section for their bot, so you have to list everything for Google in that one section, necessarily repeating everything that is already listed in the "all" section again. If your bot reads both sections, how does it prioritise conflicting rules?
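For anyone who wants to see the "specific section only" behaviour in action, here is a small sketch using Python's standard urllib.robotparser. To the best of my knowledge it checks named groups first and falls back to the * group only when no named group matches. The bot name examplebot and the paths are made up, and this says nothing about how Google's or searchme's crawlers are actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with an "all" section and a bot-specific section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /drafts/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# examplebot has its own section, so the "*" rules are not applied to it.
specific_private = parser.can_fetch("examplebot", "http://example.com/private/page.html")
specific_drafts = parser.can_fetch("examplebot", "http://example.com/drafts/page.html")

# A bot without its own section falls back to the "*" section.
other_private = parser.can_fetch("otherbot", "http://example.com/private/page.html")
```

Under that behaviour, keeping examplebot out of /private/ as well means repeating the Disallow: /private/ line inside its own section.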
Does your bot understand wildcard notation, like Disallow: /*/thatfolder for example?
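As a rough illustration of what "understanding wildcards" involves, here is a sketch that turns a Disallow pattern into a regular expression. It assumes the commonly used extension where * matches any run of characters and a trailing $ anchors the end of the URL path; the function names are hypothetical and this is not any engine's actual matcher.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """True if the URL path is covered by the given Disallow/Allow pattern."""
    return pattern_to_regex(pattern).match(path) is not None


blocked = rule_matches("/*/thatfolder", "/archive/thatfolder/page.html")
not_blocked = rule_matches("/*/thatfolder", "/thatfolder/page.html")
```

A bot that does not implement this treats the * as a literal character, so a rule like Disallow: /*/thatfolder would never match anything on a normal site.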
Rhetorical questions, in the main; I'm sure your guys have thought carefully about these things. However, as suggested above, the place for those answers, in detail, is on a "for webmasters" section of your site.
If they have international ambitions, they will need to learn.
Perhaps they don't have such ambitions. I'm all for multiple language support, but it is very very hard (therefore expensive) - you wouldn't launch a service that attempted to have that, you'd wait until you've grown and proven your business case in one language, then expand.
There are 1.5 billion English language speakers in the World according to Wikipedia. Seems to me like a decent starting place.
I support the idea, but I've never had any of my sites translated into other languages. They are all in English, and I get decent traffic despite that.