Welcome to WebmasterWorld Guest from 220.127.116.11
The first part of the link was actually to a page that exists, but there was a bizarre string of characters after the page name.
500 (Internal server error)
Of course it threw an error on our site.
See those last characters... now where on earth did they come from?
When I check the properties of the actual link, I get this info:
Protocol: HyperText Transfer Protocol
Our site doesn't use any strings like these, so either someone linked to us in a VERY bizarre fashion, or this is some internal Google error or a way of testing the site...
What do y'all make of it? Anyone else seen anything like it?
YOUR site does not use those characters, but the site the link came from DOES. That's why Google was looking for that URL: not because of your site, but because some other site, like a scammer's scraper or phony SERP site, has a link to you. When Google crawled that site and found your URL there, it came to your site to index that URL and was met with an error caused by the improperly formatted URL.
It might also be the 404 test URL that Google generates to verify your site when you first sign up for Sitemaps. A lot of people forget that, as part of the verification process, Google requests a deliberately incorrect URL on your site to make sure you return a true 404 Page Not Found. So it could be that, too.
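It isn't clear exactly which URL Google requested for Sitemaps verification, but you can run the same kind of test yourself. The minimal sketch below requests a randomly named page that shouldn't exist and reports the status code: a site with a true 404 page answers 404, whereas 200 would be a "soft 404" and 500 is the kind of error described in this thread. `probe_missing_page`, the `noexist_` path pattern, and `DemoHandler` (a stand-in site so the example runs locally) are all hypothetical.

```python
import secrets
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener with proxies disabled so the probe talks to the site directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_missing_page(base_url: str, timeout: float = 5.0) -> int:
    """Request a random path that almost certainly doesn't exist; return the HTTP status.

    A site with a true 404 page should return 404 here, not 200 (a "soft 404")
    and not 500 (a server error).
    """
    # Hypothetical probe path: random hex makes a clash with a real page unlikely.
    url = base_url.rstrip("/") + "/noexist_" + secrets.token_hex(8) + ".html"
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the status is err.code.
        err.close()
        return err.code


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in site: only "/" exists, everything else is a real 404."""

    def do_GET(self):
        body, status = (b"home", 200) if self.path == "/" else (b"not found", 404)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo() -> int:
    """Start DemoHandler on a free local port, probe it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return probe_missing_page(f"http://{host}:{port}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

To try it for real, call `probe_missing_page` with your own site's base URL instead of running the demo server.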
As we have found with our site, our links appear on thousands of those phony SERP scraper pages, which merely repackage the top 10 Overture or Google AdWords pay-per-click results for a particular keyword.
So the question is: do you advertise on AdWords or Yahoo's pay-per-click search?
If so, what you're seeing is probably the remnant of a scammer's AdWords or Overture affiliate code used to generate the URL leading from his phony SERP page to your site. It's easy to spot the Overture and Google links to your site on these phony SERP pages. Here's how:
Let's say you search for widgets on Google and click on one of the results that looks promising. It turns out you've landed on one of those SERP pages that is just another listing of 20 search engine results, with some AdWords ads thrown in. Then you see your site listed there.
When you mouse over the link to your site, the browser status bar shows www.example.com, but if you view the source of the scammer's page, you'll see the URL is a very long one, like the one you showed in the post that started this thread.
Usually, though, it will be about two lines long and start like this:
Scammers are sneaky this way: they tell the browser to show you one URL, tricking you into thinking you're going straight to the site, but in the HTML they actually use the lengthy Google AdWords or Overture PAID link to your site. So people click on the "link to your site" not knowing it's actually a sponsored link that you are paying for. The affiliate links from Google AdWords and Overture are long, with tons of encoded characters.
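The view-source check can be partly automated. A rough sketch, assuming you've saved the suspect page's HTML and that the status-bar trick is done with an `onmouseover` script (one common way, but an assumption here): it lists links whose visible text or `onmouseover` code mentions your domain while the `href` actually points somewhere else, such as an ad-click URL. `disguised_links`, `LinkCollector`, and the sample domains are hypothetical, and links built entirely in JavaScript won't be caught.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, visible text, onmouseover) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # [href, onmouseover, text parts] for the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            self._current = [a.get("href") or "", a.get("onmouseover") or "", []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, hover, parts = self._current
            self.links.append((href, "".join(parts).strip(), hover))
            self._current = None


def _is_own_host(host: str, domain: str) -> bool:
    """True for the domain itself or any subdomain of it (e.g. www.)."""
    return host == domain or host.endswith("." + domain)


def disguised_links(html: str, domain: str) -> list:
    """Return hrefs of links that look like they go to `domain` but actually don't."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href, text, hover in parser.links:
        host = (urlsplit(href).hostname or "").lower()
        looks_like_ours = domain in text.lower() or domain in hover.lower()
        if looks_like_ours and not _is_own_host(host, domain):
            flagged.append(href)
    return flagged


page = (
    '<p><a href="http://www.example.com/">www.example.com</a> '
    '<a href="http://ads.example.net/click?u=abc123" '
    'onmouseover="window.status=\'http://www.example.com/\'; return true">'
    "Example Widgets</a></p>"
)
print(disguised_links(page, "example.com"))
```

The first link is a genuine link to the site and passes; the second shows www.example.com in the status bar but really goes to the (made-up) ad-click host, so it is flagged.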
So why did you get the 500 error?
It could be that they are trying to wreck your server on purpose, or they just screwed up generating the code for your site, and your server didn't like the format and choked on it.
The fact that you got a 500 error and NOT a 404 makes me think this was NOT Google performing the 404 test as part of verification. But check your log file and see whether this error occurred at the time you performed your initial Sitemaps verification.
My gut feeling says this was a scammer pulling some kind of trick that backfired on him. He won't earn any money as an AdWords affiliate if the link doesn't go through!
Hope this helps!
[edited by: JeffOstroff at 1:11 am (utc) on July 30, 2006]
Also, Google Sitemaps now has a way to verify your site through a meta tag, which is nice, since most sites don't like throwing potential customers to a 404. I personally like to point them to a page that lets them easily search again.
I wouldn't say it choked, but it definitely burped. We have had a lot of problems with blasted scrapers. Nearly every weekend I spend half a day searching for them and reporting them.
We use completely custom code for our site. No templates, no WYSIWYG editors. Totally from scratch.
We do run a few AdWords ads and some Overture ones, since things have been slow for us this summer, but why would Google try to spider its own PPC link, or an Overture/Yahoo/whoever link, to begin with?
I wish there were some way to handle this on sponsored links. Say a link gets clicked: it fires a response back to Google, Google scans the page the click came from for quality, and then decides whether or not it's a worthy source. If it isn't, Google parses the URL and sends the customer to the site at no cost to us, and if the source site comes back as questionable, it's put on some sort of probation or placed in a queue for manual review.
%7C = |
%5B = [
%5E = ^
So the bad querystring would really look like &y=02D2B45A1CDC1231&i=357&c=9315&q=02^SSHPM[L7.&&'?~|jm~%
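To double-check a hand decoding like this one, Python's `urllib.parse.unquote` does the %XX conversion. The `url_problems` helper below is a hypothetical addition that flags characters RFC 3986 doesn't allow to appear literally in a path or querystring, plus % signs not followed by two hex digits. Characters like these are a plausible, but unconfirmed, reason a server or script choked on the request.

```python
import re
import string
from urllib.parse import unquote

# Characters RFC 3986 allows literally in a path or query: unreserved
# (letters, digits, - . _ ~), the sub-delimiters, and : @ / ?.
# '[' and ']' are left out on purpose: RFC 3986 reserves them for IPv6 hosts.
_ALLOWED = set(string.ascii_letters + string.digits + "-._~" + ":@/?" + "!$&'()*+,;=")
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")  # '%' not followed by two hex digits


def url_problems(raw: str) -> list:
    """List reasons why `raw` (a path plus querystring) is not a valid URL."""
    problems = [f"bad percent-escape at position {m.start()}" for m in _BAD_ESCAPE.finditer(raw)]
    illegal = sorted(set(raw) - _ALLOWED - {"%"})
    problems += [f"illegal character {ch!r}" for ch in illegal]
    return problems


print(unquote("%7C"), unquote("%5B"), unquote("%5E"))  # | [ ^
print(url_problems("/page.php&q=02^SSHPM[L7.&&'?~|jm~%"))
```

Run on a path built from the decoded string in the post, it flags `[`, `^`, `|` and the trailing `%`.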
When I ran a Google search for it under
I got ALL kinds of supplemental pages and really spammy ones. I mean big-time spammy ones. I did find that, in PHP pages as anywhere else, the ? marks the beginning of a querystring and the & separates the parameters within it.
The only valid link on one of the pages found was actually to a Wikipedia page in Germany.
There were several site crawl errors shown for:
mydomain.com instead of www.mydomain.com, even though we have a 301 redirect sending those requests to the main www domain. It showed "domain not found", but Google obviously knew the domain was ours, since it showed up in our sitemap. Our sitemaps have listed full URLs since last November.
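On Apache, a bare-domain-to-www 301 like the one described is usually set up in the server config (for example with mod_rewrite). The same logic, sketched as hypothetical Python WSGI middleware so the redirect rule is easy to see and test: requests for the bare host get a 301 to the www host with the same path and querystring; everything else passes through. `mydomain.com` is the thread's placeholder, and the sketch assumes the app is mounted at the site root. Note that a redirect only helps once a request reaches your server; a "domain not found" error happens before that, at the DNS lookup.

```python
from urllib.parse import quote


def redirect_bare_domain(app, bare_host="mydomain.com", canonical_host="www.mydomain.com"):
    """WSGI middleware: 301-redirect requests for bare_host to canonical_host."""

    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()  # drop any :port
        if host == bare_host:
            scheme = environ.get("wsgi.url_scheme", "http")
            # PATH_INFO arrives percent-decoded, so re-quote it for the Location header.
            location = f"{scheme}://{canonical_host}" + quote(environ.get("PATH_INFO", "") or "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app, environ):
    """Tiny harness: run a WSGI app and capture (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


app = redirect_bare_domain(hello_app)
print(call(app, {"HTTP_HOST": "mydomain.com", "PATH_INFO": "/widgets.html", "QUERY_STRING": "id=3"}))
```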
I just wish it would tell us where the links were found, so that if one of them is bad we can have the other website fix it, or if it's a spammer site we can find it easily.
were being seen as duplicate content, and we feared the https versions were too; we were hit with a penalty even though we were totally innocent of spam.
A note on my post above: I don't believe the link that appeared to be an internal Wikipedia link was to our site, and I certainly don't speak German.
[edited by: Bewenched at 7:59 pm (utc) on July 30, 2006]
Surely Sitemaps only checks links and URLs that originate from your site?
At least tell us onsite or off, please, guys! The actual page would be even better!
(Yes, of course they read these threads!)
For example, your Apache server has an error log and a regular access log. I think it shows up in both, since the error log shows only errors but the access log shows everything. I typically just run WebTrends on the access log, and if it doesn't parse out what I'm looking for, I open the error log and access log in a text editor and hunt it down.
[edited by: JeffOstroff at 12:24 am (utc) on Aug. 1, 2006]
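For the hunting step, a rough sketch, assuming Apache's standard "combined" log format (check your LogFormat directive; the "common" format has no referer or user-agent fields). It pulls out every 5xx response with its timestamp, request line, referer and user agent, which answers both "when did it happen?" (compare with your Sitemaps verification time) and "where did the request come from?" Crawlers often send no referer, so "-" is common. `find_errors` is a hypothetical helper and the sample log lines are made up.

```python
import re

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Quoted fields may contain backslash-escaped quotes, hence (?:[^"\\]|\\.)*.
_COMBINED = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def find_errors(lines, status_prefix="5"):
    """Yield (time, request, referer, agent) for lines whose status starts with status_prefix."""
    for line in lines:
        m = _COMBINED.match(line)
        if m and m.group("status").startswith(status_prefix):
            yield m.group("time", "request", "referer", "agent")


sample_log = [
    '192.0.2.10 - - [29/Jul/2006:14:02:11 -0500] "GET /index.html HTTP/1.1" 200 5120 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.20 - - [29/Jul/2006:14:05:42 -0500] "GET /page.php&y=02D2B45A1CDC1231&i=357 HTTP/1.1" 500 612 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for time, request, referer, agent in find_errors(sample_log):
    print(time, request, referer, agent, sep=" | ")
```

On a real server, pass an open file instead, e.g. `find_errors(open("access_log"))`, with the path adjusted to wherever your logs live.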
Those characters are all used in HTML, and most of the time this comes from a cut-and-paste operation.
More than likely it's an href attribute that wasn't properly closed!
Have you run Xenu Link Sleuth on your site? Xenu will help you pinpoint the error if it's on your site.
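Xenu is the right tool for a full crawl. For a quick look at the specific cause suggested here, a crude, hypothetical check is to scan the generated HTML for `<a>` tags whose double quotes don't balance: an `href` missing its closing quote usually leaves an odd number of `"` characters before the first `>`. It misses single-quoted attributes and can misfire on a `>` inside a quoted value, so treat hits as places to look, not proof.

```python
import re

_A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)  # an opening <a ...> tag up to the first '>'


def unbalanced_anchor_tags(html: str) -> list:
    """Return (line number, tag text) for <a> tags with an odd number of double quotes."""
    hits = []
    for m in _A_TAG.finditer(html):
        tag = m.group(0)
        if tag.count('"') % 2:
            line_no = html.count("\n", 0, m.start()) + 1  # 1-based line of the tag
            hits.append((line_no, tag))
    return hits


page = '<p><a href="ok.html">OK</a>\n<a href="widgets.html>Widgets</a></p>\n'
print(unbalanced_anchor_tags(page))
```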
Wouldn't your log file show you where the error came from?
As well, since my sites are all generated, if it should happen to be some bad code internally causing a dud link (no, no, of course that never happens, I am talking theoretically ;)), then a source code search probably won't find it. So the backlink is crucial.