Forum Moderators: open
[webmasterworld.com...]
The URLs pertaining to my website that all point to my index page take the following form.
www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/?SID=xRSUNVW8R9P44HSYQ6UWED&
www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/default.asp?S=AC3&am
www.some-other-URL.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-2.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-3.com/file/callink.php?linkid=3
I have emailed Google, but have received no reply. I am unsure what I can do to A) eliminate the incorrect URLs that appear to originate from my site and B) eliminate the mirror URLs that originate from unrelated websites.
Any help would be greatly appreciated.
Have you tried searching site:yoursite.com on the datacenter I mentioned above to see if there are any positive changes? Also, take a look at the inurl:tracker2.php search results. I am encouraged by the fact that the urls containing tracker2.php once were indexed using the title/descriptions from the pages they hijacked. But on that datacenter (and a few others), they are now listed as url only. We are seeing baby steps. But they are steps in the right direction.
[look4ithere.com...]
Type webmasterworld into the search
There isn't anything wrong with 'webmasterworld' appearing on other sites, or in the urls of other webpages (as file names, perhaps). The problem lies in Google's ability to distinguish between someone else's url and your own: the site:mysite.com search has been showing many unrelated redirect urls as being part of a particular site, even though the site: command is designed to return just the pages truly associated with that site. In my case, when I ran the site: search over the past 6 months, there were over 20 redirect urls (increasing weekly), none of which I owned or created, being tied to my site. Google would ultimately penalize the original site, as the algorithm must believe that one person owns all of those spammy urls and is trying to spam the engine.
You can view everything on webmasterworld, without ever going to webmasterworld.
Do a search there on any site and then try the link for the site and you will see what I mean. Please don't dismiss this without even looking.
A quote from a friend: "The difference is that they aren't caching the pages or making their results spiderable so it appears that it's their content."
p.s.
This site is now the ONLY site that shows up when I look for my site in Google. Otherwise, it is totally gone.
Potentially resulting in two pages with identical content.
There are some threads here on using mod_rewrite to redirect one to the other.
[webmasterworld.com...]
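For anyone who hasn't dug through those threads yet, a minimal .htaccess sketch looks like the following — this assumes Apache with mod_rewrite enabled, and example.com is a placeholder for your own domain:

```apache
# 301-redirect the non-www hostname to the www version so only
# one copy of each page gets indexed
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The permanent (301) redirect is the important part: it tells the spider which hostname is canonical, rather than just serving the same content twice.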
My apologies if I misunderstood you. We are all in this together, and I certainly understand your frustration. Hell I started this thread two months ago. LOL
I, too, have the 'www' missing when I search for my site. As I mentioned before, some of the redirects that were being incorrectly connected to my site are gradually disappearing. Still, if I do a specific search for my site using the 'www', I get a redirect site, cached as my own, and indexed in Google with full title and description. Google actually thinks I have replaced my original domain/url with these silly redirects. Fortunately, that particular redirect was the result of an honest link a webmaster gave me a couple of months ago. He agreed to remove the link, which has ultimately left a 404-Page-Not-Found error. I was then able to submit the url through the Google URL removal tool yesterday; removal normally takes about 24 hours.
Google's notion that these redirects were designed by me to replace my own, original domain is absolutely ludicrous, given that the original domain is 4 years old and has an order of magnitude more backlinks (which should signify its importance/legitimacy as the 'original'). There is clearly some very flawed logic in Google's algorithm(s).
Incidentally,
I have read that it is possible to block redirects via htaccess. Is this advisable? I have identified the ip block from which the tracker2s to my site are coming.
I may make this question a new post.
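For context, the sort of rule I had in mind is straightforward in .htaccess (Apache Order/Deny syntax; the address below is a placeholder, not the real ip block):

```apache
# Refuse requests from a known bad ip block (placeholder range)
Order Allow,Deny
Allow from all
Deny from 203.0.113.
```

One caveat I'm aware of: this only blocks direct requests from those addresses. If Googlebot follows a 302 hosted on their server, the fetch still comes from Google's own IPs, so blocking the redirector may not change what gets indexed.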
(http://www.look4ithere.com/) They've framed webmasterworld into their own site. And trapped all the outgoing links so they redirect back into their site.
...
You missed the point. These people are taking all of our sites and framing them into THEIR site, and trapping all the links.
<FRAMESET ROWS="100,*"...<FRAME NAME="BANNER" SRC="http://www.look4ithere...
<FRAME NAME="MAIN" SRC="http://www.webmasterworld...
I tried it with my site. Every link (html anchor) on my site contains a
target='_top', which causes a break-out from any framing (this is not a new issue). I checked, and it works fine. There is no "trapping all the links" occurring other than standard Frames behaviour. This is the format to follow if you want to make use of it:
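That format is just the standard target attribute on each anchor. Example markup (the href is a placeholder):

```html
<!-- target="_top" tells the browser to load the link in the full
     window, replacing any frameset the page was loaded into -->
<a href="/somepage.html" target="_top">Some page</a>
```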
One thing I did see very recently that I thought was odd was a use of javascript to effect a link:
Duplicate URLs contain %09 in their query strings, and Google has termed all such pages SUPPLEMENTAL PAGES.
www.domainname.com/dir/filename.aspx?varid=%0924
When you click on the above URL, it opens the same page as:
www.domainname.com/dir/filename.aspx?varid=24
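For what it's worth, %09 is just the URL-encoded tab character, so the two query strings differ only by a leading tab in the parameter value — presumably the server trims the whitespace and serves the same page. A quick check in Python:

```python
from urllib.parse import unquote

# %09 decodes to a tab character, so the two query strings differ
# only by leading whitespace in front of the value 24
print(repr(unquote("varid=%0924")))  # 'varid=\t24'
print(repr(unquote("varid=24")))     # 'varid=24'
```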
These cannot be removed through the Google removal tool, as they "still exist".
These duplicate URLs are creating a major problem for us: no new content is being indexed by Google, and only the home page is being refreshed.
Is there a way to remove these duplicate URLs?
please.....
Are the urls generated from your domain, just indexed incorrectly?
If so, you can solve the problem the way I did. You are correct that Google will not remove the urls because they "still exist". However, they WILL remove them if you create a robots.txt file, then go back to the Google URL removal tool and submit the url of the robots.txt file. They typically remove them within 24 hours; mine were gone in 3 hours. You may already understand robots.txt, but I had to learn real quick last month. Be careful not to disallow files that you DO want indexed. If you see specific incorrect urls being indexed, I would just disallow the files you see, followed by a slash:
User-agent: *
Disallow: /dir/filename.aspx?varid=24/
Leaving the slash off at the end is an implied wildcard, and you will end up disallowing everything in the /dir/filename.aspx?varid= realm.
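You can verify that prefix behaviour with Python's standard-library robots.txt parser before deploying a rule (the paths below are hypothetical):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /dir/file",
])

# Disallow rules are prefix matches against the URL path, so the
# single rule above blocks every URL whose path starts with /dir/file
print(rp.can_fetch("*", "http://www.example.com/dir/file.aspx"))      # False
print(rp.can_fetch("*", "http://www.example.com/dir/filename.aspx"))  # False
print(rp.can_fetch("*", "http://www.example.com/dir/other.aspx"))     # True
```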
www.theirdomain.com/rd/results/rdq_them/www.mydomain.com/ f=searchemall_
The landing page contains my site in their frame.
Sorry, I thought this was another example of what you were talking about.
or... you could make the point that modern-day SEO is really about using the methods described above... redirect scripts and other black-hat means of duplicating existing, well-written content, and that you need to be defensive these days just to have a presence on the web. SEO used to be an art... it was about finesse. Now it is something else altogether.
I assume you've done all that. Wanna bet that anyone can take down your perfect SEOed site with just a few 302 redirects and all you can do is sit and watch as your rankings=0?
Fingers crossed.
hm, when I wanted to click through to page two I got this from Google:
We're sorry...
... but we can't process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected.
We'll restore your access as quickly as possible, so try again soon. In the meantime, you might want to run a virus checker or spyware remover to make sure that your computer is free of viruses and other spurious software.
We apologize for the inconvenience, and hope we'll see you again on Google.
I'm almost 100% sure nothing is wrong with my computer, and when I try a new search nothing is wrong.
I decided to remove some of my links without title and description.
I just reviewed the robots.txt material and have 1 question left.
How should I set up the robots.txt file when I want to remove files from a subdomain?
Let's say I have the following structure:
black.widget.com/small
black.widget.com/medium
black.widget.com/big
I want to keep the medium index, but remove all the pages in the medium directory.
Should I set it up as follows?
User-agent: Googlebot
Disallow: /black.widget.com/medium/
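Before posting I tried a quick sanity check with Python's robotparser (hypothetical URLs), and a rule written that way doesn't seem to match the subdomain pages at all, since Disallow compares against the URL path only, never the hostname:

```python
from urllib import robotparser

# The rule as written above: the hostname in the Disallow line means
# the path never matches, so nothing on the subdomain is blocked
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: Googlebot", "Disallow: /black.widget.com/medium/"])
print(rp.can_fetch("Googlebot", "http://black.widget.com/medium/page.html"))  # True

# A robots.txt served from the subdomain itself (black.widget.com/robots.txt)
# with a plain path rule does block those pages
rp2 = robotparser.RobotFileParser()
rp2.parse(["User-agent: Googlebot", "Disallow: /medium/"])
print(rp2.can_fetch("Googlebot", "http://black.widget.com/medium/page.html"))  # False
```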
So...how to cope with a subdomain?
TIA