
Why Did Google Index Pages With NoFollow?

2:19 pm on Aug 3, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 23, 2008
posts:294
votes: 0


A month ago I added the nofollow attribute to the meta tags of many pages, most of them new, but some of these new pages with the nofollow attribute have been indexed.

Also, some pages in a subdirectory that has always been blocked by robots.txt are indexed...

20 days ago I deleted 25% of my site's pages (the ones with thin content), and today Google still shows those pages...

Any idea why this happens?
4:44 pm on Aug 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 13, 2003
posts:1281
votes: 0


Just because you nofollowed links into those pages doesn't mean that Google won't index them. A nofollow is just a signal that you don't trust the pages you're linking to, and any PageRank that would go to them is simply wasted.

You can try blocking the pages with NOINDEX / NOARCHIVE in the HEAD section, but even that doesn't work every time. If a page is in an excluded directory, or has HEAD section exclusions, but has external links pointing to it, Google will still index it and even rank it. They'll follow your robots.txt instructions to a point, but if they think a document is popular enough they'll ignore your instruction.
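For reference, those HEAD section exclusions look something like this (a minimal sketch - the page is hypothetical, and noarchive is optional):

<head>
<title>Example Page</title>
<!-- ask compliant crawlers not to index this page or show a cached copy -->
<meta name="robots" content="noindex, noarchive">
</head>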
5:40 pm on Aug 3, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 23, 2008
posts:294
votes: 0


Thanks SEOMike

I made a mistake when I wrote the post. I meant to say NoIndex. The pages now have the noindex attribute in the HEAD, and some of them are still indexed...
6:33 pm on Aug 3, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12735
votes: 159


First things first.

The stuff in the directory that's blocked by robots.txt - is it possible it got in before you blocked it? Because once you wall off that directory, Google goes no further. Even so, that won't necessarily keep those pages out of the index - they'll just show up as URLs only, without titles or meta descriptions.

Next - the URLs that you NOINDEXed - has Googlebot been back to pick up the new tag? Check the cache date on the ones you see in Google.
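Checking the cache date is just a Google query, something like this (hypothetical URL):

cache:www.example.com/some-page.html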

And finally - just because you delete the pages on your site doesn't mean Google will necessarily drop them from the index. I've had old pages hang around for a year or more.

If you really really really don't want something in Google, you gotta password protect it (ultimate security) or NOINDEX it (pretty good security) or block it by robots.txt (maybe 50% security) and if it's already there, you gotta remove it yourself (via GWT) or wait till Google notices the NOINDEX or the 404. Which could be tomorrow or infinity or any time in between.

(And as an aside, I'd use the robots.txt testing tool inside GWT to make sure it's set up correctly. I can write robots.txt files in my sleep and I still check it periodically - just in case)
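As a footnote to the password-protection option: on Apache that's a few lines of .htaccess plus an htpasswd file (a minimal sketch - the AuthUserFile path is hypothetical, and it should live outside the web root):

# require a login for everything in this directory
AuthType Basic
AuthName "Private area"
AuthUserFile /home/user/.htpasswd
Require valid-user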
2:38 am on Aug 4, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 23, 2008
posts:294
votes: 0


@netmeg thank you.

About the directory: it has always been blocked by robots.txt, ever since I created it.
5:34 am on Aug 4, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6595
votes: 338


And finally - just because you delete the pages on your site doesn't mean Google will necessarily drop them from the index. I've had old pages hang around for a year or more.

I've had pages dead for nearly 10 years still appear, even after G dropped them... each time they change the algos (Panda, for example) all the old stuff comes back. Google (Bing too, and Yahoo before that) never forgets a URL it has met... and keeps testing it over and freakin' over.

Once on the web, always on the web (indexers). And as Walter Cronkite used to say, "And that's the way it is..."
7:38 am on Aug 4, 2011 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10550
votes: 10


have you checked your server access logs to see if the noindexed urls have been crawled?
you can also check the cached version of your content to see if it shows the meta robots noindex unless you also use noarchive.
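a quick way to do the log check, assuming Apache and a typical log location (both hypothetical here):

# list Googlebot requests for the noindexed section of the site
grep "Googlebot" /var/log/apache2/access.log | grep "/subdirectory/"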

if you add a meta robots tag to a document in a directory that is excluded by robots.txt, the url won't get crawled and the SE won't see the meta robots tag, so the url may become or remain indexed.
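in other words, this combination works against itself (hypothetical directory):

# robots.txt - this stops the crawl...
User-agent: *
Disallow: /subdirectory/

<!-- ...so this tag in pages under /subdirectory/ is never fetched or obeyed -->
<meta name="robots" content="noindex">

to get the noindex honored, the Disallow line has to come out so the pages can be crawled.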

a 410 Gone status code response actually means "gone" as opposed to "not found" (404) and usually works better for removing content.
4:54 pm on Aug 4, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 13, 2003
posts:1281
votes: 0


And finally - just because you delete the pages on your site doesn't mean Google will necessarily drop them from the index. I've had old pages hang around for a year or more.

a 410 Gone status code response actually means "gone" as opposed to "not found" (404) and usually works better for removing content.

Agreed.

I've had good success getting hundreds of pages removed from Google by simply adding some lines to .htaccess to make the server return a 410 for the removed pages. Google will eventually remove them. They'll count them as errors for a while in Webmaster Tools, but I saw no ill effect on ranking from those "errors." In my tests, 410 responses got the pages removed much faster than 404s did.

You could move the pages you want blocked to a directory blocked by robots.txt and serve 410s for their previous location.
6:12 pm on Aug 4, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 23, 2008
posts:294
votes: 0


Thanks for the comments!

I'm watching GWMT.

Can someone provide an example of how to force the server to return a 410 using .htaccess?

Thank you
6:15 pm on Aug 4, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 13, 2003
posts:1281
votes: 0


It's real easy for single pages:

Redirect gone /DIRECTORY/PAGE.htm
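And if you have a whole set of pages to remove, a pattern works too (same mod_alias module; the directory here is hypothetical):

# return 410 Gone for everything under /old-directory/
RedirectMatch gone ^/old-directory/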
6:16 pm on Aug 4, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 23, 2008
posts:294
votes: 0


Also, another comment...

Using the site: command, sometimes I see that I have, for example, 570 pages indexed and sometimes 427...

It seems that when I have 427 pages indexed, my positions improve (remember that the pages I deleted were thin content).
6:57 pm on Aug 4, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12735
votes: 159


I wouldn't go by the site command; it's severely messed up.
7:01 pm on Aug 4, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 23, 2008
posts:294
votes: 0


@netmeg, what command or tool do you recommend for finding out which pages are indexed?
4:03 pm on Aug 9, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 23, 2008
posts:294
votes: 0


Now, using the site: command, Google sometimes shows more pages from the directory that has always been blocked by robots.txt.

One doubt: is this syntax correct:

User-agent: *
Disallow:/subdirectory/

or do I need to put a space, like this:

Disallow: /subdirectory/


Thanks
4:37 pm on Aug 9, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


The syntax that Google publishes includes the space - see [google.com...]

It probably works without the space, too - but why push it? If in doubt, test your robots.txt file with the tool they offer inside Webmaster Tools.
3:52 am on Sept 22, 2011 (gmt 0)

Junior Member

joined:Sept 10, 2011
posts:50
votes: 0


It's a nice thread, but it's not clear to me. I need to know more details.
4:11 am on Sept 22, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Mike - ask a more specific question and we'll do our best.
6:50 am on Sept 22, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I deleted a number of pages in April and returned 410 for some and a 301 redirect for others. It has taken Google 4 months to remove all of the URLs from their index [site:example.com].

In WMT, so far about half of the URLs are gone. The number is dropping by a few dozen every few days. Only a very few URLs in the "internal links" list show as having internal links pointing at them; most URLs show "not available" for the link data. Once a URL is seen as "Gone", AND all the URLs that linked to it are also seen as "Gone", Google continues looking for a month or two and then deletes the URL from the list.
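For anyone following along, the two responses described above are each one line of .htaccess (hypothetical paths):

# gone for good - return 410
Redirect gone /deleted-page.html
# moved - return a 301 to the new location
Redirect 301 /old-page.html http://www.example.com/new-page.html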
 
