search engine optimization uk, california
search engine optimization, virginia search engine optimization
Defend Your Website From Google Duplicate Proxy
by Sophie White
There is a current and active way to knock a website out of Google's
search engine results. It's simple and effective. This information is
already in the public domain, and the more people who know about it, the
more likely it is that Google will do something about it. This article
explains how the exploit works, how a website gets knocked out of the
search engine rankings and, most importantly, how to defend your own
website from having it happen to you.
To understand this exploit, you must first understand Google's
Duplicate Content filter. It's simply described thus: Google doesn't
want you to search for "blue widget" and have the top 10 search results
all be copies of the same article on how great blue widgets are. They
want to give you ONE copy of the Great Blue Widget article, and 9 other
different results, just on the off chance that you've already read that
article and the other results are actually what you wanted.
To handle this, every time Google spiders and indexes a page, it checks
whether it already has a page that is predominantly the same, a
duplicate page if you will. Nobody outside Google knows exactly how it
works this out, but it is likely to be a combination of some or all of:
page text length, page title, headings, keyword densities, checks for
exactly copied sentence fragments, and so on. As a result of this
duplicate content filter, a whole industry has grown up around trying to
get round the filter. Just search for "spin article".
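To make the idea of "predominantly the same" concrete, here is a minimal
sketch of one well-known way to compare two texts: split each into
overlapping word sequences ("shingles") and measure how many they share.
This is an illustration only. It is not Google's method, which is not
public, and the function names and the shingle size of 3 are assumptions
chosen for the example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "blue widgets are great because blue widgets never break"
    copy = "blue widgets are great because blue widgets rarely break"
    print(similarity(original, copy))
```

A score near 1.0 suggests a near-copy. "Spun" articles try to push this
kind of score down by swapping words so that fewer sequences match.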
Getting back to the story here: Google indexes a page and, let's say, it
fails its duplicate content check. What does Google do? These days, it
dumps that duplicate page in Google's Supplemental Index. What, you
didn't know that Google has 2 indexes? Well it does: the main one, and
the supplemental one. 2 things are important here: Google will always
return results from its main index if it can; and it will only go to the
supplemental index if it doesn't get enough joy from the main index.
What this means is that if your page is in the supplemental index, it's
almost certain that you will never show up in the Search Engine Results
Pages, unless there is next to no competition for the phrase that was
searched for.
This all seems pretty reasonable to me, so what's the problem? Well
there's another little step I haven't mentioned yet. What happens if
someone copies your page, let's say the homepage of your business
website, and Google indexes that copy and correctly determines that
it's a duplicate? Now Google knows about 2 pages that are duplicates of
each other, and it has to decide which to dump in the supplemental
index and which to keep in the main one. That's pretty obvious, right?
But how does Google know which is the original and which is the copy? It
doesn't. Sure, it has some clever algorithms to work it out, but even if
they are 99% accurate, that leaves a lot of problems for the 1% of
cases where they get it wrong!
And this is the heart of the exploit: if someone copies your website's
homepage, say, and manages to convince Google that *their* page is the
original, your homepage will get tossed into the supplemental index, not
to see the light of day in the Search Engine Results Pages for a while.
In case I'm not being clear enough, that's bad! But wait, it gets worse:
It's fair to say that in the case of a person physically copying your
page and hosting it, you can often get them to take it down through the
use of copyright lawyers and cease and desist letters to ISPs and the
like, followed by a quick "Reinclusion Request" to Google. But recently
there's a new threat that's a whole lot harder to stop: the use of
publicly accessible Proxy websites. (If you don't know what a Proxy is,
it's basically a way of making the web run faster by caching content
closer to the user who requests it. In principle they are generally a
good thing.)
There are many such web proxies out there, and I won't list any here.
However, I will describe the process: they send out spiders (much like
Google's) that spider your page and take your content, then they host a
copy of your website on their proxy site, nominally so that when their
users request your page, they can serve up their local copy quickly
rather than having to retrieve it from your server. The big issue is
that Google can sometimes decide that the proxy copy of your web page is
the original, and yours is not.
Worse again, there's some evidence that people are deliberately and
maliciously using proxy servers to cache copies of web pages, then using
normal (white and black hat) Search Engine Optimization (SEO) techniques
to make those proxy pages rank in the search engines, increasing the
likelihood that your legitimate page will be the one dumped by the
search engines' duplicate content filters. Danger Will Robinson!
Even worse still, some of the proxy spiders actively spoof their origins
so that you don't realize that it's a spider from a proxy: they pretend
to be Googlebot, for example, or a spider from Yahoo. This is why the
major search engines actively publish guidelines on how to identify and
validate their own spiders.
Now for the big question: how can you defend against this? There are
several possible solutions, depending on your web hosting technology and
technical competence.
Option 1 - If you are running Apache and PHP on your server, you can set
the webhost up to check for search engine spiders that purport to be
from the main search engines and, using PHP and the .htaccess file,
block proxies from other sources. However, this only works for proxies
that are playing by the rules and identifying themselves correctly.
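As a rough illustration of the kind of check Option 1 describes, the
sketch below (written in Python rather than PHP, purely to show the
logic) flags a request that claims to be a major search spider but also
announces that it came through a proxy. The header names Via,
X-Forwarded-For and Forwarded are standard HTTP headers that well-behaved
proxies add. The function names, the short list of spider tokens, and the
rule of combining the two checks are assumptions made for this example,
not something the article prescribes.

```python
# Headers that rule-abiding proxies add to the requests they forward.
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded")

# User-Agent fragments used by the spiders of Google, Yahoo and MSN.
SPIDER_TOKENS = ("googlebot", "slurp", "msnbot")


def claims_to_be_spider(headers):
    """True if the User-Agent header names one of the major spiders."""
    agent = headers.get("user-agent", "").lower()
    return any(token in agent for token in SPIDER_TOKENS)


def announces_proxy(headers):
    """True if the request carries any self-identifying proxy header."""
    return any(name in headers for name in PROXY_HEADERS)


def should_block(raw_headers):
    """Block requests that claim to be a spider yet arrive via a proxy."""
    headers = {name.lower(): value for name, value in raw_headers.items()}
    return claims_to_be_spider(headers) and announces_proxy(headers)


if __name__ == "__main__":
    request = {"User-Agent": "Googlebot/2.1", "Via": "1.1 some-proxy"}
    print(should_block(request))
```

As the text says, a proxy that strips these headers sails straight
through a check like this, which is why the validation approach in
Option 3 is the stronger defence.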
Option 2 - If you are using MS Windows and IIS on your server, or if you
are on a shared hosting solution that doesn't give you the ability to do
anything clever, it's an awful lot harder and you should take the advice
of a professional on how to defend yourself from this kind of attack.
Option 3 - This is currently the best solution available, and applies if
you are running a PHP or ASP based website: you set the robots meta tags
on ALL pages to "noindex" and "nofollow", then you implement a PHP or
ASP script on each page that checks whether the visitor is a valid
spider from one of the major search engines and, if so, resets the
robots meta tags to "index" and "follow". The important distinction here
is that it's easier to validate a real spider, and to discount a spider
that's trying to spoof you, because the major search engines publish
processes and procedures to do this, including IP lookups and the like.
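The logic of Option 3 can be sketched as follows. The example is in
Python rather than PHP or ASP, simply to show the steps, and it follows
the commonly published validation procedure: do a reverse DNS lookup on
the visitor's IP address, check that the hostname belongs to the search
engine, then do a forward lookup on that hostname and confirm it maps
back to the same IP. The function names are hypothetical, and the list of
trusted hostname suffixes only covers Google here as an assumption; check
each search engine's own guidelines before relying on any suffix.

```python
import socket

# Assumption: hostnames of genuine Google spiders end in these suffixes.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_spider(ip, reverse_lookup=None, forward_lookup=None,
                       suffixes=TRUSTED_SUFFIXES):
    """Validate a visitor IP with a reverse lookup then a forward lookup."""
    # The lookups are injectable so the logic can be tested without DNS.
    reverse_lookup = reverse_lookup or (
        lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (
        lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(tuple(suffixes)):
            return False  # hostname is not in a trusted domain
        # A spoofer can fake reverse DNS, so confirm the forward mapping.
        return ip in forward_lookup(host)
    except OSError:
        return False  # a failed lookup is treated as "not verified"


def robots_meta_tag(ip, **lookups):
    """Default to noindex/nofollow; open up only for verified spiders."""
    if is_verified_spider(ip, **lookups):
        return '<meta name="robots" content="index, follow">'
    return '<meta name="robots" content="noindex, nofollow">'
```

With this arrangement a proxy that copies your page receives the
"noindex, nofollow" version, so its copy tells search engines not to
index it, while a validated search engine spider visiting your real site
sees "index, follow".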
So, stay aware, stay knowledgeable, and stay protected. And if you see
that you've suddenly been dumped from the Search Engine Results Pages,
now you might know why, how, and what to do about it.
Copyright 2006 SEO ONE, inc. All Rights Reserved
4100 Spring Valley Rd. Suite 203 - Dallas, Texas - 75244
Toll Free: 866-886-4608 | Phone : 972-755-4592 | Fax : 866-409-7978