© 2004 Position Concepts. All
DO NOT Re-publish in ANY form without Permission from the Authors
What EXACTLY Does Google
Like in Links?
by Jim Gilbert (Position
Concepts) and Tom Dahm (Bridgepose)
After you read
this article you will understand this update:
Oct 2014: Things have changed
a GREAT deal -- Google definitely favors themed links. It's the definition
of "themed" that is tricky. Not only does the page your link comes from
matter -- so does the entire site on which that page
resides! It gets even more complicated, but those are the basics.
This article is the result of a
three-month analysis effort originally started to find an answer to what we
thought was a simple question: Has Google moved to favoring
"themed" (on topic) links?
It turns out that, in trying to
"accurately" answer our original question (or hypothesis),
many other questions were posed and answered along the way. What we report here is
not "estimates", "guesses", "observations"
or "swags" -- they are results from sound and in-depth
For basic background and brevity,
we have intentionally kept this section short. Many of our data sources
and variables are not listed here. This short list is intended to provide
you with a simple summary of the type of data that had to be gathered
to perform this type of work. The vast majority of data is related only to
linking, since that was the focus of our analysis -- on-page criteria were
not part of this effort.
Many thousands of pages covering various
topics were selected for analysis. And EVERY available characteristic of
EVERY inbound link to each of those pages was analyzed. The data gathering
itself was a major undertaking -- even with in-house, custom-built tools.
From the ranking pages analyzed, a few of
the characteristics gathered were: PR, inbound links, outbound links,
links from same class C block, page title, page URL and more.
From the pages that linked to the ranking
pages, a few of the characteristics gathered were: PR, inbound links,
outbound links, page title, URL, link text used, unique linking domains
involved, and more.
To answer certain hypotheses about Google's
potential use of "topic" or "themed" rankings, we also
had to create a few valid variables associated with topics and themes.
Sorry, but we have chosen not to cover the detail of this themeing effort.
We will say that we tested various themeing methods and settled on an
approach that, from all statistical work, appeared appropriate and as
accurate as it could be made.
Having a mathematics and
statistics degree (and access to some very good Ph.D. statisticians)
certainly helped. All data gathering and statistical selection was set up to be robust,
comprehensive and as accurate as possible. Simple
correlation and regression analysis proved to be inaccurate and unacceptable!
After considerable testing, we decided that
the most significant and reliable statistical process was stepwise logistic
regression in SAS Institute's Statistical Analysis System. With this process we did not need to attempt
prediction of ranking. Nor were we
constrained by the simplistic "linear" nature inherent in the
more limited statistical applications such as spreadsheets. Logistic regression
allowed us to analyze results in a Top 10 or Not Top 10
fashion. Simply put, the search engine ranking algorithms are too
complicated to reverse engineer and predict any specific rankings. But analyses turn out to be much more reliable and robust when you are
just trying to predict whether or not a page can achieve a Top 10 ranking.
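To make the Top 10 / Not Top 10 idea concrete, here is a minimal sketch in Python. It is NOT the SAS stepwise procedure we used and it does no variable selection; it only fits a plain logistic regression by gradient descent. The two features, the rank positions and every function name below are made-up assumptions for illustration, not data or code from our study.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(weights, x):
    # weights[0] is the intercept; the rest pair up with the features.
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))

def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit logistic regression weights by batch gradient descent on log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(score(weights, x)) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def predict_top10(weights, x):
    # Estimated probability that a page with features x is in the Top 10.
    return sigmoid(score(weights, x))

# HYPOTHETICAL toy data (assumption, not from our study). Each row is:
# (log10 of inbound links to the linking pages,
#  log10 of outbound links on the linking pages)
pages = [
    (3.0, 1.0), (2.8, 1.2), (2.5, 1.1), (2.9, 1.5),
    (1.0, 2.0), (1.2, 2.2), (0.8, 1.8), (1.5, 2.1),
]
ranks = [3, 7, 9, 5, 42, 88, 130, 25]  # made-up ranking positions

# The key step: throw away the exact position and keep only Top 10 or not.
top10 = [1 if r <= 10 else 0 for r in ranks]

weights = fit_logistic(pages, top10)
```

Calling `predict_top10(weights, (2.7, 1.3))` then gives an estimated probability of a Top 10 placement for a new page, rather than a guess at its exact position. That is the whole point of the binary framing.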
All "want-to-be" mathematicians and statisticians
should be forewarned: Google's ranking algorithm is VERY
complicated and we do not have access to all the variables Google uses in its
algorithm. So, using simple statistics (such as correlation,
regression and averages) to find reliable predictors is a total waste of
time.
To accomplish the type
of analysis and hypothesis testing we performed, the logistic regression process
and the top-10 versus non-top-10 approach are much more reliable than trying to locate the exact set
and worth of the variables capable of predicting any exact rankings.
Most of our findings
parallel or confirm common optimization knowledge, so - if you are
experienced - you are unlikely to find anything shocking or of great value
in this section. Remember, this effort was originally begun in hopes of finding an answer to
the specific question: Has Google moved to favoring
"themed" (on topic) links?
These preliminary findings are basically confirmations of our original
beliefs and answers to some questions that came up during the
You will NOT see the PR variable mentioned here as being important! PR
or PageRank is a result statistic -- it is created primarily by linking
structures and linking quantity. Furthermore, the PR that we see IS NOT
the same PR that Google uses in its algorithm -- it is only a discrete (0
to 10) visual representation of a much more comprehensive ranking scale.
The PR we see in the Google Toolbar is not really important! What
is important is the linking information that Google uses to build the
"actual" PR! This is not just our opinion -- our statistical
work proved (with a very high degree of significance) that PR has an extremely
high collinearity with other linking variables and their characteristics.
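For readers unfamiliar with collinearity, the following minimal Python sketch shows one way to measure it between two predictors: the Pearson correlation and the variance inflation factor (VIF) derived from it. The numbers are HYPOTHETICAL and chosen only to illustrate the idea. They are not our data, and pairing toolbar PR with the log of an inbound link count is our assumption for this example only.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length lists.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(xs, ys):
    # With exactly two predictors, the R^2 from regressing one on the
    # other equals r^2, so VIF = 1 / (1 - r^2).
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)

# HYPOTHETICAL values for seven linking pages (assumption, not study data).
toolbar_pr = [2, 3, 4, 4, 5, 6, 7]
inbound_links = [12, 40, 150, 90, 600, 2500, 9000]
log_links = [math.log10(n) for n in inbound_links]

r = pearson(toolbar_pr, log_links)
vif = vif_two_predictors(toolbar_pr, log_links)
```

When `r` is close to 1 and `vif` is large, the two variables carry nearly the same information, so a model that already includes the detailed linking variable gains very little from also including the visible PR.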
So as not to stir argument, we
will rephrase our statement regarding PR. Google's PR (the one we can't
see) is hugely important to Google in its ranking algorithms. The PR we
can see is not important (especially in this analysis), because we have
more specific and better visibility into the actual linking information that
The No Surprise Findings:
The quantity of spiderable
links to your site's pages IS important. No surprise here.
If the page that links to
your site's pages has too many "outbound" links, it works
against you. The "too many" number is not fixed, so we
cannot tell you what it is.
The Surprise Findings:
The pages that you get to link to your site's pages have certain
characteristics and some of these characteristics are VERY important
to your resulting rankings.
Google definitely favors
links from pages that themselves have many inbound links.
Several additional statistical analyses of this showed over and over
again that "link pages with many inbound links" are much,
much more important than just the "PR of the linking
page"! The statistical measures of
significance on this single characteristic were extremely high. No
other characteristic showed significance even close to this level.
Has Google moved to favoring
"themed" (on topic) links?
We don't know for sure!
Obviously, it is somewhat disappointing to admit this, especially
considering all the work that went into this research. However, there were some
positive "themeing" related results that were discovered in
don't know for sure? We still believe that Google
may favor themed links. And if they don't today, they are very
likely to in the future. What the statistics did show was this:
If Google is classifying links as
themed or not themed, Google is not using the linking page's
<title> tag or the pointing link text to do so. Therefore, if
Google is favoring themed links, it is using or working on a much more
robust means of doing so.
The only link themeing that
proved statistically significant was themed URLs. It has long been
believed that a URL containing the keywords one is targeting is a
benefit. Our work did show that this is true, at least at a minimally
acceptable level of statistical significance.