What EXACTLY Does Google Like in Links?
by Jim Gilbert (Position Concepts) and Tom Dahm (Bridgepose)

After you read this article you will understand this update:
Oct 2014: Things have changed a GREAT deal -- Google definitely favors themed links. It's the definition of "themed" that is tricky. Not only does the page your link comes from matter -- so does the entire site on which that page resides! It gets even more complicated, but those are the basics.

This article is the result of a three-month analysis effort originally started to find an answer to what we thought was a simple question: Has Google moved to favoring "themed" (on topic) links?

It turns out that, in trying to "accurately" answer our original question (or hypothesis), many other questions were posed and answered along the way. What we report here are not "estimates", "guesses", "observations" or "swags" -- they are results from sound and in-depth statistical analysis.

The Data:
For basic background and brevity, we have intentionally kept this section short. Many of our data sources and variables are not listed here. This short list is intended to provide you with a simple summary of the type of data that had to be gathered to perform this type of work. The vast majority of data is related only to linking, since that was the focus of our analysis -- on-page criteria were not part of this effort.

Many thousands of pages covering various topics were selected for analysis, and EVERY available characteristic of EVERY inbound link to each of those pages was analyzed. The data gathering itself was a major undertaking -- even with custom-built in-house tools.

From the ranking pages analyzed, a few of the characteristics gathered were: PR, inbound links, outbound links, links from same class C block, page title, page URL and more.

From the pages that linked to the ranking pages, a few of the characteristics gathered were: PR, inbound links, outbound links, page title, URL, link text used, unique linking domains involved, and more.
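
To make the shape of this data concrete, here is a minimal Python sketch of one record per ranking page and one per linking page. The class names, field names and the same_class_c helper are hypothetical illustrations, not the schema or tools used in the study. "Same class C block" is assumed here to mean that two IPv4 addresses share their first three octets.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address
from typing import List


@dataclass
class LinkingPage:
    """One page that links to a ranking page (hypothetical field names)."""
    url: str
    title: str
    link_text: str
    toolbar_pr: int
    inbound_links: int
    outbound_links: int
    ip: str


@dataclass
class RankingPage:
    """One analyzed page plus every inbound link found for it."""
    url: str
    title: str
    toolbar_pr: int
    ip: str
    linkers: List[LinkingPage] = field(default_factory=list)


def same_class_c(ip_a: str, ip_b: str) -> bool:
    """True when two IPv4 addresses share their first three octets."""
    return IPv4Address(ip_a).packed[:3] == IPv4Address(ip_b).packed[:3]


def links_from_same_class_c(page: RankingPage) -> int:
    """Count linking pages hosted in the same class C block as the ranking page."""
    return sum(1 for lp in page.linkers if same_class_c(page.ip, lp.ip))
```

A record like this keeps the raw linking characteristics next to the toolbar PR value, so the two can be compared later.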

To answer certain hypotheses about Google's potential use of "topic" or "themed" rankings, we also had to create a few valid variables associated with topics and themes. Sorry, but we have chosen not to cover the detail of this themeing effort. We will say that we tested various themeing methods and settled on an approach that, from all statistical work, appeared appropriate and as accurate as it could be made.

The Statistics:
Having a mathematics and statistics degree (and access to some very good Ph.D. statisticians) certainly helped. All data gathering and statistical selection was set up to be robust, comprehensive and as accurate as possible -- simple correlation and regression analysis proved to be inaccurate and unacceptable!

After considerable testing, we decided that the most significant and reliable statistical process was stepwise logistic regression from SAS Institute's Statistical Analysis System. With this process we did not need to attempt a specific prediction of ranking. Nor were we constrained by the simplistic "linear" nature inherent in more limited statistical applications such as spreadsheets. Logistic regression allowed us to analyze results in a Top 10 or Not Top 10 fashion. Simply put, the search engine ranking algorithms are too complicated to reverse engineer and predict any specific rankings. But analyses turn out to be much more reliable and robust when you are just trying to predict whether or not a page can achieve a Top 10 ranking.

All "want-to-be" mathematicians and statisticians should be forewarned: Google's ranking algorithm is VERY complicated and we do not have access to all the variables Google uses in its algorithm. So, using simple statistics (such as correlation, regression and averages) to find reliable predictors is a total waste of time!

To accomplish the type of analysis and hypothesis testing we performed, the logistic process and the top-10 versus non-top-10 approach are much more reliable than trying to locate the exact set and worth of the variables capable of predicting any exact rankings.
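
For readers who want to see the Top 10 versus Not Top 10 idea in code, the following Python sketch fits a plain logistic regression by gradient descent. It is only an illustration under stated assumptions: it is not the SAS procedure we used, it performs no stepwise variable selection, and the two standardized features and all of the sample rows are made up for the example.

```python
import math
from typing import List, Sequence


def top10_label(rank: int) -> int:
    """Collapse an exact ranking into the binary outcome: 1 = Top 10, 0 = not."""
    return 1 if rank <= 10 else 0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Return weights [intercept, w1, w2, ...] fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # gradient of the log-loss with respect to z
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_top10_probability(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that a page with features x ranks in the Top 10."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Made-up rows: (rank, standardized inbound links of linking pages,
#                standardized outbound links of linking pages)
rows = [
    (3, 1.0, -0.9), (7, 0.6, -0.4), (1, 1.4, -1.2), (9, 0.3, -0.2),
    (15, -0.4, 0.3), (22, -0.8, 0.7), (40, -1.3, 1.2), (12, -0.2, 0.4),
]
X = [[a, b] for _, a, b in rows]
y = [top10_label(rank) for rank, _, _ in rows]
weights = fit_logistic(X, y)
```

The point of the design is the label: the model is never asked to guess position 3 versus position 7, only which side of the Top 10 line a page falls on.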

Most of our findings parallel or confirm common optimization knowledge, so -- if you are experienced -- you are unlikely to find anything shocking or of great value in this section. Remember, this effort was originally begun in hopes of finding an answer to the specific question: Has Google moved to favoring "themed" (on topic) links? These preliminary findings are basically confirmations of our original beliefs and answers to some questions that came up during the analysis.

Note that you will NOT see the PR variable mentioned here as being important! PR, or PageRank, is a result statistic -- it is created primarily by linking structures and linking quantity. Furthermore, the PR that we see IS NOT the same PR that Google uses in its algorithm -- it is only a discrete (0 to 10) visual representation of a much more comprehensive ranking scale. The PR we see in the Google Toolbar is not really important! What is important is the linking information that Google uses to build the "actual" PR! This is not just our opinion -- our statistical work showed (with a very high degree of significance) that PR is extremely collinear with other linking variables and their characteristics.

So as not to stir argument, we will rephrase our statement regarding PR. Google's PR (the one we can't see) is hugely important to Google in its ranking algorithms. The PR we can see is not important (especially in this analysis), because we have more specific and better visibility into the actual linking information that builds PR.
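
Collinearity of this kind can be checked with a simple calculation. The Python sketch below computes the Pearson correlation between a toolbar PR value and the log of the inbound link count, then the variance inflation factor that follows from it when only two predictors are involved. The sample numbers are invented for illustration, and the use of a log scale is our assumption, not something Google has published.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def variance_inflation_factor(r: float) -> float:
    """VIF for one predictor when the only other predictor correlates at r."""
    return 1.0 / (1.0 - r * r)


# Invented sample: toolbar PR and inbound link counts for eight linking pages.
toolbar_pr = [2, 3, 3, 4, 5, 5, 6, 7]
inbound_links = [12, 40, 55, 150, 600, 900, 4000, 20000]
log_links = [math.log10(n) for n in inbound_links]

r = pearson_r(toolbar_pr, log_links)
vif = variance_inflation_factor(r)
```

When r is close to 1 the VIF becomes very large, which is the usual sign that one of the two variables adds almost nothing once the other is in the model.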

The No Surprise Findings: 

  • The quantity of spiderable links to your site's pages IS important. No surprise here.

  • If the page that links to your site's pages has too many "outbound" links, it works against you. The "too many" number is not fixed, so we cannot tell you what it is.
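
One possible reason the outbound link count matters can be seen in the originally published PageRank formula, where a page's PR is divided evenly among its outbound links. The Python sketch below shows only that division. Whether Google's current algorithm still works this way is an assumption, the function name is hypothetical, and the 0.85 damping factor is the value quoted in the original PageRank paper, not something measured in our analysis.

```python
def pagerank_share_per_link(page_rank: float, outbound_links: int,
                            damping: float = 0.85) -> float:
    """PR passed along each outbound link under the published PageRank formula."""
    if outbound_links < 1:
        raise ValueError("a linking page must have at least one outbound link")
    return damping * page_rank / outbound_links


# Same linking page, different numbers of outbound links (illustrative values).
share_with_10_links = pagerank_share_per_link(4.0, 10)
share_with_100_links = pagerank_share_per_link(4.0, 100)
```

Under this formula the share simply shrinks as the link count grows, which is consistent with there being no fixed "too many" number.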

The Surprise Findings:
The pages that you get to link to your site's pages have certain characteristics and some of these characteristics are VERY important to your resulting rankings.

  • Google definitely favors links from pages that themselves have many inbound links. Several additional statistical analyses showed over and over again that "linking pages with many inbound links" are much, much more important than just the "PR of the linking page"! The statistical measures of significance on this single characteristic were extremely high. No other characteristic showed significance even close to this level.

  • Has Google moved to favoring "themed" (on topic) links? 
    We don't know for sure! 
    Obviously, it is somewhat disappointing to admit this, especially considering all the work that went into this research. However, there were some positive "themeing" related results that were discovered in this work.

  • We don't know for sure?  We still believe that Google may favor themed links. And if they don't today, they are very likely to in the future. What the statistics did show was this: If Google is classifying links as themed or not themed, Google is not using the linking page <title> tag or the pointing link text to do so. Therefore, if Google is favoring themed links, it is using or working on a much more robust means of doing so.

  • The only link themeing that proved statistically significant was themed URLs. It has long been believed that a URL containing the keywords one is targeting is a benefit. Our work did show that this is true, at least at a minimally acceptable level of statistical significance.
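
A "themed URL" test can be as simple as checking how many target keywords appear in a URL. The Python sketch below is one hypothetical way to score that, applied to whichever URL is being examined. It is not the themeing method used in our analysis (which we have chosen not to detail), and the function names and example URLs are made up.

```python
import re
from typing import Iterable, Set
from urllib.parse import urlparse


def url_tokens(url: str) -> Set[str]:
    """Split the host and path of a URL into lowercase alphanumeric tokens."""
    parsed = urlparse(url.lower())
    text = parsed.netloc + " " + parsed.path
    return {token for token in re.split(r"[^a-z0-9]+", text) if token}


def url_keyword_coverage(url: str, keywords: Iterable[str]) -> float:
    """Fraction of the target keywords that appear as tokens in the URL."""
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return 0.0
    return len(wanted & url_tokens(url)) / len(wanted)


themed = url_keyword_coverage(
    "http://www.example.com/blue-widgets/cheap.html", ["blue", "widgets"])
unthemed = url_keyword_coverage(
    "http://www.example.com/page?id=7", ["blue", "widgets"])
```

Here the first URL contains both target keywords and the second contains neither. The query string is ignored on purpose, since only the host and path are tokenized.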


