September 2004

Issue Nine
Page Three
Planet Ocean Communications

Is Google Favoring Themed Links?
by Jim Gilbert and Tom Dahm

After you read this article you will understand this update:
Oct 2014: Things have changed a GREAT deal -- Google definitely favors themed links. It's the definition of "themed" that is tricky. Not only does the page your link comes from matter -- so does the entire site on which that page resides! It gets even more complicated, but those are the basics.


Lately there's been much discussion and speculation about page themeing and link themeing, along with assumptions about Google's penchant for them. That got us wondering...

Has Google shifted toward favoring themed links?

Initially, we thought it was a simple question that could be definitively answered with a statistical study. However, as you will see, our original question led to other questions and answers over the course of our three-month study.

Themeing explained

Let's start with a brief explanation of themeing. The theme of your site refers to your site's primary topic. For example, you might have a site for a business that runs whale watching tours. If you want to rank highly in Google for the keyphrase whale watching, then your site should have a laser-beam focus on the topic of whale watching.

Effective themeing involves excluding as much content as possible that might distract from your site's primary theme. If you were to add a section on dolphin watching, you would be diluting your site's whale watching theme – and this could have a negative effect on your SE positioning for the phrase whale watching.

Granted, it's a rare website that deals with one topic to the exclusion of all others. Ideally, however, if your site offers multiple products they should either be closely related and grouped together or else separated as much as possible whenever they are not closely related.

Using this technique, your site will more easily achieve an identifiable theme, and each page within your site will have its own very specific theme: Whale watching pages are exclusively focused on whale watching; dolphin watching pages are exclusively focused on dolphin watching; and theme dilution is thereby minimized.

Link themes

Link themes are built on the same logic. The theme of a link is defined by the keywords you choose to use in the text of a link (aka, the anchor text). So, if you link to one of your whale watching pages with the words whale watching in the anchor text, you are theoretically reinforcing that page's whale watching theme.
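As a rough illustration of what "themed anchor text" means in practice, the sketch below pulls the anchor text out of a snippet of HTML and checks whether it contains a target keyphrase. The markup, the class name AnchorTextCollector and the helper is_themed are hypothetical names chosen for this example. The substring test is a deliberately naive stand-in and is not the themeing method used in our study.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def is_themed(anchor_text, keyphrase):
    """Naive assumption: a link is 'themed' if its text contains the keyphrase."""
    return keyphrase.lower() in anchor_text.lower()


# Hypothetical markup, for illustration only.
html = (
    '<p>Book a <a href="/tours/whales.html">whale watching</a> tour, '
    'or <a href="/tours/whales.html">click here</a>.</p>'
)
collector = AnchorTextCollector()
collector.feed(html)
themed = [(href, text, is_themed(text, "whale watching"))
          for href, text in collector.links]
```

Both links point at the same page, but only the first one carries the keyphrase in its anchor text, so only the first would count as themed under this simple test.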

The common belief is that Google places a lot of weight on the keywords found in anchor text, and that themeing your links is one of the most effective ways to theme your pages. The question we sought to answer through our research was: is this actually true? And, if so, just how valuable does Google consider these themed links to be?

Bear in mind as you read that our study was based on solid research — no estimates, guesses, observations, or swags — only results from sound and in-depth statistical analysis.

The Data:

In the spirit of brevity, we're intentionally keeping this section short. Many of our data sources and variables are not listed here. This short list is intended to provide you with a simple summary of the type of data that had to be gathered to perform this type of work. The vast majority of the data is related only to linking, since that was the focus of our analysis. On-page criteria were not part of this effort.

We selected many thousands of pages covering various topics for our project and analyzed every available characteristic of every inbound link to each of those pages. The data gathering itself was a major undertaking that utilized in-house custom-built tools.

From the ranking pages analyzed, a few of the characteristics we focused on were: PageRank, inbound links, outbound links, links from same class C block, page title, page URL, and more.

From the pages that linked to the ranking pages, a few of the characteristics we gathered were: PageRank, inbound links, outbound links, page title, URL, link text used, unique linking domains involved, and more.
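To make the shape of this data more concrete, here is a minimal sketch of how one linking-page record might be represented, along with two of the derived measures mentioned above: links from the same class C block (IPv4 addresses that share their first three octets) and unique linking domains. The field names, addresses and URLs are assumptions made up for this example and do not describe our in-house tools.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit
import ipaddress


@dataclass
class LinkingPage:
    """One page that links to a ranking page (field names are illustrative)."""
    url: str
    title: str
    anchor_text: str
    inbound_links: int
    outbound_links: int
    ip: str


def same_class_c(ip_a, ip_b):
    """True if two IPv4 addresses share their first three octets."""
    a = ipaddress.IPv4Address(ip_a).packed[:3]
    b = ipaddress.IPv4Address(ip_b).packed[:3]
    return a == b


def count_same_class_c(target_ip, linking_pages):
    """How many linking pages sit in the same class C block as the target."""
    return sum(same_class_c(target_ip, page.ip) for page in linking_pages)


def unique_linking_domains(linking_pages):
    """Set of distinct host names among the linking pages."""
    return {urlsplit(page.url).hostname for page in linking_pages}


# Hypothetical records using reserved documentation addresses and domains.
target_ip = "192.0.2.10"
pages = [
    LinkingPage("http://www.example.org/links.html", "Links",
                "whale watching", 12, 140, "192.0.2.200"),
    LinkingPage("http://www.example.net/ocean.html", "Ocean life",
                "click here", 340, 8, "198.51.100.7"),
]
same_block = count_same_class_c(target_ip, pages)
domains = unique_linking_domains(pages)
```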

To answer certain hypotheses about Google's potential use of topic or themed rankings, we also had to create a few valid variables associated with topics and themes. To avoid complicated explanation, suffice it to say that we tested various themeing methods and settled on an approach that, from all statistical work, appeared appropriate and as accurate as possible.

All wannabe mathematicians and statisticians take note:
Google's ranking algorithm is VERY complicated and we do not have access to all the variables Google uses. So, using simple statistics — such as correlation, regression and averages — to find reliable predictors is a total waste of time!

For the type of analysis and hypothesis testing we performed, the Logistics process with a Top-10 versus Non-Top-10 outcome was much more reliable than trying to pin down the exact set and weight of the variables needed to predict exact rankings.

The Statistics:

Having a mathematics and statistics degree (as well as access to some very good Ph.D. statisticians) certainly helped. We set up all of our data gathering and statistical selection to be robust, comprehensive, and as accurate as possible – we decided that simple correlation and regression analyses were inaccurate and unacceptable!

After considerable testing, we found that the most significant and reliable statistical process was Logistics Stepwise Regression (stepwise logistic regression) from SAS Institute's Statistical Analysis System. With this process we didn't need to attempt a specific prediction of ranking. Nor were we constrained by the simplistic "linear" nature inherent in the more limited statistical applications such as spreadsheets. Logistics allowed us to analyze results in a Top 10 or Not Top 10 fashion.

Simply put, the search engine ranking algorithms are too complicated to reverse engineer with the hope of predicting any specific rankings. However, analyses turn out to be much more reliable and useful when just trying to predict whether or not a page can achieve a Top 10 ranking.
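For readers who want to see the Top 10 versus Not Top 10 idea in miniature, the sketch below fits a plain logistic regression by gradient descent. It is not the SAS stepwise procedure we used, and it does no variable selection. The two features (log10 of the average inbound links on the linking pages, and log10 of their average outbound links) and all ten data rows are invented for illustration. They are not taken from our data set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_top10(weights, x):
    """Estimated probability that a page with features x ranks in the Top 10."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit [bias, w1, ..., wk] by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_top10(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical rows: [log10(linking pages' inbound links),
#                     log10(linking pages' outbound links)]
rows = [
    [3.0, 1.0], [2.5, 1.2], [2.8, 2.0], [2.2, 1.1], [1.9, 1.5],
    [1.0, 1.0], [1.5, 2.2], [0.8, 1.8], [2.0, 2.5], [2.3, 1.0],
]
# 1 = page ranked in the Top 10, 0 = it did not (made-up labels).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

weights = fit_logistic(rows, labels)
p_strong = predict_top10(weights, [3.0, 1.0])
p_weak = predict_top10(weights, [1.0, 1.0])
```

The output is a probability of landing in the Top 10 rather than a predicted position, which is the point of the approach: the model only has to separate two groups, not reproduce an exact ordering.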

The Unimportance Of PageRank Within Our Findings:

Not surprisingly, most of our findings parallel or confirm common SEO (search engine optimization) beliefs. If you're an experienced SEO you are likely to find confirmation that what you have been doing is on the right track. Remember, however, that this effort was originally begun in hopes of finding an answer to the specific question:

Has Google shifted toward favoring themed links?

So, it should be expected that within our preliminary findings we've confirmed some of the common SEO beliefs as well as learned some answers to additional questions that became apparent during the analysis.

It should be noted, however, that we do not reference the PageRank variable as anything of importance! PageRank is a result metric – it is the byproduct of linking structures and linking quantity. PageRank is the effect, links are the cause.

Furthermore, the so-called PageRank (PR) that we see listed in the Google Toolbar is not the same PR that Google uses in its algorithm – the Toolbar version is only a discrete (0 to 10) visual representation of a much more comprehensive ranking scale. The PR we see in the Toolbar is not important. What is important is the linking information that Google uses to build the "actual" PR.

Of course, this is not just our opinion – our statistical work proved with a very high degree of certainty that PR has an extremely high collinearity with other linking variables and their characteristics.

To clarify, and to avoid stirring argument, let's expand our statement regarding PR: Google's PR – the one we can't see – is hugely important to Google's ranking algorithm. However, the PR we can see is not important (especially in this analysis), because we have more specific and better visibility into the actual linking information that builds PR.
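To show why we treat PR as an effect of linking rather than a cause, here is a small sketch of the PageRank formula as originally published by Brin and Page: PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of outbound links on T, and d is a damping factor (0.85 in the published paper). Google's production algorithm is not public, so this is only the textbook version. The page names and the link graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the published PageRank formula over a small link graph.

    links maps each page to the list of distinct pages it links to.
    Pages with no outbound links simply pass nothing on in this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            inbound = [src for src, targets in links.items() if page in targets]
            new_rank[page] = (1 - damping) + damping * sum(
                rank[src] / len(links[src]) for src in inbound
            )
        rank = new_rank
    return rank


# Hypothetical graph: "popular" has four inbound links and links only to
# "green"; "links_page" has no inbound links and links out to 20 pages,
# one of which is "red".
links = {
    "a": ["popular"], "b": ["popular"], "c": ["popular"], "d": ["popular"],
    "popular": ["green"],
    "links_page": ["red"] + [f"other{i}" for i in range(19)],
}
rank = pagerank(links)
```

Nothing but the link graph goes in, and the scores come out. In this toy graph "green" ends up well ahead of "red", because its one link comes from a page that has many inbound links of its own and few outbound links.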

The Findings:

  • The quantity of spiderable links to your site's pages IS important. We now know for a fact that the more incoming links you have, the better (no surprise).

  • If an incoming link page (the page that links to your site) has too many outbound links, it works against you. Although the "too many" number is not fixed, making it hard to specifically pin down, think links pages. The more outbound links a page has, the less help a link from that page will give you in the rankings (again, no surprise).

The pages that link to your site's pages have certain characteristics and some of those characteristics very much affect your rankings!

  • Google unquestionably favors pages more whenever the incoming link pages themselves have many inbound links. In other words, when searching for link partners, you want links from pages that have many inbound links themselves.

    If this sounds confusing, perhaps the diagram below will help clarify:

    It's the green page that benefits most due to the fact that its linking partner (the very popular yellow page) has many inbound links. The more incoming links the yellow page has, the better the ranking benefit to the green page. The page clusters in the diagram above show how one page (green) might have a PR=4 yet rank higher than a page (red) with a PR=5. The diagram above also shows how a PR=5 page can be a much more valuable linking page than a PR=6.

    Several additional statistical analyses of this showed over and over again that link pages with many inbound links are much, much more important than just the PR of the linking page. The statistical measures of significance on this single characteristic were extremely high. No other characteristic showed significance even close to this level.

The Question Remains...

Has Google shifted toward favoring themed (on topic) links?  Based on purely statistical analysis we can only say...

We don't know for sure!

However, there are a couple of reasonable themeing speculations that we feel safe inferring from our work. We believe that Google may favor themed links. And if they don't today, they are very likely to in the future. What the statistics did show is:

  • If Google is classifying links as themed, they are not using the linking page <title> tag or the anchor link-text to do so. Therefore, if Google is favoring themed links, they are using (or working on) a much more complicated means of doing so.

  • The only link-theme-scenario that proved statistically significant was related to themed URLs. It has long been believed that placing targeted keywords within the URL leads to ranking benefits. At least at a minimum statistically acceptable level, our work showed this to be partially true.


While it's true that we cannot statistically prove that Google's page ranking system is responding to themed links, we can say that we found no evidence that themed links would ever hurt your efforts. That, combined with the fact that people respond to links that appear to be relevant, is reason enough to focus some of your efforts on obtaining themed links.

However, even more important is the finding that links from pages with many incoming links offer much more of a relevance boost than any other factor. That's why, based on our results, we'd recommend that your whale watching page (for example) focus on getting just a few links from popular pages on sites like National Geographic rather than getting, say, 50 links from various theme-related sites having much lesser incoming link popularity.

Nothing beats the facts!
Jim Gilbert & Tom Dahm

Jim Gilbert is President of Position Concepts, a specialty service search engine marketing firm providing SEM, SEO, PPC, and research services. Jim has performed research and search engine optimization since the inception of public access Internet search engines in 1993. Before his involvement with the Internet, Jim spent twelve years providing analysis support in statistics, simulation, and numerical analysis for several Fortune 500 companies.

Tom Dahm is President of BridgePose Search Engine Marketing, an SEO contractor providing services to some of the industry's top firms. He is the past founder of NetMechanic, an early leader in web site optimization tools, and has written widely on search engine optimization, load time optimization, and web accessibility.


© Copyright 1997-2004 Planet Ocean Communications, Inc.
Planet Ocean ® is a registered trademark of Planet Ocean Communications, Inc.