Optimise your PDFs for SEO

By Doug Roberts

Even though PDF documents are not a native web format, Google will still index them and return them in the search results where it believes the document provides the best value for the searcher’s query.

The drawback is that someone clicking on a link to your PDF or a PDF in the search results will open the PDF directly without visiting your website.

This means that PDFs can be less effective at driving traffic to your site.

Google provides more information about how it includes PDF documents in search results here:
http://googlewebmastercentral.blogspot.co.uk/2011/09/pdfs-in-google-search-results.html

It’s possible to optimise your PDF documents to give them the best chance of performing in search.

  • Traffic to your PDF documents won’t show up in your Google Analytics
  • Your Query data in Webmaster Tools

You can check the top-pages report in Webmaster Tools. You might be surprised at just how many search impressions, clicks and keywords your PDF documents are getting.

[Include image from Webmaster Tools]

[Include image of keywords pdf]

In fact, in this case, according to Webmaster Tools, the PDF in question appeared in search for over 160 queries in the last month. Many of these are highly relevant to the market the site is targeting, but would be difficult to target with traditional web content.

Are your PDF documents accessible?

In order for Google to include your PDF documents in its search results, you need to make sure that Google can find them on your website.

  • Are there links to your PDFs that Google can crawl? You’ll want to make sure that they’re not nofollow links, and that you’re not using creative ways to embed PDF content on the page that prevent the documents from being found.
  • Make sure that your robots.txt file isn’t preventing access to the folder that holds the PDF documents. This can easily happen when PDFs are stored in areas of your site such as /file/download.
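The robots.txt check above can be sketched in Python with the standard library's urllib.robotparser. This is only an illustration: the rules and the mysite.com URLs are hypothetical, and Python's parser does simple prefix matching, so it may not agree with Googlebot on every rule (wildcards in particular).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the rules from your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /file/download/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A PDF stored under the disallowed folder, and one stored elsewhere.
download_ok = is_crawlable(ROBOTS_TXT, "http://mysite.com/file/download/report.pdf")
reports_ok = is_crawlable(ROBOTS_TXT, "http://mysite.com/reports/report.pdf")
print(download_ok, reports_ok)
```

If the first value comes back False for a folder that holds your PDFs, the robots.txt rules are worth a second look.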

You can check that Google has found your PDFs by searching Google for the following query, replacing mysite.com with your own domain:

site:mysite.com filetype:pdf

You should see something like this:

[insert image here]

Make sure your PDF documents are text.

In order for your PDFs to appear in search results the content must be accessible – readable to search engines.

PDFs that are scanned can be treated as images. This means that there are no words that Google and other search engines can index.

Optimising your PDF Meta Data

Just like a normal page on your website it’s possible to provide search engines with additional data about the document, particularly a good Title and Description.

This is an example of how one of your PDF documents appears in the search results:
As you can see, the title is a little odd and the description doesn’t read naturally. It could be improved to describe the document better and to give a compelling reason why someone would want to read this report (the “What’s in it for me?”).

Google have said the following regarding how they determine the Title to use:

“We use two main elements to determine the title shown: the title metadata within the file, and the anchor text of links pointing to the PDF file. To give our algorithms a strong signal about the proper title to use, we recommend updating both.”

Therefore, when you create links to your PDF documents, it’s important to avoid generic anchor text like “download here” and use relevant, descriptive phrases instead.

Eg: Download the PDF version of What’s wrong with TTIP

Setting PDF Document Properties

You can modify the title and description of PDF documents just as you would with HTML pages, but you’ll need to use your PDF creator (such as InDesign) to set these properties.
Example of the properties set on one of your PDF documents (whats-wrong-with-ttip.pdf):

Inbound links and link-equity

It’s hard to generalise, but there can be some reluctance to link directly to PDF documents due to the jarring experience of opening up a PDF reader in order to read the content.

This isn’t always the case. PDF documents that provide significant value can earn highly topically relevant links.

Do PDFs pass PageRank or “link-equity”?

There is some debate amongst SEOs about whether links in PDF documents pass link-equity or not.
This is what Google have to say on the question:

“Generally links in PDF files are treated similarly to links in HTML: they can pass PageRank and other indexing signals, and we may follow them after we have crawled the PDF file.”
Source: Google Webmaster Central – PDFs in Google search results

In a March 2010 interview, Matt Cutts of Google’s webspam team replied as follows when asked about PDF files:

“We absolutely do process PDF files. I am not going to talk about whether links in PDF files pass PageRank. But, a good way to think about PDFs is that they are kind of like Flash in that they aren’t a file format that’s inherent and native to the web, but they can be very useful.”
Source: Stone Temple Consulting – Matt Cutts Interviewed by Eric Enge

Some SEOs have taken this as a suggestion that links in PDF documents do not pass PageRank.

Ensure you have links in your PDF document

Regardless of whether links in PDF documents pass link-equity, you should still consider adding links into your documents.

The best approach is to consider the reader first before any potential search engine benefit and add links that you feel would be beneficial to readers.

Provide links to information sources, additional resources and related content, both on your own site and on authoritative third-party sites.

Calls to action

Depending on the purpose of the PDF document, there’s no reason why you shouldn’t consider adding explicit calls to action to your PDF documents. For example:

“Buy this book” – link from a book sample back to the product page. After all, the purpose of a preview sample is to help persuade people that they want to buy this book.

“Get the latest version of this document” – link back to the landing page for the document, which always features the most recent version.

This is particularly useful for “living” documents that are regularly updated. Displaying the version number and publication date in the footer can help too.

“Find out more about what we do” – If you’re promoting your brand/business then include links to your home page or campaign landing pages.

Provide an HTML version of the document

It’s not always possible to create a standard HTML version of the PDF document, particularly where you need to recreate complicated tables or page layouts, but it can be worth considering.

For instance, not everyone likes to download or read PDF documents. This is especially true if they’re trying to access the information using a mobile device.

If this audience is important to you, then you may want to consider the accessibility of your content and prioritise an HTML version.

Canonicalise your alternative versions to the HTML page

If you are providing both an HTML and PDF version of the page then you can make sure that link-equity is consolidated to the HTML version by setting up a canonical header.
This will also help to ensure that the HTML version of the page is the version that appears in the search results.

Unlike a web page, a PDF can’t declare its canonical version in its own markup. Instead, you’ll need to configure your web server to return the relevant Link header when a visitor requests the PDF.

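In order to do this, you’d add something like this to the server configuration or .htaccess file. This is Apache syntax, and the example.com address is a placeholder for the URL of your HTML version. The rule should be scoped so that it applies only to the PDF in question, for example inside a Files block:

Header add Link "<http://www.example.com/your-html-page/>; rel=\"canonical\""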
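Once the header is in place, it's worth confirming that the PDF really does point at the HTML page. The sketch below is a simplified, illustrative parser that pulls the canonical URL out of a Link header value. The function name and the example.com URLs are assumptions, and fetching the header from your server is left out.

```python
import re
from typing import Optional


def canonical_from_link_header(header_value: str) -> Optional[str]:
    """Return the URL marked rel="canonical" in a Link header value, if any.

    Simplified: assumes entries look like <url>; rel="..." separated by commas.
    """
    for match in re.finditer(r"<([^>]*)>([^,]*)", header_value):
        url, params = match.group(1), match.group(2)
        # Accept rel="canonical", rel=canonical, or a space-separated rel list.
        if re.search(r'\brel\s*=\s*"?[^";]*\bcanonical\b', params, re.IGNORECASE):
            return url
    return None


# Placeholder header value, as a server might return it for a PDF.
header = '<http://www.example.com/whats-wrong-with-ttip/>; rel="canonical"'
print(canonical_from_link_header(header))
```

If the function returns None for the header your server sends with the PDF, the canonical rule is probably not being applied to that file.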

More information on Canonical URLs is available here:
https://support.google.com/webmasters/answer/139066?hl=en&rd=1#6

Tracking Visitors From Your PDF Documents

Visitors who follow links in your PDF documents will appear to be “direct” visitors, just like people who have entered your address into their browser or used a bookmark.
By default, this makes it impossible to establish how many visitors your PDF documents are driving to the site and their value to your business.

Using Campaign Tagging

You can use Google Analytics campaign tagging on the links in your PDF documents so that you can identify the visitors that have come from your reports, just as you would for AdWords or an email campaign.
This allows you to add the following parameters to a link:

  • Campaign Source* [eg: pdf]
  • Campaign Medium* [eg: report|resource|guide]
  • Campaign Term (Unused)
  • Campaign Content (Unused)
  • Campaign Name* [eg: Name of your report/document]

For example, a link back to the Air Pollution issues page from the London’s Unseen Killer report might look like this:

http://www.jeanlambertmep.org.uk/jeans-issues/london/air-pollution/?utm_source=pdf&utm_medium=report&utm_campaign=air-pollution-londons-unseen-killer

Tips:

  • Stick to lower-case. This keeps all your tagging consistent.
  • Use “-”s instead of spaces in the report name and remove stop words. Eg: “My Report on the Use of Tagging in analytics” becomes: “analytics-tagging-report”

Basing the campaign name on the filename (analytics-tagging-report.pdf) can be a good idea, as it helps you easily identify the document.
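If you have many links to tag, the tips above can be applied in a few lines of Python. This is a minimal sketch using only the standard library. The helper names slugify and tag_link are hypothetical, the example.com URL is a placeholder, and stop words are not removed automatically, so pass in the shortened campaign name yourself.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def slugify(text: str) -> str:
    """Lower-case text, drop apostrophes, and join the remaining words with '-'."""
    text = re.sub(r"['’]", "", text.lower())
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


def tag_link(url: str, campaign: str, source: str = "pdf", medium: str = "report") -> str:
    """Append utm_source, utm_medium and utm_campaign to url, keeping any existing query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", slugify(source)),
        ("utm_medium", slugify(medium)),
        ("utm_campaign", slugify(campaign)),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder landing page and campaign name.
print(tag_link("http://www.example.com/air-pollution/", "Analytics Tagging Report"))
```

Generating the links this way keeps the source, medium and campaign values consistent across every document.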

Google provides a simple URL builder tool to help you create these links:
https://support.google.com/analytics/answer/1033867
