Search Engine Friendly Joomla
Getting recognised by search engines is a major factor in running a website. This short guide explains how to make your Joomla website Search Engine Friendly so that it has a better chance of being listed on Google, Yahoo & Bing.
URLs & htaccess | Titles & Headings | Metadata | Microdata | Robots
URLs, htaccess and redirects
Because Joomla is a database-driven system, its default URLs are really query strings that tell index.php which content to retrieve from the database. This is why they often look like complex strings of code. Search engines (such as Google, Yahoo and Bing) find these just as difficult to read as we do, which is why it is recommended that you adopt SEF (Search Engine Friendly) URLs instead.
Transforming this…
http://www.hyde-design.co.uk/index.php?option=com_content&task=blogsection&id=0&itemid=4
Into this…
http://www.hyde-design.co.uk/search-engine-optimization
Search engines prefer these URLs because they're easier to read and can also include valuable keywords. To access the settings click on Site > Global Configuration.
Tick Yes next to “Search Engine Friendly URLs” and Yes again next to “Use Apache mod_rewrite”.
Then locate the “htaccess.txt” file in the root directory of your Joomla installation…
…and rename it to “.htaccess” (don’t forget the full stop at the beginning).
Once this is done, the file may seem to disappear, because many FTP clients hide files whose names begin with a full stop unless they are set to show hidden files. It has still done its job: Joomla will now use SEF versions of all its URLs and internal links.
Warning: If your site starts to display 404 errors or suddenly loses its styling, then one of two things has happened. Either the htaccess file has not been renamed, or your web host does not support URL rewriting (for example, because it runs Windows servers instead of Linux). In either case, setting the two options above back to No should restore the site to normal.
The wording of each link is determined by the alias of each menu item. To change a URL, go to the Menu Manager, select your menu, open the menu item and edit its alias.
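If the alias field is left blank, Joomla fills it in from the title. The Python function below is a rough illustration of that conversion only. make_alias and its exact rules are hypothetical and are not Joomla's own code, but the output has the same lower-case, hyphenated shape you will see in your URLs.

```python
import re

def make_alias(title: str) -> str:
    """Approximate a URL alias from a title (hypothetical helper).

    Lower-cases the title, turns every run of characters other than
    a-z and 0-9 into a single hyphen, then trims hyphens from both ends.
    """
    alias = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return alias.strip("-")

print(make_alias("Search Engine Optimization"))  # search-engine-optimization
print(make_alias("Titles & Headings!"))          # titles-headings
```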
.htaccess
The hypertext access file (.htaccess) has already been mentioned: it is the text file you need to rename for SEF URLs to work.
One important use of the .htaccess file that most people overlook is redirecting URLs. If you are moving an old HTML-based site to Joomla, any links that Google had indexed are likely to become dead links (unless you keep exactly the same addresses, including the “.html” suffix). To avoid this, add a line like the following to the bottom of the .htaccess file for each old page.
Redirect 301 /oldpage.html http://www.hyde-design.co.uk/newpage
Replace “/oldpage.html” with the old page's path and “http://www.hyde-design.co.uk/newpage” with its new address. This ensures that bookmarked pages and URLs ranked by Google are not lost in the redesign.
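With many old pages, typing each Redirect line by hand gets tedious. The Python sketch below generates the lines from a list of old and new addresses. redirect_lines and the sample paths are hypothetical. The sketch checks that each old path starts with a slash, because Apache's Redirect directive expects a URL-path in that position.

```python
def redirect_lines(mapping: dict[str, str]) -> list[str]:
    """Build one 'Redirect 301' line per old page (hypothetical helper).

    Keys are old paths on your server (they must start with '/');
    values are the full new URLs.
    """
    lines = []
    for old_path, new_url in mapping.items():
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines

# Made-up example: old static pages mapped to their new Joomla addresses
old_to_new = {
    "/oldpage.html": "http://www.hyde-design.co.uk/newpage",
    "/about.html": "http://www.hyde-design.co.uk/about-us",
}
print("\n".join(redirect_lines(old_to_new)))
```

Paste the printed lines at the bottom of .htaccess.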
Search engines treat the http://domain.com address separately from the http://www.domain.com address, and they also treat the index.php file separately from your home page (see the Robots section below). Including the following code at the top of your .htaccess file redirects any non-www address to the matching www address and sends requests for /index.php to the home page. Don't forget to replace hyde-design.co.uk with your own domain name.
Options +FollowSymLinks
DirectoryIndex index.php
RewriteEngine On
RewriteBase /
# Redirect the bare domain to the www address (before Joomla's catch-all rule)
RewriteCond %{HTTP_HOST} ^hyde-design\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://www.hyde-design.co.uk/$1 [R=301,L]
# Redirect direct requests for /index.php to the home page
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.hyde-design.co.uk/ [R=301,L]
# Pass anything that is not a real file or folder to Joomla
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
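Before uploading rules like these, it helps to write down where each kind of request should end up. The Python function below is a simplified model of the intended behaviour, not Apache itself. final_address, the two host names and the decision to ignore query strings are all assumptions for illustration. The function returns the address a visitor lands on once the 301 redirects have been followed.

```python
from typing import Optional

# Illustrative values: replace with your own domain
WWW_HOST = "www.hyde-design.co.uk"
BARE_HOST = "hyde-design.co.uk"

def final_address(host: str, path: str) -> Optional[str]:
    """Return where a request ends up after the 301 redirects, or None.

    `path` is the requested path without any query string. None means
    no redirect applies and the page is served at the address requested.
    """
    host = host.lower()  # the host rule uses [NC], so case is ignored
    if host not in (WWW_HOST, BARE_HOST):
        return None  # a different domain: these rules don't apply
    new_path = "/" if path == "/index.php" else path
    if host == WWW_HOST and new_path == path:
        return None  # already the canonical address
    return f"http://{WWW_HOST}{new_path}"

for host, path in [("hyde-design.co.uk", "/newpage"),
                   ("www.hyde-design.co.uk", "/index.php"),
                   ("www.hyde-design.co.uk", "/newpage")]:
    print(host + path, "->", final_address(host, path))
```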
Another use of .htaccess is to "mask" URLs. Masking lets you display a page at an alternative URL that does not correspond to any real file or folder.
For example, add the following line just after the "RewriteEngine On" line...
RewriteRule ^greetings$ index.php?option=com_content&view=article&id=46 [L]
... lets visitors reach the ugly URL "index.php?option=com_content&view=article&id=46" at the shorter, friendlier address "greetings".
What's the difference between masking and redirecting? Redirecting sends you from the URL you typed to a different URL (so there is one master URL). Masking provides an alternative URL (so there are still two URLs).
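The difference can be modelled in a few lines of Python. This is a hypothetical sketch rather than how Apache works internally: visit, MASKS and REDIRECTS are illustrative names, and the status code shown is that of the first response only. It does capture what the visitor sees in each case.

```python
from typing import NamedTuple

class Outcome(NamedTuple):
    status: int        # status code of the first response
    address_bar: str   # URL the visitor sees at the end
    served_from: str   # internal URL whose content is shown

# Hypothetical rules based on the examples in this section
MASKS = {"/greetings": "/index.php?option=com_content&view=article&id=46"}
REDIRECTS = {"/oldpage.html": "/newpage"}

def visit(path: str) -> Outcome:
    if path in MASKS:
        # Masking: content comes from the internal URL, the typed URL stays
        return Outcome(200, path, MASKS[path])
    if path in REDIRECTS:
        # Redirecting: the browser is sent on to the single master URL
        target = REDIRECTS[path]
        return Outcome(301, target, target)
    return Outcome(200, path, path)

print(visit("/greetings"))
print(visit("/oldpage.html"))
```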
Titles & Headings
The single most important element of each web page is its title (the text in the <title> tag). For standard articles, Joomla takes this from the article's title.
Including relevant keywords in your title increases the chances of being recognised by search engines. Therefore something like “home” or “a story” is far less effective than “Using titles and headings in Joomla 1.5 websites”.
Titles should be unique and no more than 70 characters long (including spaces), because Google truncates longer titles in its search results. Avoiding words such as "and", "the", "is", "as" and "in" gives you more space for valuable keywords. Just be careful the title doesn't stop making sense.
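A quick way to apply these two rules across a whole site is to list your page titles and check them in bulk. The Python sketch below is a hypothetical helper. It flags titles over the 70-character limit suggested above, and titles used on more than one page.

```python
from collections import Counter

MAX_TITLE_LENGTH = 70  # limit suggested above, spaces included

def title_problems(titles: list[str]) -> dict[str, list[str]]:
    """Flag page titles that are too long or used on more than one page."""
    counts = Counter(titles)
    problems = {}
    for title, count in counts.items():
        issues = []
        if len(title) > MAX_TITLE_LENGTH:
            issues.append(f"too long ({len(title)} characters)")
        if count > 1:
            issues.append(f"used on {count} pages")
        if issues:
            problems[title] = issues
    return problems

titles = ["Home", "Home", "Using titles and headings in Joomla 1.5 websites"]
print(title_problems(titles))  # {'Home': ['used on 2 pages']}
```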
Likewise, text formatted as Heading 1 (an <h1> tag) is considered very valuable, though you should not have more than one h1 tag per page.
The lower-level headings (Heading 2, Heading 3, Heading 4, etc.) are also good places for keywords, but each level carries less weight than the one above it.
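To check the one-h1 rule on a finished page, you can count the heading tags in its HTML. This sketch uses Python's standard html.parser module. HeadingCounter and heading_counts are hypothetical names, and the sample page is made up.

```python
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    """Counts h1-h6 start tags in a page's HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {f"h{level}": 0 for level in range(1, 7)}

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case
        if tag in self.counts:
            self.counts[tag] += 1

def heading_counts(html: str) -> dict[str, int]:
    parser = HeadingCounter()
    parser.feed(html)
    parser.close()
    return parser.counts

page = """
<h1>Using titles and headings in Joomla 1.5 websites</h1>
<h2>Titles</h2>
<h2>Headings</h2>
<h3>Heading levels</h3>
"""
counts = heading_counts(page)
if counts["h1"] > 1:
    print(f"Warning: {counts['h1']} h1 tags on this page")
print(counts)
```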
Metadata
Metadata is often scoffed at now, but the traditional meta-description and meta-keywords tags can still offer you some valuable SEO oomph!
Metadata tags are lines of code that appear in the <head> section of your website. They are not visible on the live site unless a visitor deliberately views the source code. The importance of metadata has fluctuated over the years. It was originally intended to help search engines index sites, but Google, Yahoo & Bing now rely largely on their own complex algorithms instead.
Meta Keywords
On their own, meta keywords don't amount to much. Because they are not visible on the live site, editors can easily take liberties and add keywords that have no real relevance to the site. For this reason they are often dismissed as irrelevant. However, a few Joomla features make use of meta keywords in SEO-beneficial ways...
- The HD-Meta-display module outputs an article's meta keywords into a module position. The keywords become considerably more valuable once they can be seen by ordinary visitors.
- The Related Articles module uses the individual keywords to create internal links to other articles that share at least one keyword.
- The Banner module can choose which banner to display based on these keywords.
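The matching rule described for the Related Articles module (at least one shared keyword) can be sketched as follows. This illustrates the idea only. related_articles, the case-insensitive comparison and the sample articles are assumptions, not the module's actual code.

```python
def related_articles(articles: dict[str, str], title: str) -> list[str]:
    """Return other articles sharing at least one meta keyword with `title`.

    `articles` maps each article title to its comma-separated meta keywords.
    """
    def keywords(raw: str) -> set[str]:
        return {k.strip().lower() for k in raw.split(",") if k.strip()}

    wanted = keywords(articles[title])
    return [other for other, raw in articles.items()
            if other != title and wanted & keywords(raw)]

articles = {
    "SEF URLs": "SEO, htaccess, URLs",
    "Redirects": "htaccess, redirects",
    "Microdata": "seo, schema",
    "Our team": "staff",
}
print(related_articles(articles, "SEF URLs"))  # ['Redirects', 'Microdata']
```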
Meta Description
Google often displays the meta description in its search results when the search term appears in it. Otherwise, Google generates its own description (often the first sentence on the page). However, rather than stuffing the description with valuable keywords, it is more beneficial to make it sound enticing. Remember that this snippet will be competing against similar ones in somebody's search results.
In Joomla 2.5, both the meta-keywords and meta-description tags are only output if you have entered them (whereas Joomla 1.5 generated the tags even when they were empty).
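The Joomla 2.5 behaviour can be illustrated with a small Python function that only emits a tag when there is something to put in it. meta_tags is a hypothetical helper, not Joomla code. It also escapes characters such as & and quotation marks so that the attribute values stay valid HTML.

```python
from html import escape

def meta_tags(description: str = "", keywords: str = "") -> str:
    """Return description/keywords meta tags, leaving out empty ones."""
    tags = []
    for name, value in (("description", description), ("keywords", keywords)):
        value = value.strip()
        if value:  # nothing entered, so no tag at all
            tags.append(f'<meta name="{name}" content="{escape(value)}" />')
    return "\n".join(tags)

print(meta_tags(description="A short guide to making Joomla search engine friendly."))
```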
Meta Generator
By default the meta-generator tag is displayed as "Joomla! - Open Source Content Management". There are many downsides to this...
- First of all, it spells out that the site was built with Joomla, telling potential hackers which system you use (and, in some versions, which release).
- Furthermore, search engines may take notice of sites whose generator tag belongs to an out-of-date (or no longer supported) content management system.
However, changing this meta tag is a simple enough process; a separate guide explains how the generator tag can be changed.
The Open Graph Protocol
A more recent form of metadata was introduced by Facebook. The Open Graph Protocol is the markup Facebook uses to map and categorise websites within its own social search engine. The metadata includes standard information (such as page title, type, URL, description & site name) as well as location information (latitude, longitude, address, region and country) and contact details (email address, telephone number & fax number).
If you share a link on Facebook, the page title, description and thumbnail image are all pulled from this metadata. If the shared page doesn't contain this metadata, Facebook will take a guess and sometimes offer a selection of thumbnail images (instead of the one you would have chosen).
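The sketch below shows the shape of the Open Graph tags that end up in a page's <head>. The og:title, og:type, og:url, og:image, og:description and og:site_name property names belong to the protocol. open_graph_tags, the image path and the site name are hypothetical, and this is not the code of our plug-in.

```python
from html import escape

def open_graph_tags(properties: dict[str, str]) -> str:
    """Render Open Graph <meta> tags, skipping empty values.

    Keys are property names without the "og:" prefix (title, type, url, ...).
    """
    return "\n".join(
        f'<meta property="og:{name}" content="{escape(value)}" />'
        for name, value in properties.items() if value
    )

print(open_graph_tags({
    "title": "Search Engine Friendly Joomla",
    "type": "article",
    "url": "http://www.hyde-design.co.uk/search-engine-optimization",
    "image": "http://www.hyde-design.co.uk/images/stories/thumbnail.jpg",  # hypothetical image
    "description": "How to make your Joomla website search engine friendly.",
    "site_name": "Hyde Design",  # hypothetical site name
}))
```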
We've built a plug-in for you to embed Open Graph metadata into your Joomla site automatically.
With the growing popularity of social media it's important that your site is correctly represented in the Facebook community.
Microdata
Microdata is a way of marking up the content of your website so that it is more easily digested by search engines. It works much like applying a style to text, except that there is no visual change. Instead of making the selected text bold or italic, you state “this is a business address” or “this is the name of a product”.
Unlike metadata (which tries to define a whole page) microdata targets a specific section or a single word. In effect you’re marking up the content in a descriptive manner so that search engines have a better idea of what your content is actually about.
Rich Snippets
Google uses microdata to create "Rich Snippets": search results illustrated with thumbnail images, product ratings, reviews, breadcrumb trails or author profiles. These extra elements are determined by the content on your site and how it is marked up with microdata.
You can test the success of your microdata markup with Google's Rich Snippets testing tool.
Figure: An example rich snippet. Note the review count and the breadcrumbs, both generated from microdata.
Microformats & Schema
For microdata to work, it has to conform to an agreed set of rules. A number of standards are available, but one of the most notable has been Microformats.
Microformats have been around for a while and include a variety of "formats". These include hRecipe (which Google uses to power its recipe rich snippets), hCard (which Twitter uses to mark up contact details) and hCalendar (which Facebook uses to mark up group events).
However, in 2011 Google, Yahoo and Bing all announced that they would back a new microdata vocabulary known as Schema (schema.org). Search engines will support both standards, but using both would clutter your code, so it is probably more sensible to pick just one.
How it Works
Microdata works in a similar way to a CSS class. The whole element you are marking up (eg. your contact details) is wrapped in a <span> whose attributes declare which standard you are using. Further <span>s are then placed inside it to define specific properties (eg. your name, your address, your telephone number).
If your original text looked like this...
<p>Mr Jones, 24 Test Drive, London, 01234 567890</p>
Using the "hCard" specification in Microformats it would look like this...
<p><span class="vcard"><span class="fn">Mr Jones</span>, <span class="adr">24 Test Drive, London</span>, <span class="tel">01234 567890</span></span></p>
And using the "person" specification in Schema it would look like this...
<p><span itemscope itemtype="http://schema.org/Person"><span itemprop="name">Mr Jones</span>, <span itemprop="address">24 Test Drive, London</span>, <span itemprop="telephone">01234 567890</span></span></p>
The page would look exactly the same in all three cases. However, the two marked-up versions specify what the content is (somebody's contact details) and which parts of it are the person's name, address and telephone number.
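To see how a program can pick the properties back out of the Schema version, here is a minimal Python reader built on the standard html.parser module. ItempropReader is a hypothetical name. The sketch only handles flat markup like the example above (no nested itemscope elements).

```python
from html.parser import HTMLParser

class ItempropReader(HTMLParser):
    """Collects the text of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.props = {}
        self._current = None  # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.itemtype = attrs["itemtype"]
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]
            self.props[self._current] = ""

    def handle_endtag(self, tag):
        # Flat markup only: any closing tag ends the current property
        self._current = None

    def handle_data(self, data):
        if self._current:
            self.props[self._current] += data

markup = ('<p><span itemscope itemtype="http://schema.org/Person">'
          '<span itemprop="name">Mr Jones</span>, '
          '<span itemprop="address">24 Test Drive, London</span>, '
          '<span itemprop="telephone">01234 567890</span></span></p>')

reader = ItempropReader()
reader.feed(markup)
reader.close()
print(reader.itemtype)
print(reader.props)
```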
Robots
Before we deal with robots it's important to understand the difference between crawling and indexing.
Crawling is an automatic process that simply follows a link and fetches the content of a website.
Indexing is the process of making sense of the pages that have been crawled.
The two should not be confused, however: a page that has not been crawled can still be indexed by other means (for example, through links to it from other sites).
You can control which pages crawlers follow through the robots.txt file. It is stored in the root directory of the Joomla site and tells search engines not to crawl specific folders and files on your site. This is the default file in full…
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
By default, this stops search engines from crawling your images directory. To help your images appear in Google Image Search, you should allow access to at least the images/stories folder (where Joomla keeps your articles' images). To do this, insert the following line before the Disallow: /images/ line...
Allow: /images/stories
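You can try out a robots.txt file before uploading it by using Python's standard urllib.robotparser module. This parser applies the first rule that matches, which is why the Allow line has to come before Disallow: /images/ here. Google instead applies the most specific matching rule, so the order matters less for Googlebot. crawlable and the shortened files below are illustrative.

```python
from urllib.robotparser import RobotFileParser

def crawlable(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Ask Python's robots.txt parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# A shortened version of the default file, with and without the Allow line
DEFAULT = "User-agent: *\nDisallow: /administrator/\nDisallow: /images/\n"
WITH_ALLOW = ("User-agent: *\nDisallow: /administrator/\n"
              "Allow: /images/stories\nDisallow: /images/\n")

for path in ["/images/stories/photo.jpg", "/images/banners/ad.gif"]:
    print(path, crawlable(DEFAULT, path), crawlable(WITH_ALLOW, path))
```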
It’s also good practice to use descriptive titles for the images in this folder.
Again, robots.txt only tells crawlers which links should or should not be followed. To stop a page from being indexed, you have to apply a "noindex" directive. This can easily be set on the article editing page (on the right-hand side, under "Metadata Information").
Adding index instructs the indexers that this page should be indexed.
Adding noindex instructs the indexers that this page should not be indexed.
Adding follow instructs the crawlers that the links on the page should be followed.
Adding nofollow instructs the crawlers that the links on the page should not be followed.
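The four directives combine in pairs into a single robots meta tag in the page's <head>. The hypothetical Python helper below shows which combination each choice produces. In Joomla you would normally pick the combination in the article's Robots setting rather than write the tag yourself.

```python
def robots_meta(index: bool = True, follow: bool = True) -> str:
    """Build a robots meta tag from the index/follow choices described above."""
    content = ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="robots" content="{content}" />'

print(robots_meta(index=False))  # <meta name="robots" content="noindex, follow" />
```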
Google also treats the index.php file separately from your home page, meaning test.com and test.com/index.php could be considered duplicate content (even though they are the same page). If you are using SEF URLs, you can remedy this by stopping crawlers from fetching any URL beginning with /index.php. To do this, insert the following line...
Disallow: /index.php
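The effect of this line can be checked with Python's standard urllib.robotparser module. The sample paths are illustrative. The non-SEF index.php addresses are blocked, while the SEF addresses and the home page stay crawlable.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /index.php"])

for path in ["/index.php?option=com_content&view=article&id=46",
             "/greetings",
             "/"]:
    print(path, parser.can_fetch("*", path))
```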