Robots

 

Before we deal with robots, it's important to understand the difference between crawling and indexing.

Crawling is an automatic process that simply follows a link and fetches the content of a website.
Indexing is the process of making sense of the pages that have been crawled.

The two should not be confused, however, because a page that has not been crawled can still be indexed by other means (for example, when other sites link to it).

You can control which pages crawlers fetch through the robots.txt file. It is stored in the root directory of the Joomla site and disallows search engines from crawling specific folders and files on your site. This is the file in full:


User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/


By default this disallows search engines from crawling your images directory. To help your images appear in Google Image Search, you should allow access to at least the images/stories folder (where Joomla keeps your articles' images). To do this, insert the following line before the Disallow: /images/ line:

Allow: /images/stories

It’s also good practice to use descriptive titles for the images in this folder.
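
As a rough check of the Allow line, the rules can be fed to Python's standard urllib.robotparser module. This is only a sketch: the rule set is shortened, the sample paths and the is_crawlable helper are made up for illustration, and Python's parser applies the first matching rule, which is why the Allow line sits before Disallow: /images/. Individual search engines may resolve conflicting rules differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the Joomla rules, with the Allow line placed first.
ROBOTS_TXT = """\
User-agent: *
Allow: /images/stories
Disallow: /images/
Disallow: /administrator/
"""


def is_crawlable(robots_txt, path, user_agent="*"):
    """Return True if the given rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for path in ("/images/stories/photo.jpg",
                 "/images/banner.jpg",
                 "/administrator/"):
        print(path, is_crawlable(ROBOTS_TXT, path))
```

Under these rules, only the path inside images/stories should be reported as crawlable; the rest of the images folder stays blocked.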

Again, this only tells crawlers which links they may or may not follow. To stop a page from being indexed you have to apply a "noindex" directive, which can easily be set on the article page (on the right-hand side of the page under "Metadata Information").

Adding "index" instructs the indexers that this page should be indexed.
Adding "noindex" instructs the indexers that this page should not be indexed.
Adding "follow" instructs the crawlers that the links on the page should be followed.
Adding "nofollow" instructs the crawlers that the links on the page should not be followed.
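
To confirm which of these directives a page actually carries, you can inspect its HTML. The sketch below assumes the setting ends up in the page head as a standard robots meta tag; the SAMPLE_PAGE markup and the robots_directives helper are invented for illustration and are not taken from Joomla's own output.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html):
    """Return the list of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page head for an article set to "noindex, follow".
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, follow" />'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(robots_directives(SAMPLE_PAGE))
```

A page with no robots meta tag yields an empty list, which search engines generally treat the same as "index, follow".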


Google also treats the index.php file separately from your home page, meaning test.com and test.com/index.php could be considered duplicate content (even though they are the same page). If you are using SEF URLs you can remedy this by disallowing the crawling of any URL beginning with index.php. To do this, add the following line to robots.txt:

Disallow: /index.php
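
The effect of this rule can be sketched with Python's urllib.robotparser. The paths below, including the /about-us SEF URL, and the is_blocked helper are hypothetical examples rather than real pages. The rule matches by prefix, so it covers every URL that starts with /index.php, query strings included, while leaving the home page and SEF URLs crawlable.

```python
from urllib.robotparser import RobotFileParser

# Only the rule under discussion, to keep the check focused.
INDEX_RULES = [
    "User-agent: *",
    "Disallow: /index.php",
]


def is_blocked(path, user_agent="*"):
    """Return True if INDEX_RULES forbid user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(INDEX_RULES)
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths: two non-SEF URLs, the home page, and one SEF URL.
    for path in ("/index.php",
                 "/index.php?option=com_content&view=article&id=1",
                 "/",
                 "/about-us"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```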

