A website is made up of a number of different pages, and it is the pages, not the website as a whole, that get indexed by search engines. When your website is returned by a search engine for a particular keyword phrase, it may not be your home page that is listed, but an internal page containing the searcher’s term. It is therefore very important to ensure that each of your web pages is search engine optimised, so that it is correctly indexed and returned to search users.
Do all web pages need indexing?
Some websites contain pages which you do not wish to be returned by the search engines, or which include information that search engine crawlers could mistake for spam. Examples include pages your customers must log into to access, information pages such as business terms and conditions or ‘about us’ profiles, and pages whose content is duplicated elsewhere on the web, such as standard manufacturers’ text on e-commerce product listings. If your site contains pages like these, whose content could be mistaken for spam or duplicates other web content and so harm your SEO, you may want to consider instructing the search engine crawlers to ignore them.
Instructing crawlers to disregard web pages
This instruction is produced in the form of a file, known as a robots.txt file, which is placed in the root directory of your website. Search engine crawlers can retrieve this file and follow the specific instructions in it – for example, you may wish to instruct them to ignore a page or directory completely (instructing them to ignore the links on a particular page is usually done instead with a robots meta tag in the page itself). The instructions are advisory rather than enforced, so crawlers may not follow them, but reputable ones in most cases will. A robots.txt file helps crawlers navigate your site more easily, as they can focus on the pages that do require indexing, and it is a useful tool for demonstrating that any unorthodox content is not spam and therefore should not negatively affect your website’s ranking and relevance.
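To illustrate, a minimal robots.txt file covering the kinds of pages discussed above might look like the following. The paths shown are hypothetical examples, not part of any real site, and lines beginning with # are comments:

```
# Example robots.txt - placed at the root of the site,
# e.g. https://www.example.com/robots.txt

# Apply these rules to all crawlers
User-agent: *

# Pages customers must log in to access
Disallow: /account/

# Information pages such as terms and conditions
Disallow: /terms-and-conditions.html

# Product pages containing duplicated manufacturers' text
Disallow: /products/manufacturer-descriptions/
```

Each Disallow line asks crawlers not to fetch URLs beginning with that path; an empty Disallow value, or omitting the rule entirely, permits crawling of the whole site.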