Tuesday, April 17, 2012

4 Indexing a site

Before a site appears in search results, a search engine must index it. An indexed site has been visited and analyzed by a search robot, with the relevant information saved in the search engine's database. If a page is present in the search engine's index, it can be displayed in search results; otherwise, the search engine knows nothing about it and cannot display any information from it.

   Most average-sized sites (with dozens to hundreds of pages) are usually indexed correctly by search engines. However, you should keep the following points in mind when constructing your site. There are two ways to let a search engine learn about a new site:

   - Submit the address of the site manually using a form associated with the search engine, if one is available. In this case, you are the one who informs the search engine about the new site, and its address goes into the queue for indexing. Only the main page of the site needs to be added; the search robot will find the rest of the pages by following links.

   - Let the search robot find the site on its own. If there is at least one inbound link to your resource from other indexed resources, the search robot will soon visit and index your site. In most cases, this method is recommended: get some inbound links to your site and simply wait until the robot visits it. This may actually be quicker than manually adding the site to the submission queue. Indexing typically takes from a few days to two weeks, depending on the search engine; Google is the quickest of the bunch.


Try to make your site friendly to search robots by following these rules:

   - Try to make every page of your site reachable from the main page in no more than three mouse clicks. If the structure of the site does not allow this, create a so-called site map, a page that links to every section so the rule can still be observed (a minimal sketch appears after this list).

   - Avoid common mistakes. Session identifiers make indexing more difficult. If you use script-based navigation, duplicate those links with regular ones, because search engines cannot read scripts (see the second sketch after this list, and section 2.3 for more details on these and other mistakes).

   - Remember that search engines index no more than the first 100-200 KB of text on a page. Hence the rule: keep a page's text under 100 KB if you want the page to be indexed completely.
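
   A site map can be an ordinary HTML page, linked from the main page, that points directly to every important section, putting each page within the three-click limit. A minimal sketch (the file and page names are hypothetical):

      <!-- sitemap.html: linked from the main page, giving robots -->
      <!-- a one-click path to every section of the site -->
      <html>
      <head><title>Site map</title></head>
      <body>
        <h1>Site map</h1>
        <ul>
          <li><a href="/products/index.html">Products</a></li>
          <li><a href="/products/widgets.html">Widgets</a></li>
          <li><a href="/articles/index.html">Articles</a></li>
          <li><a href="/contact.html">Contact</a></li>
        </ul>
      </body>
      </html>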
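
   For the script-navigation point, the idea is to keep a regular href link alongside the script, so a robot that ignores JavaScript can still follow it. A hedged sketch; openCatalog() is a hypothetical script function:

      <!-- Robot-unfriendly: the link exists only inside a script -->
      <span onclick="location.href='catalog.html'">Catalog</span>

      <!-- Better: a regular link that the script can still hook onto -->
      <a href="catalog.html" onclick="openCatalog(); return false;">Catalog</a>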

   You can manage the behavior of search robots using the file robots.txt. This file allows you to explicitly permit or forbid them from indexing particular pages on your site.
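
   For example, a robots.txt placed in the site root might look like the sketch below. User-agent and Disallow are the standard directives; anything not disallowed remains open for indexing. The paths shown are hypothetical:

      # robots.txt in the site root
      User-agent: *        # the rules below apply to all robots
      Disallow: /admin/    # keep private areas out of the index
      Disallow: /tmp/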

   The databases of search engines are constantly being updated; records in them may change, disappear, and reappear, which is why the number of indexed pages on your site may vary over time. One of the most common reasons for a page to disappear from an index is server unavailability: the search robot could not access the server at the time it attempted to re-index the site. Once the server is available again, the site should eventually reappear in the index.

   Note that the more inbound links your site has, the more quickly it gets re-indexed. You can track the indexing of your site by analyzing the server log files, where every visit by a search robot is recorded. We will give details of SEO software that allows you to track such visits in a later section.
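
   As a sketch of that idea, the short Python script below counts robot visits in a server access log. The log path and its format are assumptions (a combined-format log with the user agent at the end of each line), and the list covers only a few well-known robots:

      # count_robots.py - tally search-robot visits in an access log
      from collections import Counter

      ROBOTS = ["Googlebot", "bingbot", "YandexBot", "Baiduspider"]

      counts = Counter()
      with open("access.log") as log:   # hypothetical log path
          for line in log:
              # A robot identifies itself in the user-agent string
              for robot in ROBOTS:
                  if robot in line:
                      counts[robot] += 1

      for robot, visits in counts.most_common():
          print(robot, visits)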
