Robots.txt not configured correctly

ahbappy250 · posted on 2024-12-18 16:47:25

Another step you need to take to get your website indexed properly is to check how the robots.txt file has been configured.

The robots.txt file is a UTF-8 encoded text file, stored in the site's root directory, that contains access or restriction directives intended for search engine bots. The basic syntax of robots.txt is quite simple: you specify the name of a robot and an action. The crawler is identified in the User-agent line, while the paths it must not crawl are listed in Disallow lines.

Usually the file can be checked by typing your site's address followed by /robots.txt into the browser's address bar.

If it contains lines like these:

User-agent: *
Disallow: /

This means you are discouraging search engines from crawling your entire site.

However, an incorrectly configured robots.txt file could also contain a rule that prevents bots and spiders from crawling a particular page that you want to appear in search results, for example a Disallow line that covers that page's path.

To solve the problem, you need to let the search bot crawl the pages of the site that need to be indexed and ranked.
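
One way to check rules like these without waiting for a crawler is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name is_crawlable, the example.com URLs and the three sample rule sets are assumptions, not taken from any real site, and real crawlers may interpret edge cases such as wildcards differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines, not one big string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets, for illustration only.
block_everything = "User-agent: *\nDisallow: /"
block_nothing = "User-agent: *\nDisallow:"
block_one_section = "User-agent: *\nDisallow: /products/"

print(is_crawlable(block_everything, "https://example.com/about"))
print(is_crawlable(block_nothing, "https://example.com/about"))
print(is_crawlable(block_one_section, "https://example.com/products/shoes"))
print(is_crawlable(block_one_section, "https://example.com/about"))
```

With these sample rules, "Disallow: /" blocks every URL, an empty "Disallow:" blocks nothing, and "Disallow: /products/" blocks only the pages under that path. Running the same check against your own robots.txt text is a quick way to spot a rule that hides a page you want indexed.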

4. Presence of Meta Tags that prevent indexing

Meta tags can also give spiders instructions on how to treat the contents of a particular page or website. The difference from robots.txt is that they are placed on individual pages rather than providing a general, site-wide instruction. Robots meta tags are often forgotten, which makes them insidious and harmful to the indexing of a site. An example of an instruction of this type is <meta name="robots" content="noindex">, which, inserted in the <head> section, prevents the page from being indexed. How do you check whether such tags are present? When you are on the page, right-click anywhere and select "Inspect" (or "Inspect element"). The browser's developer tools will open, and there you can search the <head> section for this piece of code.

Attention! Crawlers also respect the X-Robots-Tag HTTP response header, so a noindex directive can be sent there even when the HTML itself contains no robots meta tag.
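
For checking many pages at once, the manual inspection can be approximated in code. The following is a minimal sketch using only Python's standard html.parser module; the names RobotsMetaParser and blocks_indexing are hypothetical, the headers argument is assumed to be a plain dict of response headers you have already fetched, and bot-specific forms (such as a meta tag named after one crawler, or a header value prefixed with a crawler name) are deliberately not handled.

```python
from html.parser import HTMLParser

# Directives that keep a page out of the index ("none" = noindex, nofollow).
BLOCKING = {"noindex", "none"}


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def blocks_indexing(html, headers=None):
    """Return True if the HTML or the X-Robots-Tag header asks for noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = set(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            found.update(d.strip().lower() for d in value.split(","))
    return bool(found & BLOCKING)


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(blocks_indexing(page))
print(blocks_indexing("<p>hello</p>", {"X-Robots-Tag": "noindex, nofollow"}))
```

Both calls report that indexing is blocked: the first because of the meta tag, the second because of the response header alone.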

5. Duplication of resources

Everyone knows that Google loves content, but the content has to be unique. We know that fully unique content is rare in practice, especially in e-commerce (think, for example, of a product listing page sorted by price ascending, price descending and latest arrivals). If pages of your website use the same content blocks, Google treats those pages as fundamentally the same, which can result in Google indexing only one of the pages that display the content.

Consequently, you should tell the search engine which of the pages is the representative one, so that it becomes the “canonical” page, while the others are “canonicalized” to it. Making a page canonical means consolidating on it the authority the duplicates have acquired over time. The other pages, as a rule, will not be indexed separately.
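
The usual way to declare the canonical page is a <link rel="canonical"> element in the <head> of each duplicate. The sketch below, again built on Python's standard html.parser, reads that element back so you can confirm that the sorted variants of a listing all point to the same URL. The names CanonicalParser and canonical_url and the example.com address are hypothetical, chosen for illustration only.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def canonical_url(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical markup of a listing page sorted by ascending price.
sorted_page = (
    "<html><head>"
    '<link rel="canonical" href="https://example.com/shoes">'
    "</head><body>...</body></html>"
)
print(canonical_url(sorted_page))
```

If the function returns None for a page that duplicates another one, the page declares no canonical and the choice is left entirely to the search engine.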
