和顺纵横信息网


Robots.txt not configured correctly

Posted on 2024-12-18 16:47:25
Another step you need to take to get your website indexed properly is to check how the robots.txt file has been set up.

The robots.txt file is a UTF-8 encoded text file, stored in the root directory, that contains site access or restriction directives intended for search engine bots. The basic syntax of a robots.txt is quite simple: you specify the name of a robot and an action. The crawler is identified in the User-agent line, while the action (e.g. blocking a path) is specified in a directive such as Disallow.

Usually the file can be checked by typing its address, the site's root URL followed by /robots.txt, into the browser's address bar.

If it contains lines like these:

User-agent: *
Disallow: /

This means you are discouraging search engines from crawling your entire site.
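The difference between blocking everything and blocking nothing comes down to a single character, so it is worth testing. Below is a minimal sketch using Python's standard urllib.robotparser module. The helper name `allows` and the example.com URL are made up for illustration. An empty Disallow value permits all crawling, while "Disallow: /" blocks the whole site.

```python
from urllib.robotparser import RobotFileParser

def allows(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)

# Hypothetical URL, used only for illustration.
page = "https://example.com/products/"

block_all = ["User-agent: *", "Disallow: /"]   # blocks the entire site
allow_all = ["User-agent: *", "Disallow:"]     # empty value: nothing is blocked

print(allows(block_all, page))  # False
print(allows(allow_all, page))  # True
```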

However, an incorrectly configured robots.txt file could also contain a rule that prevents bots and spiders from crawling a particular page that you want to appear in search results.
To solve the problem, you need to let the search bot crawl the pages of the site that need to be indexed and ranked.
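One way to catch such a rule is to test the pages you care about against the robots.txt content. This is only a sketch under assumptions: the rules, the URLs and the function name `blocked_urls` are all invented for the example, and the standard-library parser does plain prefix matching, so it may not behave exactly like every search engine's crawler.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the robots.txt text forbids `agent` to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

# Hypothetical robots.txt: the /blog/ rule is the misconfiguration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /blog/
"""

# Hypothetical pages that should appear in search results.
MUST_BE_INDEXED = [
    "https://example.com/",
    "https://example.com/blog/seo-guide",
    "https://example.com/products/shoes",
]

print(blocked_urls(ROBOTS_TXT, MUST_BE_INDEXED))
# ['https://example.com/blog/seo-guide']
```

Any URL the function returns points to a Disallow rule that should be narrowed or removed.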

4. Presence of Meta Tags that prevent indexing

Meta tags can also provide instructions to spiders on how to treat the contents of a particular page or website. The difference from robots.txt is that they are placed on individual pages rather than giving a site-wide instruction. Robots meta tags are often forgotten and can be insidious and harmful to the indexing of a site. An example of an instruction of this type: <meta name="robots" content="noindex">, inserted in the <head> section, will prevent the indexing of the page. How do you check if they are present? When you are on the page, right-click anywhere and select “Inspect” (or “Inspect element”, depending on the browser). A tool will open in which you can check whether that portion of code is present on the page.
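If you have many pages to check, the same inspection can be scripted. Here is a rough sketch with Python's built-in html.parser. The class and function names and the sample HTML are made up for the example, and it only reads the generic "robots" meta tag, not bot-specific ones such as "googlebot".

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # "noindex, nofollow" -> ["noindex", "nofollow"]
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]

def robots_directives(html):
    """Return the list of robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives

# Hypothetical page source, for illustration only.
SAMPLE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'

print(robots_directives(SAMPLE))               # ['noindex', 'nofollow']
print("noindex" in robots_directives(SAMPLE))  # True
```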

Attention! Crawlers also respect the X-Robots-Tag HTTP response header.
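This header does not show up in the page source, so it has to be read from the HTTP response headers (for example in the Network tab of the browser's developer tools). A simplified sketch of the check follows. The function name is hypothetical, the headers are passed in as a plain dict, and the parsing is deliberately naive: it only looks for "noindex" or "none", optionally preceded by a bot name.

```python
def header_blocks_indexing(headers):
    """Return True if an X-Robots-Tag header asks crawlers not to index."""
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":  # header names are case-insensitive
            continue
        for token in value.split(","):
            # "googlebot: noindex" -> "noindex"; plain "noindex" stays as is
            directive = token.split(":")[-1].strip().lower()
            if directive in {"noindex", "none"}:
                return True
    return False

# Hypothetical response headers, for illustration only.
print(header_blocks_indexing({"X-Robots-Tag": "noindex, nofollow"}))   # True
print(header_blocks_indexing({"x-robots-tag": "googlebot: noindex"}))  # True
print(header_blocks_indexing({"Content-Type": "text/html"}))           # False
```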




5. Duplication of resources

Everyone knows that Google loves content, but the content has to be unique. In practice, fully unique content is rare, especially in e-commerce (think, for example, of a product listing page sorted by price ascending, price descending and latest arrivals). If several pages of your website use the same content blocks, Google treats those pages as fundamentally the same, which can result in Google indexing only one of them.

Consequently, you should tell the search engine which of the pages is the representative one, so that it becomes the “canonical” page, while the others are “canonicalized” to it. Making a page “canonical” means consolidating on it the authority the duplicates have acquired over time. The other pages, however, will generally not be indexed.
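The usual way to declare the canonical page is a <link rel="canonical"> tag in the <head> of each duplicate, pointing to the representative URL. The sketch below, again with Python's html.parser, reads that tag from a page so you can verify that every sorted variant points to the same address. The names and the example.com URL are assumptions for the example.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")

def canonical_url(html):
    """Return the canonical URL declared in an HTML string, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical source of a listing sorted by price; it points to the main listing.
SORTED_LISTING = '<head><link rel="canonical" href="https://example.com/shoes/"></head>'

print(canonical_url(SORTED_LISTING))  # https://example.com/shoes/
```

If the function returns None, or a different URL for each variant, the duplicates are not being consolidated.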
