Blocking with robots.txt
The robots.txt approach only works for crawlers that actually obey robots.txt rules. Many spam spiders never read the file at all, so this method is not 100% effective.
User-agent: SemrushBot
Disallow: /
User-agent: DotBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: MauiBot
Disallow: /
User-agent: MegaIndex.ru
Disallow: /
User-agent: BLEXBot
Disallow: /
User-agent: ZoominfoBot
Disallow: /
User-agent: ExtLinksBot
Disallow: /
User-agent: hubspot
Disallow: /
User-agent: leiki
Disallow: /
User-agent: webmeup
Disallow: /
User-agent: Googlebot
Disallow: /
User-agent: googlebot-image
Disallow: /
User-agent: googlebot-mobile
Disallow: /
User-agent: yahoo-mmcrawler
Disallow: /
User-agent: yahoo-blogs/v3.9
Disallow: /
User-agent: Slurp
Disallow: /
User-agent: twiceler
Disallow: /
User-agent: psbot
Disallow: /
User-agent: YandexBot
Disallow: /
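To see how a compliant crawler would interpret rules like the ones above, you can evaluate them with Python's standard urllib.robotparser. The sketch below is only an illustration: the rules are parsed from an inline string, and example.com plus the bot name FriendlyBot are hypothetical placeholders.

from urllib import robotparser

# A minimal excerpt of the rules above, parsed directly from a string.
rules = """\
User-agent: SemrushBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A compliant SemrushBot would see that every URL is disallowed...
print(rp.can_fetch("SemrushBot", "https://example.com/some-page"))   # False
# ...while an agent with no matching group (and no "*" group) is allowed.
print(rp.can_fetch("FriendlyBot", "https://example.com/some-page"))  # True

This also illustrates the limitation mentioned earlier: the rules only take effect if the crawler chooses to run this kind of check; a spam spider that skips robots.txt is unaffected.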
Blocking with Nginx
Add the following snippet to your Nginx configuration. It checks the request's User-Agent string and returns 403 when it matches one of the blocked spiders.
# Block spam spiders
if ($http_user_agent ~* (SemrushBot|DotBot|MJ12bot|AhrefsBot|MauiBot|MegaIndex\.ru|BLEXBot|ZoominfoBot|ExtLinksBot|hubspot|leiki|webmeup)) {
    return 403;
}
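After reloading Nginx, you can verify the rule by sending requests with different User-Agent headers. The sketch below uses only Python's standard library; the URL example.com is a hypothetical placeholder for your own site.

import urllib.request
import urllib.error

# Hypothetical URL; replace with a page on your own site.
URL = "https://example.com/"

def status_for(user_agent: str) -> int:
    """Return the HTTP status code the server sends for a given User-Agent."""
    req = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses (including 403) arrive here

print(status_for("SemrushBot"))   # expect 403 once the Nginx rule is active
print(status_for("Mozilla/5.0"))  # expect a normal response, e.g. 200

A curl command with the -A option would show the same thing; the point is simply that a blocked UA string should receive 403 while ordinary visitors are unaffected.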