# Robots.txt

Robots.txt parser.
## Installation

```sh
composer require innmind/robots-txt
```
## Usage
```php
use Innmind\RobotsTxt\Parser;
use Innmind\OperatingSystem\Factory;
use Innmind\Url\Url;

$os = Factory::build();

// Build a parser that fetches robots.txt files via the OS HTTP transport.
$parse = Parser::of(
    $os->remote()->http(),
    'My user agent',
);

// Fetch and parse https://github.com/robots.txt, failing if it can't be found.
$robots = $parse(Url::of('https://github.com/robots.txt'))->match(
    static fn($robots) => $robots,
    static fn() => throw new \RuntimeException('robots.txt not found'),
);

$robots->disallows('My user agent', Url::of('/humans.txt')); // false
$robots->disallows('My user agent', Url::of('/any/other/url')); // true
```
Note: here only the path `/humans.txt` is allowed because, by default, GitHub disallows every user agent from crawling its website except for this file.
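Building on the `$robots` object from the usage example above, one way a crawler could use the parsed rules is to filter the paths it has discovered before fetching them. This is a minimal sketch; the path list below is purely illustrative and not part of the library:

```php
use Innmind\Url\Url;

// Hypothetical paths discovered while crawling (illustrative only).
$paths = [
    Url::of('/humans.txt'),
    Url::of('/explore'),
    Url::of('/about'),
];

// Keep only the paths the parsed rules allow for this user agent.
$crawlable = \array_filter(
    $paths,
    static fn($url) => !$robots->disallows('My user agent', $url),
);
```

Since the parsed rules live in memory, a single `$robots` object can be reused to check every URL discovered for that host.

The parse result must be unwrapped with `match`, so you are also free to choose a policy other than throwing when no robots.txt is published. The sketch below treats a missing file as "nothing is disallowed"; the URL, the path, and the fallback policy are assumptions for illustration, not behaviour prescribed by the library:

```php
use Innmind\RobotsTxt\Parser;
use Innmind\OperatingSystem\Factory;
use Innmind\Url\Url;

$os = Factory::build();
$parse = Parser::of(
    $os->remote()->http(),
    'My user agent',
);

// If the file cannot be found, fall back to "not disallowed"
// instead of throwing (an assumed policy, adjust to your needs).
$disallowed = $parse(Url::of('https://example.com/robots.txt'))->match(
    static fn($robots) => $robots->disallows('My user agent', Url::of('/some/page')),
    static fn() => false, // no robots.txt published
);
```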