SEO - v1.0.0

Robots.txt

ArtisanPack UI SEO provides dynamic robots.txt generation with support for global rules, bot-specific directives, and automatic sitemap inclusion.

Overview

The robots.txt file tells search engine crawlers which pages they can or cannot crawl. The package generates this file dynamically based on your configuration.

Configuration

// In config/seo.php
'robots' => [
    'enabled' => true,
    'global' => [
        'disallow' => ['/admin', '/api', '/private'],
        'allow' => ['/api/public'],
    ],
    'bots' => [
        'GPTBot' => ['disallow' => ['/']],
        'CCBot' => ['disallow' => ['/']],
        'Googlebot' => ['allow' => ['/']],
    ],
    'crawl_delay' => null,
    'sitemaps' => true,
    'host' => null,
],
| Option | Description |
| --- | --- |
| enabled | Enable dynamic robots.txt generation |
| global | Rules applied to all bots |
| bots | Bot-specific rules |
| crawl_delay | Delay between requests (seconds) |
| sitemaps | Auto-include sitemap URLs |
| host | Host directive (used by Yandex) |

Serving Robots.txt

Via Routes

// In config/seo.php
'routes' => [
    'robots' => true,
],

// Registers: GET /robots.txt

Via Controller

use ArtisanPackUI\Seo\Services\RobotsService;

class RobotsController extends Controller
{
    public function __invoke(RobotsService $robots)
    {
        return response($robots->generate())
            ->header('Content-Type', 'text/plain');
    }
}
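
If the packaged route is disabled, you can register the controller yourself. A minimal sketch, assuming the controller above lives in App\Http\Controllers:

// In routes/web.php
use App\Http\Controllers\RobotsController;
use Illuminate\Support\Facades\Route;

Route::get('/robots.txt', RobotsController::class);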

Generated Output

# Robots.txt generated by ArtisanPack UI SEO

User-agent: *
Disallow: /admin
Disallow: /api
Disallow: /private
Allow: /api/public

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Googlebot
Allow: /

Sitemap: https://example.com/sitemap.xml

Programmatic Management

Using the Service

use ArtisanPackUI\Seo\Services\RobotsService;

$robotsService = app('seo.robots');

// Generate content
$content = $robotsService->generate();

// Add disallow rule (default user-agent: *)
$robotsService->disallow('/secret');

// Add allow rule
$robotsService->allow('/public');

// Add bot-specific rules
$robotsService->disallow('/archive', 'Bingbot');
$robotsService->crawlDelay(5, 'Bingbot');

// Add sitemap
$robotsService->addSitemap('https://example.com/sitemap.xml');

// Set crawl delay for all bots
$robotsService->crawlDelay(10);

// Set host directive
$robotsService->setHost('example.com');

// Get rules for a specific user-agent
$rules = $robotsService->getRulesForUserAgent('Googlebot');

// Get all user agents with rules
$userAgents = $robotsService->getUserAgents();

// Clear all rules and start fresh
$robotsService->clearRules();

// Remove rules for a specific user-agent
$robotsService->removeUserAgent('GPTBot');
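
A convenient place to apply these calls on every request is a service provider's boot method. A minimal sketch, assuming the seo.robots binding shown above (the provider name is illustrative):

use Illuminate\Support\ServiceProvider;

class RobotsRulesServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        $robots = app('seo.robots');

        // Keep an unreleased section out of crawlers' reach
        $robots->disallow('/beta');

        // Throttle one specific crawler
        $robots->crawlDelay(5, 'Bingbot');
    }
}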

Using Helper

// Get robots.txt content
$content = seoRobotsTxt();
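
The helper is handy for serving the file from a route closure. A minimal sketch, assuming the helper returns the same content as the service's generate() method:

// In routes/web.php
use Illuminate\Support\Facades\Route;

Route::get('/robots.txt', function () {
    return response(seoRobotsTxt())
        ->header('Content-Type', 'text/plain');
});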

Bot-Specific Rules

Common Bots

| Bot | Description |
| --- | --- |
| Googlebot | Google's crawler |
| Bingbot | Microsoft Bing's crawler |
| Slurp | Yahoo's crawler |
| DuckDuckBot | DuckDuckGo's crawler |
| Baiduspider | Baidu's crawler |
| YandexBot | Yandex's crawler |

AI/LLM Bots

| Bot | Description |
| --- | --- |
| GPTBot | OpenAI's crawler |
| ChatGPT-User | ChatGPT browsing |
| CCBot | Common Crawl |
| anthropic-ai | Anthropic's crawler |
| Claude-Web | Claude browsing |

Blocking AI Crawlers

'bots' => [
    'GPTBot' => ['disallow' => ['/']],
    'ChatGPT-User' => ['disallow' => ['/']],
    'CCBot' => ['disallow' => ['/']],
    'anthropic-ai' => ['disallow' => ['/']],
],
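
The same rules can be applied at runtime with the service, using the disallow() signature documented above:

$robots = app('seo.robots');

foreach (['GPTBot', 'ChatGPT-User', 'CCBot', 'anthropic-ai'] as $bot) {
    // Block the entire site for each AI crawler
    $robots->disallow('/', $bot);
}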

Advanced Configuration

Crawl Delay

'robots' => [
    // Global crawl delay for all bots
    'crawl_delay' => 10,

    // Per-bot crawl delay
    'bots' => [
        'Bingbot' => [
            'crawl_delay' => 5,
        ],
    ],
],
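
With that configuration, the generated file would include Crawl-delay directives along these lines (exact ordering may differ):

User-agent: *
Crawl-delay: 10

User-agent: Bingbot
Crawl-delay: 5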

Request Rate

Some crawlers support a request rate limit (requests per time window):

'bots' => [
    'Googlebot' => [
        'request_rate' => '1/10', // 1 request per 10 seconds
    ],
],

Visit Time

Use visit_time to suggest a preferred crawl window to supporting crawlers:

'bots' => [
    'Googlebot' => [
        'visit_time' => '0400-0845', // Crawl between 4 AM and 8:45 AM
    ],
],
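
Assuming the package maps these options onto the non-standard Request-rate and Visit-time directives, the output for the two examples above would look roughly like:

User-agent: Googlebot
Request-rate: 1/10
Visit-time: 0400-0845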

Environment-Based Configuration

Development/Staging

Block all crawlers in non-production environments:

'robots' => [
    'enabled' => true,
    'global' => [
        'disallow' => app()->environment('production') ? [] : ['/'],
    ],
],

Or in the service:

$robotsService = app('seo.robots');

if (!app()->environment('production')) {
    $robotsService->disallow('/');
}

Generated Output (Staging)

User-agent: *
Disallow: /

Multiple Sitemaps

Include multiple sitemap URLs:

'robots' => [
    'sitemaps' => [
        'https://example.com/sitemap.xml',
        'https://example.com/sitemap-news.xml',
        'https://example.com/sitemap-images.xml',
    ],
],

Or programmatically:

$robotsService->addSitemap('https://example.com/sitemap.xml');
$robotsService->addSitemap('https://example.com/sitemap-news.xml');
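
Each sitemap then appears on its own line of the generated file:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: https://example.com/sitemap-images.xml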

Clean URLs

Ensure proper disallow patterns:

// Blocks the directory and everything inside it
'disallow' => ['/admin/'],

// Blocks any path beginning with /admin (the directory itself, /admin.html, /administrator, and so on)
'disallow' => ['/admin'],

// Explicit wildcard form, supported by major crawlers
'disallow' => ['/admin/*'],

Common Patterns

Standard Website

'robots' => [
    'global' => [
        'disallow' => [
            '/admin/',
            '/api/',
            '/private/',
            '/cart/',
            '/checkout/',
            '/account/',
            '/*.pdf$',
            '/*?*',  // Block URLs with query strings
        ],
        'allow' => [
            '/api/public/',
        ],
    ],
],

E-commerce Site

'robots' => [
    'global' => [
        'disallow' => [
            '/admin/',
            '/cart/',
            '/checkout/',
            '/account/',
            '/wishlist/',
            '/compare/',
            '/search/',
            '/*?sort=',
            '/*?filter=',
        ],
    ],
],

Blog/Content Site

'robots' => [
    'global' => [
        'disallow' => [
            '/admin/',
            '/wp-admin/',  // If migrated from WordPress
            '/tag/',       // Avoid duplicate content
            '/author/',
            '/search/',
        ],
    ],
],

Testing Robots.txt

Google Search Console

Use the robots.txt report in Google Search Console to confirm your file is fetched and parsed as expected.

Programmatic Testing

$robotsService = app('seo.robots');

// Get rules for a specific user-agent
$rules = $robotsService->getRulesForUserAgent('Googlebot');

// Check the disallow and allow arrays
$disallowedPaths = $rules['disallow'] ?? [];
$allowedPaths = $rules['allow'] ?? [];

// Get all configured user agents
$userAgents = $robotsService->getUserAgents();
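
You can also assert against the served file in a feature test. A minimal sketch, assuming the packaged /robots.txt route is enabled and the example configuration above:

use Tests\TestCase;

class RobotsTxtTest extends TestCase
{
    public function test_robots_txt_disallows_admin(): void
    {
        $this->get('/robots.txt')
            ->assertOk()
            ->assertSee('User-agent: *')
            ->assertSee('Disallow: /admin');
    }
}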

Caching

The generated robots.txt content can be cached:

'robots' => [
    'cache' => true,
    'cache_ttl' => 3600, // 1 hour
],

Clear cache using the CacheService:

$cacheService = app('seo.cache');
$cacheService->forget('robots');

Events

Robots.txt generation does not dispatch events by default, but you can hook into the HTTP request instead, for example with route middleware as sketched below.
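
A hypothetical route middleware attached to the robots.txt route (the class name and logging are illustrative, not part of the package):

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Log;

class LogRobotsRequests
{
    public function handle(Request $request, Closure $next)
    {
        // Record which crawler fetched robots.txt
        Log::info('robots.txt requested', ['user_agent' => $request->userAgent()]);

        return $next($request);
    }
}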

Next Steps