SEO - v1.0.0

Robots.txt

ArtisanPack UI SEO provides dynamic robots.txt generation with support for global rules, bot-specific directives, and automatic sitemap inclusion.

Overview

The robots.txt file tells search engine crawlers which pages they can or cannot crawl. The package generates this file dynamically based on your configuration.

Configuration

// In config/seo.php
'robots' => [
    'enabled' => true,
    'global' => [
        'disallow' => ['/admin', '/api', '/private'],
        'allow' => ['/api/public'],
    ],
    'bots' => [
        'GPTBot' => ['disallow' => ['/']],
        'CCBot' => ['disallow' => ['/']],
        'Googlebot' => ['allow' => ['/']],
    ],
    'crawl_delay' => null,
    'sitemaps' => true,
    'host' => null,
],
| Option | Description |
| --- | --- |
| enabled | Enable dynamic robots.txt generation |
| global | Rules applied to all bots |
| bots | Bot-specific rules |
| crawl_delay | Delay between requests (seconds) |
| sitemaps | Auto-include sitemap URLs |
| host | Host directive (used by Yandex) |

Serving Robots.txt

Via Routes

// In config/seo.php
'routes' => [
    'robots' => true,
],

// Registers: GET /robots.txt

Via Controller

use ArtisanPackUI\Seo\Services\RobotsService;

class RobotsController extends Controller
{
    public function __invoke(RobotsService $robots)
    {
        return response($robots->generate())
            ->header('Content-Type', 'text/plain');
    }
}
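
If the packaged route is disabled, you can register the controller yourself. A minimal sketch, assuming the controller above lives in App\Http\Controllers:

// In routes/web.php
use App\Http\Controllers\RobotsController;
use Illuminate\Support\Facades\Route;

Route::get('/robots.txt', RobotsController::class);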

Generated Output

# Robots.txt generated by ArtisanPack UI SEO

User-agent: *
Disallow: /admin
Disallow: /api
Disallow: /private
Allow: /api/public

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Googlebot
Allow: /

Sitemap: https://example.com/sitemap.xml

Programmatic Management

Using the Service

use ArtisanPackUI\Seo\Services\RobotsService;

$robotsService = app('seo.robots');

// Generate content
$content = $robotsService->generate();

// Add disallow rule (default user-agent: *)
$robotsService->disallow('/secret');

// Add allow rule
$robotsService->allow('/public');

// Add bot-specific rules
$robotsService->disallow('/archive', 'Bingbot');
$robotsService->crawlDelay(5, 'Bingbot');

// Add sitemap
$robotsService->addSitemap('https://example.com/sitemap.xml');

// Set crawl delay for all bots
$robotsService->crawlDelay(10);

// Set host directive
$robotsService->setHost('example.com');

// Get rules for a specific user-agent
$rules = $robotsService->getRulesForUserAgent('Googlebot');

// Get all user agents with rules
$userAgents = $robotsService->getUserAgents();

// Clear all rules and start fresh
$robotsService->clearRules();

// Remove rules for a specific user-agent
$robotsService->removeUserAgent('GPTBot');
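
A convenient place to apply these calls on every request is a service provider's boot method. A minimal sketch, assuming the seo.robots binding shown above (the provider name is illustrative):

use Illuminate\Support\ServiceProvider;

class RobotsRulesServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        $robots = app('seo.robots');

        // Keep an unreleased section out of crawlers' reach
        $robots->disallow('/beta');

        // Throttle one specific crawler
        $robots->crawlDelay(5, 'Bingbot');
    }
}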

Using Helper

// Get robots.txt content
$content = seoRobotsTxt();
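
The helper is handy for serving the file from a route closure. A minimal sketch, assuming the helper returns the same content as the service's generate() method:

// In routes/web.php
use Illuminate\Support\Facades\Route;

Route::get('/robots.txt', function () {
    return response(seoRobotsTxt())
        ->header('Content-Type', 'text/plain');
});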

Bot-Specific Rules

Common Bots

| Bot | Description |
| --- | --- |
| Googlebot | Google's crawler |
| Bingbot | Microsoft Bing's crawler |
| Slurp | Yahoo's crawler |
| DuckDuckBot | DuckDuckGo's crawler |
| Baiduspider | Baidu's crawler |
| YandexBot | Yandex's crawler |

AI/LLM Bots

| Bot | Description |
| --- | --- |
| GPTBot | OpenAI's crawler |
| ChatGPT-User | ChatGPT browsing |
| CCBot | Common Crawl |
| anthropic-ai | Anthropic's crawler |
| Claude-Web | Claude browsing |

Blocking AI Crawlers

'bots' => [
    'GPTBot' => ['disallow' => ['/']],
    'ChatGPT-User' => ['disallow' => ['/']],
    'CCBot' => ['disallow' => ['/']],
    'anthropic-ai' => ['disallow' => ['/']],
],
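
The same rules can be applied at runtime with the service, using the disallow() signature documented above:

$robots = app('seo.robots');

foreach (['GPTBot', 'ChatGPT-User', 'CCBot', 'anthropic-ai'] as $bot) {
    // Block the entire site for each AI crawler
    $robots->disallow('/', $bot);
}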

Advanced Configuration

Crawl Delay

'robots' => [
    // Global crawl delay for all bots
    'crawl_delay' => 10,

    // Per-bot crawl delay
    'bots' => [
        'Bingbot' => [
            'crawl_delay' => 5,
        ],
    ],
],
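
With that configuration, the generated file would include Crawl-delay directives along these lines (exact ordering may differ):

User-agent: *
Crawl-delay: 10

User-agent: Bingbot
Crawl-delay: 5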

Request Rate

Some crawlers support a request rate limit (requests per time window):

'bots' => [
    'Googlebot' => [
        'request_rate' => '1/10', // 1 request per 10 seconds
    ],
],

Visit Time

Use visit_time to suggest a preferred crawl window to supporting crawlers:

'bots' => [
    'Googlebot' => [
        'visit_time' => '0400-0845', // Crawl between 4 AM and 8:45 AM
    ],
],
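
Assuming the package maps these options onto the non-standard Request-rate and Visit-time directives, the output for the two examples above would look roughly like:

User-agent: Googlebot
Request-rate: 1/10
Visit-time: 0400-0845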

Environment-Based Configuration

Development/Staging

Block all crawlers in non-production environments:

'robots' => [
    'enabled' => true,
    'global' => [
        'disallow' => app()->environment('production') ? [] : ['/'],
    ],
],

Or in the service:

$robotsService = app('seo.robots');

if (!app()->environment('production')) {
    $robotsService->disallow('/');
}

Generated Output (Staging)

User-agent: *
Disallow: /

Multiple Sitemaps

Include multiple sitemap URLs:

'robots' => [
    'sitemaps' => [
        'https://example.com/sitemap.xml',
        'https://example.com/sitemap-news.xml',
        'https://example.com/sitemap-images.xml',
    ],
],

Or programmatically:

$robotsService->addSitemap('https://example.com/sitemap.xml');
$robotsService->addSitemap('https://example.com/sitemap-news.xml');
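
Each sitemap then appears on its own line of the generated file:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: https://example.com/sitemap-images.xml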

Clean URLs

Ensure proper disallow patterns:

// Blocks the directory and everything inside it
'disallow' => ['/admin/'],

// Blocks any path beginning with /admin (the directory itself, /admin.html, /administrator, and so on)
'disallow' => ['/admin'],

// Explicit wildcard form, supported by major crawlers
'disallow' => ['/admin/*'],

Common Patterns

Standard Website

'robots' => [
    'global' => [
        'disallow' => [
            '/admin/',
            '/api/',
            '/private/',
            '/cart/',
            '/checkout/',
            '/account/',
            '/*.pdf$',
            '/*?*',  // Block URLs with query strings
        ],
        'allow' => [
            '/api/public/',
        ],
    ],
],

E-commerce Site

'robots' => [
    'global' => [
        'disallow' => [
            '/admin/',
            '/cart/',
            '/checkout/',
            '/account/',
            '/wishlist/',
            '/compare/',
            '/search/',
            '/*?sort=',
            '/*?filter=',
        ],
    ],
],

Blog/Content Site

'robots' => [
    'global' => [
        'disallow' => [
            '/admin/',
            '/wp-admin/',  // If migrated from WordPress
            '/tag/',       // Avoid duplicate content
            '/author/',
            '/search/',
        ],
    ],
],

Testing Robots.txt

Google Search Console

Use the robots.txt report in Google Search Console to confirm your file is fetched and parsed as expected.

Programmatic Testing

$robotsService = app('seo.robots');

// Get rules for a specific user-agent
$rules = $robotsService->getRulesForUserAgent('Googlebot');

// Check the disallow and allow arrays
$disallowedPaths = $rules['disallow'] ?? [];
$allowedPaths = $rules['allow'] ?? [];

// Get all configured user agents
$userAgents = $robotsService->getUserAgents();
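
You can also assert against the served file in a feature test. A minimal sketch, assuming the packaged /robots.txt route is enabled and the example configuration above:

use Tests\TestCase;

class RobotsTxtTest extends TestCase
{
    public function test_robots_txt_disallows_admin(): void
    {
        $this->get('/robots.txt')
            ->assertOk()
            ->assertSee('User-agent: *')
            ->assertSee('Disallow: /admin');
    }
}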

Caching

The generated robots.txt content can be cached:

'robots' => [
    'cache' => true,
    'cache_ttl' => 3600, // 1 hour
],

Clear cache using the CacheService:

$cacheService = app('seo.cache');
$cacheService->forget('robots');

Events

Robots.txt generation does not dispatch events by default, but you can hook into the HTTP request instead, for example with route middleware as sketched below.
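
A hypothetical route middleware attached to the robots.txt route (the class name and logging are illustrative, not part of the package):

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Log;

class LogRobotsRequests
{
    public function handle(Request $request, Closure $next)
    {
        // Record which crawler fetched robots.txt
        Log::info('robots.txt requested', ['user_agent' => $request->userAgent()]);

        return $next($request);
    }
}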

Next Steps