SEO and LLM Files
1. Overview
Three root-level files control how search engines and LLM crawlers discover and index the site:
| File | Purpose | Audience |
|---|---|---|
| robots.txt | Allow or disallow crawling of specific paths | Search engine bots (Googlebot, Bingbot, etc.) and AI crawlers |
| sitemap.xml | Map all indexable pages with priority and update frequency | Search engines (Google, Bing, etc.) |
| llms.txt | Curate high-signal content with structured context | LLM crawlers (GPTBot, ClaudeBot, Google-Extended, etc.) |
All three files are served from the site root (e.g., https://www.myriadevents.co.za/robots.txt). The sitemap is generated automatically at build time by the @astrojs/sitemap integration; the other two are static files in public/.
2. robots.txt
2.1. What It Does
robots.txt tells web crawlers which parts of the site they may or may not access. It is the first file any well-behaved crawler reads before indexing content.
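To make the allow/disallow mechanics concrete, here is a minimal sketch of how a crawler evaluates a path against robots.txt rules. It is simplified — the Robots Exclusion Protocol (RFC 9309) also covers wildcards and other details — but it shows the longest-match precedence that well-behaved crawlers apply:

```javascript
// Simplified robots.txt path matcher: the rule with the longest
// matching prefix wins; ties between Allow and Disallow are ignored here.
function isAllowed(path, rules) {
  // rules: array of { type: 'allow' | 'disallow', prefix: string }
  let verdict = true; // default when nothing matches: allowed
  let matched = -1;   // length of the longest matching prefix so far
  for (const { type, prefix } of rules) {
    if (path.startsWith(prefix) && prefix.length > matched) {
      matched = prefix.length;
      verdict = type === 'allow';
    }
  }
  return verdict;
}

// The rules from the structure shown below: Allow: / and Disallow: /api/
const rules = [
  { type: 'allow', prefix: '/' },
  { type: 'disallow', prefix: '/api/' },
];

console.log(isAllowed('/services/', rules));   // true
console.log(isAllowed('/api/contact', rules)); // false
```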
2.3. Structure
# Default rule — allow all crawlers
User-agent: *
Allow: /
Disallow: /api/ (1)
# Sitemap location
Sitemap: https://www.myriadevents.co.za/sitemap-index.xml (2)
# AI-specific crawler rules
User-agent: GPTBot (3)
Allow: /
| 1 | Block crawling of the API endpoint (the contact form function) |
| 2 | Points crawlers to the sitemap for full page discovery |
| 3 | Named AI crawler sections allow per-crawler control |
2.4. When to Update
- A new page is added — no change needed (the default Allow: / covers new pages)
- A path should be hidden from crawlers — add a Disallow: line
- A new AI crawler should be blocked or allowed — add a User-agent: section
- The domain changes — update the Sitemap: URL
2.5. Known AI Crawler User-Agents
| User-Agent | Operator |
|---|---|
| GPTBot | OpenAI |
| ClaudeBot | Anthropic |
| Google-Extended | Google (AI training, separate from Googlebot) |
| CCBot | Common Crawl (used by many AI datasets) |
| PerplexityBot | Perplexity AI |
To block a specific crawler, change Allow: / to Disallow: / under its section:
User-agent: GPTBot
Disallow: /
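If the list of AI crawler policies grows, maintaining them in one place and rendering the file can help. The helper below is hypothetical (not part of the project) and simply formats a policy map into robots.txt sections:

```javascript
// Hypothetical helper: render per-crawler robots.txt sections from a
// policy map. true = Allow: /, false = Disallow: /.
function renderRobots(policies, sitemapUrl) {
  const sections = Object.entries(policies).map(
    ([agent, allowed]) =>
      `User-agent: ${agent}\n${allowed ? 'Allow' : 'Disallow'}: /`
  );
  return [...sections, `Sitemap: ${sitemapUrl}`].join('\n\n') + '\n';
}

const robots = renderRobots(
  { '*': true, GPTBot: true, CCBot: false },
  'https://www.myriadevents.co.za/sitemap-index.xml'
);
console.log(robots);
```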
3. sitemap.xml
3.1. What It Does
sitemap.xml tells search engines which pages exist and how to crawl them efficiently.
3.2. How It Is Generated
The sitemap is generated automatically at build time by the @astrojs/sitemap integration. It discovers all static routes and produces a sitemap index at /sitemap-index.xml (which references /sitemap-0.xml).
The integration is configured in astro.config.mjs:
import sitemap from '@astrojs/sitemap';
export default defineConfig({
site: 'https://www.myriadevents.co.za',
integrations: [sitemap()],
// ...
});
The site property is required — the integration uses it to generate absolute URLs.
3.3. When to Update
No manual updates are needed for static pages. When you add or remove an .astro page in src/pages/, the sitemap updates automatically on the next build.
3.4. Excluding Pages
To exclude specific pages from the sitemap, pass a filter function:
integrations: [
sitemap({
filter: (page) => !page.includes('/internal/'),
}),
]
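The filter receives each page's full absolute URL as a string, so the predicate can be exercised on its own before wiring it into the config:

```javascript
// The same predicate as in the config above, tested in isolation.
// @astrojs/sitemap passes the full page URL as a string.
const filter = (page) => !page.includes('/internal/');

console.log(filter('https://www.myriadevents.co.za/services/'));       // true (kept)
console.log(filter('https://www.myriadevents.co.za/internal/notes/')); // false (excluded)
```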
3.5. Customising Priority and Change Frequency
The @astrojs/sitemap integration supports serialize for per-page customisation:
integrations: [
sitemap({
serialize: (item) => {
if (item.url === 'https://www.myriadevents.co.za/') {
item.priority = 1.0;
item.changefreq = 'weekly';
}
return item;
},
}),
]
When dynamic content (events, results) is published as individual pages in the future, revisit whether robots.txt and llms.txt should also be generated dynamically. See the project README for details.
4. llms.txt
4.1. What It Does
llms.txt is a Markdown file at the site root that provides LLM crawlers with a curated, high-signal summary of the site’s content. While robots.txt controls access and sitemap.xml lists pages, llms.txt provides context — helping LLMs understand what the site is about and which pages are most relevant.
This follows the llms.txt specification, an emerging convention for making websites more accessible to AI systems.
4.3. Structure
# Company Name (1)
> One-line description of the company. (2)
A paragraph with more context about the company. (3)
## Services (4)
- [Service Name](URL): One-line description of the service.
## Key Pages (5)
- [Page Name](URL): One-line description of the page.
## Contact (6)
- Phone: ...
- Email: ...
| 1 | H1 heading with the company or site name |
| 2 | Blockquote with a concise tagline or description |
| 3 | Introductory paragraph with key context |
| 4 | Grouped sections for services, pages, or topics |
| 5 | Links with inline descriptions — LLMs use these to understand page relevance |
| 6 | Contact details for the organisation |
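If the file grows, generating it from structured data keeps the sections consistent. This is an illustrative sketch only — the names and URLs below are placeholders, not the real site content:

```javascript
// Hypothetical builder: assemble llms.txt from structured data so the
// heading/blockquote/link layout above stays consistent.
function buildLlmsTxt({ name, tagline, intro, sections, contact }) {
  const lines = [`# ${name}`, '', `> ${tagline}`, '', intro, ''];
  for (const { heading, links } of sections) {
    lines.push(`## ${heading}`);
    for (const { title, url, description } of links) {
      lines.push(`- [${title}](${url}): ${description}`);
    }
    lines.push('');
  }
  lines.push('## Contact');
  for (const entry of contact) lines.push(`- ${entry}`);
  return lines.join('\n') + '\n';
}

const sample = buildLlmsTxt({
  name: 'Example Co', // placeholder data
  tagline: 'One-line description of the company.',
  intro: 'A paragraph with more context about the company.',
  sections: [
    {
      heading: 'Services',
      links: [
        {
          title: 'Service Name',
          url: 'https://example.com/service/',
          description: 'One-line description of the service.',
        },
      ],
    },
  ],
  contact: ['Email: info@example.com'],
});
console.log(sample);
```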
4.4. When to Update
- A new page or service is added — add a link entry under the appropriate section
- A page is removed — remove its entry
- Company details change — update the description, contact info, or tagline
- The domain changes — update all URLs
4.5. Writing Guidelines
- Keep descriptions to one sentence — LLMs parse these as structured metadata
- Use full absolute URLs (not relative paths)
- Group related pages under descriptive headings
- Lead with the most important content — LLMs may truncate long files
- Do not include internal/private pages, API endpoints, or admin URLs
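The absolute-URL rule is easy to enforce mechanically. A simple lint sketch (the regex is deliberately minimal, not a full Markdown parser) flags any link that is not an https URL:

```javascript
// Lint sketch for the guideline above: every markdown link in llms.txt
// should use a full absolute https URL. Returns the offending targets.
function findRelativeLinks(llmsTxt) {
  const linkPattern = /\[[^\]]*\]\(([^)]+)\)/g;
  const bad = [];
  for (const match of llmsTxt.matchAll(linkPattern)) {
    if (!match[1].startsWith('https://')) bad.push(match[1]);
  }
  return bad;
}

const sample = [
  '- [About](https://www.myriadevents.co.za/about/): Who we are.',
  '- [Contact](/contact/): Get in touch.',
].join('\n');

console.log(findRelativeLinks(sample)); // ['/contact/']
```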
4.6. Optional: Per-Page Markdown Versions
For higher-value LLM consumption (especially if the site becomes content-heavy), you can provide Markdown versions of key pages:
public/
├── llms.txt # Main index
├── llms/
│ ├── about.md # Clean text version of the About page
│ ├── services/
│ │ ├── event-management.md
│ │ └── timing-results.md
Then reference them from llms.txt:
## Services
- [Event Management](https://www.myriadevents.co.za/llms/services/event-management.md): Full-service event management.
This is a higher-effort, higher-value step — only worth doing if the site has substantial content that LLMs should consume in full.
5. File Summary
| File | What to Update | When | Effort |
|---|---|---|---|
| robots.txt | Add/remove Disallow: and User-agent: rules | When paths or crawler policy changes | Low |
| sitemap.xml | Adjust filter or serialize options in astro.config.mjs | Only when customisation is required | None (automatic) |
| llms.txt | Add/remove link entries, update descriptions | When pages, services, or company info changes | Low |