Feature Request: Add Configurable robots.txt Support #9

Open
opened 2025-01-29 23:58:17 +01:00 by Aroy · 1 comment
Owner

I would like to request the addition of a robots.txt file with configurable settings that allow users to specify which bots to block directly from Hugo’s main configuration file. This feature will help users control search engine indexing behavior and prevent unwanted bots from crawling their site.

Rationale:
Adding a robots.txt file with configurable rules improves SEO control and security. Users can define which search engines can index their content and block unwanted bots, scrapers, or specific sections of their site. This feature is especially useful for sites that contain private content, staging environments, or areas that should not be indexed.

Use Cases:

  • SEO Optimization: Site owners can control which parts of their site search engines can index.
  • Privacy and Security: Blocking scrapers and unwanted bots from crawling sensitive sections.
  • Staging Environments: Preventing indexing of development or test sites.

Implementation Ideas:

  • Default robots.txt Template:

    • Create a robots.txt template in the theme (e.g., layouts/_default/robots.txt).
  • Allow customization based on site parameters in config.yaml or config.toml.

  • Configuration Options in config.yaml:

    • Allow users to enable/disable robots.txt generation.
    • Provide an option to block specific bots or directories.
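One implementation detail worth noting: Hugo only renders a custom robots.txt template at all when `enableRobotsTXT` is set in the site configuration, so the theme's `enabled` flag would presumably pair with it:

```toml
# hugo.toml / config.toml (site configuration)
# Required for Hugo to render a robots.txt template
enableRobotsTXT = true
```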

Example configuration:

params:
  robots:
    enabled: true
    disallow_all: false  # when true, block all bots
    allow_all: true      # allow all bots (default)
    blocked_bots:
      - "AhrefsBot"
      - "SemrushBot"
      - "MJ12bot"
    disallowed_paths:
      - "/private/"
      - "/admin/"
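Since the theme supports both `config.yaml` and `config.toml`, the equivalent TOML (keys mirror the YAML above) would be:

```toml
[params.robots]
  enabled = true
  disallow_all = false  # when true, block all bots
  allow_all = true      # allow all bots (default)
  blocked_bots = ["AhrefsBot", "SemrushBot", "MJ12bot"]
  disallowed_paths = ["/private/", "/admin/"]
```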

Dynamic robots.txt Generation:

  • The robots.txt file should be automatically generated based on the configuration parameters.

Example template (layouts/_default/robots.txt):

User-agent: *
{{ if .Site.Params.robots.disallow_all }}
Disallow: /
{{ else if .Site.Params.robots.allow_all }}
Disallow:
{{ else }}
{{ range .Site.Params.robots.disallowed_paths }}
Disallow: {{ . }}
{{ end }}
{{ end }}

{{ range .Site.Params.robots.blocked_bots }}
User-agent: {{ . }}
Disallow: /
{{ end }}
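For reference, with the example configuration above (`allow_all: true` plus three blocked bots), the template would render roughly the following (modulo some extra blank lines from the template actions, which `{{-`/`-}}` trim markers could remove):

```
User-agent: *
Disallow:

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /
```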

Environment Variable Support (for Netlify, Vercel, etc.):

  • Allow overriding robots.txt settings using environment variables, e.g.:

    HUGO_PARAMS_ROBOTS_ENABLED=true
    HUGO_PARAMS_ROBOTS_DISALLOW_ALL=false

This is useful for staging environments where indexing should be disabled.
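On Netlify, for example, this could be wired up per deploy context in `netlify.toml`, so previews and branch deploys are never indexed (a sketch, assuming the theme reads the params proposed above via Hugo's `HUGO_PARAMS_*` environment-variable mapping):

```toml
# netlify.toml — disable indexing outside production
[context.deploy-preview.environment]
  HUGO_PARAMS_ROBOTS_DISALLOW_ALL = "true"

[context.branch-deploy.environment]
  HUGO_PARAMS_ROBOTS_DISALLOW_ALL = "true"
```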

Documentation:

  • Include a section in the theme documentation explaining how to configure robots.txt.
  • Provide examples for different use cases (e.g., allowing/disallowing bots, blocking certain directories).

Additional Context:
Many Hugo users deploy their sites on platforms like Netlify, Vercel, or GitHub Pages, where controlling search engine indexing is crucial. Having a built-in, configurable robots.txt ensures better SEO and security while keeping configuration simple and manageable.

Aroy added the Kind/Feature and Priority/High labels 2025-01-29 23:59:10 +01:00
Author
Owner

~~Need to clean up the formatting of this request~~ Fixed markdown formatting for the feature request
Reference: Aroy/Rinkusu#9