XML sitemaps are machine-readable files hosted on your website that list the key URLs you want search engines to find. They’re basically a menu for your site that not only points to your pages but can also give extra details that a bot may not get from just crawling.
They act as a shortcut for search engines, making it easier to discover and understand your site’s structure. Google will still crawl your site, but a sitemap helps to guide crawlers to the pages you actually care about.
Whether you have a website for your small local business or large ecommerce store, Climb can help you build your sitemap – get in touch!
Why does my site need an XML sitemap?
If your site is large or publishes frequently, an XML Sitemap can show Google exactly where all the pages are.
Small websites can also benefit from having an XML sitemap. While a well-structured site with only a few pages may be fully crawlable without one, a sitemap provides a guaranteed list of URLs you want indexed and a central place to monitor indexation in Google Search Console. It’s a low-effort, high-value step toward maintaining search visibility.
XML Sitemaps can also match multilingual versions to their correct alternatives, surface content that might otherwise be missed, and even help search engines discover media files such as images or videos.
What is an XML Sitemap?
An XML sitemap isn’t just for search engines; it’s also useful for you. With a crawler such as Screaming Frog SEO Spider, you can check which sitemap URLs are indexed and spot gaps in coverage. In Google Search Console, you can compare the number of URLs submitted versus the number indexed, and identify whether a specific template, section, or category might be causing issues.
We highly recommend implementing an XML sitemap, as it helps both Google and you understand the pages on your website. However, there are key factors to remember:
- They aren’t a ranking factor, so having one has no impact on rankings
- They aren’t a substitute for good internal linking
- Using a sitemap doesn’t guarantee indexing of content if it is thin or low quality
XML Sitemaps are literally that – a map of your site in a computer-readable format. If your site is difficult to crawl for any other reason, a sitemap may also help with link discovery.
What about HTML Sitemaps?
Both XML Sitemaps and HTML Sitemaps list the URLs on your site, but they are intended for different audiences.
XML sitemaps are built for search engine bots like Googlebot. They help crawlers navigate and understand your site’s structure more effectively.
HTML sitemaps are designed for people. They’re a visual, clickable index of your site’s pages, usually organised by category, so visitors can quickly find what they need.
Google can crawl HTML sitemaps too, but unlike XML sitemaps, they can’t directly carry technical data such as hreflang annotations or image metadata.
HTML sitemaps can also be more selective. For example, you might have an “Investors” section that you want Google to find and index, but it’s not something your average visitor needs in a public navigation menu. In that case, you could leave it out of your HTML sitemap but still include it in your XML sitemap so search engines know it exists.
Google’s Guidelines on XML Sitemaps
Google publishes clear requirements for how XML sitemaps should be structured and maintained. Following these not only ensures search engines can read your sitemap but also avoids wasting crawl budget on the wrong URLs.
Supported formats for sitemaps
Google supports multiple formats:
- XML: Recommended for flexibility and feature support.
- RSS and Atom feeds: Accepted, but less flexible for including optional tags like hreflang or media metadata.
- Plain text: One absolute URL per line, but without any metadata support.
Regardless of format, your sitemap must be encoded in UTF-8, follow the Sitemap Protocol, and start with the correct XML declaration for XML files:
<?xml version="1.0" encoding="UTF-8"?>
Do sitemaps have size limits?
Each sitemap can contain up to 50,000 URLs and must be 50 MB or smaller uncompressed. Larger sites should split sitemaps into logical groups (e.g., sitemap-blog.xml, sitemap-products.xml) and link them together in a sitemap index file.
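As a rough illustration of the splitting step, here is a minimal Python sketch that breaks a long URL list into groups of 50,000 and builds a sitemap index pointing at each child file. The product URLs, the child file names, and the helper functions `chunk_urls` and `build_sitemap_index` are all hypothetical, and the sketch only enforces the URL count, not the 50 MB size limit.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit described above


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive groups of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical catalogue of 120,000 product URLs.
product_urls = [f"https://www.example.com/product-{n}" for n in range(120_000)]
chunks = chunk_urls(product_urls)

# Hypothetical child sitemap file names, one per chunk.
child_sitemaps = [
    f"https://www.example.com/sitemap-products-{i + 1}.xml"
    for i in range(len(chunks))
]
index_xml = build_sitemap_index(child_sitemaps)
```

Each chunk would then be written out as its own sitemap file, with only the index submitted to search engines.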
How should I structure my XML Sitemap?
When building your XML sitemap, every entry should contain the full canonical URL for the page. For example, https://www.example.com/page rather than a relative path like /page.
The sitemap file itself should be hosted on the same site it references, unless you’ve explicitly set up cross-domain sitemap permissions in Google Search Console. All URLs within the sitemap must belong to the same domain or subdomain as the file’s location.
If you have multiple subdomains, each one should have its own dedicated sitemap and be submitted separately. Avoid mixing protocols (http vs. https) or hosts (www vs. non-www) within the same sitemap.
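These rules are easy to check automatically before a sitemap goes live. The sketch below is one possible approach in Python; `find_inconsistent_urls` is a hypothetical helper, and the expected protocol and host are assumptions you would swap for your own canonical values.

```python
from urllib.parse import urlsplit


def find_inconsistent_urls(urls, expected_scheme, expected_host):
    """Return (url, reason) pairs for entries that break the rules above."""
    problems = []
    for url in urls:
        parts = urlsplit(url)
        if not parts.scheme or not parts.hostname:
            # Relative paths such as /page have no protocol or host.
            problems.append((url, "not an absolute URL"))
        elif parts.scheme != expected_scheme:
            problems.append((url, "wrong protocol"))
        elif parts.hostname != expected_host:
            problems.append((url, "wrong host"))
    return problems


# Hypothetical sitemap entries for a site whose canonical version is
# https://www.example.com
candidate_urls = [
    "https://www.example.com/page",
    "/page",
    "http://www.example.com/a",
    "https://example.com/b",
    "https://blog.example.com/post",
]
problems = find_inconsistent_urls(candidate_urls, "https", "www.example.com")
```

Anything returned in `problems` should be corrected or moved to the sitemap for the host it actually belongs to.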
What should I be aware of when building an XML Sitemap?
An effective XML sitemap should list only URLs you actively want indexed. This means excluding any pages that return errors (4xx or 5xx), URLs that redirect to other locations (3xx), duplicate parameterised URLs, or pages marked with a noindex tag.
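If you have crawl data to hand, the exclusions above can be applied as a simple filter. This is a minimal Python sketch that assumes each page is a dictionary with `url`, `status`, `noindex`, and `canonical` fields; that record shape is hypothetical and would need mapping to whatever your crawler or CMS exports.

```python
def indexable_urls(pages):
    """Keep only URLs that return 200, are not noindexed, and are self-canonical."""
    keep = []
    for page in pages:
        if page["status"] != 200:
            continue  # drops 3xx redirects and 4xx/5xx errors
        if page["noindex"]:
            continue  # drops pages marked noindex
        if page["canonical"] != page["url"]:
            continue  # drops duplicates that point to another canonical URL
        keep.append(page["url"])
    return keep


# Hypothetical crawl records.
crawl = [
    {"url": "https://www.example.com/", "status": 200, "noindex": False,
     "canonical": "https://www.example.com/"},
    {"url": "https://www.example.com/old", "status": 301, "noindex": False,
     "canonical": "https://www.example.com/old"},
    {"url": "https://www.example.com/thanks", "status": 200, "noindex": True,
     "canonical": "https://www.example.com/thanks"},
    {"url": "https://www.example.com/shoes?sort=price", "status": 200, "noindex": False,
     "canonical": "https://www.example.com/shoes"},
]
sitemap_urls = indexable_urls(crawl)
```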
Your sitemap must also be publicly accessible and return a valid HTTP 200 status when requested. Search engines should be able to retrieve it without any form of authentication, cookies, or special headers.
Once your sitemap is live, you should submit it directly in Google Search Console so Google is aware of it, and also reference it in your robots.txt file. The robots.txt inclusion acts as a backup discovery method for search engines, making it more likely that your sitemap will be found and used, even if it’s not manually submitted.
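To confirm the robots.txt reference is in place, you can parse the file and read back the declared sitemaps. The sketch below uses Python’s standard `urllib.robotparser` module and assumes Python 3.8 or later, where `site_maps()` is available. The robots.txt content is a made-up example; in practice you would load your own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for www.example.com.
robots_txt = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# List of Sitemap: entries, or None if the file declares none.
declared_sitemaps = parser.site_maps()

# The same parser can flag sitemap URLs that robots.txt blocks from crawling.
blocked = not parser.can_fetch("*", "https://www.example.com/checkout/cart")
allowed = parser.can_fetch("*", "https://www.example.com/about")
```

A URL that is both listed in the sitemap and blocked in robots.txt sends conflicting signals, so the second check is a useful companion to the first.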
What does an XML Sitemap look like?
An XML sitemap follows a strict structure using the <urlset> element to wrap all <url> entries.
Basic XML sitemap example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2025-08-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
    <lastmod>2025-07-15</lastmod>
  </url>
</urlset>
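If you would rather generate this file than write it by hand, the same structure can be produced with a few lines of Python. This is a minimal sketch using the standard library’s ElementTree; `build_sitemap` is a hypothetical helper, the page list mirrors the example above, and `ET.indent` assumes Python 3.9 or later.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap string from (absolute_url, lastmod) pairs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(root)  # pretty-print; available from Python 3.9
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    ("https://www.example.com/", "2025-08-01"),
    ("https://www.example.com/about", "2025-07-15"),
]
sitemap_xml = build_sitemap(pages)
```

Building the file through an XML library rather than string concatenation means special characters in URLs, such as ampersands, are escaped for you.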
Complex XML sitemap with images and hreflang:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/product-123</loc>
    <lastmod>2025-08-01</lastmod>
    <image:image>
      <image:loc>https://www.example.com/images/product-123.jpg</image:loc>
    </image:image>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/product-123" />
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/product-123" />
  </url>
  <!-- Each alternate needs its own entry that repeats the full set of hreflang links -->
  <url>
    <loc>https://www.example.com/fr/product-123</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/product-123" />
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/product-123" />
  </url>
</urlset>
What You Can Include in an XML Sitemap
XML sitemaps are more than just a list of links: they’re a versatile way to give search engines extra context that might not be obvious from simply crawling your site. This added layer of data can help crawlers better understand your content, especially in cases where key elements aren’t immediately discoverable through normal navigation.
What URLs to include in my XML sitemap?
Your XML sitemap should only contain URLs you want search engines to index. These should be live, canonical pages that return an HTTP 200 status. That includes standard web pages such as your homepage, main category pages, product or service listings, blog posts, and any other content that is valuable for search visibility.
Avoid adding temporary landing pages, thin or low-quality content, or duplicate versions of the same page. Every URL you include is essentially a signal to search engines that the page is important, so focus on pages that genuinely deserve to appear in search results and that align with your SEO goals.
When should I put images in my XML Sitemap?
If images are a core part of your content strategy (product photos, infographics, illustrations, or news photography), adding them to a sitemap helps Google discover and index them.
Google can usually find images by crawling <img> tags on your pages, but there are situations where they can be missed. Images loaded dynamically with JavaScript, hidden behind clicks in galleries or used as CSS backgrounds may not be picked up through normal crawling.
The image sitemap extension used to let you add captions, titles, and licensing information, but Google deprecated those tags in 2022, so the image URL in <image:loc> is now the only detail it reads from the sitemap. Context such as alt text and captions belongs on the page itself. Image sitemaps can still be especially valuable for e-commerce sites looking to surface product images in Google Shopping, creative professionals who depend on image search, or publishers whose news images appear alongside stories in search results.
It’s best to focus on meaningful content images rather than decorative icons or background graphics.
Why should I include hreflang within my XML Sitemap?
Including hreflang inside your sitemap is most useful when you want to reinforce or simplify your language and regional targeting signals. It doesn’t have to replace on-page hreflang tags; it complements your wider international SEO setup.
If your site has hreflang markup directly in the HTML, that’s generally enough for Google to understand your alternate language or regional versions. However, sitemap-based hreflang can be beneficial when:
- Your site is very large, and managing or updating HTML hreflang tags across thousands of pages is error-prone.
- You want a single, centralised place to declare all alternate versions, making auditing easier.
- You’re using different CMSs or templates for various language sections, and keeping on-page tags consistent is tricky.
- Some pages are hard for Google to crawl due to navigation or linking structure. A sitemap gives Google a direct, structured list of all alternates.
Google will happily accept hreflang in sitemaps alone, but best practice is to use both on-page and sitemap annotations.
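Whichever method you use, hreflang annotations need to be reciprocal: if the English page lists the French page as an alternate, the French page must list the English one back. The Python sketch below shows one way to audit that in a sitemap. The helper names `read_alternates` and `missing_return_links` are hypothetical, and the sample sitemap is deliberately broken so the check has something to find.

```python
from xml.etree import ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Hypothetical sitemap: the French entry forgets to link back to English.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/product-123</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/product-123" />
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/product-123" />
  </url>
  <url>
    <loc>https://www.example.com/fr/product-123</loc>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/product-123" />
  </url>
</urlset>
"""


def read_alternates(sitemap_xml):
    """Map each <loc> to the {hreflang: href} pairs declared under it."""
    root = ET.fromstring(sitemap_xml)
    alternates = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        links = url.findall("xhtml:link", NS)
        alternates[loc] = {link.get("hreflang"): link.get("href") for link in links}
    return alternates


def missing_return_links(alternates):
    """List (page, alternate) pairs where the alternate does not link back."""
    missing = []
    for loc, links in alternates.items():
        for href in links.values():
            if href == loc:
                continue  # self-reference, nothing to check
            if loc not in alternates.get(href, {}).values():
                missing.append((loc, href))
    return missing


issues = missing_return_links(read_alternates(SAMPLE))
```

Here `issues` flags the English page, because its French alternate never points back to it. An alternate with no sitemap entry at all is reported the same way.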
How are XML Sitemaps beneficial for video content?
A video sitemap (or a standard XML sitemap with video extensions) can tell Google exactly where the video is located, its title, description, duration, and even whether it’s family-friendly.
You can also point to video thumbnails and indicate the type of platform or restrictions, such as region availability. This is particularly helpful when videos are embedded deep within pages, loaded dynamically, or hosted on third-party platforms. Supplying this data increases the chances of videos appearing in Google’s video search results or rich snippets.
Want your videos to show up in Google’s search results? Climb’s SEO experts can build you a sitemap that boosts your video visibility.
How are XML Sitemaps beneficial for news content?
XML sitemaps give Google a clear, structured feed of your most recent articles, which helps with faster discovery and indexing.
A Google News sitemap is different from a standard XML sitemap in that it’s designed to contain only news articles published in the last 48 hours. Google News crawlers rely on this to quickly identify what’s new and relevant for their news index.
By using a news sitemap, you can also provide extra metadata about each article which helps Google understand the context and topic of your stories. For publishers covering breaking news, politics, sports, or other fast-moving topics, this can be the difference between being indexed within minutes or hours.
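Because a news sitemap should only hold articles from the last 48 hours, it is usually generated from a rolling filter rather than maintained by hand. Here is a minimal Python sketch of that filter. The article records and the `recent_articles` helper are hypothetical, and the current time is passed in explicitly so the behaviour is easy to test.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(hours=48)  # window described above


def recent_articles(articles, now):
    """Keep articles published within the last 48 hours, relative to `now`."""
    cutoff = now - NEWS_WINDOW
    return [a for a in articles if cutoff <= a["published"] <= now]


# Hypothetical articles with timezone-aware publication times.
articles = [
    {"url": "https://www.example.com/news/a",
     "published": datetime(2025, 8, 8, 9, 0, tzinfo=timezone.utc)},
    {"url": "https://www.example.com/news/b",
     "published": datetime(2025, 8, 6, 13, 0, tzinfo=timezone.utc)},
    {"url": "https://www.example.com/news/c",
     "published": datetime(2025, 8, 5, 12, 0, tzinfo=timezone.utc)},
]
now = datetime(2025, 8, 8, 12, 0, tzinfo=timezone.utc)
news_urls = [a["url"] for a in recent_articles(articles, now)]
```

Older articles drop out of the news sitemap automatically but can stay in your standard XML sitemap.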
Common Mistakes to Avoid
Mixing URL versions and indexability
One of the most frequent errors in XML sitemaps is listing inconsistent or incorrect URL formats:
- Protocol consistency: Always use the secure HTTPS version if that is your canonical preference, and never mix HTTP and HTTPS within the same sitemap.
- Host consistency: Choose either the www or the non-www hostname and use it consistently across all entries. Mixing them can lead to duplication and confusion for search engines.
- Subdomains: Do not mix subdomains into the same sitemap. Each subdomain is treated as its own site and needs its own sitemap and robots.txt file, created and submitted separately.
- Staging or development sites: Never list URLs from test environments, staging servers, or developer sandboxes. These should remain blocked from indexing entirely.
- Use absolute URLs: All <loc> tags must contain the full, absolute URL (e.g., https://www.example.com/page) rather than a relative path (/page).
Outdated entries
Orphan pages are URLs listed in the sitemap but no longer linked anywhere on the live site. These can include deleted content, expired campaigns, or old test pages. Including them wastes crawl budget, can return errors or redirects, and may send mixed signals about which pages should be indexed.
The best way to avoid this is to make your sitemap dynamic rather than manually maintained. Automating updates so the file changes whenever a page is published, edited, or deleted ensures only current, indexable URLs are included. Many CMS platforms and SEO tools have this capability, keeping your sitemap accurate and aligned with your site’s actual structure.
Oversized files
Sites with large product catalogues can easily exceed Google’s sitemap limits of 50,000 URLs or 50 MB (uncompressed) per file. When this happens, Google may reject the file or fail to process it fully, meaning some pages may never be discovered through the sitemap.
The solution is to split your URLs into multiple, logically grouped sitemaps and reference them all in a sitemap index file. This approach keeps each sitemap within limits while still giving search engines full coverage of your site.
Outdated tags
Tags such as <priority> and <changefreq> are ignored by Google. The only optional tag Google may use is <lastmod>, and even then it is treated as a hint, not a directive. Trying to “game the system” by automatically setting it to the current date will simply cause Google to ignore it if the page content hasn’t genuinely changed.
If you choose to use <lastmod>, make sure the value is accurate, updated only when the page changes, and formatted using W3C Datetime (e.g., 2025-08-08).
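For reference, W3C Datetime accepts either a plain date or a full timestamp with a timezone offset. The Python sketch below formats both; `format_lastmod` is a hypothetical helper, and the input values are assumed to come from your CMS’s record of when the page content last changed.

```python
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Format a date or timezone-aware datetime as a W3C Datetime string."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("datetime values need a timezone offset")
        return value.isoformat(timespec="seconds")
    return value.isoformat()


date_only = format_lastmod(date(2025, 8, 8))
with_time = format_lastmod(datetime(2025, 8, 8, 14, 30, tzinfo=timezone.utc))
```

Passing in the stored modification date, rather than calling something like `date.today()` at build time, is what keeps the value honest.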
Final thoughts
An XML sitemap isn’t a magic bullet for SEO, but it’s a powerful tool for ensuring search engines can efficiently discover the content that matters most. By keeping it clean, accurate, and focused on indexable, SEO valuable URLs, you give crawlers a clear roadmap to your site. It also gives you valuable insight through tools like Google Search Console, helping you spot coverage issues and track how your content is being indexed.
Treat your sitemap as a living document. Update it whenever your site changes, validate it regularly, and use it alongside strong internal linking and high-quality content. If you want expert eyes on your sitemap to make sure it’s doing its job, Climb’s technical SEO team can help you optimise it for maximum visibility. Speak to us today!
XML Sitemap Checklist
Content
- Only indexable URLs (status 200)
- All URLs intentional
- No redirects (3xx)
- No broken links (4xx/5xx)
- No noindex pages
- No blocked URLs in robots.txt
- No staging/dev/test URLs
- No parameterised duplicates
- No thin/low-quality pages
- No orphan pages
Structure & Format
- No more than 50,000 URLs per file
- No larger than 50 MB uncompressed
- Split large sitemaps logically
- Uses sitemap index if split
- Full, absolute URLs
- Canonical protocol (HTTP/HTTPS)
- Canonical host (www/non-www)
- No unwanted subdomains
- UTF-8 encoding
- Valid XML syntax
- Correct namespace declarations
- XML declaration at top (<?xml version="1.0" encoding="UTF-8"?>)
Access & Submission
- Sitemap returns HTTP 200
- Publicly accessible (no login)
- Linked in robots.txt
- Submitted in Google Search Console
- Monitored in Search Console for coverage
Maintenance
- Updates automatically on publish/edit/delete
- Reviewed for accuracy regularly
- Matches current site structure