At this point an “XML site map” is something many marketers and even small business owners with a general understanding of SEO are at least aware of. But when I log into a client Google Search Console account it’s still pretty common to see either no XML sitemap submitted, or a number of errors and/or pages that aren’t being submitted. In this post we’ll walk through some tips for getting a sitemap generated and submitted to Google Search Console that will scale with your site.
What is an XML Sitemap?
To start, for those who aren’t familiar: a sitemap is basically a list of all of your website’s URLs. Submitting an XML sitemap to Google via Google Search Console (formerly Google Webmaster Tools) helps the search engine discover and index all of your website’s content. This helps your SEO efforts because the more of your content is in Google’s index, the more opportunities you have to appear in more search verticals and show up for more search queries.
An XML sitemap also helps you understand potential SEO issues on your site that you’re trying to diagnose.
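To make the "list of URLs" idea concrete, here is a minimal sketch of what a sitemap file contains, written in Python. It assumes the standard sitemaps.org format (a `urlset` element holding one `url` entry per page); the function name `build_sitemap` and the sample URLs and date are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string.

    `pages` is a list of (url, lastmod) pairs; lastmod may be None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Optional last modification date in YYYY-MM-DD form.
            ET.SubElement(url, "lastmod").text = lastmod
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


# Hypothetical pages, for illustration only.
print(build_sitemap([
    ("http://www.example.com/", "2024-01-15"),
    ("http://www.example.com/about.html", None),
]))
```

A real site would normally produce this file with a generator rather than by hand, which is what the instructions below cover.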
To help you make a Google site map, we’ve created a detailed set of instructions on how to make a dynamic XML site map to increase indexation of content for your website. At the end of the site map instructions, we’ve included some helpful resources if you need an XML site map generator or site map builder, or have questions on how to submit your site map to Google.
Instructions on How to Build a Dynamic XML Site Map
- Use the full URL of your site for the “Starting URL” option. The crawler explores only the URLs within the starting directory, i.e. when the starting URL is “http://www.example.com/path/index.html”, “http://www.example.com/path/sub/page.html” will be indexed, but “http://www.example.com/other/index.html” will NOT.
- “Save sitemap to” – is the filename in the “public_html/” folder of your website. This file should be writable by the script. To make sure it is, create this file and set its permissions to 0666.
- We recommend using “Server’s response” for the “Last modification” field. In this case the entries for static pages are filled with their real last modification time, while dynamic pages get the current time.
- “Do not parse” input field contains file types, separated by space. These files will be added to the sitemap, but not fetched to save bandwidth, because they are not html files and have no embedded links. Please make sure these files are indexed by Google since there is no sense in adding them to sitemap otherwise!
- “Do not parse URLs” works together with the option above to increase the speed of sitemap generation. If you are sure that some pages on your site do not contain unique links to other pages, you can tell the generator not to fetch them.
For instance, if your site has “view article” pages with urls like “viewarticle.php?..”, you may want to add them here, because most likely all links inside these pages are already listed at “higher level” (like the list of articles) documents as well:
If you are not sure what to write here, just leave this field empty. Please note that these pages are still included in the sitemap.
- “Exclude extensions” – these files are not crawled and not included in sitemap.
- To keep part of your website out of the sitemap, use the “Exclude URLs” setting: all URLs that contain any of the specified strings will be skipped.
For instance, to exclude all pages within “www.domain.com/folder/” add this line:
If your site has pages with lists that can be reordered by columns and URLs look like “list.php?sort=column2”, add this line to exclude duplicate content:
Anyway, you may leave this box empty to get ALL pages listed.
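The substring matching behind “Exclude URLs” can be sketched in a few lines of Python. This is only an illustration of the rule described above (skip any URL that contains one of the listed strings), not the generator's actual code; the function names and the sample strings `folder/` and `sort=` are assumptions chosen for the example.

```python
def is_excluded(url, exclude_substrings):
    """Return True if the URL contains any non-empty exclude string."""
    return any(s in url for s in exclude_substrings if s)


def filter_urls(urls, exclude_substrings):
    """Keep only the URLs that match none of the exclude strings."""
    return [u for u in urls if not is_excluded(u, exclude_substrings)]


# Hypothetical crawl results and exclude strings, for illustration only.
urls = [
    "http://www.domain.com/index.html",
    "http://www.domain.com/folder/page.html",
    "http://www.domain.com/list.php?sort=column2",
    "http://www.domain.com/list.php",
]
for url in filter_urls(urls, ["folder/", "sort="]):
    print(url)
```

With an empty exclude list the function returns every URL, which matches the "leave this box empty" behaviour.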
- The “Include ONLY URLs” setting is the opposite of “Exclude URLs”. When it is not empty, ONLY the URLs that match the substring entered are included in the sitemap.
- The “Individual attributes” setting allows you to set specific values for last modification time, frequency and priority per page. To use it, define the attributes in the following format: “url substring,lastupdate YYYY-mm-dd,frequency,priority”
- You may want to limit the number of pages to index so that the crawl does not run endlessly if your website has an error such as infinitely looping links.
- To limit the maximum running time of the script, define the “Maximum execution time” field (in seconds).
- To be able to use the “Resume session” feature, define the “Save the script state” field. This value is the interval at which the crawler's state is saved, so if the script is interrupted, you can continue the process from the last saved point. Set this value to “0” to disable saving.
- To reduce the load the sitemap generator puts on your server, you can add a “sleep” delay of X seconds (configured) after each N (configured) requests to your site. Leave the values blank (“0”) to crawl the site without delays.
- Google doesn’t support sitemap files with more than 50,000 URLs. That’s why the script supports “Sitemap Index” creation for big sites: it creates one sitemap index file and multiple sitemap files with up to 50,000 URLs each.
For instance, your website has about 140,000 pages. The XML sitemap generator will create these files:
- “sitemap.xml” – sitemap index file that includes links to other files (filename depends on what you entered in the “Save sitemap to” field)
- “sitemap1.xml” – sitemap file (URLs from 1 to 50,000)
- “sitemap2.xml” – sitemap file (URLs from 50,001 to 100,000)
- “sitemap3.xml” – sitemap file (URLs from 100,001 to 140,000)
Please make sure all of these files are writable if your website is large.
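The arithmetic behind that file list can be sketched as follows. This is a rough illustration of the splitting rule for the large-site case, not the generator's own code; the function name `plan_sitemap_files` and its parameters are assumptions made for this example.

```python
def plan_sitemap_files(total_urls, base_name="sitemap", max_per_file=50_000):
    """Return (index_filename, files) for a site with `total_urls` URLs.

    `files` is a list of (filename, first_url_number, last_url_number),
    with URL numbers counted from 1.
    """
    files = []
    starts = range(0, total_urls, max_per_file)
    for n, start in enumerate(starts, start=1):
        end = min(start + max_per_file, total_urls)
        files.append((f"{base_name}{n}.xml", start + 1, end))
    return f"{base_name}.xml", files


index_name, files = plan_sitemap_files(140_000)
print(index_name)
for name, first, last in files:
    print(name, first, last)
```

For 140,000 URLs this yields the index file plus three sitemap files covering URLs 1 to 50,000, 50,001 to 100,000 and 100,001 to 140,000, the same split as the list above.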
- Enable the “Create HTML Sitemap” option to let the generator create a sitemap for your visitors. You should also define the “HTML Sitemap filename” where the sitemap will be stored. It is possible to split the HTML sitemap into multiple files by defining the “Number of links per page in HTML sitemap” option.
The filenames are like the following:
- “sitemap.html” – in case when all links fit in one file
- “sitemap1.html” – site map file, page 1
- “sitemap2.html” – site map file, page 2
As with the point above, please make sure all of these files are writable. The layout of the site map pages can be modified to suit your website in the pages/mods/sitemap_tpl.html file.
Besides modifying the stylesheet for the HTML sitemap, you can change the way it is formatted. The basic template commands are:
- <TLOOP XX>…</TLOOP> – defines a repeating sequence of code (like page numbers or sitemap links)
- <TIF XX>…</TIF> – defines a conditional block that is inserted only when a specific condition is met
- <TVAR XX> – inserts a value of a specified variable
Please refer to the sitemap_tpl.html file for a usage example.
- Enable GZip compression of sitemap files to save on disk space and bandwidth. In this case “.gz” will be added to sitemap filenames (like “sitemap.xml.gz”).
- “Sitemap URL” is the same file entered in “Save sitemap to” field, but in the URL form. It is required to inform Google about sitemap address.
- Enable the “Ping Google” checkbox to let the script inform Google of every sitemap change. This way Google will always know about fresh information on your site.
- If you want to restrict access to your generator pages, set the login and password here.
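As a rough illustration of the GZip compression option described above, the sketch below compresses sitemap XML with Python's standard `gzip` module. It is not the generator's implementation; the function names and the sample XML string are assumptions for the example.

```python
import gzip


def gzip_sitemap(xml_text):
    """Compress sitemap XML text and return the gzip bytes."""
    return gzip.compress(xml_text.encode("utf-8"))


def gzip_filename(sitemap_filename):
    """Add the ".gz" suffix, e.g. sitemap.xml becomes sitemap.xml.gz."""
    return sitemap_filename + ".gz"


# Hypothetical sitemap content, repeated to mimic a long URL list.
entry = "<url><loc>http://www.example.com/page.html</loc></url>\n"
xml_text = "<urlset>\n" + entry * 1000 + "</urlset>\n"

compressed = gzip_sitemap(xml_text)
print(gzip_filename("sitemap.xml"))
print(len(xml_text.encode("utf-8")), "bytes before,", len(compressed), "bytes after")
```

Sitemaps compress well because the same tags and URL prefixes repeat on every line, which is where the disk space and bandwidth savings come from.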
XML Sitemap Generator Resources
The following are some great resources for making XML site maps for Google. They include a library of articles that explain what an XML site map is, give site map examples, and show how to solve problems that may arise when you submit a site map to Google for SEO purposes. There’s also a list of free site map tools and creators.
Site Map Learning Library
- All About Site Maps
- Creating Site Maps
- Managing Site Maps
- Creating Site Maps for Multiple Websites
- Image Sitemaps
- Video Sitemaps