A sitemap is a list of all the pages on a website. It is very useful because it helps search engines discover and index your content faster. There are two types of sitemap: static and dynamic. Static sitemaps are used today only for static websites, or for websites that do not have a lot of pages. Maintaining them can take a long time, because every time you create a new page on your website you have to add it to the sitemap manually. Dynamic sitemaps, on the other hand, are much more useful and more widely used. They are connected to the database of the website, so every time a new page is created, it is automatically included in the sitemap.
Drupal 8 has one great module that lets us quickly and easily generate a sitemap for our website. It is Simple XML Sitemap, which you can download here. Neither installing nor using this module is complicated.
In its user interface, we can choose which entities to include in the sitemap. For example, if we choose the node entity type, we can also choose which node content types to exclude from the sitemap, and which individual nodes to exclude. After every change, we need to run the "Regenerate sitemap" script, which you can find on the module configuration page ("admin/config/search/simplesitemap"). The sitemap will then be available under your project root, and you can access it at "your_project/sitemap.xml".
As I have already said, this module is very helpful. But what if we want to generate a sitemap with other requirements, for example one that includes only the nodes created after a specific date, or only the nodes created by specific users? We can still do that through the module's interface, but if there are more than a hundred, or even a thousand, nodes to change, doing it manually would be a real pain. It is much easier to write some custom code that regenerates the sitemap with our changes. In this blog post I'll explain how to do that.
To follow the code examples, you will need a custom module. If you don't have one, you'll have to create one. For this tutorial, I'll name my custom module custom_sitemap.
In the custom_sitemap.module file, I'll create two functions.
To create our custom values for the sitemap, we need four pieces of data for each page: the URL of the page, the language the node text was written in, the timestamp of when the node was last changed, and the priority of the page in the sitemap (the sitemap protocol accepts values from 0.0 to 1.0).
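As a minimal sketch, the four values could be collected per page like this in plain PHP. The function name custom_sitemap_build_entry and the array keys are assumptions made for illustration; they are not part of the Simple XML Sitemap API.

```php
<?php

/**
 * Builds one sitemap entry from the URL, language, timestamp and priority.
 *
 * Hypothetical helper: the name and the array keys are illustrative only.
 */
function custom_sitemap_build_entry(string $url, string $langcode, int $changed, float $priority = 0.5): array {
  // The sitemap protocol only allows priorities between 0.0 and 1.0.
  if ($priority < 0.0 || $priority > 1.0) {
    throw new InvalidArgumentException('Priority must be between 0.0 and 1.0.');
  }
  return [
    'url' => $url,
    'langcode' => $langcode,
    'changed' => $changed,
    'priority' => $priority,
  ];
}

// Example: an English page, changed "now", with priority 0.7.
$entry = custom_sitemap_build_entry('https://example.com/about-us', 'en', time(), 0.7);
```

Rejecting an out-of-range priority early keeps invalid values out of the generated XML.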
Next, I'll create a controller. In the src folder, which should be in the root folder of your module, create a folder named "Controller" if you don't already have one, and in it create a PHP file named "MapgenerateController.php". This controller is invoked every time we request the path "/map-generate" (which we will define later in our module's routing file), and it calls a service that does the actual work and returns the results.
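A minimal sketch of such a controller could look like the following. The service ID custom_sitemap.sitemap and the method name generate come from this tutorial; the regenerate() method on the service and the message text are assumptions made for illustration.

```php
<?php

namespace Drupal\custom_sitemap\Controller;

use Drupal\Core\Controller\ControllerBase;
use Drupal\custom_sitemap\Sitemap;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Regenerates the custom sitemap when "/map-generate" is requested.
 */
class MapgenerateController extends ControllerBase {

  /**
   * The custom sitemap service.
   *
   * @var \Drupal\custom_sitemap\Sitemap
   */
  protected $sitemap;

  public function __construct(Sitemap $sitemap) {
    $this->sitemap = $sitemap;
  }

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container) {
    // Pull the service from the container instead of calling \Drupal::service().
    return new static($container->get('custom_sitemap.sitemap'));
  }

  /**
   * Runs the service and returns a simple render array.
   */
  public function generate() {
    // regenerate() is a hypothetical method name on the custom service.
    $this->sitemap->regenerate();
    return ['#markup' => $this->t('The sitemap has been regenerated.')];
  }

}
```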
In the previous code we only reference a service that is called every time the script runs, and we return a markup element that is rendered on the page after the script has run.
The next step is to create and define the custom route for our controller. In the file custom_sitemap.routing.yml you need to insert the code below:
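A route of this kind might look like the sketch below. The path and the controller method are taken from this tutorial; the route name, the title and the permission are assumptions that you should adapt to your site.

```yaml
custom_sitemap.map_generate:
  path: '/map-generate'
  defaults:
    _controller: '\Drupal\custom_sitemap\Controller\MapgenerateController::generate'
    _title: 'Generate sitemap'
  requirements:
    # Assumed permission: restricts the script to site administrators.
    _permission: 'administer site configuration'
```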
We have defined the path for our controller, which will execute the function "generate" whenever "/map-generate" is requested.
We also need to create the service that we referenced in our controller file. It's very simple: in your module_name.services.yml file (in my case: custom_sitemap.services.yml), insert the code below:
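A minimal sketch of the service definition, assuming the class is Drupal\custom_sitemap\Sitemap and that it only needs the database connection:

```yaml
services:
  custom_sitemap.sitemap:
    class: Drupal\custom_sitemap\Sitemap
    # Assumption: the class receives Drupal's database connection.
    arguments: ['@database']
```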
In the previous code we've defined our service custom_sitemap.sitemap, which will use a class that we still need to create.
The class lives in your_project/modules/custom_sitemap/src; in my case the file name is Sitemap.php.
This class is probably the most important part of our custom sitemap, because it contains the core logic of our custom integration. First, we select from the database the current content of the sitemap, which was generated by the Simple XML Sitemap module. Then we run an SQL query that selects all nodes of type "Page" that were created by the user with uid 1 (admin). This is just a demo example; you can write the SQL query whatever way you need.
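The node selection described above can be sketched as follows. This is an illustration only: the method names loadNodes() and regenerate() are hypothetical, the content type is assumed to have the machine name "page", and the step that reads and rewrites the stored sitemap is left out because it depends on the version of Simple XML Sitemap you use.

```php
<?php

namespace Drupal\custom_sitemap;

use Drupal\Core\Database\Connection;

/**
 * Collects the nodes that should be added to the sitemap.
 */
class Sitemap {

  /**
   * The database connection.
   *
   * @var \Drupal\Core\Database\Connection
   */
  protected $database;

  public function __construct(Connection $database) {
    $this->database = $database;
  }

  /**
   * Selects title and langcode of all "page" nodes created by user 1.
   */
  public function loadNodes(): array {
    return $this->database->select('node_field_data', 'n')
      ->fields('n', ['title', 'langcode'])
      // Assumption: "Page" has the machine name "page".
      ->condition('n.type', 'page')
      ->condition('n.uid', 1)
      ->execute()
      ->fetchAll();
  }

  /**
   * Entry point used by the controller; returns the number of nodes found.
   */
  public function regenerate(): int {
    $nodes = $this->loadNodes();
    // Building the entries and writing them back to the stored sitemap
    // would happen here; that part is omitted from this sketch.
    return count($nodes);
  }

}
```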
To regenerate the sitemap correctly with our custom values, we need the URL, the langcode and the timestamp of the last change for each node, as well as its priority value for the sitemap.
We select only the title and langcode of those nodes from the database. The title values are then converted for use in URLs: they are changed to lowercase with the function strtolower, and spaces (' ') are replaced with dashes ('-') with the function str_replace.
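The conversion can be sketched in plain PHP as below. The function name is hypothetical; only strtolower and str_replace come from the text.

```php
<?php

/**
 * Turns a node title into a URL path segment.
 *
 * Hypothetical helper: lowercases the title and replaces spaces with dashes.
 */
function custom_sitemap_title_to_path(string $title): string {
  return str_replace(' ', '-', strtolower($title));
}

echo custom_sitemap_title_to_path('About Us'); // prints "about-us"
```

This only produces correct URLs if your path aliases follow the same pattern. Titles that contain punctuation or non-ASCII characters would need extra handling.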
I'll set the "changed" timestamp to the moment the script runs, and the priority value to 0.7.
Again, these are optional values; you can set them however you want based on your needs.
Later we'll write those changes in the sitemap.xml text file.
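One possible way to add the new entries to an existing sitemap string is sketched below, assuming the sitemap is a standard urlset document and each entry is an array with the keys url, changed and priority. The function name is hypothetical, and how the module actually stores the sitemap depends on its version. The language is not written here, because the basic sitemap elements (loc, lastmod, priority) have no language field.

```php
<?php

/**
 * Appends <url> elements to an existing sitemap XML string.
 *
 * Hypothetical helper; entries use the keys url, changed and priority.
 */
function custom_sitemap_append_entries(string $sitemapXml, array $entries): string {
  $urls = '';
  foreach ($entries as $entry) {
    $urls .= '<url>'
      // Escape characters such as "&" so the XML stays well-formed.
      . '<loc>' . htmlspecialchars($entry['url'], ENT_XML1 | ENT_QUOTES) . '</loc>'
      // ISO 8601 date in UTC, built from the "changed" timestamp.
      . '<lastmod>' . gmdate('c', $entry['changed']) . '</lastmod>'
      . '<priority>' . number_format($entry['priority'], 1, '.', '') . '</priority>'
      . '</url>';
  }
  // Insert the new elements right before the closing </urlset> tag.
  $position = strrpos($sitemapXml, '</urlset>');
  if ($position === false) {
    throw new InvalidArgumentException('The sitemap has no closing </urlset> tag.');
  }
  return substr($sitemapXml, 0, $position) . $urls . substr($sitemapXml, $position);
}
```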
Now, every time you run the script at "your_domain/map-generate", your sitemap will be regenerated with your newly added custom results. It is highly recommended that you also set up a cron job so that the script runs automatically and periodically, a specific number of times per day or week, or whenever you need. If you don't already have a cron function in your custom module, create one and insert the code below into it.
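A cron hook of this kind might look like the sketch below. Several parts are assumptions: the state key custom_sitemap.last_run_day, the helper custom_sitemap_is_due(), the regenerate() method on the custom service, and the generateSitemap('cron') call, which is assumed from the 8.x-2.x and 3.x releases of Simple XML Sitemap and may differ in other versions.

```php
<?php

/**
 * Decides whether the daily job is due.
 *
 * Hypothetical helper: TRUE when no run is recorded for the current day.
 */
function custom_sitemap_is_due(?string $lastRunDay, int $now): bool {
  return $lastRunDay !== date('Y-m-d', $now);
}

/**
 * Implements hook_cron().
 */
function custom_sitemap_cron() {
  $state = \Drupal::state();
  $now = \Drupal::time()->getRequestTime();

  // Skip if the job has already run today.
  if (!custom_sitemap_is_due($state->get('custom_sitemap.last_run_day'), $now)) {
    return;
  }

  // Regenerate the module's own sitemap first (assumed 8.x-2.x/3.x API).
  \Drupal::service('simple_sitemap.generator')->generateSitemap('cron');
  // Then add the custom entries through the service from this tutorial.
  \Drupal::service('custom_sitemap.sitemap')->regenerate();

  // Remember the day of this run so the job runs only once per day.
  $state->set('custom_sitemap.last_run_day', date('Y-m-d', $now));
}
```

With this approach the job runs on the first cron run after midnight, so how soon after midnight it runs depends on how often Drupal's cron itself is triggered.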
In the previous code we defined that our custom cron job runs once a day, just after midnight. It runs the regenerate sitemap script that the Simple XML Sitemap module uses, and our newly added script as well.