Uploading a Screaming Frog Sitemap to My WordPress Website
If you're running Screaming Frog Spider on a WordPress website, exclude the correct URLs from your crawl with this list: Screaming Frog Exclude WordPress URLs.
In this post, I'm going to provide a list of WordPress folders, files, and other URLs to exclude from your Screaming Frog spider crawls.
If you run a standard/default Screaming Frog Spider crawl on a WordPress website, you may run into some unnecessary results in your sitemap.
This post will show how to exclude them — and tell you which ones to exclude.
Already know what you're doing? Jump to the full list. Otherwise, if you want to learn the how and why of these sitemap exclusions, keep reading.
Let's dive in.
Screaming Frog Spider Sitemaps
If you've been in SEO for more than a few days, especially if you focus on technical or on-page SEO, you've definitely created a sitemap.
And Screaming Frog's Spider tool is by far the best in the business.
A sitemap is a blueprint of your website that helps search engines find, crawl, and index all of your website's content.
These giant URL lists tell search engines which pages on your site are most important.
You don't NEED a sitemap. As Google puts it:
"If your site's pages are properly linked, our web crawlers can usually find most of your site."
But there are a few cases where a sitemap is a huge help, like if your website is brand new, recently changed a ton of URLs, or you have a large website (1,000+ pages).
Unless your internal linking is PERFECT and all your 100s or 1,000s of URLs have earned external backlinks, search bots are going to have a hard time finding all of those pages.
That's where sitemaps come in.
A Good Sitemap is a Clean Sitemap
But not EVERYTHING needs to be in your sitemap. In fact, you definitely shouldn't put everything in there.
The whole point of a sitemap, and technical SEO in general, is to make sure you're giving Google the best intel possible on how and where to crawl (and therefore index and rank) your pages.
Therefore, you only want to submit your important, public, searchable pages in your sitemap. The others should be excluded.
That's where Screaming Frog's exclude options come in.
From Screaming Frog's Configuration > Exclude documentation:
The exclude configuration allows you to exclude URLs from a crawl by supplying a list of regular expressions. A URL that matches an exclude is not crawled at all (it's not just 'hidden' in the interface).
This will mean other URLs that do not match the exclude, but can only be reached from an excluded page will also not be found in the crawl.
The exclude list is applied to new URLs that are discovered during the crawl. This exclude list does not get applied to the initial URL(s) supplied in crawl or list mode.
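To make that behavior concrete, here is a minimal Python sketch. It is not Screaming Frog's code. It assumes each exclude pattern has to match the whole URL (which would explain why the patterns in this post end in `.*`), and the function name `should_crawl` and the sample URLs are made up for illustration.

```python
import re

# Hypothetical exclude list, in the style used throughout this post.
EXCLUDES = [
    r"https://example.com/wp-admin/.*",
    r"https://example.com/tag/.*",
]

def should_crawl(url, excludes=EXCLUDES, is_start_url=False):
    """Return False when a discovered URL matches any exclude pattern.

    Assumption: a pattern has to match the entire URL (re.fullmatch).
    Start URLs are never filtered, mirroring the documentation quote.
    """
    if is_start_url:
        return True
    return not any(re.fullmatch(pattern, url) for pattern in excludes)

print(should_crawl("https://example.com/blog/my-post/"))                # True
print(should_crawl("https://example.com/tag/seo/"))                     # False
print(should_crawl("https://example.com/tag/seo/", is_start_url=True))  # True
```

If your version of the tool matches patterns partially instead of against the whole URL, the trailing `.*` is harmless but no longer required.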
Most SEO pros who have used Screaming Frog's Spider tool are probably already familiar with the Exclude option. But in case you need a refresher, check out their extensive guide and video tutorial.
What should your sitemap exclude?
Let's look at the WordPress folders, files, and URLs you'll likely want to exclude from your Screaming Frog spider crawls.
Throughout the rest of this post, I have to make some assumptions about the intended use case of your sitemap.
Your mileage may vary, and the suggestions in this post won't work for every site in every case. Please change these lists for your own needs.
WP-Content Folder
- https://example.com/wp-content/.*
This will exclude everything in your WordPress install's /wp-content/ folder.
On the off chance that you actually want to allow some of those WordPress directories in your Screaming Frog spider crawl (like maybe your /uploads/ folder for PDF assets), here are the individual folders.
Pick the ones you want to exclude:
- https://example.com/wp-content/mu-plugins/.*
- https://example.com/wp-content/plugins/.*
- https://example.com/wp-content/themes/.*
- https://example.com/wp-content/upgrade/.*
- https://example.com/wp-content/uploads/.*
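If you'd rather generate that list than edit it by hand, something like the following Python sketch could work. The helper name `wp_content_excludes` and its `keep` parameter are my own inventions for this example, not part of Screaming Frog or WordPress.

```python
# Subfolders named in the list above.
WP_CONTENT_FOLDERS = ["mu-plugins", "plugins", "themes", "upgrade", "uploads"]

def wp_content_excludes(domain, keep=()):
    """Build one exclude pattern per /wp-content/ subfolder, skipping `keep`."""
    return [
        f"https://{domain}/wp-content/{folder}/.*"
        for folder in WP_CONTENT_FOLDERS
        if folder not in keep
    ]

# Keep /uploads/ crawlable (for PDF assets, say) and exclude the rest.
for pattern in wp_content_excludes("example.com", keep=("uploads",)):
    print(pattern)
```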
Other WordPress Directories
I can't think of any reason why you'd want these directories included in your Screaming Frog WordPress website crawl.
- https://example.com/wp-includes/.*
- https://example.com/wp-admin/.*
Most of them are going to be unreachable for Screaming Frog's spider tool anyway, if you're running it with the default "Respect noindex" configuration.
WordPress Default Files
Like the directories above, these are likely going to be skipped by Screaming Frog anyway. If your theme or install is making these files public, you've got bigger problems than just a sitemap.
But that's another post.
In the meantime, you almost certainly want to exclude these WordPress file URLs from your Screaming Frog crawl.
- https://example.com/index.php
- https://example.com/license.txt
- https://example.com/readme.html
- https://example.com/wp-activate.php
- https://example.com/wp-blog-header.php
- https://example.com/wp-comments-post.php
- https://example.com/wp-config.php
- https://example.com/wp-config-sample.php
- https://example.com/wp-cron.php
- https://example.com/wp-links-opml.php
- https://example.com/wp-load.php
- https://example.com/wp-login.php
- https://example.com/wp-mail.php
- https://example.com/wp-settings.php
- https://example.com/wp-signup.php
- https://example.com/wp-trackback.php
- https://example.com/xmlrpc.php
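One regex detail worth knowing: an unescaped `.` matches any character, so a pattern like `wp-login.php` is slightly looser than it looks. That rarely matters in practice, but if you want strictly literal file patterns, here is a small Python sketch using `re.escape`. I'm assuming the exclude box accepts standard backslash escapes, so test before relying on it; `literal_pattern` is a hypothetical helper name.

```python
import re

def literal_pattern(url):
    """Escape regex metacharacters so the pattern matches this URL only."""
    return re.escape(url)

loose = "https://example.com/wp-login.php"
strict = literal_pattern(loose)

print(strict)
# The unescaped dot in the loose pattern also matches an "X":
print(bool(re.fullmatch(loose, "https://example.com/wp-loginXphp")))   # True
print(bool(re.fullmatch(strict, "https://example.com/wp-loginXphp")))  # False
```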
Post Taxonomies
This one is tough. It's impossible for my list to be exhaustive of taxonomies, since WordPress admins can create their own.
- https://example.com/author/.*
- https://example.com/category/.*
- https://example.com/tag/.*
WordPress post archives frequently become paginated, leading to lots of URLs like this. Should yous include these URLs in your Screaming Frog crawl, or exclude them?
- https://example.com/page/2/.*
- https://example.com/page/3/.* (etc.)
It's a matter of opinion and use case. Personally, I don't see how these URLs are helpful in a typical spider crawl. They're not real URLs, per se.
And most SEO professionals agree these types of URL should not be indexed by Google and other search engines. So if they're not indexed, and therefore can't drive organic search traffic, do they matter for your SEO sitemap or crawl?
Possibly. Depends on why you're making it. Again, your mileage may vary. Exclude them if you want to. Entirely optional.
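If you do exclude paginated archives, you don't have to list every page number. Here is a sketch of a single pattern that covers them all, checked in Python. It assumes the exclude box accepts `\d+` (one or more digits), and the sample URLs are invented.

```python
import re

# One pattern instead of page/2, page/3, and so on.
# Assumption: the exclude box accepts \d+ (one or more digits).
PAGINATION = r"https://example.com/page/\d+/.*"

urls = [
    "https://example.com/page/2/",
    "https://example.com/page/37/",
    "https://example.com/pages/about/",
]
paginated = [u for u in urls if re.fullmatch(PAGINATION, u)]
print(paginated)
```

Note that a real page living at /pages/about/ is left alone, because the pattern requires digits right after /page/.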
Server Binaries, etc.
- https://example.com/bin/.*
- https://example.com/boot/.*
- https://example.com/cdn-cgi/.*
- https://example.com/cgi-bin/.*
- https://example.com/dev/.*
- https://example.com/etc/.*
- https://example.com/home/.*
- https://example.com/lib/.*
- https://example.com/media/.*
- https://example.com/mnt/.*
- https://example.com/opt/.*
- https://example.com/run/.*
- https://example.com/sbin/.*
- https://example.com/srv/.*
- https://example.com/tmp/.*
- https://example.com/usr/.*
- https://example.com/var/.*
No idea what these are? StackExchange has a great explanation of each. But suffice it to say: you should probably exclude them from your WordPress Screaming Frog spider crawl.
International & Language Groupings
- https://example.com/en/.*
- https://example.com/es/.*
- https://example.com/fr/.*
and/or
- https://en.example.com/.*
- https://es.example.com/.*
- https://fr.example.com/.*
On the other hand, maybe you explicitly want these directories/subdomains. Obviously you'll have to change these lists for your needs.
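If you have a lot of languages, one pattern per language gets long. A possible shortcut is a single alternation group, sketched below in Python. This assumes the exclude box supports `(a|b|c)` groups and escaped dots; `language_excludes` is a hypothetical helper, not something Screaming Frog ships.

```python
import re

LANGS = ["en", "es", "fr"]  # the language codes used in the lists above

def language_excludes(domain, langs=LANGS):
    """Return (subfolder_pattern, subdomain_pattern) covering every code."""
    group = "|".join(langs)
    return (
        f"https://{domain}/({group})/.*",
        f"https://({group})\\.{domain}/.*",
    )

folder_pat, subdomain_pat = language_excludes("example.com")
print(folder_pat)     # https://example.com/(en|es|fr)/.*
print(subdomain_pat)  # https://(en|es|fr)\.example.com/.*
```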
Subdomains
If your WordPress website contains other info or installations on a subdomain, you may want to exclude those.
Common examples include blog, forms, funnels, and shopping mini-sites.
- https://blog.example.com/.*
- https://forum.example.com/.*
- https://info.example.com/.*
- https://shop.example.com/.*
- https://store.example.com/.*
In some cases, you may explicitly want these subdomains in your crawl. But especially if they're noindexed or canonicalized, you may want to exclude them from your Screaming Frog crawl.
Third-party Tools
These tools often require subdomains due to their technical setup. I'm thinking of HubSpot, Clickfunnels, Unbounce, etc.
In case any of these tools apply to you, here's a list of likely subdomains you may be using with these tools:
- https://clickfunnels.example.com/.*
- https://eloqua.example.com/.*
- https://hubspot.example.com/.*
- https://instapage.example.com/.*
- https://kajabi.example.com/.*
- https://leadpages.example.com/.*
- https://marketo.example.com/.*
- https://unbounce.example.com/.*
Full list of WordPress URL exclusions
If you've decided which of the above URLs or URL types you need to exclude, you can grab this full list and change it for your needs.
You probably could paste this list into your Screaming Frog exclude options box, but it might have some unintended consequences. Be careful not to over-exclude!
And obviously you'll have to replace example.com with your website's domain.
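Swapping the domain is a plain find-and-replace, which any text editor can do. If you keep the list in a script, a Python sketch might look like this (the domain `mysite.org` and the helper name `localize` are placeholders of mine):

```python
def localize(patterns, domain):
    """Swap the placeholder example.com for a real domain in each pattern."""
    return [p.replace("example.com", domain) for p in patterns]

template = [
    "https://example.com/wp-admin/.*",
    "https://blog.example.com/.*",
]
for line in localize(template, "mysite.org"):
    print(line)
```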
Here's the full list (also available on GitHub):
/**
 * WordPress URL Exclude List for Screaming Frog Spider
 * @author TJ Kelly - https://tjkelly.com
 * @desc Full article: https://tjkelly.com/blog/screaming-frog-exclude-wordpress/
 * @date 2021-07-08
 */
https://example.com/wp-content/.*
https://example.com/wp-content/mu-plugins/.*
https://example.com/wp-content/plugins/.*
https://example.com/wp-content/themes/.*
https://example.com/wp-content/upgrade/.*
https://example.com/wp-content/uploads/.*
https://example.com/wp-includes/.*
https://example.com/wp-admin/.*
https://example.com/index.php
https://example.com/license.txt
https://example.com/readme.html
https://example.com/wp-activate.php
https://example.com/wp-blog-header.php
https://example.com/wp-comments-post.php
https://example.com/wp-config.php
https://example.com/wp-config-sample.php
https://example.com/wp-cron.php
https://example.com/wp-links-opml.php
https://example.com/wp-load.php
https://example.com/wp-login.php
https://example.com/wp-mail.php
https://example.com/wp-settings.php
https://example.com/wp-signup.php
https://example.com/wp-trackback.php
https://example.com/xmlrpc.php
https://example.com/author/.*
https://example.com/category/.*
https://example.com/tag/.*
https://example.com/page/2/.*
https://example.com/page/3/.* (etc.)
https://example.com/bin/.*
https://example.com/boot/.*
https://example.com/cdn-cgi/.*
https://example.com/cgi-bin/.*
https://example.com/dev/.*
https://example.com/etc/.*
https://example.com/home/.*
https://example.com/lib/.*
https://example.com/media/.*
https://example.com/mnt/.*
https://example.com/opt/.*
https://example.com/run/.*
https://example.com/sbin/.*
https://example.com/srv/.*
https://example.com/tmp/.*
https://example.com/usr/.*
https://example.com/var/.*
https://example.com/en/.*
https://example.com/es/.*
https://example.com/fr/.*
https://en.example.com/.*
https://es.example.com/.*
https://fr.example.com/.*
https://blog.example.com/.*
https://forum.example.com/.*
https://info.example.com/.*
https://shop.example.com/.*
https://store.example.com/.*
https://clickfunnels.example.com/.*
https://eloqua.example.com/.*
https://hubspot.example.com/.*
https://instapage.example.com/.*
https://kajabi.example.com/.*
https://leadpages.example.com/.*
https://marketo.example.com/.*
https://unbounce.example.com/.*
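Since over-excluding is the main risk with a list this long, it can help to check it against the pages you definitely want crawled before you paste it in. Here is a rough Python sketch of that check. It assumes whole-URL matching, and the must-crawl URLs and the helper name `over_excluded` are invented for the example.

```python
import re

def over_excluded(must_crawl, excludes):
    """Map each must-crawl URL to the first exclude pattern that blocks it."""
    blocked = {}
    for url in must_crawl:
        for pattern in excludes:
            if re.fullmatch(pattern, url):
                blocked[url] = pattern
                break
    return blocked

excludes = [
    "https://example.com/wp-content/.*",
    "https://example.com/media/.*",
]
must_crawl = [
    "https://example.com/services/",
    "https://example.com/media/press-kit/",  # a real page under /media/
]
print(over_excluded(must_crawl, excludes))
```

Anything that shows up in the result is a page your exclude list would hide from the crawl, so either drop that pattern or narrow it.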
Source: https://tjkelly.com/blog/screaming-frog-exclude-wordpress-urls/