The URL structure can be a deciding factor in the success of an SEO campaign. Poorly formed URLs can cause issues with crawling, duplicate content, and user experience.
This hints section helps to identify common issues with the URLs of a website.
Example: Audisto URL Check with the URL hint reports for the current crawl
Here is the list of all specific hints related to URLs that can be identified with the help of the Audisto Crawler.
If the URL contains escaped (non-ASCII) characters in the path, it is flagged with this hint. Escaped characters are detected by looking for a % character in the path. Use this report to find all URLs with non-ASCII characters in the path.
| Displayed URL | Properly Escaped URL |
|---------------|----------------------|
If non-ASCII characters are used within the path of a URL, they will be URL-encoded (escaped) by web clients. RFC 3986 states that non-ASCII characters must first be encoded according to UTF-8 and then percent-encoded. Unexpected issues may result if a web client does not follow this encoding procedure correctly.
This often happens in systems that don't use UTF-8 by default.
We suggest that you evaluate whether URLs were escaped correctly. We also suggest avoiding non-ASCII characters in URLs where possible.
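The encoding procedure described above can be sketched with Python's standard `urllib.parse` module (the choice of Python here is illustrative, not part of the crawler): `quote()` encodes non-ASCII characters to UTF-8 first and then percent-encodes each byte, as RFC 3986 requires.

```python
from urllib.parse import quote, unquote

path = "/münchen/straße"          # path with non-ASCII characters

# quote() encodes to UTF-8 first, then percent-encodes each byte
escaped = quote(path)
print(escaped)                    # /m%C3%BCnchen/stra%C3%9Fe

# Decoding restores the original path, so a correct roundtrip is possible
assert unquote(escaped) == path
```

A URL that skips the UTF-8 step (e.g. percent-encoding Latin-1 bytes instead) would decode to a different path, which is the kind of unexpected issue mentioned above.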
If the URL has an ampersand (&) in the path, this hint is triggered.
An ampersand is a reserved character that must remain distinguishable from other data within a URL. It can be used as a delimiter. If it is not used as a delimiter, it needs to be URL-encoded as "%26".
Since ampersands are usually used in the query part of a URL, an ampersand in the path may lead to problems when building a correct URL.
Evaluate all occurrences of ampersand in the path. If the ampersand is not intended to be a delimiter, use proper URL-encoding.
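As a sketch of proper encoding, Python's `urllib.parse.quote()` (an arbitrary tooling choice for illustration) percent-encodes a literal ampersand in a path segment as %26:

```python
from urllib.parse import quote

# A literal "&" inside a path segment must be percent-encoded as %26,
# otherwise clients may mistake it for a delimiter.
segment = "black & white"
print(quote(segment))   # black%20%26%20white  (space -> %20, & -> %26)
```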
If the URL contains two consecutive slashes, it is flagged with this hint.
Consecutive slashes are valid but usually not wanted in a URL. Any occurrence might indicate issues with relative linking and/or the URL base. This may lead to issues with duplicate content if the CMS delivers the same content, e.g. for http://example.com//page.html and http://example.com/page.html.
We suggest that you avoid consecutive slashes. Analyze all occurrences of consecutive slashes and fix the reason why they occur.
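A minimal way to spot such URLs, sketched in Python (the helper name is ours), is to check the path component only, since the // after the scheme separates scheme and authority and is expected:

```python
from urllib.parse import urlsplit

def has_double_slash(url):
    # Only the path is inspected; the "//" after the scheme
    # belongs to the authority and is legitimate.
    return "//" in urlsplit(url).path

print(has_double_slash("http://example.com//page.html"))  # True
print(has_double_slash("http://example.com/page.html"))   # False
```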
If the URL contains a query string with 3 or more parameters, it is flagged with this hint.
URLs with 3 or more parameters are considered to be highly dynamic. Highly dynamic URLs often indicate poor quality to search engines, for example, URLs for filter combinations in faceted search. Changing the order of the parameters can lead to a very high number of URLs. The content shown will usually be similar or duplicate content.
The result can be crawl budget issues as well as serious issues with duplicate content.
Use this report to find highly dynamic URLs on the crawled website. You might want to reduce the number of parameters within URLs.
Consider reducing the number of URLs with GET parameters by using the Post/Redirect/Get (PRG) pattern.
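Counting parameters can be sketched with Python's `urllib.parse` (the helper name is hypothetical); `parse_qsl()` keeps duplicate names, so every parameter is counted:

```python
from urllib.parse import urlsplit, parse_qsl

def parameter_count(url):
    # parse_qsl returns one (name, value) pair per parameter,
    # including duplicates.
    return len(parse_qsl(urlsplit(url).query))

url = "http://example.com/search?color=red&size=m&brand=x&sort=price"
print(parameter_count(url))   # 4 -> would be flagged as highly dynamic
```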
If the URL has a question mark (?) in the query, this hint is triggered.
The query starts with a question mark. Using a second question mark in the URL is valid, but you should refrain from doing so, because poorly implemented clients might handle such data incorrectly.
In addition, queries with two question marks often lead to technical problems when building URLs.
We suggest evaluating the reasons for using more than one question mark in a query. Even with valid usage you might consider taking a different technical approach that does not produce two question marks.
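The validity mentioned above follows from the fact that only the first question mark separates the path from the query; any further ? is ordinary query data. A quick check with Python's urllib (used here purely for illustration):

```python
from urllib.parse import urlsplit

# Only the first "?" delimits the query; a second one is kept
# verbatim inside the query string.
parts = urlsplit("http://example.com/page?a=1?b=2")
print(parts.path)    # /page
print(parts.query)   # a=1?b=2
```

A poorly implemented client might instead split on every question mark and mangle the query, which is the risk described above.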
If the URL is more than 115 characters long, it is flagged with this hint.
Long URLs are hard to read and often not fully or properly displayed, for example, as part of search result snippets, posts on bulletin boards or on social media websites. If a URL is too long, it may get shortened and not be fully displayed.
115 was the maximum number of characters that Google displayed in their snippets in search results some time ago. Today, it is no longer a fixed number of characters, but instead, Google now uses a pixel length.
If you encounter occurrences of this hint, we suggest that you use shorter URLs so that they display properly.
This may be done by reducing the number of:
Alternatively, this may be done by using IDs instead of speaking URLs, which also reduces the number of characters.
If the URL contains characters which are non-ASCII or non-lowercase, the URL is flagged with this hint. This hint has been split into two distinct hints as of Audisto Crawler version 0.9.92 and is therefore no longer triggered. See the other hints for more detailed insights.
If the URL contains non-lowercase characters, it is flagged with this hint. Use this report to identify all occurrences of non-lowercase characters in the path of a URL.
URLs that contain non-lowercase elements are often a source of errors. If an application does not expect a non-lowercase URL, it might automatically convert it to all lowercase. Depending on whether the webserver treats URLs as case-sensitive, this might cause issues with duplicate content or accessibility (404 status codes).
We suggest that you stick to lowercase characters in paths.
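When normalizing, take care to lowercase only the path, since query values may carry case-sensitive data. A sketch in Python (the helper name is ours):

```python
from urllib.parse import urlsplit, urlunsplit

def lowercase_path(url):
    # Lowercase the path only; query and fragment may contain
    # case-sensitive data and are left untouched.
    s = urlsplit(url)
    return urlunsplit((s.scheme, s.netloc, s.path.lower(), s.query, s.fragment))

print(lowercase_path("http://example.com/Products/Shoes?id=AbC"))
# http://example.com/products/shoes?id=AbC
```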
If the URL contains a query string, it is flagged with this hint. This report identifies all URLs that contain a query string.
Query strings usually contain dynamic name/value pairs that might affect the content returned. Use cases for parameters are:
We suggest that you use this report to get an overview of GET parameter usage on the crawled website. We also suggest that you keep the number of parameters that you use to a minimum.
If the URL contains repeating elements like /foo/foo/ or /foo/bar/foo/bar/ in the path, it triggers this hint. Use this report to identify all occurrences of repeating elements in URLs.
This hint does not find the pattern:
Repetitive path segments can indicate issues with relative URLs as well as a poor folder structure.
We suggest that you re-evaluate your folder structure based on the URLs shown in this report. Try to avoid repeating folder names in different hierarchic levels if possible.
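Detection of patterns like /foo/foo/ or /foo/bar/foo/bar/ can be sketched in Python by looking for immediately repeated runs of path segments; note that this is our illustrative approximation, not necessarily the crawler's exact detection logic:

```python
from urllib.parse import urlsplit

def has_repeating_segments(url):
    # Split the path into segments and look for any run of segments
    # that is immediately followed by an identical run.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    n = len(segments)
    for width in range(1, n // 2 + 1):
        for start in range(n - 2 * width + 1):
            if segments[start:start + width] == segments[start + width:start + 2 * width]:
                return True
    return False

print(has_repeating_segments("http://example.com/foo/foo/page.html"))  # True
print(has_repeating_segments("http://example.com/foo/bar/foo/bar/"))   # True
print(has_repeating_segments("http://example.com/foo/bar/baz/"))       # False
```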
If the URL contains repeating parameters like ?a=1&a=1 or ?b=1&b=2 in the query, it triggers this hint. Use this report to identify all occurrences of directly linked URLs with repetitive parameters in the query.
While repetitive parameters do not make a URL invalid, software might handle these kinds of URLs in different ways, depending on the implementation. Sometimes repetitive parameters are consolidated by name. This might lead to a loss of information if values differ. If values do not differ, it might be safe to consolidate parameters by name.
Repetitive parameters also create unnecessarily long URLs and might indicate an issue with the software that generates the GET parameters.
We suggest that you check for reasons that might cause repetitive usage of GET parameters and fix them to avoid unexpected behavior of software parsing or handling the URL.
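Grouping query values by name makes such duplicates visible; a Python sketch with a hypothetical helper name:

```python
from urllib.parse import urlsplit, parse_qsl

def duplicate_parameters(url):
    # Group all values by parameter name; names with more than one
    # value are repeated parameters.
    grouped = {}
    for name, value in parse_qsl(urlsplit(url).query):
        grouped.setdefault(name, []).append(value)
    return {n: v for n, v in grouped.items() if len(v) > 1}

print(duplicate_parameters("http://example.com/p?a=1&a=1&b=2"))
# {'a': ['1', '1']}  -> values match, so consolidating by name is safe
```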
If the URL contains repeating parameters in the query, and the values are different, like ?a=1&a=2, it triggers this hint. Use this report to identify all occurrences of directly linked URLs with repetitive query parameters that have differing values.
While repetitive parameters do not make a URL invalid, software might handle these kinds of URLs in different ways, depending on the implementation. If repetitive parameters with differing values are consolidated by name, this leads to a loss of information. If the values differ for repetitive parameters, it might also indicate a problem with the logic of the software that generates the URLs.
Repetitive parameters create unnecessarily long URLs and might indicate issues with the software that generates the GET parameters.
We suggest that you check for reasons that might cause the usage of repetitive parameters with different values and fix the underlying issues.
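The lossy case, repeated names with differing values, can be isolated by collecting values into sets; again a Python sketch with a hypothetical helper name:

```python
from urllib.parse import urlsplit, parse_qsl

def conflicting_parameters(url):
    # Collect distinct values per name; more than one distinct value
    # means consolidating by name would lose information.
    grouped = {}
    for name, value in parse_qsl(urlsplit(url).query):
        grouped.setdefault(name, set()).add(value)
    return {n: sorted(v) for n, v in grouped.items() if len(v) > 1}

print(conflicting_parameters("http://example.com/p?a=1&a=2&b=1&b=1"))
# {'a': ['1', '2']}  -> "b" repeats with equal values and is not reported
```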
If a URL longer than 2000 characters is encountered, it is flagged with this hint.
Long URLs are often generated dynamically in scenarios like:
Long URLs might cause problems: some browsers are unable to handle URLs of this length, and some web applications might not be able to resolve them and/or might shorten them automatically, causing issues with access to these URLs.
While in theory there is no limit on the length of a URL, you should stay below 2000 characters so that URLs remain accessible to a large number of clients and web applications.
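A check against this practical limit is trivial to sketch (the constant and helper name are ours, not a specification):

```python
MAX_URL_LENGTH = 2000  # practical limit discussed above, not a spec limit

def exceeds_practical_limit(url):
    # len() counts characters; for a stricter check, measure the
    # percent-encoded byte length instead.
    return len(url) > MAX_URL_LENGTH

long_url = "http://example.com/search?" + "&".join(f"f{i}=v" for i in range(500))
print(exceeds_practical_limit(long_url))              # True
print(exceeds_practical_limit("http://example.com/")) # False
```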