The URL structure can be a deciding factor for the success of an SEO campaign on a website. Poorly formed URLs can cause issues with crawling, duplicate content and user experience.
This hints section helps identify common issues with the URLs of a website.
Example: Audisto URL Check with the URL hint reports for the current crawl
Here is the list of all specific hints related to URLs that can be identified with the help of the Audisto Crawler.
If the URL contains escaped characters (non-ASCII) in the path, it is flagged with this hint. Escaped characters are detected by looking for a % character in the path. Use this report to find all URLs with non-ASCII characters in the path.
|Displayed URL||Properly Escaped URL|
If non-ASCII characters are used within the path of a URL, web clients will URL-encode (escape) them. RFC 3986 states that non-ASCII characters must first be encoded according to UTF-8 and then percent-encoded. Unexpected issues may arise if a web client does not follow this encoding procedure correctly.
This often happens in systems that don't use UTF-8 by default.
We suggest evaluating whether URLs were escaped correctly. We also suggest avoiding non-ASCII characters in URLs where possible.
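The encoding procedure described above (UTF-8 first, then percent-encoding) can be sketched with Python's standard library. This is only an illustration, not how the Audisto Crawler works internally; the helper names `escape_path` and `is_valid_utf8_escape` are made up for this example.

```python
from urllib.parse import quote, unquote


def escape_path(path: str) -> str:
    """Encode the path as UTF-8, then percent-encode every unsafe byte."""
    # "/" stays literal so the segment structure of the path survives.
    return quote(path, safe="/", encoding="utf-8")


def is_valid_utf8_escape(path: str) -> bool:
    """Return True if the percent-escapes in the path decode as valid UTF-8."""
    try:
        unquote(path, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


print(escape_path("/café"))                # /caf%C3%A9
print(is_valid_utf8_escape("/caf%C3%A9"))  # True
print(is_valid_utf8_escape("/caf%E9"))     # False, %E9 is Latin-1, not UTF-8
```

A path such as `/caf%E9` is what a system that defaults to Latin-1 instead of UTF-8 would typically produce, which is why the second check can help when evaluating flagged URLs.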
If the URL has an ampersand (&) in the path, this hint is triggered.
An ampersand is a reserved character that is distinguishable from other data within a URL. It can be used as a delimiter. If it is not used as a delimiter, it needs to be URL-encoded as "%26".
Since ampersands are usually used in the query part of a URL, an ampersand in the path may indicate problems in building a correct URL.
Evaluate all occurrences of ampersand in the path. If the ampersand is not intended to be a delimiter, use proper URL-encoding.
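As a minimal sketch of that evaluation, the following Python snippet checks whether an ampersand appears in the path and shows how a single path segment could be encoded when the ampersand is meant as data. The function names and the example URLs are assumptions made for illustration.

```python
from urllib.parse import quote, urlsplit


def has_ampersand_in_path(url: str) -> bool:
    """Return True if the path part of the URL contains a literal '&'."""
    return "&" in urlsplit(url).path


def encode_segment(segment: str) -> str:
    """Percent-encode one path segment, including '&' and '/'."""
    # safe="" means no reserved character is left unencoded.
    return quote(segment, safe="")


print(has_ampersand_in_path("http://example.com/fish&chips?a=1&b=2"))  # True
print(has_ampersand_in_path("http://example.com/shop?a=1&b=2"))        # False
print(encode_segment("fish&chips"))                                    # fish%26chips
```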
If the URL contains two consecutive slashes, it is flagged with this hint.
Two consecutive slashes are valid but usually not wanted in a URL. Any occurrence might indicate issues with relative linking and/or the URL base. This may lead to duplicate content if the CMS delivers the same content for both variants, e.g. for http://example.com//page.html and http://example.com/page.html.
We suggest not using consecutive slashes. Analyze all occurrences of consecutive slashes and fix the reason why they occur.
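To spot pairs of URLs that differ only by consecutive slashes, a small Python sketch like the following may help. It assumes that only the path is inspected (the `//` after the scheme is ignored), and the function names are hypothetical. Collapsing slashes here is meant for comparing variants, not as a replacement for fixing the links that produce them.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def has_consecutive_slashes(url: str) -> bool:
    """Return True if the path contains two or more slashes in a row."""
    return "//" in urlsplit(url).path


def collapse_slashes(url: str) -> str:
    """Return the URL with every run of slashes in the path reduced to one."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


print(has_consecutive_slashes("http://example.com//page.html"))  # True
print(collapse_slashes("http://example.com//page.html"))
# http://example.com/page.html
```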
If the URL contains a query string with 3 or more parameters, it is flagged with this hint.
URLs with this many parameters are considered to be highly dynamic. Highly dynamic URLs often indicate poor quality to search engines, e.g. URLs for filter combinations in faceted search. Because the same parameters can appear in any order, the number of distinct URLs can become very high. The content shown will usually be similar or duplicate content.
The result can be crawl budget issues as well as serious issues with duplicate content.
Use this report to find highly dynamic URLs on the crawled website. You might want to reduce the number of parameters within URLs.
Consider reducing the number of URLs with GET parameters by using the Post/Redirect/Get (PRG) pattern.
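The effect of parameter order can be illustrated with a short Python sketch that counts parameters and rewrites the query into one sorted order. The function names and the example shop URLs are assumptions; sorting is just one possible convention for a canonical order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def parameter_count(url: str) -> int:
    """Count the name/value pairs in the query string."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def canonical_parameter_order(url: str) -> str:
    """Return the URL with its query parameters sorted by name and value."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


first = "http://example.com/shop?color=red&size=m&brand=acme"
second = "http://example.com/shop?size=m&brand=acme&color=red"

print(parameter_count(first))  # 3
print(canonical_parameter_order(first) == canonical_parameter_order(second))  # True
```

Three parameters can already be written in six different orders, so generating links in one fixed order removes a whole class of duplicate URLs.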
If the URL has a question mark (?) in the query, this hint is triggered.
The query starts with a question mark. Using a second question mark in the URL is valid, but you may want to avoid it because poorly implemented clients might handle such data incorrectly.
In addition, queries with two question marks often indicate technical problems in building URLs.
We suggest evaluating the reasons for using more than one question mark in a query. Even with valid usage you might consider taking a different technical approach that does not produce two question marks.
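A check along these lines can be sketched in Python. Standard URL parsing splits at the first question mark, so any further question mark ends up inside the query. The function names are hypothetical, and encoding the value as `%3F` is shown only as one possible alternative approach.

```python
from urllib.parse import quote, urlsplit


def has_question_mark_in_query(url: str) -> bool:
    """Return True if the query itself contains another '?'."""
    return "?" in urlsplit(url).query


def encode_value(value: str) -> str:
    """Percent-encode a parameter value so that '?' becomes '%3F'."""
    return quote(value, safe="")


print(has_question_mark_in_query("http://example.com/page?a=1?b=2"))  # True
print(has_question_mark_in_query("http://example.com/page?a=1&b=2"))  # False
print(encode_value("what?"))                                          # what%3F
```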
If the URL is more than 115 characters long, it is flagged with this hint.
Long URLs are hard to read and are often not displayed fully or properly, e.g. in search result snippets, bulletin board posts or on social media websites. A URL that is too long might be truncated.
115 was the maximum number of characters that Google displayed in its search result snippets some time ago. Google no longer uses a fixed number of characters but a pixel length instead.
If you encounter occurrences of this hint, we suggest using shorter URLs so they can be displayed properly.
This may be done by reducing the number of:
If the URL contains characters that are non-ASCII or non-lowercase, the URL is flagged with this hint. This hint has been split into two distinct hints as of Audisto version 0.9.92 and is therefore no longer triggered. See the other hints for more advanced insights.
If the URL contains non-lowercase characters, it is flagged with this hint. Use this report to identify all occurrences of non-lowercase characters in the path of a URL.
URLs that contain non-lowercase elements are often a source of errors. If an application does not expect a non-lowercase URL, it might automatically convert it to all lowercase. This might cause issues with duplicate content or accessibility (404 status codes), depending on whether the web server handles URLs case-sensitively.
We suggest sticking to lowercase characters in paths.
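A rough Python sketch of such a check is shown below. It looks at the path only and ignores the hex digits of percent-escapes such as `%C3%A9`; both choices are assumptions for this example and may differ from what the crawler does. The function name is hypothetical.

```python
import re
from urllib.parse import urlsplit


def has_non_lowercase_path(url: str) -> bool:
    """Return True if the path contains characters that are not lowercase."""
    path = urlsplit(url).path
    # Drop percent-escapes first so their hex digits do not count as uppercase.
    path = re.sub(r"%[0-9A-Fa-f]{2}", "", path)
    return path != path.lower()


print(has_non_lowercase_path("http://example.com/About-Us"))    # True
print(has_non_lowercase_path("http://EXAMPLE.com/about-us"))    # False, host is ignored
print(has_non_lowercase_path("http://example.com/caf%C3%A9"))   # False
```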
If the URL contains a query string, it is flagged with this hint. This report identifies all URLs that contain a query string.
Query strings usually contain dynamic name/value pairs that might affect the content returned. Use cases for parameters are:
We suggest using this report to get an overview of GET parameter usage on the crawled website. We also suggest keeping the number of parameters used to a minimum.
If the URL contains repeating elements like /foo/foo/ or /foo/bar/foo/bar/ in the path, it triggers this hint. Use this report to identify all occurrences of repeating elements in URLs.
This hint does not find the pattern:
Repetitive path segments can be a sign of issues with relative URLs as well as of a poor folder structure.
We suggest re-evaluating your folder structure based on the URLs shown in this report. Try to avoid repeating folder names at different levels of the hierarchy if possible.
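One way to look for patterns like /foo/foo/ or /foo/bar/foo/bar/ in your own URL lists is sketched below in Python. It assumes that "repeating" means a run of one or more path segments that is immediately followed by the same run; the exact rule used by the crawler is not spelled out here, so treat this definition and the function name as assumptions.

```python
from urllib.parse import urlsplit


def has_repeating_segments(url: str) -> bool:
    """Return True if a run of path segments is directly followed by itself."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    count = len(segments)
    # Try every run length that could fit twice into the path.
    for size in range(1, count // 2 + 1):
        for start in range(count - 2 * size + 1):
            first = segments[start:start + size]
            second = segments[start + size:start + 2 * size]
            if first == second:
                return True
    return False


print(has_repeating_segments("http://example.com/foo/foo/"))          # True
print(has_repeating_segments("http://example.com/foo/bar/foo/bar/"))  # True
print(has_repeating_segments("http://example.com/foo/bar/baz/"))      # False
```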
If the URL contains repeating parameters like ?a=1&a=1 or ?b=1&b=2 in the query, it triggers this hint. Use this report to identify all occurrences of directly linked URLs with repetitive parameters in the query.
While repetitive parameters do not make a URL invalid, software might handle such URLs in different ways, depending on the implementation. Sometimes repetitive parameters are consolidated by name. This might lead to a loss of information if the values differ. If the values do not differ, it might be safe to consolidate parameters by name.
Repetitive parameters also create unnecessarily long URLs and might indicate an issue with the software that generates the GET-parameters.
We suggest checking for reasons that might cause repetitive usage of GET parameters and fixing them to avoid unexpected behaviour of software parsing or handling the URL.
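To see which parameter names repeat in a given URL, a minimal Python sketch could group the values by name. The function name `repeated_parameters` is an assumption made for this example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def repeated_parameters(url: str) -> dict[str, list[str]]:
    """Map each parameter name that occurs more than once to its values."""
    values = defaultdict(list)
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        values[name].append(value)
    return {name: vals for name, vals in values.items() if len(vals) > 1}


print(repeated_parameters("http://example.com/?a=1&a=1"))  # {'a': ['1', '1']}
print(repeated_parameters("http://example.com/?b=1&b=2"))  # {'b': ['1', '2']}
print(repeated_parameters("http://example.com/?a=1&b=2"))  # {}
```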
If the URL contains repeating parameters in the query and the values are different, like ?a=1&a=2, it triggers this hint. Use this report to identify all occurrences of directly linked URLs with repetitive query parameters that have differing values.
While repetitive parameters do not make a URL invalid, software might handle such URLs in different ways, depending on the implementation. If repetitive parameters with differing values are consolidated by name, information is lost. Differing values for repetitive parameters might also indicate a problem with the logic of the software that generates the URLs.
Repetitive parameters create unnecessarily long URLs and might indicate issues with the software that generates the GET-parameters.
We suggest checking for reasons that might cause the usage of repetitive parameters with different values and fixing the underlying issues.
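The loss of information mentioned above can be demonstrated with a short Python sketch. Here consolidation by name is modelled as "the last value wins", which is only one of several behaviours real software might show; the function names are hypothetical.

```python
from urllib.parse import parse_qs, parse_qsl, urlsplit


def has_conflicting_parameters(url: str) -> bool:
    """Return True if any parameter name occurs with differing values."""
    grouped = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(len(set(values)) > 1 for values in grouped.values())


def consolidate_last_wins(url: str) -> dict[str, str]:
    """Consolidate parameters by name, keeping only the last value."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


print(has_conflicting_parameters("http://example.com/?a=1&a=2"))  # True
print(has_conflicting_parameters("http://example.com/?a=1&a=1"))  # False
print(consolidate_last_wins("http://example.com/?a=1&a=2"))       # {'a': '2'}
```

In the last call the value `1` disappears silently, which is the kind of unexpected behaviour the hint is meant to surface.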
If a URL longer than 2000 characters is encountered, it is flagged with this hint.
Long URLs are often generated dynamically in scenarios like:
Long URLs might cause problems.
Some browsers are unable to handle URLs of this length. Some web applications might not be able to resolve the URLs and/or shorten them automatically, causing issues with access to these URLs.
While theoretically there is no limit on the length of a URL, you should stay below 2000 characters so that URLs remain accessible to a large number of clients and web applications.
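The two length thresholds used in this section (115 and 2000 characters) can be checked with a few lines of Python. The constant names and the returned labels are made up for this sketch and are not the identifiers used by Audisto.

```python
SNIPPET_LIMIT = 115   # threshold of the "more than 115 characters" hint
CLIENT_LIMIT = 2000   # threshold of the "longer than 2000 characters" hint


def length_hints(url: str) -> list[str]:
    """Return hypothetical labels for the length thresholds the URL exceeds."""
    hints = []
    if len(url) > SNIPPET_LIMIT:
        hints.append("longer_than_115")
    if len(url) > CLIENT_LIMIT:
        hints.append("longer_than_2000")
    return hints


print(length_hints("http://example.com/page.html"))        # []
print(length_hints("http://example.com/" + "a" * 100))     # ['longer_than_115']
print(length_hints("http://example.com/" + "a" * 2000))
# ['longer_than_115', 'longer_than_2000']
```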