What characters need to be encoded and why?
ASCII control characters
Why: These characters are not printable.
Characters: The ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F hex (127 decimal).
Non-ASCII characters
Why: These are by definition not legal in URLs since they are not in the ASCII set.
Characters: The entire "top half" of the ISO-Latin set, 80-FF hex (128-255 decimal).
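The two ranges above can be checked directly against a character's code point. The following is a minimal sketch; the function names `is_control` and `is_non_ascii` are hypothetical and chosen only for illustration.

```python
def is_control(code_point: int) -> bool:
    """True for the ASCII control ranges 00-1F hex and 7F hex."""
    return 0x00 <= code_point <= 0x1F or code_point == 0x7F


def is_non_ascii(code_point: int) -> bool:
    """True for the ISO-Latin "top half", 80-FF hex."""
    return 0x80 <= code_point <= 0xFF


# A line feed (0A hex) is a control character; a space (20 hex) is not.
print(is_control(ord("\n")), is_control(ord(" ")))
# A code point of E9 hex falls in the top half; 7F hex does not.
print(is_non_ascii(0xE9), is_non_ascii(0x7F))
```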
"Reserved characters"
Why: URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.
Characters:
"Unsafe characters"
Why: Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.
Characters:
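As an illustration of how the "reserved" and "unsafe" categories might be applied, here is a small sketch. The two character sets are an assumption: they follow the wording of RFC 1738, section 2.2, and may not match the lists this page originally gave. The names `RESERVED`, `UNSAFE`, and `category` are hypothetical.

```python
from typing import Optional

# Assumption: sets taken from RFC 1738, section 2.2, not from this page.
RESERVED = set(";/?:@=&")
UNSAFE = set(' <>"#%{}|\\^~[]`')


def category(ch: str) -> Optional[str]:
    """Return "reserved", "unsafe", or None for a single character."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNSAFE:
        return "unsafe"
    return None


for ch in ["?", " ", "%", "a"]:
    print(repr(ch), category(ch))
```

A reserved character only needs encoding when it is not serving its syntactic role, so a real encoder would also need to know which part of the URL the character appears in; this sketch does not attempt that.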
How are characters URL encoded?
URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.
Example:
- Space = decimal code point 32 in the ISO-Latin set.
- 32 decimal = 20 in hexadecimal.
- The URL encoded representation will be "%20".
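The steps in the example can be sketched in a few lines of Python. This is an illustrative helper, not a complete URL encoder: the name `percent_encode_char` is hypothetical, and it assumes the character exists in the ISO-Latin (ISO-8859-1) set, as the description above does.

```python
def percent_encode_char(ch: str) -> str:
    """Percent-encode one character using its ISO-Latin code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # encode() raises UnicodeEncodeError for characters outside ISO-Latin.
    code_point = ch.encode("latin-1")[0]
    # "%" followed by the two-digit hexadecimal value (uppercase here;
    # decoders treat the hex digits as case-insensitive).
    return "%{:02X}".format(code_point)


print(percent_encode_char(" "))  # %20
```

Python's standard library offers `urllib.parse.quote` for whole strings. It encodes as UTF-8 by default, so matching the ISO-Latin behaviour described here would require passing `encoding="latin-1"`.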
Reference: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm