Friday, May 16, 2014

Why URL encoding

What characters need to be encoded and why?

ASCII Control characters
     Why: These characters are not printable.
     Characters: Includes the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F hex (127 decimal).
Non-ASCII characters
     Why: These are by definition not legal in URLs since they are not in the ASCII set.
     Characters: Includes the entire "top half" of the ISO-Latin set, 80-FF hex (128-255 decimal).
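
As a rough illustration of the two ranges above, the check might be sketched in Python like this. The function name is_control_or_non_ascii is made up for this example, and it assumes a single character is passed in.

```python
def is_control_or_non_ascii(ch: str) -> bool:
    """Return True if ch falls in 00-1F, 7F, or 80 and above (hex)."""
    code_point = ord(ch)
    # ASCII control characters: 00-1F hex plus 7F hex (DEL).
    is_control = code_point <= 0x1F or code_point == 0x7F
    # Non-ASCII characters: everything from 80 hex upward.
    is_non_ascii = code_point >= 0x80
    return is_control or is_non_ascii


for sample in ["\t", "A", "\x7f", "\xe9"]:
    print(repr(sample), is_control_or_non_ascii(sample))
```
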
"Reserved characters"
     Why: URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.
     Characters:
     Character                        Hex   Dec
     Dollar ("$")                     24    36
     Ampersand ("&")                  26    38
     Plus ("+")                       2B    43
     Comma (",")                      2C    44
     Forward slash/Virgule ("/")      2F    47
     Colon (":")                      3A    58
     Semi-colon (";")                 3B    59
     Equals ("=")                     3D    61
     Question mark ("?")              3F    63
     'At' symbol ("@")                40    64
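
To see what this looks like in practice, here is a small sketch using quote() from Python's standard urllib.parse module. The helper name encode_value and the sample string are assumptions made for this example. Passing safe="" asks quote() to escape every reserved character listed above, including "/".

```python
from urllib.parse import quote


def encode_value(value: str) -> str:
    """Percent-encode a value so reserved characters lose their special role."""
    # safe="" means no reserved character is left unescaped.
    return quote(value, safe="")


# A value that is meant as data, not as URL syntax.
print(encode_value("Tom & Jerry/1+1=2?"))
# Tom%20%26%20Jerry%2F1%2B1%3D2%3F
```

This is how a value would typically be prepared before being placed into a query string, so that "&", "=" and "?" inside the value are not mistaken for separators.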
"Unsafe characters"
     Why: Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.
     Characters:
     Character                        Hex   Dec   Why encode?
     Space                            20    32    Significant sequences of spaces may be lost in some uses (especially multiple spaces).
     Quotation marks                  22    34    These characters are often used to delimit URLs in plain text.
     'Less Than' symbol ("<")         3C    60    (same as above)
     'Greater Than' symbol (">")      3E    62    (same as above)
     'Pound' character ("#")          23    35    This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
     Percent character ("%")          25    37    This is used to URL encode/escape other characters, so it should itself also be encoded.
     Misc. characters:                            Some systems can possibly modify these characters.
        Left Curly Brace ("{")        7B    123
        Right Curly Brace ("}")       7D    125
        Vertical Bar/Pipe ("|")       7C    124
        Backslash ("\")               5C    92
        Caret ("^")                   5E    94
        Tilde ("~")                   7E    126
        Left Square Bracket ("[")     5B    91
        Right Square Bracket ("]")    5D    93
        Grave Accent ("`")            60    96
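
A hand-rolled sketch of escaping just the unsafe characters listed above could look like the following. The names UNSAFE and escape_unsafe are made up for this example, and the set is copied from the list above. A library encoder may treat some of these characters differently, so this is only an illustration.

```python
# The unsafe characters from the list above: space, quotation mark,
# <, >, #, %, {, }, |, backslash, ^, ~, [, ] and the grave accent.
UNSAFE = ' "<>#%{}|\\^~[]`'


def escape_unsafe(text: str) -> str:
    """Replace each unsafe character with "%" plus its two-digit hex code."""
    return "".join(
        f"%{ord(ch):02X}" if ch in UNSAFE else ch
        for ch in text
    )


print(escape_unsafe('say "hi" <now>'))  # say%20%22hi%22%20%3Cnow%3E
print(escape_unsafe("100%"))            # 100%25
```
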


How are characters URL encoded?
URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.
Example
  • Space = decimal code point 32 in the ISO-Latin set.
  • 32 decimal = 20 in hexadecimal
  • The URL encoded representation will be "%20" 
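
The same steps can be sketched in Python. The function name url_encode_char is an assumption for this example, and it expects a single character from the ISO-Latin set. The last line uses unquote() from the standard urllib.parse module to show that the hex digits are case-insensitive when decoding.

```python
from urllib.parse import unquote


def url_encode_char(ch: str) -> str:
    """Return "%" followed by the two-digit hex ISO-Latin code point of ch."""
    # Encoding to latin-1 yields the single byte that is the ISO-Latin code
    # point; it raises UnicodeEncodeError for characters outside that set.
    code_point = ch.encode("latin-1")[0]
    return f"%{code_point:02X}"


print(url_encode_char(" "))              # %20
print(unquote("%2f") == unquote("%2F"))  # True
```
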
 
Reference: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
