HTML Charsets
To display an HTML page correctly, a web browser must know which character set (character encoding) the page uses.
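A few lines of Python show why this matters. The sketch below involves no browser. It saves a word as UTF-8, then decodes the same bytes as Windows-1252. That roughly mimics what happens when a browser uses the wrong charset for a page:

```python
# Save a short string as UTF-8 bytes, then read those bytes back two ways.
text = "café"
raw = text.encode("utf-8")    # "é" becomes the two bytes 0xC3 0xA9

right = raw.decode("utf-8")   # the charset the bytes were actually saved in
wrong = raw.decode("cp1252")  # a mismatched charset (Windows-1252) reads each byte as its own character

print(right)  # café
print(wrong)  # cafÃ©
```

The garbled "Ã©" is the classic symptom of a missing or mismatched charset declaration.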
The HTML charset Attribute
The character set is specified in the <meta> tag:
Example
<meta charset="UTF-8">
The HTML5 specification encourages web developers to use the UTF-8 character set.
UTF-8 covers almost every character and symbol in the world!
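The <meta> declaration only helps if the file is actually saved in the charset it names. The Python sketch below builds a small page that declares UTF-8, encodes it as UTF-8, and then reads the declaration back. The helper names build_page and declared_charset are hypothetical. The regular expression is a deliberate simplification and is not the detection algorithm that browsers use:

```python
import html
import re
from typing import Optional

# Simplified pattern for the HTML5 form <meta charset="...">.
# Sketch only: browsers run the full encoding-sniffing algorithm from the
# HTML specification, which handles many more cases than this.
META_CHARSET = re.compile(rb"""<meta\s+charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def declared_charset(document: bytes) -> Optional[str]:
    """Return the charset named in a <meta charset> tag, or None if absent."""
    # The declaration has to appear within the first 1024 bytes of the file.
    match = META_CHARSET.search(document[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def build_page(title: str, body: str) -> bytes:
    """Build a minimal HTML5 page and encode it in the charset it declares."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="UTF-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"<p>{html.escape(body)}</p>\n"
        "</body>\n"
        "</html>\n"
    )
    return page.encode("utf-8")  # must match the <meta charset> above


page = build_page("Charsets", "Grüße, 你好, ¡hola!")
print(declared_charset(page))  # utf-8
```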
The ASCII Character Set
ASCII was the first character encoding standard for the internet. It defined 128 different characters that could be used on the web:
- English letters (A-Z)
- Numbers (0-9)
- Special characters like ! $ + - ( ) @ < >
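Each of these 128 characters has a fixed number from 0 to 127. The short Python sketch below prints a few of those numbers and checks whether a piece of text fits in ASCII. is_ascii is an illustrative helper, not a standard function:

```python
def is_ascii(text: str) -> bool:
    """Return True if every character has an ASCII code (0 to 127)."""
    return all(ord(ch) < 128 for ch in text)


print(ord("A"), ord("a"), ord("0"), ord("@"))  # 65 97 48 64
print(is_ascii("Total: $5 (approx.)"))         # True
print(is_ascii("café"))                        # False

# Python's built-in "ascii" codec refuses anything outside the 128 codes.
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.object[err.start])  # not ASCII: é
```

In Python 3.7 and later, the built-in str.isascii() method performs the same check.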
The ANSI Character Set
ANSI (Windows-1252) was the original Windows character set:
- Identical to ASCII for the first 128 characters (0 to 127)
- Special characters from 128 to 159
- Identical to UTF-8 from 160 to 255
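Python includes a codec for Windows-1252 named cp1252, so listing the extra characters in the 128 to 159 range is straightforward. The sketch below does this. ansi_extras is an illustrative name:

```python
def ansi_extras() -> dict[int, str]:
    """Map each code from 128 to 159 to its Windows-1252 character."""
    extras = {}
    for code in range(128, 160):
        try:
            extras[code] = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # this code is unassigned in Python's cp1252 codec
    return extras


extras = ansi_extras()
print(len(extras))                                 # 27
print(extras[128], extras[149], extras[153])       # € • ™
print(sorted(set(range(128, 160)) - set(extras)))  # [129, 141, 143, 144, 157]
```

The five missing codes match the ones marked NOT USED in the comparison table.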
The ISO-8859-1 Character Set
ISO-8859-1 was the default character set for HTML 4. This character set supported 256 different character codes. HTML 4 also supported UTF-8.
- Identical to ASCII for the first 128 characters (0 to 127)
- Does not use the characters from 128 to 159
- Identical to ANSI and UTF-8 from 160 to 255
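The three ranges above can be checked with Python's built-in codecs. This is a sketch that assumes Python's "latin-1" codec as a stand-in for ISO-8859-1 and "cp1252" as a stand-in for ANSI:

```python
# Python's "latin-1" codec is ISO-8859-1: byte n always decodes to code point n.
latin1 = {n: bytes([n]).decode("latin-1") for n in range(256)}
# Windows-1252 for comparison; errors="replace" covers its five unassigned codes.
ansi = {n: bytes([n]).decode("cp1252", errors="replace") for n in range(256)}

# 0-127: both sets match ASCII.
print(all(latin1[n] == ansi[n] == chr(n) for n in range(128)))       # True
# 128-159: ISO-8859-1 only has invisible control codes here.
print(all(not latin1[n].isprintable() for n in range(128, 160)))     # True
# 160-255: ISO-8859-1, Windows-1252 and Unicode (UTF-8) all agree.
print(all(latin1[n] == ansi[n] == chr(n) for n in range(160, 256)))  # True
```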
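HTML 4 Example
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
HTML 5 Example
<meta charset="ISO-8859-1">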
The UTF-8 Character Set
- Identical to ASCII for the values from 0 to 127
- Does not use the characters from 128 to 159
- Identical to ANSI and ISO-8859-1 from 160 to 255
- Continues from 256 up to 1,114,111 (U+10FFFF), the highest Unicode code point
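UTF-8 reaches far beyond 255 by spending more bytes on higher numbers. The hand-written encoder below follows the standard UTF-8 bit layout of 1 to 4 bytes per character and checks its output against Python's built-in codec. The function name utf8_encode is illustrative only, and in practice you would simply call str.encode("utf-8"):

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point using the UTF-8 bit layout."""
    if not (0 <= code_point <= 0x10FFFF) or (0xD800 <= code_point <= 0xDFFF):
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:      # 0xxxxxxx: ASCII range, 1 byte
        return bytes([code_point])
    if code_point < 0x800:     # 110xxxxx 10xxxxxx: 2 bytes
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx: 3 bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 4 bytes, up to U+10FFFF
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Aé€😀":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's built-in codec
    print(ch, ord(ch), len(encoded), encoded.hex(" "))
# A 65 1 41
# é 233 2 c3 a9
# € 8364 3 e2 82 ac
# 😀 128512 4 f0 9f 98 80
```

Because every ASCII character (0 to 127) is still a single byte, plain ASCII text is automatically valid UTF-8.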
Differences Between Character Sets
The following table displays the differences between the character sets described above:
Number | ASCII | ANSI | ISO-8859-1 | UTF-8 | Description |
---|---|---|---|---|---|
32 | SP | SP | SP | SP | space |
33 | ! | ! | ! | ! | exclamation mark |
34 | " | " | " | " | quotation mark |
35 | # | # | # | # | number sign |
36 | $ | $ | $ | $ | dollar sign |
37 | % | % | % | % | percent sign |
38 | & | & | & | & | ampersand |
39 | ' | ' | ' | ' | apostrophe |
40 | ( | ( | ( | ( | left parenthesis |
41 | ) | ) | ) | ) | right parenthesis |
42 | * | * | * | * | asterisk |
43 | + | + | + | + | plus sign |
44 | , | , | , | , | comma |
45 | - | - | - | - | hyphen-minus |
46 | . | . | . | . | full stop |
47 | / | / | / | / | solidus |
48 | 0 | 0 | 0 | 0 | digit zero |
49 | 1 | 1 | 1 | 1 | digit one |
50 | 2 | 2 | 2 | 2 | digit two |
51 | 3 | 3 | 3 | 3 | digit three |
52 | 4 | 4 | 4 | 4 | digit four |
53 | 5 | 5 | 5 | 5 | digit five |
54 | 6 | 6 | 6 | 6 | digit six |
55 | 7 | 7 | 7 | 7 | digit seven |
56 | 8 | 8 | 8 | 8 | digit eight |
57 | 9 | 9 | 9 | 9 | digit nine |
58 | : | : | : | : | colon |
59 | ; | ; | ; | ; | semicolon |
60 | < | < | < | < | less than |
61 | = | = | = | = | equals sign |
62 | > | > | > | > | greater than |
63 | ? | ? | ? | ? | question mark |
64 | @ | @ | @ | @ | commercial at |
65 | A | A | A | A | Latin A |
66 | B | B | B | B | Latin B |
67 | C | C | C | C | Latin C |
68 | D | D | D | D | Latin D |
69 | E | E | E | E | Latin E |
70 | F | F | F | F | Latin F |
71 | G | G | G | G | Latin G |
72 | H | H | H | H | Latin H |
73 | I | I | I | I | Latin I |
74 | J | J | J | J | Latin J |
75 | K | K | K | K | Latin K |
76 | L | L | L | L | Latin L |
77 | M | M | M | M | Latin M |
78 | N | N | N | N | Latin N |
79 | O | O | O | O | Latin O |
80 | P | P | P | P | Latin P |
81 | Q | Q | Q | Q | Latin Q |
82 | R | R | R | R | Latin R |
83 | S | S | S | S | Latin S |
84 | T | T | T | T | Latin T |
85 | U | U | U | U | Latin U |
86 | V | V | V | V | Latin V |
87 | W | W | W | W | Latin W |
88 | X | X | X | X | Latin X |
89 | Y | Y | Y | Y | Latin Y |
90 | Z | Z | Z | Z | Latin Z |
91 | [ | [ | [ | [ | left square bracket |
92 | \ | \ | \ | \ | reverse solidus |
93 | ] | ] | ] | ] | right square bracket |
94 | ^ | ^ | ^ | ^ | circumflex accent |
95 | _ | _ | _ | _ | low line |
96 | ` | ` | ` | ` | grave accent |
97 | a | a | a | a | Latin small a |
98 | b | b | b | b | Latin small b |
99 | c | c | c | c | Latin small c |
100 | d | d | d | d | Latin small d |
101 | e | e | e | e | Latin small e |
102 | f | f | f | f | Latin small f |
103 | g | g | g | g | Latin small g |
104 | h | h | h | h | Latin small h |
105 | i | i | i | i | Latin small i |
106 | j | j | j | j | Latin small j |
107 | k | k | k | k | Latin small k |
108 | l | l | l | l | Latin small l |
109 | m | m | m | m | Latin small m |
110 | n | n | n | n | Latin small n |
111 | o | o | o | o | Latin small o |
112 | p | p | p | p | Latin small p |
113 | q | q | q | q | Latin small q |
114 | r | r | r | r | Latin small r |
115 | s | s | s | s | Latin small s |
116 | t | t | t | t | Latin small t |
117 | u | u | u | u | Latin small u |
118 | v | v | v | v | Latin small v |
119 | w | w | w | w | Latin small w |
120 | x | x | x | x | Latin small x |
121 | y | y | y | y | Latin small y |
122 | z | z | z | z | Latin small z |
123 | { | { | { | { | left curly bracket |
124 | \| | \| | \| | \| | vertical line |
125 | } | } | } | } | right curly bracket |
126 | ~ | ~ | ~ | ~ | tilde |
127 | DEL | DEL | DEL | DEL | delete |
128 | | € | | | euro sign |
129 | | | | | NOT USED |
130 | | ‚ | | | single low-9 quotation mark |
131 | | ƒ | | | Latin small f with hook |
132 | | „ | | | double low-9 quotation mark |
133 | | … | | | horizontal ellipsis |
134 | | † | | | dagger |
135 | | ‡ | | | double dagger |
136 | | ˆ | | | modifier letter circumflex accent |
137 | | ‰ | | | per mille sign |
138 | | Š | | | Latin S with caron |
139 | | ‹ | | | single left-pointing angle quotation mark |
140 | | Œ | | | Latin capital ligature OE |
141 | | | | | NOT USED |
142 | | Ž | | | Latin Z with caron |
143 | | | | | NOT USED |
144 | | | | | NOT USED |
145 | | ‘ | | | left single quotation mark |
146 | | ’ | | | right single quotation mark |
147 | | “ | | | left double quotation mark |
148 | | ” | | | right double quotation mark |
149 | | • | | | bullet |
150 | | – | | | en dash |
151 | | — | | | em dash |
152 | | ˜ | | | small tilde |
153 | | ™ | | | trade mark sign |
154 | | š | | | Latin small s with caron |
155 | | › | | | single right-pointing angle quotation mark |
156 | | œ | | | Latin small ligature oe |
157 | | | | | NOT USED |
158 | | ž | | | Latin small z with caron |
159 | | Ÿ | | | Latin Y with diaeresis |
160 | | NBSP | NBSP | NBSP | no-break space |
161 | | ¡ | ¡ | ¡ | inverted exclamation mark |
162 | | ¢ | ¢ | ¢ | cent sign |
163 | | £ | £ | £ | pound sign |
164 | | ¤ | ¤ | ¤ | currency sign |
165 | | ¥ | ¥ | ¥ | yen sign |
166 | | ¦ | ¦ | ¦ | broken bar |
167 | | § | § | § | section sign |
168 | | ¨ | ¨ | ¨ | diaeresis |
169 | | © | © | © | copyright sign |
170 | | ª | ª | ª | feminine ordinal indicator |
171 | | « | « | « | left-pointing double angle quotation mark |
172 | | ¬ | ¬ | ¬ | not sign |
173 | | SHY | SHY | SHY | soft hyphen |
174 | | ® | ® | ® | registered sign |
175 | | ¯ | ¯ | ¯ | macron |
176 | | ° | ° | ° | degree sign |
177 | | ± | ± | ± | plus-minus sign |
178 | | ² | ² | ² | superscript two |
179 | | ³ | ³ | ³ | superscript three |
180 | | ´ | ´ | ´ | acute accent |
181 | | µ | µ | µ | micro sign |
182 | | ¶ | ¶ | ¶ | pilcrow sign |
183 | | · | · | · | middle dot |
184 | | ¸ | ¸ | ¸ | cedilla |
185 | | ¹ | ¹ | ¹ | superscript one |
186 | | º | º | º | masculine ordinal indicator |
187 | | » | » | » | right-pointing double angle quotation mark |
188 | | ¼ | ¼ | ¼ | vulgar fraction one quarter |
189 | | ½ | ½ | ½ | vulgar fraction one half |
190 | | ¾ | ¾ | ¾ | vulgar fraction three quarters |
191 | | ¿ | ¿ | ¿ | inverted question mark |
192 | | À | À | À | Latin A with grave |
193 | | Á | Á | Á | Latin A with acute |
194 | | Â | Â | Â | Latin A with circumflex |
195 | | Ã | Ã | Ã | Latin A with tilde |
196 | | Ä | Ä | Ä | Latin A with diaeresis |
197 | | Å | Å | Å | Latin A with ring above |
198 | | Æ | Æ | Æ | Latin AE |
199 | | Ç | Ç | Ç | Latin C with cedilla |
200 | | È | È | È | Latin E with grave |
201 | | É | É | É | Latin E with acute |
202 | | Ê | Ê | Ê | Latin E with circumflex |
203 | | Ë | Ë | Ë | Latin E with diaeresis |
204 | | Ì | Ì | Ì | Latin I with grave |
205 | | Í | Í | Í | Latin I with acute |
206 | | Î | Î | Î | Latin I with circumflex |
207 | | Ï | Ï | Ï | Latin I with diaeresis |
208 | | Ð | Ð | Ð | Latin Eth |
209 | | Ñ | Ñ | Ñ | Latin N with tilde |
210 | | Ò | Ò | Ò | Latin O with grave |
211 | | Ó | Ó | Ó | Latin O with acute |
212 | | Ô | Ô | Ô | Latin O with circumflex |
213 | | Õ | Õ | Õ | Latin O with tilde |
214 | | Ö | Ö | Ö | Latin O with diaeresis |
215 | | × | × | × | multiplication sign |
216 | | Ø | Ø | Ø | Latin O with stroke |
217 | | Ù | Ù | Ù | Latin U with grave |
218 | | Ú | Ú | Ú | Latin U with acute |
219 | | Û | Û | Û | Latin U with circumflex |
220 | | Ü | Ü | Ü | Latin U with diaeresis |
221 | | Ý | Ý | Ý | Latin Y with acute |
222 | | Þ | Þ | Þ | Latin Thorn |
223 | | ß | ß | ß | Latin small sharp s |
224 | | à | à | à | Latin small a with grave |
225 | | á | á | á | Latin small a with acute |
226 | | â | â | â | Latin small a with circumflex |
227 | | ã | ã | ã | Latin small a with tilde |
228 | | ä | ä | ä | Latin small a with diaeresis |
229 | | å | å | å | Latin small a with ring above |
230 | | æ | æ | æ | Latin small ae |
231 | | ç | ç | ç | Latin small c with cedilla |
232 | | è | è | è | Latin small e with grave |
233 | | é | é | é | Latin small e with acute |
234 | | ê | ê | ê | Latin small e with circumflex |
235 | | ë | ë | ë | Latin small e with diaeresis |
236 | | ì | ì | ì | Latin small i with grave |
237 | | í | í | í | Latin small i with acute |
238 | | î | î | î | Latin small i with circumflex |
239 | | ï | ï | ï | Latin small i with diaeresis |
240 | | ð | ð | ð | Latin small eth |
241 | | ñ | ñ | ñ | Latin small n with tilde |
242 | | ò | ò | ò | Latin small o with grave |
243 | | ó | ó | ó | Latin small o with acute |
244 | | ô | ô | ô | Latin small o with circumflex |
245 | | õ | õ | õ | Latin small o with tilde |
246 | | ö | ö | ö | Latin small o with diaeresis |
247 | | ÷ | ÷ | ÷ | division sign |
248 | | ø | ø | ø | Latin small o with stroke |
249 | | ù | ù | ù | Latin small u with grave |
250 | | ú | ú | ú | Latin small u with acute |
251 | | û | û | û | Latin small u with circumflex |
252 | | ü | ü | ü | Latin small u with diaeresis |
253 | | ý | ý | ý | Latin small y with acute |
254 | | þ | þ | þ | Latin small thorn |
255 | | ÿ | ÿ | ÿ | Latin small y with diaeresis |
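Rows of the comparison table can also be regenerated with Python's standard codecs and the unicodedata module. This is a sketch. The names lookup and describe are hypothetical, and the descriptions are official Unicode names in lowercase, so some differ slightly from the table's wording:

```python
import unicodedata
from typing import Optional

# Column label -> Python codec that decodes a single byte in that set.
CODECS = {"ASCII": "ascii", "ANSI": "cp1252", "ISO-8859-1": "latin-1"}


def lookup(number: int) -> dict[str, Optional[str]]:
    """Return the character each set assigns to `number` (None if unassigned)."""
    row: dict[str, Optional[str]] = {}
    for column, codec in CODECS.items():
        try:
            row[column] = bytes([number]).decode(codec)
        except UnicodeDecodeError:
            row[column] = None
    # The UTF-8 column lists Unicode code points, so the number maps directly.
    row["UTF-8"] = chr(number)
    return row


def describe(number: int) -> str:
    """Lowercased Unicode name of the character, like the Description column."""
    row = lookup(number)
    ch = row["ANSI"] or row["UTF-8"]
    return unicodedata.name(ch, "(unnamed control code)").lower()


for number in (65, 128, 129, 150, 233):
    print(number, lookup(number), describe(number))
```

For 128 to 159, Python's latin-1 codec and chr() both return invisible C1 control characters. The table leaves those cells blank.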
HTML Character Sets: Ensuring Proper Encoding for Your Web Content
When creating web content, it is important to understand HTML character sets. The character encoding determines whether your text displays correctly across different platforms and languages. This section reviews the character sets used in HTML and what each one implies.
Character Encoding Basics
At its core, character encoding maps each character to a number. ASCII, one of the earliest and best-known encodings, covers the basic English alphabet, digits, and punctuation. As the web became more global, more comprehensive character sets became necessary.
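A character's number (its code point) is not always the same as the bytes an encoding writes for it. This Python sketch compares the two for one ASCII character and one accented character:

```python
for ch in "Aé":
    # code point, then the bytes written by ISO-8859-1 and by UTF-8
    print(ch, ord(ch), ch.encode("latin-1"), ch.encode("utf-8"))
# A 65 b'A' b'A'
# é 233 b'\xe9' b'\xc3\xa9'

print(chr(233))  # é  (mapping the number back to the character)
```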
Common HTML Character Sets
- ANSI: Also known as Windows-1252, this character set extends ISO-8859-1 with extra printable characters, such as € and curly quotation marks, in positions 128 to 159.
- UTF-8: The most widely used character encoding on the web, UTF-8 can represent characters from virtually every script, including Chinese, Japanese, and Arabic.
- ISO-8859-1: A character encoding that supports the Latin alphabet and the additional characters used in Western European languages.
Comparing Character Set Capabilities
When choosing a character set for your HTML content, it’s essential to consider the language and script requirements of your target audience. UTF-8 is generally the recommended choice as it provides the broadest support for international characters, ensuring your content is accessible to a global audience.
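To check which character sets can hold a given piece of text, you can try encoding it with each one. The Python sketch below does that. supported_charsets is a hypothetical helper, and the charset names are standard Python codec aliases:

```python
CHARSETS = ["ascii", "windows-1252", "iso-8859-1", "utf-8"]


def supported_charsets(text: str) -> list[str]:
    """Return the charsets that can encode every character in `text`."""
    ok = []
    for charset in CHARSETS:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # at least one character has no code in this charset
        ok.append(charset)
    return ok


print(supported_charsets("Hello"))   # ['ascii', 'windows-1252', 'iso-8859-1', 'utf-8']
print(supported_charsets("café"))    # ['windows-1252', 'iso-8859-1', 'utf-8']
print(supported_charsets("€5"))      # ['windows-1252', 'utf-8']
print(supported_charsets("日本語"))  # ['utf-8']
```

Only UTF-8 handles all four samples, which is why it is the usual recommendation.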
Charsets, or character sets, define the range of characters that can be represented in a document. Choosing the right charset ensures that your content is displayed correctly across different platforms and languages.
The most common charsets used in HTML include ASCII, ANSI, ISO-8859-1, and UTF-8. ASCII is a basic character set that includes only English letters, numbers, and a limited set of symbols. ANSI and ISO-8859-1 expand on ASCII to include additional characters for European languages.
However, the industry standard today is UTF-8, an encoding of the Unicode character set that can represent characters from virtually any language. UTF-8 is backward compatible with ASCII, which makes it a versatile choice for multilingual web content.
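The ASCII compatibility of UTF-8 can be verified directly. The Python sketch below checks that ASCII bytes decode identically as UTF-8, and that non-ASCII characters never produce bytes that could be mistaken for ASCII:

```python
ascii_bytes = bytes(range(128))

# Every ASCII byte sequence is also valid UTF-8 and decodes to the same text.
print(ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8"))  # True

# ASCII-only text produces identical bytes in both encodings.
print("Hello, world!".encode("ascii") == "Hello, world!".encode("utf-8"))  # True

# Non-ASCII characters use multi-byte sequences whose bytes are all 128 or higher.
print(all(b >= 128 for b in "日本€é".encode("utf-8")))  # True
```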
When selecting a charset for your HTML documents, consider the languages and scripts you need to support. UTF-8 is generally the best option, because it provides comprehensive character coverage while remaining compatible with ASCII. By mastering HTML charsets, you can ensure that your web content is accessible and displays correctly for all your users.