Text ↔ Unicode
The Text to Unicode Converter reveals the Unicode code points, character names, and byte encodings (UTF-8, UTF-16) for every character in your input text. It is an essential debugging tool for developers dealing with encoding issues, invisible characters, emoji, bidirectional text, or non-ASCII symbols. It also converts Unicode escape sequences (\u0041, U+0041) back to the actual characters.
What is Unicode?
Unicode is a universal character encoding standard that assigns a unique number (code point) to each character it covers, across writing systems such as Latin, Cyrillic, Arabic, Chinese, Japanese, and Korean, as well as emoji, mathematical symbols, and more. Unicode 13.0 covered over 140,000 characters across 154 scripts, and later versions have added more. Code points are written as U+XXXX in hexadecimal (e.g. U+0041 for 'A', U+1F600 for the grinning face emoji). A code point is an abstract number; the bytes actually stored are determined by an encoding form such as UTF-8, UTF-16, or UTF-32. UTF-8 is the dominant encoding on the web, used by over 98% of websites.
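As a small illustration of the U+XXXX notation, the sketch below converts between a character and its code point in JavaScript. The helper name formatCodePoint is made up for this example and is not part of the tool.

```javascript
// Format a numeric code point in U+XXXX notation
// (uppercase hex, padded to at least four digits).
// formatCodePoint is a hypothetical helper for illustration only.
function formatCodePoint(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

const letter = "A";
const face = String.fromCodePoint(0x1f600); // grinning face emoji

console.log(formatCodePoint(letter.codePointAt(0))); // U+0041
console.log(formatCodePoint(face.codePointAt(0)));   // U+1F600
```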
How does the tool work?
The tool iterates through the input string using Unicode-aware character iteration (handling surrogate pairs in JavaScript correctly for emoji and supplementary characters). For each character it displays the Unicode code point in U+XXXX format, the official Unicode character name (e.g. LATIN SMALL LETTER A, SNOWMAN), the UTF-8 byte sequence in hex (e.g. E2 98 83 for ☃), the UTF-16 code unit(s), and the decimal code point value. In decode mode, the tool parses \u{XXXX}, \uXXXX, U+XXXX, and &#xXXXX; escape sequences and converts them back to the corresponding characters.
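The per-character inspection described above could look roughly like the following. This is a minimal sketch, not the tool's actual source: inspectText is a hypothetical name, and character names are left out because they require the Unicode Character Database, which JavaScript does not ship.

```javascript
// Uppercase hex, zero-padded to the given width.
const toHex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

// Hypothetical helper: describe each code point of a string.
function inspectText(text) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const rows = [];
  // for...of iterates by code point, so a surrogate pair stays together.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const utf16 = [];
    for (let i = 0; i < ch.length; i++) {
      utf16.push(toHex(ch.charCodeAt(i), 4)); // one entry per UTF-16 code unit
    }
    rows.push({
      char: ch,
      codePoint: "U+" + toHex(cp, 4),
      decimal: cp,
      utf8: Array.from(encoder.encode(ch), (b) => toHex(b, 2)).join(" "),
      utf16: utf16.join(" "),
    });
  }
  return rows;
}

console.log(inspectText("A\u2603")); // "A" followed by the snowman
```

Iterating with for...of rather than an index loop is what keeps supplementary characters such as emoji from being split into two rows.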
Typical Use Cases
- Debugging encoding issues where a character displays incorrectly or as a replacement character (�)
- Identifying invisible Unicode characters (zero-width joiners, right-to-left marks) that cause layout issues
- Looking up the official Unicode name and code point of an emoji or symbol
- Generating Unicode escape sequences (\u00e9) for use in source code or JSON strings
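For the last use case, here is one possible way to generate \uXXXX escapes in JavaScript. It is a sketch under a simplifying assumption: only non-ASCII code units are escaped, so quotes, backslashes, and control characters would still need separate handling before the result is placed in a string literal. The name toUnicodeEscapes is hypothetical.

```javascript
// Replace every non-ASCII UTF-16 code unit with a \uXXXX escape.
// Working per code unit means a supplementary character becomes two
// escapes (a surrogate pair), a form that JSON and JavaScript both accept.
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out += unit < 0x80
      ? text[i]
      : "\\u" + unit.toString(16).padStart(4, "0");
  }
  return out;
}

console.log(toUnicodeEscapes("A\u2603")); // A\u2603
```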
Step-by-step Guide
- Step 1: Type or paste text into the input field to inspect its Unicode properties.
- Step 2: The tool displays the code point, character name, UTF-8 bytes, and UTF-16 code units for each character.
- Step 3: Switch to decode mode to convert Unicode escape sequences back to characters.
- Step 4: Copy individual code points or the full escape sequence list.
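Decode mode (Step 3) can be approximated with a single regular expression that covers the four notations the tool accepts. The sketch below is an assumption about how such a decoder might be written, not the tool's own code; decodeEscapes and ESCAPE_PATTERN are names invented here.

```javascript
// One alternative per notation; each captures the hex digits:
//   \u{XXXX}   \uXXXX   U+XXXX   &#xXXXX;
const ESCAPE_PATTERN =
  /\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})|U\+([0-9a-fA-F]{4,6})|&#x([0-9a-fA-F]{1,6});/g;

function decodeEscapes(input) {
  return input.replace(ESCAPE_PATTERN, (match, braced, fixed, uPlus, entity) => {
    const cp = parseInt(braced ?? fixed ?? uPlus ?? entity, 16);
    // Values beyond U+10FFFF are not code points; leave them untouched.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(decodeEscapes("\\u0041 U+2603 &#x41;")); // A ☃ A
```

Two consecutive \uXXXX escapes that form a surrogate pair are decoded one at a time, and the resulting code units combine into a single character in the output string.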
Example
Input
A☃
Output
U+0041 LATIN CAPITAL LETTER A | U+2603 SNOWMAN (UTF-8: E2 98 83)
Tips & Notes
- Use this tool to find invisible characters (U+200B zero-width space, U+FEFF BOM) that cause mysterious string comparison failures.
- Characters in the supplementary planes (U+10000 and above, which includes most emoji) require surrogate pairs in UTF-16 and 4 bytes in UTF-8; check this tool to confirm.
- When debugging mojibake (garbled text), the UTF-8 byte column helps identify whether the file was decoded with the wrong encoding.
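The first tip can also be checked programmatically. The sketch below reports characters in the Unicode "Format" category (Cf), which includes U+200B, U+200D, U+200E, U+200F, and U+FEFF. It is only an approximation of "invisible": other hard-to-see characters, such as the no-break space U+00A0, belong to different categories. The name findInvisible is hypothetical.

```javascript
// \p{Cf} matches format characters; the "u" flag enables property escapes.
function findInvisible(text) {
  const hits = [];
  for (const m of text.matchAll(/\p{Cf}/gu)) {
    const cp = m[0].codePointAt(0);
    hits.push({
      index: m.index, // position counted in UTF-16 code units
      codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    });
  }
  return hits;
}

const plain = "admin";
const tainted = "ad\u200Bmin"; // renders the same as "admin"

console.log(plain === tainted);      // false
console.log(findInvisible(tainted)); // [ { index: 2, codePoint: 'U+200B' } ]
```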
Frequently Asked Questions
What is the difference between Unicode and UTF-8?
Unicode is the character set – it assigns a number (code point) to each character. UTF-8 is an encoding – it defines how those code points are stored as bytes. UTF-8 is variable-length: ASCII characters use 1 byte, and higher code points use 2–4 bytes.
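To make the variable-length rule concrete, the following sketch encodes a single code point to UTF-8 by hand. In practice TextEncoder does this for you; the hand-written utf8Bytes function (a name chosen for this example) only illustrates the bit layout and assumes its input is a valid code point outside the surrogate range.

```javascript
// Encode one code point as UTF-8. Each range adds one more byte:
// the lead byte signals the length, continuation bytes start with 10xxxxxx.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp]; // 1 byte: ASCII
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 2 bytes
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)]; // 3 bytes
  }
  return [
    0xf0 | (cp >> 18), // 4 bytes: supplementary planes
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

console.log(utf8Bytes(0x41).length);    // 1
console.log(utf8Bytes(0xe9).length);    // 2
console.log(utf8Bytes(0x2603).length);  // 3
console.log(utf8Bytes(0x1f600).length); // 4
```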
What is a BOM (Byte Order Mark)?
A BOM is the character U+FEFF (ZERO WIDTH NO-BREAK SPACE) placed at the start of a text file to indicate byte order and encoding. In UTF-8, the BOM is the byte sequence EF BB BF and is generally unnecessary; its presence can cause issues in some parsers.
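One common way such parser issues show up in JavaScript is JSON.parse rejecting input that begins with U+FEFF. The sketch below shows a possible way to detect and drop the BOM, both on raw bytes and on an already-decoded string; the function names are invented for this example.

```javascript
// Return a view of the bytes that skips a leading UTF-8 BOM (EF BB BF).
function stripUtf8Bom(bytes) {
  const hasBom = bytes.length >= 3 &&
    bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
  return hasBom ? bytes.subarray(3) : bytes;
}

// Same idea for a decoded string, where the BOM appears as U+FEFF.
function stripBomChar(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const rawJson = "\uFEFF{\"ok\":true}";
// JSON.parse(rawJson) would throw a SyntaxError because of the leading BOM.
console.log(JSON.parse(stripBomChar(rawJson)).ok); // true
```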
Why does an emoji show as two characters in some programming languages?
Most emoji have code points above U+FFFF. In UTF-16-based languages like JavaScript and Java, these are represented as surrogate pairs – two 16-bit code units that together encode the single code point. The string .length property in JavaScript counts code units, not code points.
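The difference is easy to observe directly. The sketch below compares the two ways of counting and derives the surrogate pair arithmetically; toSurrogatePair is a hypothetical helper that assumes a code point of U+10000 or above.

```javascript
const grin = "\u{1F600}"; // grinning face, U+1F600

console.log(grin.length);      // 2 (UTF-16 code units)
console.log([...grin].length); // 1 (code points; spread iterates by code point)

// Split a supplementary code point into its high and low surrogates.
function toSurrogatePair(cp) {
  const offset = cp - 0x10000; // 20 significant bits remain
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00
```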