Converter

Text ↔ Unicode

The Text to Unicode Converter reveals the Unicode code points, character names, and byte encodings (UTF-8, UTF-16) for every character in your input text. It is an essential debugging tool for developers dealing with encoding issues, invisible characters, emoji, bidirectional text, or non-ASCII symbols. It also converts Unicode escape sequences (\u0041, U+0041) back to the actual characters.

What is Unicode?

Unicode is a universal character encoding standard that assigns a unique number (code point) to every character it covers, across writing systems and symbol sets: Latin, Cyrillic, Arabic, Chinese, Japanese, Korean, emoji, mathematical symbols, and more. As of version 13.0, the Unicode standard covered over 140,000 characters across 154 scripts, and each later version has added more. Code points are written as U+ followed by four to six hexadecimal digits (e.g. U+0041 for 'A', U+1F600 for the grinning face emoji). Unicode itself defines only the code points; actual byte representations are determined by encoding schemes such as UTF-8, UTF-16, and UTF-32. UTF-8 is the dominant encoding on the web, used by over 98% of websites.
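To make the distinction between a code point and its byte representations concrete, here is a minimal Python sketch. The helper name encode_all is made up for this illustration and is not part of the tool; big-endian variants are used so that no byte order mark is prepended.

```python
def encode_all(ch: str) -> dict:
    """Return the code point and the bytes of one character in three encodings."""
    codecs_to_show = {
        "UTF-8": "utf-8",
        "UTF-16BE": "utf-16-be",  # big-endian, no BOM
        "UTF-32BE": "utf-32-be",  # big-endian, no BOM
    }
    result = {"code point": f"U+{ord(ch):04X}"}
    for label, codec in codecs_to_show.items():
        result[label] = ch.encode(codec).hex(" ").upper()
    return result


for ch in ("A", "\U0001F600"):
    print(encode_all(ch))
```

The same code point (a single number) yields a different byte sequence under each encoding, which is why knowing the code point alone does not tell you what is stored in a file.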

How does the tool work?

The tool iterates through the input string using Unicode-aware character iteration (handling surrogate pairs in JavaScript correctly for emoji and supplementary characters). For each character it displays the Unicode code point in U+XXXX format, the official Unicode character name (e.g. LATIN SMALL LETTER A, SNOWMAN), the UTF-8 byte sequence in hex (e.g. E2 98 83 for ☃), the UTF-16 code unit(s), and the decimal code point value. In decode mode, the tool parses \u{XXXX}, \uXXXX, U+XXXX, and &#xXXXX; escape sequences and converts them back to the corresponding characters.
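The per-character inspection described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's actual implementation (which runs in JavaScript): the function name inspect_text and the row layout are hypothetical, and character names come from Python's bundled unicodedata module, whose Unicode version depends on the Python release.

```python
import unicodedata


def inspect_text(text: str) -> list:
    """Return one row of Unicode details per code point in the text."""
    rows = []
    for ch in text:  # Python strings iterate by code point, not by UTF-16 unit
        utf16 = ch.encode("utf-16-be")
        # Split the UTF-16 bytes into 16-bit code units (one or two per character).
        units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
        rows.append({
            "code_point": f"U+{ord(ch):04X}",
            "decimal": ord(ch),
            "name": unicodedata.name(ch, "<unnamed>"),
            "utf8": ch.encode("utf-8").hex(" ").upper(),
            "utf16": " ".join(units),
        })
    return rows


for row in inspect_text("A\u2603"):
    print(row)
```

Characters without a name in the database (control characters, for example) fall back to the placeholder string, and a string containing a lone surrogate would raise an error in the encode step.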

Typical Use Cases

Debugging encoding issues where a character displays incorrectly or as a replacement character (�)
Identifying invisible Unicode characters (zero-width joiners, right-to-left marks) that cause layout issues
Looking up the official Unicode name and code point of an emoji or symbol
Generating Unicode escape sequences (\u00e9) for use in source code or JSON strings
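For the last use case, escape sequences suitable for JSON can also be produced with Python's standard json module, which escapes non-ASCII characters by default. The wrapper name to_json_escapes is only illustrative.

```python
import json


def to_json_escapes(text: str) -> str:
    """Return a JSON string literal with all non-ASCII characters escaped."""
    # ensure_ascii=True is the default; it is spelled out here for clarity.
    return json.dumps(text, ensure_ascii=True)


escaped = to_json_escapes("caf\u00e9")
print(escaped)              # "caf\u00e9"
print(json.loads(escaped))  # café
```

Characters above U+FFFF are emitted as two \uXXXX escapes (a surrogate pair), which is what JSON requires.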

Step-by-step Guide

Step 1: Type or paste text into the input field to inspect its Unicode properties.
Step 2: The tool displays the code point, character name, UTF-8 bytes, and UTF-16 code units for each character.
Step 3: Switch to decode mode to convert Unicode escape sequences back to characters.
Step 4: Copy individual code points or the full escape sequence list.
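The decode mode from Step 3 can be sketched with a regular expression. This is an assumption about how such a decoder might look, not the tool's own code; the names ESCAPE_PATTERN and decode_escapes are hypothetical. Two adjacent \uXXXX escapes that form a surrogate pair are merged into one character by a round trip through UTF-16.

```python
import re

# One alternative per supported notation; each captures only the hex digits.
ESCAPE_PATTERN = re.compile(
    r"\\u\{([0-9A-Fa-f]{1,6})\}"  # \u{1F600}
    r"|\\u([0-9A-Fa-f]{4})"       # \u0041
    r"|U\+([0-9A-Fa-f]{4,6})"     # U+0041
    r"|&#x([0-9A-Fa-f]{1,6});"    # &#x41;
)


def decode_escapes(text: str) -> str:
    """Replace supported escape notations with the characters they denote."""
    def replace(match):
        hex_digits = next(g for g in match.groups() if g is not None)
        value = int(hex_digits, 16)
        if value > 0x10FFFF:  # outside the Unicode range: leave untouched
            return match.group(0)
        return chr(value)

    decoded = ESCAPE_PATTERN.sub(replace, text)
    # Merge surrogate pairs such as \uD83D\uDE00 into a single code point;
    # unpaired surrogates become U+FFFD.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


print(decode_escapes(r"\u0041 U+2603 \u{1F600} &#x41;"))
```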

Example

Input

A☃

Output

U+0041 LATIN CAPITAL LETTER A | U+2603 SNOWMAN (UTF-8: E2 98 83)

Tips & Notes

Use this tool to find invisible characters (U+200B zero-width space, U+FEFF BOM) that cause mysterious string comparison failures.
Characters in the supplementary planes (U+10000 and above, which includes most emoji, starting around U+1F000) require surrogate pairs in UTF-16 and 4 bytes in UTF-8; check this tool to confirm.
When debugging mojibake (garbled text), the UTF-8 byte column helps identify whether the file was decoded with the wrong encoding.
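Two of the tips above can be reproduced in Python. The sketch below flags invisible characters by their Unicode general category (Cf, "format"), which covers U+200B, U+FEFF, zero-width joiners, and directional marks; this category-based heuristic and the function name find_invisible are assumptions for illustration, not a description of how the tool detects them. The last lines show a typical mojibake pattern.

```python
import unicodedata


def find_invisible(text: str) -> list:
    """Return (index, code point, name) for every format-category character."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


suspicious = "ab\u200bc"           # looks like "abc" when rendered
print(suspicious == "abc")         # False
print(find_invisible(suspicious))  # [(2, 'U+200B', 'ZERO WIDTH SPACE')]

# Mojibake: UTF-8 bytes (C3 A9 for é) misread as Latin-1 yield two characters.
garbled = "\u00e9".encode("utf-8").decode("latin-1")
print(garbled)                     # Ã©
```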

Frequently Asked Questions

What is the difference between Unicode and UTF-8?

Unicode is the character set – it assigns a number (code point) to each character. UTF-8 is an encoding – it defines how those code points are stored as bytes. UTF-8 is variable-length: ASCII characters use 1 byte, and higher code points use 2–4 bytes.
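The variable-length rule follows fixed code point ranges, which a short Python sketch can check against the built-in encoder. The function name utf8_length is illustrative only.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs for a given code point."""
    if code_point < 0x80:      # U+0000..U+007F (ASCII)
        return 1
    if code_point < 0x800:     # U+0080..U+07FF
        return 2
    if code_point < 0x10000:   # U+0800..U+FFFF
        return 3
    return 4                   # U+10000..U+10FFFF


for ch in ("A", "\u00e9", "\u2603", "\U0001F600"):
    # Cross-check the range rule against Python's own UTF-8 encoder.
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s)")
```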

What is a BOM (Byte Order Mark)?

A BOM is the character U+FEFF (ZERO WIDTH NO-BREAK SPACE) placed at the start of a text file to indicate byte order and encoding. In UTF-8, the BOM is the byte sequence EF BB BF and is generally unnecessary; its presence can cause issues in some parsers.
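As an illustration of the parser issue, the following Python sketch shows how a UTF-8 BOM survives a plain utf-8 decode as a leading U+FEFF, and how the standard utf-8-sig codec removes it. The sample data is made up for the example.

```python
import codecs

# A UTF-8 file that starts with a BOM (EF BB BF), followed by the text "hello".
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

plain = data.decode("utf-8")         # BOM stays as an invisible first character
stripped = data.decode("utf-8-sig")  # BOM is removed if present

print(data[:3].hex(" ").upper())     # EF BB BF
print(plain == "hello")              # False
print(stripped == "hello")           # True
```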

Why does an emoji show as two characters in some programming languages?

Most emoji have code points above U+FFFF. In UTF-16-based languages like JavaScript and Java, these are represented as surrogate pairs – two 16-bit code units that together encode the single code point. The string .length property in JavaScript counts code units, not code points.
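The surrogate pair for a code point can be computed by hand. The Python sketch below follows the UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves); the function names are hypothetical. Python itself counts code points, so the UTF-16 length is derived from the encoded bytes to mimic what JavaScript's .length would report.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, comparable to JavaScript's .length."""
    return len(text.encode("utf-16-le")) // 2


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")      # D83D DE00
print(len("\U0001F600"))            # 1 code point
print(utf16_length("\U0001F600"))   # 2 code units
```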


Similar Tools

Text ↔ ASCII Binary

HTML Entities

URL Encoder / Decoder

Case Converter

String Obfuscator
