JavaScript function to convert UTF8 string between fullwidth and halfwidth forms

Converting UTF-8 Strings: Fullwidth to Halfwidth and Vice Versa in JavaScript

Learn how to write a JavaScript function that converts Unicode strings (such as text decoded from UTF-8) between their fullwidth and halfwidth character forms, a common step in internationalization and data normalization.

In many East Asian languages, characters can exist in both fullwidth (全角, zenkaku) and halfwidth (半角, hankaku) forms. The two forms look alike but have different Unicode code points, so mixing them causes mismatches in data processing, search, and display unless the text is normalized consistently. This article builds a JavaScript function that converts strings between the two forms for the most common ranges: ASCII characters, the space, and Katakana.

Understanding Fullwidth and Halfwidth Characters

In fixed-width East Asian text, a fullwidth character occupies the same horizontal space as two halfwidth characters such as standard ASCII letters. Fullwidth forms line up with CJK (Chinese, Japanese, Korean) ideographs, which are inherently fullwidth. Halfwidth forms are the usual width for Latin letters, digits, and symbols, and Japanese also has a legacy set of halfwidth Katakana.

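The conversion works by mapping specific Unicode ranges onto each other. The fullwidth ASCII range (U+FF01 to U+FF5E) lines up one-to-one with printable ASCII (U+0021 to U+007E), so a single offset of 0xFEE0 converts in either direction. Katakana is different: the halfwidth block keeps the order of the legacy JIS X 0201 character set rather than the order of the Katakana block, and voiced syllables such as ガ have no single halfwidth character (they are written as two, ｶﾞ). Katakana therefore needs a lookup table instead of an offset.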
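The offset arithmetic is easy to check directly. This is a minimal sketch; ASCII_OFFSET and kanaGap are illustrative names introduced here, and the three Katakana pairs (ｱ/ア, ｦ/ヲ, ｧ/ァ) were picked only as examples.

```javascript
// Distance between printable ASCII and its fullwidth variants (illustrative name).
const ASCII_OFFSET = 0xFF01 - 0x0021; // 0xFEE0

// Hypothetical helper: code-unit distance between a halfwidth Katakana
// character and its fullwidth partner.
function kanaGap(half, full) {
  return half.charCodeAt(0) - full.charCodeAt(0);
}

// ｱ/ア, ｦ/ヲ and ｧ/ァ: three pairs, three different distances.
const gaps = [
  kanaGap('\uFF71', '\u30A2'),
  kanaGap('\uFF66', '\u30F2'),
  kanaGap('\uFF67', '\u30A1'),
];

console.log(ASCII_OFFSET.toString(16)); // fee0
console.log(String.fromCharCode('A'.charCodeAt(0) + ASCII_OFFSET) === '\uFF21'); // true
console.log(gaps.map((g) => g.toString(16)).join(', ')); // cecf, ce74, cec6
```

Because the distance changes from pair to pair, the Katakana part of a width converter has to look characters up rather than add a constant.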

flowchart TD
    A[Input String] --> B{More characters?}
    B -->|Yes| C{toFullwidth?}
    C -->|Yes| D{Halfwidth ASCII, space, or Katakana?}
    C -->|No| E{Fullwidth ASCII, space, or Katakana?}
    D -->|Yes| F[Convert to fullwidth form]
    E -->|Yes| G[Convert to halfwidth form]
    D -->|No| H[Keep original character]
    E -->|No| H
    F --> I[Append to result]
    G --> I
    H --> I
    I --> B
    B -->|No| K[Output converted string]

Flowchart of the character conversion logic

Implementing the Conversion Function

Our JavaScript function takes a string and a toFullwidth boolean flag. If toFullwidth is true, it converts halfwidth characters to fullwidth; otherwise, it converts fullwidth characters to halfwidth. The core loop reads each character's UTF-16 code unit with charCodeAt and, if it falls within a convertible range, replaces the character with its counterpart. charCodeAt is sufficient here because every character involved lies in the Basic Multilingual Plane.

We'll handle several key ranges:

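  1. ASCII Letters, Digits, and Punctuation: fullwidth ！ (U+FF01) to ～ (U+FF5E) and halfwidth ! (U+0021) to ~ (U+007E).
  2. Space Character: the ideographic (fullwidth) space U+3000 and the ordinary halfwidth space U+0020.
  3. Katakana: halfwidth Katakana (U+FF66 to U+FF9F) plus the halfwidth CJK punctuation ｡ ｢ ｣ ､ ･ just before it (U+FF61 to U+FF65), and their fullwidth counterparts in the Katakana and CJK Symbols and Punctuation blocks. Only part of the fullwidth Katakana block has halfwidth forms: ヮ, ヰ, ヱ, ヵ and ヶ, for example, have none.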

Special attention is needed for the space character: its fullwidth form, the ideographic space U+3000, does not follow the 0xFEE0 offset used for the other ASCII characters (0x0020 + 0xFEE0 would give U+FF00, which is unassigned).
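When only the offset-based cases matter (ASCII and the space), the character loop can be replaced with regular-expression replacements. This is a sketch under that assumption; toHalfwidthAscii and toFullwidthAscii are hypothetical helper names, and Katakana is deliberately left untouched.

```javascript
// Hypothetical helper: fullwidth ASCII (U+FF01..U+FF5E) and the ideographic
// space (U+3000) to their halfwidth forms. Everything else is unchanged.
function toHalfwidthAscii(str) {
  return str
    .replace(/[\uFF01-\uFF5E]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
    .replace(/\u3000/g, ' ');
}

// Hypothetical helper: the reverse direction for printable ASCII and the space.
function toFullwidthAscii(str) {
  return str
    .replace(/[!-~]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) + 0xFEE0))
    .replace(/ /g, '\u3000');
}

console.log(toHalfwidthAscii('ＡＢＣ\u3000１２３')); // ABC 123
console.log(toFullwidthAscii('ABC 123'));
```

The callback passed to replace receives each matched character, so the offset logic is the same as in a loop; handling the space in its own pass keeps it out of the 0xFEE0 arithmetic.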

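// Halfwidth Katakana and the halfwidth CJK punctuation before it (U+FF61..U+FF9F)
// follow the legacy JIS X 0201 order, not the order of the Katakana block, so no
// single offset works. Build lookup tables instead: NFKC normalization maps each
// halfwidth character to its fullwidth counterpart (ｱ -> ア, ｦ -> ヲ, ﾞ -> U+3099).
const HALF_TO_FULL_KANA = new Map();
const FULL_TO_HALF_KANA = new Map();
for (let cp = 0xFF61; cp <= 0xFF9F; cp++) {
  const half = String.fromCharCode(cp);
  const full = half.normalize('NFKC');
  HALF_TO_FULL_KANA.set(half, full);
  FULL_TO_HALF_KANA.set(full, half);
}

function convertWidth(str, toFullwidth = false) {
  let result = '';
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    const charCode = str.charCodeAt(i);

    if (toFullwidth) {
      // Convert halfwidth to fullwidth
      if (charCode >= 0x0021 && charCode <= 0x007E) { // Halfwidth ASCII ! to ~
        result += String.fromCharCode(charCode + 0xFEE0); // e.g. U+0021 -> U+FF01
      } else if (charCode === 0x0020) { // Halfwidth space
        result += '\u3000'; // Ideographic (fullwidth) space
      } else if (HALF_TO_FULL_KANA.has(ch)) { // Halfwidth Katakana or punctuation
        const next = str[i + 1];
        // Fold a following voiced/semi-voiced mark into one character: ｶﾞ -> ガ
        if (next === '\uFF9E' || next === '\uFF9F') {
          const composed = (ch + next).normalize('NFKC');
          if (composed.length === 1) {
            result += composed;
            i++; // Skip the mark we just consumed
            continue;
          }
        }
        result += HALF_TO_FULL_KANA.get(ch);
      } else {
        result += ch;
      }
    } else {
      // Convert fullwidth to halfwidth
      if (charCode >= 0xFF01 && charCode <= 0xFF5E) { // Fullwidth ASCII ！ to ～
        result += String.fromCharCode(charCode - 0xFEE0); // e.g. U+FF01 -> U+0021
      } else if (charCode === 0x3000) { // Ideographic (fullwidth) space
        result += ' ';
      } else if (FULL_TO_HALF_KANA.has(ch)) { // Fullwidth Katakana or punctuation
        result += FULL_TO_HALF_KANA.get(ch);
      } else {
        // Voiced Katakana have no single halfwidth form: ガ -> カ + U+3099 -> ｶﾞ
        const parts = ch.normalize('NFD');
        if (parts.length === 2 && FULL_TO_HALF_KANA.has(parts[0]) && FULL_TO_HALF_KANA.has(parts[1])) {
          result += FULL_TO_HALF_KANA.get(parts[0]) + FULL_TO_HALF_KANA.get(parts[1]);
        } else {
          result += ch; // No halfwidth equivalent (e.g. kanji, ヶ): keep as-is
        }
      }
    }
  }
  return result;
}

// Example Usage:
const halfwidthString = "Hello, World! 123 ABCｱｲｳｶﾞ";
const fullwidthString = "Ｈｅｌｌｏ，\u3000Ｗｏｒｌｄ！\u3000１２３\u3000ＡＢＣアイウガ";

console.log("Original Halfwidth:", halfwidthString);
console.log("Converted to Fullwidth:", convertWidth(halfwidthString, true));

console.log("\nOriginal Fullwidth:", fullwidthString);
console.log("Converted to Halfwidth:", convertWidth(fullwidthString, false));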

Considerations and Edge Cases

While the provided function covers many common use cases, it's important to be aware of potential edge cases and limitations:

  • Unicode Normalization: The canonical forms NFC and NFD leave width variants alone, but the compatibility forms NFKC and NFKD do fold them: fullwidth ASCII becomes ASCII, the ideographic space becomes U+0020, and halfwidth Katakana becomes fullwidth. They also rewrite many unrelated characters (① becomes 1, ﬁ becomes fi) and cannot produce halfwidth Katakana, so NFKC is a blunt substitute for a dedicated width conversion.
  • Character Completeness: Not all characters have a direct fullwidth or halfwidth equivalent. Characters outside the defined ranges will remain unchanged by this function.
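  • Performance: The function walks the string once and builds the result by concatenation, which is fast enough for the string sizes typical of web applications. For very large inputs, collecting pieces in an array and calling join(''), or using regular-expression replacements, may be faster; measure before optimizing.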
  • Contextual Conversion: In some advanced scenarios, the conversion might depend on the surrounding text or specific linguistic rules. This function performs a direct, character-by-character mapping, which is suitable for most data normalization tasks.
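The normalization point is easy to see in practice. String.prototype.normalize('NFKC') is a standard JavaScript method; the sample characters below are arbitrary choices for illustration. It folds fullwidth ASCII and the ideographic space to their halfwidth forms, but it converts halfwidth Katakana in the opposite direction and rewrites unrelated compatibility characters as well.

```javascript
// Sample inputs and what NFKC does to each (inputs chosen for illustration).
const samples = [
  '\uFF21\uFF22\uFF23', // ＡＢＣ fullwidth Latin  -> "ABC"
  '\u3000',             // ideographic space       -> " "
  '\uFF76\uFF9E',       // ｶﾞ halfwidth Katakana   -> "ガ", i.e. fullwidth
  '\u2460',             // ① circled digit one     -> "1"
  '\uFB01',             // ﬁ ligature              -> "fi"
];

const normalized = samples.map((s) => s.normalize('NFKC'));

samples.forEach((s, i) => {
  console.log(JSON.stringify(s), '->', JSON.stringify(normalized[i]));
});
```

For a width-only conversion that must leave ① or ﬁ alone, a dedicated function is usually the safer choice; NFKC fits better where aggressive folding is the goal, such as building a search index.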

1. Integrate the Function

Copy the convertWidth function, together with any lookup tables it relies on, into your JavaScript project. You can place it in a utility module or directly in the script where you need it.

2. Call with toFullwidth = true

To convert a halfwidth string to its fullwidth equivalent, call the function with the second argument set to true: convertWidth('abc', true) returns 'ａｂｃ'.

3. Call with toFullwidth = false

To convert a fullwidth string to its halfwidth equivalent, call the function with the second argument set to false (or omit it, since it defaults to false): convertWidth('ＡＢＣ') returns 'ABC'.

4. Test Thoroughly

Always test your implementation with various strings, including those with mixed character types, special symbols, and characters that do not have fullwidth/halfwidth equivalents, to ensure it behaves as expected for your specific use case.
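For the ASCII part, an exhaustive check is cheap because the range has only 94 characters. The sketch below compares the 0xFEE0 offset against the engine's own compatibility mappings via normalize('NFKC'); findOffsetMismatches is a hypothetical name, and the check assumes a runtime with full Unicode normalization support (current browsers and Node.js).

```javascript
// Every fullwidth ASCII variant U+FF01..U+FF5E should NFKC-normalize to the
// character exactly 0xFEE0 below it. Returns the code points (hex) that do not.
function findOffsetMismatches() {
  const mismatches = [];
  for (let cp = 0xFF01; cp <= 0xFF5E; cp++) {
    const full = String.fromCharCode(cp);
    const expected = String.fromCharCode(cp - 0xFEE0);
    if (full.normalize('NFKC') !== expected) {
      mismatches.push(cp.toString(16));
    }
  }
  return mismatches;
}

console.log(findOffsetMismatches()); // []
```

The same pattern extends to any table-driven mapping: iterate over the whole source range and compare each result with an independent reference, rather than spot-checking a few strings.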