Remove non-ascii character in string
Mastering Non-ASCII Character Removal in JavaScript Strings

Learn various robust methods to effectively remove or replace non-ASCII characters from strings in JavaScript, ensuring data consistency and compatibility.
Working with strings in JavaScript often involves handling diverse character sets. While ASCII characters (0-127) are universally recognized, non-ASCII characters (those outside this range, including accented letters, symbols, and characters from other languages) can sometimes cause issues with data processing, display, or storage, especially when interacting with legacy systems or specific APIs. This article explores several effective techniques to identify and remove or replace these non-ASCII characters, providing practical examples and best practices.
Understanding ASCII vs. Non-ASCII Characters
Before diving into removal methods, it's crucial to understand what constitutes an ASCII character. ASCII (American Standard Code for Information Interchange) defines 128 characters, including numbers, English letters (uppercase and lowercase), and some control characters and punctuation marks. Any character with a character code greater than 127 is considered non-ASCII. These include characters like é, ñ, ü, €, ™, and characters from Cyrillic, Arabic, Chinese, and other scripts.
flowchart TD
    A[Input String] --> B{Character Code?}
    B -- '> 127' --> C[Non-ASCII]
    B -- '<= 127' --> D[ASCII]
    C --> E[Remove/Replace]
    D --> F[Keep]
    E --> G[Output String]
    F --> G
Flowchart illustrating the decision process for handling ASCII vs. Non-ASCII characters.
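The decision in the flowchart can be sketched as a small predicate. The helper name isAscii is hypothetical and chosen only for illustration; it assumes the goal is simply to test whether every UTF-16 code unit in the string is 127 or lower.

```javascript
// Hypothetical helper: true if every UTF-16 code unit in str is in the range 0-127.
function isAscii(str) {
  return /^[\x00-\x7F]*$/.test(str);
}

console.log(isAscii("Hello, world!"));   // true
console.log(isAscii("caf\u00E9"));       // false ("café" contains U+00E9)
console.log("\u00E9".charCodeAt(0));     // 233, above 127, so non-ASCII
console.log("A".charCodeAt(0));          // 65, within the ASCII range
```

A check like this can be used to skip the cleaning step entirely when a string is already pure ASCII.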
Method 1: Using Regular Expressions with replace()
The most common and often most efficient way to remove non-ASCII characters in JavaScript is by using regular expressions with the String.prototype.replace() method. The key is to define a regex pattern that matches characters outside the ASCII range. The [\x00-\x7F] character class matches all ASCII characters. To match non-ASCII characters, we can use a negated character class [^\x00-\x7F].
function removeNonAscii(str) {
  // Match every character outside the ASCII range and replace it with nothing.
  return str.replace(/[^\x00-\x7F]/g, '');
}

const originalString = "Hello, world! This is a test with non-ASCII characters: éàçüöñ€™";
const cleanedString = removeNonAscii(originalString);
console.log(cleanedString); // Output: "Hello, world! This is a test with non-ASCII characters: "
JavaScript function to remove all non-ASCII characters using a regular expression.
The g flag in the regular expression /[^\x00-\x7F]/g is crucial. It ensures that all occurrences of non-ASCII characters are replaced, not just the first one.
Method 2: Iterating and Checking charCodeAt()
Another approach, particularly useful if you need more granular control or want to understand the underlying character codes, is to iterate through the string character by character and check each character's charCodeAt() value. If the code falls outside the ASCII range (0-127), you can choose to exclude it or replace it.
function removeNonAsciiIterative(str) {
  let cleanedStr = '';
  for (let i = 0; i < str.length; i++) {
    // charCodeAt() returns the UTF-16 code unit at index i (never negative for a valid index).
    const charCode = str.charCodeAt(i);
    if (charCode <= 127) {
      cleanedStr += str[i];
    }
  }
  return cleanedStr;
}

const originalString = "Grüße aus München! (Greetings from Munich!)";
const cleanedString = removeNonAsciiIterative(originalString);
console.log(cleanedString); // Output: "Gre aus Mnchen! (Greetings from Munich!)"
Iterative method to remove non-ASCII characters by checking charCodeAt().
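One detail worth knowing: charCodeAt() works on UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) is seen as two surrogate code units, both above 127. Removal still works, but a per-unit substitution would emit two replacements for one visible character. The following is a minimal sketch of a code-point-aware variant; the function name replaceNonAsciiByCodePoint and its placeholder parameter are hypothetical, not part of the methods above.

```javascript
// Hypothetical variant: iterate by code point rather than by UTF-16 code unit.
function replaceNonAsciiByCodePoint(str, placeholder = '') {
  let result = '';
  for (const ch of str) { // for...of yields whole code points, not code units
    result += ch.codePointAt(0) <= 127 ? ch : placeholder;
  }
  return result;
}

const emojiSample = "Hi \u{1F600}!"; // one emoji: a single code point, two code units

console.log(emojiSample.length);                              // 6
console.log(replaceNonAsciiByCodePoint(emojiSample));         // "Hi !"
console.log(replaceNonAsciiByCodePoint(emojiSample, '?'));    // "Hi ?!"
console.log(emojiSample.replace(/[^\x00-\x7F]/g, '?'));       // "Hi ??!" (one per code unit)
console.log(emojiSample.replace(/[^\x00-\x7F]/gu, '?'));      // "Hi ?!"  (u flag matches code points)
```

If the regular-expression approach is preferred, adding the u flag gives the same code-point behavior without a manual loop.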
Method 3: Replacing with a Placeholder or Transliteration
Instead of simply removing non-ASCII characters, you might want to replace them with a suitable ASCII equivalent or a placeholder. For instance, replacing é with e or € with EUR. This often requires a more sophisticated approach, potentially involving a lookup table or a library for transliteration.
function replaceNonAsciiWithPlaceholder(str, placeholder = '?') {
  // Substitute each non-ASCII character with the given placeholder.
  return str.replace(/[^\x00-\x7F]/g, placeholder);
}

const originalString = "This string has a Euro symbol: € and an accented letter: é.";
const cleanedString1 = replaceNonAsciiWithPlaceholder(originalString);
const cleanedString2 = replaceNonAsciiWithPlaceholder(originalString, '[NON-ASCII]');
console.log(cleanedString1); // Output: "This string has a Euro symbol: ? and an accented letter: ?."
console.log(cleanedString2); // Output: "This string has a Euro symbol: [NON-ASCII] and an accented letter: [NON-ASCII]."
Replacing non-ASCII characters with a specified placeholder.
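For the transliteration case mentioned above (é to e, € to EUR), one possible sketch combines the built-in String.prototype.normalize() with a small lookup table. The names ASCII_FALLBACKS and transliterateToAscii are hypothetical, and the table entries are illustrative assumptions rather than a complete mapping. This is plain diacritic stripping, not language-aware transliteration.

```javascript
// Hypothetical lookup table for characters that do not decompose into an ASCII base letter.
const ASCII_FALLBACKS = {
  '\u20AC': 'EUR',  // € (Euro sign)
  '\u2122': '(TM)', // ™ (trade mark sign)
  '\u00DF': 'ss',   // ß (sharp s)
};

function transliterateToAscii(str, placeholder = '?') {
  return str
    .normalize('NFD')                 // split "é" into "e" plus a combining accent
    .replace(/[\u0300-\u036f]/g, '')  // drop combining diacritical marks
    // Whatever is still non-ASCII goes through the table, else the placeholder.
    .replace(/[^\x00-\x7F]/gu, (ch) => ASCII_FALLBACKS[ch] ?? placeholder);
}

console.log(transliterateToAscii("Price: 5\u20AC for a caf\u00E9 cr\u00E8me"));
// "Price: 5EUR for a cafe creme"
console.log(transliterateToAscii("Gr\u00FC\u00DFe aus M\u00FCnchen"));
// "Grusse aus Munchen"
```

Characters that neither decompose nor appear in the table fall back to the placeholder, so the output is always pure ASCII. For broader coverage, a dedicated transliteration library is likely a better fit than a hand-maintained table.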