Why is it not possible to search through text file contents encoded in UTF-16?

As I understand it, tools such as catfish and gnome-search-utils can only search inside file contents that are UTF-8 encoded. To search for words or numbers within UTF-16 text files, one has to convert them to UTF-8 with iconv first.

When you already know which file to open, text editors like gedit or mousepad have no trouble with UTF-16.

Why does no Linux distribution ship a search tool (GUI or command-line) that can handle UTF-16 encoded text files?

I'm on Xubuntu.
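For illustration, the convert-then-search workaround described above can be sketched in Python instead of iconv. This is only a sketch: the file names are hypothetical, and it assumes the UTF-16 file starts with a BOM (without one, Python's "utf-16" codec falls back to the machine's native byte order).

```python
import tempfile
from pathlib import Path


def convert_utf16_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a UTF-16 text file as UTF-8, similar in spirit to iconv."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = src.read_text(encoding="utf-16")
    dst.write_text(text, encoding="utf-8")


def demo() -> bytes:
    """Create a small UTF-16 file, convert it, and return the UTF-8 bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "notes-utf16.txt"  # hypothetical file names
        dst = Path(tmp) / "notes-utf8.txt"
        src.write_text("invoice 4711\n", encoding="utf-16")
        convert_utf16_to_utf8(src, dst)
        return dst.read_bytes()


if __name__ == "__main__":
    print(demo())
```

Once the file is UTF-8, byte-oriented search tools can match "4711" directly, because the converted file no longer contains the interleaved zero bytes.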

edited May 9 '17 at 21:53

Gilles

532k12810651592

asked May 9 '17 at 15:33

Enteneller

285

6

ripgrep 0.5.0 supports UTF-16, but (rant) it is a terrible encoding that should never be used: 1) a UTF-16 string cannot be a C string if it contains any ASCII characters, 2) it is just as much a variable-width encoding as UTF-8, and 3) many tools choke on the BOM, yet it is necessary to disambiguate endianness.

– Fox
May 9 '17 at 15:52
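To make points 2) and 3) of the comment above concrete, here is a small Python sketch (Python is just my choice for illustration). It shows that a character outside the Basic Multilingual Plane needs two 16-bit units, and that the same text has two different byte sequences depending on endianness, which is why a BOM is needed.

```python
import codecs

# One 16-bit unit for an ASCII letter; the high byte is zero.
ascii_a = "A".encode("utf-16-le")

# U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair (4 bytes).
emoji = "\U0001F600".encode("utf-16-le")

# The plain "utf-16" codec prepends a BOM in the machine's native byte order.
with_bom = "A".encode("utf-16")

print(ascii_a, len(ascii_a))
print(len(emoji))
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
# Without a BOM the two byte orders are indistinguishable by name alone.
print("A".encode("utf-16-le") != "A".encode("utf-16-be"))
```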

2

See also utf8everywhere.com

– tripleee
May 9 '17 at 18:40

@Fox: thanks. ripgrep seems powerful.

– Enteneller
May 9 '17 at 21:19

@Fox -- you would no more encode a user string in UTF-16 in C than you would encode it in UTF-8. C only handles ASCII, and you need library functions to convert strings to (or from) UTF-8 OR UTF-16. However, I tend to agree UTF-16 is icky -- especially since it's often UCS-2 in disguise (no BOM, only supports up to Unicode-2) -- especially when talking about Windows OS files (log files and reg files, for example, may not have BOMs).

– Astara
Aug 25 '17 at 2:20

1

@Astara My statement about C strings was a quick summary of: if a character is in the subset of Unicode that overlaps with ASCII, its encoding in UTF-16 (or UCS-2) contains a null byte. The only character containing a null byte in UTF-8 is NUL itself. This means that you can use functions from the standard C library to read, write, copy, etc. UTF-8 strings, but not UTF-16. You won't get proper change-case support, of course, but the basics are free. In any case, this appears to be a digression from a digression.

– Fox
Aug 25 '17 at 2:38
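The null-byte point can be checked without writing any C. The sketch below uses Python's ctypes module to view the same word as a NUL-terminated C string in both encodings; the helper name as_c_string is made up for this example.

```python
import ctypes

utf8_bytes = "grep".encode("utf-8")       # no zero bytes
utf16_bytes = "grep".encode("utf-16-le")  # a zero byte after every ASCII letter


def as_c_string(data: bytes) -> bytes:
    """Return what a NUL-terminated C string view of `data` would contain."""
    buf = ctypes.create_string_buffer(data)  # copies data and appends a NUL
    return buf.value                         # reads up to the first NUL byte


print(as_c_string(utf8_bytes))   # the whole word survives
print(as_c_string(utf16_bytes))  # cut off after the first letter
```

Anything built on strlen-style semantics sees only the first byte of the UTF-16 version, which is the problem the comment describes.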

Tags: search, unicode, text

2 Answers

UTF-16 (or UCS-2) is highly unfriendly to the null-terminated strings used by the C standard library and the POSIX ABI. For example, command-line arguments are terminated by NULs (bytes with value zero), and any UTF-16 character with numerical value < 256 contains a zero byte, so a string of the usual English letters would be impossible to pass as UTF-16 in a command-line argument.

That in turn means that either the utilities would need to take input in some other format (say UTF-8) and convert to UTF-16; or they would need to take their input in some other way. The first option would require all such utilities to contain (or link to) code for the conversion, and the second would make interfacing those programs to other utilities somewhat difficult.
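A minimal sketch of the first option, under stated assumptions: the pattern arrives as an ordinary string (for example from a UTF-8 command line), the tool converts it to UTF-16 itself, and then searches the raw bytes. The function name is hypothetical, and the data is assumed to be little-endian UTF-16 without a BOM.

```python
def find_in_utf16le(data: bytes, pattern: str) -> list[int]:
    """Return the 16-bit code unit offsets where `pattern` occurs in `data`."""
    needle = pattern.encode("utf-16-le")  # the conversion step the tool must do
    hits = []
    start = data.find(needle)
    while start != -1:
        # Only accept matches aligned on a 2-byte code unit boundary;
        # an odd offset would straddle two unrelated characters.
        if start % 2 == 0:
            hits.append(start // 2)
        start = data.find(needle, start + 1)
    return hits


sample = "abc abc".encode("utf-16-le")
print(find_in_utf16le(sample, "abc"))
```

Even this small version shows the cost mentioned above: every such utility needs conversion code and has to know the byte order of its input.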

Given those difficulties, and the fact that UTF-8 has better backwards-compatibility properties, I'd guess that few people care enough about UTF-16 to be motivated to create tools for it.

answered May 9 '17 at 17:56

ilkkachu

56.7k784156

The null terminator in UTF-16 is two null bytes in a row, which is how UTF-16 encodes the null character. If your command line handles UTF-16, then the ASCII (or Unicode) letter 'A' would be represented internally by 0x41 0x00 (on Windows x86 the lower byte always comes first, often called 'LSB' first, as opposed to 'MSB'). The thing in C is that UTF-16 is an encoding, BELOW what the language uses. C uses user strings which are automatically converted to the platform's native encoding. So a C program printing "hello world\n" works on all C-supporting platforms.

– Astara
Aug 25 '17 at 2:15

@Astara, well, in practice, the tools that exist assume a character of 8 bits, so the first 8-bit byte with value 0 terminates the string. POSIX also defines a string as "A contiguous sequence of bytes terminated by and including the first null byte.", and that a byte is exactly the same as an octet, i.e. 8 bits. So yeah, you'd need to have a tool that explicitly supports UTF-16.

– ilkkachu
Aug 25 '17 at 15:53

We aren't talking about '8-bit' interfaces between tools -- we are talking about character interfaces between tools. Whether those characters are 8 or 32 bits internally isn't something passed out to external tools. The original question asked for a find tool to search for text in files that are UTF-16 encoded. The version of 'find.exe' included in /windows/system32 does that.

– Astara
Aug 26 '17 at 0:32

@Astara, well, the read() and write() system calls deal in bytes, so the interpretation of a character must be done in the tool.

– ilkkachu
Aug 26 '17 at 17:55
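As a rough illustration of "the interpretation must be done in the tool": the sketch below reads raw bytes with os.read() and turns them into characters itself, using an incremental UTF-16 decoder so that a 16-bit unit split across two reads is handled. The function name and the sample file are made up for this example.

```python
import codecs
import os
import tempfile


def file_contains(path: str, word: str, chunk_size: int = 4096) -> bool:
    """Search a UTF-16 file for `word`, reading raw bytes with os.read()."""
    decoder = codecs.getincrementaldecoder("utf-16")()
    tail = ""  # end of the previous chunk, so matches can span chunk borders
    # O_BINARY only exists on Windows; on POSIX the extra flag is 0.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # bytes only, no notion of characters
            text = tail + decoder.decode(chunk, final=not chunk)
            if word in text:
                return True
            if not chunk:  # end of file
                return False
            tail = text[-(len(word) - 1):] if len(word) > 1 else ""
    finally:
        os.close(fd)


# Build a small UTF-16 sample file (with BOM) to try it on.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line with needle\n".encode("utf-16"))
sample_path = tmp.name
found = file_contains(sample_path, "needle", chunk_size=7)
missing = file_contains(sample_path, "haystack", chunk_size=7)
os.remove(sample_path)
print(found, missing)
```

The odd chunk size is deliberate: it forces reads that end in the middle of a 16-bit unit, which the decoder has to buffer until the next read.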

There are no read/write "system" calls on NT. On Win, there are 'read/write' library calls that present I/O as 8-bit chars, but on NT those library calls convert from 8 to 16-bit when talking to the system.

– Astara
Aug 27 '17 at 15:44


Install the ripgrep utility, which supports UTF-16.

For example:

rg pattern filename

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)
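To give an idea of what BOM-based detection can look like, here is a small Python sketch. It is not ripgrep's implementation, only an illustration of the general idea; the function names are invented for this example, and input without a BOM is simply assumed to be UTF-8.

```python
import codecs


def sniff_encoding(head: bytes, default: str = "utf-8") -> str:
    """Guess a codec name from a leading byte order mark, if there is one."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Simplification: a UTF-32LE BOM also starts with these bytes and is
    # not handled here.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec consumes the BOM and picks the byte order
    return default


def grep_lines(data: bytes, pattern: str) -> list[str]:
    """Return the decoded lines of `data` that contain `pattern`."""
    text = data.decode(sniff_encoding(data[:4]), errors="replace")
    return [line for line in text.splitlines() if pattern in line]


print(grep_lines("alpha\nbeta needle\n".encode("utf-16"), "needle"))
print(grep_lines(b"plain needle\n", "needle"))
```

A UTF-16 file without a BOM would fall through to the default here, which is the case where an explicit encoding option becomes necessary.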

answered 14 hours ago

kenorb

8,471370106
