Wildcards/Globbing: Are character ranges problematic?



























In The Linux Command Line, William Shotts claims that character ranges can be problematic. See the relevant excerpt below; the emphasis is mine.




Character Ranges



If you are coming from another Unix-like environment or have been reading some other books on this subject, you may have encountered the [A-Z] and [a-z] character range notations. These are traditional Unix notations and worked in older versions of Linux as well. They can still work, but you have to be careful with them because they will not produce the expected results unless properly configured. For now, you should avoid using them and use character classes instead.




What is he talking about in the last couple of sentences? What do the POSIX standards say about this?































  • The book is freely available for download here.

    – Git Gud
    7 hours ago











  • I tagged the question with environment-variables because I suspect they might be relevant, but I'm not sure. If someone more experienced could assess whether the tag is appropriate and remove it if necessary, I would appreciate it.

    – Git Gud
    7 hours ago











  • wildcards are typically used in the area of filename generation; do you see any connection to variables for your question? (I don't, but it's your question)

    – Jeff Schaller
    7 hours ago











  • @JeffSchaller My suspicion stems from the second paragraph here. If you think the tag isn't appropriate here, please let me know and I'll remove it. Also, feel free to remove it yourself. Thanks.

    – Git Gud
    7 hours ago






  • The reference is probably to locale-dependence: see for example "Why does [A-Z] match lowercase letters in bash?"

    – steeldriver
    7 hours ago


















Tags: wildcards, posix, locale, portability














edited 7 hours ago by Kusalananda

asked 7 hours ago by Git Gud

























1 Answer














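That most likely refers to locales whose collation order interleaves uppercase and lowercase letters (a, A, b, B, and so on) instead of placing all of one case before the other. In such a locale, the range [a-z] also covers the uppercase letters that sort between a and z: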



$ echo "$LANG"
en_US.UTF-8
$ touch a A z Z
$ ls
A Z a z
$ bash -c 'echo [a-z]'
a A z


However, the appropriate character class works:



$ bash -c 'echo [[:lower:]]'
a z


But might also match more than just a to z:



$ LANG=fi_FI.UTF-8
$ touch ä Ä ö Ö
$ bash -c 'echo [[:lower:]]'
a z ä ö


If you want to avoid that and match only the English lowercase letters a to z, Bash in particular (version 4.3 and later) has an option to interpret ranges in ASCII order:



$ bash -c 'shopt -s globasciiranges; echo [a-z]'
a z


And you can always force the C locale's collating order, which sorts by byte value (note that LC_ALL, if set, overrides LC_COLLATE):



$ LC_COLLATE=C bash -c 'echo [a-z]'
a z
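To try the comparison without touching the current directory, the commands above can be wrapped in a small script. This is only a sketch: it assumes bash and mktemp are installed and that the temporary directory is on a case-sensitive filesystem, and the variable names are made up for illustration. It uses LC_ALL rather than LC_COLLATE so that the C locale applies even when LC_ALL is already set in the environment.

```shell
#!/bin/sh
# Reproduce the range behaviour in a throwaway directory.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch a A z Z

# Under the C locale, ranges follow byte order, so only lowercase names match.
c_result=$(LC_ALL=C bash -c 'echo [a-z]')
echo "C locale:       $c_result"

# Under the current locale, the result may also include uppercase names.
cur_result=$(bash -c 'echo [a-z]')
echo "current locale: $cur_result"

# Clean up the temporary directory.
cd / && rm -rf "$dir"
```

If the two lines differ, the current locale's collation order is what changes the meaning of [a-z].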































  • Thanks. I don't understand how bash -c 'echo [a-z]' works. If I don't run the commands in the specific order that you do in your answer, it just echoes [a-z]. Can you shed some light on that? Also, is there a way to check the definition of each character class?

    – Git Gud
    6 hours ago











  • @GitGud [a-z] is a glob pattern, just like foo*; if you don't have any file starting with foo in the current dir, echo foo* will just echo foo*; if you don't have any file named a, b, c, etc in the current dir, echo [a-z] will echo just [a-z].

    – Uncle Billy
    6 hours ago











  • Ah! Of course! Got it. Thanks @UncleBilly

    – Git Gud
    6 hours ago











  • @ilkkachu Just to make sure you didn't miss my second question: is there a way to check the definition of each character class?

    – Git Gud
    2 hours ago
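The point made in the comments about unmatched patterns can be checked with a short sketch. It assumes a POSIX-style shell with default options (in bash, neither nullglob nor failglob set) and that mktemp is available; the variable names are hypothetical.

```shell
#!/bin/sh
# Start in a fresh, empty directory so that nothing can match the pattern.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# With no matching file, the pattern is left as-is and echoed literally.
before=$(echo [a-z])
echo "empty directory: $before"

# Once a matching file exists, the pattern expands to its name.
touch b
after=$(echo [a-z])
echo "after touch b:   $after"

# Clean up the temporary directory.
cd / && rm -rf "$dir"
```

This is why the commands in the answer only show the locale effect after the touch step has created the files.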











Answer by ilkkachu: answered 7 hours ago, edited 3 hours ago












