How to remove symbols from a column using awk

I have data like this:

chr1    134901  139379  -   "ENSG00000237683.5";
chr1    860260  879955  +   "ENSG00000187634.6";
chr1    861264  866445  -   "ENSG00000268179.1";
chr1    879584  894689  -   "ENSG00000188976.6";
chr1    895967  901095  +   "ENSG00000187961.9";

I generated it by parsing a GTF file.

I want to remove the " and ; characters from column 5 using awk or sed, if possible. The result would look like this:

chr1    134901  139379  -   ENSG00000237683.5
chr1    860260  879955  +   ENSG00000187634.6
chr1    861264  866445  -   ENSG00000268179.1
chr1    879584  894689  -   ENSG00000188976.6
chr1    895967  901095  +   ENSG00000187961.9

edited Jan 14 '16 at 19:46

jasonwryan


asked Jan 14 '16 at 19:41

System


You can also use multiple search-and-replace statements in sed: sed 's/"//g; s/;//g' filename

– jbrahy
Jan 14 '16 at 19:55

@DigitalTrauma ya, but Dani_l already gave that solution.

– jbrahy
Jan 14 '16 at 23:59


text-processing sed awk


7 Answers

Using gsub:

awk '{gsub(/"|;/,"")}1' file

chr1    134901  139379  -   ENSG00000237683.5
chr1    860260  879955  +   ENSG00000187634.6
chr1    861264  866445  -   ENSG00000268179.1
chr1    879584  894689  -   ENSG00000188976.6
chr1    895967  901095  +   ENSG00000187961.9

If you want to operate only on the fifth field and preserve any quotes or semicolons in other fields:

awk '{gsub(/"|;/,"",$5)}1' file
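One caveat with the field-specific version, shown here as a minimal sketch: assigning to $5 makes awk rebuild the record using OFS, which defaults to a single space, so the original column spacing is not preserved. Assuming the file is tab-separated (an assumption; the question does not say), setting both FS and OFS to a tab keeps the delimiter intact. The sample rows and the variable names `sample` and `cleaned` are only for illustration.

```shell
# Two sample rows, tab-separated (an assumption about the real file).
sample=$(printf 'chr1\t134901\t139379\t-\t"ENSG00000237683.5";\nchr1\t860260\t879955\t+\t"ENSG00000187634.6";\n')

# Modifying $5 causes awk to rebuild the record with OFS, so OFS is set
# to a tab here to keep the output tab-separated.
cleaned=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = OFS = "\t" } { gsub(/[";]/, "", $5) } 1')

printf '%s\n' "$cleaned"
```

If the columns are separated by runs of spaces instead, the default FS still splits the fields correctly, but the output will have single spaces between columns.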

edited Jan 14 '16 at 20:11

answered Jan 14 '16 at 19:45

jasonwryan


This would remove from all columns, not just 5th, no?

– Dani_l
Jan 14 '16 at 19:55

This is what I thought initially, but after using the code it seemed to keep all columns.

– System
Jan 14 '16 at 19:57

@Dani_l Yes, it can be refined to operate only on the fifth field, but that was not a requirement...

– jasonwryan
Jan 14 '16 at 19:57

Sorry, I must not have made it clear: I DO want to keep all columns. This is why it is marked as the answer.

– System
Jan 14 '16 at 19:58

@System updated to ensure it only operates on the fifth field.

– jasonwryan
Jan 14 '16 at 20:15


Using sed to remove all instances of " and ; (the -i option edits the file in place):

sed -i 's/[";]//g' file

To remove them only from the 5th column, sed is probably not the best option.

answered Jan 14 '16 at 19:54

Dani_l


If your data is formatted exactly as shown (i.e. no other " or ; in other columns that need to be preserved), then you can simply use tr to remove these characters:

tr -d '";' < input.txt > output.txt

answered Jan 14 '16 at 23:40

Digital Trauma


I know the original post asked for sed or awk, but if you want to remove the " and ; from only the fifth column, I'd use a regex and PHP. There's probably a way to do this in awk, but I like to use the tools I find easiest.

<?php

foreach (file($argv[1]) as $line) {
    $matches = array();
    // Capture the five columns; the quotes and semicolon around column 5
    // are matched but left outside the capture group.
    if (!preg_match('/^(\w+)\s+(\d+)\s+(\d+)\s+(-|\+)\s+"(\w+\.\d+)";/', $line, $matches)) {
        continue; // skip lines that do not have the expected layout
    }
    array_shift($matches); // remove the first element (the full match)
    vprintf("%s\t%s\t%s\t%s\t%s\n", $matches);
}

This would output:

$ php /tmp/preg_replace.php /tmp/data
chr1    134901  139379  -   ENSG00000237683.5
chr1    860260  879955  +   ENSG00000187634.6
chr1    861264  866445  -   ENSG00000268179.1
chr1    879584  894689  -   ENSG00000188976.6
chr1    895967  901095  +   ENSG00000187961.9

edited Jan 15 '16 at 16:56

answered Jan 14 '16 at 20:08

jbrahy


I'm not sure how this satisfies the "easiest tools" criterion; just the amount of typing alone...

– jasonwryan
Jan 14 '16 at 20:17

I prefer php to awk and sed and this is the only answer that actually does what the original post requested by removing " and ; from only the fifth column. Give me that point back.

– jbrahy
Jan 14 '16 at 20:18

I wasn't the downvoter, and no, my edited answer also only operates on the fifth field (and has other advantages besides brevity)...

– jasonwryan
Jan 14 '16 at 20:24

ah, ok. I didn't see the edited version. $5 is definitely less typing. For me PHP code is easier so I provided a solution I thought would help someone.

– jbrahy
Jan 14 '16 at 20:25

Fair enough, it is always good to see solutions using different approaches...

– jasonwryan
Jan 14 '16 at 20:46


A sed solution that makes sure we're only fiddling around with the fifth column:

sed -E 's/^(([^ ]+ +){4})"([^"]+)";$/\1\3/' infile

chr1    134901  139379  -   ENSG00000237683.5
chr1    860260  879955  +   ENSG00000187634.6
chr1    861264  866445  -   ENSG00000268179.1
chr1    879584  894689  -   ENSG00000188976.6
chr1    895967  901095  +   ENSG00000187961.9

This also works without ERE (-E, or -r for some older sed), but requires a lot more backslashes. The + quantifier is ERE-only according to the POSIX spec¹ and can be replaced by {1,} (or \{1,\} for BRE).

In case the columns aren't space-separated, the spaces can be replaced by the [:blank:] POSIX character class (written [[:blank:]], or [^[:blank:]] for the negated set) to also match tabs.

The regex in detail:

^               # Anchored at start of line
(               # Capture group 1 for first 4 columns
    (           # Capture group 2 for repeat count
        [^ ]+   # 1 or more non-spaces
         +      # 1 or more spaces
    ){4}        # 4 times "word plus spaces" (columns)
)               # End capture group 1
"               # Column 5 starts with double quote (not captured)
(               # Capture group 3 for column 5
    [^"]+       # One or more non-quote characters
)               # End capture group 3
";              # Quote and semicolon at end of column 5
$               # Anchored at end of line

¹ GNU sed, as an extension, allows + to be used in BRE as well.
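As a rough sketch of the tab-tolerant variant mentioned above, the same expression can be written with [[:blank:]] in place of the literal spaces. The sample row is assumed to be tab-separated, and the variable names `sample` and `cleaned` are only for illustration.

```shell
# One sample row, tab-separated (an assumption about the real file).
sample=$(printf 'chr1\t134901\t139379\t-\t"ENSG00000237683.5";\n')

# Same structure as the expression above, but columns may be separated
# by spaces or tabs; group 1 keeps the original separators untouched.
cleaned=$(printf '%s\n' "$sample" |
    sed -E 's/^(([^[:blank:]]+[[:blank:]]+){4})"([^"]+)";$/\1\3/')

printf '%s\n' "$cleaned"
```

Because the first four columns and their separators are captured and written back as they were, the original delimiters survive. Lines that do not match the pattern are printed unchanged.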

edited Jul 3 '18 at 13:42

answered Jan 17 '16 at 6:28

Benjamin W.


If every line has a fixed length (as in the example), then

cut -c1-28,30-46 INFILE

will work.

answered Jan 17 '16 at 7:13

Jshura


In bash you can use string manipulation to achieve what you want. Here is the code:

[root@localhost]# cat ./test.sh
#!/usr/bin/env bash

while IFS= read -r line; do
        # The expansion is unquoted, so runs of whitespace collapse to single spaces
        echo ${line//[\";]/}
done < sample.txt

And this is the output:

[root@localhost]# ./test.sh
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9

answered 1 min ago

Manish R



