Removing Control Characters From a Text File in Linux

A few weeks ago, I had a text file that was generated on Windows, and it had the “^M” control character (a carriage return) at the end of each line. I had to compare it (using diff) with a similar file generated on my Linux machine, and because of these control characters diff reported the files as different.

The first step was to find out why diff failed. Normally you cannot see these control characters, so it can be surprising that the comparison fails even though the files look exactly the same. To see these characters, use the “-A” flag with the cat command:

cat -A textfile
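
To get a feel for what that output looks like, here is a small sketch that builds a two-line sample with Windows line endings and prints it with cat -A. The file name sample_crlf.txt is made up for illustration, and the -A flag is assumed to be the GNU cat one.

```shell
# Build a sample file with Windows (CRLF) line endings.
# sample_crlf.txt is a hypothetical name used only for this demo.
printf 'first line\r\nsecond line\r\n' > sample_crlf.txt

# -A (GNU cat) makes non-printing characters visible:
# each carriage return is shown as ^M and each end of line as $.
cat -A sample_crlf.txt
```

With GNU cat, each line of the sample should end in ^M$, while a file with plain Linux line endings would end each line with $ only.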

The next step is to remove these characters. I wasn’t sure how to do it, so I asked a friend, who referred me to the program “dos2unix”, which converts text files generated on DOS/Windows into their Linux counterparts (and vice versa).

This worked!
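
A minimal sketch of that step, assuming dos2unix is installed (on many distributions it is a separate package). The file name from_windows.txt is made up for illustration. Called with just a file name, dos2unix converts the file in place.

```shell
# Hypothetical file name, used only for illustration.
printf 'alpha\r\nbeta\r\n' > from_windows.txt

if command -v dos2unix >/dev/null 2>&1; then
    # Converts in place: CRLF line endings become LF.
    dos2unix from_windows.txt
else
    echo "dos2unix is not installed" >&2
fi
```

Running cat -A on the file afterwards is a quick way to confirm that the ^M markers are gone.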

Recently, I found another way to do this, using “sed”. The trick was to type the control character “^M” in the terminal.

This is what we want the sed command to look like:

sed -e 's/^M//g' windowsTextFile > newTextFile

This replaces the character “^M” with nothing, which is exactly what we want. The ‘g’ stands for global: it replaces every instance of ^M in each line, not only the first one.

But just typing ^ followed by M won’t work, since it will be interpreted as two separate characters, ^ and M, which is not the behavior we want. To type the control character ^M, press [ctrl]+[v] and then [ctrl]+[m]. It will be displayed as ^M in the terminal, but it is a single character: the carriage return.
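
If typing the literal control character is awkward (in a script, for example), one alternative is to let printf produce the carriage return and store it in a variable. This is a sketch rather than the exact command from above: the file names are made up, and the pattern is anchored with $ so only a carriage return at the end of a line is removed, instead of using the ‘g’ flag.

```shell
# Hypothetical input file with Windows (CRLF) line endings.
printf 'one\r\ntwo\r\n' > crlf_input.txt

# printf '\r' emits a real carriage return (the ^M character).
cr=$(printf '\r')

# Inside double quotes, \$ is a literal $, which anchors the match
# to the end of each line.
sed -e "s/${cr}\$//" crlf_input.txt > lf_output.txt

# Check: compare against the same text written with Unix line endings.
printf 'one\ntwo\n' | diff - lf_output.txt && echo "files match"
```

Here diff exits successfully only when the cleaned file is identical to the Unix version, so "files match" is printed only if the carriage returns were removed.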

Hope it helps somebody.

Wednesday, January 28th, 2009 Linux

8 Comments to Removing Control Characters From a Text File in Linux

  • Bharat says:

    Hi, I read your posts and found them very helpful. I once faced the same situation you describe, and this is what I did:

    cat file.txt|strings > new-file.txt

    It’s simple and quick. :)

  • Amir Watad says:

    Hey,
    Thanks for your comment.
    Actually I didn’t know about the “strings” thing, but I tried it and it seems to work. Thank you :)

  • Hey Amir, saw your posts about forwarded mail and Gmail when searching for some filters I could use, and then ran across this, which is something I run into a lot, as I use Dropbox to sync my Windows and Linux boxes (I program on Windows, then compile on Linux). Before I finalize my projects, I run them through a very easy-to-use tool called dos2unix:
    http://linuxcommand.org/man_pages/dos2unix1.html
    Then I set up an alias called dos that has the command ‘dos2unix -p -v *.*’ to run it on all the files in a folder and keep the same filename. I realize this post is a week old and there are many solutions, thought I’d share mine. Like the site a lot, keep it up!

  • Amir Watad says:

    Hey Billy,
    Thank you for your comment.
    Actually I mentioned dos2unix in this post as the trivial solution. Thank you for the detailed explanation though.
    Very happy that you liked the site.

    Thank you again.

  • Jimmy says:

    Hello Amir

    excellent solution to a problem that was driving me daft.

    Thanks

  • Amir Watad says:

    Hi Jimmy,
    Thank you for your comment, and happy that I could help.
    Have a nice day :)

  • Y Devesh says:

    Thanks a lot Amir.
    I was looking exactly for this.

  • Amir Watad says:

    You’re welcome @Devesh. Happy to help ;-)

    Hi,
    My name is Amir Watad. I have a BSc. in biomedical engineering from The Biomedical Engineering school , Technion , Israel, and a BSc. in electrical engineering from The Electrical Engineering school , Technion , Israel.
    I'm a Verification Engineer in Mellanox Technologies Ltd.
    I love Linux, the Command Line and the OpenSource Community.
    I used to write Poems (Arabic) when I was able to find time for this.