Removing Control Characters From a Text File in Linux
A few weeks ago, I had a text file generated in Windows, and it had the “^M” control character (a carriage return) at the end of each line. I had to compare it with a similar file generated on my Linux machine (using diff), and because of these control characters diff reported the files as different.
The first step was to check why diff failed. Normally, you cannot see these control characters, and one might be surprised that the comparison fails even though the files look exactly the same. To see these characters, use the “-A” flag with the cat command:
cat -A textfile
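As a quick illustration, the snippet below builds a small file with Windows-style line endings and prints it with cat -A. The file name sample.txt is made up for this sketch, and I am assuming the GNU coreutils version of cat, where -A marks a carriage return as ^M and the end of each line as $.

```shell
# Create a two-line sample file with Windows (CRLF) line endings.
# "sample.txt" is a hypothetical name used only for this illustration.
printf 'first line\r\nsecond line\r\n' > sample.txt

# Show non-printing characters; with GNU cat each line should end in ^M$.
cat -A sample.txt
```

A file created on Linux would show only the $ at the end of each line, which is the difference diff trips over.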
The next step is to remove these characters. This wasn’t easy, and I asked a friend, who referred me to the program “dos2unix”, which converts text files generated in DOS/Windows into their Linux counterparts (and vice versa).
This worked!
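For reference, a minimal sketch of how this might look. The file name sample.txt is hypothetical, and the guard is only there because dos2unix is not installed everywhere. Called with just a file name, dos2unix converts that file in place.

```shell
# Hypothetical sample file with Windows (CRLF) line endings.
printf 'alpha\r\nbeta\r\n' > sample.txt

# Convert the file in place, but only if dos2unix is available here.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix sample.txt
else
    echo "dos2unix is not installed" >&2
fi
```

Running cat -A on the file afterwards is an easy way to confirm that the ^M markers are gone.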
Recently, I found another way to do this, using “sed”. The trick was to type the control character “^M” in the terminal.
This is what we want the sed command to look like:
sed -e 's/^M//g' windowsTextFile > newTextFile
This should replace the character “^M” with nothing, which is exactly what we want. (The ‘g’ is for global, i.e. replace every instance of ^M in each line, not only the first one.)
But just typing ^M won’t work, since it will be interpreted as two characters, ^ and M, which is not the behavior we want. To type the control character ^M, press [ctrl]+[v] and then [ctrl]+[m]. This will be displayed as ^M in the terminal, but it will have a completely different meaning.
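Inside a script, where typing the key combination is not practical, one possible variation is to let printf produce the carriage return and hand it to sed through a shell variable. This is a sketch rather than the command from above: the file names sample.txt and cleaned.txt and the variable name cr are made up for the example, and the final check assumes GNU cat.

```shell
# Hypothetical sample file with Windows (CRLF) line endings.
printf 'one\r\ntwo\r\n' > sample.txt

# Store a literal carriage return (the ^M character) in a variable.
cr=$(printf '\r')

# Same substitution as before: delete every carriage return on each line.
sed -e "s/${cr}//g" sample.txt > cleaned.txt

# Check the result; each line should now end in $ with no ^M before it.
cat -A cleaned.txt
```

Note the double quotes around the sed expression; with single quotes the shell would not expand the variable.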
Hope it helps somebody.
8 Comments to Removing Control Characters From a Text File in Linux
Hi, I read your posts and found them very helpful. I once faced the same situation you described, and for that I did:
cat file.txt|strings > new-file.txt
It's simple and quick.
Hey,
Thanks for your comment.
Actually I didn’t know about the “strings” thing, but I tried it and it seems to work. Thank you
Hey Amir, saw your posts about forwarded mail and Gmail when searching for some filters I could use, and then ran across this, which is something I run into a lot as I use Dropbox to sync my Windows and Linux boxes (I program on Windows, then compile on Linux). Before I finalize my projects, I run them through a very easy-to-use tool called dos2unix:
http://linuxcommand.org/man_pages/dos2unix1.html
Then I set up an alias called dos that has the command ‘dos2unix -p -v *.*’ to run it on all the files in a folder and keep the same filename. I realize this post is a week old and there are many solutions, thought I’d share mine. Like the site a lot, keep it up!
Hey Billy,
Thank you for your comment.
Actually I mentioned dos2unix in this post as the trivial solution. Thank you for the detailed explanation though.
Very happy that you liked the site.
Thank you again.
Hello Amir
excellent solution to a problem that was driving me daft.
Thanks
Hi Jimmy,
Thank you for your comment, and happy that I could help.
Have a nice day
Thanks a lot Amir.
I was looking exactly for this.
You're welcome, @Devesh. Happy to help.