www.delorie.com/gnu/docs/diffutils/diff_100.html   search  
 
Buy GNU books!


Comparing and Merging Files

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

18.1.1 Handling Multibyte and Varying-Width Characters

diff, diff3 and sdiff treat each line of input as a string of unibyte characters. This can mishandle multibyte characters in some cases. For example, when asked to ignore spaces, diff does not properly ignore a multibyte space character.

Also, diff currently assumes that each byte is one column wide, and this assumption is incorrect in some locales, e.g., locales that use UTF-8 encoding. This causes problems with the `-y' or `--side-by-side' option of diff.

These problems need to be fixed without unduly affecting the performance of the utilities in unibyte environments.

The IBM GNU/Linux Technology Center Internationalization Team has proposed some patches to support internationalized diff http://oss.software.ibm.com/developer/opensource/linux/patches/i18n/diffutils-2.7.2-i18n-0.1.patch.gz. Unfortunately, these patches are incomplete and are to an older version of diff, so more work needs to be done in this area.


  webmaster   donations   bookstore     delorie software   privacy  
  Copyright © 2003   by The Free Software Foundation     Updated Jun 2003