Damon Cortesi's blog

Musings of an entrepreneur.

Unicode Grep


I got caught by a bit of a bug today when I was trying to add a custom wallet item in 1Password. I was copying one of their templates, after realizing they were just simple JSON, but couldn't find the file where the description strings were stored. The file the above article eventually led me to, Localizable.strings, turned out to be UTF-16, which grep cannot … grep through: each ASCII character is stored as two bytes (one of them zero), so an ordinary pattern never matches the raw file. After a little bit of googling, I came up with the following solution, which does a recursive, case-insensitive grep through UTF-16 files on OS X.

```shell
# Recursive, case-insensitive grep for $GREP_FOR in UTF-16 files (handles spaces in names)
find . -type f | while IFS= read -r f; do
  file -b "$f" | grep -q 'UTF-16' || continue   # skip files that aren't UTF-16
  iconv -f UTF-16 -t UTF-8 "$f" | grep -iH --label="$f" -- "$GREP_FOR"
done
```
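To see why a plain grep misses these files, here is a small sketch that writes "Hello" as UTF-16LE and searches it before and after conversion. It assumes `iconv`, `od`, and `grep` are on the PATH; the scratch file name `utf16-demo.$$` is arbitrary. In UTF-16 each ASCII character is followed by a zero byte, so the five bytes `Hello` never appear next to each other in the raw file.

```shell
# Write "Hello" as UTF-16LE (no byte-order mark) to a scratch file
demo="${TMPDIR:-/tmp}/utf16-demo.$$"
printf 'Hello\n' | iconv -f UTF-8 -t UTF-16LE > "$demo"

# Dump the bytes: each ASCII character is followed by a NUL (\0)
od -c "$demo"

# Count matching lines; "|| true" keeps a zero count (grep exit status 1) from aborting under set -e
plain=$(grep -c 'Hello' "$demo" || true)
converted=$(iconv -f UTF-16LE -t UTF-8 "$demo" | grep -c 'Hello' || true)
echo "raw file: $plain matching line(s); after iconv: $converted"

rm -f "$demo"
```

Running it should report no matches in the raw file and one after conversion, which is the gap the `iconv | grep` pipeline closes.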

Update: I also put an accompanying shell script on GitHub (ugrep.git) in case I need to make it a bit more flexible.
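One way to make the loop more flexible is to wrap it in a function that takes the pattern and starting directory as arguments instead of reading `$GREP_FOR`. This is a hypothetical sketch, not the contents of ugrep.git: the name `ugrep16` is made up, and it assumes, as the loop does, that `file` describes UTF-16 files with the string "UTF-16" and that `grep` supports `--label`.

```shell
# Hypothetical helper, not the ugrep.git script: ugrep16 PATTERN [DIR]
# Recursive, case-insensitive grep of UTF-16 files under DIR (default: current directory).
# The body runs in a subshell "( ... )" so its variables don't leak into the caller.
ugrep16() (
  pattern=$1
  dir=${2:-.}
  find "$dir" -type f | while IFS= read -r f; do
    # `file -b` prints only the description, without the "name:" prefix
    file -b "$f" | grep -q 'UTF-16' || continue
    # Decode to UTF-8, then grep; --label makes -H print the real path instead of "(standard input)"
    iconv -f UTF-16 -t UTF-8 "$f" | grep -iH --label="$f" -- "$pattern"
  done
)
```

Usage would look like `ugrep16 'search term' path/to/dir`. Using a subshell body rather than `local` keeps the function POSIX-compatible.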
