Talk:Soundex

Soundex may be used for indexing words, but it was specifically designed for names, and doesn't always apply well to much else. RossPatterson 21:39, 24 September 2005 (UTC)

Soundex was specifically designed to index names of Western European origin, and not much else. This was a huge bias on the part of the inventor(s), since they were American, and in the 18th/19th centuries the vast majority of Americans had surnames of Western European origin. This is not necessarily the case today. PS - Can someone please demonstrate that Margaret Odell had anything to do with this patent? I don't see where she appears anywhere on the patent as a patent holder and I can't find any information on her anywhere... Thx. 12.110.196.19 15:48, 5 April 2006 (UTC)
Knuth describes Soundex in volume 3 (p. 394 in the second edition) as: "... a technique that was originally developed by Margaret K. Odell and Robert C. Russell [see U.S. Patents 1261167 (1918), 1435663 (1922)], ...", and that's good enough for me. I expect just about every reference you'll find on the web goes back to Knuth. It's certainly true that only Russell's name is on those two patents, but I see on re-reading that neither Knuth nor this article says that Odell held the patents. RossPatterson 02:00, 6 April 2006 (UTC)
This page said Odell was a co-inventor back in the March revision. I see someone updated the page to reflect that Daitch-Mokotoff returns up to 32 separate encodings. The range of encodings, however, is actually 000000 to 999999, although many numeric combinations in this range will never be encountered because of restrictions on side-by-side duplicate digits in the rules for the algorithm. 69.116.243.218 02:20, 22 July 2006 (UTC)

C Code removal

I'm sorry Sudipta, because I do believe you were acting in perfectly good faith and put plenty of effort into your C implementation of the algorithm, but unfortunately incorporating your own work contravenes the No Original Research standard so I removed it. I hope you understand. Fortunately there are plenty of avenues on the internet where publishing original code is positively encouraged, so maybe your implementation can find an audience there? --VinceBowdren 22:41, 22 May 2007 (UTC)

Better algorithm?

This description doesn't seem very good - at least it's not possible to just convert the steps to code one by one as they're written here. The main problem is that step 2 tells you to discard the vowels, but then step 4 refers to the original string. Is there a better algorithm somewhere? Interplanet Janet 16:18, 25 September 2007 (UTC)

One modified algorithm may work fine for you:
  1. Retain the first letter of the string
  2. (American census version only): remove any H or W unless it is the first letter
  3. Assign numbers to letters (after the first) as follows:
    • b, f, p, v = 1
    • c, g, j, k, q, s, x, z = 2
    • d, t = 3
    • l = 4
    • m, n = 5
    • r = 6
    • a, e, h, i, o, u, w, y = 0
  4. Any runs of 2 or more of the same digit should be replaced with a single copy.
  5. Remove any 0 digits
  6. Return the first four characters, right-padding with zeroes if there are fewer than four.
Does that help? 67.76.205.139 (talk) 23:53, 10 January 2008 (UTC)
This doesn't deal with the case where the second letter of the string has the same code number as the first letter (in which case the second should be ignored). JMG (talk) 02:42, 2 June 2008 (UTC)
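For anyone who wants to try the modified steps, here is a rough Python sketch, not a reference implementation. It follows the numbered list above, with one change to cover the first-letter case just raised: the first letter is given a code too, so that a following letter with the same code collapses into it, and that leading digit is then swapped back for the letter itself. The names `CODES`, `soundex` and `census_hw_rule` are made up for this illustration, and silently skipping non-letters is an assumption rather than anything the steps specify.

```python
# Letter-to-digit table from step 3; vowels and h, w, y map to "0".
CODES = {}
for group, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
    ("aehiouwy", "0"),
):
    for letter in group:
        CODES[letter] = digit


def soundex(name, census_hw_rule=True):
    """Return a four-character code for name, following the steps above."""
    # Assumption: anything that is not one of the 26 letters is skipped.
    letters = [ch for ch in name.lower() if ch in CODES]
    if not letters:
        return ""

    first, rest = letters[0], letters[1:]

    # Step 2 (American census version only): drop H and W after the first letter.
    if census_hw_rule:
        rest = [ch for ch in rest if ch not in "hw"]

    # Step 3, extended to the first letter so that a second letter with the
    # same code is absorbed by the run-collapsing pass below.
    digits = [CODES[ch] for ch in [first] + rest]

    # Step 4: replace runs of the same digit with a single copy.
    collapsed = [digits[0]]
    for d in digits[1:]:
        if d != collapsed[-1]:
            collapsed.append(d)

    # Step 5: remove zeroes, ignoring the digit that stands for the first letter.
    tail = [d for d in collapsed[1:] if d != "0"]

    # Steps 1 and 6: keep the first letter, pad with zeroes, cut to four characters.
    return (first.upper() + "".join(tail) + "000")[:4]


print(soundex("Robert"))    # R163
print(soundex("Pfister"))   # P236
print(soundex("Ashcraft"))  # A261
```

With `census_hw_rule=False` the H in "Ashcraft" stays in as a 0 and separates the S from the C, so that call gives A226 instead of A261.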