John,
thanks for sending the source code! - I still have to study it in detail...
I have just read about UCA and collation tailoring in the 10.0.1 docs.
As this seems to be one of your favourite topics:
In the application I talked of, we typically have to compare German names
(of persons and places).
As you may know, these may contain "umlauts" like 'ä' or special characters
like 'ß' (the "sharp s").
However, in older, restricted charsets (or in internationalized uses like
mail addresses), these umlauts have often been expanded to two characters,
e.g. 'ä' to 'ae' or 'ß' to 'ss'.
So one task we face is to have 'ä' and 'ae' compare as equal.
AFAIK, single-byte collations can only compare characters one by one and
therefore cannot treat 'ä' and 'ae' as equal.
Is this the same for Unicode collations, or could I establish some rule to
make 'ä' and 'ae' the same?
(So far, we have solved this problem by storing both the original names and
a "normalized" form, where umlauts are expanded, everything is uppercase,
and some phonetic simplifications are done (e.g. 'ph' sounds like 'f' and is
therefore normalized to 'f'). The normalized form is stored as a computed
field and is automatically calculated by a user-defined function.
Comparisons are then done on the normalized forms.
This works well with the typical German "1252LATIN1" single-byte collation.)
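To make the normalization idea concrete, here is a minimal sketch in Python. It is not the actual user-defined function from our database. The names normalize_name, UMLAUT_EXPANSIONS and PHONETIC_RULES are made up for illustration, and the rule tables only cover the examples mentioned above.

```python
# Illustrative sketch only: the names and rule tables below are assumptions,
# not the real user-defined function used in the database.

# Expand umlauts and the sharp s to their two-character spellings.
UMLAUT_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ß": "ss",
}

# Phonetic simplifications, applied after uppercasing (only 'ph' -> 'f' here).
PHONETIC_RULES = [("PH", "F")]


def normalize_name(name: str) -> str:
    """Return a comparison key: umlauts expanded, uppercased, phonetically simplified."""
    expanded = "".join(UMLAUT_EXPANSIONS.get(ch, ch) for ch in name)
    result = expanded.upper()
    for pattern, replacement in PHONETIC_RULES:
        result = result.replace(pattern, replacement)
    return result


if __name__ == "__main__":
    print(normalize_name("Jäger"))    # JAEGER
    print(normalize_name("Jaeger"))   # JAEGER
    print(normalize_name("Straße"))   # STRASSE
    print(normalize_name("Philipp"))  # FILIPP
```

Two names are then treated as equal when their normalized keys are equal, which is what the computed field gives us on the database side.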
Any hint as to whether UCA offers better facilities would be highly appreciated...
Volker
Post by John Smirnios
You may need to send me an email first or give me another email address.
My email to you bounced with the following message: "Mail rejected for
policy reasons."
-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering
Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer
Post by Volker Barth
John,
would it be possible to send the sample to me, too?
I have used SIMILAR for about 10 years to do "automated customer data
clearing" stuff, and it works quite well - in conjunction with LIKE and the
like...
I once ran tests with user-defined and external functions (e.g. with
the "Levenshtein" algorithm) but they were far too slow compared to
SIMILAR.
Fortunately, the algorithm seems to have been the same since V5.5...so I
would like to have a chance to look at it more closely.
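For reference, the Levenshtein distance mentioned above can be sketched as follows in Python. This is the textbook dynamic-programming version, not the algorithm behind SIMILAR (which I do not know) and not the code I tested back then. The function name levenshtein is just chosen for illustration.

```python
# Textbook Levenshtein edit distance; NOT the algorithm used by SIMILAR.

def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    # previous[j] holds the distance between the prefix of a processed so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("Meier", "Meyer"))     # 1
    print(levenshtein("kitten", "sitting"))  # 3
```

The cost grows with the product of the two string lengths for every pair compared, which fits with such functions being slow when called row by row.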
TIA
Volker
(Feeling fine not to have to port that to MS SQL / ASE)
Post by John Smirnios
Yes. I'll send you a sample via email.
-john.
Post by Rob
Does anyone know the algorithm used to compare strings in the Similar()
function? I have to create a similar function, excuse the pun, in SQL
Server.
TIA
P.S. I hate SQL Server!