Friday 31 August 2007

What's this Soundex in Delphi?

Delphi has a Soundex routine found in StrUtils, so why am I giving you another? This blog is about how Soundex works. understanding Soundex will help you decide how and when to use it to enhance your program. I am not suggesting for a moment that you dump the already written Soundex routine within Delphi, unless you really want to, but rather I am suggesting that this may be a way to understand how a Soundex code is put together.

In the early 1980's I wrote a Soundex routine in Turbo Pascal and released it to the world through the bulletin boards that were available back then (before the Internet became readily available to the masses). By releasing it to the public domain, I was more interested in sharing than in copyrighting everything.

Over the following years, to my great amusement, I found my routine copied several times, sometimes word for word, line for line including comments, with my name replaced with someone else claiming to be the author (I would not be so amused these days). As far as I could tell with the limited searching available at the time, I was the first person to write a Soundex routine entirely in Turbo Pascal, but it is a simple routine so I could easily be wrong on that count.

The original Turbo Pascal source has been lost to .. well .. somewhere in that box of 5.5" floppies over in the corner there that I've been taking to the dump any day now for about the last 15 years.

Soundex converts a name to four characters (one letter and three numbers). The conversion will result in the same numbers for like-sounding names. For example all the following names will all result in a Soundex code of "S530" - smith, Smith, smythe, smitt, shmidt, shmidt, snith, snyth, snythe, smmith, etc. This means that if a user enters "Smith", and I search the database for the Soundex code rather than the name "Smith", I will be presented with all those and other similar sounding names.

Soundex was originally invented by Robert C. Russel and Margaret K. Odell around 1918-1922 and initially used for immigration and census information. Being before computers, the Soundex code is naturally simple so that people can calculate Soundex manually. It is used most in computer applications now. Searching your own database for names that sound like 'Smith', for spell checking and other areas.

While Soundex is clever, it is not infallible so common sense must prevail. 'Clark', and 'Klark' will come out with a different Soundex code. other examples are Bate (B300), will not be the same as Bates (B320), or Gate (G300).

In other instances names that really don't sound the same can sometimes, by sheer coincidence, have the same soundex code. Have you ever wondered why sometimes your spell checker wants to replace a mistyped word with something completely different and totally out of context? You are left wondering how on earth the spell checker came up with that word. Well, now you know. Its Soundex. However, it is still clever enough for most uses.

So what's the process? Well, it was simple for a reason (above) and still is.

  • Take the first letter
  • Replace the letters BFPV with '1'
  • Replace the letters CGJKQSXZ with '2'
  • Replace the letters DT with '3'
  • Replace the letter L with '4'
  • Replace the letters MN with '5'
  • Replace the letter R with '6'
  • Ignore all other letters and ignore double letters (e.g. the 2 T's in "Letterman").
  • If the result is less than 4 characters, then pad out with zeros.
  • If result is greater than 4 characters then use only the first 4

And there you have your Soundex code.

There are many ways to implement Soundex and I have chosen one that I hope will allow you to follow what is happening. Take a look at the Soundex routine in StrUtils for another way.

function MySoundex(sName: string): string;
var
Ch, LastCh: Char;
i : integer;
sx : string;
begin
sName := UpperCase(trim(sName));
if length(sName) < 1 then
sx := '' // got nothing, send nothing back
else
begin
LastCh := #0;
for i := 1 to length(sName) do
begin // step through each character in the name
if i = 1 then
sx := sName[1] // store the first character
else
begin
ch := #0;
if sName[i] <> LastCh then
begin
case sName[i] of
'B', 'F', 'P', 'V': ch := '1';
'C', 'G', 'J', 'K',
'Q', 'S', 'X', 'Z': ch := '2';
'D', 'T': ch := '3';
'L': ch := '4';
'M', 'N': ch := '5';
'R': ch := '6';
// Note no ELSE - ignore all other letters
end;
if ch <> #0 then
sx := sx + ch;
if length(sx) > 3 then
break; // we got all we need
end;
end;
LastCh := sname[i];
end;
while length(sx) < 4 do
sx := sx + '0'; // pad out remaining with zero
end;
result := sx;
end;


There are lots of variations and improvements on that depending on what you are doing.

So where would we use Soundex? One obvious way would be to give your users the ability to search for a name by the way it sounds.

Consider a database full of hundreds of thousands of customers. Simply add a field to the Customer table called .. oh I don't know, how about "Soundex". Then add the soundex code for the customer surname on each customer.

Because of the way Soundex works, you may find that finding all customers that sound like "Smith" may be a lot faster than finding all customers with the exact name of "Smith". How can this be? Well, since there will be less unique Soundex codes than unique surnames, the index will have less problems finding a Soundex code. Lets say that you are searching for a name in Auckland (approx 1.3 mill), an index on unique surnames will possibly total around 800,000 (rough guess). However an index on unique Soundex codes for surnames may only total about 100,000 (another wild guess with no substance at all, but you get the idea).

Ramblings


Its beautiful outside again today and although yesterday the wind was getting up a little, I can start to feel the rumblings of spring around here. Are we at last getting out of the deep dark winter that held our grip for the past few months and kept us close to our box of tissues, warm jerseys, and heaters? I hope so. It made me think of where I would be if I had accepted the offer to spend the next few years in Mongolia. I would have loved the experience of learning a new culture and language, but "The Mount" has a way of making me smile.

I climbed to the top of the mount again last weekend where the view is spectacular.
Mount Maunganui from the top of the Mount. It was sad to think I'll soon be moving inland to Hamilton, an hour and a half away.

Have a wonderful day wherever you are and God's blessings.

Thursday 23 August 2007

Have an iif() function in Delphi

There are many advantages to learning other programming languages, but there are also some problems as well. The "Immediate If" (iif) function is one of those. In most, if not all, the other languages there is an immediate if function that makes programming just a little easier and allows you to say things in much less lines. Consider the following...

    if Str = 'Top' then
s := 'Woohoo!'
else
s := 'Climb some more';


While in Delphi we can effectively write this on one line as...

    if Str = 'Top' then s := 'Woohoo!' else s := 'Climb some more';


This looks untidy. If we were using another language with immediate if, that would look like this...

    s := iif(Str='Top', 'Woohoo!', 'Climb some more');


Now I realise to some Delphi programmers that are not multiple programming language experienced, all that seems a little esoteric but it is standard use in even Excel. Here's how it works...

iif(condition, TrueResult, FalseResult) : String;


The iif() function first looks at the boolean condition. If that condition is true, then it return the TrueResult, and if it is false, then it returns the FalseResult. Here's the declaration

function iif(Test: boolean; TrueR, FalseR: string): string;


The 1st parameter is the Test condition, this can be anything (e.g.: x > y).
The 2nd parameter is the string to return if the test condition is True.
The 3rd parameter is the string to return if the test condition is False.

Here's the full function...

function iif(Test: boolean; TrueR, FalseR: string): string;
begin
if Test then
Result := TrueR
else
Result := FalseR;
end;


but in other languages, the 'type' we return can vary. For example we may want to return 99 if true or 0 of false. How do we do that in Delphi? By using the 'overload' directive.

function iif(Test: boolean; TrueR, FalseR: string): string; overload;
function iif(Test: boolean; TrueR, FalseR: integer): integer; overload;
function iif(Test: boolean; TrueR, FalseR: extended): extended; overload;

Implementation

function iif(Test: boolean; TrueR, FalseR: string): string;
begin
if Test then
Result := TrueR
else
Result := FalseR;
end;

function iif(Test: boolean; TrueR, FalseR: integer): integer;
begin
if Test then
Result := TrueR
else
Result := FalseR;
end;

function iif(Test: boolean; TrueR, FalseR: extended): extended;
begin
if Test then
Result := TrueR
else
Result := FalseR;
end;


We can even extend these to bytes or even TObject if we want to. The only thing we can't do is have the same parameter types and return a different type, e.g...

function iif(Test: boolean; TrueR, FalseR: string): string; overload;
function iif(Test: boolean; TrueR, FalseR: string): integer; overload;


This is a no-no. Why? Delphi will decide which function to run based on the parameter types. If all the parameter types are the same, Delphi will not compile as it will see this as an error.

As already stated, Delphi decides which function it will used based on the types used in the parameters. It will always use the least type for the job. For example, if you have iif()'s set up for both integer and byte, and then use iif(x=y,5,10), then Delphi will use the byte function and return a byte. In most cases this will still work ok even if you are expecting an integer returned.

Thursday 16 August 2007

Practical Development Methodologies

Over the years I have designed and installed a number of development methodologies into various organisations (I'll not list them here). What I'm interested in is a description of the development methodology that is adopted by your company, why, and how well its working.

I'd also like to know your description of the development method, not just the company's blurb sheet. For example, I was once working for a company who was the major client of a software development company. I was asked to report on the development methodology that the software company had adopted and how well it was working in favour of the client.

The company, like a great many others I know, proudly advertised their methodology as "Agile". Agile is not itself a methodology but rather a concept. There are a number of methodologies which come within the concept of "Agile". Upon investigation, the company was operating under "Iterative and Incremental Development" (IID). After getting to know both the development company, and the client, I was able to report that the IID methodology worked exceedingly well for both the development company, and their major client in this case.

In most of the methodologies I have installed, I have yet to install a stock-standard, out of the book methodology. I much prefer to look at several areas within the company I am contract to, to ensure a good fit. These areas include:


  • How the programming team currently works, what works well and what needs improvement.
  • The major applications that the team completes, e.g. large developments that take longer than 6 months to deliver, or daily support and enhancements to legacy applications.
  • The major clients of the team. This may be serving in-house applications (in which case the customer may be another business area) or small, large, or corporate external customers. each has its own unique needs.
  • The slant that management wants to portray to the outside world.
  • Both the history and the perceived future of the development team.
  • The history and quality of the delivered products.
  • The location and proximity of the team. For example, some companies I have worked with perform all their development in India with local staff consisting of Business Analysis and Technical Architects. Others have split their development team between two cities.
  • The makeup of the team. Some teams may have 'prima donna' programmers that simply will not conform to changes unless they are subtle or suit their own needs.

All these will have an effect on the eventual development methodology that will be designed and installed.

One example of a change to the standard methodology was where I designed and installed a standard Waterfall methodology to cope with external programmers, but changed the process to be used for Functions, instead of whole applications. This reduced the time it took for the external programmers to return their first cut that could be tested and, where necessary, returned for minor alterations. This meant that for any application there were several waterfalls operating at any one time. A sort of RAD-ified, or even XP'd Waterfall.

So what's your advertised methodology? What's your actual methodology? How well does it work for the team? And how well does it work for the client/customer?

You don't need to tell me the company you work for, and in most cases it may not be prudent.

Wednesday 15 August 2007

The Shell Sort

Yes, I know Delphi comes with a number of sorting routines, but I thought I'd describe a shell sort as it is fascinating little sort routine that is easily understood and a very fast routine.

[Steve: Although the "Shellsort" algorythm was first published by Donald Shell (as pointed out by Anthony Mills, thanks Anthony), this 'Shell Sort' uses a different approach that would be closer to a "Bubble Sort" with some changes. I therefore take a little 'artistic licence' in describing it in ways that people can visually understand]

The name "Shell Sort" comes from those games that are played by the magician, conman, or simply the swift of hand when they try to get money from the average passerby. Having usually 3 shells, a small coin or marble is placed under one of the shells. The shells are then swapped around at speed and the passerby is asked to bet on which shell contains the marble.



The shell sort routine does that same swapping around, but only to place all the shells in order. Consider that each shell is numbered, in this case, 1 to 5, but we'll first get our 'swift of hand' guy to jumble them up a little.



For the purposes of this blog, I'll limit the sort to only 5 integer numbers (or shells) although these could be words or extended reals or whatever. So long as you can compare them to see which is the greatest, this routine is the same or similar.

What this routine does is traverse the 5 shells many times. Each time, it will check each shell and see if it is greater than the next shell, if it is, it will swap the two shells. This will have the effect of moving the largest numbers to the right and the smaller numbers to the left - in other words, sorting them.



Consider the first time through, looking at each shell in turn, and swapping them with the next if it is larger, would, by default, move the largest number to the end.

Lets take a look at what will happen on the first run through.

35241 (look at 1st number, swap if 3 > 5)
35241 (look at 2nd number, swap if 5 > 2)
32541 (look at 3rd number, swap if 5 > 4)
32451 (look at 4th number, swap if 5 > 1)
32415

This moved the largest number (5) to the end. This means that the next time through, we would only need to look at the first 4 numbers, and so on until there is only one number left. That will tell us that the shells are sorted. BTW: Did you notice that we only stepped throuigh the first 4 numbers? That's because the last number is, well, it's the last number and doesn't have a next number to compare to.

Another way to know if they are all sorted (after all, we don't yet know if they are in order before we start) is if we traverse the shells and none needed to be swapped. If we didn't swap any then its in order and we can stop.

Let's start. Create a new project in Delphi with a form. Drop a TMemo and a TButton on the form. We'll put all the code on the OnClick event of the button, so we can follow it easier. First declare the counters; the shells themselves; and a true/false variable to tell us that no swaps were made during the last traversal.


var
i, LastShell: integer;
Shell : Array[1..5] of integer;
NoSwap: boolean;



Then we'll place a procedure inside the OnClick event (something that I would not normally do, but we're keeping it all tight for you). This procedure will simply display the current state and order of the shells so we can see what's happening.


procedure ShowShells;
var
i: integer;
s: string;
begin
// display the shell order in the TMemo
s := '';
for i := 1 to 5 do
s := s + IntToStr(Shell[i]);
Memo1.lines.Add(s);
end;


Go on, say ShowShells shix times quickly. Then we'll add another procedure. This one will swap a shell with the next and tell us that the swap occured.


procedure Swap(var First, Second: integer);
var temp: integer;
begin
Temp := First;
First := Second;
Second := Temp;
NoSwap := false;
end;


Note that I declared variable parameters, but I also could have swapped the array directly. Either way would have been acceptable in this situation.

Now we can start writing code in the OnClick event itself. First we'll fill the array with the current order of the shells.


begin
Shell[1] := 3;
Shell[2] := 5;
Shell[3] := 2;
Shell[4] := 4;
Shell[5] := 1;
memo1.Clear;
ShowShells;


Now we can do the actual sorting, using the counters. Remember, we are going down in the number of shells we need to sort each time (the largest will move to the end each time).


memo1.lines.add('now sorting...');
for LastShell := 5 downto 1 do
begin
NoSwap := true; // No swaps this time through yet
// traverse the unsorted shells
for i := 1 to LastShell-1 do
begin
if Shell[i] > Shell[i+1] then
// its greater than next, swap them
Swap(Shell[i], Shell[i+1]);
ShowShells;
end;
if NoSwap then
// we traversed the shells and didn't need to swap
// any therefore its sorted and we can stop.
break;
end;
memo1.lines.add('Done');
end;


Note that we only traversed the shells to LastShell-1? that's because we can't look to swap the lastshell with the next one as its the last shell :-)

Here's the full code...


procedure TForm1.Button1Click(Sender: TObject);
var
i, LastShell: integer;
Shell : Array[1..5] of integer;
NoSwap: boolean;

procedure ShowShells;
var
i: integer;
s: string;
begin
s := '';
for i := 1 to 5 do
s := s + IntToStr(Shell[i]);
Memo1.lines.Add(s);
end;

procedure Swap(var First, Second: integer);
var temp: integer;
begin
Temp := First;
First := Second;
Second := Temp;
NoSwap := false;
end;

begin
// assign numbers to the array
Shell[1] := 3;
Shell[2] := 5;
Shell[3] := 2;
Shell[4] := 4;
Shell[5] := 1;
memo1.Clear;
ShowShells;
memo1.lines.add('now sorting...');
for LastShell := 5 downto 1 do
begin
NoSwap := true; // No swaps this time through yet
// traverse the unsorted shells
for i := 1 to LastShell-1 do
begin
if Shell[i] > Shell[i+1] then
// its greater than next, swap them
Swap(Shell[i], Shell[i+1]);
ShowShells;
end;
if NoSwap then
// we traversed the shells and didn't need to swap
// any therefore its sorted and we can stop.
break;
end;
memo1.lines.add('Done');
end;

Tuesday 14 August 2007

The Delphi/Pascal Sentence

...or "When/Where to use semicolons".

Apologies to all you long time Pascal gurus, but in the last few days I was pondering on the advice I gave a few students learning Pascal and felt that Today's post should be directed at those learning the language.

Many new-comers find it difficult to grasp when they should be using a semicolon. The rules seem complex and inconsistant to them, when in fact they are reasonably consistant. Consider this...

"Pascal is a sentence"

Writing in Delphi Pascal is writing sentances, its that simple. Consider the following simple piece of code...

if x > 0 then
ShowMessage('x is worth something')
else
ShowMessage('x is worthless');



That equates to a sentence of "If x is greater than zero then x is worth something otherwise x is worthless.". Some people call that a Pascal statement, but I like to think of it as a sentence. Sentences are normally ended in a full stop, but in Pascal, sentances are ended in a semicolon and only the program is ended in a full stop. in the above Pascal sentence, the semicolon comes at the end as it should, and not half way through. You wouldn't say "If x is greater than zero. Then x is worth something. Otherwise. X is worthless." would you? That's 4 parts of a sentence that are seperated into 4 sentences that are meaningless.

So why can't we just place a semicolon before the else? Well think of it. If someone is talking to you in fluent english and finishes the sentence, then a little later (when you are thinking of something else), states "otherwise...", you would say "Huh? what do you mean 'Otherwise'". The Delphi Pascal compiler would state the equivelent of "Huh?" at that point and totally fail to understand you.

"OK-ish", I hear you saying "but what about all that begin/end stuff, nobody says that in real life", and normally you'd be right. Consider the following...


if x > 0 then
begin
y := x;
ShowMessage('x and y are the same');
y := y * 10;
ShowMessage('and now they are not')
end;



How does that equate to a sentence? Well, there is an overriding sentence there that says something like "If x is greater than zero then do some things...". (There's more to that sentence that I'll add soon).

You can think of begin/end as brackets if you like, that's what C and some other languages do, but Pascal is ever so slightly different. When speaking fluent Pascal (Pascal is a language after all), and you want to say some other sentences in the middle of your main sentence, you will need to let the listener know that by saying 'begin' and 'end'.

Each of the lines ending in a semicolon are their own sentences: "make y equal to x"; "say 'x and y are the same'"; "make y equal to y times 10". Being sentences in their own right, they can even have their own begin/ends for their sentences if they need them.

What about that last ShowMessage() statement? Well, here's where the Pascal sentence comes in. The overriding sentence actually says "If x is greater than zero then do some things and say 'and now they are not'.". More correctly, in fluent Pascal the true sentence is "If x is greater than zero then begin do some things and say 'and now they are not' end."

When speaking Pascal you do have to say the 'begin' and 'end' words in a sentence.

Now I can also hear you saying "but I have seen a semicolon before the 'end' many times". Yes, and pascal is smart enough to understand that you are saying nothing. Placing a semicolon before the 'end' is the equivilent of stating an empty sentence before finishing the outer sentence. Sweet nothings.

So what about Procedures and Functions? Procedures and functions are just teaching the Pascal language new words to use. For example...


Procedure Swap(var x, y: integer);

...says "I'm going to teach you the word 'Swap' and it will involve 2 whole numbers". Here's the full procedure which takes 2 numbers and swaps their values so that x equals what y did and y equals what x did (normally used in something like a simple shell sort) ...


Procedure Swap(var x, y: integer);
var i : integer;
begin
i := x;
x := y;
y := i
end;



In this case, the sentence doesn't flow unless you expand it a little.

"Here's a new Pascal Word 'Swap' that takes 'x' and 'y' as variable parameters.
'i' is a new variable just for the word 'Swap'.
begin do some things and then make y equal i end.

The 'do some things' were some other sentences as you can see.

Are you now more confused than ever? Don't worry, that's normal, you are becoming a programmer after all :-)

Friday 10 August 2007

Use of Try/Except/Finally

[Steve: I have altered the example to a more suitable one - that should teach me to quickly post then leave for the weekend without reading it].

I have seen a number of discussions recently showing that some developers can be confused over try/except and try/finally blocks. I can give an example in the kind of work that I was doing this morning.

I often read or write to an external device (let's just call it "the device" as it could be any external device). That device can sometimes send back confusing errors that I try to trap. However because those errors are confusing, I must let the user know all the various issues and ways around them.

Here's a simplified version of my code (the procedures can raise their own exception with the error message returned from the device).

var
curSave : TCursor;
begin
try
curSave := Screen.Cursor;
Screen.Cursor := crHourGlass; // Show hourglass cursor
try
OpenDeviceConnection; // each will raise an exception if error
DoStuffWithDevice;
ObtainReadings;
CalculateNewFigures;
UpdateDevice;
If not CloseConnection then // raise my own error
raise exception.create('Cant Close Connection');
finally
// return the cursor to what it was
Screen.Cursor := curSave;
end;
except
// the original exception is now being handled
// we can re-raise it, but in this case we'll
// raise a new exception with additions
on e:exception do
raise exception.create(e.message + #13+#10
+ 'Please check that the device is switched on' +#13+#10
+ 'and that the cables are properly connected.');
end;
end;



Exceptions created by any of the procedure calls will now result in an error message something like the following...



Note that I have also placed a try/finally block inside the try/except. This is to ensure that we get the cursor back.

I can raise an exception myself with "raise exception.create", the process then heads immediately to the except block, but before doing that, it must complete the finally block.

The order that finally and except will be executed depends on the order that you give it. In the above sequence, finally will be completed before the except. Usually you will see them the other way around where the exception is processed before the finally block.

Thursday 2 August 2007

More on Delphi's early days






My last post generated a bit of discussion, so I was trawling through the websites looking at the early Turbo Pascal information and I came across some interesting tidbits.

As has already been pointed out by one of the readers, there was life even before turbo Pascal. The Pascal Wiki states:
"In the 1980s Anders Hejlsberg wrote the Blue Label Pascal compiler for the Nascom-2. A reimplementation of this compiler for the IBM PC was marketed under the names Compas Pascal and PolyPascal before it was acquired by Borland ... renamed to Turbo Pascal."
Here are a few pages that I found that are worthy of looking into:

By the later 1980's I dropped the Turbo Pascal IDE in favour of MultiEdit. Its still around, and I still use it on occasion, but the Delphi IDE no longer needs it.

A reader in my last post pricked a memory or two over the character "Frank Borland". On searching, I came across this CodeGear piece about Frank. It is definately worth a read.

Ok, now playtime is over and I must get back to work.