Tokenize string in SQL (Firebird syntax)
Few days ago I wrote “Some thoughts on denormalization” post. Though concatenating data to one string is easy with there introduced function, splitting it back could be harder. I left it as exercise, but I feel, to make it complete, it’s good to show one possible implementation.
Here’s one I came with today (using Firebird syntax, however it’s almost pure SQL), quickly. It’s something I created from start to finish in one row and sure for some cases it could be optimized.
recreate procedure Tokenize(input varchar(1024), token char(1))
returns (result varchar(255))
as
declare newpos int;
declare oldpos int;
begin
oldpos = 1;
newpos = 1;
while (1 = 1) do
begin
newpos = position(token, input, oldpos);
if (newpos > 0) then
begin
result = substring(input from oldpos for newpos - oldpos);
suspend;
oldpos = newpos + 1;
end
else if (oldpos - 1 < char_length(input)) then
begin
result = substring(input from oldpos);
suspend;
break;
end
else
begin
break;
end
end
end
The procedure splits the input
string using the specified token
. The string can (or not) end with the token itself, the procedure will handle it.
select * from Tokenize('ab,cd,e', ',')
union all
select * from Tokenize('ab,cd,e,', ',');
union all
select * from Tokenize('ab,cd,e,,', ',');
What it is not handling is some form of quoting in case there’s a token inside the element. Mainly because its intended purpose is to tokenize strings you can control (see the previous post) and because it will slow down the execution.
Feel free to improve it etc. (either post link to your solution or post the code in comments).