2010-10-03

SQL and sequences

I am working on an application that uses a sequence field to keep records in a specific order.

The first challenge was how to fill in a numbered sequence in the new sequence field without using auto-increment (the data table will contain a number of independent sequences for different sub-sets of records so I need to be able to multiple sequences in the same table).  I found the answer on this blog post:

How to Sequence each Sub-set of Records by David Soussan

I won't repeat that post here, but here is the SQL that I ended up with based on Mr. Soussan's technique.

This first SQL query creates a Temp table with Prov_ID and sequence number.

CREATE TABLE Temp
SELECT
    t1.Prov_ID, COUNT(t1.Prov_Sort_Num) AS sequence,
    t1.Prov_Sort_Num >= t2.Prov_Sort_Num AS flg
FROM
    tbl_Provisions AS t1
INNER JOIN
    tbl_Provisions AS t2 ON t1.Doc_ID = t2.Doc_ID
WHERE
    t1.Doc_ID = 1
GROUP BY
    t1.Doc_ID,
    t1.Prov_Sort_Num, flg
HAVING
    flg = TRUE
ORDER BY
    t1.Prov_Sort_Num

Then this second SQL query updates tbl_Provision using the sequence numbers from the newly created Temp table.
UPDATE
    tbl_Provisions AS t1
JOIN
    Temp AS t2 ON t1.Prov_ID = t2.Prov_ID
SET t1.Prov_Sequence = t2.sequence

The next issue was how to make sure that gaps and duplicates didn't end up in the sequence for each sub-set of records?  I found the solution to the issue of detecting gaps in this blog post:

Sequence gaps in MySQL by Sameer

I still don't fully understand how Sameer's SQL works, but it does indeed work reliably.  Here is the SQL I ended up with based on Sameer's technique:

SELECT
    a.Prov_Sequence + 1 AS start,
    MIN(b.Prov_Sequence) - 1 AS end
FROM
    tbl_Provisions AS a,
    tbl_Provisions AS b
WHERE
    a.Prov_Sequence < b.Prov_Sequence
   AND
      a.Doc_ID=$Doc_ID
   AND
      b.DOC_ID=$Doc_ID
GROUP BY
   a.Prov_Sequence
HAVING
   start < MIN(b.Prov_Sequence)
This returns a two column table with each row giving the beginning of a gap in the Start column and the end of that gap listed in the End column.