Ahhh yet another trip into the dusty bowels of .NET data access behaviour. Non developers can tune out now if you’d like. This one is an interesting bug brought about by ADO.NET’s wonderful ability to manage data and relationships on the client-side, without the need to continually talk to an SQL server.
One of the more common requirements within database programming is to have an Identity column on a database table. This column uniquely identifies each database row. A common way to generate unique identifiers is to simply have an auto-incrementing number (yes GUIDs are sexy, but also take up a lot of space and are not genuinely human-readable). Because ADO.NET allows you (through DataSets) to manage all data on the client side, a DataSet has the ability to generate these auto-incremented numbers itself. BUT these numbers don’t automatically match the identity numbers in the SQL server.
Picture this: you have a DataSet with an identity column that starts at 1 and increments by 1 every time you add a record. You also have a table in your database with the same setup, except this table already has 15 rows in it, meaning that the next added record should be number 16 (remember that number). Here are the steps to recreate the bug we just had:
- Add a few hundred rows to your in-memory Dataset. These rows will get the identity values of 1,2,3… etc
- Call an update method to write all the data in the Dataset to the database
- The guts of the update method will (in the background, and unkown to me) add record number 1, then ask the database what Identity it has applied to the new record (in this case, 16)
- The update method will try to get the DataSet to set the Identity of the first row to 16 (so it matches the database), at which time the DataSet will spew and complain about “Column ‘myIDColumn’ is constrained to be unique. Value ’16’ is already present.” (Because we added our own row ’16’ to the DataSet when we populated it in step one)
Now if you’re like me, you’d see the error and think “WTF? I’m adding new rows, and the DB should be generating Identities so how the heck can it be duplicating them!?”. It took me a while to figure that it was the DataSet that was complaining about duplicates, NOT the database.
The solution? Seems pretty hacky, but Microsoft themselves suggest it: set the DataSet’s identity seed and increment to -1. This way the DataSet will generate identity values in the range of -1,-2,-3… As long as your database identity seed and increment are any positive values, you’ll never get a clash, and the DataSet will behave correctly too (in memory relationships based on the identity will work perfectly etc.).