ADO.NET Identity Crisis

Ahhh yet another trip into the dusty bowels of .NET data access behaviour. Non developers can tune out now if you’d like. This one is an interesting bug brought about by ADO.NET’s wonderful ability to manage data and relationships on the client-side, without the need to continually talk to an SQL server.

One of the more common requirements within database programming is to have an Identity column on a database table. This column uniquely identifies each database row. A common way to generate unique identifiers is to simply have an auto-incrementing number (yes GUIDs are sexy, but also take up a lot of space and are not genuinely human-readable). Because ADO.NET allows you (through DataSets) to manage all data on the client side, a DataSet has the ability to generate these auto-incremented numbers itself. BUT these numbers don’t automatically match the identity numbers in the SQL server.

Picture this: you have a DataSet with an identity column that starts at 1 and increments by 1 every time you add a record. You also have a table in your database with the same setup, except this table already has 15 rows in it, meaning that the next added record should be number 16 (remember that number). Here are the steps to recreate the bug we just had:

  1. Add a few hundred rows to your in-memory Dataset. These rows will get the identity values of 1,2,3… etc
  2. Call an update method to write all the data in the Dataset to the database
  3. The guts of the update method will (in the background, and unkown to me) add record number 1, then ask the database what Identity it has applied to the new record (in this case, 16)
  4. The update method will try to get the DataSet to set the Identity of the first row to 16 (so it matches the database), at which time the DataSet will spew and complain about “Column ‘myIDColumn’ is constrained to be unique. Value ’16’ is already present.” (Because we added our own row ’16’ to the DataSet when we populated it in step one)

Now if you’re like me, you’d see the error and think “WTF? I’m adding new rows, and the DB should be generating Identities so how the heck can it be duplicating them!?”. It took me a while to figure that it was the DataSet that was complaining about duplicates, NOT the database.

The solution? Seems pretty hacky, but Microsoft themselves suggest it: set the DataSet’s identity seed and increment to -1. This way the DataSet will generate identity values in the range of -1,-2,-3… As long as your database identity seed and increment are any positive values, you’ll never get a clash, and the DataSet will behave correctly too (in memory relationships based on the identity will work perfectly etc.).

11 Replies to “ADO.NET Identity Crisis”

  1. Muhahahahah!
    The more you tighten your grip the more star systems will slip through your fingers!

  2. I feel I must add context to the above comment. You see Darth Vadar is a Java programmer, and we all know that Java is the evil all powerful empire of programming languages.

    Hence us poorly .NET programmers must unite to form a Rebel Alliance against Darth Vadar and his evil cohorts, lest they bring their space station up to full operating capacity.

  3. If I understood correctly, the issue is reflecting the SQL identity value back in the Dataset (ie they could be out of synch). From what I’ve seen, the Dataset identity value is replaced by the SQL version when:

    1. you use a typed dataset
    2. you use update\fill on a SqlDataAdapter

    I’ve used this mechanism together with the SetParentRow methods to enable related rows to be created in memory during a users session with my web app, and then splat the lot into the database and ensure all the relationships are maintained without having to right a single line of custom code (other than to make sure i do the update on each table in the dataset in a logical order)…a by-product of these observations was that the sql commands generated with the SqlDataAdapter seemed to magically synch up the identit column values…..man I love Typed DataSets…..

  4. Heh. We only use Datasets as a storage place for data really. Saves coding or generating objects, and you get the benefit of the whole difgram thing.

    We definitely use stored procs to populate them. I can’t stand the generated SqlCommand object thingies.

  5. Don’t know if someone still reads this …

    I had the problem dicussed in the posting here. So I tried the hacky workaround and it worked well – at first. The next time I added lots of rows, die DataSet comlained again about an already taken id.

    So what happened? I populated the DataSet with the data from the database, using the Fill method. Before adding new rows I changed AutoIncrementSeed and Step to -1. However at this point the Seed cannot be changed anymore, because the DataSet already knows the correct Seed of the database – remeber I used the Fill method earlier. Although the Seed property is set to -1, the DataSet will use the real Seed. As you can guess the DataSet now generates duplicate ids, because the Step is still set to -1.

    I actually don’t know how a DataSet behaves if there are already rows in the table AND there are added new rows by another application. This would result in a seed set to a specific value which means that the seed = -1 solution doesn’t work. I think this should result in an exception, because the databases id seed has been changed by another application.

  6. Interesting point Tex. I guess Microsoft didn’t consider the fill scenario with -1 increment when they suggested it.

    That probably means the only solution is to either manually or programatically set your seed to something much higher than your highest existing ID, and leave increment at +1?

  7. I ran into this very problem and was hoping someone like you would have the answer. However, I did discover the solution, although it is only mentioned in passing and I could find no code examples.

    The method goes like this:

    1) call the FillSchema method of the table
    2) set the auto-increment values (true, -1, -1)
    3) fill the table

    Getting the schema before filling the table sets the correct seed value and prevents duplicate IDs.

Leave a Reply

Your email address will not be published. Required fields are marked *