June 3, 2021

Migrating Samba sam.ldb from TDB to LMDB

Samba uses TDB internally as a key-value store for many purposes. Thus, TDB was also chosen initially as the backend key-value store for the LDB mechanism that backs the Samba/AD implementation of the SAM database.

But over time, use cases have outgrown the capacity of TDB, which is limited to 4GB.

As part of the pre-release test phase for UCS 5.0 we ran update tests with larger environments, e.g. with 200k users. As the new UCS release updates Samba from 4.10.x to 4.13.y, Samba makes changes to the internal database and index format. Samba performs the format conversion transparently during the first start with the new version. That’s all nice and fine, unless any of the partition subfiles of your sam.ldb is already filled more than about half way to the TDB limit. About half way, because Samba uses a transaction for the conversion, so the file temporarily has to hold roughly both the old and the converted data.

So, to avoid halting the UCS 5.0 release train, we inserted an additional check into the pre-release-checks, which blocks the update if you are using TDB and are close to the limit.
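The idea behind such a check can be sketched in a few lines of Python. This is not the actual UCS pre-update check, just a minimal illustration: the function name, the 0.5 fraction and the directory in the usage note are my assumptions, and the sketch only looks at file sizes without testing whether a file really is a TDB file.

```python
import os

TDB_MAX_BYTES = 4 * 1024**3  # the 4GB TDB limit mentioned above


def files_over_threshold(directory, limit=TDB_MAX_BYTES, fraction=0.5):
    """Return (name, size) for each *.ldb file larger than fraction * limit."""
    threshold = limit * fraction
    hits = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".ldb"):
            continue
        size = os.path.getsize(os.path.join(directory, name))
        if size > threshold:
            hits.append((name, size))
    return hits
```

Pointed at the directory holding the partition subfiles (on many installations something like /var/lib/samba/private/sam.ldb.d, but verify that for your system), a non-empty result would be a reason to block the update.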

Fortunately, Samba has also supported LMDB as a key-value store for sam.ldb for some time, and I had already gathered a bit of experience with the LMDB C implementation, which UCS uses both as the backend for OpenLDAP and for the UCS Listener cache.

Now, in the first post-release sprint I’ve implemented a script to migrate the sam.ldb from TDB to LMDB. At first, I experimented with samba-tool domain backup online, which supports choosing the backend store. But that tool basically performs a join (i.e. DRS replication of all SAM partitions), which gives you a copy of the SAM, but with different values for the (local) uSNChanged/uSNCreated/highestCommittedUSN LDAP attributes. Without taking additional measures (like changing the invocationID), it would be dangerous to use this copy to replace the existing sam.ldb, because the DRS replication protocol is designed so that each DC knows the highestCommittedUSN of every other DC in its replication topology, and this is used for propagation dampening, i.e. to reduce the number of updates that are replicated (I hope I didn’t mix up the correct terms here, got to read up on that again). Since the new copy of the sam.ldb is likely to have a lower highestCommittedUSN, other DCs would not pull new changes until uSNChanged surpasses the old value of highestCommittedUSN again at some point. This is a variation on the topic of what Microsoft calls USN rollback.
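To make the problem a bit more tangible, here is a toy model in Python. It is a deliberate simplification and not the DRS protocol: the function name and all numbers are made up, and the only thing it models is a partner that asks for changes above the USN it remembers.

```python
def changes_pulled(partner_watermark, local_usns):
    """Return the USNs a partner would still request: those above its watermark."""
    return [usn for usn in local_usns if usn > partner_watermark]


# The partner remembers USN 5000 from the old sam.ldb (hypothetical value).
# The rebuilt copy restarted at a lower USN, so its new changes get
# numbers the partner believes it has already seen.
new_changes_on_rebuilt_dc = [3001, 3002, 5001]
pulled = changes_pulled(5000, new_changes_on_rebuilt_dc)
```

In this model only the change with USN 5001 would be pulled; the two changes numbered 3001 and 3002 would silently stay local.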

Rather than going down the road of dealing with USN changes, the idea occurred to me that it would be much preferable to work not on the LDB/LDAP object level but on the lower key-value storage level. After all, that’s the only thing we want to change. So, after digging a bit into Samba’s provisioning code on the one hand and into the sources of ldb_mdb.c on the other, and looking at the data with mdb_dump and tdbdump, I was optimistic and curious whether my idea would work.

So I made a quick PoC with python-tdb and python-lmdb and it was just 10 lines of code. The crazy thing is, it worked on the first attempt. OK, it worked for the single TDB files that make up the SAM partitions, but initially the LDB tools failed to access the full sam.ldb, because they still expected to find TDB data in the backend files. To fix that, a flag needed to be set in sam.ldb to tell the LDB libraries to try opening the files with mdb:// instead of tdb://. I learned quite a few new details about Samba’s brilliantly versatile LDB mechanism.
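A copy on the key-value level could look roughly like the following. This is my reconstruction of the idea for illustration, not the original PoC and not the final script: the names copy_records and migrate_file are hypothetical, and map_size is an assumed upper bound for the LMDB file that you would have to choose for your data.

```python
import os


def copy_records(records, put):
    """Write every (key, value) pair from `records` via `put`; return the count."""
    count = 0
    for key, value in records:
        put(key, value)
        count += 1
    return count


def migrate_file(tdb_path, lmdb_path, map_size=8 * 1024**3):
    """Copy one TDB partition file into a new single-file LMDB database."""
    # Imported lazily so copy_records() can be tried without the bindings.
    import lmdb
    import tdb

    src = tdb.open(tdb_path, flags=os.O_RDONLY)
    env = lmdb.open(lmdb_path, subdir=False, map_size=map_size)
    try:
        # One write transaction: either all records land in LMDB or none.
        with env.begin(write=True) as txn:
            return copy_records(((key, src.get(key)) for key in src), txn.put)
    finally:
        env.close()
        src.close()
```

Keys and values are copied byte for byte, which is the whole point of working below the LDB layer. Note that LMDB limits the key size (511 bytes with the default build), so a real migration has to expect that some records cannot be copied as they are, and it still needs the backend flag in sam.ldb described above.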

I’ve cleaned up my usual frankenstein code mix of bash and Python and turned it into this Python script. Let’s see how tonight’s CI results will turn out. It worked on my test VM and took only 50 seconds for the migration of a SAM database with 200k accounts.

Postscript 2021-06-07

The CI tests showed that the migration works and also pointed out the need to keep encryptedSecrets in the requiredFeatures multi-value attribute. This tells SamDB to use additional encryption when storing password hashes on disk. This again shows the importance of having realistic CI test setups. In this case it was one fresh UCS 4.4-8 Samba/AD DC “backup” joined with an updated UCS Samba/AD DC “primary”, and they had different SamDB feature sets activated by default. It’s too easy to congratulate yourself for having green CI tests when you are actually testing on a green field that doesn’t reflect customer environments with a history of updates and per-server differences in configuration defaults. I’m glad we could rule this out. This one again was a team achievement. We’ll publish this on https://help.univention.de and I’ll post it upstream on samba-technical for others to benefit too.

Disclaimer

The script is focussed on UCS and may not fit your purpose. Please only try it in a test environment.

If you find bugs or want to discuss this topic, feel free to contact me one way or another.

© Arvid Requate 2021