Using sed to clean up an LDIF file for import #Oracle #Identity #UNIX

I needed to import a group of users into Oracle Internet Directory (OID), with attributes drawn from a variety of backend data stores. I used Oracle Virtual Directory (OVD) to virtualize the data stores into a single LDAP view, and used the OVD adapter configuration to specify which attributes I wanted returned. I then exported the result with the export function in Apache Directory Studio. This produced an LDIF file containing all of the records I needed, with their attributes. There were also a few additional attributes, a side effect of using OVD, that I now had to deal with.

I ended up with an ldif file that contained a lot of records like this:

dn: cn=Babs Jensen@ACME.GOV,ou=temp_user_load
objectclass: inetOrgPerson
objectclass: organizationalPerson
objectclass: person
objectclass: top
cn: 1234556677@ACME.GOV
cn: Jensen, Babs
sn: Jensen
givenName: Babs
vdejoindn: ou=acmeinfo_temp:cn=JENSEN,BABS,ou=acmeinfo_temp
vdejoindn: AD_temp:CN=babs.jensen@ACME.GOV,OU=locations,OU=park,ou=ad_t
fascnDecoded: 1234567890987654321
guid: ABcdedghi1234567890
ssn: 12345678

Note: sed can make changes directly to the source file, but I create a new target file with each change so that I can always revert if a command doesn’t work exactly the way I want it to.

I wanted to get rid of lines that don’t start with an attribute name. (In my case I am free to discard values that carry over onto a second line … YMMV.)

I also specifically wanted to get rid of all lines that start with “vdejoindn:”. Some vdejoindn values overrun onto a second line, and those overrun lines won’t be removed if I only use sed to delete lines matching the pattern vdejoindn:.
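The overrun lines are LDIF line folding: a long value continues on the next line, which starts with a single space. Where the wrapped part of a value matters, one option is to unfold the file before editing it, so that each attribute sits on one line and a single anchored pattern removes a whole vdejoindn entry. The following is a minimal sketch of that idea, assuming GNU sed; the file names folded.ldif, unfolded.ldif and cleaned.ldif are hypothetical and the record is made-up sample data.

```shell
# Made-up sample record with one folded (wrapped) vdejoindn value.
cat > folded.ldif <<'EOF'
dn: cn=Babs Jensen@ACME.GOV,ou=temp_user_load
cn: Jensen, Babs
vdejoindn: AD_temp:CN=babs.jensen@ACME.GOV,OU=locations,OU=park,ou=ad_t
 emp
sn: Jensen
EOF

# Join every line that starts with a space onto the line before it.
#   :a          label for the loop
#   $!N         append the next input line (unless this is the last one)
#   s/\n //;ta  if the appended line is a continuation, join it and loop
#   P;D         otherwise print the first line and restart with the rest
sed -e ':a' -e '$!N;s/\n //;ta' -e 'P;D' folded.ldif > unfolded.ldif

# With values unfolded, one anchored pattern drops each vdejoindn attribute whole.
sed '/^vdejoindn:/d' unfolded.ldif > cleaned.ldif
cat cleaned.ldif
```

Unlike deleting every line without a colon, this keeps the blank lines between records, but it does assume the file has Unix line endings: a carriage return before the newline would stop the continuation from joining cleanly.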

So, first I want to remove all lines that don’t contain a colon. This removes the overrun lines but also all blank lines.

$ sed '/:/!d' input.ldif > tmp.ldif

This keeps only the lines that contain a colon.

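But now we don’t have breaks between the records, so I put a blank line back in front of every line that starts with “dn:” (writing the newline as \n in the replacement needs GNU sed).

$ sed 's/^dn:/\n&/' tmp.ldif > tmp2.ldif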
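Record breaks can also be restored with the hold space, which avoids relying on how a particular sed writes a newline in the replacement text. A minimal sketch with made-up sample data; joined.ldif and separated.ldif are hypothetical file names.

```shell
# Made-up sample: two records whose blank separator line has been stripped.
cat > joined.ldif <<'EOF'
dn: cn=user1,ou=temp_user_load
sn: One
dn: cn=user2,ou=temp_user_load
sn: Two
EOF

# For each dn: line except the first line of the file: swap in the (empty)
# hold space, print it as a blank line, then swap the dn: line back so it
# is printed as usual at the end of the cycle.
sed '1!{/^dn:/{x;p;x;};}' joined.ldif > separated.ldif
cat separated.ldif
```

Skipping line 1 avoids a leading blank line at the top of the file; drop the `1!` wrapper if a blank line before the first record is wanted.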

Ok, now I want to get rid of the lines that have “vdejoindn:”.

$ sed '/vdejoindn:/d' tmp2.ldif > tmp3.ldif

Now at some point I ended up with “^M” (a carriage return) at the end of each line … I don’t know if this is because I opened the file with Vim on Windows before moving it to Linux … I am going to assume so, but either way I want to remove these characters. dos2unix converts a file in place by default, so I use its -n option to write the result to a new file.

$ dos2unix -n tmp3.ldif tmp4.ldif
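If dos2unix is not installed, sed can strip the carriage returns itself. A minimal sketch, assuming GNU sed (which understands \r in a pattern); crlf.ldif and lf.ldif are hypothetical file names and the content is made up.

```shell
# Made-up sample with Windows (CRLF) line endings.
printf 'dn: cn=user1,ou=temp_user_load\r\nsn: One\r\n' > crlf.ldif

# Delete a carriage return that sits at the very end of a line.
sed 's/\r$//' crlf.ldif > lf.ldif

# Count the lines that still contain a carriage return; 0 means the file is clean.
# grep exits non-zero when nothing matches, hence the "|| true".
grep -c "$(printf '\r')" lf.ldif || true
```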

Alright. Now, for me to import this into Oracle Internet Directory (OID) I’ll need to add the “changetype” directive. I am going to add the string “changetype: add” on a new line after each line containing “ou=temp_user_load”, which is the temporary suffix I used in this export.

$ sed '/ou=temp_user_load/a changetype: add' tmp4.ldif > tmp5.ldif
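Because that pattern matches anywhere in a line, any other attribute whose value mentions the temporary suffix would also get a changetype line after it. Anchoring on the dn: line is a slightly safer variant. A minimal sketch, assuming GNU sed's one-line form of the a command; the sample data (including the manager attribute) and the file names are made up for illustration.

```shell
# Made-up record in which a second attribute also mentions the temporary suffix.
cat > records.ldif <<'EOF'
dn: cn=user1,ou=temp_user_load
sn: One
manager: cn=boss,ou=temp_user_load
EOF

# Append "changetype: add" only after lines that begin with "dn:".
sed '/^dn:/a changetype: add' records.ldif > with_changetype.ldif
cat with_changetype.ldif
```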

The last step prior to importing is to correct each entry’s DN. Essentially, we need to replace “ou=temp_user_load” with the correct suffix for where these users will be created.

$ sed 's/ou=temp_user_load/cn=Users,o=icam,dc=acme,dc=local/g' tmp5.ldif > tmp6.ldif

At this point my LDIF file (“tmp6.ldif”) is ready to import into my directory. You can use the ldapmodify command or, since I am using OID, bulkload (which is recommended for large record sets).
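Before importing, a quick sanity check can catch a step that misfired. A minimal sketch, assuming a file of the shape described in this post; ready.ldif is a hypothetical stand-in for tmp6.ldif and its records are made up.

```shell
# Made-up stand-in for the finished file.
cat > ready.ldif <<'EOF'

dn: cn=user1,cn=Users,o=icam,dc=acme,dc=local
changetype: add
sn: One

dn: cn=user2,cn=Users,o=icam,dc=acme,dc=local
changetype: add
sn: Two
EOF

# grep -c prints a count but exits non-zero when the count is 0, hence "|| true".
records=$(grep -c '^dn:' ready.ldif || true)
adds=$(grep -c '^changetype: add$' ready.ldif || true)
leftover=$(grep -c -e 'ou=temp_user_load' -e '^vdejoindn:' ready.ldif || true)

# Every record should have exactly one changetype line, and nothing from the
# temporary suffix or the OVD join attributes should remain.
if [ "$records" -eq "$adds" ] && [ "$leftover" -eq 0 ]; then
    echo "OK: $records records, each with a changetype line"
else
    echo "CHECK: $records dn lines, $adds changetype lines, $leftover leftover lines"
fi
```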

6 thoughts on “Using sed to clean up an LDIF file for import #Oracle #Identity #UNIX”

  1. Daniel Liston says:

    What if one of those lines that do not contain a “:” is part of an important value from the previous line? For example, what if the string “ou=temp_user_load” was partially wrapped onto the following line? Wouldn’t it make much more sense to unwrap the LDIF from the default 78 characters per line so that attribute names and full values are always on a single line before making any other changes?

    Most LDIF generators/exporters will fold a long field on multiple lines by inserting a line separator (either a linefeed or carriage return/linefeed pair) followed by a space. Processing such LDIF files through another script becomes much easier if such lines were unfolded.

    Sed has much more power and capability than you are exercising. This script has been around for decades, but was open-sourced, copyrighted, and published on GitHub by Vishal Goenka in 2010. He calls it “unldif.sed”.
    # Unfold LDIF (LDAP Data Interchange Format) lines Version 1.0
    # sed -nf unldif.sed

    #!/bin/sed -nf
    /^ /!{
    /^ /{
    s,\n ,,;
    s,\n ,,;

    Note: the commas on some command lines are alternative characters for forward slashes.
    This code can be modified to remove ^M line endings (inserted by Windows), substitute your temporary OrgUnit with the full Base_DN, and even ignore the vdejoindn lines. With all the LDIF lines unwrapped, it is safe to make changes that can be imported again cleanly.

    Now, just before the final “}” closing brace, insert your commands:

    Notice also, that I did not add the changetype: add line to the output LDIF. The resulting LDIF can be imported with ldapadd instead of ldapmodify.

    1. Daniel Liston says:

      Well, that comment turned out ugly as sin… None of the line formatting entered during the post was preserved. Hopefully the moderator (and users) will get the gist of the information provided. 🙁

    2. Daniel Liston says:

      The s/// command should contain a ctrl-v followed by enter/return key between the first two slash marks. Indenting was lost, but only there to make the script easier for humans to read/translate.

    3. Brad Tumy says:

      Thanks Daniel – always happy to hear better ways to get the job done. I saw your follow up comment about the formatting, feel free to link to a git gist or somewhere else that has better formatting available.
