There are at least two hang-up problems in the concurrent vlv search and delete.

1. From ldbm_back_search, vlv_trim_candidates_byvalue is called via vlv_trim_candidates.  The function performs a binary search over the candidate idlist.  Under some stress condition, entries which id is in the candidate idlist could be deleted between the time the candidate idlist is generated and the time performing the binary search.  When it occurs, the errors log stores the the error message printed at the line 1484 endlessly and the thread hangs up since it never steps forward in the binary search.
[...] - vlv_trim_candidates_byvalue: Candidate ID 120527 not found err=-30990
[...] - vlv_trim_candidates_byvalue: Candidate ID 120527 not found err=-30990

[vlv.c]
   1424 static PRUint32
   1425 vlv_trim_candidates_byvalue(backend *be, const IDList *candidates, const  sort_spec* sort_control, const struct vlv_request *vlv_request_control)
   1426 {
    [...]
   1466     /*
   1467      * Perform a binary search over the candidate list
   1468      */
   1469     do {
    [...]
   1482         if ( e == NULL )
   1483         {
   1484             LDAPDebug( LDAP_DEBUG_ANY, "vlv_trim_candidates_byvalue: Candidate ID %lu not found err=%d\n", (u_long)id, err, 0 );
   1485         }
   1486         else
   1487         {
    [...]
   1578         }
   1579     } while (!found);

Stack Trace (snapshot in the endless loop)
(gdb) bt
#0  0x006287a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x008a6a9f in __fsync_nocancel () from /lib/tls/libpthread.so.0
#2  0x00cefcc2 in pt_Fsync (fd=0x8775d88) at ../../../../../nsprpub/pr/src/pthreads/ptio.c:1536
#3  0x00cd14e8 in PR_Sync (fd=0x8775d88) at ../../../../../nsprpub/pr/src/io/priometh.c:181
#4  0x004cc618 in vslapd_log_error (fp=0x8775d88, subsystem=0x0, fmt=0x48089c "vlv_trim_candidates_byvalue: Candidate ID %lu not found err=%d\n", ap=0x5b5d4c8 "��\001") at log.c:1840
#5  0x004cc325 in slapd_log_error_proc_internal (subsystem=0x0, fmt=0x48089c "vlv_trim_candidates_byvalue: Candidate ID %lu not found err=%d\n", ap_err=0x5b5d4c8 "��\001", ap_file=0x5b5d4c8 "��\001") at log.c:1758
#6  0x004cc201 in slapd_log_error_proc (subsystem=0x0, fmt=0x48089c "vlv_trim_candidates_byvalue: Candidate ID %lu not found err=%d\n") at log.c:1722
#7  0x00452fa0 in vlv_trim_candidates_byvalue (be=0x88503f0, candidates=0x98ec0d8, sort_control=0x98d71e0, vlv_request_control=0x5b5d690) at vlv.c:1484
#8  0x00452b5d in vlv_trim_candidates (be=0x88503f0, candidates=0x98ec0d8, sort_control=0x98d71e0, vlv_request_control=0x5b5d690, trimmedCandidates=0x5b5d5c0, vlv_response_control=0x5b5d5f0) at vlv.c:1314
#9  0x0046b354 in ldbm_back_search (pb=0xb2418970) at ldbm_search.c:540
#10 0x0051ac18 in op_shared_search (pb=0xb2418970, send_result=1) at opshared.c:545
#11 0x0805e389 in do_search (pb=0xb2418970) at search.c:276
#12 0x08055ede in connection_dispatch_operation (conn=0xb6e368a8, op=0xb241ddd8, pb=0xb2418970) at connection.c:521
#13 0x08057312 in connection_threadmain () at connection.c:2146
#14 0x00cf4296 in _pt_root (arg=0x89cb5e8) at ../../../../../nsprpub/pr/src/pthreads/ptthread.c:220
#15 0x008a1371 in start_thread () from /lib/tls/libpthread.so.0
#16 0x009dbffe in clone () from /lib/tls/libc.so.6

Change Description
(a) First, I tried to protect the whole search operation with the read lock on the vlvSearchList_lock ( PR_RWLock_Rlock(be->vlvSearchList_lock); ), but this causes more serious deadlock with the delete operation (and other update operations, as well).  This idea does not work.
(b) Instead, I added a code to remove the not-found entry's id from the idlist and retry the binary search.  It solved the hang-up problem.
[vlv.c]
   1485         if ( e == NULL )
   1486         {
   1487             int rval;
   1488             LDAPDebug( LDAP_DEBUG_ANY, "vlv_trim_candidates_byvalue: Candidate ID %lu not found err=%d\n", (u_long)id, err, 0 );
   1489             rval = idl_delete(&candidates, id);
   1490             if (0 == rval) {
   1491                 goto retry;
   1492             } else {
   1493                 return candidates->b_nids; /* not found */
   1494             }
   1495         }

2. With the fix described in [1], I kept running the stress test.  Then, a typical deadlock between the search thread and the delete thread was observed.

Stack Trace showing the deadlock
(gdb) thread 13
[Switching to thread 13 (Thread -1236546640 (LWP 3812))]#0  0x006287a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x006287a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x008a3b26 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x002dedc4 in __db_pthread_mutex_lock (dbenv=0x92dfb80, mutexp=0xb71e90c8) at ../../../dist/../mutex/mut_pthread.c:225
#3  0x0034c02f in __lock_get_internal (lt=0x92e61e0, locker=63, flags=0, obj=0x92e8edc, lock_mode=DB_LOCK_READ, timeout=0, lock=0xb64b6fd0) at ../../../dist/../lock/lock.c:990
#4  0x0034cb95 in __lock_vec (dbenv=0x92dfb80, locker=63, flags=0, list=0xb64b6fc0, nlist=2, elistp=0xb64b6fbc) at ../../../dist/../lock/lock.c:294
#5  0x00331f87 in __db_lget (dbc=0x92e8e70, action=2, pgno=4294967292, mode=DB_LOCK_READ, lkflags=0, lockp=0x96f86a4) at ../../../dist/../db/db_meta.c:446
#6  0x002e0b9f in __bam_c_next (dbc=0x92e8e70, initial_move=1, deleted_okay=0) at ../../../dist/../btree/bt_cursor.c:2276
#7  0x002e3e84 in __bam_c_get (dbc=0x92e8e70, key=0xb64b71f0, data=0xb64b71d0, flags=18, pgnop=0xb64b7160) at ../../../dist/../btree/bt_cursor.c:917
#8  0x00327cf8 in __db_c_get (dbc_arg=0x96fdf20, key=0xb64b71f0, data=0xb64b71d0, flags=18) at ../../../dist/../db/db_cam.c:643
#9  0x0032f925 in __db_c_get_pp (dbc=0x96fdf20, key=0xb64b71f0, data=0xb64b71d0, flags=18) at ../../../dist/../db/db_iface.c:1836
#10 0x00eac39e in vlv_build_idl (start=0, stop=1983, db=0x92e0210, dbc=0x96fdf20, candidates=0xb64b725c, dosort=1) at vlv.c:1009
#11 0x00ead9e9 in vlv_find_index_by_filter (be=0x92aa0d8, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", f=0x96f7fa8) at vlv.c:1875
#12 0x00e933ea in filter_candidates (pb=0x96f8098, be=0x92aa0d8, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", f=0x96f7fa8, nextf=0x0, range=0, err=0xb64b7564) at filterindex.c:105
#13 0x00e94b47 in list_candidates (pb=0x96f8098, be=0x92aa0d8, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", flist=0x9706d38, ftype=161, err=0xb64b7564) at filterindex.c:707
#14 0x00e93731 in filter_candidates (pb=0x96f8098, be=0x92aa0d8, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", f=0x9706d38, nextf=0x0, range=0, err=0xb64b7564) at filterindex.c:160
#15 0x00e94b47 in list_candidates (pb=0x96f8098, be=0x92aa0d8, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", flist=0x9e28848,ftype=160, err=0xb64b7564) at filterindex.c:707
#16 0x00e936e1 in filter_candidates (pb=0x96f8098, be=0x92aa0d8, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", f=0x9e28848, nextf=0x0, range=0, err=0xb64b7564) at filterindex.c:155
#17 0x00ec5abe in onelevel_candidates (pb=0x96f8098, be=0x92aa0d8, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", e=0x9e285e0, filter=0x96f7fa8, managedsait=0, lookup_returned_allidsp=0xb64b7680, err=0xb64b7564) at ldbm_search.c:784
#18 0x00ec5754 in build_candidate_list (pb=0x96f8098, be=0x92aa0d8, e=0x9e285e0, base=0x9e6dde0 "ou=accounting,dc=example,dc=com", scope=1, lookup_returned_allidsp=0xb64b7680, candidates=0xb64b770c) at ldbm_search.c:651
#19 0x00ec4fc6 in ldbm_back_search (pb=0x96f8098) at ldbm_search.c:415
#20 0x00afcef4 in op_shared_search (pb=0x96f8098, send_result=1) at opshared.c:545
#21 0x0805e389 in do_search (pb=0x96f8098) at search.c:276
#22 0x08055ede in connection_dispatch_operation (conn=0xb6cc1808, op=0x9705070, pb=0x96f8098) at connection.c:521
#23 0x08057312 in connection_threadmain () at connection.c:2146
#24 0x00bbb296 in _pt_root (arg=0x93fc3b0) at ../../../../../nsprpub/pr/src/pthreads/ptthread.c:220
#25 0x008a1371 in start_thread () from /lib/tls/libpthread.so.0
#26 0x00708ffe in clone () from /lib/tls/libc.so.6

(gdb) thread 15
[Switching to thread 15 (Thread -1228149840 (LWP 3810))]#0  0x006287a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x006287a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x008a3b26 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x00bb3d20 in PR_WaitCondVar (cvar=0x92e28a8, timeout=4294967295) at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:405
#3  0x00ba0103 in PR_RWLock_Wlock (rwlock=0x92b3550) at ../../../../../nsprpub/pr/src/threads/prrwlock.c:298
#4  0x00eadbf9 in vlv_delete_search_entry (pb=0x9e70068, e=0x96f8ca8, inst=0x92aaa40) at vlv.c:1945
#5  0x00ecbc16 in ldbm_back_delete (pb=0x9e70068) at ldbm_delete.c:548
#6  0x00b02dee in op_shared_delete (pb=0x9e70068) at delete.c:318
#7  0x00b027fe in do_delete (pb=0x9e70068) at delete.c:116
#8  0x08055dee in connection_dispatch_operation (conn=0xb6cc18a8, op=0x96fb938, pb=0x9e70068) at connection.c:486
#9  0x08057312 in connection_threadmain () at connection.c:2146
#10 0x00bbb296 in _pt_root (arg=0x94508d0) at ../../../../../nsprpub/pr/src/pthreads/ptthread.c:220
#11 0x008a1371 in start_thread () from /lib/tls/libpthread.so.0
#12 0x00708ffe in clone () from /lib/tls/libc.so.6

This is a deadlock caused by the read-write lock on the vlvSearchList and the db lock.  The search thread calls read lock on vlvSearchList (e.g., at the top of vlv_find_index_by_filter), then tries to search the db index.  On the other hand, the delete thread calls db transaction which could lock pages with the write lock (it makes the search thread wait in the stacktrace for thread 13), then tries to get write-lock on the vlvSearchList (the mutex is already read-locked by the search thread).

To prevent this deadlock, I eliminated the read-lock from vlv_find_index_by_filter.  Then, as we experienced in [1], some id's in the candidate idlist could have lost the corresponding entries in the primary index file when the code is being executed and displayed this message in the errors log:
[...] - vlv_filter_candidates: Candidate 227347 not found err=-30990

I ran the test suggested in Comment #7 repeatedly over night (about 15 hours) and got this number of conflicts, but no deadlock occurred.
$ egrep "not found" errors | wc -l
   6499

This is not an ideal solution, but thinking of the vlv nature, it should be fast and deadlock free.  If we introduce more protection by locking, it'd make the advantage lose.

I'm attaching my proposed source code diffs for the review, next.