Sounds like it's due to context switch latency, or at least the interval between your work item being queued (with the worker thread put on the runnable list) and it actually running. That would explain the ~1 ms delay, which is one HZ interval (the scheduler tick period), assuming your kernel is built with HZ=1000, i.e. a 1000 Hz tick.
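The arithmetic behind the ~1 ms figure: the tick period is the reciprocal of HZ. If the delay really is tick-bound, one rough check (a suggestion, not something verified on your setup) is to rebuild with a different HZ and see whether the latency scales with it.

```latex
T_{\text{tick}} = \frac{1}{\mathrm{HZ}}
\quad\Rightarrow\quad
T_{\text{tick}}\big|_{\mathrm{HZ}=1000} = \frac{1}{1000\,\mathrm{s}^{-1}} = 1\,\mathrm{ms},
\qquad
T_{\text{tick}}\big|_{\mathrm{HZ}=250} = \frac{1}{250\,\mathrm{s}^{-1}} = 4\,\mathrm{ms}
```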
May I ask which preemption model you are using now? Try full preemption (CONFIG_PREEMPT) instead of voluntary preemption (CONFIG_PREEMPT_VOLUNTARY).
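For reference, a kernel configured for full preemption would show something like the following in its `.config`. This assumes the classic three-way choice (NONE / VOLUNTARY / PREEMPT); newer kernels offer further options such as PREEMPT_RT. Where your distribution installs it, /boot/config-$(uname -r) is a quick place to check what the running kernel uses.

```
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
```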
Other than that, maybe you need to consider a better asynchronous communication mechanism, something like AIO or a callback-based trigger.
Regards, Lukas
-- To unsubscribe from this list: send an email with "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx Please read the FAQ at http://kernelnewbies.org/FAQ