Would be really great if someone could give me a hint (or maybe even a
little code example? ;)
The best hint I can give is not to do such things in kernel in the
first place.
Yes, in general it's true that these kinds of things shouldn't be done in
the kernel. But one might want to do this to get better performance, or
just to experiment. If Ingo had thought this way, we wouldn't have had TUX.
TUX was a nice experiment, until somebody showed that you can get the
same performance from userspace by using sendfile(). So the performance
argument no longer holds.
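
For anyone who wants the code example asked for above, here is a rough sketch of the userspace approach, assuming Linux and an already-connected socket. It uses Python's os.sendfile(), which wraps the sendfile(2) syscall, so the file data goes from the page cache to the socket without being copied through a userspace buffer. The function name send_file_over_socket and the chunk size are made up for illustration; this is not how TUX or any particular web server does it.

```python
import os
import socket


def send_file_over_socket(sock, path, chunk=1 << 20):
    """Send the whole file at `path` over `sock` using sendfile(2).

    Hypothetical helper for illustration. Returns the number of bytes sent.
    Assumes a blocking, connected socket on Linux.
    """
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # The kernel copies directly from the file to the socket;
            # an explicit offset is passed so the file position is not used.
            sent = os.sendfile(sock.fileno(), f.fileno(),
                               sent_total, min(chunk, size - sent_total))
            if sent == 0:
                # File was truncated underneath us; stop rather than loop.
                break
            sent_total += sent
    return sent_total
```

A server would call this once per request after writing the response headers with a normal send(). The loop is there because sendfile() may transfer fewer bytes than requested.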