Re: MMIO regions
> If I understand what you are saying, there are serious performance
> implications for direct-rendering clients (in addition to the added
> scheduler overhead, which will negatively impact overall system
> performance).
>
> I believe you are saying:
> 1) There are n processes, each of which has the MMIO region mmap'd.
> 2) The scheduler will only schedule one of these processes at a time,
> even on an SMP system. [I'm assuming this is what you mean by "in
> use", since the scheduler can't know about actual MMIO writes -- it
> has to assume that a mapped region is a region that is "in use",
> even if it isn't (e.g., a threaded program may have the MMIO region
> mapped in n-1 threads, but may only direct render in 1 thread).]
>
> On MMIO-based graphics cards (i.e., those that do not use traditional DMA),
> a direct-rendering client will intersperse relatively long periods of
> computation with relatively short periods of MMIO writes. In your scheme,
> one of these clients will run for a whole time slice before the other one
> runs (i.e., they will run in alternate time slices, even on an SMP system
> with sufficient processors to run both simultaneously). Because actual
> MMIO writes take up a relatively small fraction of that time slice,
> rendering performance will potentially decrease by a factor of 2 (or more,
> if more CPUs are available). This is significant, especially since many
> high-end OpenGL applications are threaded and expect to be able to run
> simultaneously on SMP systems.
>
I noticed this when I was playing with my code. I also realized that regular
kernel semaphores are not going to give you the hard realtime
guarantees that are needed. Even regular interrupt handling is just
not good enough. A good example is VBL (vertical blank). With ordinary
interrupt handling it takes an enormous amount of time to get to the
interrupt handler. The effect gets worse on a very heavily loaded machine,
and so does the tearing. It's not unusual for a graphics program to create
a high load either. So I'm actually designing a hard realtime scheduler
that handles this. The regular scheduler is not going to cut the mustard.
Plus this gives an enormous performance boost no matter what the load.
Someone familiar with IRIX told me that's what SGI does to optimize their
systems.
Also, you can have the following sequence:
Data -> accel engine
context switch
other data -> accel engine
This would confuse most cards. With a realtime handler you can make sure
that an accel command has finished before allowing a context switch.
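
As a rough illustration of that last point, here is a minimal sketch in C of
holding back a context switch until the accel engine has drained. Everything
in it is hypothetical: the struct accel_engine model, the tick counter that
stands in for a hardware busy bit, and the function names are assumptions
made for the example, not code from any real driver or scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an accel engine. A real card would expose a
 * busy/idle status register; here a countdown stands in for it. */
struct accel_engine {
    int busy_ticks;  /* > 0 while a command is still executing */
    int owner;       /* id of the context that last fed the engine */
};

static bool engine_idle(const struct accel_engine *e)
{
    return e->busy_ticks == 0;
}

/* Feed a command that takes 'cost' ticks. Feeding a busy engine is
 * exactly the interleaving we want to rule out, so assert against it. */
static void engine_submit(struct accel_engine *e, int ctx, int cost)
{
    assert(engine_idle(e));
    e->owner = ctx;
    e->busy_ticks = cost;
}

/* Advance the simulated hardware by one tick. */
static void engine_tick(struct accel_engine *e)
{
    if (e->busy_ticks > 0)
        e->busy_ticks--;
}

/* The check the handler would make: switching is only safe when idle. */
static bool may_switch_context(const struct accel_engine *e)
{
    return engine_idle(e);
}

/* Wait for the current command to finish, then hand the engine to
 * 'next_ctx'. Returns how many ticks the switch was delayed. */
static int switch_context(struct accel_engine *e, int next_ctx)
{
    int waited = 0;
    while (!may_switch_context(e)) {
        engine_tick(e);
        waited++;
    }
    e->owner = next_ctx;
    return waited;
}
```

In this model, a context that submits a 3-tick command delays the next
switch by 3 ticks, and the incoming context always finds the engine idle.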
> The cooperative locking system used by the DRI (see
> http://precisioninsight.com/dr/locking.html) allows direct-rendering
> clients to perform fine-grain locking only when the MMIO region is actually
> being written. The overhead for this system is extremely low (about 2
> instructions to lock, and 1 instruction to unlock). Cooperative locking
> like this allows several threads that all map the same MMIO region to run
> simultaneously on an SMP system.
I'm familiar with the system.
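
For reference, the kind of cooperative lock described in the quoted
paragraph can be sketched with a single compare-and-swap on a shared lock
word. This is only an illustration under stated assumptions, not the DRI
code itself: the hw_lock type, the LOCK_HELD bit, and the function names are
hypothetical, and the slow path simply spins where a real system would
presumably ask the kernel to arbitrate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_HELD 0x80000000u  /* hypothetical "held" flag in the lock word */

/* Lock word shared by every client that maps the MMIO region. The low
 * bits record the last context to hold the lock. */
typedef struct {
    atomic_uint word;
} hw_lock;

/* Fast path: a single compare-and-swap. It succeeds only if the lock is
 * free and 'ctx' was also the last holder (the uncontended case). */
static bool hw_trylock(hw_lock *l, unsigned ctx)
{
    assert((ctx & LOCK_HELD) == 0);
    unsigned expected = ctx;
    return atomic_compare_exchange_strong(&l->word, &expected,
                                          ctx | LOCK_HELD);
}

/* Unlock: clear the held bit, leaving 'ctx' recorded as last holder. */
static bool hw_unlock(hw_lock *l, unsigned ctx)
{
    unsigned expected = ctx | LOCK_HELD;
    return atomic_compare_exchange_strong(&l->word, &expected, ctx);
}

/* Slow-path stand-in: spin until the lock is free, then take ownership
 * away from whichever context held it last. */
static void hw_lock_slow(hw_lock *l, unsigned ctx)
{
    unsigned cur = atomic_load(&l->word);
    for (;;) {
        if (cur & LOCK_HELD) {
            cur = atomic_load(&l->word);  /* still held, look again */
            continue;
        }
        /* On failure 'cur' is refreshed with the current lock word. */
        if (atomic_compare_exchange_weak(&l->word, &cur, ctx | LOCK_HELD))
            return;
    }
}
```

A client would call hw_trylock() right before a burst of MMIO writes, fall
back to hw_lock_slow() if that fails, and call hw_unlock() right after the
burst, so the lock is held only while the region is actually being written.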
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/