Re: File/IO discussion - request for comments
On Sun, Feb 01, 2004 at 15:52:01 -0800, Carl Spalletta wrote:
> In a recent O'Reilly Press book "Java NIO" by R. Hitchens, pages 9-13 deal with generic Unix
> file I/O; although the argument is garbled it is implied that _all_ file I/O type reads are
> accomplished through demand paging generated by the pagefault handler.
>
> I am pretty sure that this is not the case in linux and have written the following outline
> to explore this.
>
> Caveat: we assume that memory pagesize and the fs block size are both 4K, and also that no
> system error occurs and that errno remains zero throughout.
>
> All regular file I/O in linux 2.6.1 takes one of two alternatives: through the read/write family
> of syscalls or through direct memory operations in userspace on mmapped files. Both methods
> utilize the page cache.
>
> FIRST ALTERNATIVE: read() syscall
>
> The system receives a file descriptor, an offset, a count and a userspace buffer address. When
> the syscall returns it has copied n bytes to the buffer, where 0 <= n <= count. How many bytes
> are copied depends on the state of the pagecache, together with the blocking/nonblocking mode
> of the file.
>
> To start with, the syscall examines the page cache to see if any of the requested pages are there.
> So, if the read offset was 10,000 and the count 20,000 then the system tries to find the pages
> containing fs blocks 2 thru 7 - bytes 8192 through 32767. The amount of data found in the page
> cache interacts with the blocking/nonblocking mode of the file as follows:
>
> Blocking read:
>   No pages found: queue I/O and sleep on queue.
>   Less than all pages found:
>     Lock found pages in memory.
>     Queue I/O for remaining pages and sleep on queue.
>   All pages found:
>     Copy the request from the found pages to the buffer.
>     Return the count.
>
> Nonblocking read:
>   No pages found: return 0.
>   Less than all pages found:
>     The found pages contain some initial portion of the request:
>       Copy that initial portion from the found pages to the userspace buffer.
>       Return the number of bytes copied.
>     No initial portion found: return 0.
>   All pages found:
>     Copy the request from the found pages to the buffer.
>     Return the count.
There is no non-blocking read from disk!
There is only aio_read, which is a separate interface.
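
A quick way to check both points from user space, as a minimal sketch in
Python. The function names page_range and nonblocking_pread are made up for
illustration, and the 4K page size is the assumption from the outline above.
The first helper redoes the offset/count arithmetic; the second opens a scratch
regular file with O_NONBLOCK and reads it with pread. On Linux the flag is
accepted but has no effect on regular files, so the read is expected to return
the full count rather than a short count or EAGAIN. Note that the scratch file
has just been written, so its pages are almost certainly cached already; this
shows that the flag is ignored, not how long a cold read would block.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: the 4K page/block size used in the outline


def page_range(offset, count, page_size=PAGE_SIZE):
    """Return (first, last) page indices covering bytes [offset, offset + count)."""
    return offset // page_size, (offset + count - 1) // page_size


def nonblocking_pread(offset, count, file_size=40000):
    """Read `count` bytes at `offset` from a scratch regular file opened O_NONBLOCK."""
    fd, path = tempfile.mkstemp()
    try:
        # Fill the scratch file; fdopen takes ownership of fd and closes it.
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * file_size)
        # Reopen with O_NONBLOCK; for a regular file the flag changes nothing.
        nb_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            return os.pread(nb_fd, count, offset)
        finally:
            os.close(nb_fd)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(page_range(10000, 20000))
    print(len(nonblocking_pread(10000, 20000)))
```

page_range(10000, 20000) gives (2, 7), i.e. blocks 2 thru 7 of the quoted
example.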
> Notwithstanding Hitchens's claim, there is nothing in the above that has to do with pagefaults
> except in case the pagetable(s) for the userspace buffer are marked 'not present'. Moreover no
> pagefaults can occur in kernel space except on kernel page allocations (..???...)
There are no page faults per se in the kernel. A page fault happens when a page
is accessed for which there is no entry in the page table -- and that simply
never happens in the kernel. However, the mechanism for loading pages is the
same for read as for a page fault.
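
The shared machinery is not directly visible from user space, but one of its
consequences is: read() and a shared file mapping see the same cached pages. A
small sketch, assuming Linux and a local filesystem (the function name
write_via_mmap_read_via_pread is hypothetical): a store through the mapping
should be visible to pread on the same file at once, with no msync in between.
This illustrates the common page cache only, not the page-loading path itself.

```python
import mmap
import os
import tempfile


def write_via_mmap_read_via_pread():
    """Store bytes through a shared mapping, then read them back with pread."""
    fd, path = tempfile.mkstemp()  # opened read/write
    try:
        os.write(fd, b"a" * mmap.PAGESIZE)
        # On Unix, mmap.mmap defaults to MAP_SHARED with read/write protection.
        with mmap.mmap(fd, mmap.PAGESIZE) as m:
            m[0:5] = b"HELLO"            # plain memory store in user space
            return os.pread(fd, 5, 0)    # syscall path, same cached page
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(write_via_mmap_read_via_pread())
```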
> SECOND ALTERNATIVE: operations on mmapped non-anonymous memory
>
> This is in some ways the opposite of the above. The syscall takes place entirely in kernel
> space while the memory operations in this alternative are nominally entirely in user space;
> moreover, the syscall method may have to deal with multiple pages but the memory mapped method
> deals with only a single page at a time.
>
> Page is present: do memory ops in user space.
> Page is not present: pagefault handler utilizes fs 'readpage' method. Resume in user space.
>
> So demand paging does indeed take place in the case of a mapped memory address with a page
> marked 'not present' - but not otherwise and most emphatically not in _every_ case of filesystem
> I/O.
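
The mmap case can be watched from user space, at least roughly. A minimal
sketch, assuming Linux and Python's mmap and resource modules (the helper names
faults and touch_mapped_pages are made up for illustration): map a scratch
file, touch one byte in each page, and compare the process's page-fault
counters before and after. The difference is only indicative: the interpreter
takes faults of its own, and the kernel may map several neighbouring pages on a
single fault, so no particular number should be expected.

```python
import mmap
import os
import resource
import tempfile


def faults():
    """Total page faults (minor + major) taken by this process so far."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_minflt + ru.ru_majflt


def touch_mapped_pages(n_pages=16):
    """Map a scratch file read-only and load one byte from every page.

    Returns (sum of the bytes read, page faults observed while touching).
    """
    page = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x01" * (n_pages * page))
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                before = faults()
                # Each load is an ordinary memory access; a page that is not
                # yet mapped is brought in by the page-fault handler.
                total = sum(m[i * page] for i in range(n_pages))
                after = faults()
        return total, after - before
    finally:
        os.unlink(path)


if __name__ == "__main__":
    total, delta = touch_mapped_pages()
    print("bytes summed:", total, "faults while touching:", delta)
```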
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive: http://mail.nl.linux.org/kernelnewbies/
FAQ: http://kernelnewbies.org/faq/