
Re: Oops in __free_pages_ok (pre7-1) (Long) (backtrace)



> On Wed, 3 May 2000, Kanoj Sarcar wrote:
> > 
> > At no point while try_to_swap_out() is running will is_page_shared()
> > wrongly indicate that the page is _not shared_ when it really is shared
> > (as you say, it is pessimistic).
> 
> Note that this is true only if you assume processor ordering.
>

True ... not to deviate from the current topic, I would think that instead
of imposing locks here, you would want to inject instructions (like the
MIPS "sync") that make sure memory is consistent. Imposing locks is a
roundabout way of ensuring memory consistency, since the unlock normally
has this "sync"-type instruction encoded in it anyway.
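
To make the barrier idea concrete, here is a minimal user-space sketch. It
uses C11 atomics rather than kernel primitives, and every name in it
(page_count, swap_count, move_ref_to_swap, looks_shared) is a hypothetical
stand-in, not the actual code behind is_page_shared(). The "shared" rule
(more than one reference in total) is likewise a simplifying assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two counters the check looks at. */
static atomic_int page_count = 2;   /* references held on the page itself */
static atomic_int swap_count = 0;   /* references held via the swap entry */

/* Writer side: move one reference from the page count to the swap count.
 * The swap count is raised first, then a release fence (roughly the role
 * a "sync" plays here), and only then is the page count dropped. */
static void move_ref_to_swap(void)
{
    atomic_fetch_add_explicit(&swap_count, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_fetch_sub_explicit(&page_count, 1, memory_order_relaxed);
}

/* Reader side: read the counters in the opposite order, with an acquire
 * fence in between. If the reader observes the lowered page count, the
 * fence pairing guarantees it also observes the raised swap count. */
static bool looks_shared(void)
{
    int pages = atomic_load_explicit(&page_count, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    int swaps = atomic_load_explicit(&swap_count, memory_order_relaxed);
    return pages + swaps > 1;       /* toy rule: more than one user */
}
```

With the fences in place the reader can see a total that is temporarily too
high (old page count, new swap count), but never one that is too low, so
the check stays pessimistic. Without them, a weakly ordered CPU is allowed
to pair the new page count with the old swap count.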

> With no common locks, a less strictly ordered system (like an alpha) might
> see the update of the swap-count _much_ later on the second CPU, so that
> is_page_shared() may end up not being pessimistic after all (it could get
> the new page count, but the old swap-count, and think that the page is
> free to be removed from the swap cache).
> 
> This is why not having a shared lock looks like a bug to me. Even if that
> particular bug might never trigger on an x86.
> 
> _Something_ obviously triggers on the x86, though. 
> 
> Note that we may be barking up the wrong tree here: it may be a completely
> different page mishandling that causes this. For example, one bug in NFS
> used to be that it free'd a page that was allocated with "alloc_pages()"
> using "free_page()" - which takes the virtual address and only works for
> "normal" pages. Now, if you have more than about 960MB of memory and the
> allocated page was a highmem page, you may end up freeing the wrong page
> due to mixing metaphors, and suddenly the page counts are wrong.
>

Absolutely ... any subsystem which is screwing up the page reference count
would lead to a similar symptom. Very hard to track these ... maybe I will
take some time near the end of the week to run Juan's programs.
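
As a rough illustration of why these are hard to track, here is a toy
model of the alloc_pages()/free_page() mix-up described above. It is not
kernel code: the toy_* names, the eight-page map, and the rule that a
"highmem" page yields address 0 are all assumptions made up for the
sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_NR_PAGES  8
#define TOY_NR_LOWMEM 4     /* pages 0..3 are "normal", 4..7 are "highmem" */

struct toy_page { int count; };
static struct toy_page toy_mem_map[TOY_NR_PAGES];

/* Page-based allocation: hands back a page descriptor with count 1. */
static struct toy_page *toy_alloc_page(size_t idx)
{
    toy_mem_map[idx].count = 1;
    return &toy_mem_map[idx];
}

/* Address lookup: only meaningful for lowmem pages. In this toy, a
 * highmem page has no direct-mapped address and the lookup yields 0. */
static unsigned long toy_page_address(const struct toy_page *p)
{
    size_t idx = (size_t)(p - toy_mem_map);
    return idx < TOY_NR_LOWMEM ? idx * TOY_PAGE_SIZE : 0;
}

/* Drop one reference. Returns false if the count was already zero,
 * which stands in for the BUG() check. */
static bool toy_put_page(struct toy_page *p)
{
    if (p->count <= 0)
        return false;
    p->count--;
    return true;
}

/* Address-based free: translates the address back to a page first. */
static bool toy_free_page_by_addr(unsigned long addr)
{
    return toy_put_page(&toy_mem_map[addr / TOY_PAGE_SIZE]);
}
```

Freeing highmem page 5 through toy_free_page_by_addr() succeeds silently
but drops the count of page 0 instead. Page 5 keeps its reference, and the
check only fails later, when the legitimate owner of page 0 calls
toy_put_page() on it.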

Kanoj

 
> And with the wrong page counts, the BUG() can/will happen only much later,
> because an innocent "__free_page()" ends up doing the BUG(), but the real
> offender happened earlier.
> 
> We fixed one such bug in NFS. Maybe there are more lurking? How much
> memory do the machines have that have problems?
> 
> 		Linus
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
> 
