[RFC PATCH] Re: get_user_pages rewrite (completed, updated for 2.5.46)
Hi Andrew,
thanks for your review.
On Fri, Nov 08, 2002 at 02:44:08PM -0800, Andrew Morton wrote:
[custom_page_walker_t locking rules for vma->mm->page_table_lock]
> This locking is rather awkward. Why is it necessary, and can it
> be simplified??
No unlocking is needed in the fast and common cases. That shall
reduce bus traffic.
That locking is also needed for follow_page and will not be
dropped, if the page is already faulted into the process space
(should be the common case for get_user_pages).
Under normal operation walk_user_pages is a loop of
follow_page(), which needs that lock. E.g. the while
statement in single_page_walk() will not enter its loop body.
The original implementation did NO proper cleanup, if the call
spanned multiple VMAs.
That's why I introduced the case IS_ERR(vma), where the
vma->mm->page_table_lock cannot be unlocked, but cleanup can
happen in case of wrong VMA and the walker having collected some
pages already.
We have four possibilities to simplify locking:
1) Explicit argument, whether the page_table_lock is taken.
- Would simplify usage, but I know that functions of this kind
were eliminated in the past, because Linus and some
other people don't like that kind of magic.
- Would remove the need to do that for huge tlb pages.
- We must check for that flag and restore state at exit and
the error path. Handling the error path is already
complicated, but very visible (the IS_ERR is a good
indicator even to the inexperienced reader).
2) Always unlock before we enter the custom page walker.
- Would cause lock/follow_page/unlock/lock/page_cache_get/unlock
for EVERY page in the normal get_user_pages() case.
3) Always unlock if IS_ERR(page) would trigger.
(Actually the IS_ERR(page) is triggered also, if IS_ERR(vma) is true).
- Removes the unlocking completely from the custom page walker,
if it doesn't need to do that anyway.
- Is no real simplification, since the walker can be entered
with locking or without, as it is now.
- We still require locking for huge tlb pages, but Mr. Irwin
already acked the changes for that.
4) Introduce an explicit "cleaning" function passed additionally
to the walk_user_pages() function.
- This would separate the error handling completely from the
normal case.
- It would be possible to omit the error handling, if not
needed. (Or to forget it, if needed later ;-/ )
- The page walker will ALWAYS be entered with the page_table_lock
taken.
- The cleanup handler will ALWAYS be entered without it and
only the custom_data passed along.
- Function enter/exit overhead is compiled twice, because we
have two functions.
- And we still require locking for huge tlb pages.
Which one do you like most? I would favor 3. I've appended
a patch for that against page-walk-api-2.5.46-mm1-all.patch.bz2
for you to test it.
I agree that the locking rules are awkward, but they are the best
solution I could come up with while preserving speed and
functionality. Any better rules will be implemented at your request.
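To make the rule of option 3 concrete, here is a minimal user-space
sketch. It is not code from the patch: lock_model, page_model, collect,
collect_walker and walk_model are hypothetical stand-ins, the
ERR_PTR/IS_ERR/PTR_ERR helpers are simplified models of the kernel
macros, and the walker in this model simply returns the negative errno.
It only illustrates that the walker sees the lock held for a valid page
and already dropped when IS_ERR(page) is true.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified user-space models of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical stand-in for mm->page_table_lock; tracks state only. */
struct lock_model { int held; };
static void model_lock(struct lock_model *l)   { assert(!l->held); l->held = 1; }
static void model_unlock(struct lock_model *l) { assert(l->held);  l->held = 0; }

struct page_model { int id; };

/* Hypothetical custom data of a collecting walker. */
struct collect {
	struct lock_model *lock;
	int count;
	int max;
	int cleaned;
};

/* Option 3: the lock is held if and only if !IS_ERR(page). */
static int collect_walker(struct page_model *page, void *customdata)
{
	struct collect *c = customdata;

	if (!IS_ERR(page)) {
		assert(c->lock->held);		/* normal path: lock held */
		c->count++;
		return (c->count == c->max) ? 1 : 0;
	}
	assert(!c->lock->held);			/* error path: already unlocked */
	c->cleaned = 1;				/* cleanup only, no unlocking */
	return (int)PTR_ERR(page);
}

/* Models the walk loop: one lock acquisition for the whole fast path. */
static int walk_model(struct lock_model *l, struct page_model *pages,
		      int n, struct collect *c)
{
	int i, ret;

	model_lock(l);
	for (i = 0; ; i++) {
		struct page_model *p;

		if (i < n) {
			p = &pages[i];
		} else {
			model_unlock(l);	/* dropped before reporting the error */
			p = ERR_PTR(-EFAULT);
		}
		ret = collect_walker(p, c);
		if (IS_ERR(p))
			return ret;		/* lock is already dropped */
		if (ret > 0)
			break;			/* walker has all it wants */
	}
	model_unlock(l);
	return 0;
}
```

In this model the fast path takes and drops the lock exactly once, no
matter how many pages are collected, and the walker contains no
unlocking at all.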
> wrt the removal of the vmas arg to get_user_pages(): I assume this
> was because none of the multipage callers were using it?
Yes, that's true. If some caller needs this, it can use a custom
walker.
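A rough sketch of what such a custom walker could look like follows.
Only the custom_page_walker_t signature is taken from the patch;
struct gup_with_vmas and collect_pages_and_vmas are hypothetical names,
the two structs are user-space stand-ins, the error-pointer helpers are
simplified models, and reference counting and flush_dcache_page() are
left out. The error path returns the negative errno and never reads
vma, since its value is undefined when IS_ERR(page) is true.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins so that the sketch compiles on its own. */
struct vm_area_struct { int id; };
struct page { int id; };

/* Simplified models of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

/* Signature as in the patch. */
typedef int (*custom_page_walker_t)(struct vm_area_struct *vma,
		struct page *page, unsigned long virt_addr, void *customdata);

/* Hypothetical custom data: records the vma next to each page. */
struct gup_with_vmas {
	struct page **pages;
	struct vm_area_struct **vmas;
	unsigned int count;
	unsigned int max_pages;
};

static int collect_pages_and_vmas(struct vm_area_struct *vma,
		struct page *page, unsigned long virt_addr, void *customdata)
{
	struct gup_with_vmas *g = customdata;

	(void)virt_addr;			/* not needed in this sketch */

	if (IS_ERR(page)) {
		/* vma is undefined here, so it is not touched */
		g->count = 0;			/* forget what was collected */
		return (int)PTR_ERR(page);
	}

	g->pages[g->count] = page;
	g->vmas[g->count] = vma;
	g->count++;
	return (g->count == g->max_pages) ? 1 : 0;
}
```

A caller would fill in pages, vmas and max_pages, then hand the
function and the struct to the walk function as walker and customdata.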
Single patch against 2.5.46-mm1 is at
http://www.tu-chemnitz.de/~ioe/patches-page_walk/page-walk-api-2.5.46-mm1-all.patch.bz2
All patches with description and diffstat of the whole thing at:
http://www.tu-chemnitz.de/~ioe/patches-page_walk/index.html
Thanks again for your review, I really appreciate your input here.
Regards
Ingo Oeser
diff -u linux-2.5.46-mm1-ioe/include/linux/mm.h linux-2.5.46-mm1-ioe/include/linux/mm.h
--- linux-2.5.46-mm1-ioe/include/linux/mm.h Fri Nov 8 12:55:49 2002
+++ linux-2.5.46-mm1-ioe/include/linux/mm.h Sat Nov 9 18:02:56 2002
@@ -396,14 +396,15 @@
* If this functions gets a page, for which %IS_ERR(@page) is true, than it
* should do it's cleanup of customdata and return -PTR_ERR(@page).
*
- * This function is called with @vma->vm_mm->page_table_lock held,
- * if IS_ERR(@vma) is not true.
+ * If IS_ERR(@page) is NOT TRUE, this function is called with
+ * @vma->vm_mm->page_table_lock held.
*
- * But if IS_ERR(@vma) is true, IS_ERR(@page) is also true, since if we have no
- * vma, then we also have no user space page.
+ * The value of @vma is undefined if IS_ERR(@page) is TRUE.
+ * (So never use or check it if IS_ERR(@page) is TRUE)
*
- * If it returns a negative value, then the page_table_lock must be dropped
- * by this function, if it is held.
+ * If it returns a negative value but got a valid page, then the
+ * page_table_lock must be dropped by this function. (This condition should be
+ * rather rare.)
*/
typedef int (*custom_page_walker_t)(struct vm_area_struct *vma,
struct page *page, unsigned long virt_addr, void *customdata);
diff -u linux-2.5.46-mm1-ioe/mm/memory.c linux-2.5.46-mm1-ioe/mm/memory.c
--- linux-2.5.46-mm1-ioe/mm/memory.c Fri Nov 8 12:55:49 2002
+++ linux-2.5.46-mm1-ioe/mm/memory.c Sat Nov 9 18:15:06 2002
@@ -1158,8 +1158,6 @@
struct gup_add_pages *gup = customdata;
- BUG_ON(!customdata);
-
if (!IS_ERR(page)) {
gup->pages[gup->count++] = page;
flush_dcache_page(page);
@@ -1170,8 +1168,6 @@
return (gup->count == gup->max_pages) ? 1 : 0;
}
- if (!IS_ERR(vma))
- spin_unlock(&vma->vm_mm->page_table_lock);
gup_pages_cleanup(gup);
return -PTR_ERR(page);
}
@@ -1192,7 +1188,6 @@
spin_unlock(&mm->page_table_lock);
fault = handle_mm_fault(mm, vma, start, write);
- spin_lock(&mm->page_table_lock);
switch (fault) {
case VM_FAULT_MINOR:
@@ -1210,8 +1205,13 @@
spin_unlock(&mm->page_table_lock);
BUG();
}
+ spin_lock(&mm->page_table_lock);
}
- return get_page_map(map);
+ map=get_page_map(map);
+ if (IS_ERR(map))
+ spin_unlock(&mm->page_table_lock);
+
+ return map;
}
/* VMA contains already "start".
@@ -1248,10 +1248,14 @@
spin_lock(&mm->page_table_lock);
page = single_page_walk(tsk, mm, vma, start, write);
- if (!(IS_ERR(page) || PageReserved(page)))
+ if (IS_ERR(page))
+ goto out;
+
+ if (!PageReserved(page))
page_cache_get(page);
spin_unlock(&mm->page_table_lock);
+out:
return page;
}
@@ -2101,8 +2105,6 @@
return (*todo) ? 0 : 1;
}
- if (!IS_ERR(vma))
- spin_unlock(&vma->vm_mm->page_table_lock);
return -PTR_ERR(page);
}