(Not so) Quick Update on my progress
[Warning: This is longer than I intended, a lot longer :)]
Hello All, I just thought it was time I gave a quick ping to let you
all know I'm alive..
I've been working on the cache system for Mnemonic.. Not actual coding but
concept work.. I have missed some recent meetings (the last two I actually
could have attended if I had stopped studying caching long enough to notice
the date.. :))... I've also been neck deep in work at work (there's REALLY
a lot of business in progress for one company right now) but, luckily,
caching relates to part of my work now and I am able to study and research
while on company time.. :)
Anyway.. Most of my current work has centered around modifying Squid to
behave how I expect the Mnemonic cache to.. I've been doing a lot of real
world testing and data gathering.. For instance: I've got my ISP (who I
work with a lot) running one of my experimental Squid caches.. :) I've also
got modified Squid caches at work, where I've disabled the internal
Netscape caching..
My research has been in these areas:
1) Locating nearby parent caches and detecting when one dies.
2) Using HTTP/1.1 to make cache "staleness" user settable.
3) Locating other copies of the Mnemonic browser on a LAN..
(This is for sibling caching: if a document is not in the browser's own
cache, it can ask other browsers on the LAN.. I've simulated this at work
by running multiple copies of Squid on one server and setting
each copy of Netscape to use a different one.. Works REAL well)
4) One of my biggest areas is what I call "Version Support". Versions
are where the browser is configured to 'get the nearest copy of the
document NOW, then, if it's stale, go try to get a newer one from
up the hierarchy'. If there is a newer one, it lets the user know (maybe
by some icon lighting up).. Can we design the parser so that minor
modifications to the page can occur without complete reparsing?
5) Another big area of my research is in aggressive caching.. This is
where the browser can crawl links ahead of time in anticipation of
the user accessing them. Obviously, this is an area of BIG controversy.
But, I think I've got some good solutions: a) It can be configured to
only get documents that are already in the first level cache (BIG win
for modem connected users of ISPs). b) It can be configured to only
try if there are under a certain number of links on the page.
c) It can be configured to only get certain file types (i.e. .html)
and only get them if they are under a certain size.. d) With proper
support I can even make it ask permission of its parent cache.
e) I'm also working on some ideas for making it 'smart' about which
links it precaches first.... Also, this allows it to get the first
few bytes of an image in order to figure out its size (HTTP/1.1).
The aggressive caching is a big speed win when done on the ISP's
cache.. It's also a big win on slow links when set right (e.g. if
the page has a small number of links then get the HTML, then get the
images off those pages only if they are in the ISP's cache already)
6) Probably my biggest effort, however, is in trying to figure out how
to make the cache do more in less disk space (and RAM)... Most real
web caches have hundreds of megs of space (even gigs).. Most browsers
only have 10 or so megs.. So I've done A LOT of work on garbage
collecting... I've even come up with a patch to Squid that I think I
will submit to them, which helps tight caches a lot.. (Basically it
doesn't just kill old objects.. It ranks them by size and hit ratio
(and est. transfer time) and only kills useless objects)... Also I've
figured out how to make the cache keep track of objects that are
modified too often to be worth caching.. It then keeps a fixed
length black list (which is GCed with a method like above) of these
objects and makes no effort to cache them.. But, still, the web cache
on the browser has a hard time because only one user uses it.. That's
why I think LAN browser siblings and aggressive preloading are very
important (esp. when the aggression only occurs between the browser and
the ISP's cache)..
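To make the ranking idea in item 6 a bit more concrete, here is a minimal sketch in Python. It is not the Squid patch itself: the `CachedObject` fields, the `keep_value` formula, and `choose_victims` are all assumptions made up for illustration. It only shows the general shape of "rank by size, hits, and estimated transfer time, then kill the least useful objects first".

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    url: str
    size_bytes: int
    hits: int                 # how often the cache served this object
    est_transfer_secs: float  # estimated time to fetch it again


def keep_value(obj: CachedObject) -> float:
    """Hypothetical score: transfer seconds saved per byte of cache space."""
    return obj.hits * obj.est_transfer_secs / max(obj.size_bytes, 1)


def choose_victims(objects, bytes_needed):
    """Pick the lowest-value objects until enough space would be freed."""
    victims = []
    freed = 0
    for obj in sorted(objects, key=keep_value):
        if freed >= bytes_needed:
            break
        victims.append(obj)
        freed += obj.size_bytes
    return victims


cache = [
    CachedObject("a", size_bytes=1000, hits=0, est_transfer_secs=1.0),
    CachedObject("b", size_bytes=1000, hits=10, est_transfer_secs=2.0),
    CachedObject("c", size_bytes=50000, hits=1, est_transfer_secs=5.0),
]
victim_urls = [v.url for v in choose_victims(cache, 2000)]
```

With these made-up numbers the never-hit object "a" and the large, rarely-hit object "c" go first, while the small, popular object "b" survives even if it is the oldest.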
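The configurable limits on aggressive caching from item 5 could be expressed as a simple policy check, sketched below in Python. Everything here is hypothetical: the `PrefetchPolicy` fields, their default values, and the `should_prefetch` function are illustration only, and the sketch assumes the size of the link target and its presence in the parent cache are already known by the time the check runs.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class PrefetchPolicy:
    link_limit: int = 20                       # page must have fewer links than this
    allowed_extensions: tuple = (".html", ".htm")
    size_limit_bytes: int = 32 * 1024          # document must be smaller than this
    require_parent_cache_hit: bool = True      # only fetch what the parent already has


def should_prefetch(url, size_bytes, in_parent_cache, links_on_page, policy):
    """Return True only if every configured restriction allows the prefetch."""
    if links_on_page >= policy.link_limit:
        return False
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext not in policy.allowed_extensions:
        return False
    if size_bytes >= policy.size_limit_bytes:
        return False
    if policy.require_parent_cache_hit and not in_parent_cache:
        return False
    return True


policy = PrefetchPolicy()
ok = should_prefetch("http://example.com/a.html", 1000, True, 5, policy)
```

Asking the parent cache for permission and ordering the links 'smartly' are left out of this sketch.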
I think that covers what I've been up to.. But I probably forgot something!
I've also done some work on ways of indexing and searching the cache
(Where did I see that page on killer snakes??)....
I hope to have my Cache Specification done soon... But I'm so busy with new
ideas and trials it's hard to find time to write... I might release a
preliminary version in a week or two..
Oh well, at least you know I'm not dead.. or worse.. you know I'm not lazy!