
(Not so) Quick Update on my progress



[Warning: This is longer than I intended, a lot longer :)]
Hello All, I just thought it was time I gave a quick ping to let you
all know I'm alive.

I've been working on the cache system for Mnemonic. Not actual coding, but
concept work. I have missed some recent meetings (the last two I actually
could have attended if I had stopped studying caching long enough to notice
the date :)). I've also been neck deep in work at work (I have a LOT of
business in progress for one company right now) but, luckily,
caching relates to part of my work now and I am able to study and research
while on company time. :)

Anyway, most of my current work has centered on modifying Squid to
behave the way I expect the Mnemonic cache to. I've been doing a lot of
real-world testing and data gathering. For instance, I've got my ISP (who I
work with a lot) running one of my experimental Squid caches. :) I've also
got modified Squid caches at work, where I've disabled the internal
Netscape caching.

My research has been in these areas:

 1) Locating nearby parent caches and detecting when one dies.
 2) Using HTTP/1.1 to make cache "staleness" user settable.
 3) Locating other copies of the Mnemonic browser on a LAN.
    (This is for sibling caching: if it's not in the cache, it can
     ask other browsers on the LAN. I've simulated this at work
     by running multiple copies of Squid on one server and setting
     each copy of Netscape to use a different one. Works REAL well.)
 4) One of my biggest areas is what I call "Version Support". This is
    where the browser is configured to get the nearest copy of the
    document NOW; then, if it's stale, it goes and tries to get a newer one
    from up the hierarchy, and if there is a newer one, it lets the user
    know (maybe by some icon lighting up). Can we design the parser so that
    minor modifications to the page can occur without complete reparsing?
 5) Another big area of my research is aggressive caching. This is
    where the browser crawls links ahead of time in anticipation of
    the user following them. Obviously, this is an area of BIG controversy.
    But I think I've got some good solutions: 1) It can be configured to
    only get documents that are already in the first-level cache (BIG win
    for modem-connected users of ISPs). 2) It can be configured to only
    try if there are under a certain number of links on the page.
    3) It can be configured to only get certain file types (e.g. .html)
    and only get them if they are under a certain size. 4) With proper
    support I can even make it ask permission of its parent cache.
    5) I'm also working on some ideas for making it 'smart' about which
    links it precaches first. Also, this allows it to get the first
    few bytes of an image in order to figure out its size (HTTP/1.1).
    Aggressive caching is a big speed win when done on the ISP's
    cache. It's also a big win on slow links when set right (e.g. if
    the page has a small number of links, then get the HTML, then get the
    images off those pages only if they are in the ISP's cache already).
 6) Probably my biggest effort, however, is in trying to figure out how
    to make the cache do more in less disk space (and RAM). Most real
    web caches have hundreds of megs of space (even gigs). Most browsers
    only have 10 or so megs. So I've done A LOT of work on garbage
    collection. I've even come up with a patch to Squid, which I think I
    will submit to them, that helps tight caches a lot. (Basically it
    doesn't just kill old objects. It ranks them by size and hit ratio
    (and est. transfer time) and only kills useless objects.) Also, I've
    figured out how to make the cache keep track of objects that are
    modified too often to be worth caching. It then keeps a
    fixed-length blacklist (which is GCed with a method like the one above)
    of these objects and makes no effort to cache them. But, still, the web
    cache in the browser has a hard time because only one user uses it.
    That's why I think LAN browser siblings and aggressive preloading are
    very important (esp. when the aggression only occurs between the
    browser and the ISP's cache).
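
To make the ranking idea in item 6 a bit more concrete, here is a rough sketch in Python. It is not the actual Squid patch: the scoring formula, the field names, and the `evict` helper are all assumptions made up for illustration. The only thing taken from the description above is that objects are ranked by size, hits, and estimated transfer time rather than by age alone.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    url: str
    size_bytes: int
    hits: int
    est_fetch_seconds: float  # estimated time to re-fetch from upstream


def keep_value(obj: CachedObject) -> float:
    """Hypothetical score: transfer seconds saved per byte of disk used."""
    return obj.hits * obj.est_fetch_seconds / max(obj.size_bytes, 1)


def evict(objects: list[CachedObject], budget_bytes: int) -> list[CachedObject]:
    """Return the objects to drop so the remainder fits in budget_bytes.

    The lowest-value objects are dropped first, regardless of their age.
    """
    used = sum(o.size_bytes for o in objects)
    victims = []
    for obj in sorted(objects, key=keep_value):
        if used <= budget_bytes:
            break
        victims.append(obj)
        used -= obj.size_bytes
    return victims
```

With a score like this, a big object that nobody has hit goes first, while a small, often-hit object that is slow to re-fetch stays put even when it is old.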

I think that covers what I've been up to. But I probably forgot something!
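
The configurable limits on aggressive caching from item 5 could be expressed as a small policy check along these lines. This is only a sketch: `PrefetchPolicy`, `should_prefetch`, and every default value below are hypothetical and do not come from Mnemonic or Squid.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class PrefetchPolicy:
    # All defaults are illustrative guesses, not measured values.
    max_links_on_page: int = 20
    allowed_suffixes: tuple = (".html", ".htm")
    max_size_bytes: int = 32 * 1024
    only_if_in_parent_cache: bool = True


def should_prefetch(policy, url, links_on_page, size_bytes, in_parent_cache):
    """Decide whether a link on the current page is worth fetching early."""
    # Skip link-heavy pages entirely.
    if links_on_page > policy.max_links_on_page:
        return False
    # Optionally only pull what the parent (e.g. ISP) cache already holds.
    if policy.only_if_in_parent_cache and not in_parent_cache:
        return False
    # Restrict to configured file types, judged by the URL path suffix.
    path = urlparse(url).path.lower()
    if not path.endswith(policy.allowed_suffixes):
        return False
    # An unknown size is treated as too large.
    if size_bytes is None or size_bytes > policy.max_size_bytes:
        return False
    return True
```

For example, `should_prefetch(PrefetchPolicy(), "http://example.com/a.html", 5, 1000, True)` returns `True`, while the same link on a page with 50 links is skipped.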

I've also done some work on ways of indexing and searching the cache
(Where did I see that page on killer snakes??).

I hope to have my Cache Specification done soon. But I'm so busy with new
ideas and trials that it's hard to find time to write. I might release a
preliminary version in a week or two.
 
Oh well, at least you know I'm not dead... or worse... you know I'm not lazy!