A good read about the history of networks

Pretty humorously framed: https://apenwarr.ca/log/20170810

I would like to build an internet archiver to archive these posts.

What it will essentially do is:

1) Get the HTML content of the link I give it
2) Save it in a key-value store, with all links in the page updated
3) Recursively save all links to the key-value store
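A rough sketch of that loop, assuming Python with requests and BeautifulSoup, an in-memory dict standing in for the key-value store, and an `/archive/<hash>` path that is just my placeholder for wherever the copies end up being served:

```python
import hashlib
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

store = {}       # placeholder key-value store: url hash -> rewritten HTML
visited = set()  # guards against cycles between pages


def url_key(url: str) -> str:
    # Hash the URL so it can double as the storage key and the archive path.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def archive(url: str, depth: int = 1) -> None:
    # Fetch a page, rewrite its links to point at archived copies, then recurse.
    key = url_key(url)
    if key in visited:
        return
    visited.add(key)

    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    for a in soup.find_all("a", href=True):
        absolute = urljoin(url, a["href"])
        a["href"] = "/archive/" + url_key(absolute)  # point at the saved copy
        if depth > 0:
            archive(absolute, depth - 1)

    store[key] = str(soup)


archive("https://apenwarr.ca/log/20170810")
```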

It should allow for:

1) Content to be loaded just as it looks (no dynamic content)
2) Links should work and point to a valid page
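For the links to land on a valid page, the stored copies have to be served back at the rewritten `/archive/<hash>` paths. A minimal sketch of that, where Flask and the route shape are just my assumptions:

```python
from flask import Flask, abort

app = Flask(__name__)

# The same hash -> HTML mapping the crawler sketch above fills in.
store: dict[str, str] = {}


@app.route("/archive/<key>")
def serve_archived(key: str):
    # Return the saved copy so rewritten /archive/<hash> links resolve.
    html = store.get(key)
    if html is None:
        abort(404)
    return html


if __name__ == "__main__":
    app.run()
```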

Thinking a bit more about its design:

1) Rather than referring to the actual link in the query parameters, it will be better to use a hash and keep a reverse map of hash to URL if required in the future (or perhaps this can be stored in a comment in the HTML body that gets saved).
2) So the crawler will work like this:
   * Get the HTML content.
   * Insert the crawled URL as a comment in the head of the HTML.
   * Save this content to a key-value DB with the URL hash as the key and the modified HTML as the body.
   * Add a link to the above in some other place to start the navigation, something like sections, or index it using Elasticsearch so that we can build a search engine on the articles.
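A sketch of that save step; sqlite here is only a stand-in for whatever key-value DB ends up being used, and the comment format is made up:

```python
import hashlib
import sqlite3

from bs4 import BeautifulSoup, Comment

# sqlite as a stand-in key-value store: hash -> modified HTML.
db = sqlite3.connect("archive.db")
db.execute("CREATE TABLE IF NOT EXISTS pages (hash TEXT PRIMARY KEY, html TEXT)")


def save_page(url: str, html: str) -> str:
    # Stamp the crawled URL into the <head> as a comment, then store the page
    # under its hash so the original URL can be recovered from the body itself.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()

    soup = BeautifulSoup(html, "html.parser")
    head = soup.head or soup  # fall back to the root if there is no <head>
    head.insert(0, Comment(" archived-from: " + url + " "))

    db.execute(
        "INSERT OR REPLACE INTO pages (hash, html) VALUES (?, ?)",
        (key, str(soup)),
    )
    db.commit()
    return key
```

With that in place the reverse map comes almost for free: either scan the stored HTML for the comment, or keep a second hash-to-URL table alongside the pages.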

This does sound like the Wayback Machine, but its objective is just to preserve content that I want to keep: tracking changes over time is not required at all, and this will mostly be a personal collection of things.

Some day I will find the time to implement this.