
Conservation of web pages


Conservation of web pages linked to from OEIS

Neil Sloane, Oct 08, 2013:

In 50 years most of the present links to web pages won't work any more, unless we have a backup copy on the OEIS web site. So we should start making copies of all the private web pages that we link to from the sequence entries. If possible we try to get permission first, of course, but in many cases it may already be too late for that, and we will have to use the Wayback Machine to recover the page. What I recommend is that we have a duplicate for each link:

  • J. Smith, <a href="etc etc, [the original link]
  • J. Smith, <a href="a123456.***> [cached copy]
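When a page has to be recovered from the Wayback Machine, the Internet Archive's public availability endpoint (https://archive.org/wayback/available) can report the archived snapshot closest to a given date. The Python sketch below assumes that endpoint and its JSON response shape (`archived_snapshots`, then `closest` with `url`, `timestamp`, `status` and `available` fields). The function names are hypothetical, the sample response in the check is made up, and only `find_snapshot` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine "availability" endpoint (assumed, see lead-in).
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the query asking for the snapshot of `url` closest to `timestamp` (YYYYMMDD...)."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a decoded response, or None if nothing usable."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"], closest["timestamp"]
    return None


def find_snapshot(url, timestamp=None):
    """Query the endpoint over the network and pick out the closest snapshot."""
    with urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

For example, `find_snapshot("http://example.com/homepage/file.html", "20130101")` would return the archived copy's URL and timestamp, or `None`. The snapshot is only the closest one, possibly later than the requested date, so it is worth checking that it still shows the original content before caching it.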

For an example where we have a successfully cached copy, see A213000. For an example of an unsuccessful cache, see A080104. Furthermore, Hugo P. has pointed out to me that many publishers seem to have changed their policy: articles that were formerly free are now hidden behind paywalls, so links that used to work no longer do. I'm not sure how to deal with that problem.

Anyway, there are a LOT of links in the OEIS, so this is something where everyone can help. This "crowd-sourcing" may be the only way to solve the problem. If you see a link in A123456, say, to J. Smith, <a href="http://example.com/homepage/file.html">Title</a>, then:

  • ask permission from J. Smith, explaining why we are doing this (basically because we hope the OEIS will be around for hundreds of years, and so it is to everyone's advantage to preserve these pages),
  • and then make a copy called a123456.html (the file name should start with the A-number, with a lower case a)
  • edit A123456 to upload a123456.html, and create a link saying
    • J. Smith, <a href="a123456.html">Title</a> [Cached copy]
  • Do the same with jpg, gif, pdf, etc., files.
  • Of course, if the page loads subsidiary files (such as images), you need to copy them too.
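The naming and linking steps above can be sketched in Python. This is a minimal sketch under assumptions that are not in the original: the cached file keeps the original's extension (defaulting to .html), a hypothetical `suffix` argument distinguishes several files cached for the same sequence, and subsidiary files are found by scanning the page's `img`, `script`, `iframe`/`frame` and stylesheet `link` tags with Python's standard `html.parser`. All function names are illustrative. Pages whose content is generated by scripts will not be fully captured this way.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def cached_name(a_number, original_url, suffix=""):
    """'A123456' + '.../file.pdf' -> 'a123456.pdf' (lower-case a, original extension)."""
    if not re.fullmatch(r"A\d{6}", a_number):
        raise ValueError("expected an A-number such as A123456")
    ext = posixpath.splitext(urlparse(original_url).path)[1].lower() or ".html"
    return a_number.lower() + suffix + ext


def link_lines(author, title, a_number, original_url):
    """The recommended pair of links: the original, then the cached copy."""
    cached = cached_name(a_number, original_url)
    return [
        f'{author}, <a href="{original_url}">{title}</a>',
        f'{author}, <a href="{cached}">{title}</a> [Cached copy]',
    ]


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of subsidiary files that a page loads."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag in ("img", "script", "iframe", "frame") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href") and "stylesheet" in (attrs.get("rel") or "").lower():
            self.resources.append(urljoin(self.base_url, attrs["href"]))


def subsidiary_files(html_text, base_url):
    """List the images, scripts, frames and stylesheets referenced by html_text."""
    collector = ResourceCollector(base_url)
    collector.feed(html_text)
    return collector.resources
```

Each URL returned by `subsidiary_files` would then be downloaded and given its own name starting with the A-number, for instance `cached_name("A123456", url, suffix="_1")`, and the links inside the cached HTML adjusted to match.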

If you have questions, ask N. J. A. Sloane, David Applegate, Russ Cox, or Charles Greathouse for help.

Comment from Joseph S. Myers, Oct 08 2013: There have also been changes in the other direction.

  • For example, http://www.elsevier.com/about/open-access/open-archives lists 97 Elsevier journals in which older articles (typically more than 48 months old) are open access and could usefully have references in OEIS turned into links.
  • There are plenty of other journals where old references could usefully be turned into links as well (Mathematics of Computation, for example, has a large number of references in OEIS, typically cited as Math. Comp., and its open archive covers articles more than five years old).
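Deciding whether an old reference has come out from behind an embargo is simple date arithmetic. The sketch below assumes the figures quoted in the comment above (48 months for the Elsevier open archives, five years, i.e. 60 months, for Math. Comp.) and counts whole calendar months; individual journals' rules vary, so the publisher's own page should have the final word. The function names are hypothetical.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`.

    A month is counted only once its day-of-month is reached, which errs on
    the conservative side near month ends (Jan 31 -> Feb 28 counts as 0).
    """
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def past_embargo(published, today, embargo_months):
    """True if an article published on `published` is at least `embargo_months` old."""
    return months_between(published, today) >= embargo_months


# Assumed embargo lengths, taken from the comment above.
ELSEVIER_OPEN_ARCHIVE_MONTHS = 48
MATH_COMP_MONTHS = 60
```

For example, on Oct 08 2013 an Elsevier article from Oct 08 2009 would qualify under the 48-month rule, while one from the following day would not yet.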

Comment from Max Alekseyev, Oct 08 2013: There is an on-demand archiving service, WebCite: http://en.wikipedia.org/wiki/WebCite. In fact, Wikipedia actively employs it:

  • http://en.wikipedia.org/wiki/Wikipedia:Using_WebCite

Comment from Brendan McKay, Oct 10 2013:

  • A good fraction of pages on the web can't be adequately archived just by saving the html file initially linked to, and there might not even be one. You have to save images as well, and active elements (that use a program running on the server to compute content) are more or less impossible to archive. The fraction of web pages generated directly from an html file decreases every year.
  • You could instead consider archiving an image of the linked page, rather than its html content. That won't handle web sites with complex structure, but it will permanently record what first appeared when the link was clicked. Presumably there are tools for this.
  • Even without that problem, I frankly can't see this ever getting more than partial coverage, due to the labour of asking permission.
  • As far as I know, permission is not legally required for archiving, and especially not for archiving the page image (otherwise Google would be in billions of troubles).
  • I suggest you instead create an opt-out system and automatically archive external links as they are added.
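The opt-out scheme suggested above could start from a filter like the following, run whenever an entry is edited: it finds the external links the edit added and drops any whose host has opted out. This is a hypothetical sketch: the opt-out host list, the function names, and the assumption that external links appear in the `<a href="...">` form used on this page are all illustrative, and the actual fetching and storing of each page is left out.

```python
import re
from urllib.parse import urlparse

# Hypothetical opt-out list: hosts whose owners asked not to be archived.
OPT_OUT_HOSTS = {"private.example.org"}

# Only absolute http(s) links count as external; relative ones like a123456.html are skipped.
LINK_RE = re.compile(r'<a\s+href="(https?://[^"]+)"', re.IGNORECASE)


def external_links(entry_text):
    """All absolute http(s) link targets in an entry, in order of appearance."""
    return LINK_RE.findall(entry_text)


def opted_out(url, opt_out_hosts):
    """True if the URL's host, or a parent domain of it, is on the opt-out list."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in opt_out_hosts)


def links_to_archive(old_text, new_text, opt_out_hosts=OPT_OUT_HOSTS):
    """External links added by an edit, deduplicated, minus opted-out hosts."""
    before = set(external_links(old_text))
    added = dict.fromkeys(u for u in external_links(new_text) if u not in before)
    return [u for u in added if not opted_out(u, opt_out_hosts)]
```

Comparing the old and new text of an entry keeps the job incremental: only links introduced by the edit are queued, which matches the idea of archiving external links as they are added rather than re-crawling the whole database.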