
URL

From OeisWiki




URLs are used to 'locate' resources, by providing an abstract identification of the resource location. Having located a resource, a system may perform a variety of operations on the resource, as might be characterized by such words as 'access', 'update', 'replace', 'find attributes'. In general, only the 'access' method needs to be specified for any URL scheme.

—T. Berners-Lee, L. Masinter, M. McCahill.[1]

A Uniform Resource Locator (URL) is used to locate a resource on the Internet. More generally, a Uniform Resource Identifier (URI) is used to identify a resource.

Main parts of URLs

A generic URL consists of two main parts[1]

scheme:scheme-specific-part

The scheme (commonly called protocol) consists of a sequence of characters from

  • the lower case letters "a" to "z" (case insensitive),
  • digits "0" to "9",
  • and the characters plus ("+"), period ("."), and hyphen ("-").
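The split into scheme and scheme-specific-part can be sketched in Python as follows. This is only an illustration of the rule above: the function name split_url and the choice to normalize the scheme to lower case are assumptions made for the example, not part of any specification.

```python
import re

# Letters, digits, "+", "." and "-", as listed above; matched case-insensitively.
SCHEME_CHARS = re.compile(r"[a-z0-9+.-]+", re.IGNORECASE)

def split_url(url):
    """Split a URL into (scheme, scheme_specific_part) at the first colon.

    split_url is a hypothetical helper written for this example.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or not SCHEME_CHARS.fullmatch(scheme):
        raise ValueError(f"no valid scheme in {url!r}")
    # Schemes are case insensitive, so report them in lower case.
    return scheme.lower(), rest

print(split_url("HTTP://oeis.org/wiki/URL"))
print(split_url("mailto:someone@example.org"))
```

A string without a colon, or with characters outside the allowed set before the first colon, is rejected with ValueError.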

URL character encoding

The following characters must be URL encoded (i.e. "%" followed by two hexadecimal digits from 0123456789ABCDEF)[1]

  • octets 00-1F and 7F (US-ASCII control characters),
  • octets 80-FF (ANSI characters, not in US-ASCII),
  • " " (space), "<" and ">", """ (quote), "#", "%" (used for encoding), "{", "}", "|", "\", "^", "~", "[", "]", and "`".

The following characters may be reserved for special meaning within the scheme-specific-part. Where a scheme reserves one of them, it must be URL encoded whenever it is used for any other purpose

  • ";", "/", "?", ":", "@", "=" and "&".

Only the following may be used unencoded within a URL

  • alphanumerics,
  • the special characters "$", "-", "_", ".", "+", "!", "*", "'", "(", ")", ",",
  • and reserved characters used for their reserved purposes.
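As a rough illustration of these rules, the sketch below encodes a single URL component with Python's standard urllib.parse.quote. The helper name encode_component is an assumption for the example. Two caveats: quote encodes non-ASCII text as UTF-8 octets by default, and recent Python versions leave "~" unencoded, so the sketch encodes "~" explicitly to match the list above.

```python
from urllib.parse import quote, unquote

# Special characters the list above allows unencoded, besides alphanumerics.
SAFE_SPECIALS = "$-_.+!*'(),"

def encode_component(text):
    """Percent-encode one URL component (hypothetical helper).

    Reserved characters such as "/" and "?" are encoded too, because inside
    a single component they are not being used for their reserved purpose.
    """
    encoded = quote(text, safe=SAFE_SPECIALS)
    # quote() never encodes "~"; the list above says it must be encoded.
    return encoded.replace("~", "%7E")

print(encode_component("a b/c?d"))
print(encode_component("100% {sure}"))
print(unquote(encode_component("100% {sure}")))
```

Decoding with unquote reverses the encoding, so the last line prints the original text.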

URL schemes

Some URL schemes (protocols) are[1]

Scheme  Default port[2]  Description                                                     RFC
file                     Host-specific file names                                        RFC 1738
ftp     21               File Transfer Protocol                                          RFC 959
gopher  70               The Gopher protocol                                             RFC 4266
http    80               Hypertext Transfer Protocol                                     RFC 2616
https   443              Hypertext Transfer Protocol Secure                              RFC 2818
imap    143              Internet Message Access Protocol                                RFC 3501
ldap    389              Lightweight Directory Access Protocol                           RFC 4516
mailto                   Electronic mail address                                         RFC 6068
news                     USENET news                                                     RFC 5538
nfs     2049             Network File System protocol                                    RFC 3530
nntp    119              USENET news using NNTP (Network News Transfer Protocol) access  RFC 5538
pop     110              Post Office Protocol v3                                         RFC 1939
smtp    25               Simple Mail Transfer Protocol                                   RFC 2821
telnet  23               Reference to interactive sessions                               RFC 854
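The default ports in the table can be used to work out which port a URL refers to when none is written explicitly. The following is a minimal sketch, assuming Python's urllib.parse.urlsplit is used for parsing; the names DEFAULT_PORTS and effective_port are made up for the example.

```python
from urllib.parse import urlsplit

# Default ports copied from the table above; schemes without one are omitted.
DEFAULT_PORTS = {
    "ftp": 21, "gopher": 70, "http": 80, "https": 443, "imap": 143,
    "ldap": 389, "nfs": 2049, "nntp": 119, "pop": 110, "smtp": 25,
    "telnet": 23,
}

def effective_port(url):
    """Return the explicit port of a URL, else the scheme's default, else None."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://oeis.org/"))
print(effective_port("telnet://host.example.org:2323/"))
print(effective_port("mailto:someone@example.org"))
```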

Common internet scheme syntax

Common syntax for the scheme-specific data[1]

//user:password@host:port/path

Within the user and password fields, any ":", "@", or "/" must be encoded. Note that the "/" between the host (or port) and the path is NOT part of the path.
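The common syntax can be taken apart with Python's urllib.parse.urlsplit, as in the sketch below. The function name split_common and the example URL are assumptions. Note that urlsplit leaves the user and password percent-encoded and keeps the leading "/" in its path attribute, so the sketch decodes the former and strips the latter to match the description above.

```python
from urllib.parse import urlsplit, unquote

def split_common(url):
    """Return the user, password, host, port and path of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    return {
        # An encoded ":", "@" or "/" in these fields is decoded only after splitting.
        "user": unquote(parts.username) if parts.username is not None else None,
        "password": unquote(parts.password) if parts.password is not None else None,
        "host": parts.hostname,
        "port": parts.port,
        # The "/" after the host (or port) is a separator, not part of the path.
        "path": path[1:] if path.startswith("/") else path,
    }

print(split_common("ftp://alice:p%40ss%2Fw@ftp.example.org:2121/pub/file.txt"))
```

In this example the password decodes to "p@ss/w", which could not have been written unencoded without being mistaken for the field separators.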

FTP

This is the scheme used to designate files and directories on Internet hosts accessible using the FTP protocol (RFC 959), on which operations such as 'access', 'update', 'replace' or 'find attributes' may then be performed.[1]

A user and password may be supplied.

For "anonymous" FTP

  • user is "anonymous"
  • password is the Internet e-mail address of the end user accessing the resource.

The path of an FTP URL has the form

cwd1/cwd2/.../cwdN/name;type=typecode

where cwd1 through cwdN and name are (possibly encoded) strings and typecode is one of the characters "a", "i", or "d".
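The structure of such a path can be sketched in Python as follows. The helper name parse_ftp_path is an assumption, and the path passed in is expected to already lack the "/" that separates it from the host. Segments are split first and decoded afterwards, so that an encoded "/" inside a segment is not mistaken for a separator.

```python
from urllib.parse import unquote

def parse_ftp_path(path):
    """Split 'cwd1/.../cwdN/name;type=typecode' into (cwds, name, typecode).

    Hypothetical helper; typecode is None when the ';type=' part is absent.
    """
    body, sep, typecode = path.partition(";type=")
    if sep and typecode not in ("a", "i", "d"):
        raise ValueError(f"unknown typecode {typecode!r}")
    # Everything before the last "/" is a directory; the rest is the name.
    *cwds, name = body.split("/")
    return [unquote(c) for c in cwds], unquote(name), (typecode if sep else None)

print(parse_ftp_path("pub/docs/read%20me.txt;type=a"))
print(parse_ftp_path("file.bin"))
```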

HTTP or HTTPS

The Hypertext Transfer Protocol (and Hypertext Transfer Protocol Secure) schemes (protocols) are used to access the World Wide Web, as specified in RFC 2616 and RFC 2818, respectively.[1]

An HTTP or HTTPS URL has the form

scheme://server.domain:port/path?query_string#fragment_id

An HTTP or HTTPS URL consists of the following parts, in order (some of them are optional)

  • the scheme (commonly called protocol) name, either http or https, followed by a colon (:),
  • two slashes (//), then either
    • a server name (e.g. www, ...) followed by a dot (.) then a domain name, or
    • the IP address of the server,
  • optionally: a colon (:) followed by a port number,
  • the path of the resource to be fetched or the program to be run, then,
  • for programs such as Common Gateway Interface (CGI) scripts, a question mark (?) followed by a query_string (search_part), and
  • optionally: a hash sign (#) followed by a fragment_id (anchor_id).

Within the path, query_string (search_part) and fragment_id (anchor_id) components, "/", ";", "?" and "#" are reserved.
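The components listed above map directly onto the fields returned by Python's urllib.parse.urlsplit. The sketch below is an illustration only: the function name http_components and the example URL are assumptions, and parse_qs is used to show one common way of interpreting a query_string as name=value pairs, which the URL syntax itself does not require.

```python
from urllib.parse import urlsplit, parse_qs

def http_components(url):
    """Return the parts of an HTTP or HTTPS URL as a dict (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP or HTTPS URL: {url!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,          # None when no port is written in the URL
        "path": parts.path,          # urlsplit keeps the leading "/"
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }

print(http_components(
    "https://www.example.org:8443/cgi-bin/search?q=url&lang=en#results"
))
```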

GOPHER

This is a scheme supporting items and collections of items (directories), as specified in RFC 1436 (base GOPHER protocol).[1]

A GOPHER URL has the form

gopher://host:port/gopher-path

MAILTO

This is a scheme used to designate the Internet mailing address of an individual or service.[1]

A MAILTO URL has the form

mailto:rfc822-addr-spec

where rfc822-addr-spec is (the encoding of an) addr-spec, as specified in RFC 822.
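A minimal Python sketch of reading such a URL is given below. It handles only the form shown above; the helper name mailto_address is an assumption, and the split at the last "@" relies on an addr-spec having the shape local-part@domain.

```python
from urllib.parse import unquote

def mailto_address(url):
    """Return (local_part, domain) from a 'mailto:addr-spec' URL (hypothetical helper)."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "mailto":
        raise ValueError(f"not a mailto URL: {url!r}")
    # Decode any percent-encoding, then split at the last "@".
    local_part, at, domain = unquote(rest).rpartition("@")
    if not at or not local_part or not domain:
        raise ValueError(f"no addr-spec in {url!r}")
    return local_part, domain

print(mailto_address("mailto:jane.doe@example.org"))
```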

NEWS

This is a scheme used to refer to either news groups or individual articles of USENET news, as specified in RFC 1036.[1]

A NEWS URL has one of two forms

news:newsgroup-name
news:message-id

A newsgroup-name is a period-delimited hierarchical name, such as "comp.infosystems.www.misc".
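The two forms can be told apart mechanically. The sketch below relies on a rule taken from RFC 1738 rather than from the text above: a message-id has the shape unique@full_domain_name, so the presence of "@" distinguishes it from a newsgroup-name. The function name classify_news_url and the example message-id are assumptions.

```python
def classify_news_url(url):
    """Return ('newsgroup-name' or 'message-id', value) for a news URL.

    Hypothetical helper; assumes the RFC 1738 rule that only a
    message-id contains the "@" character.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "news" or not rest:
        raise ValueError(f"not a news URL: {url!r}")
    kind = "message-id" if "@" in rest else "newsgroup-name"
    return kind, rest

kind, name = classify_news_url("news:comp.infosystems.www.misc")
print(kind, name.split("."))   # the periods delimit the hierarchy levels
print(classify_news_url("news:12345@news.example.org"))
```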

NNTP

This is an alternative scheme to reference news articles from NNTP servers, as specified in RFC 977.[1]

An NNTP URL has the form

nntp://host:port/newsgroup-name/article-number
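An NNTP URL of this form can be split into its four parts as in the following sketch, which assumes Python's urllib.parse.urlsplit for the host and port. The function name parse_nntp_url and the example host are assumptions.

```python
from urllib.parse import urlsplit

def parse_nntp_url(url):
    """Return (host, port, newsgroup_name, article_number) (hypothetical helper).

    port is None when the URL does not give one explicitly.
    """
    parts = urlsplit(url)
    if parts.scheme != "nntp":
        raise ValueError(f"not an NNTP URL: {url!r}")
    # Drop the separator "/" after the host, then split group from article.
    group, _, number = parts.path.lstrip("/").partition("/")
    if not group or not number.isdigit():
        raise ValueError(f"expected newsgroup-name/article-number in {url!r}")
    return parts.hostname, parts.port, group, int(number)

print(parse_nntp_url("nntp://news.example.org:119/comp.infosystems.www.misc/1234"))
```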

TELNET

This is a scheme used to designate interactive services.[1]

A TELNET URL has the form

telnet://user:password@host:port/

FILE

This is a scheme used to designate files accessible on a particular host computer.[1]

A FILE URL has the form

file://host/path

As a special case, host can be the string "localhost" or the empty string; this is interpreted as the machine from which the URL is being interpreted.
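The special case for the host can be checked as in the sketch below, which again assumes Python's urllib.parse.urlsplit. The function name file_url_target and the example host names are assumptions. The path is returned as urlsplit reports it, that is, with its leading "/".

```python
from urllib.parse import urlsplit, unquote

def file_url_target(url):
    """Return (is_local, path) for a file URL (hypothetical helper).

    is_local is True when the host is "localhost" or the empty string.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    host = parts.hostname or ""   # hostname is None when the host is empty
    return host in ("", "localhost"), unquote(parts.path)

print(file_url_target("file://localhost/etc/hosts"))
print(file_url_target("file:///etc/hosts"))
print(file_url_target("file://fileserver.example.org/pub/read%20me.txt"))
```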

See also

  • {{URL}} OEIS Wiki utility template

Notes

External links