URL
URLs are used to 'locate' resources, by providing an abstract identification of the resource location. Having located a resource, a system may perform a variety of operations on the resource, as might be characterized by such words as 'access', 'update', 'replace', 'find attributes'. In general, only the 'access' method needs to be specified for any URL scheme.
- —T. Berners-Lee, L. Masinter, M. McCahill.[1]
A Uniform Resource Locator (URL), which is a kind of Uniform Resource Identifier (URI), is used to locate (identify) a resource on the Internet.
Main parts of URLs
A generic URL consists of two main parts:[1]
- scheme:scheme-specific-part
The scheme (commonly called protocol) consists of a sequence of characters from
- the lower case letters "a" to "z" (case insensitive),
- digits "0" to "9",
- and the characters plus ("+"), period ("."), and hyphen ("-").
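As a minimal sketch of the rule above (the names `SCHEME_RE` and `split_url` are illustrative and do not come from RFC 1738), the scheme can be separated from the scheme-specific part at the first colon and checked against the permitted characters:

```python
import re

# Letters, digits, "+", "." and "-", as listed above; matching is case insensitive.
SCHEME_RE = re.compile(r"[a-z0-9+.\-]+", re.IGNORECASE)

def split_url(url):
    """Split a URL at the first colon into (scheme, scheme_specific_part)."""
    scheme, sep, rest = url.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"not a valid scheme: {scheme!r}")
    # Normalise to lower case, since scheme names are case insensitive.
    return scheme.lower(), rest

print(split_url("HTTP://oeis.org/wiki/URL"))    # ('http', '//oeis.org/wiki/URL')
print(split_url("mailto:someone@example.org"))  # ('mailto', 'someone@example.org')
```

The sketch only checks the scheme characters; it makes no attempt to validate the scheme-specific part.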
URL character encoding
The following characters must be URL encoded (i.e. "%" followed by two hexadecimal digits from 0123456789ABCDEF)[1]
- octets 00-1F and 7F (US-ASCII control characters),
- octets 80-FF (ANSI characters, not in US-ASCII),
- " " (space), "<" and ">", """ (quote), "#", "%" (used for encoding), "{", "}", "|", "\", "^", "~", "[", "]", and "`".
The following characters may be reserved for a special meaning within the scheme-specific-part; when used for any other purpose, they must be URL encoded
- ";", "/", "?", ":", "@", "=" and "&".
Only the following may be used unencoded within a URL
- alphanumerics,
- the special characters "$", "-", "_", ".", "+", "!", "*", "'", "(", ")", ",",
- and reserved characters used for their reserved purposes.
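The encoding rules above can be sketched as a small function. This is an illustration under stated assumptions, not a reference implementation: the names `UNRESERVED`, `url_encode` and the `keep` parameter are hypothetical, and the use of UTF-8 to turn non-ASCII text into octets is an assumption made here, not something the rules above prescribe.

```python
# Characters that may appear unencoded: alphanumerics plus the listed specials.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "$-_.+!*'(),"
)

def url_encode(text, keep=""):
    """Percent-encode every octet that is not unreserved.

    `keep` lists reserved characters (such as "/") that are being used
    for their reserved purpose and should therefore stay as they are.
    """
    out = []
    for octet in text.encode("utf-8"):  # assumption: UTF-8 for non-ASCII text
        ch = chr(octet)
        if octet < 0x80 and (ch in UNRESERVED or ch in keep):
            out.append(ch)
        else:
            out.append(f"%{octet:02X}")  # "%" followed by two hex digits
    return "".join(out)

print(url_encode("a b&c"))                    # a%20b%26c
print(url_encode("dir/file name", keep="/"))  # dir/file%20name
```

In everyday Python code, `urllib.parse.quote` from the standard library performs a similar job, though its default set of safe characters differs from the list above.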
URL schemes
Some URL schemes (protocols) are[1]
| Scheme | Default port[2] | Description | RFC |
|--------|-----------------|-------------|-----|
| file | | Host-specific file names | RFC 1738 |
| ftp | 21 | File Transfer Protocol | RFC 959 |
| gopher | 70 | The Gopher protocol | RFC 4266 |
| http | 80 | Hypertext Transfer Protocol | RFC 2616 |
| https | 443 | Hypertext Transfer Protocol Secure | RFC 2818 |
| imap | 143 | Internet Message Access Protocol | RFC 3501 |
| ldap | 389 | Lightweight Directory Access Protocol | RFC 4516 |
| mailto | | Electronic mail address | RFC 6068 |
| news | | USENET news | RFC 5538 |
| nfs | 2049 | Network File System protocol | RFC 3530 |
| nntp | 119 | USENET news using NNTP (Network News Transfer Protocol) access | RFC 5538 |
| pop | 110 | Post Office Protocol v3 | RFC 1939 |
| smtp | 25 | Simple Mail Transfer Protocol | RFC 2821 |
| telnet | 23 | Reference to interactive sessions | RFC 854 |
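One way a program might use the default ports listed above is to fall back to them when a URL gives no explicit port. The following is a minimal sketch; the names `DEFAULT_PORTS` and `effective_port` are hypothetical, and the port numbers are simply those given above.

```python
# Default ports for the schemes listed above that have one.
DEFAULT_PORTS = {
    "ftp": 21, "gopher": 70, "http": 80, "https": 443, "imap": 143,
    "ldap": 389, "nfs": 2049, "nntp": 119, "pop": 110, "smtp": 25,
    "telnet": 23,
}

def effective_port(scheme, explicit_port=None):
    """Return the explicit port if given, else the scheme's default (or None)."""
    if explicit_port is not None:
        return explicit_port
    return DEFAULT_PORTS.get(scheme.lower())

print(effective_port("HTTP"))         # 80
print(effective_port("https", 8443))  # 8443
print(effective_port("mailto"))       # None
```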
Common internet scheme syntax
Common syntax for the scheme-specific data[1]
- //user:password@host:port/path
Within the user and password field, any ":", "@", or "/" must be encoded. Note that the "/" between the host (or port) and the path is NOT part of the path.
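The common syntax can be taken apart with Python's standard `urllib.parse.urlsplit`. This is a sketch only: the helper name `split_common` and the example URL are made up for illustration. Note that `urlsplit` reports the path with its leading "/" and leaves the user and password fields percent-encoded, so the helper strips the former and decodes the latter.

```python
from urllib.parse import urlsplit, unquote

def split_common(url):
    """Split //user:password@host:port/path into its decoded fields."""
    parts = urlsplit(url)
    user, password = parts.username, parts.password
    return {
        "user": unquote(user) if user is not None else None,
        "password": unquote(password) if password is not None else None,
        "host": parts.hostname,
        "port": parts.port,
        # The "/" after the host (or port) is a separator, not part of the path.
        "path": parts.path[1:] if parts.path.startswith("/") else parts.path,
    }

# "%40" is the encoding of "@", which must not appear raw in the password.
print(split_common("ftp://alice:s%40cret@ftp.example.org:2121/pub/readme.txt"))
```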
FTP
This is the scheme used to designate files and directories on Internet hosts that are accessible using the FTP protocol (RFC 959), so that they can be operated on (e.g. 'access', 'update', 'replace', 'find attributes').[1]
A user and password may be supplied.
For "anonymous" FTP
- user is "anonymous"
- password is the Internet e-mail address of the end user accessing the resource.
The path of an FTP URL has the form
- cwd1/cwd2/.../cwdN/name;type=typecode
where cwd1 through cwdN and name are (possibly encoded) strings and typecode is one of the characters "a", "i", or "d".
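To make the FTP path form concrete, here is a minimal sketch that splits such a path into its directory components, file name and type code. The function name `parse_ftp_path` and the sample paths are hypothetical, and the sketch accepts only the three type codes named above.

```python
from urllib.parse import unquote

def parse_ftp_path(path):
    """Split cwd1/.../cwdN/name;type=typecode into (cwds, name, typecode)."""
    path, sep, param = path.partition(";type=")
    typecode = param if sep else None  # the ;type= part may be absent
    if typecode is not None and typecode not in ("a", "i", "d"):
        raise ValueError(f"unknown typecode: {typecode!r}")
    # Split on "/" first, then decode each component on its own.
    *cwds, name = path.split("/")
    return [unquote(c) for c in cwds], unquote(name), typecode

print(parse_ftp_path("pub/docs/read%20me.txt;type=a"))
# (['pub', 'docs'], 'read me.txt', 'a')
print(parse_ftp_path("file.bin"))
# ([], 'file.bin', None)
```

Splitting before decoding matters: an encoded "/" (%2F) inside a component must not be mistaken for a separator.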
HTTP or HTTPS
The Hypertext Transfer Protocol (and Hypertext Transfer Protocol Secure) schemes (protocols) are used to access the World Wide Web, as specified in RFC 2616 and RFC 2818, respectively.[1]
An HTTP or HTTPS URL has the form
- scheme://server.domain:port/path?query_string#fragment_id
An HTTP or HTTPS URL consists of the following parts, in this order, some of which are optional
- the scheme (commonly called protocol) name, either http or https, followed by a colon (:),
- two slashes (//),
- the host, given either as a server name (e.g. www) followed by a dot (.) and a domain name, or as an IP address,
- optionally: a colon (:) followed by a port number,
- the path of the resource to be fetched or the program to be run,
- optionally, e.g. for programs such as Common Gateway Interface (CGI) scripts: a question mark (?) followed by a query_string (search_part), and
- optionally: a hash sign (#) followed by a fragment_id (anchor_id).
Within the path, query_string (search_part) and fragment_id (anchor_id) components, "/", ";", "?" and "#" are reserved.
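The components of an HTTP or HTTPS URL can be inspected with Python's standard `urllib.parse` module. The URL below is made up for illustration, and `split_http` is a hypothetical helper name; this is a sketch of one way to read the parts, not a validator.

```python
from urllib.parse import urlsplit, parse_qs

def split_http(url):
    """Return the components of an http/https URL as a dictionary."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("not an HTTP or HTTPS URL")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,            # None when no port is given
        "path": parts.path,
        "query": parse_qs(parts.query),  # name -> list of values
        "fragment": parts.fragment,
    }

print(split_http(
    "https://www.example.org:8443/cgi-bin/search?q=fibonacci&fmt=short#results"
))
```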
GOPHER
This is a scheme supporting items and collections of items (directories), as specified in RFC 1436 (base GOPHER protocol).[1]
A GOPHER URL has the form
- gopher://host:port/gopher-path
MAILTO
This is a scheme used to designate the Internet mailing address of an individual or service.[1]
A MAILTO URL has the form
- mailto:rfc822-addr-spec
where rfc822-addr-spec is (the encoding of an) addr-spec, as specified in RFC 822.
NEWS
This is a scheme used to refer to either news groups or individual articles of USENET news, as specified in RFC 1036.[1]
A NEWS URL has one of two forms
- news:newsgroup-name
- news:message-id
A newsgroup-name is a period-delimited hierarchical name, such as "comp.infosystems.www.misc".
NNTP
This is an alternative scheme used to reference news articles from NNTP servers, as specified in RFC 977.[1]
An NNTP URL has the form
- nntp://host:port/newsgroup-name/article-number
TELNET
This is a scheme used to designate interactive services.[1]
A TELNET URL has the form
- telnet://user:password@host:port/
FILE
This is a scheme used to designate files accessible on a particular host computer.[1]
A FILE URL has the form
- file://host/path
As a special case, host can be the string "localhost" or the empty string; this is interpreted as the machine from which the URL is being interpreted.
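The special case for the host can be sketched as follows. The helper name `parse_file_url` and the example URLs are illustrative only; the sketch treats an empty host or "localhost" as the local machine and, as `urlsplit` does, returns the path with its leading "/".

```python
from urllib.parse import urlsplit, unquote

def parse_file_url(url):
    """Return (host, path, is_local) for a file://host/path URL."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL")
    host = parts.netloc.lower()
    # An empty host or "localhost" means the machine interpreting the URL.
    is_local = host in ("", "localhost")
    return host, unquote(parts.path), is_local

print(parse_file_url("file://localhost/etc/hosts"))  # ('localhost', '/etc/hosts', True)
print(parse_file_url("file:///etc/hosts"))           # ('', '/etc/hosts', True)
print(parse_file_url("file://fileserver.example.org/pub/a%20b.txt"))
# ('fileserver.example.org', '/pub/a b.txt', False)
```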
See also
- {{URL}} OEIS Wiki utility template
Notes
External links
- Internet Corporation for Assigned Names and Numbers, IANA.
- IANA, Uniform Resource Identifier (URI) Schemes, © 2006.
- IANA, Service Name and Transport Protocol Port Number Registry.