Merge pull request #201 from RusticiSoftware/master
Added more supporting methods for URN paths.
rodneyrehm committed Mar 31, 2015
2 parents 3e44dcf + 158011d commit 55d5a98
Showing 6 changed files with 173 additions and 35 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -158,6 +158,9 @@ Documents specifying how URLs work:
* [RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax](http://tools.ietf.org/html/rfc3986)
* [RFC 3987 - Internationalized Resource Identifiers (IRI)](http://tools.ietf.org/html/rfc3987)
* [RFC 2732 - Format for Literal IPv6 Addresses in URL's](http://tools.ietf.org/html/rfc2732)
* [RFC 2368 - The `mailto:` URL Scheme](https://www.ietf.org/rfc/rfc2368.txt)
* [RFC 2141 - URN Syntax](https://www.ietf.org/rfc/rfc2141.txt)
* [IANA URN Namespace Registry](http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml)
* [Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)](http://tools.ietf.org/html/rfc3492)
* [application/x-www-form-urlencoded](http://www.w3.org/TR/REC-html40/interact/forms.html#form-content-type) (Query String Parameters) and [application/x-www-form-urlencoded encoding algorithm](http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#application/x-www-form-urlencoded-encoding-algorithm)
* [What every web developer must know about URL encoding](http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding)
@@ -243,6 +246,13 @@ URI.js is published under the [MIT license](http://www.opensource.org/licenses/m

### master (will become 1.15.0)

* fixed [`.pathname()`](http://medialize.github.io/URI.js/docs.html#accessors-pathname) to properly en/decode URN paths - ([Issue #201](https://github.com/medialize/URI.js/pull/201), [mlefoster](https://github.com/mlefoster))
* fixing URI normalization to properly handle URN paths based on [RFC 2141](https://www.ietf.org/rfc/rfc2141.txt) syntax - ([Issue #201](https://github.com/medialize/URI.js/pull/201), [mlefoster](https://github.com/mlefoster))
* fixed [`.normalize()`](http://medialize.github.io/URI.js/docs.html#normalize) and [`.normalizePath()`](http://medialize.github.io/URI.js/docs.html#normalize-path) to properly normalize URN paths
* added `URI.encodeUrnPathSegment()`
* added `URI.decodeUrnPathSegment()`
* added `URI.decodeUrnPath()`
* added `URI.recodeUrnPath()`
* fixing `URI(undefined)` to throw TypeError - ([Issue #189](https://github.com/medialize/URI.js/issues/189)) - tiny backward-compatibility-break
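
As a rough illustration of what the new URN path helpers do, the following standalone sketch re-implements the idea without loading URI.js. The names `encodeUrnSegment` and `recodeUrnPath` are hypothetical stand-ins rather than the library's own functions, and the sketch assumes `encodeURIComponent` and `decodeURIComponent` as the underlying codec.

```javascript
// Hypothetical stand-ins for illustration only; this is not the URI.js source.

// Escapes that are turned back into literals inside a URN path segment.
// The colon (%3A) is deliberately absent so that it stays encoded within a segment.
var URN_LITERALS = {
  '%21': '!', '%24': '$', '%27': '\'', '%28': '(', '%29': ')', '%2A': '*',
  '%2B': '+', '%2C': ',', '%3B': ';', '%3D': '=', '%40': '@'
};

// Encode one segment, then restore the characters listed above.
function encodeUrnSegment(segment) {
  return encodeURIComponent(segment).replace(
    /%(21|24|27|28|29|2A|2B|2C|3B|3D|40)/gi,
    function (match) {
      return URN_LITERALS[match.toUpperCase()];
    }
  );
}

// Split on the colon, decode and re-encode every segment, and join again.
// Note: decodeURIComponent throws on malformed input; this sketch does not guard against that.
function recodeUrnPath(path) {
  return String(path).split(':').map(function (segment) {
    return encodeUrnSegment(decodeURIComponent(segment));
  }).join(':');
}

console.log(recodeUrnPath('games:cards:Magic%3A the Gathering'));
// prints "games:cards:Magic%3A%20the%20Gathering"
```

The encoded colon inside the last segment survives the round trip, which is the property the "path separator preserved" test in this commit checks for.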

### 1.14.2 (February 25th 2015) ###
38 changes: 34 additions & 4 deletions about-uris.html
@@ -50,9 +50,39 @@ <h2>Understanding URIs</h2>
</p>

<p>
URLs are used to address the individual resources of your website.
URNs are usually used for hooking into other applications, as <code>mailto:</code>, <code>magnet:</code> or <code>spotify:</code> suggest.
While RFC 3986 defines the structure of an URL in depth, URNs are not. The structure (and meaning) of URNs are up to their distinct specifications.
URNs <em>name</em> a resource.
They are (supposed to) designate a globally unique, permanent identifier for that resource.
For example, the URN <code>urn:isbn:0201896834</code> uniquely identifies Volume 1 of Donald Knuth's <em>The Art of Computer Programming</em>.
Even if that book goes out of print, that URN will continue to identify that particular book in that particular printing.
While the term &quot;URN&quot; <em>technically</em> refers to a specific URI scheme laid out by <a href="http://tools.ietf.org/html/rfc2141">RFC 2141</a>,
the previously-mentioned RFC 3986 notes that the term &quot;URN&quot; is also commonly used for any other URI that has the properties of a name.
</p>

<p>
URLs <em>locate</em> a resource.
They designate a protocol to use when looking up the resource and provide an &quot;address&quot; for finding the resource within that scheme.
For example, the URL <code><a href="http://tools.ietf.org/html/rfc3986">http://tools.ietf.org/html/rfc3986</a></code> tells the consumer (most likely a web browser)
to use the HTTP protocol to access whatever site is found at the <code>/html/rfc3986</code> path of <code>tools.ietf.org</code>.
URLs are not permanent; in the future, the IETF may move to a different domain, or some other organization may even acquire the rights to <code>tools.ietf.org</code>.
It is also possible that multiple URLs may locate the same resource;
for example, an admin at the IETF might be able to access the document found at the example URL via the <code>ftp://</code> protocol.
</p>

<h2>URLs and URNs in URI.js</h2>

<p>
The distinction between URLs and URNs is one of <strong>semantics</strong>.
In principle, it is impossible to tell, on a purely syntactical level, whether a given URI is a URN or a URL without knowing more about its scheme.
Practically speaking, however, URIs that look like HTTP URLs (scheme is followed by a colon and two slashes, URI has an authority component, and paths are delimited by slashes) tend to be URLs,
and URIs that look like RFC 2141 URNs (scheme is followed by a colon, no authority component, and paths are delimited by colons) tend to be URNs (in the broad sense of &quot;URIs that name&quot;).
</p>

<p>
So, for the purposes of URI.js, the distinction between URLs and URNs is treated as one of <strong>syntax</strong>.
The main functional differences between the two are that (1) URNs will not have an authority element and
(2) when breaking the path of the URI into segments, the colon will be used as the delimiter rather than the slash.
The most surprising result of this is that <code>mailto:</code> URLs will be considered by URI.js to be URNs rather than URLs.
That said, the functional differences will not adversely impact the handling of those URLs.
</p>
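
The second functional difference can be sketched in a few lines of plain JavaScript. This is only an illustration under assumptions: `splitPath` and its `looksLikeUrn` flag are hypothetical names, not part of the URI.js API.

```javascript
// Hypothetical helper for illustration only; not part of URI.js.
// Splits a path into segments, using ":" for URN-like URIs and "/" otherwise.
function splitPath(path, looksLikeUrn) {
  var separator = looksLikeUrn ? ':' : '/';
  // Drop one leading slash so that "/html/rfc3986" does not yield an empty first segment.
  if (!looksLikeUrn && path.charAt(0) === '/') {
    path = path.substring(1);
  }
  return path.split(separator);
}

console.log(splitPath('/html/rfc3986', false));
// prints [ 'html', 'rfc3986' ]
console.log(splitPath('uuid:c5542ab6-3d96-403e-8e6b-b8bb52f48d9a', true));
// prints [ 'uuid', 'c5542ab6-3d96-403e-8e6b-b8bb52f48d9a' ]
console.log(splitPath('hello@example.org', true));
// prints [ 'hello@example.org' ]
```

The last call shows why treating <code>mailto:</code> as a URN is harmless in practice: an address without colons is simply a single segment.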

<h2 id="components">Components of an URI</h2>
@@ -108,7 +138,7 @@ <h3 id="components-urn">Components of an <abbr title="Uniform Resource Name">URN
</span> <a href="docs.html#accessors-protocol">scheme</a> <a href="docs.html#accessors-pathname">path</a> &amp; <a href="docs.html#accessors-segment">segment</a> <a href="docs.html#accessors-search">query</a> <a href="docs.html#accessors-hash">fragment</a>
</pre>

<p>While <a href="http://tools.ietf.org/html/rfc3986">RFC 3986</a> does not define URNs having a query or fragment component, URI.js enables these accessors for convenience.</p>
<p>While <a href="http://tools.ietf.org/html/rfc2141">RFC 2141</a> does not define URNs having a query or fragment component, URI.js enables these accessors for convenience.</p>

<h2 id="problems">URLs - Man Made Problems</h2>

2 changes: 1 addition & 1 deletion docs.html
@@ -579,7 +579,7 @@ <h3 id="is">is()</h3>
<dl>
<dt><code>relative</code></dt><dd><code>true</code> if URL doesn't have a hostname</dd>
<dt><code>absolute</code></dt><dd><code>true</code> if URL has a hostname</dd>
<dt><code>urn</code></dt><dd><code>true</code> if URI is a URN</dd>
<dt><code>urn</code></dt><dd><code>true</code> if URI looks like a URN</dd>
<dt><code>url</code></dt><dd><code>true</code> if URI is a URL</dd>
<dt><code>domain</code>, <code>name</code></dt><dd><code>true</code> if hostname is not an IP</dd>
<dt><code>sld</code></dt><dd><code>true</code> if hostname is a second level domain (i.e. "example.co.uk")</dd>
8 changes: 4 additions & 4 deletions index.html
@@ -146,10 +146,10 @@ <h2>Examples</h2>
// required src/URI.fragmentURI.js to be loaded</pre>

<p>How do you like parsing URNs?</p>
<pre class="prettyprint lang-js">var uri = URI("mailto:hello@example.org?subject=hello");
uri.protocol() == "mailto";
uri.path() == "hello@example.org";
uri.query() == "subject=hello";</pre>
<pre class="prettyprint lang-js">var uri = URI("urn:uuid:c5542ab6-3d96-403e-8e6b-b8bb52f48d9a?query=string");
uri.protocol() == "urn";
uri.path() == "uuid:c5542ab6-3d96-403e-8e6b-b8bb52f48d9a";
uri.query() == "query=string";</pre>

<p>How do you like URI Templating?</p>
<pre class="prettyprint lang-js">URI.expand("/foo/{dir}/{file}", {
121 changes: 95 additions & 26 deletions src/URI.js
@@ -324,6 +324,42 @@
'%3D': '='
}
}
},
urnpath: {
// The characters under `encode` are the characters called out by RFC 2141 as being acceptable
// for usage in a URN. RFC 2141 also calls out "-", ".", and "_" as acceptable characters, but
// these aren't encoded by encodeURIComponent, so we don't have to call them out here. Also
// note that the colon character is not featured in the encoding map; this is because URI.js
// gives the colons in URNs semantic meaning as the delimiters of path segments, and so it
// should not appear unencoded in a segment itself.
// See also the note above about RFC 3986 and capitalized hex digits.
encode: {
expression: /%(21|24|27|28|29|2A|2B|2C|3B|3D|40)/ig,
map: {
'%21': '!',
'%24': '$',
'%27': '\'',
'%28': '(',
'%29': ')',
'%2A': '*',
'%2B': '+',
'%2C': ',',
'%3B': ';',
'%3D': '=',
'%40': '@'
}
},
// These characters are the characters called out by RFC2141 as "reserved" characters that
// should never appear in a URN, plus the colon character (see note above).
decode: {
expression: /[\/\?#:]/g,
map: {
'/': '%2F',
'?': '%3F',
'#': '%23',
':': '%3A'
}
}
}
};
URI.encodeQuery = function(string, escapeQuerySpace) {
@@ -350,22 +386,6 @@
return string;
}
};
URI.recodePath = function(string) {
var segments = (string + '').split('/');
for (var i = 0, length = segments.length; i < length; i++) {
segments[i] = URI.encodePathSegment(URI.decode(segments[i]));
}

return segments.join('/');
};
URI.decodePath = function(string) {
var segments = (string + '').split('/');
for (var i = 0, length = segments.length; i < length; i++) {
segments[i] = URI.decodePathSegment(segments[i]);
}

return segments.join('/');
};
// generate encode/decode path functions
var _parts = {'encode':'encode', 'decode':'decode'};
var _part;
@@ -387,8 +407,40 @@

for (_part in _parts) {
URI[_part + 'PathSegment'] = generateAccessor('pathname', _parts[_part]);
URI[_part + 'UrnPathSegment'] = generateAccessor('urnpath', _parts[_part]);
}

var generateSegmentedPathFunction = function(_sep, _codingFuncName, _innerCodingFuncName) {
return function(string) {
// Why pass in names of functions, rather than the function objects themselves? The
// definitions of some functions (but in particular, URI.decode) will occasionally change due
// to URI.js having ISO8859 and Unicode modes. Passing in the name and getting it will ensure
// that the functions we use here are "fresh".
var actualCodingFunc;
if (!_innerCodingFuncName) {
actualCodingFunc = URI[_codingFuncName];
} else {
actualCodingFunc = function(string) {
return URI[_codingFuncName](URI[_innerCodingFuncName](string));
};
}

var segments = (string + '').split(_sep);

for (var i = 0, length = segments.length; i < length; i++) {
segments[i] = actualCodingFunc(segments[i]);
}

return segments.join(_sep);
};
};

// This takes place outside the above loop because we don't want, e.g., encodeUrnPath functions.
URI.decodePath = generateSegmentedPathFunction('/', 'decodePathSegment');
URI.decodeUrnPath = generateSegmentedPathFunction(':', 'decodeUrnPathSegment');
URI.recodePath = generateSegmentedPathFunction('/', 'encodePathSegment', 'decode');
URI.recodeUrnPath = generateSegmentedPathFunction(':', 'encodeUrnPathSegment', 'decode');
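
The comment in `generateSegmentedPathFunction` about passing function names rather than function objects can be illustrated with a small standalone sketch. Everything here (`codec`, `bindByReference`, `bindByName`) is a hypothetical name chosen for the example; it only demonstrates the late-binding idea and assumes nothing about URI.js internals beyond what the comment says.

```javascript
// Hypothetical names for illustration only; not part of URI.js.
var codec = {
  decode: decodeURIComponent
};

// Captures the function object at creation time.
function bindByReference(fn) {
  return function (string) {
    return fn(string);
  };
}

// Looks the function up by name on every call, so later swaps are picked up.
function bindByName(name) {
  return function (string) {
    return codec[name](string);
  };
}

var early = bindByReference(codec.decode);
var late = bindByName('decode');

// Simulate a mode switch that replaces the decoder.
codec.decode = function (string) {
  return 'swapped:' + string;
};

console.log(early('%20')); // prints " " (still the original decoder)
console.log(late('%20'));  // prints "swapped:%20" (sees the replacement)
```

Looking the function up by name costs one property access per call, but it means a wrapper created once at load time never holds a stale decoder.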

URI.encodeReserved = generateAccessor('reserved', 'encode');

URI.parse = function(string, parts) {
@@ -946,9 +998,13 @@
p.pathname = function(v, build) {
if (v === undefined || v === true) {
var res = this._parts.path || (this._parts.hostname ? '/' : '');
return v ? URI.decodePath(res) : res;
return v ? (this._parts.urn ? URI.decodeUrnPath : URI.decodePath)(res) : res;
} else {
this._parts.path = v ? URI.recodePath(v) : '/';
if (this._parts.urn) {
this._parts.path = v ? URI.recodeUrnPath(v) : '';
} else {
this._parts.path = v ? URI.recodePath(v) : '/';
}
this.build(!build);
return this;
}
@@ -1624,6 +1680,7 @@
if (this._parts.urn) {
return this
.normalizeProtocol(false)
.normalizePath(false)
.normalizeQuery(false)
.normalizeFragment(false)
.build();
@@ -1670,16 +1727,22 @@
return this;
};
p.normalizePath = function(build) {
var _path = this._parts.path;
if (!_path) {
return this;
}

if (this._parts.urn) {
this._parts.path = URI.recodeUrnPath(this._parts.path);
this.build(!build);
return this;
}

if (!this._parts.path || this._parts.path === '/') {
if (this._parts.path === '/') {
return this;
}

var _was_relative;
var _path = this._parts.path;
var _leadingParents = '';
var _parent, _pos;

@@ -1763,9 +1826,12 @@

URI.encode = escape;
URI.decode = decodeURIComponent;
this.normalize();
URI.encode = e;
URI.decode = d;
try {
this.normalize();
} finally {
URI.encode = e;
URI.decode = d;
}
return this;
};

@@ -1776,9 +1842,12 @@

URI.encode = strictEncodeURIComponent;
URI.decode = unescape;
this.normalize();
URI.encode = e;
URI.decode = d;
try {
this.normalize();
} finally {
URI.encode = e;
URI.decode = d;
}
return this;
};
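
The `try`/`finally` change in the two hunks above follows a general pattern: swap a shared setting, do the work, and restore the setting even if the work throws. A minimal sketch of that pattern, using the hypothetical names `settings` and `withTemporaryEncoder` rather than anything from URI.js, might look like this:

```javascript
// Hypothetical names for illustration only; not part of URI.js.
var settings = {
  encode: encodeURIComponent
};

// Runs `work` with a temporary encoder and always restores the previous one.
function withTemporaryEncoder(temporary, work) {
  var previous = settings.encode;
  settings.encode = temporary;
  try {
    return work();
  } finally {
    // Executed on normal return and on exceptions alike.
    settings.encode = previous;
  }
}

try {
  withTemporaryEncoder(function (s) { return 'tmp:' + s; }, function () {
    throw new Error('normalization failed');
  });
} catch (e) {
  // The error still propagates to the caller.
}

console.log(settings.encode === encodeURIComponent); // prints true
```

Without the `finally` block, an exception thrown during the work would leave the temporary encoder installed for every later caller.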

29 changes: 29 additions & 0 deletions test/test.js
@@ -236,6 +236,20 @@
equal(u.pathname(), '/', 'empty absolute path');
equal(u.toString(), '/', 'empty absolute path to string');
});
test('URN paths', function() {
var u = new URI('urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66?foo=bar');
u.pathname('uuid:de305d54-75b4-431b-adb2-eb6b9e546013');
equal(u.pathname(), 'uuid:de305d54-75b4-431b-adb2-eb6b9e546013');
equal(u + '', 'urn:uuid:de305d54-75b4-431b-adb2-eb6b9e546013?foo=bar');

u.pathname('');
equal(u.pathname(), '', 'changing pathname ""');
equal(u+'', 'urn:?foo=bar', 'changing url ""');

u.pathname('music:classical:Béla Bártok%3a Concerto for Orchestra');
equal(u.pathname(), 'music:classical:B%C3%A9la%20B%C3%A1rtok%3A%20Concerto%20for%20Orchestra', 'path encoding');
equal(u.pathname(true), 'music:classical:Béla Bártok%3A Concerto for Orchestra', 'path decoded');
});
test('query', function() {
var u = new URI('http://example.org/foo.html');
u.query('foo=bar=foo');
@@ -1050,6 +1064,20 @@
u = URI('/../../../../../www/common/js/app/../../../../www_test/common/js/app/views/view-test.html');
u.normalize();
equal(u.path(), '/www_test/common/js/app/views/view-test.html', 'parent absolute');

// URNs
u = URI('urn:people:authors:poets:Shel Silverstein');
u.normalize();
equal(u.path(), 'people:authors:poets:Shel%20Silverstein');

u = URI('urn:people:authors:philosophers:Søren Kierkegaard');
u.normalize();
equal(u.path(), 'people:authors:philosophers:S%C3%B8ren%20Kierkegaard');

// URNs path separator preserved
u = URI('urn:games:cards:Magic%3A the Gathering');
u.normalize();
equal(u.path(), 'games:cards:Magic%3A%20the%20Gathering');
});
test('normalizeQuery', function() {
var u = new URI('http://example.org/foobar.html?');
@@ -1559,6 +1587,7 @@

equal(URI.decodeQuery('%%20'), '%%20', 'malformed URI component returned');
equal(URI.decodePathSegment('%%20'), '%%20', 'malformed URI component returned');
equal(URI.decodeUrnPathSegment('%%20'), '%%20', 'malformed URN component returned');
});
test('encodeQuery', function() {
var escapeQuerySpace = URI.escapeQuerySpace;