0002571: URLs don't match error when URI contains '@'-char (Thunderbird Connector) - SOGo

ID	Project	Category	View Status	Date Submitted	Last Update

0002571	SOGo Connector		public	2014-01-04 19:41	2021-09-17 20:23

Reporter	Yannik	Assigned To
Priority	normal	Severity	major	Reproducibility	always
Status	closed	Resolution	won't fix

Summary	0002571: URLs don't match error when URI contains '@'-char (Thunderbird Connector)
Description	When adding a remote addressbook using SOGo Thunderbird Connector with a URI containing a '@'-char, the sync fails with the following error message: URLs don't match: https://domain.net/sabredav/addressbookserver.php/addressbooks/user%40domain.net/default/ vs. https://domain.net/sabredav/addressbookserver.php/addressbooks/user@domain.net/default/ It seems as if the html entity of the @-char isn't decoded properly. I am using the latest version of the connector (24.0.2). The sync works fine if the username doesn't contain a @-char, however this is mandatory for our setup as all usernames are set up to match the users email-address.
Steps To Reproduce	Add addressbook with URI containing an '@'-char Try to sync (open thunderbird with -console option to see error output)
Tags	No tags attached.

Yannik 2014-01-04 19:44 reporter ~0006378 Last edited: 2014-01-04 19:49	Sorry, the links in the error message somehow broke. Hopefully this time it will work (in code tags): EDIT: Nope, the bugtracker keeps destroying my links even if using [code], <\pre> or <\code> tags :-( I pasted the error message here: http://pastebin.com/hbacbswj

lweddewe 2014-01-22 21:07 reporter ~0006419	it looks as if German umlauts are also affected. A corresponding URL of an address book on an ownCloud server did not work although a second address book on the same server works fine. The only difference was that the first adress book has an umlaut in the URL.

altxt 2014-04-18 04:30 reporter ~0006931	Percent-encoding the URL seems to help, please review https://github.com/inverse-inc/sogo-connector.tb24/pull/12

altxt 2014-05-14 05:48 reporter ~0007022	Ludovic, anyone? Can you please review this? The bug is kind of weird because sogo connector just does not sync if a username (or, more specifically, path) contains a '@'.

ludovic 2014-05-14 11:17 administrator ~0007023	Using SOGo as the server, there's no issue. It seems the problem is present when the server sends encoded URI. Normally the patch I pushed a few days ago should also fix that issue if the past path component doesn't have an encoded character in it. See and test: https://github.com/inverse-inc/sogo-connector.tb24/commit/f0b78a52c22443eea8745458da672e2831ea3186 Alternatively, only that last part could be 'decoded' before comparison so a new patch would need to be submitted.

altxt 2014-05-16 06:27 reporter ~0007029	Yes, I forgot to mention that this problem is encountered with a third-party server. And yes, commit f0b78 solves the problem, because it was the sogo connector who noticed the URL inequality and failed to sync. However, according to section 2 of RFC 3986 http://tools.ietf.org/html/rfc3986#section-2 , URIs must only contain allowed, non-reserved characters, and any reserved characters, @ included, must be percent-escaped. So it is generally at the grace of a server to work with a non-escaped @, like sogo, or accept a non-escaped URL but return an escaped URL, like sabredav, as opposed to producing an error at seeing a misformatted URL. So, do you think it is necessary to follow the RFC and encode the URL? Then something like my patch is required to encode the URI typed in the address book properties dialog before saving it in the settings (and decoding again before showing it). Otherwise, no further action is needed, because everything seems to work.

evert 2014-05-26 16:18 reporter ~0007122	Hey guys, This is a mis-interpretation of the specifications. Some characters in the path part of the url are 'reserved' but not 'special'. For these characters (@ being one of them) percent-encoding is not required, but optional. The wikipedia page has this sentence that exactly describes this: Reserved characters that have no reserved purpose in a particular context may also be percent-encoded but are not semantically different from those that are not. So instead of attempting to decode @ specifically, if it's in the url, what you really should be doing is canonicalizing any url before you do any comparison. http://example.org/@ and http://example.org/%40 should in fact always be treated as identical. @ is not the only character, the colon (:) often has similar problems in implementations. It looks like the last two patches are getting closer to this, but there may be a few more characters that suffer from this. SabreDAV just tries to do a good job for most clients, but unfortunately we also have to support clients that are buggy in a different way, and in fact expect almost every character to not be encoded, unless absolutely necessary. Bonus problem: If the percent encoding of a character contains a hexidecimal digit, it may be percent-encoded either lower- or uppercase. These are also equivalent, and many clients don't correctly deal with this either.

evert 2014-05-26 16:20 reporter ~0007123	Example for the last sentence, the following three urls should be treated as identical as well: http://example.org/~evert http://example.org/%7eevert http://example.org/%7Eevert

altxt 2014-05-27 08:59 reporter ~0007125	Hi Evert, Ludovic's patch https://github.com/inverse-inc/sogo-connector.tb24/commit/f0b78a52c22443eea8745458da672e2831ea3186 does not attempt to do proper comparison, it just extracts the host part and the last directory part of an URL and compares those, on the assumption that '@' or other inconvenient characters embedded in a username will fall in between the two and out of comparison. My patch https://github.com/inverse-inc/sogo-connector.tb24/pull/12 attempts to properly encode all characters that need it at the UI level, so a request to a server contains an already encoded URL. Assuming that the server does not incorrectly decode an URL, comparison can be straight-forward. Why I think that an '@' should be encoded unless it is separating username and password from the hostname: RFC 3986, section 2.2 If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed. reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" What you are right about and what I missed earlier is that while there exists a preferred form of an URL, where all reserved and non-ASCII characters are encoded, and all the others are not, it is acceptable to encode non-reserved characters: RFC 3986, section 2.3 URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. In the extreme case, a server and a client can have different ideas about which characters have to be encoded and which do not. Your example with an '~' illustrates this - I think an earlier specification, RFC 1738, said tilde was reserved. So any URL comparison code must indeed canonicalize the URL first. But how exactly should it do it? Encoding and decoding are not idempotent, so running a percent-encode function on an URL received from the network is not an option - it will produce a doubly encoded URL which is basically junk. The same is true about running percent-decode on an URL received from the UI - it will mess up the URL if it happens to contain a '%'. So we must compare either (a) networkUrl == percent-encode(localUrl) or (b) percent-decode(networkUrl) == localUrl Option (a) will not work because the remote end might have encoded different characters than our local implementation of percent-encode(). So it leaves option (b). If localUrl is a plain, non-encoded string, it seems it should work. I'll try to make a patch for that.

ludovic 2014-05-27 11:03 administrator ~0007126	My patch was not meant to fix this "issue" at all - the rationale of my patch is provided in it.

evert 2014-07-22 14:49 reporter ~0007341	Hi Altxt, Missed your comment on this ticket. I'm guessing I didn't get the notification. Canonically comparing urls is not very easy. sabredav's handling of this is also not perfect, but it gets closer to the ultimately desired behavior. What we do is basically just looking at the path part of the urls (everything after the host/port), split every part on the slash, and urldecode every component in the path. We split on the slash, because we want to ensure that if we're getting an incoming encoded slash, we don't mistake that for a directory separator. We also check the encoding of every component in the path, if theres non-ascii characters. We attempt to normalize every component to UTF-8, because some clients encode paths as the urlencode representation of ISO-8559-1 or CP-1252. @ludovic. Bit confused about the quotes around "issue". Are you implying that this is not considered a bug for you?

francis 2021-09-17 20:23 administrator ~0015475	No longer relevant. Open new ticket if necessary.

Date Modified	Username	Field	Change
2014-01-04 19:41	Yannik	New Issue
2014-01-04 19:44	Yannik	Note Added: 0006378
2014-01-04 19:45	Yannik	Note Edited: 0006378
2014-01-04 19:46	Yannik	Note Edited: 0006378
2014-01-04 19:46	Yannik	Note Edited: 0006378
2014-01-04 19:48	Yannik	Note Edited: 0006378
2014-01-04 19:48	Yannik	Note Edited: 0006378
2014-01-04 19:49	Yannik	Note Edited: 0006378
2014-01-22 21:07	lweddewe	Note Added: 0006419
2014-04-18 04:30	altxt	Note Added: 0006931
2014-05-14 05:48	altxt	Note Added: 0007022
2014-05-14 11:17	ludovic	Note Added: 0007023
2014-05-16 06:27	altxt	Note Added: 0007029
2014-05-26 16:18	evert	Note Added: 0007122
2014-05-26 16:20	evert	Note Added: 0007123
2014-05-27 08:59	altxt	Note Added: 0007125
2014-05-27 11:03	ludovic	Note Added: 0007126
2014-07-22 14:49	evert	Note Added: 0007341
2021-09-17 20:23	francis	Status	new => closed
2021-09-17 20:23	francis	Resolution	open => won't fix
2021-09-17 20:23	francis	Note Added: 0015475

View Issue Details

Activities

Issue History