View Issue Details

IDProjectCategoryView StatusLast Update
0002571SOGo Connectorpublic2021-09-17 20:23
ReporterYannik Assigned To 
PrioritynormalSeveritymajorReproducibilityalways
Status closedResolutionwon't fix 
Summary0002571: URLs don't match error when URI contains '@'-char (Thunderbird Connector)
Description

When adding a remote addressbook using SOGo Thunderbird Connector with a URI containing a '@'-char, the sync fails with the following error message:

URLs don't match: https://domain.net/sabredav/addressbookserver.php/addressbooks/user%40domain.net/default/ vs. https://domain.net/sabredav/addressbookserver.php/addressbooks/user@domain.net/default/

It seems as if the html entity of the @-char isn't decoded properly.
I am using the latest version of the connector (24.0.2).
The sync works fine if the username doesn't contain a @-char, however this is mandatory for our setup as all usernames are set up to match the users email-address.

Steps To Reproduce
  1. Add addressbook with URI containing an '@'-char
  2. Try to sync (open thunderbird with -console option to see error output)
TagsNo tags attached.

Activities

Yannik

Yannik

2014-01-04 19:44

reporter   ~0006378

Last edited: 2014-01-04 19:49

Sorry, the links in the error message somehow broke.

Hopefully this time it will work (in code tags):

EDIT: Nope, the bugtracker keeps destroying my links even if using [code], <\pre> or <\code> tags :-(

I pasted the error message here: http://pastebin.com/hbacbswj

lweddewe

lweddewe

2014-01-22 21:07

reporter   ~0006419

it looks as if German umlauts are also affected. A corresponding URL of an address book on an ownCloud server did not work although a second address book on the same server works fine. The only difference was that the first adress book has an umlaut in the URL.

altxt

altxt

2014-04-18 04:30

reporter   ~0006931

Percent-encoding the URL seems to help, please review
https://github.com/inverse-inc/sogo-connector.tb24/pull/12

altxt

altxt

2014-05-14 05:48

reporter   ~0007022

Ludovic, anyone? Can you please review this? The bug is kind of weird because sogo connector just does not sync if a username (or, more specifically, path) contains a '@'.

ludovic

ludovic

2014-05-14 11:17

administrator   ~0007023

Using SOGo as the server, there's no issue.

It seems the problem is present when the server sends encoded URI.

Normally the patch I pushed a few days ago should also fix that issue if the past path component doesn't have an encoded character in it. See and test:

https://github.com/inverse-inc/sogo-connector.tb24/commit/f0b78a52c22443eea8745458da672e2831ea3186

Alternatively, only that last part could be 'decoded' before comparison so a new patch would need to be submitted.

altxt

altxt

2014-05-16 06:27

reporter   ~0007029

Yes, I forgot to mention that this problem is encountered with a third-party server.

And yes, commit f0b78 solves the problem, because it was the sogo connector who noticed the URL inequality and failed to sync.

However, according to section 2 of RFC 3986 http://tools.ietf.org/html/rfc3986#section-2 , URIs must only contain allowed, non-reserved characters, and any reserved characters, @ included, must be percent-escaped. So it is generally at the grace of a server to work with a non-escaped @, like sogo, or accept a non-escaped URL but return an escaped URL, like sabredav, as opposed to producing an error at seeing a misformatted URL.

So, do you think it is necessary to follow the RFC and encode the URL? Then something like my patch is required to encode the URI typed in the address book properties dialog before saving it in the settings (and decoding again before showing it).
Otherwise, no further action is needed, because everything seems to work.

evert

evert

2014-05-26 16:18

reporter   ~0007122

Hey guys,

This is a mis-interpretation of the specifications. Some characters in the path part of the url are 'reserved' but not 'special'. For these characters (@ being one of them) percent-encoding is not required, but optional. The wikipedia page has this sentence that exactly describes this:

Reserved characters that have no reserved purpose in a particular context may also be percent-encoded but are not semantically different from those that are not.

So instead of attempting to decode @ specifically, if it's in the url, what you really should be doing is canonicalizing any url before you do any comparison.

http://example.org/@ and http://example.org/%40 should in fact always be treated as identical.

@ is not the only character, the colon (:) often has similar problems in implementations.

It looks like the last two patches are getting closer to this, but there may be a few more characters that suffer from this.

SabreDAV just tries to do a good job for most clients, but unfortunately we also have to support clients that are buggy in a different way, and in fact expect almost every character to not be encoded, unless absolutely necessary.

Bonus problem: If the percent encoding of a character contains a hexidecimal digit, it may be percent-encoded either lower- or uppercase. These are also equivalent, and many clients don't correctly deal with this either.

evert

evert

2014-05-26 16:20

reporter   ~0007123

Example for the last sentence, the following three urls should be treated as identical as well:

http://example.org/~evert
http://example.org/%7eevert
http://example.org/%7Eevert

altxt

altxt

2014-05-27 08:59

reporter   ~0007125

Hi Evert,

Ludovic's patch https://github.com/inverse-inc/sogo-connector.tb24/commit/f0b78a52c22443eea8745458da672e2831ea3186 does not attempt to do proper comparison, it just extracts the host part and the last directory part of an URL and compares those, on the assumption that '@' or other inconvenient characters embedded in a username will fall in between the two and out of comparison.

My patch https://github.com/inverse-inc/sogo-connector.tb24/pull/12 attempts to properly encode all characters that need it at the UI level, so a request to a server contains an already encoded URL. Assuming that the server does not incorrectly decode an URL, comparison can be straight-forward.

Why I think that an '@' should be encoded unless it is separating username and password from the hostname:
RFC 3986, section 2.2
If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

What you are right about and what I missed earlier is that while there exists a preferred form of an URL, where all reserved and non-ASCII characters are encoded, and all the others are not, it is acceptable to encode non-reserved characters:
RFC 3986, section 2.3
URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.

In the extreme case, a server and a client can have different ideas about which characters have to be encoded and which do not. Your example with an '~' illustrates this - I think an earlier specification, RFC 1738, said tilde was reserved. So any URL comparison code must indeed canonicalize the URL first.

But how exactly should it do it? Encoding and decoding are not idempotent, so running a percent-encode function on an URL received from the network is not an option - it will produce a doubly encoded URL which is basically junk. The same is true about running percent-decode on an URL received from the UI - it will mess up the URL if it happens to contain a '%'. So we must compare either
(a) networkUrl == percent-encode(localUrl) or
(b) percent-decode(networkUrl) == localUrl
Option (a) will not work because the remote end might have encoded different characters than our local implementation of percent-encode(). So it leaves option (b). If localUrl is a plain, non-encoded string, it seems it should work.

I'll try to make a patch for that.

ludovic

ludovic

2014-05-27 11:03

administrator   ~0007126

My patch was not meant to fix this "issue" at all - the rationale of my patch is provided in it.

evert

evert

2014-07-22 14:49

reporter   ~0007341

Hi Altxt,

Missed your comment on this ticket. I'm guessing I didn't get the notification.

Canonically comparing urls is not very easy. sabredav's handling of this is also not perfect, but it gets closer to the ultimately desired behavior.

What we do is basically just looking at the path part of the urls (everything after the host/port), split every part on the slash, and urldecode every component in the path.

We split on the slash, because we want to ensure that if we're getting an incoming encoded slash, we don't mistake that for a directory separator.

We also check the encoding of every component in the path, if theres non-ascii characters. We attempt to normalize every component to UTF-8, because some clients encode paths as the urlencode representation of ISO-8559-1 or CP-1252.

@ludovic. Bit confused about the quotes around "issue". Are you implying that this is not considered a bug for you?

francis

francis

2021-09-17 20:23

administrator   ~0015475

No longer relevant. Open new ticket if necessary.

Issue History

Date Modified Username Field Change
2014-01-04 19:41 Yannik New Issue
2014-01-04 19:44 Yannik Note Added: 0006378
2014-01-04 19:45 Yannik Note Edited: 0006378
2014-01-04 19:46 Yannik Note Edited: 0006378
2014-01-04 19:46 Yannik Note Edited: 0006378
2014-01-04 19:48 Yannik Note Edited: 0006378
2014-01-04 19:48 Yannik Note Edited: 0006378
2014-01-04 19:49 Yannik Note Edited: 0006378
2014-01-22 21:07 lweddewe Note Added: 0006419
2014-04-18 04:30 altxt Note Added: 0006931
2014-05-14 05:48 altxt Note Added: 0007022
2014-05-14 11:17 ludovic Note Added: 0007023
2014-05-16 06:27 altxt Note Added: 0007029
2014-05-26 16:18 evert Note Added: 0007122
2014-05-26 16:20 evert Note Added: 0007123
2014-05-27 08:59 altxt Note Added: 0007125
2014-05-27 11:03 ludovic Note Added: 0007126
2014-07-22 14:49 evert Note Added: 0007341
2021-09-17 20:23 francis Status new => closed
2021-09-17 20:23 francis Resolution open => won't fix
2021-09-17 20:23 francis Note Added: 0015475