View Issue Details

ID: 0004684
Project: SOGo
Category: sogo-tool
View Status: public
Last Update: 2020-09-17 10:28
Reporter: zhb
Assigned To: ludovic
Priority: urgent
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 4.0.4
Fixed in Version: 5.0.0
Summary0004684: sogo-tool on FreeBSD 12.0 always triggers MySQL error "Got an error reading communication packets"
Description
  • OS: FreeBSD 12.0
  • SOGo: 4.0.4 (installed from FreeBSD ports tree, 4.0.5 is not yet available)
  • MySQL: 5.7.25

We run 3 "sogo-tool" commands via the cron daemon, and every run triggers a MySQL error, EVERY time:

/usr/local/GNUstep/Local/Tools/Admin/sogo-tool expire-sessions 30
/usr/local/GNUstep/Local/Tools/Admin/sogo-ealarms-notify
/usr/local/GNUstep/Local/Tools/Admin/sogo-tool update-autoreply -p /usr/local/etc/sogo/sieve.cred

Running them manually causes the same MySQL error:

2019-02-19T11:59:33.502480Z 2883 [Note] Aborted connection 2883 to db: 'sogo' user: 'sogo' host: 'localhost' (Got an error reading communication packets)

It seems sogo-tool / sogo-ealarms-notify don't close the MySQL connection correctly. Each occurrence increments the MySQL global status variable "Aborted_clients"; once the error count reaches the server's maximum, SOGo can no longer connect to MySQL.

+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 1456  |
| Aborted_connects | 0     |
+------------------+-------+
MySQL timeout settings:

+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| connect_timeout             | 10       |
| delayed_insert_timeout      | 300      |
| have_statement_timeout      | YES      |
| innodb_flush_log_at_timeout | 1        |
| innodb_lock_wait_timeout    | 50       |
| innodb_rollback_on_timeout  | OFF      |
| interactive_timeout         | 28800    |
| lock_wait_timeout           | 31536000 |
| net_read_timeout            | 30       |
| net_write_timeout           | 60       |
| rpl_stop_slave_timeout      | 31536000 |
| slave_net_timeout           | 60       |
| wait_timeout                | 28800    |
+-----------------------------+----------+
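To narrow down which command bumps the counter, one can snapshot Aborted_clients before and after a run. The sketch below is an illustration, not part of the report: the `counter_value` helper is exercised on canned tab-separated output so the block is self-contained, and the commented-out `mysql -B -e` invocations (which assume working client credentials) show how it would be used for real.

```shell
# Extract a counter's value from `SHOW GLOBAL STATUS`-style output
# (whitespace-separated "name value" rows, as `mysql -B -e` prints).
counter_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Canned sample so the sketch runs without a live server:
sample=$(printf 'Variable_name\tValue\nAborted_clients\t1456\n')

before=$(printf '%s\n' "$sample" | counter_value Aborted_clients)
echo "Aborted_clients: $before"

# Real use would look like (hypothetical invocation, needs credentials):
#   before=$(mysql -B -e "SHOW GLOBAL STATUS LIKE 'Aborted_clients'" | counter_value Aborted_clients)
#   /usr/local/GNUstep/Local/Tools/Admin/sogo-tool expire-sessions 30
#   after=$(mysql -B -e "SHOW GLOBAL STATUS LIKE 'Aborted_clients'" | counter_value Aborted_clients)
#   echo "delta: $((after - before))"
```

Running each of the three commands between two snapshots would show which one moves the counter, and by how much.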

Steps To Reproduce

Install SOGo from the ports tree on FreeBSD 12.x.

Tags: No tags attached.

Activities

zhb

2019-02-19 22:52

reporter   ~0013398

Update: when the error count reaches the MySQL maximum, MySQL becomes unreachable by SOGo and by other applications such as Postfix/Dovecot.
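For what it's worth, in MySQL 5.7 the hard blocking of a client host is governed by max_connect_errors, which counts interrupted connection attempts (handshake failures) rather than the Aborted_clients counter shown above, and a blocked host stays blocked until FLUSH HOSTS (or mysqladmin flush-hosts) is run. A possible stopgap while the bug is unfixed is to raise that threshold -- an illustrative my.cnf fragment only, with an arbitrary value, not a recommendation:

```ini
# my.cnf -- illustrative stopgap, not a fix for the underlying bug
[mysqld]
# Number of interrupted connection attempts from one host before
# MySQL blocks that host (unblock with FLUSH HOSTS).
max_connect_errors = 100000
```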

Christian Mack

2019-02-20 04:11

developer   ~0013399

Which SOPE version do you use?

zhb

2019-02-20 04:44

reporter   ~0013403

It's sope4-4.0.4, installed from the FreeBSD ports tree too.

zhb

2019-03-11 00:58

reporter   ~0013434

Dear developers,

Any update? This issue drives all FreeBSD users crazy. :(

nycvelo

2019-03-12 21:27

reporter   ~0013443

Also seeing the same issue since upgrading to FreeBSD 12.0-RELEASE with the MySQL 5.7.25 client and server installed from ports. I haven't yet detected any pattern to the frequency of MySQL becoming unreachable. A very kludgy workaround is a little script that monitors the MySQL error log and restarts the database whenever the "Bad handshake" message count increments -- but that really is a hack. The system has plenty of CPU, memory, disk, etc., and there's no apparent reason why MySQL should lock up. It ran fine on previous versions of FreeBSD.
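For the record, the kludgy watchdog described above might be sketched like this. The log and state file paths and the restart command are assumptions, not taken from this report, and the restart itself is left commented out:

```shell
# Hypothetical sketch of the log-watching hack described above.
# LOG and STATE paths are illustrative assumptions.
LOG=${LOG:-./mysql-error.log}
STATE=${STATE:-./handshake.count}

# Count "Bad handshake" lines seen so far (empty/0 if the log is missing).
current=$(grep -c 'Bad handshake' "$LOG" 2>/dev/null || true)
current=${current:-0}
previous=$(cat "$STATE" 2>/dev/null || echo 0)

if [ "$current" -gt "$previous" ]; then
    echo "Bad handshake count rose from $previous to $current"
    # service mysql-server restart   # FreeBSD rc(8); commented out here
fi
printf '%s\n' "$current" > "$STATE"
```

Run from cron every minute or so; the state file carries the last seen count between runs so only an increment triggers the restart branch.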

Peter2121

2019-03-13 04:53

reporter   ~0013444

I confirm the same problem on FreeBSD 11.2.

nycvelo

2019-12-05 12:49

reporter   ~0013954

Any word on a fix for this? Nine months later, the error is still present with FreeBSD 12.1-RELEASE-p1, sogo 4.1.1 and sope 4.1.1 installed from ports.

Neutro

2020-01-20 18:00

reporter   ~0014077

I have the same problem on newly released Debian 10. Would appreciate any help from the devs :)

zhb

2020-02-13 11:24

reporter   ~0014160

It happens on an Ubuntu 18.04.4 box too.

zhb

2020-02-17 09:23

reporter   ~0014166

Today I found it happens on an Ubuntu 16.04 box too.

jayanezj

2020-03-06 07:45

reporter   ~0014202

The problem continues in SOGo 4.3.1 on Debian 10.

church

2020-07-08 14:04

reporter   ~0014472

Same problem found on CentOS 8 running SOGo 4.3.2 with MariaDB 10.3. It is really unacceptable that this issue has been present for so long without attention, as it requires frequent restarts of the database server to keep up with the absurd number of errors.

ludovic

2020-07-09 15:44

administrator   ~0014496

Which of the 3 "sogo-tool" commands being run causes the error count to increase?

church

2020-07-09 16:32

reporter   ~0014497

No way to tell. Given the frequency they run at (every minute) and the rate at which failed connections increment, likely all three. I've run each command manually on the CLI (with the --verbose flag, output redirected to stdout) and verified that they work. Nothing unusual in the output; the commands seem to be "working", but the MySQL errors continue. In addition, my sogo.log is filled with warnings like this (just an excerpt; there are literally thousands in the log):

Jul 09 16:21:06 sogod [1257]: [WARN] <0x0x55cb83720ef0[WOWatchDogChild]> pid 196061 has been hanging in the same request for 6 minutes
Jul 09 16:21:08 sogod [1257]: [WARN] <0x0x55cb834a6500[WOWatchDogChild]> pid 196140 has been hanging in the same request for 5 minutes
Jul 09 16:21:11 sogod [1257]: [WARN] <0x0x55cb8381c680[WOWatchDogChild]> pid 196141 has been hanging in the same request for 5 minutes
Jul 09 16:21:59 sogod [1257]: [WARN] <0x0x55cb837204c0[WOWatchDogChild]> pid 196257 has been hanging in the same request for 3 minutes
Jul 09 16:22:06 sogod [1257]: [WARN] <0x0x55cb83720ef0[WOWatchDogChild]> pid 196061 has been hanging in the same request for 7 minutes
Jul 09 16:22:08 sogod [1257]: [WARN] <0x0x55cb834a6500[WOWatchDogChild]> pid 196140 has been hanging in the same request for 6 minutes
Jul 09 16:22:11 sogod [1257]: [WARN] <0x0x55cb8381c680[WOWatchDogChild]> pid 196141 has been hanging in the same request for 6 minutes
Jul 09 16:22:59 sogod [1257]: [WARN] <0x0x55cb837204c0[WOWatchDogChild]> pid 196257 has been hanging in the same request for 4 minutes
Jul 09 16:23:06 sogod [1257]: [WARN] <0x0x55cb83720ef0[WOWatchDogChild]> pid 196061 has been hanging in the same request for 8 minutes
Jul 09 16:23:08 sogod [1257]: [WARN] <0x0x55cb834a6500[WOWatchDogChild]> pid 196140 has been hanging in the same request for 7 minutes
Jul 09 16:23:11 sogod [1257]: [WARN] <0x0x55cb8381c680[WOWatchDogChild]> pid 196141 has been hanging in the same request for 7 minutes
Jul 09 16:23:59 sogod [1257]: [WARN] <0x0x55cb837204c0[WOWatchDogChild]> pid 196257 has been hanging in the same request for 5 minutes
Jul 09 16:24:06 sogod [1257]: [WARN] <0x0x55cb83720ef0[WOWatchDogChild]> pid 196061 has been hanging in the same request for 9 minutes
Jul 09 16:24:08 sogod [1257]: [WARN] <0x0x55cb834a6500[WOWatchDogChild]> pid 196140 has been hanging in the same request for 8 minutes
Jul 09 16:24:11 sogod [1257]: [WARN] <0x0x55cb8381c680[WOWatchDogChild]> pid 196141 has been hanging in the same request for 8 minutes
Jul 09 16:24:59 sogod [1257]: [WARN] <0x0x55cb837204c0[WOWatchDogChild]> pid 196257 has been hanging in the same request for 6 minutes
Jul 09 16:25:08 sogod [1257]: [WARN] <0x0x55cb834a6500[WOWatchDogChild]> pid 196140 has been hanging in the same request for 9 minutes
Jul 09 16:25:11 sogod [1257]: [WARN] <0x0x55cb8381c680[WOWatchDogChild]> pid 196141 has been hanging in the same request for 9 minutes
Jul 09 16:25:38 sogod [1257]: [WARN] <0x0x55cb83720e40[WOWatchDogChild]> pid 196629 has been hanging in the same request for 1 minutes

I've matched some of these PIDs to the ones spawned by the sogo-tool cron jobs, so there appears to be a correlation. Band-aid solutions are to restart the database server, raise the error limit, or disable the cron jobs, but the issue itself is just plain bizarre, especially given that, by and large, SOGo still "works" and the commands execute properly on the sogo-tool side.

TL;DR -- all three commands cause the error count to increase, to the best of my knowledge.

ludovic

2020-07-09 16:40

administrator   ~0014498

The hanging pids have nothing to do with sogo-tool -- those are from EAS clients and are perfectly normal.

church

2020-07-09 16:46

reporter   ~0014499

Understood. We can rule out the log content as relevant, then. Nothing else is out of place in sogo.log, and as mentioned, the commands execute properly, but the MySQL server still hangs up on the connections -- by rate of frequency, seemingly from all three commands.

ludovic

2020-07-09 16:47

administrator   ~0014500

Can you confirm by how much the error count increases when you run a command manually, and which one you ran?

zhb

2020-07-09 22:39

reporter   ~0014506

hi @ludovic,

Each time I ran a command (any of them), it caused one MySQL error.
I personally don't have a server with this issue right now; I hope @church can help follow up and provide the required info.

church

2020-07-09 23:17

reporter   ~0014507

@ludovic I've done a good deal of further testing, and interestingly enough, my issue actually isn't exclusively caused by sogo-tool. However, I was able to CONFIRM that all three commands above increment the error count, by one instance of the error each. I assumed it was limited to sogo-tool because of the pre-existing issue and error code, but after disabling my cron jobs containing the above sogo-tool commands, I found that my mariadb logs were still filling with the error message.

The sogod service itself also triggers this error. I verified this by disabling the service and manually running the same command it does (using sudo to run as the sogo user):

/usr/sbin/sogod -WOWorkersCount 10 -WOPidFile /var/run/sogo/sogo.pid -WOLogFile /var/log/sogo/sogo.log

This command also results in the MySQL error, at a staggering frequency, since sogod's workers touch the database so often. It seems to do so at a rate of about 10 per minute (which would make sense -- 10 workers), which is why the errors stack up so quickly for me. I'm restarting mariadb once daily; any less frequently and the database isn't able to keep up. Before you ask -- yes, SELinux is disabled, and there are no other issues on the machine. So, on my machine, the error is triggered by any execution of "sogo-tool", "sogo-ealarms-notify", or "sogod".

Related Changesets

sogo: master cecf157d

2020-07-10 10:22:48

ludovic

fix(core): avoid pooling channels with tools (fixes 0004684)

Affected Issues: 0004684

mod - SOPE/GDLContentStore/GCSFolder.h
mod - SOPE/GDLContentStore/GCSFolder.m
mod - Tools/SOGoToolBackup.m
mod - Tools/SOGoToolCheckupUser.m
mod - Tools/SOGoToolCleanupUser.m
mod - Tools/SOGoToolExpireUserSessions.m
mod - Tools/SOGoToolManageACL.m
mod - Tools/SOGoToolRemove.m
mod - Tools/SOGoToolRemoveDoubles.m
mod - Tools/SOGoToolRenameUser.m
mod - Tools/SOGoToolRestore.m
mod - Tools/SOGoToolTruncateCalendar.m
mod - Tools/sogo-tool.m

Issue History

Date Modified Username Field Change
2019-02-19 22:50 zhb New Issue
2019-02-19 22:52 zhb Note Added: 0013398
2019-02-20 04:11 Christian Mack Note Added: 0013399
2019-02-20 04:44 zhb Note Added: 0013403
2019-03-11 00:58 zhb Note Added: 0013434
2019-03-12 21:27 nycvelo Note Added: 0013443
2019-03-13 04:53 Peter2121 Note Added: 0013444
2019-12-05 12:49 nycvelo Note Added: 0013954
2020-01-20 18:00 Neutro Note Added: 0014077
2020-02-13 11:24 zhb Note Added: 0014160
2020-02-17 09:23 zhb Note Added: 0014166
2020-03-06 07:45 jayanezj Note Added: 0014202
2020-07-08 14:04 church Note Added: 0014472
2020-07-09 15:44 ludovic Note Added: 0014496
2020-07-09 16:32 church Note Added: 0014497
2020-07-09 16:40 ludovic Note Added: 0014498
2020-07-09 16:46 church Note Added: 0014499
2020-07-09 16:47 ludovic Note Added: 0014500
2020-07-09 22:39 zhb Note Added: 0014506
2020-07-09 23:17 church Note Added: 0014507
2020-07-10 10:23 ludovic Changeset attached => sogo master cecf157d
2020-07-10 10:23 ludovic Assigned To => ludovic
2020-07-10 10:23 ludovic Resolution open => fixed
2020-07-10 10:43 francis Status new => resolved
2020-07-10 10:43 francis Fixed in Version => 5.0.0