After a few months, we still have problems on our production system getting Corosync to run stably.
I found the following entry:
I have been using Corosync with Pacemaker for almost a year in many different production systems, and so far I hadn't hit any problems, but now I've hit something that causes me trouble: I have a 3-node PostgreSQL cluster set up (two actual database nodes and one witness server). It took me quite some time to get this setup configured well, because we are using Ubuntu 14.04 LTS and it does everything a little bit differently, but in the end I was able to create a cluster that worked well: data was replicated, master/slave roles were established, failover happened seamlessly, so I went home.

The next morning the whole cluster was dead. Analyzing the logs, it turned out that it had run out of memory during the night. Since then I have been monitoring all the nodes, and it turns out that Corosync is the one responsible: I have a node that has been running for around 1 hour, and at the beginning Corosync used around 8% of memory. Now, after just one hour, it is already using 25% of RAM. It is easy to see that in a few hours it is going to crash the node.

The interesting thing is that all the other nodes (mainly Apache load balancers) don't have this problem, despite running almost the same setup: Ubuntu 14.04 LTS, Corosync 2.3.3, Pacemaker 1.1.10, all from the official Ubuntu repositories. On the psql nodes I had to switch to the latest libqb (0.17) because the official one (0.16) caused the whole cluster to freeze at 100% CPU usage. So except for the libqb version all nodes are the same; they just run different resources. The Apache ones are fine after many days of running; the postgres nodes, as I said, are not.
I would really like to fix this issue if possible, so I’m open to any ideas or suggestions.
I have the same problem.
I found similar entries for libqb.
I think an update of the libqb library should fix it.
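Before updating, it is worth verifying which libqb each node actually has installed and which one the running daemon is linked against. A sketch for Ubuntu/Debian (the binary path `/usr/sbin/corosync` and the package name `libqb0` are the usual ones but may differ on your system):

```shell
# Show which libqb shared object the corosync binary is linked against,
# so you can confirm an upgrade actually took effect after a restart.
ldd /usr/sbin/corosync | grep libqb

# Query the installed package version from the dpkg database.
dpkg -s libqb0 | grep '^Version:'
```

Comparing this output across the Apache and PostgreSQL nodes would confirm whether the libqb version really is the only difference between them.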