Dear all!
I'm using Quartz 2.5.0 in a Jakarta EE 10 project running inside WildFly 31, backed by a mariadb:11.4.5-noble cluster (3 nodes) with InnoDB.
I have multiple schedulers, each with its own tables. One of the schedulers has 100+ cron jobs, each running every x hours and y minutes. When multiple jobs fire at the same time (e.g. 20 jobs at 12:05), some of them throw this error:
[org.quartz.core.ErrorLogger] (auto_QuartzSchedulerThread) An error occurred while scanning for the next triggers to fire.: org.quartz.JobPersistenceException: Couldn't commit jdbc connection. (conn=607) Deadlock found when trying to get lock; try restarting transaction [See nested exception: java.sql.SQLTransactionRollbackException: (conn=607) Deadlock found when trying to get lock; try restarting transaction]
(auto_QuartzSchedulerThread) Error: 1213-40001: Deadlock found when trying to get lock; try restarting transaction.
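For anyone trying to filter or count these failures, the error above can be recognized by walking the exception's cause chain. This is only a minimal sketch: the class name `DeadlockDetector` is hypothetical (not part of Quartz), and it assumes the SQLState `40001` and vendor code `1213` shown in the log line are what the driver reports.

```java
import java.sql.SQLException;

// Hypothetical helper, not part of Quartz or the MariaDB driver.
final class DeadlockDetector {
    // Values taken from the log line "Error: 1213-40001".
    private static final String SQLSTATE_ROLLBACK = "40001";
    private static final int VENDOR_CODE_DEADLOCK = 1213;
    // Guard against pathological (cyclic) cause chains.
    private static final int MAX_DEPTH = 32;

    private DeadlockDetector() {}

    // Returns true if any exception in the cause chain is a SQLException
    // carrying the deadlock SQLState or vendor code.
    static boolean isDeadlock(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < MAX_DEPTH; cur = cur.getCause(), depth++) {
            if (cur instanceof SQLException) {
                SQLException sql = (SQLException) cur;
                if (SQLSTATE_ROLLBACK.equals(sql.getSQLState())
                        || sql.getErrorCode() == VENDOR_CODE_DEADLOCK) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

The check relies on the wrapping exception exposing the SQL exception through `getCause()`, which matches the "See nested exception" part of the log message.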
The DB tables were created using the https://github.com/quartz-scheduler/quartz/blob/main/quartz/src/main/resources/org/quartz/impl/jdbcjobstore/tables_mysql_innodb.sql file.
The properties file is:
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreCMT
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource=quartzDataSource
org.quartz.dataSource.quartzDataSource.jndiURL=java:jboss/datasources/QuartzDs
org.quartz.jobStore.nonManagedTXDataSource=qzDS
org.quartz.dataSource.qzDS.jndiURL=java:jboss/datasources/QuartzDsNoneManaged
org.quartz.jobStore.tablePrefix=AUTO_
org.quartz.threadPool.threadCount=1000
org.quartz.jobStore.isClustered=true
org.quartz.scheduler.instanceId=AUTO
org.quartz.scheduler.instanceName=auto
org.quartz.jobStore.clusterCheckinInterval=150000
org.quartz.jobStore.misfireThreshold=10000
# https://github.com/quartz-scheduler/quartz/issues/1084
org.quartz.jobStore.acquireTriggersWithinLock=true
org.quartz.jobStore.txIsolationLevelReadCommitted=true
acquireTriggersWithinLock is set because of #1084; I found txIsolationLevelReadCommitted in an issue and in the documentation.
WildFly datasource config:
<datasource jndi-name="java:jboss/datasources/QuartzDs" pool-name="QuartzDs" enabled="true" use-java-context="true">
<connection-url>jdbc:mariadb://xxx.xxx.xxx.xxx:3307/quartz?useSSL=false&amp;autoReconnect=true&amp;useUnicode=true&amp;useJDBCCompliantTimezoneShift=true&amp;useLegacyDatetimeCode=false&amp;serverTimezone=UTC</connection-url>
<driver>mariadb</driver>
<pool>
<max-pool-size>1000</max-pool-size>
</pool>
<security user-name="xxxx" password="xxx"/>
<validation>
<check-valid-connection-sql>SELECT 1 FROM DUAL</check-valid-connection-sql>
<background-validation>true</background-validation>
<background-validation-millis>5000</background-validation-millis>
</validation>
</datasource>
<datasource jta="false" jndi-name="java:jboss/datasources/QuartzDsNoneManaged" pool-name="QuartzDsNoneManaged" enabled="true" use-java-context="true">
<connection-url>jdbc:mariadb://xxx.xxx.xxx.xxx:3307/quartz?useSSL=false&amp;autoReconnect=true&amp;useUnicode=true&amp;useJDBCCompliantTimezoneShift=true&amp;useLegacyDatetimeCode=false&amp;serverTimezone=UTC</connection-url>
<driver>mariadb</driver>
<pool>
<max-pool-size>1000</max-pool-size>
</pool>
<security user-name="xxxx" password="xxx"/>
<validation>
<check-valid-connection-sql>SELECT 1 FROM DUAL</check-valid-connection-sql>
<background-validation>true</background-validation>
<background-validation-millis>5000</background-validation-millis>
</validation>
</datasource>
I can't use the READ-COMMITTED isolation level for the whole cluster because of this warning in the official Galera documentation: "When using Galera Cluster in primary-replica mode, all four levels are available to you, to the extent that MySQL supports it. In multi-primary mode, however, you can only use the REPEATABLE-READ level."
I tried setting <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation> in the WildFly datasource, but the problem persists.
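One way to check whether a datasource-level isolation setting actually reaches the connections handed out by the pool is to log what JDBC reports for a borrowed connection. The following is a minimal sketch; the class name `IsolationProbe` is hypothetical, and it only uses the standard `java.sql.Connection` constants and `getTransactionIsolation()`.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical diagnostic helper; not part of Quartz or WildFly.
final class IsolationProbe {
    private IsolationProbe() {}

    // Maps a java.sql.Connection isolation constant to a readable name.
    static String name(int level) {
        switch (level) {
            case Connection.TRANSACTION_NONE:             return "NONE";
            case Connection.TRANSACTION_READ_UNCOMMITTED: return "READ_UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:   return "READ_COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:  return "REPEATABLE_READ";
            case Connection.TRANSACTION_SERIALIZABLE:     return "SERIALIZABLE";
            default:                                      return "UNKNOWN(" + level + ")";
        }
    }

    // Reports the isolation level of an already-open connection,
    // e.g. one obtained from the JNDI datasource.
    static String describe(Connection connection) throws SQLException {
        return name(connection.getTransactionIsolation());
    }
}
```

Calling `IsolationProbe.describe(...)` on a connection looked up from `java:jboss/datasources/QuartzDs` and from `java:jboss/datasources/QuartzDsNoneManaged` would show whether both pools pick up the setting.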
I have looked through all the open and closed issues with the same deadlock title or content, but haven't found a solution yet.
In the worst case, a deadlock sometimes leaves jobs in a state where they have to be reinitialized manually (I have a scheduler that scans for all jobs in this state and reinitializes them).
Also, when the deadlock occurs, the job is started twice, so duplicate results are generated, which is a big problem. I attached a log in a comment.
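To make the duplication problem concrete, one possible guard is to key each run by job identity plus scheduled fire time and skip a run whose key was already claimed. This is a sketch under assumptions, not something Quartz provides: `FireGuard` and `tryClaim` are hypothetical names, and the in-memory set only stands in for a shared store (for example a table with a unique constraint), which a clustered setup would need.

```java
import java.time.Instant;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical idempotency guard; the in-memory set is a stand-in for a
// shared store such as a DB table with a unique constraint on the key.
final class FireGuard {
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller claiming this (job, fire time) pair;
    // a duplicate firing of the same scheduled run gets false and can skip its work.
    boolean tryClaim(String jobKey, Instant scheduledFireTime) {
        return claimed.add(jobKey + "@" + scheduledFireTime.toEpochMilli());
    }
}
```

Inside a Quartz job, the two inputs could come from `context.getJobDetail().getKey()` and `context.getScheduledFireTime()`, so that two firings of the same scheduled run produce the same key.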
Is there any solution, or any advice on where I should look or debug to solve this problem?
Thank you in advance.