Troubleshooting biocache‐service
biocache-service does not start correctly when its dependencies are not running or not reachable. A typical error message looks like:
{
"message": "Error creating bean with name 'qidCacheDao' defined in file [/var/lib/tomcat7/webapps-records-ws.l-a.site/ROOT/WEB-INF/classes/au/org/ala/biocache/dao/QidCacheDAOImpl.class]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [au.org.ala.biocache.dao.QidCacheDAOImpl]: Constructor threw exception; nested exception is java.lang.NoClassDefFoundError: Could not initialize class au.org.ala.biocache.Config$",
"errorType": "Server error"
}
This wiki page walks through several checks to help you fix this startup error.
In the example commands below, solr is installed on ala-install-test-2 and biocache-service on ala-install-test-1.
Check the solr service status. It should look like:
root@ala-install-test-2:~# service solr status
* solr.service - LSB: Controls Apache Solr as a Service
Loaded: loaded (/etc/init.d/solr; generated)
Active: active (exited) since Thu 2021-12-02 10:52:59 UTC; 6h ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/solr.service
Dec 02 11:51:36 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:41 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:41 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:41 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:51:44 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:54:38 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:54:39 ala-install-test-2 systemd[1]: solr.service: Failed to reset devices.list: Operation not permitted
Check that the port is listening:
root@ala-install-test-2:~# lsof -i:8983
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 29838 solr 137u IPv6 757352331 0t0 TCP ala-install-test-2:8983 (LISTEN
If not, check the memory of the VM and the logs to find out why solr is not running.
If it is running, check that it is accessible from the biocache-service VM (ala-install-test-1 in our example):
root@ala-install-test-1:~# grep solr.home /data/biocache/config/biocache-config.properties
solr.home=http://index-es.l-a.site:8983/solr/biocache
Now try to connect to that host and port:
root@ala-install-test-1:~# nc index-es.l-a.site 8983 -v
Connection to index-es.l-a.site 8983 port [tcp/*] succeeded!
If it is not reachable, verify the name resolution:
root@ala-install-test-1:~# ping -c 1 index-es.l-a.site
PING ala-install-test-2 (10.10.10.152) 56(84) bytes of data.
64 bytes from ala-install-test-2 (10.10.10.152): icmp_seq=1 ttl=64 time=0.027 ms
--- ala-install-test-2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.027/0.027/0.027/0.000 ms
and:
root@ala-install-test-1:~# getent ahosts index-es.l-a.site
10.10.10.152 STREAM ala-install-test-2
10.10.10.152 DGRAM
10.10.10.152 RAW
If there is a firewall between the VMs, allow the traffic for solr (8983/tcp) and/or zookeeper (2181/tcp).
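The lookup of solr.home and the connection test shown above can be combined so that the host and port are never typed by hand. The following is a minimal sketch, not part of ala-install: the function name solr_endpoint is hypothetical, and it assumes solr.home is an http(s) URL with an explicit port, as in this example. If your solr.home holds a zookeeper connection string instead, it will not apply as written.

```shell
#!/bin/sh
# Hypothetical helper: print "host port" for the solr.home URL found in a
# biocache config file. Assumes the value looks like
# http://host:port/path (an explicit port is required).
solr_endpoint() {
    config_file=$1
    # Take the value after the first "=" on the solr.home line.
    url=$(grep '^solr\.home=' "$config_file" | head -n 1 | cut -d '=' -f 2-)
    # Drop the scheme ("http://"), then everything from the first "/".
    hostport=${url#*://}
    hostport=${hostport%%/*}
    # Split "host:port" on the colon.
    host=${hostport%%:*}
    port=${hostport##*:}
    echo "$host $port"
}

# Example use on the biocache-service VM (not executed here):
#   set -- $(solr_endpoint /data/biocache/config/biocache-config.properties)
#   nc -v -z -w 5 "$1" "$2"
```

Reading the endpoint from the same file that biocache-service reads means the test targets exactly the host name the service will try to resolve.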
Verify that the solr cores were created by ala-install:
root@ala-install-test-2:~# ls -l /data/solr/data/
total 20
drwxr-xr-x 4 solr solr 4096 Dec 2 10:53 bie
drwxr-xr-x 4 solr solr 4096 Dec 2 10:53 bie-offline
drwxr-xr-x 4 solr solr 4096 Dec 2 10:53 biocache
-rw-r----- 1 solr solr 2180 Dec 2 10:51 solr.xml
-rw-r----- 1 solr solr 975 Dec 2 10:51 zoo.cfg
You can also verify the cores in the solr admin interface, which should be access-protected.
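The directory listing above can also be checked by a script, which is handy when you run it repeatedly. This is a rough sketch under stated assumptions: the helper name missing_cores is made up for illustration, and the core names (bie, bie-offline, biocache) and the /data/solr/data path are simply the ones from this example.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every expected core that has no
# directory under the given solr data directory. Prints nothing when all
# expected cores are present.
missing_cores() {
    data_dir=$1
    shift
    for core in "$@"; do
        [ -d "$data_dir/$core" ] || echo "$core"
    done
}

# Example use on the solr VM (not executed here):
#   missing_cores /data/solr/data bie bie-offline biocache
```

An empty output means every listed core directory exists. It does not prove the cores loaded correctly, so the admin interface remains the more complete check.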
Run similar checks for cassandra, which in our example runs on ala-install-test-3:
root@ala-install-test-3:~# service cassandra status
* cassandra.service - LSB: distributed storage system for structured data
Loaded: loaded (/etc/init.d/cassandra; generated)
Active: active (running) since Thu 2021-12-02 10:50:26 UTC; 6h ago
Docs: man:systemd-sysv-generator(8)
Tasks: 58 (limit: 4915)
CGroup: /system.slice/cassandra.service
`-29012 /usr/bin/java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouc
Dec 02 11:28:44 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:54 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:54 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:57 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:57 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:57 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:59 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:59 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:29:59 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Dec 02 11:30:02 ala-install-test-3 systemd[1]: cassandra.service: Failed to reset devices.list: Operation not permitted
Check that port 9042 is listening:
root@ala-install-test-3:~# lsof -i:9042
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 29012 cassandra 85u IPv4 757303579 0t0 TCP *:9042 (LISTEN)
java 29012 cassandra 89u IPv4 759281357 0t0 TCP ala-install-test-3:9042->ala-install-test-1:59370 (ESTABLISHED)
java 29012 cassandra 90u IPv4 759281428 0t0 TCP ala-install-test-3:9042->ala-install-test-1:59380 (ESTABLISHED)
Also check that it is reachable from the biocache-service VM:
root@ala-install-test-1:~# grep cassandra.host /data/biocache/config/*
/data/biocache/config/biocache-config.properties:# cassandra hosts - this should be comma separated list in the case of a cluster
/data/biocache/config/biocache-config.properties:cassandra.hosts=ala-install-test-3
root@ala-install-test-1:~# nc ala-install-test-3 9042 -v
Connection to ala-install-test-3 9042 port [tcp/*] succeeded!
If not, check the DNS resolution and firewall rules again so that this TCP traffic is allowed.
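Because cassandra.hosts may hold a comma-separated list when you run a cluster, it can help to test every node rather than only the first. The sketch below is illustrative only: cassandra_hosts is a hypothetical helper name, and port 9042 is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: print each host from the cassandra.hosts property,
# one per line, trimming spaces around the commas.
cassandra_hosts() {
    grep '^cassandra\.hosts=' "$1" \
        | head -n 1 \
        | cut -d '=' -f 2- \
        | tr ',' '\n' \
        | sed 's/^ *//; s/ *$//' \
        | grep -v '^$'
}

# Example use on the biocache-service VM (not executed here):
#   for h in $(cassandra_hosts /data/biocache/config/biocache-config.properties); do
#       nc -v -z -w 5 "$h" 9042
#   done
```

The anchored pattern skips the commented line that also mentions cassandra hosts in the config file.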
Let's check quickly whether the other services, besides solr and cassandra, are up and reachable:
root@ala-install-test-1:~# for i in $(grep https /data/biocache/config/biocache-config.properties | cut -d "=" -f 2 | grep -v zip | sort | uniq) ; do echo; echo $i ----; curl --write-out '%{http_code}' --silent --output /dev/null $i; done
https://auth-es.l-a.site/userdetails ----
302
https://auth.ala.org.au/apikey/ws/check?apikey ----
200
https://colecciones-es.l-a.site/ws ----
200
https://colecciones-es.l-a.site/ws/citations ----
200
https://colecciones-es.l-a.site/ws/collection ----
200
https://dataquality.ala.org.au/ ----
000
https://doi-es.l-a.site ----
200
https://doi-es.l-a.site/api/ ----
302
https://doi-es.l-a.site/doi/ ----
200
https://doi-es.l-a.site/myDownloads ----
302
https://espacial-es.l-a.site/geoserver ----
302
https://espacial-es.l-a.site/ws ----
302
https://espacial-es.l-a.site/ws/fields ----
200
https://especies-ws-es.l-a.site ----
200
https://imagenes-es.l-a.site ----
200
https://listas-es.l-a.site ----
302
https://logger-es.l-a.site/service/logger/ ----
200
https://registros-es.l-a.site ----
200
https://registros-es.l-a.site/download/doi?doi ----
302
https://registros-ws-es.l-a.site ----
200
https://registros-ws-es.l-a.site/biocache-download ----
301
https://registros-ws-es.l-a.site/biocache-media/ ----
404
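If you see any 500 errors or more 404 errors, verify those services. A 000 code means curl received no HTTP response at all, for example because the connection failed.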
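With this many URLs, the codes that need attention are easy to miss. One possible way to highlight them is to map each code to a short label, as in this sketch. The function name classify_code and the labels are assumptions made for illustration. Treating every 2xx and 3xx as fine follows the output above, where redirects to a login page are normal.

```shell
#!/bin/sh
# Hypothetical helper: turn a status code printed by
# curl --write-out '%{http_code}' into a short label.
classify_code() {
    case $1 in
        000) echo "unreachable" ;;    # curl got no HTTP response
        2??|3??) echo "ok" ;;         # success or redirect
        404) echo "not-found" ;;
        5??) echo "server-error" ;;
        *) echo "check" ;;            # anything else, e.g. 401 or 403
    esac
}

# Example use inside the loop shown earlier (not executed here):
#   code=$(curl --write-out '%{http_code}' --silent --output /dev/null "$i")
#   echo "$i $code $(classify_code "$code")"
```

Piping the loop output through `grep -v ' ok$'` would then leave only the services worth a closer look.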
If you suffer a power outage and your VMs restart, biocache-service sometimes starts before its dependencies are up and fails to start. This also happens if you have many services on the same VM and biocache-service starts before the others. In this case, try restarting only biocache-service. A simple touch should be enough:
root@ala-install-test-1:~# touch /var/lib/tomcat8/webapps-records-ws.l-a.site/ROOT.war
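If this happens often, the touch could be delayed until the dependencies answer. The following is only a sketch of that idea, not something ala-install provides: retry is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary values to adjust for your site.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most
# "attempts" times, sleeping "delay" seconds between tries.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=0
    while [ "$n" -lt "$attempts" ]; do
        "$@" && return 0
        n=$((n + 1))
        # Do not sleep after the final failed attempt.
        [ "$n" -lt "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example use on the biocache-service VM (not executed here), with the
# hosts and ports from this page:
#   retry 30 10 nc -z -w 5 index-es.l-a.site 8983 &&
#   retry 30 10 nc -z -w 5 ala-install-test-3 9042 &&
#   touch /var/lib/tomcat8/webapps-records-ws.l-a.site/ROOT.war
```

The touch only runs once both solr and cassandra accept connections, which avoids redeploying into the same failure.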