Online Blazegraph backup corrupts the blazegraph.jnl file #3886

Open
@Valcyclovir

Issue description

The Blazegraph Backup API (/blazegraph/backup) produces a corrupted blazegraph.jnl file when used for online backups while OTNode (OriginTrail V8 Node) is running. The backup file, generated with block=true and with or without compress=true, fails to restore properly, resulting in a java.lang.IllegalStateException: Invalid data checksum error when starting Blazegraph with the restored journal. This causes OTNode to fail with a "Cannot connect to Triple store" error and Blazegraph to return HTTP 503 Service Unavailable for SPARQL queries.

Expected behavior

The Backup API should produce a consistent, uncorrupted blazegraph.jnl file that can be restored to a functional Blazegraph instance, allowing OTNode to connect to the triple store and SPARQL queries to execute without errors.

Actual behavior

The backup file (blazegraph-backup.jnl or blazegraph-backup.jnl.gz) is corrupted. When used to replace the active blazegraph.jnl, Blazegraph fails to start, logging a java.lang.IllegalStateException: Invalid data checksum from address: 72130541568, size: 1104. OTNode reports "Cannot connect to Triple store (OtBlazegraph), repository: privateCurrent, located at: http://localhost:9999/ retry number: 2/10". SPARQL queries to http://localhost:9999/blazegraph/namespace/dkg/sparql return HTTP 503 Service Unavailable.
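To distinguish a healthy endpoint from the 503 state described above, a small polling helper can replace manual checks. This is only a sketch: the function names `http_status` and `wait_until_ready` are hypothetical, and it assumes that a plain GET on the SPARQL endpoint returns HTTP 200 once Blazegraph has opened the journal successfully.

```shell
# Print the HTTP status code for a URL.
# curl prints 000 when the connection fails, so its exit code is ignored.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1" || true
}

# Poll a URL until it answers 200 or the attempts run out.
# Arguments: url [attempts] [delay_seconds] [probe_function]
# The probe is injectable so the loop can be exercised without a server.
wait_until_ready() {
  local url="$1"
  local attempts="${2:-10}"
  local delay="${3:-3}"
  local probe="${4:-http_status}"
  local i=1
  local code
  while [ "$i" -le "$attempts" ]; do
    code="$("$probe" "$url")"
    if [ "$code" = "200" ]; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    echo "attempt ${i}/${attempts}: HTTP ${code}" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_until_ready "http://localhost:9999/blazegraph/namespace/dkg/sparql" 10 3` would return non-zero for as long as the restored journal keeps Blazegraph in the 503 state.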

Steps to reproduce the problem

  1. Restart Blazegraph and OTNode:
systemctl restart blazegraph otnode
  2. Run the Backup API command:
BLAZE_URL="http://localhost:9999/blazegraph/backup?block=true&compress=true"
BLAZE_OUTPUT_FILE="/root/blazegraph-backup.jnl.gz"
curl -X POST --data-urlencode "file=${BLAZE_OUTPUT_FILE}" "${BLAZE_URL}"

Alternatively, do not compress:

BLAZE_URL="http://localhost:9999/blazegraph/backup?block=true"
BLAZE_OUTPUT_FILE="/root/blazegraph-backup.jnl"
curl -X POST --data-urlencode "file=${BLAZE_OUTPUT_FILE}" "${BLAZE_URL}"

If compressed, decompress the backup:

gunzip /root/blazegraph-backup.jnl.gz
  3. Stop Blazegraph and OTNode, then replace the active blazegraph.jnl with blazegraph-backup.jnl:
systemctl stop blazegraph otnode
mv /root/ot-node/blazegraph.jnl /root/ot-node/blazegraph.jnl.bak
mv /root/blazegraph-backup.jnl /root/ot-node/blazegraph.jnl
  4. Restart both services:
systemctl restart blazegraph
sleep 5s
systemctl restart otnode
  5. Observe the OTNode error: "Cannot connect to Triple store (OtBlazegraph), repository: privateCurrent, located at: http://localhost:9999/ retry number: 2/10".
  6. Run a SPARQL query:
curl -X POST http://localhost:9999/blazegraph/namespace/dkg/sparql -H "Content-Type: application/sparql-query" --data 'SELECT (COUNT(*) AS ?totalTriples) WHERE { ?s ?p ?o }'

Observe the response: HTTP 503 Service Unavailable.
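The restore steps above can be made a little more defensive. The sketch below uses two hypothetical helpers, `verify_gzip_backup` and `swap_journal`. Note that `gzip -t` only validates the gzip stream; it would not catch a journal that was already inconsistent when Blazegraph wrote it, which is the failure reported here. It only rules out damage introduced during compression or copying.

```shell
# Check a gzip-compressed backup for compression-level damage.
# This does NOT validate the journal's internal checksums.
verify_gzip_backup() {
  gzip -t "$1" 2>/dev/null
}

# Swap a restored journal into place, keeping the previous journal
# next to it as <live>.bak so the swap can be rolled back.
# Arguments: live_journal_path restored_journal_path
swap_journal() {
  local live="$1"
  local restored="$2"
  if [ ! -s "$restored" ]; then
    echo "restored journal missing or empty: $restored" >&2
    return 1
  fi
  if [ -e "$live" ]; then
    mv "$live" "$live.bak" || return 1
  fi
  mv "$restored" "$live"
}
```

With the paths from this report, and with both services stopped, one would run `verify_gzip_backup /root/blazegraph-backup.jnl.gz` before `gunzip`, then `swap_journal /root/ot-node/blazegraph.jnl /root/blazegraph-backup.jnl`. If Blazegraph then fails to start, moving `blazegraph.jnl.bak` back restores the previous state.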

Specifications

Node version: OriginTrail node v8.0.11
Platform: Ubuntu 24.04 LTS
Node wallet: 0xe5Cc7fd75E87fD26EB6557236FE29566365Ba267
Node libp2p identity: 37

Error logs

Blazegraph logs (after restoring backup and restarting):

May 14 17:33:15 othub3 java[11878]: ERROR: Banner.java:134: Could not resolve name for host: java.net.UnknownHostException: othub3: othub3: Name or service not known
May 14 17:33:15 othub3 java[11878]: WARN : Banner.java:136: Falling back to null
May 14 17:33:15 othub3 java[11878]: WARN : NanoSparqlServer.java:517: Starting NSS
May 14 17:33:15 othub3 java[11878]: WARN : WebAppContext.java:554: Failed startup of context o.e.j.w.WebAppContext@5b94b04d{Bigdata,/blazegraph,jar:file:/root/ot-node/blazegraph.jar!/war,UNAVAILABLE}{jar:file:/root/ot-node/blazegraph.jar!/war}
May 14 17:33:15 othub3 java[11878]: java.lang.RuntimeException: java.lang.RuntimeException: addr=-19608250 : cause=java.lang.IllegalStateException: Invalid data checksum from address: 72130541568, size: 1104
May 14 17:33:15 othub3 java[11878]: at com.bigdata.rdf.sail.webapp.BigdataRDFServletContextListener.openIndexManager(BigdataRDFServletContextListener.java:816)
...
Caused by: java.lang.IllegalStateException: Invalid data checksum from address: 72130541568, size: 1104
May 14 17:33:15 othub3 java[11878]: at com.bigdata.rwstore.RWStore.getData(RWStore.java:2378)
...
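For triage, the failing address and record size can be pulled out of the log lines above, for instance to check whether repeated backups fail at the same location. The helper name `extract_checksum_errors` is hypothetical, and the pattern is based only on the message format shown in this report.

```shell
# Read log text on stdin and print one "address size" pair per distinct
# "Invalid data checksum" message.
extract_checksum_errors() {
  sed -n 's/.*Invalid data checksum from address: \([0-9][0-9]*\), size: \([0-9][0-9]*\).*/\1 \2/p' | sort -u
}
```

Assuming Blazegraph runs as the `blazegraph` systemd unit used in the steps above, the input could come from `journalctl -u blazegraph | extract_checksum_errors`.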

SPARQL query response:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 503 Service Unavailable</title>
</head>
<body><h2>HTTP ERROR 503</h2>
<p>Problem accessing /blazegraph/namespace/dkg/sparql. Reason:
<pre>    Service Unavailable</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/>
</body>
</html>

OTNode error:

Cannot connect to Triple store (OtBlazegraph), repository: privateCurrent, located at: http://localhost:9999 retry number: 2/10

Disclaimer

Please be aware that an issue reported on a public repository allows everyone to see your node logs, node details, and contact details. If you have any sensitive information, feel free to share it by sending an email to tech@origin-trail.com.
