Add retry to VictoriaLogs data source #3654

thebondo · 2025-05-30T13:44:01Z

The current VictoriaLogs data source will exit if the data source goes away once a tail based connection has been established. This causes the whole crowdsec service to exit as well.

This update fixes the Tail method on the VictoriaLogs client so that it tries to reconnect with a backoff if the connection is lost.

Resolves #3653

If the original HTTP get for the tail endpoint is successful, but then the connection is lost, no retry is done. I updated the Tail method to also retry in this case.

Upon review, the added code was pretty different in the approach used to keep retrying compared to the approach for the QueryRange method. I updated the method to create a new doTail that has the same style as doQueryRange and updated Tail to use it. This has the following effects: - doTail will keep trying after losing a connection - the retry interval will grow (with an upper limit) and shrink (with a lower limit) as connections are made and broken - the time in the request is updated to avoid overlapping with previous data that was returned (missing in the first fix)

Bringing in changes from master.

Keeping up with changes from origin

The use of the ticker was unnecessary. I updated doTail to use a backoff interval with time.After.

github-actions · 2025-05-30T13:44:12Z

@thebondo: There are no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.

/kind feature
/kind enhancement
/kind refactoring
/kind fix
/kind chore
/kind dependencies

Details

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

github-actions · 2025-05-30T13:44:13Z

@thebondo: There are no area labels on this PR. You can add as many areas as you see fit.

/area agent
/area local-api
/area cscli
/area appsec
/area security
/area configuration

Details

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

thebondo · 2025-05-30T13:44:41Z

/kind fix
/area agent

zekker6 · 2025-07-11T04:47:57Z

Hello @thebondo, do you think you will have some time to continue work on this PR?
If not, I can take over and create a PR with your commit and get it ready for review (attributing you as an original author).

thebondo · 2025-07-17T06:21:52Z

Hello @zekker6. Yes, I can still contribute on this. I have a question about the approach for the final fix.

As I understand it, the tail endpoint is expected to stay open and report log entries when received (after some possible delay, which helps keep entries sorted by time). However, the connection is sometimes lost. Ideally, the tail consumer would like to see all log entries exactly once. However, there does not seem to be a good way to start a new tail query that will both

return all log entries received after that last query disconnected
not return any log entries received before the last query disconnected

The solution I have so far is to keep track of the latest timestamp seen while the tail connection is open. If the connection is lost, then compute start_offset based on the time of the new query, the latest timestamp, and possible a small constant to account for potential delays in the request process.

This has the drawback that it can potentially provide a log entry more than once. From looking at the VictoriaLogs source, the start time for the tail interval appears to be calculated as time.Now() - offset - start_offset. That means that the consumer cannot specify the start time exactly because it involves the time the request is actually handled by VictoriaLogs (which is the reason for adding a small constant to start_offset).

Does this all sound correct to you? If so, I will update the code to use start_offset. I will also remove the use of the limit argument, which is not documented for the tail endpoint and does not appear to do anything either.

Keeping the fork up to date with the origin repository.

The tail query endpoint does not support start, but start_offset. I updated the doTail method to use this parameter, and calculate the required value each time the query is attempted based on the desired start time for the results returned from the query.

zekker6 · 2025-07-17T10:11:07Z

@thebondo Thank you for an update!

Does this all sound correct to you? If so, I will update the code to use start_offset. I will also remove the use of the limit argument, which is not documented for the tail endpoint and does not appear to do anything either.

Yes, this sounds correct to me.

thebondo added 5 commits May 16, 2025 15:22

update the client to retry the connection

4dbeffb

If the original HTTP get for the tail endpoint is successful, but then the connection is lost, no retry is done. I updated the Tail method to also retry in this case.

Merge branch 'master' into victorialogs-retry-bug

823b16a

Bringing in changes from master.

Merge branch 'master' into victorialogs-retry-bug

80a4483

Keeping up with changes from origin

Updated tail method

b1ff107

The use of the ticker was unnecessary. I updated doTail to use a backoff interval with time.After.

github-actions bot added the needs/kind label May 30, 2025

github-actions bot added the needs/area label May 30, 2025

github-actions bot added kind/fix area/agent and removed needs/kind needs/area labels May 30, 2025

thebondo marked this pull request as ready for review May 30, 2025 13:54

thebondo marked this pull request as draft May 30, 2025 13:54

thebondo added 2 commits July 17, 2025 06:31

Merge branch 'master' into victorialogs-retry-bug

05a564c

Keeping the fork up to date with the origin repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add retry to VictoriaLogs data source #3654

Add retry to VictoriaLogs data source #3654

Uh oh!

thebondo commented May 30, 2025 •

edited

Loading

Uh oh!

github-actions bot commented May 30, 2025

Uh oh!

github-actions bot commented May 30, 2025

Uh oh!

thebondo commented May 30, 2025

Uh oh!

zekker6 commented Jul 11, 2025

Uh oh!

thebondo commented Jul 17, 2025

Uh oh!

zekker6 commented Jul 17, 2025

Uh oh!

Uh oh!

Add retry to VictoriaLogs data source #3654

Are you sure you want to change the base?

Add retry to VictoriaLogs data source #3654

Uh oh!

Conversation

thebondo commented May 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented May 30, 2025

Uh oh!

github-actions bot commented May 30, 2025

Uh oh!

thebondo commented May 30, 2025

Uh oh!

zekker6 commented Jul 11, 2025

Uh oh!

thebondo commented Jul 17, 2025

Uh oh!

zekker6 commented Jul 17, 2025

Uh oh!

Uh oh!

thebondo commented May 30, 2025 •

edited

Loading