Worker task for test analytics export #582
base: main
Conversation
Codecov Report ❌ Patch coverage is …
Additional details and impacted files:
@@ Coverage Diff @@
## main #582 +/- ##
==========================================
- Coverage 93.87% 93.79% -0.08%
==========================================
Files 1284 1286 +2
Lines 46528 46791 +263
Branches 1522 1525 +3
==========================================
+ Hits 43676 43887 +211
- Misses 2542 2594 +52
Partials 310 310
Flags with carried forward coverage won't be shown.
Codecov Report ❌
Your patch check has failed because the patch coverage (0.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
CodSpeed Performance Report: Merging #582 will not alter performance.
gcs_client = storage.Client(project=gcp_project_id)
bucket = gcs_client.bucket(destination_bucket)
Okay for now
    .values(*fields)
)
test_runs_data = [
    serialize_test_run(tr) for tr in list(test_runs_qs)
For repos with a large number of test runs, this seems like it could take up a lot of memory. Thoughts on possibly batching the processing of test runs if there are a lot?
Added batching and writing to a temp file to help w/ memory during runtime
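A minimal sketch of that batching approach, assuming a Django queryset and the serialize_test_run helper from this PR's diff (its exact signature is an assumption); batch size and file handling here are illustrative, not the PR's actual implementation.

```python
import json
import tempfile

BATCH_SIZE = 1000  # illustrative; the real task may use a different size


def write_test_runs_to_temp_file(test_runs_qs, fields, serialize):
    """Stream test runs in batches and write a compact JSON array to a temp file.

    `serialize` stands in for the PR's serialize_test_run helper (assumed signature).
    """
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False)
    with tmp as f:
        f.write("[")
        first = True
        # .iterator(chunk_size=...) keeps only one batch of rows in memory at a time,
        # instead of materializing the whole queryset with list(...).
        for tr in test_runs_qs.values(*fields).iterator(chunk_size=BATCH_SIZE):
            if not first:
                f.write(",")
            f.write(json.dumps(serialize(tr), separators=(",", ":")))
            first = False
        f.write("]")
    return tmp.name  # caller uploads this file via the archiver and unlinks it afterwards
```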
with open(json_file_path, "rb") as f:
    archiver._add_file(blob_name, f)

pathlib.Path(json_file_path).unlink()
I think we should have a try here so that .unlink still runs even if we run into a problem uploading the file.
Wrapped the _add_file call in a try/finally so that file cleanup always happens. Also added missing_ok=True in case the temp file isn't created for some reason, so we avoid an exception from the unlink.
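A minimal sketch of that cleanup pattern, assuming the archiver, blob_name, and json_file_path names from the diff; the surrounding task code is elided.

```python
import pathlib

try:
    with open(json_file_path, "rb") as f:
        archiver._add_file(blob_name, f)
finally:
    # Remove the temp file even if the upload fails; missing_ok=True avoids
    # an exception if the file was never created in the first place.
    pathlib.Path(json_file_path).unlink(missing_ok=True)
```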
| "integration_name": integration_name, | ||
| "repositories_succeeded": repositories_succeeded, | ||
| "repositories_failed": repositories_failed, | ||
| } |
Bug: Inconsistent return structure missing successful field
The error return paths include a "successful": False field, but the success return path at lines 204-209 omits the successful key entirely. This creates an inconsistent API contract where callers cannot reliably check result["successful"] - it would raise a KeyError on success. Other similar tasks in the codebase consistently include "successful": True in their success returns.
Additional Locations (2)
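A minimal sketch of the consistent return shape the comment asks for, using the field names shown in the diff; the exact success payload in the PR may differ.

```python
# Success path: include "successful": True so callers can always check
# result["successful"] without risking a KeyError.
return {
    "successful": True,
    "integration_name": integration_name,
    "repositories_succeeded": repositories_succeeded,
    "repositories_failed": repositories_failed,
}
```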
def _upload(self) -> None:
    self.archive.close()
    blob_name = str(self.prefix / f"{self.counter}.tar.gz")
    blob = self.bucket.blob(blob_name)
    blob.upload_from_filename(self.current_archive, content_type="application/gzip")
Bug: Missing error handling for _upload() in _Archiver.__exit__() can crash the task.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The _Archiver.__exit__() method calls self._upload() at line 70 without any error handling. If _upload() (defined at lines 35-39) fails during the final archive upload (e.g., due to network issues, GCS permissions, or bucket not found), the exception will propagate uncaught. This will crash the task, leading to unexpected server failures and potential data loss for successfully processed repositories.
💡 Suggested Fix
Wrap the self._upload() call within the _Archiver.__exit__() method in a try-except block to catch and handle potential exceptions during GCS upload, preventing task crashes.
Location: libs/shared/shared/storage/data_exporter.py#L35-L39
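A minimal sketch of the suggested fix, assuming the _Archiver attributes shown in the diff (self._upload, self.archive); real error handling and logging would follow the task's own conventions.

```python
import logging

log = logging.getLogger(__name__)


class _Archiver:
    # ... constructor, _upload, _next_chunk, _add_file elided; see the PR diff ...

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            # Flush the final (possibly partial) archive chunk to GCS.
            self._upload()
        except Exception:
            # Don't let a failed final upload crash the whole task; the caller
            # can still report the repositories that were archived successfully.
            log.exception("Failed to upload final archive chunk")
        return False  # never swallow exceptions raised inside the with-block
```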
self.entries += 1
if self.current_archive.stat().st_size >= ARCHIVE_SIZE_THRESHOLD:
    self._upload()
    self._next_chunk()
Bug: Failed upload leaves archiver in broken state
When the archive size exceeds ARCHIVE_SIZE_THRESHOLD, _upload() is called which closes the archive before uploading. If the upload fails (e.g., network error during upload_from_filename), the exception propagates without calling _next_chunk(), leaving self.archive pointing to a closed tarfile. In the task code, this exception is caught and processing continues, but subsequent _add_file() calls will fail on self.archive.addfile() since the archive is closed. This causes cascading failures for all remaining repositories after a single upload failure.
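One illustrative way to avoid the broken state, assuming the _upload and _next_chunk methods shown above: always rotate to a fresh chunk, even when the upload raises, so later _add_file calls don't hit a closed tarfile. This is a sketch of the pattern, not the PR's actual fix.

```python
self.entries += 1
if self.current_archive.stat().st_size >= ARCHIVE_SIZE_THRESHOLD:
    try:
        self._upload()
    finally:
        # Open a new chunk even if the upload failed, so the archiver stays
        # usable for subsequent repositories instead of cascading failures
        # on a closed tarfile.
        self._next_chunk()
```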
This is the worker task for the TA data export.
The task fetches test run data for a provided integration and uploads a tarfile to the given bucket.
This is the second part of the #563 endpoint work; it should be merged after #563 to ease PR review.
https://linear.app/getsentry/issue/CCMRG-1901/ta-build-codecov-worker-task-for-data-export
Note
Adds a worker task to stream/export test analytics runs per repo to JSON and upload as chunked tar.gz files to GCS, with a shared archiver utility and tests.
- tasks/export_test_analytics_data.py: streams Testrun data per repo (compact JSON arrays) and uploads via _Archiver to GCS (tar.gz chunks) with success/fail tracking and Sentry reporting; registered in tasks/__init__.py.
- shared/storage/data_exporter.py: _Archiver manages chunked tar.gz creation and GCS uploads; supports _add_file and upload_json.
- Tests added (tasks/tests/unit/test_export_test_analytics_data.py).
- Adds google-cloud-storage to libs/shared/pyproject.toml and root pyproject.toml.
Written by Cursor Bugbot for commit e515b84.
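A rough sketch of how these pieces fit together. The _Archiver constructor arguments, upload_json signature, and the fetch_test_runs and serialize_test_run helpers are hypothetical stand-ins for the actual code in this PR's diff.

```python
from google.cloud import storage


def export_test_analytics_data(integration_name, repoids, destination_bucket, gcp_project_id):
    """Illustrative flow: one JSON payload per repo, archived into chunked tar.gz files on GCS."""
    gcs_client = storage.Client(project=gcp_project_id)
    bucket = gcs_client.bucket(destination_bucket)

    repositories_succeeded, repositories_failed = [], []
    with _Archiver(bucket, prefix=integration_name) as archiver:  # hypothetical constructor
        for repoid in repoids:
            try:
                data = [serialize_test_run(tr) for tr in fetch_test_runs(repoid)]  # hypothetical helpers
                archiver.upload_json(f"{repoid}.json", data)  # signature assumed
                repositories_succeeded.append(repoid)
            except Exception:
                repositories_failed.append(repoid)

    return {
        "successful": True,
        "integration_name": integration_name,
        "repositories_succeeded": repositories_succeeded,
        "repositories_failed": repositories_failed,
    }
```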