Git provider
Other
System Info
- Deployment Type: Self-hosted Docker App
- Docker Image: codiumai/pr-agent:0.34-gitea_app
- Git Provider: Gitea v1.25.5 (Self-hosted)
- Trigger Method: Gitea Webhook (handle_gitea_webhooks)
- LLM Model: gpt-5.4-nano
- Relevant Config: patch_extension_skip_types correctly includes [".webp", ".mp3", ".png"], but the crash bypasses this mechanism.
Bug details
Describe the bug
When PR-Agent processes a Gitea webhook for a PR that replaces old media files with new binary formats (e.g., deleting existing .svg/.png and adding new .webp/.mp3 files), it crashes with a UnicodeDecodeError.
The crash occurs during the file content retrieval and diff generation phases (get_file_content and __add_file_diff). It appears the system attempts to decode the binary payload as utf-8 before the patch_extension_skip_types filter can effectively exclude these files.
Additionally, the Gitea API returns 404 Not Found for some of these assets (likely because the agent attempts to fetch the content of files that were just deleted, or because of how binary assets are routed). The 404 is not handled gracefully and leads to further cascading errors.
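To make the 404 point concrete, here is a minimal sketch of the handling I have in mind. It is not the actual gitea_provider.py code: ApiError is a hypothetical stand-in for whatever exception the Gitea client raises, and fetch_or_none and the fetch callable are names made up for illustration.

```python
from typing import Callable, Optional


class ApiError(Exception):
    """Hypothetical stand-in for the Gitea client's HTTP error type."""

    def __init__(self, status: int, reason: str = ""):
        super().__init__(f"({status}) {reason}")
        self.status = status


def fetch_or_none(fetch: Callable[[str], bytes], path: str) -> Optional[bytes]:
    """Return the raw file bytes, or None when the API answers 404."""
    try:
        return fetch(path)
    except ApiError as err:
        if err.status == 404:
            # The file does not exist on this ref (for example, it was
            # deleted in the PR), so treat it as "no content", not a crash.
            return None
        raise  # Any other status is still a real error.
```

With something like this, a deleted q1bg.png would simply yield no content for that side of the diff, while other HTTP failures would still surface.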
To Reproduce
- Set up PR-Agent using the codiumai/pr-agent:0.34-gitea_app image.
- Create a Pull Request in Gitea (v1.25.5) that deletes old media files and adds new binary files (e.g., removing q1bg.png and adding q1bg.webp).
- Trigger the PR-Agent via Gitea webhook.
- The agent triggers 404s for the assets, crashes due to UTF-8 decoding errors, and returns an empty prediction.
Expected behavior
gitea_provider.py should filter out ignored extensions before attempting to decode the raw content, and should not crash with UnicodeDecodeError on anything that slips through (e.g., by decoding with errors='replace' or by checking MIME types first). API 404 errors (especially for deleted files) should also be handled without crashing the main process.
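A rough sketch of the ordering I would expect, assuming the provider has the file path and the raw bytes in hand at decode time. The function name decode_file_content and the SKIP_TYPES constant are hypothetical (the values mirror my patch_extension_skip_types setting), and the NUL-byte check is only a common heuristic for binary data, not something PR-Agent is known to do.

```python
import os
from typing import Iterable, Optional

# Mirrors the patch_extension_skip_types values from the report (assumption:
# the real setting is read from config, not hard-coded like this).
SKIP_TYPES = (".webp", ".mp3", ".png")


def decode_file_content(
    path: str, raw: bytes, skip_types: Iterable[str] = SKIP_TYPES
) -> Optional[str]:
    """Decode raw file bytes to text, or return None for files to skip."""
    ext = os.path.splitext(path)[1].lower()
    if ext in {s.lower() for s in skip_types}:
        return None  # Filter by extension BEFORE any decoding happens.
    if b"\x00" in raw:
        return None  # Heuristic: NUL bytes usually mean binary content.
    # Never raise on stray bytes; invalid sequences become U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

The point of the design is only the order of checks: the extension filter runs first, so the .webp/.mp3 payloads from my PR would never reach the decoder.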
Relevant PR Diff (Example highlighting deletions and additions)
diff --git a/assets/images/screens/q1bg.png b/assets/images/screens/q1bg.png
deleted file mode 100644
index a35d62f..0000000
Binary files a/assets/images/screens/q1bg.png and /dev/null differ
diff --git a/assets/images/screens/q1bg.webp b/assets/images/screens/q1bg.webp
new file mode 100644
index 0000000..dd7b99b
Binary files /dev/null and b/assets/images/screens/q1bg.webp differ
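The diff above already marks both files as binary, so one possible safeguard is to read those markers before fetching or decoding any content. The sketch below is an illustration only: binary_paths is a hypothetical helper, and the regex assumes paths without spaces, as in this example.

```python
import re
from typing import Set

# Matches git's "Binary files <old> and <new> differ" marker lines.
_BINARY_LINE = re.compile(r"^Binary files (\S+) and (\S+) differ$", re.MULTILINE)


def binary_paths(diff_text: str) -> Set[str]:
    """Collect repository paths that the diff itself reports as binary."""
    paths = set()
    for old, new in _BINARY_LINE.findall(diff_text):
        for side in (old, new):
            if side == "/dev/null":
                continue  # Added or deleted side: no file there.
            # Drop git's "a/" or "b/" prefix to get the repository path.
            paths.add(side[2:] if side[:2] in ("a/", "b/") else side)
    return paths
```

Any path returned here could be excluded from get_file_content calls entirely, which would avoid both the 404 for the deleted .png and the decode error for the new .webp.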
Logs
# 1. 404 Error when fetching assets (Likely related to deleted or binary files)
file: /app/pr_agent/git_providers/gitea_provider.py
function: get_file_content (line 947)
ERROR: Error getting file: assets/images/screens/q1bg.webp, content: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'max-age=0, private, must-revalidate, no-transform', 'Content-Type': 'application/json;charset=utf-8', ...})
HTTP response body: b'{"errors":null,"message":"not found","url":"https://<gitea-host>/api/swagger"}\n'
# 2. UTF-8 Decoding Crash in get_file_content
file: /app/pr_agent/git_providers/gitea_provider.py
function: get_file_content (line 950)
ERROR: Unexpected error: 'utf-8' codec can't decode byte 0x86 in position 4: invalid start byte
# 3. UTF-8 Decoding Crash in __add_file_diff
file: /app/pr_agent/git_providers/gitea_provider.py
function: __add_file_diff (line 152)
ERROR: Error getting diff content: 'utf-8' codec can't decode byte 0x84 in position 2007: invalid start byte
# 4. Final failure
file: /app/pr_agent/tools/pr_description.py
WARNING: Empty prediction, PR: <repo>/<pr_id>