Skip to content

Conversation

@weiqianwang123
Copy link

Summary

Some beets commands such as:

beet modify -a onplayer=true queen

update only custom fields that are not mapped to media tags.
However, the current implementation still performs MediaFile.save(), causing:

unnecessary file writes

updated mtimes

20–30 second delays on large libraries

slow iPod sync / alternative device workflows

Root Cause

Item.write() filters _media_fields, but it always writes even if:

none of the media-tag fields changed

only flexible/custom database fields changed

Fix

Before writing, compute whether any media-tag-backed field changed.
If no media tag differences exist, skip the write entirely.

This preserves existing behavior for:

imports

replaygain

zero / scrub

metadata updates

Testing

A 0.5s sine-wave mp3 was generated via ffmpeg:

ffmpeg -f lavfi -i sine=frequency=440:duration=0.5 tiny.mp3
beet import tiny.mp3

Then:

Should NOT write to file

beet modify -a onplayer=true tiny

Should write to file

beet modify title="new title" tiny

Screenshots attached.

Performance improvement

Before: 25 seconds
After: <1 second
(repeated 200× batch modify)

DCO

All commits signed-off.

Related Issue: #5668

This patch prevents Item.write() from performing a mediafile.save()
when no writable media tag fields have changed. Without this change,
commands such as 'beet modify -a onplayer=true' trigger unnecessary
file writes, updating mtimes and causing significant slowdowns.

Fix verified with a tiny mp3 test case.

Signed-off-by: Qianwei Wang <qweiw@umich.edu>
Signed-off-by: weiqianwang123 <1416740298@qq.com>
@weiqianwang123 weiqianwang123 requested a review from a team as a code owner December 8, 2025 02:17
Copilot AI review requested due to automatic review settings December 8, 2025 02:17
@github-actions
Copy link

github-actions bot commented Dec 8, 2025

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Loading original = Item.from_path(self.path) unconditionally in try_sync adds extra I/O even when write is False; consider only constructing original when write is True and short‑circuiting earlier if not.
  • Using Item.from_path(self.path) to detect changes may throw if the file is missing or unreadable, whereas try_write previously handled its own errors; consider wrapping this in similar error handling or falling back to writing when comparison fails.
  • Instead of reconstructing the original item from disk for _media_tags_changed, consider comparing against the pre‑modify database state (e.g., via self._db.get_item(self.id)) to avoid extra file reads and keep the change detection purely metadata‑based.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Loading `original = Item.from_path(self.path)` unconditionally in `try_sync` adds extra I/O even when `write` is False; consider only constructing `original` when `write` is True and short‑circuiting earlier if not.
- Using `Item.from_path(self.path)` to detect changes may throw if the file is missing or unreadable, whereas `try_write` previously handled its own errors; consider wrapping this in similar error handling or falling back to writing when comparison fails.
- Instead of reconstructing the original item from disk for `_media_tags_changed`, consider comparing against the pre‑modify database state (e.g., via `self._db.get_item(self.id)`) to avoid extra file reads and keep the change detection purely metadata‑based.

## Individual Comments

### Comment 1
<location> `beets/library/models.py:1016` </location>
<code_context>
         (conditionally).
         """
-        if write:
+        original = Item.from_path(self.path)
+
+        # only write tags if media tags changed
</code_context>

<issue_to_address>
**issue:** Handle failures when loading `original` via `Item.from_path` to avoid breaking `try_sync` on IO/tagging errors.

Because `try_sync` now calls `Item.from_path(self.path)`, it can fail due to transient or unrelated IO/tagging issues (missing file, permissions, corrupt metadata) and abort the entire sync. Consider wrapping `Item.from_path` in a try/except and, on failure, either skip the change-detection optimization (always write) or log and continue, so `try_sync` degrades gracefully instead of raising unexpectedly.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

(conditionally).
"""
if write:
original = Item.from_path(self.path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Handle failures when loading original via Item.from_path to avoid breaking try_sync on IO/tagging errors.

Because try_sync now calls Item.from_path(self.path), it can fail due to transient or unrelated IO/tagging issues (missing file, permissions, corrupt metadata) and abort the entire sync. Consider wrapping Item.from_path in a try/except and, on failure, either skip the change-detection optimization (always write) or log and continue, so try_sync degrades gracefully instead of raising unexpectedly.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to optimize the modify command by avoiding unnecessary file writes when only non-media fields (custom/flexible database fields) are changed. The goal is to reduce I/O operations, prevent unnecessary mtime updates, and improve performance significantly (from 25 seconds to <1 second for 200 batch modifications).

Key changes:

  • Added _media_tags_changed() method to compare media tag fields between two Item instances
  • Modified try_sync() to conditionally skip file writes when only database-only fields are modified
  • Introduced logic to detect whether media tag fields have actually changed before writing to disk

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

(conditionally).
"""
if write:
original = Item.from_path(self.path)
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Item.from_path() can raise a ReadError exception if the file is unreadable, missing, or corrupted. This exception is not caught, which would cause try_sync() to fail unexpectedly.

This is particularly problematic because:

  1. The method is called try_sync(), implying it should handle errors gracefully
  2. File corruption or deletion could occur between modification and sync
  3. The original code path didn't read from disk at this point

Consider wrapping this in a try-except block, or using a different approach that doesn't require re-reading the file.

Suggested change
original = Item.from_path(self.path)
try:
original = Item.from_path(self.path)
except ReadError as exc:
log.error("{}", exc)
return False

Copilot uses AI. Check for mistakes.
Comment on lines +1016 to 1020
original = Item.from_path(self.path)

# only write tags if media tags changed
if write and self._media_tags_changed(original, self):
self.try_write()
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new optimization logic in try_sync() lacks test coverage. Consider adding tests to verify:

  1. Skipping writes when only flexible/custom fields are modified (e.g., onplayer=true)
  2. Performing writes when media tag fields are modified (e.g., title="new title")
  3. Handling edge cases like missing or corrupted files when checking for changes
  4. Performance improvement is achieved (optional: benchmark test)

The existing WriteTest class in test/test_library.py would be a good place to add these tests.

Copilot uses AI. Check for mistakes.
return False

def _media_tags_changed(self, original, modified):
for f in self._media_tag_fields: # this is defined in the class already
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _media_tags_changed method lacks documentation. Consider adding a docstring to explain:

  • What the method does
  • The parameters original and modified (expected to be Item instances)
  • Return value (boolean indicating if any media tag fields differ)

Additionally, the inline comment "this is defined in the class already" is unnecessary and can be removed.

Suggested change
for f in self._media_tag_fields: # this is defined in the class already
"""
Determine if any media tag fields differ between two Item instances.
Parameters:
original (Item): The original Item instance.
modified (Item): The modified Item instance.
Returns:
bool: True if any media tag fields differ, False otherwise.
"""
for f in self._media_tag_fields:

Copilot uses AI. Check for mistakes.
(conditionally).
"""
if write:
original = Item.from_path(self.path)
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Item.from_path() here defeats the performance optimization. from_path() creates a new Item and calls read(), which opens and parses the entire media file using MediaFile(syspath(read_path)). This is the expensive I/O operation the PR aims to avoid.

Instead, consider tracking which fields have been modified since the last read/write. For example:

  • Track dirty fields in a set when fields are modified via __setitem__
  • Check if any dirty fields intersect with _media_tag_fields
  • Only write if media tag fields have been modified

Alternatively, compare self against the in-memory state before any modifications were made, rather than re-reading from disk.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant