-
Notifications
You must be signed in to change notification settings - Fork 2k
Empty metadata support for autotagger plugins #6065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `test/plugins/test_musicbrainz.py:1035-1040` </location>
<code_context>
+ ("Artist", "Title", 1),
+ (None, "Title", 1),
+ ("Artist", None, 1),
+ (None, None, 0),
+ ],
+ )
+ def test_item_candidates(
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test case for empty strings as artist/title.
Please include test cases with empty strings ("", " ") for artist and/or title to verify this conversion logic is properly tested.
```suggestion
[
("Artist", "Title", 1),
(None, "Title", 1),
("Artist", None, 1),
(None, None, 0),
("", "Title", 1),
("Artist", "", 1),
("", "", 0),
(" ", "Title", 1),
("Artist", " ", 1),
(" ", " ", 0),
(None, "", 0),
("", None, 0),
(None, " ", 0),
(" ", None, 0),
],
```
</issue_to_address>
### Comment 2
<location> `test/plugins/test_musicbrainz.py:1055` </location>
<code_context>
)
- candidates = list(mb.item_candidates(Item(), "hello", "there"))
+ candidates = list(mb.item_candidates(Item(), artist, title))
- assert len(candidates) == 1
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test for error handling when plugin returns unexpected data.
Please add a test where the mocked plugin returns malformed or incomplete data to verify the function handles it without crashing.
</issue_to_address>
### Comment 3
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 4
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
item: Item,
search_artist: str | None = None,
search_title: str | None = None,
search_ids: list[str] | None = None,
) -> Proposal:
"""Find metadata for a single track. Return a `Proposal` consisting
of `TrackMatch` objects.
`search_artist` and `search_title` may be used to override the item
metadata in the search query. `search_ids` may be used for restricting the
search to a list of metadata backend IDs.
"""
# Holds candidates found so far: keys are MBIDs; values are
# (distance, TrackInfo) pairs.
candidates = {}
rec: Recommendation | None = None
# First, try matching by the external source ID.
trackids = search_ids or [t for t in [item.mb_trackid] if t]
if trackids:
for trackid in trackids:
log.debug("Searching for track ID: {}", trackid)
if info := metadata_plugins.track_for_id(trackid):
dist = track_distance(item, info, incl_artist=True)
candidates[info.track_id] = hooks.TrackMatch(dist, info)
# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if (
rec == Recommendation.strong
and not config["import"]["timid"]
):
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)
# If we're searching by ID, don't proceed.
if search_ids:
if candidates:
assert rec is not None
return Proposal(_sort_candidates(candidates.values()), rec)
else:
return Proposal([], Recommendation.none)
# Search terms.
search_artist = search_artist or item.artist
search_title = search_title or item.title or item.filepath.stem
log.debug("Item search terms: {} - {}", search_artist, search_title)
# Replace empty string with None
if isinstance(search_artist, str) and search_artist.strip() == "":
search_artist = None
if isinstance(search_title, str) and search_title.strip() == "":
search_title = None
# Get and evaluate candidate metadata.
for track_info in metadata_plugins.item_candidates(
item, search_artist, search_title
):
dist = track_distance(item, track_info, incl_artist=True)
candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
# Sort by distance and return with recommendation.
log.debug("Found {} candidates.", len(candidates))
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6065 +/- ##
==========================================
- Coverage 66.98% 66.93% -0.06%
==========================================
Files 118 118
Lines 18189 18206 +17
Branches 3079 3084 +5
==========================================
+ Hits 12184 12186 +2
- Misses 5345 5359 +14
- Partials 660 661 +1
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `test/plugins/test_musicbrainz.py:1057-1058` </location>
<code_context>
- assert len(candidates) == 1
- assert candidates[0].track_id == self.RECORDING["id"]
+ assert len(candidates) == expected_count
+ if expected_count == 1:
+ assert candidates[0].track_id == self.RECORDING["id"]
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for empty string values ("") for artist and title.
Tests currently check for None but not for empty strings. Since empty strings are normalized to None, please add test cases for empty string inputs to verify this behavior.
</issue_to_address>
### Comment 2
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 3
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
item: Item,
search_artist: str | None = None,
search_title: str | None = None,
search_ids: list[str] | None = None,
) -> Proposal:
"""Find metadata for a single track. Return a `Proposal` consisting
of `TrackMatch` objects.
`search_artist` and `search_title` may be used to override the item
metadata in the search query. `search_ids` may be used for restricting the
search to a list of metadata backend IDs.
"""
# Holds candidates found so far: keys are MBIDs; values are
# (distance, TrackInfo) pairs.
candidates = {}
rec: Recommendation | None = None
# First, try matching by the external source ID.
trackids = search_ids or [t for t in [item.mb_trackid] if t]
if trackids:
for trackid in trackids:
log.debug("Searching for track ID: {}", trackid)
if info := metadata_plugins.track_for_id(trackid):
dist = track_distance(item, info, incl_artist=True)
candidates[info.track_id] = hooks.TrackMatch(dist, info)
# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if (
rec == Recommendation.strong
and not config["import"]["timid"]
):
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)
# If we're searching by ID, don't proceed.
if search_ids:
if candidates:
assert rec is not None
return Proposal(_sort_candidates(candidates.values()), rec)
else:
return Proposal([], Recommendation.none)
# Search terms.
search_artist = search_artist or item.artist
search_title = search_title or item.title or item.filepath.stem
log.debug("Item search terms: {} - {}", search_artist, search_title)
# Replace empty string with None
if isinstance(search_artist, str) and search_artist.strip() == "":
search_artist = None
if isinstance(search_title, str) and search_title.strip() == "":
search_title = None
# Get and evaluate candidate metadata.
for track_info in metadata_plugins.item_candidates(
item, search_artist, search_title
):
dist = track_distance(item, track_info, incl_artist=True)
candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
# Sort by distance and return with recommendation.
log.debug("Found {} candidates.", len(candidates))
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
### Comment 4
<location> `beetsplug/discogs.py:190-193` </location>
<code_context>
def candidates(
self,
items: Sequence[Item],
artist: str | None,
album: str | None,
va_likely: bool,
) -> Iterable[AlbumInfo]:
query = ""
if artist is not None:
query += artist
if album is not None:
query += f" {album}"
if va_likely:
query = album or ""
query = query.strip()
if not query:
return []
return self.get_albums(query)
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
return [] if not query else self.get_albums(query)
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
e2683ac to
b525d4c
Compare
88450c7 to
4f0beba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- In beatport._get_releases the early
returnproduces None (not an iterable), which will breakyield from—change it to return an empty list or generator to keep the expected iterable type. - After changing the signature of
metadata_plugins.candidates/item_candidates, double-check that all existing plugins updated their implementations (or add a compatibility shim) so they won’t error when receiving keyword args instead of positional.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In beatport._get_releases the early `return` produces None (not an iterable), which will break `yield from`—change it to return an empty list or generator to keep the expected iterable type.
- After changing the signature of `metadata_plugins.candidates`/`item_candidates`, double-check that all existing plugins updated their implementations (or add a compatibility shim) so they won’t error when receiving keyword args instead of positional.
## Individual Comments
### Comment 1
<location> `beetsplug/beatport.py:463-468` </location>
<code_context>
+
+ # query may be empty strings
+ # We want to skip the lookup in this case.
+ if not query.strip():
+ self._log.debug(
+ "Empty search query after preprocessing, skipping {.data_source}.",
+ self,
+ )
+ return
+
for beatport_release in self.client.search(query, "release"):
</code_context>
<issue_to_address>
**issue (bug_risk):** Returning None instead of an empty iterable may cause issues for consumers expecting an iterable.
Returning None here may cause runtime errors if the caller uses iteration. Use 'return []' to ensure consistency and prevent such issues.
</issue_to_address>
### Comment 2
<location> `beets/autotag/match.py:246` </location>
<code_context>
def tag_album(
items,
search_artist: str | None = None,
search_album: str | None = None,
search_ids: list[str] = [],
) -> tuple[str, str, Proposal]:
"""Return a tuple of the current artist name, the current album
name, and a `Proposal` containing `AlbumMatch` candidates.
The artist and album are the most common values of these fields
among `items`.
The `AlbumMatch` objects are generated by searching the metadata
backends. By default, the metadata of the items is used for the
search. This can be customized by setting the parameters.
`search_ids` is a list of metadata backend IDs: if specified,
it will restrict the candidates to those IDs, ignoring
`search_artist` and `search album`. The `mapping` field of the
album has the matched `items` as keys.
The recommendation is calculated from the match quality of the
candidates.
"""
# Get current metadata.
likelies, consensus = get_most_common_tags(items)
cur_artist: str = likelies["artist"]
cur_album: str = likelies["album"]
log.debug("Tagging {} - {}", cur_artist, cur_album)
# The output result, keys are the MB album ID.
candidates: dict[Any, AlbumMatch] = {}
# Search by explicit ID.
if search_ids:
for search_id in search_ids:
log.debug("Searching for album ID: {}", search_id)
if info := metadata_plugins.album_for_id(search_id):
_add_candidate(items, candidates, info)
# Use existing metadata or text search.
else:
# Try search based on current ID.
if info := match_by_id(items):
_add_candidate(items, candidates, info)
rec = _recommendation(list(candidates.values()))
log.debug("Album ID match recommendation is {}", rec)
if candidates and not config["import"]["timid"]:
# If we have a very good MBID match, return immediately.
# Otherwise, this match will compete against metadata-based
# matches.
if rec == Recommendation.strong:
log.debug("ID match.")
return (
cur_artist,
cur_album,
Proposal(list(candidates.values()), rec),
)
# Search terms.
_search_artist, _search_album = _parse_search_terms(
(search_artist, cur_artist),
(search_album, cur_album),
)
log.debug("Search terms: {} - {}", _search_artist, _search_album)
# Is this album likely to be a "various artist" release?
va_likely = (
(not consensus["artist"])
or (_search_artist.lower() in VA_ARTISTS)
or any(item.comp for item in items)
)
log.debug("Album might be VA: {}", va_likely)
# Get the results from the data sources.
for matched_candidate in metadata_plugins.candidates(
items, _search_artist, _search_album, va_likely
):
_add_candidate(items, candidates, matched_candidate)
log.debug("Evaluating {} candidates.", len(candidates))
# Sort and get the recommendation.
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return cur_artist, cur_album, Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** Replace mutable default arguments with None ([`default-mutable-arg`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/default-mutable-arg/))
</issue_to_address>
### Comment 3
<location> `beets/autotag/match.py:334` </location>
<code_context>
def tag_item(
item: Item,
search_artist: str | None = None,
search_title: str | None = None,
search_ids: list[str] | None = None,
) -> Proposal:
"""Find metadata for a single track. Return a `Proposal` consisting
of `TrackMatch` objects.
`search_artist` and `search_title` may be used to override the item
metadata in the search query. `search_ids` may be used for restricting the
search to a list of metadata backend IDs.
"""
# Holds candidates found so far: keys are MBIDs; values are
# (distance, TrackInfo) pairs.
candidates = {}
rec: Recommendation | None = None
# First, try matching by the external source ID.
trackids = search_ids or [t for t in [item.mb_trackid] if t]
if trackids:
for trackid in trackids:
log.debug("Searching for track ID: {}", trackid)
if info := metadata_plugins.track_for_id(trackid):
dist = track_distance(item, info, incl_artist=True)
candidates[info.track_id] = hooks.TrackMatch(dist, info)
# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if (
rec == Recommendation.strong
and not config["import"]["timid"]
):
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)
# If we're searching by ID, don't proceed.
if search_ids:
if candidates:
assert rec is not None
return Proposal(_sort_candidates(candidates.values()), rec)
else:
return Proposal([], Recommendation.none)
# Search terms.
_search_artist, _search_title = _parse_search_terms(
(search_artist, item.artist),
(search_title, item.title),
)
log.debug("Item search terms: {} - {}", _search_artist, _search_title)
# Get and evaluate candidate metadata.
for track_info in metadata_plugins.item_candidates(
item,
_search_artist,
_search_title,
):
dist = track_distance(item, track_info, incl_artist=True)
candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
# Sort by distance and return with recommendation.
log.debug("Found {} candidates.", len(candidates))
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
4f0beba to
a280237
Compare
This reverts commit a27cf64.
|
@snejus How do we want to continue here? The main concern was the duplication as far as I can remember. It might be possible to move the duplication check into an api layer (or shared session) at some point but right now I dont see how we can dedupe this otherwise. |
|
Forgot about this PR! My main concern is that I'm leaning towards making it inherit
|
|
I think the proposed change creates more problems than it solves. Maybe we are stuck here without a bigger refactor. Chroma does define Switching it to inherit directly from It also goes against one of I agree that from a maintainer standpoint, the proposal may simplify internal maintenance slightly. But it adds ambiguity for plugin authors, increases the risk of mismatched expectations in future core changes, and could even break existing workflows for users. TLDR: chroma fits the interface well enough, and |
snejus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with your last comment - let's just dedupe this logic and we're good to go
| return track_info | ||
|
|
||
| def item_candidates( | ||
| self, item: Item, artist: str, title: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing seems to have changed in this line. Re your own comment: #6184 (comment)
|
|
||
| # query may be empty strings | ||
| # We want to skip the lookup in this case. | ||
| if not query.strip(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define a helper method in MetadataSourcePlugin to deduplicate this logic?
...
def is_query_empty(query: str) -> bool:
if not query.strip():
self._log.debug(
"Empty search query after preprocessing, skipping {.data_source}.",
self,
)
return True
return Falseand then have
if self.is_query_empty(query):
return []here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could add MetadataSourcePlugin.is_query_empty. And I thought about this too. The issue I see here is that we again start to pollute our currently very clean MetadataSourcePlugin interface with functions that are not used in the core and only used by some subclasses. This kinda signals wrong abstraction boundaries to me, think interface segregation principle.
We can although add a helper function to the beetsplug._utils module and I would be more than fine with that.
Description
It is possible for an metadata lookup to be performed with an empty string for both
artistandtitle/album. This PR add handling for this edgecase for the metadata lookup ofmusibrainz,spotify,discogsandbeatport.Seems like the issue was not catched earlier, since the typehints were
not propagated correctly in the
metadata_plugin.item_candidatesfunction.closes #6060
#5965 might have helped here too