Contributor

@semohr semohr commented Oct 2, 2025

Description

It is possible for a metadata lookup to be performed with an empty string for both artist and title/album. This PR adds handling for this edge case to the metadata lookups of musicbrainz, spotify, discogs and beatport.

It seems the issue was not caught earlier because the type hints were
not propagated correctly in the metadata_plugin.item_candidates function.

closes #6060
#5965 might have helped here too

@semohr semohr requested a review from a team as a code owner October 2, 2025 14:07
Copilot AI review requested due to automatic review settings October 2, 2025 14:07

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `test/plugins/test_musicbrainz.py:1035-1040` </location>
<code_context>
+            ("Artist", "Title", 1),
+            (None, "Title", 1),
+            ("Artist", None, 1),
+            (None, None, 0),
+        ],
+    )
+    def test_item_candidates(
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test case for empty strings as artist/title.

Please include test cases with empty strings ("", " ") for artist and/or title to verify this conversion logic is properly tested.

```suggestion
        [
            ("Artist", "Title", 1),
            (None, "Title", 1),
            ("Artist", None, 1),
            (None, None, 0),
            ("", "Title", 1),
            ("Artist", "", 1),
            ("", "", 0),
            (" ", "Title", 1),
            ("Artist", " ", 1),
            (" ", " ", 0),
            (None, "", 0),
            ("", None, 0),
            (None, " ", 0),
            (" ", None, 0),
        ],
```
</issue_to_address>

### Comment 2
<location> `test/plugins/test_musicbrainz.py:1055` </location>
<code_context>
         )

-        candidates = list(mb.item_candidates(Item(), "hello", "there"))
+        candidates = list(mb.item_candidates(Item(), artist, title))

-        assert len(candidates) == 1
</code_context>

<issue_to_address>
**suggestion (testing):** Missing test for error handling when plugin returns unexpected data.

Please add a test where the mocked plugin returns malformed or incomplete data to verify the function handles it without crashing.
</issue_to_address>

### Comment 3
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.

Google's software engineering guidelines say:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
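
One way to drop the `if expected_count == 1` branch is to let each case carry the exact expected result, so the test body is a single comparison. This is only a sketch: `fake_item_candidates` and `RECORDING_ID` are made-up stand-ins for the mocked plugin and `self.RECORDING["id"]`, and with pytest the `CASES` list would be handed to `pytest.mark.parametrize`.

```python
RECORDING_ID = "rec-123"  # hypothetical ID, stands in for self.RECORDING["id"]


def fake_item_candidates(artist, title):
    """Stand-in for mb.item_candidates: one hit unless both terms are missing."""
    if not artist and not title:
        return []
    return [RECORDING_ID]


# Each case lists the expected track IDs instead of a count, so no
# conditional is needed to decide whether to inspect candidates[0].
CASES = [
    ("Artist", "Title", [RECORDING_ID]),
    (None, "Title", [RECORDING_ID]),
    ("Artist", None, [RECORDING_ID]),
    (None, None, []),
]


def check_item_candidates(artist, title, expected_ids):
    # A single equality check covers both the count and the IDs.
    assert fake_item_candidates(artist, title) == expected_ids
```

Comparing against a list checks the number of candidates and their identity in one assertion.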

### Comment 4
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.

    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None

    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)

    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)

    # Search terms.
    search_artist = search_artist or item.artist
    search_title = search_title or item.title or item.filepath.stem
    log.debug("Item search terms: {} - {}", search_artist, search_title)

    # Replace empty string with None
    if isinstance(search_artist, str) and search_artist.strip() == "":
        search_artist = None
    if isinstance(search_title, str) and search_title.strip() == "":
        search_title = None

    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item, search_artist, search_title
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)

    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>


The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>
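
As one possible step towards a shorter `tag_item`, the empty-string handling could live in a small helper. The sketch below is only an illustration: `normalize_search_term` is a hypothetical name, not something that exists in beets, and it keeps the behaviour of the excerpt (blank strings become `None`, everything else is returned unchanged).

```python
from __future__ import annotations


def normalize_search_term(term: str | None) -> str | None:
    """Map None, "" and whitespace-only strings to None; keep anything else."""
    if term is None or not term.strip():
        return None
    return term
```

Inside `tag_item` the two `isinstance` blocks would then collapse to one call per search term, for example `search_artist = normalize_search_term(search_artist)`.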

@semohr semohr marked this pull request as draft October 2, 2025 14:13
@codecov

codecov bot commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 6.25000% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.93%. Comparing base (c26c342) to head (ddca7c4).
⚠️ Report is 11 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| beetsplug/beatport.py | 14.28% | 6 Missing ⚠️ |
| beetsplug/discogs.py | 0.00% | 3 Missing ⚠️ |
| beetsplug/musicbrainz.py | 0.00% | 2 Missing and 1 partial ⚠️ |
| beetsplug/spotify.py | 0.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6065      +/-   ##
==========================================
- Coverage   66.98%   66.93%   -0.06%     
==========================================
  Files         118      118              
  Lines       18189    18206      +17     
  Branches     3079     3084       +5     
==========================================
+ Hits        12184    12186       +2     
- Misses       5345     5359      +14     
- Partials      660      661       +1     

| Files with missing lines | Coverage Δ |
| --- | --- |
| beetsplug/discogs.py | 70.25% <0.00%> (-0.54%) ⬇️ |
| beetsplug/musicbrainz.py | 68.94% <0.00%> (-0.55%) ⬇️ |
| beetsplug/spotify.py | 46.12% <0.00%> (-0.48%) ⬇️ |
| beetsplug/beatport.py | 42.85% <14.28%> (-1.05%) ⬇️ |

... and 1 file with indirect coverage changes

@semohr semohr marked this pull request as ready for review October 2, 2025 14:36
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `test/plugins/test_musicbrainz.py:1057-1058` </location>
<code_context>

-        assert len(candidates) == 1
-        assert candidates[0].track_id == self.RECORDING["id"]
+        assert len(candidates) == expected_count
+        if expected_count == 1:
+            assert candidates[0].track_id == self.RECORDING["id"]

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test for empty string values ("") for artist and title.

Tests currently check for None but not for empty strings. Since empty strings are normalized to None, please add test cases for empty string inputs to verify this behavior.
</issue_to_address>

### Comment 2
<location> `beetsplug/discogs.py:190-193` </location>
<code_context>
    def candidates(
        self,
        items: Sequence[Item],
        artist: str | None,
        album: str | None,
        va_likely: bool,
    ) -> Iterable[AlbumInfo]:
        query = ""
        if artist is not None:
            query += artist
        if album is not None:
            query += f" {album}"

        if va_likely:
            query = album or ""

        query = query.strip()
        if not query:
            return []

        return self.get_albums(query)

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return [] if not query else self.get_albums(query)
```
</issue_to_address>
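
To make the empty-query behaviour easy to test in isolation, the query construction could be pulled into a pure function. A minimal sketch that mirrors the excerpt above; `build_query` is a hypothetical name and not part of the discogs plugin:

```python
from __future__ import annotations


def build_query(artist: str | None, album: str | None, va_likely: bool) -> str:
    """Combine artist and album into one search string, mirroring the excerpt."""
    query = ""
    if artist is not None:
        query += artist
    if album is not None:
        query += f" {album}"

    # For likely various-artists releases, search by album only.
    if va_likely:
        query = album or ""

    # An empty result signals that the lookup should be skipped.
    return query.strip()
```

The caller would then only need to check whether the returned string is empty before calling `self.get_albums(query)`.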

@semohr semohr marked this pull request as draft October 5, 2025 10:42
@semohr semohr force-pushed the empty_metadata_fix branch 2 times, most recently from e2683ac to b525d4c Compare October 20, 2025 12:51
@semohr semohr changed the title Empty metadata fix Empty metadata support for autotagger Oct 20, 2025
@semohr semohr force-pushed the empty_metadata_fix branch 2 times, most recently from 88450c7 to 4f0beba Compare October 20, 2025 13:14
@semohr semohr marked this pull request as ready for review October 20, 2025 13:20
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

- In `beatport._get_releases` the early `return` produces `None` (not an iterable), which will break `yield from`. Change it to return an empty list or generator to keep the expected iterable type.
- After changing the signature of `metadata_plugins.candidates`/`item_candidates`, double-check that all existing plugins updated their implementations (or add a compatibility shim) so they won’t error when receiving keyword args instead of positional.

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `beetsplug/beatport.py:463-468` </location>
<code_context>
+
+        # query may be empty strings
+        # We want to skip the lookup in this case.
+        if not query.strip():
+            self._log.debug(
+                "Empty search query after preprocessing, skipping {.data_source}.",
+                self,
+            )
+            return
+
         for beatport_release in self.client.search(query, "release"):
</code_context>

<issue_to_address>
**issue (bug_risk):** Returning None instead of an empty iterable may cause issues for consumers expecting an iterable.

Returning None here may cause runtime errors if the caller uses iteration. Use 'return []' to ensure consistency and prevent such issues.
</issue_to_address>
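
Whether this is a real risk depends on whether `_get_releases` is a generator function, which the excerpt alone does not show. In plain Python, a bare `return` inside a generator function simply ends iteration, whereas the same `return` in a regular function hands `None` to the caller. A small sketch with made-up function names:

```python
from typing import Iterator, List, Optional


def releases_generator(query: str) -> Iterator[str]:
    """Generator function: the bare return just stops iteration."""
    if not query.strip():
        return
    yield f"release for {query}"


def releases_plain(query: str) -> Optional[List[str]]:
    """Regular function: the bare return hands None to the caller."""
    if not query.strip():
        return
    return [f"release for {query}"]


def collect(query: str) -> List[str]:
    """Consume the generator through `yield from`, as a caller might."""

    def inner() -> Iterator[str]:
        # Delegating to a generator that returned early yields nothing.
        yield from releases_generator(query)

    return list(inner())
```

Iterating over the result of `releases_plain(" ")` would raise `TypeError`, which is the failure mode the comment describes; `collect(" ")` just produces an empty list.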

### Comment 2
<location> `beets/autotag/match.py:246` </location>
<code_context>
def tag_album(
    items,
    search_artist: str | None = None,
    search_album: str | None = None,
    search_ids: list[str] = [],
) -> tuple[str, str, Proposal]:
    """Return a tuple of the current artist name, the current album
    name, and a `Proposal` containing `AlbumMatch` candidates.

    The artist and album are the most common values of these fields
    among `items`.

    The `AlbumMatch` objects are generated by searching the metadata
    backends. By default, the metadata of the items is used for the
    search. This can be customized by setting the parameters.
    `search_ids` is a list of metadata backend IDs: if specified,
    it will restrict the candidates to those IDs, ignoring
    `search_artist` and `search album`. The `mapping` field of the
    album has the matched `items` as keys.

    The recommendation is calculated from the match quality of the
    candidates.
    """
    # Get current metadata.
    likelies, consensus = get_most_common_tags(items)
    cur_artist: str = likelies["artist"]
    cur_album: str = likelies["album"]
    log.debug("Tagging {} - {}", cur_artist, cur_album)

    # The output result, keys are the MB album ID.
    candidates: dict[Any, AlbumMatch] = {}

    # Search by explicit ID.
    if search_ids:
        for search_id in search_ids:
            log.debug("Searching for album ID: {}", search_id)
            if info := metadata_plugins.album_for_id(search_id):
                _add_candidate(items, candidates, info)

    # Use existing metadata or text search.
    else:
        # Try search based on current ID.
        if info := match_by_id(items):
            _add_candidate(items, candidates, info)
            rec = _recommendation(list(candidates.values()))
            log.debug("Album ID match recommendation is {}", rec)
            if candidates and not config["import"]["timid"]:
                # If we have a very good MBID match, return immediately.
                # Otherwise, this match will compete against metadata-based
                # matches.
                if rec == Recommendation.strong:
                    log.debug("ID match.")
                    return (
                        cur_artist,
                        cur_album,
                        Proposal(list(candidates.values()), rec),
                    )

        # Search terms.
        _search_artist, _search_album = _parse_search_terms(
            (search_artist, cur_artist),
            (search_album, cur_album),
        )
        log.debug("Search terms: {} - {}", _search_artist, _search_album)

        # Is this album likely to be a "various artist" release?
        va_likely = (
            (not consensus["artist"])
            or (_search_artist.lower() in VA_ARTISTS)
            or any(item.comp for item in items)
        )
        log.debug("Album might be VA: {}", va_likely)

        # Get the results from the data sources.
        for matched_candidate in metadata_plugins.candidates(
            items, _search_artist, _search_album, va_likely
        ):
            _add_candidate(items, candidates, matched_candidate)

    log.debug("Evaluating {} candidates.", len(candidates))
    # Sort and get the recommendation.
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return cur_artist, cur_album, Proposal(candidates_sorted, rec)

</code_context>

<issue_to_address>
**issue (code-quality):** Replace mutable default arguments with None ([`default-mutable-arg`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/default-mutable-arg/))
</issue_to_address>
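
For context, a default such as `search_ids: list[str] = []` is evaluated once, when the function is defined, so every call that omits the argument shares the same list. In the excerpt `search_ids` is only read, so this looks like a latent hazard rather than an observed bug. A minimal sketch of the pitfall and the usual `None` fix, using made-up function names:

```python
from __future__ import annotations


def collect_shared(search_id: str, search_ids: list[str] = []) -> list[str]:
    """Pitfall: the default list is created once and shared across calls."""
    search_ids.append(search_id)
    return search_ids


def collect_fresh(search_id: str, search_ids: list[str] | None = None) -> list[str]:
    """Fix: default to None and build a new list on each call."""
    if search_ids is None:
        search_ids = []
    search_ids.append(search_id)
    return search_ids
```

Calling `collect_shared` twice without the second argument accumulates both IDs in one list, while `collect_fresh` starts from an empty list each time.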

### Comment 3
<location> `beets/autotag/match.py:334` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.

    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None

    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)

    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)

    # Search terms.
    _search_artist, _search_title = _parse_search_terms(
        (search_artist, item.artist),
        (search_title, item.title),
    )
    log.debug("Item search terms: {} - {}", _search_artist, _search_title)

    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item,
        _search_artist,
        _search_title,
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)

    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>

@semohr semohr requested a review from snejus October 20, 2025 13:25
@semohr semohr force-pushed the empty_metadata_fix branch from 4f0beba to a280237 Compare October 20, 2025 21:22
@semohr semohr changed the title Empty metadata support for autotagger Empty metadata support for autotagger plugins Oct 21, 2025
@semohr
Contributor Author

semohr commented Dec 2, 2025

@snejus How do we want to continue here?

The main concern was the duplication, as far as I can remember. It might be possible to move the duplication check into an API layer (or shared session) at some point, but right now I don't see how else we can dedupe this.

@snejus
Member

snejus commented Dec 3, 2025

Forgot about this PR! My main concern is that the chroma plugin is very different from the rest of the metadata source plugins, and whether we gain any value from forcing it to implement MetadataSourcePlugin. It does not implement track_for_id and album_for_id, which also indicates that this interface may not be the best fit for it.

I'm leaning towards making it inherit BeetsPlugin instead:

  1. Define data_source explicitly to make sure it's still treated as a data source
  2. Centralise the empty search handling for MetadataSourcePlugin subclasses

@semohr
Contributor Author

semohr commented Dec 3, 2025

I think the proposed change creates more problems than it solves. Maybe we are stuck here without a bigger refactor.

Chroma does define track_for_id and album_for_id; they just return empty results. And that actually makes sense: the plugin supports the interface contract but simply doesn’t produce lookups, because fingerprinting doesn’t map cleanly to ID-based metadata sources (although in theory it could). That’s a perfectly valid implementation and doesn’t justify removing it from MetadataSourcePlugin.

Switching it to inherit directly from BeetsPlugin blurs the separation between the two interfaces. MetadataSourcePlugin has a clear purpose imo: ID-based lookup and search semantics, while BeetsPlugin is intentionally generic. Moving chroma out of the metadata-source abstraction weakens that clarity and makes the inheritance structure harder to reason about.

It also goes against one of beets' strengths: being open and flexible in how plugins can be implemented. We shouldn’t impose rules that don’t fit every plugin. Chroma isn’t the only case; there are external metadata plugins that also work purely on Item data without necessarily requiring a title and artist (e.g. aisauce or audible).

I agree that from a maintainer standpoint, the proposal may simplify internal maintenance slightly. But it adds ambiguity for plugin authors, increases the risk of mismatched expectations in future core changes, and could even break existing workflows for users.

TLDR: chroma fits the interface well enough, and beets has always thrived by not forcing plugin authors into rigid patterns. This change would move in the opposite direction.

Member

@snejus snejus left a comment

Agree with your last comment - let's just dedupe this logic and we're good to go

        return track_info

    def item_candidates(
        self, item: Item, artist: str, title: str
Member

Nothing seems to have changed in this line. Re your own comment: #6184 (comment)


    # query may be empty strings
    # We want to skip the lookup in this case.
    if not query.strip():
Member

Can we define a helper method in MetadataSourcePlugin to deduplicate this logic?

    ...
    def is_query_empty(self, query: str) -> bool:
        if not query.strip():
            self._log.debug(
                "Empty search query after preprocessing, skipping {.data_source}.",
                self,
            )
            return True

        return False

and then have

    if self.is_query_empty(query):
        return []

here.

Contributor Author

@semohr semohr Dec 7, 2025

In theory we could add MetadataSourcePlugin.is_query_empty, and I thought about this too. The issue I see is that we would again start to pollute our currently very clean MetadataSourcePlugin interface with functions that are not used in the core and are only used by some subclasses. This kinda signals wrong abstraction boundaries to me (think interface segregation principle).

We could, however, add a helper function to the beetsplug._utils module, and I would be more than fine with that.
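
A rough sketch of what such a module-level helper might look like. The name `is_query_empty`, its signature and the use of the standard `logging` module are assumptions for illustration only; beets plugins log through their own logger with `{}`-style formatting, and nothing here is taken from the actual `beetsplug._utils` module.

```python
import logging

log = logging.getLogger(__name__)


def is_query_empty(query: str, data_source: str) -> bool:
    """Return True, and log a debug message, when the query has no search terms."""
    if query.strip():
        return False

    # Hypothetical message, modelled on the one quoted in the review above.
    log.debug(
        "Empty search query after preprocessing, skipping %s.", data_source
    )
    return True
```

A plugin's search method could then guard its lookup with `if is_query_empty(query, self.data_source): return []`, without adding anything to the MetadataSourcePlugin interface.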

Development

Successfully merging this pull request may close these issues.

beet import -s fails with "at least one query term is required"

3 participants