Task label feature refined by ottointhesky · Pull Request #983 · ipython/ipyparallel

ottointhesky · 2026-02-12T11:58:18Z

This merge request will contain small improvements to the documentation and unit tests of the label feature. For now, only unit tests were added; they check submitted labels with the dictDB and sqliteDB backends.

A couple of days ago I realized that the label is written twice to the DB, which may be unwanted (since it wastes resources):

We need label as an explicit column to make it queryable. Hence, we could remove the entry from the metadata before writing the record to the database. This can be handled centrally. However, retrieving a record then requires re-adding the label to the metadata, which makes everything more complicated since it requires specific handling for the different DB backends. So is it worth the effort, given that label will be empty for most users anyway? Probably not...

As mentioned earlier, we also need a way to find records based on substrings within DB columns. In MongoDB syntax this can be achieved using a regex, e.g.
{'label': {'$regex': 'my'}}
would find any records where label contains the string my (at any position). So this would require a new comparison operator ($regex) for the filter definition in ipp. Supporting this operator in dictDB shouldn't be too difficult, but for sqlite only a strongly reduced regex definition could be supported via LIKE. SQL LIKE basically only supports wildcards (single- and multi-character matching), so only a regex containing nothing but ^ $ . .* .+ could be translated. Anything else isn't possible. So the question here is: should we extend the supported operators by $regex, or should we go a different/new way by adding the possibility of passing backend-specific filter objects (e.g. a lambda object for dictDB and where clauses for sqliteDB)? If you are thinking of dropping support for MongoDB, the second option might be more appealing. If you do not want to drop support for MongoDB yet, I suggest that we add a MongoDB installation to the GitHub Actions. Using the following action script, this shouldn't be too difficult. No matter which way you want to go, I'm happy to provide the necessary implementation...
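To make the `$regex` option concrete, here is a minimal sketch of how such an operator could be evaluated in a dict-based backend. The `matches` helper and the sample records are hypothetical and are not taken from the ipyparallel code; only the query shape comes from the comment above.

```python
import re


def matches(record, query):
    """Return True if `record` satisfies every condition in `query`.

    Hypothetical helper: supports plain equality and a `$regex` operator only.
    """
    for key, cond in query.items():
        value = record.get(key)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$regex":
                    # An unanchored search finds the pattern at any position
                    if not isinstance(value, str) or re.search(arg, value) is None:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != cond:
            return False
    return True


# Made-up records for illustration
records = [
    {"msg_id": "a", "label": "my first task"},
    {"msg_id": "b", "label": "other"},
    {"msg_id": "c", "label": None},
]
found = [r["msg_id"] for r in records if matches(r, {"label": {"$regex": "my"}})]
```

Records without a string label are simply treated as non-matching here; a real backend might prefer a different policy.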

minrk · 2026-02-12T17:09:20Z

I don't think we need to worry about the cost of writing the label twice to make it queryable. It's quite small compared to anything else, so the impact will be negligible.

I don't imagine full regex search is going to be that useful, since users would only craft the labels specifically to make them searchable, I imagine wildcard matching is plenty.

If you wanted to put some time into testing mongodb, that would be super appreciated! If it takes too much of your time, just say so, and we can probably drop it.

ottointhesky · 2026-02-12T19:57:37Z

> I don't think we need to worry about the cost of writing the label twice to make it queryable. It's quite small compared to anything else, so the impact will be negligible.

Ok & thanks. I just wanted to double check with you...

> If you wanted to put some time into testing mongodb, that would be super appreciated! If it takes too much of your time, just say so, and we can probably drop it.

As presumed, adding mongodb to the GitHub tests was easy. supercharge/mongodb-github-action only works for Linux containers, but that's definitely better than no test. I also switched to the pymongo 4.x API and now raise an exception if the pymongo version is below 4.
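A version guard of the kind described could look roughly like the sketch below. The function name and error message are assumptions; the actual check in the PR may differ. The version is passed in as a tuple so the snippet runs without pymongo installed.

```python
def require_pymongo_major(version_tuple, minimum_major=4):
    """Raise ImportError if the given pymongo version is too old.

    Hypothetical helper; `version_tuple` is expected to look like (4, 6, 1).
    """
    if version_tuple[0] < minimum_major:
        found = ".".join(str(part) for part in version_tuple)
        raise ImportError(
            f"pymongo >= {minimum_major} is required, found {found}"
        )


# With pymongo installed, the call would typically be:
#     import pymongo
#     require_pymongo_major(pymongo.version_tuple)
require_pymongo_major((4, 6, 1))  # passes silently
```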

> I don't imagine full regex search is going to be that useful, since users would only craft the labels specifically to make them searchable, I imagine wildcard matching is plenty.

Agreed, but what should wildcard matching look like in Python code? So far the query object syntax is defined by mongodb (query objects are passed to mongodb untouched), and there is no wildcard syntax there. If we come up with something new, e.g. based on SQL LIKE

{'label': {'$like': '%my%'}}

query objects will also need preprocessing for mongodb, which is currently NOT the case. Which direction should we go?
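As a rough illustration of that preprocessing step, the sketch below rewrites a `$like` condition into a `$regex` condition before the query would be handed to a backend that only understands regular expressions. Both function names are hypothetical, and escape-character support is deliberately left out.

```python
import re


def like_to_regex(pattern):
    """Translate an SQL LIKE pattern into an anchored regex string.

    Sketch only: no ESCAPE handling, and `.` does not match newlines.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # zero or more characters
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "^" + "".join(parts) + "$"


def preprocess(query):
    """Return a copy of `query` with every `$like` replaced by `$regex`."""
    out = {}
    for key, cond in query.items():
        if isinstance(cond, dict) and "$like" in cond:
            rest = {op: arg for op, arg in cond.items() if op != "$like"}
            out[key] = {"$regex": like_to_regex(cond["$like"]), **rest}
        else:
            out[key] = cond
    return out


rewritten = preprocess({"label": {"$like": "%my%"}})
```

The original query object is left untouched, so the same input could still be translated differently for the sqlite backend.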

ottointhesky · 2026-02-13T12:33:05Z

FYI: for whatever reason the mongodb container seems to interfere with the slurm container. Sometimes it works, but most of the time it doesn't. Deactivating mongodb via if for the slurm test doesn't seem to work. Hopefully I can find a solution to this problem...

ottointhesky · 2026-02-13T15:38:42Z

> FYI: for whatever reason the mongodb container seems to interfere with the slurm container. Sometimes it works, but most of the time it doesn't. Deactivating mongodb via if for the slurm test doesn't seem to work. Hopefully I can find a solution to this problem...

The error message

image "docker.io/library/ipp-cluster:slurm": already exists

seems to be a race condition in docker build. With a little bit of research I discovered this bug report. Calling the actual build command twice (on error) seems to resolve the problem. Once the problem in docker is fixed, the double call can be removed.

ottointhesky · 2026-02-23T16:49:58Z

- I have updated the documentation regarding label
- GitHub Actions now include all mongodb tests
- label tests now also run with mongodb
- I have added label support for View.execute and View.run

From my point of view only one question remains:
How do we proceed with the wildcard matching (for labels)?

ottointhesky · 2026-02-23T20:45:21Z

By the way, today I realized that the broadcast view does not write any entries to the hub db. Is this on purpose?

Again, reading the documentation on Broadcast View helps to understand its idea/concept :-) Since the Broadcast View is tuned for efficiency (and no task entries are written to the hub db), it doesn't make much sense to support labels for this view. Do you agree?

minrk · 2026-02-24T23:53:13Z

That's great!

> How do we proceed with the wildcard matching (for labels)?

Both do seem to have similar definitions with different symbols:

| meaning | sql LIKE | mongodb wildcard | python fnmatch |
| --- | --- | --- | --- |
| 0-to-many | `%` | `*` | `*` |
| exactly one | `_` | `?` | `?` |

so I think it makes sense to use the fnmatch-style, as the Python-native form, which needs no modification for dictdb or mongodb, and use

pattern.replace('*', '%').replace('?', '_')

in the sql backend. Does that sound reasonable?
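The two-character substitution can be tried against an in-memory SQLite database, as in the sketch below. The `tasks` table, its columns and the sample rows are made up for illustration and do not reflect the real ipyparallel schema.

```python
import sqlite3


def glob_to_like(pattern):
    """Naive fnmatch-to-LIKE translation: `*` becomes `%`, `?` becomes `_`."""
    return pattern.replace("*", "%").replace("?", "_")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (msg_id TEXT, label TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [("a", "mytask1"), ("b", "other"), ("c", "my")],
)

rows = conn.execute(
    "SELECT msg_id FROM tasks WHERE label LIKE ? ORDER BY msg_id",
    (glob_to_like("my*"),),
).fetchall()
# rows contains the ids of the labels starting with "my": "a" and "c"
```

Two caveats apply to this naive form: a literal `%` or `_` in the input pattern turns into a wildcard, and SQLite's LIKE is case-insensitive for ASCII characters by default, whereas `fnmatch.fnmatchcase` is case-sensitive.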

> I realized that the broadcast view does not write any entries to the hub db. Is this on purpose?

I don't think it is, but you don't need to fix that here, it might be complicated. We can open an Issue for it.

ottointhesky · 2026-02-25T11:28:52Z

Thanks for your input, but I fear it’s not that easy. Mongodb supports wildcard matching, but not in the way the ipp mongodb client accesses entries in the DB. The code uses find for querying the corresponding entries:

matches = list(self._records.find(check, keys))

and there only the following operators are supported. The wildcard operator only works with aggregation (which is a different query concept), and I wasn't able to get it to work with find in my local mongodb installation. Hence, it cannot be directly integrated into the current code concept (at least by my understanding, but I’m definitely not a mongodb expert).

So, I think we are back to my original suggestions:

- Introducing a new $like operator which is translated to regular expressions for dictdb and mongodb. I already have some code that can handle this, even supporting definable escape characters as is possible with SQL LIKE.
- Alternatively, a $wildcard operator could be introduced using ? and * for matching, which is maybe more commonly known.
- Preserving the original concept of strictly sticking to the mongodb syntax and using regular expressions ($regex). This works in a straightforward way for dictdb, and for sqlite we could support regexes that only contain ^ $ . .* .+ (or filter entries within Python after the sqlite query).
- Passing native, db-dependent query objects to the db backend (new function or additional parameter).

Of course it is also possible to implement multiple solutions.
May I ask you again which way you want to go (or do you have a better solution)? If you are unsure it’s maybe better to merge and close this pull request and I will create a new one…

minrk · 2026-03-02T19:20:49Z

In that case, I think we can say that wildcard matches aren't supported in mongodb, only dictdb/sqlitedb. If someone ever comes wanting to add support in the mongo backend, we can do it, but no need to put in the work now.

So let's use:

- fnmatch syntax as input
- use fnmatch module in dictdb
- two-character substitution for sql LIKE
- only support strict equality in mongodb

How does that sound?
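For the dictdb part of this plan, the matching itself could be as small as the following sketch. The `filter_by_label` helper and the record layout are hypothetical; the only real API used is `fnmatch.fnmatchcase`.

```python
from fnmatch import fnmatchcase


def filter_by_label(records, pattern):
    """Return the records whose string label matches the glob `pattern`.

    Hypothetical helper operating on a dict of msg_id -> record.
    """
    return [
        rec
        for rec in records.values()
        if isinstance(rec.get("label"), str) and fnmatchcase(rec["label"], pattern)
    ]


# Made-up records for illustration
records = {
    "a": {"msg_id": "a", "label": "my-run-1"},
    "b": {"msg_id": "b", "label": "other"},
    "c": {"msg_id": "c", "label": None},
}
hits = sorted(rec["msg_id"] for rec in filter_by_label(records, "my-*"))
```

`fnmatchcase` is used rather than `fnmatch.fnmatch` because the latter normalizes case depending on the operating system, which would make results platform-dependent.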

ottointhesky · 2026-03-02T22:29:12Z

Thanks for your comments!
Sounds good to me. What should the query object look like? e.g.

{'label': {'$fnmatch' : '*my?'}}

Translation to SQL is straightforward (as you suggested). Even mongodb support could be added easily, using fnmatch.translate to convert wildcards to a regular expression. But if you do not want me to touch the mongodb code, I will postpone it...
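A minimal sketch of the `fnmatch.translate` idea is shown below. The `glob_query` helper is hypothetical, and whether MongoDB's regex engine accepts every construct that `fnmatch.translate` emits is an assumption that would need checking against a real server; here the result is only exercised with Python's `re` module.

```python
import fnmatch
import re


def glob_query(field, pattern):
    """Build a `$regex` query from a glob pattern (hypothetical helper)."""
    return {field: {"$regex": fnmatch.translate(pattern)}}


query = glob_query("label", "*my?")

# Check the translated expression locally with Python's own regex engine
regex = re.compile(query["label"]["$regex"])
has_match = regex.match("in my1") is not None  # "my" followed by one character
no_match = regex.match("my") is None  # "?" requires exactly one more character
```

The exact string returned by `fnmatch.translate` varies between Python versions, so it is better not to hard-code it in tests.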

So main question here: Do you like the naming of the new wildcard operator $fnmatch?

minrk · 2026-03-03T18:35:23Z

The structure looks great!

Naming things is hard, but I wouldn't pick the Python module name, I'd pick a more generic word like 'match' or what these patterns are, which are often called 'globs'. Naming discussions can get into the weeds, so I'll suggest you pick from this list and not go back and forth too much:

- $glob - specific, refers to this kind of matching we are doing so indicates syntax for folks who know the name, but a bit jargony
- $wildcard - like glob, but a bit more generic
- $match - generic, might imply regex
- $like - generic, but references the sql function we use

If the regex fits easily, feel free to implement the mongo one. I only didn't want to require it for you to be able to finish the feature, but by all means feel free if it's not a problem to implement the same semantics across the board.

ottointhesky · 2026-03-04T09:59:56Z

Thanks again for your suggestions.

I have two favourites: $glob and $wildcard. Maybe the right candidate will emerge by itself if we consider escaping (or not), which we haven’t discussed yet.
I wasn’t aware that glob supports not only * and ? but also character-set matching when wrapped in []. This also allows escaping ? or *. Hence, if we use glob/fnmatch (operator name -> $glob) as the basis for the new operator, the SQL translation should also understand [] expressions, at least for escaping the meta-characters. Unfortunately, the SQL LIKE syntax has no equivalent for matching any character from a set. Hence, we could translate such an expression to an any-single-character match or throw an exception.

If we create our own wildcard operator (operator name -> $wildcard), we could limit its functionality to * and ? without any escape character support. Literals such as *, ?, _, %, [ and ] should then not be used in the relevant db columns or in the wildcard pattern, so that wildcard matching works consistently across all db classes. If we want to guarantee consistent behaviour, at least wildcard pattern checking is needed in all three db classes. Other option: only document it but do not check it...

So in short
$glob:

- Easy and rigorous support for dictdb and mongodb (via fnmatch.translate as regex)
- Limited syntax support for sqlite (translation to LIKE syntax is more complicated)
- Escape character support
- Known and standardized behaviour

$wildcard:

- Simple implementation without pattern checking
- With rigorous pattern checking, maybe more complicated than the $glob implementation
- Easy support for dictdb, mongodb and sqlite
- No escape character support
- Without rigorous pattern checking, different results may be returned by different db backends (maybe not so relevant for users, since one typically stays with one db)

I would go for $glob since it’s more rigorous and has (hopefully) 100% predictable behaviour. What do you think?
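To illustrate the "limited syntax support for sqlite" point, here is one possible sketch of a glob-to-LIKE translation that accepts single-character brackets such as `[*]` or `[?]` as escapes and raises for real character sets. The function name, the choice of backslash as the LIKE escape character, and the `tasks` table are all assumptions for illustration.

```python
import sqlite3


def glob_to_like_escaped(pattern, escape="\\"):
    """Translate a glob into a LIKE pattern meant to be used with ESCAPE.

    Sketch: `[x]` with a single character is treated as a literal `x`;
    any other bracket expression raises ValueError.
    """

    def literal(ch):
        # %, _ and the escape character are special inside a LIKE pattern
        return escape + ch if ch in ("%", "_", escape) else ch

    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch == "[":
            end = pattern.find("]", i + 2)
            if end == -1:
                out.append("[")  # unterminated bracket: keep it literal
            else:
                body = pattern[i + 1 : end]
                if len(body) != 1 or body == "!":
                    raise ValueError(f"unsupported bracket expression: [{body}]")
                out.append(literal(body))
                i = end  # continue after the closing bracket
        else:
            out.append(literal(ch))
        i += 1
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (label TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO tasks VALUES (?)", [("a*b",), ("axb",), ("100%",), ("1000",)]
)


def find(glob):
    sql = "SELECT label FROM tasks WHERE label LIKE ? ESCAPE '\\' ORDER BY label"
    return [row[0] for row in conn.execute(sql, (glob_to_like_escaped(glob),))]
```

With this sketch, `find("a[*]b")` only returns the label containing a literal asterisk, while `find("a?b")` returns both `a*b` and `axb`. Raising on real character sets is one of the two options mentioned above; silently widening them to `_` would be the other.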

minrk · 2026-03-04T22:00:21Z

Great, let's do $glob. For ~100% of cases, all glob means to people is * support, so that seems totally fine to me. No need to go to too much trouble.

test code for labels added

fbab54a

Johannes Otepka and others added 11 commits February 12, 2026 18:52

tzinfo got lost when storing in mongodb

171bc98

Merge branch 'task_label_feature' of github.com:ottointhesky/ipyparal…

07923f8

…lel into task_label_feature

[pre-commit.ci] auto fixes from pre-commit.com hooks

2f1293f

for more information, see https://pre-commit.ci

make mongodb tzinfo aware

7f0400a

Merge remote-tracking branch 'origin/task_label_feature' into task_la…

6a887c1

…bel_feature # Conflicts: # ipyparallel/controller/mongodb.py

make mongodb tzinfo aware

a9e4266

[pre-commit.ci] auto fixes from pre-commit.com hooks

e473c81

for more information, see https://pre-commit.ci

switch to new mongodb api

5762cd6

mongodb installation added to github actions

6a36864

Merge remote-tracking branch 'origin/task_label_feature' into task_la…

29fe5e1

…bel_feature # Conflicts: # ipyparallel/controller/mongodb.py

ruff format changes

d6436d1

Johannes Otepka and others added 4 commits February 12, 2026 22:22

exclude mongodb isntall from slurm

1a2624e

Update test.yml

d4c6fab

Update test.yml

6a365f2

Update test.yml

3d20d63

Johannes Otepka added 2 commits February 13, 2026 15:58

work-a-round fix for image already exists

a45a644

pre-commit format changes

84009a6

Johannes Otepka and others added 8 commits February 13, 2026 17:08

enable mongodb for slurm test environment as well

0d94418

enable mongodb only under linux with no cluster

8a3438d

pre-commit format correction

9216581

corrections within windows

4e16aad

[pre-commit.ci] auto fixes from pre-commit.com hooks

1a57507

for more information, see https://pre-commit.ci

minor change: formating corrected

d2a0595

label added to docu / execute and run now supports label

51367d2

Merge branch 'task_label_feature' of github.com:ottointhesky/ipyparal…

a2421f7

…lel into task_label_feature

ottointhesky and others added 2 commits February 23, 2026 16:57

Merge branch 'ipython:main' into task_label_feature

fba459f

corrections for mongodb

f3f07fe

Merge branch 'ipython:main' into task_label_feature

b9f604d
