Frequently Asked Questions (FAQ)

I see only one contributor, should I trust you?

There is currently one contributor, but every project started somehow. Selinon was designed for the fabric8-analytics project at Red Hat, which gathers project information for openshift.io (introduced at the Red Hat 2017 summit keynote). It is still used there and has already served millions of flows and even more tasks. If you find Selinon interesting for your use case, feel free to use it (and buy me some beer or at least let me know that you like it or share any experiences you have).

If you find a bug, a place for enhancement, or anything where I can be helpful, feel free to let me know. And do not forget - even you can be a Selinon developer.

Dispatcher does not work properly or hangs in an infinite loop.

Check your result backend configuration for Celery. Currently, only Redis and PostgreSQL are supported and well tested (feel free to extend this list based on your experiences). Serious issues were found when rpc was used.

Can I see Selinon in action?

See the Selinon demo or the fabric8-analytics project, especially its fabric8-analytics-worker.

Can I simulate a Selinon run without deploying a huge infrastructure?

Yes, you can. Just use the shipped executor:

selinon-cli execute --nodes-definition nodes.yml --flow-definitions flow1.yml flow2.yml --help

This way you can also use Selinon to run your flows from a CLI. You can also explore the prepared containerized demo.

Should I replace Celery with Selinon?

Well, hard to say. Celery is a great project and offers a lot of features. Selinon should be suitable for you when you have time and data dependencies between tasks and you can group these tasks into flows that are more sophisticated than Celery’s primitives such as chain or chord. If this is true for you, just give Selinon a try. If you are already using Celery, check the prepared guide on how to migrate from raw Celery to Selinon.

How should I name tasks and flows?

You should use names that can become part of a function name (that is, a valid Python3 identifier). Keep in mind that there is no strict difference between tasks, flows and sub-flows, so they share a namespace.
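A quick way to check candidate names before putting them into the YAML files is sketched below. The helper names are hypothetical and not part of Selinon, and rejecting Python keywords is an extra assumption on top of the identifier rule.

```python
import keyword


def is_valid_node_name(name):
    """Return True if the name could be used as a Python 3 identifier."""
    # Assumption: keywords such as "class" are rejected as well, since they
    # cannot be used as function names.
    return name.isidentifier() and not keyword.iskeyword(name)


def find_name_clashes(task_names, flow_names):
    """Return names used for both a task and a flow (shared namespace)."""
    return sorted(set(task_names) & set(flow_names))
```

For example, `is_valid_node_name("my-task")` is false because of the dash, while `my_task` passes.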

How can I access nested keys in a dict in default predicates?

Assuming you are using predicates from selinon.predicates, what you want is (in Python3):

message['foo']['bar'] == "baz"

Predicates were designed to deal with this - just provide a list of keys, where the position in the list describes the nesting level of the key:

condition:
  name: 'fieldEqual'
  args:
      key:
          - 'foo'
          - 'bar'
      value: 'baz'
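To make the semantics of a key list concrete, here is a minimal sketch of how such a lookup could behave. The function `field_equal` is a hypothetical stand-in written for illustration; it is not Selinon's actual fieldEqual implementation.

```python
def field_equal(message, key, value):
    """Hypothetical stand-in for a fieldEqual-style predicate."""
    # Accept a single key or a list of keys describing the nesting path.
    keys = key if isinstance(key, list) else [key]
    current = message
    for k in keys:
        # A missing key or a non-dict level yields False instead of raising.
        if not isinstance(current, dict) or k not in current:
            return False
        current = current[k]
    return current == value
```

With the configuration above, the call would correspond to `field_equal(message, ['foo', 'bar'], 'baz')`.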

I need a custom predicate. How do I write it?

If the predicates in selinon.predicates are not suitable for you or you miss a specific predicate, you can define your own predicate module in the global configuration. See the YAML configuration section for details.

What exceptions can predicates raise?

Predicates were designed to always return true or false. If a condition cannot be satisfied, false is returned. So it is safe, for example, to access possibly non-existing keys - predicates will return false. This idea has to be kept even in your own predicates, as predicates are executed by the dispatcher. If you raise an exception inside a predicate, the behaviour is undefined.

Danger

Predicates were designed to always return true or false. No exceptions may be raised!
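A custom predicate can follow this rule by guarding its body, as in the sketch below. The predicate name and its `(message, key, length)` signature are assumptions made for illustration; check the YAML configuration section for what Selinon actually passes to predicates.

```python
def field_longer_than(message, key, length):
    """Hypothetical custom predicate: is message[key] longer than length?"""
    try:
        return len(message[key]) > length
    except Exception:
        # Missing key, message of the wrong type, value without a length:
        # report False instead of letting the exception reach the dispatcher.
        return False
```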

Do I need a result backend?

Or more precisely: Do I need a result backend even when I am using my custom database/storage for task results?

Yes, you do. The result backend is used by Celery to store information about tasks (their status, errors). Without a result backend, Selinon cannot get information about tasks, as it relies on Celery for this. Do not use the rpc backend, as issues were noted with it.
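As a rough illustration, a Celery settings module pointing at a Redis result backend might look like the following. The hosts, ports and credentials are placeholders, and how these settings reach your Celery application depends on your deployment.

```python
# Celery-style settings module (sketch; all URLs below are placeholders).
broker_url = "amqp://guest:guest@localhost:5672//"

# Redis as the result backend:
result_backend = "redis://localhost:6379/0"

# PostgreSQL alternative, using Celery's SQLAlchemy database backend:
# result_backend = "db+postgresql://user:password@localhost/celery_results"
```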

Why does Selinon use generated code?

Since YAML config files contain some logic (such as conditions), this logic needs to be evaluated somehow. We could simply interpret the YAML file each time, but it was easier to generate Python code directly from the YAML configuration files and let the Python interpreter run it for us. Other parts of the YAML file could be used directly, but for the sake of consistency and easier debugging, the whole YAML file is used for code generation.

You can easily check how YAML files are transformed to Python code simply by running:

selinon-cli inspect --nodes-definition nodes.yml --flow-definitions flow1.yml flow2.yml --dump outputfile.py
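The general idea of generating Python source from a condition and letting the interpreter evaluate it can be shown with a toy example. This is only an illustration of the technique: the function name, the output format and the generated code are invented here and do not reflect what Selinon actually emits (use the command above to see the real output).

```python
def generate_condition_source(func_name, condition):
    """Render a toy condition dict as Python source for a function."""
    args = condition["args"]
    # Turn ['foo', 'bar'] into the text "['foo']['bar']".
    lookup = "".join("[{!r}]".format(k) for k in args["key"])
    return (
        "def {name}(message):\n"
        "    try:\n"
        "        return message{lookup} == {value!r}\n"
        "    except (KeyError, TypeError):\n"
        "        return False\n"
    ).format(name=func_name, lookup=lookup, value=args["value"])


# Toy condition, shaped like the parsed YAML of a fieldEqual condition.
condition = {"name": "fieldEqual",
             "args": {"key": ["foo", "bar"], "value": "baz"}}
source = generate_condition_source("condition_flow1_0", condition)

# Let the Python interpreter compile the generated source.
namespace = {}
exec(source, namespace)
check = namespace["condition_flow1_0"]
```

Printing `source` shows an ordinary Python function, which is what makes generated code easy to read and debug.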

How do I write conditions for sub-flows?

This is currently a limitation of Selinon. You can try to reorganize your flows so you don’t need to inspect parent subflows; for most use cases this will work. Support for this is planned for future releases.

Is it possible to change the configuration and do continuous redeployment?

Yes, you can do so. BUT make sure you do migrations - see the migration section to get insights on how to do it properly.

What happens if I forget to do migrations?

If you change the YAML configuration files and do not perform migrations, unpredictable things may happen if your queues still contain old messages. It’s always a good idea to check whether migration files need to be generated. See Migrations - Redeployment with changes for more details.

Is my YAML config file correct? How to improve or correct it?

See the Best practices section for tips.

Can I rely on checks of YAML files?

You can, to a degree, but think before you write your configuration. Some errors are caught, but the checks are not bullet-proof. If you make logical mistakes or your flow is simply wrong, Selinon is not an AI that can check your configuration for you. No checks are done on transitive dependencies, on whether given conditions can be evaluated, and so on.

Is there a way to limit task execution time?

Currently there is no such mechanism. Celery has a time limit configuration option, but note that Selinon tasks are not Celery tasks.

Why is there no support for older Celery versions?

One of Selinon’s requirements is that it defines tasks (Dispatcher and SelinonTaskEnvelope) before Celery’s application gets instantiated. Older versions of Celery required tasks to be registered after Celery’s application was created. This makes it a chicken-and-egg problem.

What broker type do I need?

Selinon uses Celery for queue handling and running, so you have to use a broker implementation that is supported by Celery - such as SQS or RabbitMQ.

Selinon requires that your messages are delivered - it’s okay if messages are delivered more than once (see for example SQS details regarding at-least-once delivery). You will just end up with multiple tasks executed at the same time. You can tackle that in your application logic.
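One common way to handle duplicate deliveries in application logic is to make processing idempotent, keyed by a message or task id. The class below is a minimal in-memory sketch with hypothetical names; it is not part of Selinon, and a real deployment with several workers would need shared storage (for example a database table with a unique constraint) instead of a local dict.

```python
class IdempotentProcessor:
    """Run a handler at most once per message id (in-memory sketch)."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}

    def process(self, message_id, payload):
        # A duplicate delivery returns the stored result without rerunning.
        if message_id in self._results:
            return self._results[message_id]
        result = self._handler(payload)
        self._results[message_id] = result
        return result
```

Calling `process` twice with the same id runs the handler only once; the second call simply returns the remembered result.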

Why does a flow finish too early when using AWS SQS?

Most likely you are using AWS SQS standard queues, which can deliver a single message multiple times. If your application logic processes one message but a task fails when the second message is processed (e.g. integrity errors if task ids are unique in PostgreSQL), Celery overwrites the task state stored in the result backend. As a result, even if the task succeeds (first run), its state can be tracked as failed.

A solution to this problem is to patch Celery’s result backend so that the stored state of an already finished task is not overwritten, something like (in case of PostgreSQL as a result backend):

diff --git a/celery/backends/database/__init__.py b/celery/backends/database/__init__.py
index 506a4cc69..57d29a6ca 100644
--- a/celery/backends/database/__init__.py
+++ b/celery/backends/database/__init__.py
@@ -110,6 +110,9 @@ class DatabaseBackend(BaseBackend):
                 task = Task(task_id)
                 session.add(task)
                 session.flush()
+            elif task.status in states.READY_STATES:
+                # Do not overwrite on multiple message delivery (e.g. SQS).
+                return task.result
             task.result = result
             task.status = state
             task.traceback = traceback

Or simply switch to AWS SQS FIFO queues, which guarantee exactly-once delivery of a message.

What does Selinon mean?

Selinon means celery in Greek. The main reason for using Greek was the fact that there are already successful projects out there that do distributed systems and have Greek names (see Kubernetes as an example). But the Greek language is cool anyway :-).