
Embedding Python in Elixir with Pythonx

Python interpreter integration and ML model inference in Elixir

26 min read elixir, python, interop, machine-learning

The Elixir ecosystem has a machine learning problem, and it's one of breadth rather than capability. Nx, Axon, and Bumblebee have proven that Elixir can do serious numerical computing; what's missing is the long tail of libraries. Python has decades of battle-tested libraries, pre-trained models, and institutional knowledge baked into packages like scikit-learn, transformers, and countless domain-specific tools; rewriting all of that in Elixir would take years we don't have.

Pythonx solves this by embedding a Python interpreter directly into the BEAM's operating-system process. There are no ports to manage, no hand-written NIFs with manual memory management, and no microservices adding network latency: just Python running inside Elixir, with data flowing between them.

When Python Interop Makes Sense (And When It Doesn't)

Before reaching for Pythonx, ask yourself: do I actually need Python?

The Elixir ML ecosystem has come a long way. Nx provides efficient tensor operations with GPU acceleration; Bumblebee offers pre-trained transformer models that run natively on the BEAM; Axon handles neural network training. For many use cases (text embeddings, image classification, basic NLP) you can stay entirely within Elixir.

But Python interop pulls its weight in specific situations:

  • Legacy model integration. Your data science team has spent months fine-tuning a PyTorch model. Retraining it in Axon means duplicating work and risking subtle behavioral differences. Pythonx lets you load that exact .pt file and run inference with identical results.
  • Ecosystem gaps. Need to run a specific algorithm from scikit-learn? Want to use a niche library like prophet for time series forecasting? Some tools don't have Elixir equivalents yet.
  • Rapid prototyping. You're exploring whether an ML approach works for your problem. Python's interactive ecosystem—Jupyter, pandas, matplotlib—accelerates experimentation. Once you've validated the approach, you can decide whether to port it to native Elixir or keep running it via Pythonx.

Still, don't use Pythonx as a crutch. If you're just doing basic data manipulation, Elixir's Enum and Stream modules are more than capable. If you need matrix operations, reach for Nx first. Python interop adds complexity; only accept that complexity when the benefit is clear.
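As a point of comparison, the mean, standard deviation, and median that the NumPy example below computes need nothing beyond Enum. What follows is a minimal sketch, not a drop-in library. StatsSketch is a hypothetical module name, and std/1 uses the population formula (dividing by n) to match NumPy's default ddof=0.

```elixir
defmodule StatsSketch do
  @moduledoc """
  Hypothetical helper: summary statistics for a non-empty list of numbers
  using only Enum and :math.
  """

  def mean(numbers) when numbers != [], do: Enum.sum(numbers) / length(numbers)

  # Population standard deviation (divide by n), like np.std with ddof=0
  def std(numbers) when numbers != [] do
    m = mean(numbers)

    numbers
    |> Enum.map(fn x -> (x - m) * (x - m) end)
    |> mean()
    |> :math.sqrt()
  end

  # Middle element of the sorted list, or the average of the two middle ones
  def median(numbers) when numbers != [] do
    sorted = Enum.sort(numbers)
    mid = div(length(sorted), 2)

    if rem(length(sorted), 2) == 1 do
      Enum.at(sorted, mid) / 1
    else
      (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
    end
  end
end
```

For example, StatsSketch.std([2, 4, 4, 4, 5, 5, 7, 9]) returns 2.0, with no interpreter involved.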

Pythonx: What It Is and Where It Comes From

Pythonx emerged from the Livebook project at Dashbit. If you've used Livebook's Python integration, you've already used the technology underlying Pythonx. The library embeds a Python interpreter using Erlang's NIF interface, but abstracts away the gnarly details of memory management and type conversion.

What makes the architecture work: Python's GIL (Global Interpreter Lock) and the BEAM's concurrency model can coexist if you're careful about thread coordination. Pythonx handles the careful part for you.

Add it to your dependencies:

def deps do
  [
    {:pythonx, "~> 0.3"}
  ]
end

Pythonx will download and manage a Python distribution for you, or you can point it at an existing installation.

Setting Up the Python Environment

Pythonx needs to know which Python packages to make available. Two options: let Pythonx manage everything, or use your existing Python environment.

In your config/config.exs:

config :pythonx, :uv_init,
  packages: [
    "numpy",
    "torch",
    "transformers",
    "scikit-learn"
  ],
  python_version: "3.11"

Then initialize during application startup:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    Pythonx.uv_init()

    children = [
      # ... your supervision tree
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

The uv_init/0 function uses uv, the fast Python package installer, to create an isolated environment with your specified packages. This environment lives in your project's _build directory and is reproducible across deployments.

Using an Existing Python Environment

If your data science team maintains their own conda or virtualenv environment, you can point Pythonx at it:

config :pythonx, :python_path, "/path/to/your/venv/bin/python"

This approach gives you flexibility but requires you to manage Python dependencies outside of your Elixir build process. I prefer the managed approach for applications where I control the full stack; the external approach works well when integrating with an existing ML pipeline.

Calling Python Code from Elixir

You write Python code as a string, execute it, and get results back as Elixir terms. That's the core of the API; everything else builds on it.

Synchronous Execution

defmodule MyApp.PythonMath do
  def calculate_statistics(numbers) do
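    # Render the list as a Python list literal: charlists: :as_lists stops
    # [72, 105] from printing as ~c"Hi", and limit: :infinity stops inspect/2
    # from truncating lists longer than 50 elements with "..."
    python_code = """
    import numpy as np

    data = np.array(#{inspect(numbers, charlists: :as_lists, limit: :infinity)})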
    result = {
        'mean': float(np.mean(data)),
        'std': float(np.std(data)),
        'median': float(np.median(data))
    }
    result
    """

    case Pythonx.eval(python_code) do
      {:ok, result} -> {:ok, result}
      {:error, reason} -> {:error, reason}
    end
  end
end

Pythonx.eval/1 executes the Python code and returns the last expression. Python dictionaries become Elixir maps; lists become lists; primitives map to their obvious counterparts.
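There is a caveat when you build Python source with inspect/1. By default it renders a list of printable integers as a charlist, and it truncates lists after 50 elements. Either behavior produces Python that fails to parse or means something else. Below is a minimal sketch of a safer renderer. PyLiteral is a hypothetical name, and it only handles flat lists of numbers.

```elixir
defmodule PyLiteral do
  @moduledoc """
  Hypothetical helper: render a flat list of numbers as Python list source.
  """

  def number_list(numbers) when is_list(numbers) do
    if not Enum.all?(numbers, &is_number/1) do
      raise ArgumentError, "expected a flat list of numbers"
    end

    # charlists: :as_lists keeps [72, 105] from printing as a charlist (~c"Hi"),
    # and limit: :infinity keeps long lists from being cut off with "..."
    inspect(numbers, charlists: :as_lists, limit: :infinity)
  end
end
```

PyLiteral.number_list([72, 105]) returns "[72, 105]", while a plain inspect([72, 105]) prints a charlist that NumPy never sees as numbers.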

Working with Python State

For anything beyond one-shot evaluation, you'll want to maintain state across multiple Python calls. Pythonx provides a session-based API:

defmodule MyApp.ModelSession do
  def run_pipeline do
    Pythonx.session(fn session ->
      # Load the model once
      session
      |> Pythonx.exec("""
        from transformers import pipeline
        classifier = pipeline('sentiment-analysis')
        """)

      # Run multiple inferences
      texts = ["I love this product", "This is terrible", "Meh, it's okay"]

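      results = Enum.map(texts, fn text ->
        # Escape backslashes, then single quotes: "it's" would otherwise
        # end the Python string literal early and raise a SyntaxError
        escaped = text |> String.replace("\\", "\\\\") |> String.replace("'", "\\'")

        {:ok, result} = Pythonx.eval(session, """
          classifier('#{escaped}')[0]
          """)
        result
      end)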

      results
    end)
  end
end

The session keeps the Python interpreter state alive across multiple exec and eval calls. This matters for ML workflows where loading a model takes seconds but inference takes milliseconds; you don't want to pay that startup cost on every request.

Asynchronous Execution

Python code can block the BEAM scheduler if it runs long computations. For CPU-intensive Python operations, use the async API:

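defmodule MyApp.AsyncInference do
  # Logger.info/1 is a macro, so the module has to require Logger
  require Logger

  def predict_async(input_data) do
    # Escape backslashes and single quotes so the JSON survives being
    # embedded in a single-quoted Python string literal
    payload =
      input_data
      |> Jason.encode!()
      |> String.replace("\\", "\\\\")
      |> String.replace("'", "\\'")

    task = Pythonx.async_eval("""
      import time
      import json

      # Simulate expensive computation
      time.sleep(2)

      data = json.loads('#{payload}')
      prediction = sum(data['features']) * 0.5  # Dummy model
      prediction
      """)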

    # Do other work while Python computes
    Logger.info("Prediction submitted, doing other work...")

    # Wait for result when needed
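    case Pythonx.await(task, 10_000) do
      {:ok, prediction} -> {:ok, prediction}
      {:error, :timeout} -> {:error, "Prediction timed out"}
      # Any other failure (e.g. a Python exception) passes through unchanged
      {:error, reason} -> {:error, reason}
    end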
  end
end

Under the hood, async_eval spawns the Python work on a dirty scheduler, preventing it from blocking regular BEAM processes. The timeout in await/2 is in milliseconds.
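If you'd rather get that submit-then-wait shape from the standard library alone, Task gives you most of it. This is a minimal sketch, not part of Pythonx. TimedCall is a hypothetical module, and fun stands in for whatever blocking call you want to bound.

```elixir
defmodule TimedCall do
  @moduledoc """
  Hypothetical helper: run a blocking zero-arity function in a Task and stop
  waiting for it after timeout_ms milliseconds.
  """

  def run(fun, timeout_ms) when is_function(fun, 0) do
    task =
      Task.async(fn ->
        # Rescue inside the task so an exception comes back as a value
        # instead of crashing the linked caller
        try do
          {:ok, fun.()}
        rescue
          exception -> {:error, exception}
        end
      end)

    # Wait up to timeout_ms, then kill the task if it still hasn't replied
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> result
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```

TimedCall.run(fn -> 1 + 2 end, 1_000) returns {:ok, 3}. One caveat applies to any timeout wrapped around native code: shutting down the task only abandons the result. A NIF call already executing on a dirty scheduler can't be interrupted, so the Python computation keeps running until the call returns. Treat the timeout as a deadline for the caller, not as cancellation.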

ML Model Integration: A Complete Example

A sentiment analysis service using a Hugging Face transformer model—this is the pattern I reach for most often in production.

defmodule MyApp.SentimentAnalyzer do
  use GenServer

  @model_name "distilbert-base-uncased-finetuned-sst-2-english"

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def analyze(text) when is_binary(text) do
    GenServer.call(__MODULE__, {:analyze, text}, 30_000)
  end

  # GenServer Callbacks

  @impl true
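  def init(_opts) do
    # Trap exits so terminate/2 runs (and stops the session) when the
    # supervisor shuts this process down
    Process.flag(:trap_exit, true)

    # Initialize Python session with model loaded
    session = Pythonx.start_session()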

    # Load model at startup (this takes a few seconds)
    Pythonx.exec(session, """
      from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

      model = AutoModelForSequenceClassification.from_pretrained('#{@model_name}')
      tokenizer = AutoTokenizer.from_pretrained('#{@model_name}')
      classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

      def analyze_sentiment(text):
          result = classifier(text)[0]
          return {
              'label': result['label'],
              'score': float(result['score'])
          }
      """)

    {:ok, %{session: session}}
  end

  @impl true
  def handle_call({:analyze, text}, _from, %{session: session} = state) do
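    # Escape backslashes first, then single quotes and line breaks, so the
    # text cannot end the single-quoted Python literal early
    escaped_text =
      text
      |> String.replace("\\", "\\\\")
      |> String.replace("'", "\\'")
      |> String.replace("\n", "\\n")
      |> String.replace("\r", "\\r")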

    result = Pythonx.eval(session, """
      analyze_sentiment('#{escaped_text}')
      """)

    case result do
      {:ok, analysis} ->
        {:reply, {:ok, analysis}, state}

      {:error, reason} ->
        {:reply, {:error, reason}, state}
    end
  end

  @impl true
  def terminate(_reason, %{session: session}) do
    Pythonx.stop_session(session)
  end
end

Three things matter here. The model loads once during init/1, amortizing the multi-second load time across all subsequent requests. The GenServer serializes access to the Python session, which is fine, because Python's GIL means only one thread can execute Python bytecode at a time anyway. And the terminate/2 callback cleans up the Python session, with one caveat: a GenServer only runs terminate/2 during a supervisor shutdown if it traps exits, so call Process.flag(:trap_exit, true) in init/1.
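Interpolating user text into Python source is the riskiest line in this GenServer. Any character the escaping misses (a quote, a backslash, a raw newline, a NUL byte) can end the literal early or inject code. Below is a minimal sketch of a fuller encoder. It assumes the text is embedded in a single-quoted Python string, as handle_call/3 does. PyString is a hypothetical module.

```elixir
defmodule PyString do
  @moduledoc """
  Hypothetical helper: encode an Elixir string as a single-quoted Python
  string literal that is safe to interpolate into Python source.
  """

  def literal(text) when is_binary(text) do
    if not String.valid?(text) do
      raise ArgumentError, "expected valid UTF-8"
    end

    body = for <<char::utf8 <- text>>, into: "", do: escape(char)
    "'" <> body <> "'"
  end

  # Backslash must be escaped so it cannot swallow the closing quote
  defp escape(?\\), do: "\\\\"
  defp escape(?'), do: "\\'"
  defp escape(?\n), do: "\\n"
  defp escape(?\r), do: "\\r"
  defp escape(?\t), do: "\\t"

  # Other control characters, including NUL (which Python source cannot contain)
  defp escape(char) when char < 0x20 or char == 0x7F do
    "\\x" <> String.pad_leading(Integer.to_string(char, 16), 2, "0")
  end

  # Everything else, including non-ASCII, is valid as-is in UTF-8 Python source
  defp escape(char), do: <<char::utf8>>
end
```

The result already includes its surrounding quotes, so the call site would interpolate it bare, as in analyze_sentiment(#{PyString.literal(text)}).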

Data Conversion: Bridging Elixir and Python

The most friction in Python interop comes from data conversion. Pythonx handles primitives automatically, but tensors and complex data structures need explicit handling.

Working with Nx Tensors

Nx tensors and NumPy arrays are both contiguous blocks of memory with shape and dtype metadata; converting between them should be efficient.
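To make that layout concrete without pulling in Nx, here is a pure-Elixir sketch of the kind of flat buffer involved. Values are flattened in row-major order as native-endian float32, and the shape is carried separately. That is what np.frombuffer(..., dtype=np.float32).reshape(shape) reads on the Python side. RowMajorSketch is hypothetical and handles only 2-D lists of floats.

```elixir
defmodule RowMajorSketch do
  @moduledoc """
  Hypothetical helper: pack a 2-D list of floats into a flat float32 buffer
  plus a shape, and unpack it again.
  """

  def pack(rows) when is_list(rows) and rows != [] do
    cols = length(hd(rows))

    if not Enum.all?(rows, &(length(&1) == cols)) do
      raise ArgumentError, "all rows must have the same length"
    end

    # Row-major order: every element of row 0, then row 1, and so on
    binary = for row <- rows, x <- row, into: <<>>, do: <<x::float-32-native>>
    {binary, [length(rows), cols]}
  end

  # Read 4-byte floats back out and regroup them into rows of `cols` values
  def unpack(binary, [_rows, cols]) when cols > 0 do
    values = for <<x::float-32-native <- binary>>, do: x
    Enum.chunk_every(values, cols)
  end
end
```

The shape list never enters the buffer. Losing or mangling it on either side is how a 2x3 tensor silently becomes a 3x2 one.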

defmodule MyApp.TensorBridge do
  @doc """
  Convert an Nx tensor to a NumPy array by copying its bytes through a base64 string.
  """
  def nx_to_numpy(session, tensor, var_name) do
    # Get tensor as binary
    binary = Nx.to_binary(tensor)
    shape = Nx.shape(tensor) |> Tuple.to_list()
    dtype = nx_type_to_numpy(Nx.type(tensor))

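    # Encode binary as base64 for safe transport
    encoded = Base.encode64(binary)

    # charlists: :as_lists keeps a shape like [100, 100] from rendering
    # as the charlist ~c"dd", which Python would fail to parse
    py_shape = inspect(shape, charlists: :as_lists)

    Pythonx.exec(session, """
      import numpy as np
      import base64

      _binary = base64.b64decode('#{encoded}')
      #{var_name} = np.frombuffer(_binary, dtype=np.#{dtype}).reshape(#{py_shape})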
      """)
  end

  @doc """
  Convert a NumPy array back to an Nx tensor.
  """
  def numpy_to_nx(session, var_name) do
    {:ok, result} = Pythonx.eval(session, """
      import base64

      _arr = #{var_name}
      {
          'data': base64.b64encode(_arr.tobytes()).decode('utf-8'),
          'shape': list(_arr.shape),
          'dtype': str(_arr.dtype)
      }
      """)

    binary = Base.decode64!(result["data"])
    shape = List.to_tuple(result["shape"])
    type = numpy_type_to_nx(result["dtype"])

    Nx.from_binary(binary, type) |> Nx.reshape(shape)
  end

  defp nx_type_to_numpy({:f, 32}), do: "float32"
  defp nx_type_to_numpy({:f, 64}), do: "float64"
  defp nx_type_to_numpy({:s, 32}), do: "int32"
  defp nx_type_to_numpy({:s, 64}), do: "int64"
  defp nx_type_to_numpy({:u, 8}), do: "uint8"

  defp numpy_type_to_nx("float32"), do: {:f, 32}
  defp numpy_type_to_nx("float64"), do: {:f, 64}
  defp numpy_type_to_nx("int32"), do: {:s, 32}
  defp numpy_type_to_nx("int64"), do: {:s, 64}
  defp numpy_type_to_nx("uint8"), do: {:u, 8}
end

The base64 encoding adds overhead: the encoded text is about a third larger than the raw bytes, and both sides make extra copies. Still, it's the safest way to pass binary data through string interpolation. For very large tensors in performance-critical paths, Pythonx offers lower-level APIs that can pass binary data directly.
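The size cost is easy to work out. Base64 turns every 3 input bytes into 4 output characters and pads the last group. The payload therefore grows by about a third before Python decodes it back into a separate bytes object. Below is a small sketch that checks the formula against Base.encode64/1. Base64Size is a hypothetical module name.

```elixir
defmodule Base64Size do
  @moduledoc "Hypothetical helper for estimating base64 transport cost."

  # Every 3-byte group becomes 4 characters; a trailing partial group is padded to 4
  def encoded_size(n) when is_integer(n) and n >= 0, do: 4 * div(n + 2, 3)
end

# A 1 MiB buffer, e.g. 262,144 float32 values
raw = :binary.copy(<<0>>, 1_048_576)
IO.inspect(byte_size(Base.encode64(raw)))          # 1398104
IO.inspect(Base64Size.encoded_size(byte_size(raw))) # 1398104
```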

Performance Considerations and Pooling

A single Python session can only execute one piece of code at a time. If your application needs to handle concurrent ML inference requests, you have two options: accept the serialization, or pool sessions.

Session Pooling with Poolboy

defmodule MyApp.PythonPool do
  use Supervisor

  @pool_name :python_pool

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def init(_opts) do
    pool_config = [
      name: {:local, @pool_name},
      worker_module: MyApp.PythonWorker,
      size: System.schedulers_online(),
      max_overflow: 2
    ]

    children = [
      :poolboy.child_spec(@pool_name, pool_config, [])
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def with_session(fun, timeout \\ 30_000) do
    :poolboy.transaction(
      @pool_name,
      fn worker -> GenServer.call(worker, {:execute, fun}, timeout) end,
      timeout
    )
  end
end

defmodule MyApp.PythonWorker do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, [])
  end

  def init(_) do
    session = Pythonx.start_session()

    # Pre-load common imports
    Pythonx.exec(session, """
      import numpy as np
      import json
      """)

    {:ok, %{session: session}}
  end

  def handle_call({:execute, fun}, _from, %{session: session} = state) do
    result = fun.(session)
    {:reply, result, state}
  end

  def terminate(_reason, %{session: session}) do
    Pythonx.stop_session(session)
  end
end

Calling into the pool is then a single function call. Poolboy is a separate dependency, so it needs its own {:poolboy, "~> 1.5"} entry in deps:

result = MyApp.PythonPool.with_session(fn session ->
  Pythonx.eval(session, "np.random.rand(10).tolist()")
end)

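The pool size of System.schedulers_online/0 is a reasonable default. Python work happens on dirty schedulers, and having more Python sessions than CPU cores can actually hurt throughput due to context switching. Pooling also pays off mostly when the heavy lifting runs in native code that releases the GIL, as many NumPy and PyTorch operations do; pure-Python bytecode in a single interpreter still runs on one thread at a time.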

Performance Benchmarks

In my testing with a simple transformer model, here's what to expect:

  • Session initialization: 50-100ms (without model loading)
  • Model loading (distilbert): 2-4 seconds
  • Single inference: 20-50ms for short texts
  • Data conversion overhead: ~1ms for tensors under 1MB

Amortize model loading by keeping sessions alive; pool sessions if you need concurrent inference. The per-request overhead of Pythonx itself is negligible compared to actual ML computation time.

Production Considerations

A few things I've learned running Pythonx in production:

Memory management matters. Each Python session consumes memory for the interpreter plus any loaded models. A distilbert model is ~250MB; five pooled sessions with that model loaded means 1.25GB of RAM just for inference. Monitor memory usage and size your pools to match.
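A pool should be capped by whichever limit is tighter: the core count or the memory budget. Below is a minimal sketch. PoolSizing is hypothetical, and per_session_mb is your own measurement of interpreter plus model (roughly 250 MB for distilbert, per the figure above).

```elixir
defmodule PoolSizing do
  @moduledoc """
  Hypothetical helper: choose a pool size that respects both CPU count and a
  memory budget.
  """

  def size(budget_mb, per_session_mb, cores \\ System.schedulers_online())
      when is_integer(budget_mb) and is_integer(per_session_mb) and is_integer(cores) and
             per_session_mb > 0 and cores > 0 do
    # How many fully loaded sessions fit in the memory budget
    by_memory = div(budget_mb, per_session_mb)

    if by_memory < 1 do
      raise ArgumentError, "budget cannot hold even one session"
    end

    min(cores, by_memory)
  end
end
```

PoolSizing.size(1_250, 250, 8) returns 5, the five-session example above: memory, not cores, is the binding constraint there.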

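Error handling needs care. Python exceptions come back as Elixir error tuples, and the messages can be cryptic enough that you'll want the full Python traceback logged, not just the tuple. Match on the {:error, reason} branch and log the whole reason, and keep a try/rescue around the call for anything that raises on the Elixir side, such as a failed pattern match on the result.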
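A minimal sketch of such a wrapper: it logs the full error reason from the tuple and rescues anything that raises. SafeCall is a hypothetical module, and fun stands in for any Pythonx call that returns {:ok, _} or {:error, _}.

```elixir
defmodule SafeCall do
  @moduledoc """
  Hypothetical wrapper around calls that return {:ok, value} | {:error, reason}.
  """

  require Logger

  def run(label, fun) when is_function(fun, 0) do
    case fun.() do
      {:ok, _value} = ok ->
        ok

      {:error, reason} = error ->
        # Log the whole reason (e.g. a Python traceback) without inspect's truncation
        Logger.error("#{label} failed: " <> inspect(reason, limit: :infinity, printable_limit: :infinity))
        error
    end
  rescue
    # Anything raised on the Elixir side, including a CaseClauseError when
    # fun returns something other than an ok/error tuple
    exception ->
      Logger.error("#{label} raised:\n" <> Exception.format(:error, exception, __STACKTRACE__))
      {:error, exception}
  end
end
```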

Cold starts are real. If you're running on Lambda or similar serverless infrastructure, the combination of BEAM startup plus Python environment initialization plus model loading can push you past timeout limits. Consider keeping instances warm or using provisioned concurrency.

Version pinning is non-negotiable. Python's ecosystem is notoriously fragile around version compatibility. Pin your package versions explicitly in your Pythonx configuration; test upgrades in staging before production.

When to Move Beyond Interop

As the Elixir ML ecosystem matures, consider migrating critical paths to native implementations. Bumblebee already supports many popular models, and Nx's defn compiler, through a backend such as EXLA, can generate GPU-accelerated code that competes with PyTorch on many workloads.

Use Pythonx to validate approaches quickly and to access capabilities that don't exist in Elixir yet. But keep an eye on the native ecosystem. The gap is closing faster than most people realize.

What makes Pythonx worth reaching for is the transition path. Start with Python for everything, then migrate hot paths to native Elixir as the tooling matures. Your Python code doesn't have to be rewritten all at once—it can coexist with native Elixir ML code indefinitely.

Bumblebee already runs distilbert natively, no Python required. The real question is which model your team depends on crosses that line next — and whether you'll notice when it does.


What do you think of what I said?

Share with me your thoughts. You can tweet me at @allanmacgregor.

Further reading