Embedding Python in Elixir with Pythonx
Python interpreter integration and ML model inference in Elixir
The Elixir ecosystem has a machine learning problem, and it's one of breadth rather than capability. Nx, Axon, and Bumblebee have proven Elixir can do serious numerical computing; the libraries are what's missing. Python has decades of battle-tested libraries, pre-trained models, and institutional knowledge baked into packages like scikit-learn, transformers, and countless domain-specific tools; rewriting all of that in Elixir would take years we don't have.
Pythonx solves this by embedding a Python interpreter directly into your BEAM process. There are no ports to manage, no NIFs with manual memory management, and no microservices adding network latency — just Python running inside Elixir, with data flowing between them.
When Python Interop Makes Sense (And When It Doesn't)
Before reaching for Pythonx, ask yourself: do I actually need Python?
The Elixir ML ecosystem has come a long way. Nx provides efficient tensor operations with GPU acceleration; Bumblebee offers pre-trained transformer models that run natively on the BEAM; Axon handles neural network training.nx-defn-compiler For many use cases—text embeddings, image classification, basic NLP—you can stay entirely within Elixir.
But Python interop pulls its weight in specific situations:
-
Legacy model integration. Your data science team has spent months fine-tuning a PyTorch model. Retraining it in Axon means duplicating work and risking subtle behavioral differences. Pythonx lets you load that exact
.ptfile and run inference with identical results. -
Ecosystem gaps. Need to run a specific algorithm from scikit-learn? Want to use a niche library like
prophetfor time series forecasting? Some tools don't have Elixir equivalents yet. - Rapid prototyping. You're exploring whether an ML approach works for your problem. Python's interactive ecosystem—Jupyter, pandas, matplotlib—accelerates experimentation. Once you've validated the approach, you can decide whether to port it to native Elixir or keep running it via Pythonx.
Still, don't use Pythonx as a crutch. If you're just doing basic data manipulation, Elixir's Enum and Stream modules are more than capable. If you need matrix operations, reach for Nx first. Python interop adds complexity; only accept that complexity when the benefit is clear.
Pythonx: What It Is and Where It Comes From
Pythonx emerged from the Livebook project at Dashbit.livebook-lineage If you've used Livebook's Python integration, you've already used the technology underlying Pythonx. The library embeds a Python interpreter using Erlang's NIF interface, but abstracts away the gnarly details of memory management and type conversion.
What makes the architecture work: Python's GIL (Global Interpreter Lock) and the BEAM's concurrency model can coexist if you're careful about thread coordination.gil-beam-coexistence Pythonx handles the careful part for you.
Add it to your dependencies:
def deps do
[
{:pythonx, "~> 0.3"}
]
end
Pythonx will download and manage a Python distribution for you, or you can point it at an existing installation.
Setting Up the Python Environment
Pythonx needs to know which Python packages to make available. Two options: let Pythonx manage everything, or use your existing Python environment.
Managed Environment (Recommended for Production)
In your config/config.exs:
config :pythonx, :uv_init,
packages: [
"numpy",
"torch",
"transformers",
"scikit-learn"
],
python_version: "3.11"
Then initialize during application startup:
defmodule MyApp.Application do
use Application
def start(_type, _args) do
Pythonx.uv_init()
children = [
# ... your supervision tree
]
opts = [strategy: :one_for_one, name: MyApp.Supervisor]
Supervisor.start_link(children, opts)
end
end
The uv_init/0 function uses uv—the fast Python package installer—to create an isolated environment with your specified packages.uv-astral This environment lives in your project's _build directory and is reproducible across deployments.
Using an Existing Python Environment
If your data science team maintains their own conda or virtualenv environment, you can point Pythonx at it:conda-virtualenv
config :pythonx, :python_path, "/path/to/your/venv/bin/python"
This approach gives you flexibility but requires you to manage Python dependencies outside of your Elixir build process. I prefer the managed approach for applications where I control the full stack; the external approach works well when integrating with an existing ML pipeline.
Calling Python Code from Elixir
You write Python code as a string, execute it, and get results back as Elixir terms. That's the whole API surface.
Synchronous Execution
defmodule MyApp.PythonMath do
def calculate_statistics(numbers) do
# Convert Elixir list to Python code
python_code = """
import numpy as np
data = np.array(#{inspect(numbers)})
result = {
'mean': float(np.mean(data)),
'std': float(np.std(data)),
'median': float(np.median(data))
}
result
"""
case Pythonx.eval(python_code) do
{:ok, result} -> {:ok, result}
{:error, reason} -> {:error, reason}
end
end
end
Pythonx.eval/1 executes the Python code and returns the last expression. Python dictionaries become Elixir maps; lists become lists; primitives map to their obvious counterparts.
Working with Python State
For anything beyond one-shot evaluation, you'll want to maintain state across multiple Python calls. Pythonx provides a session-based API:
defmodule MyApp.ModelSession do
def run_pipeline do
Pythonx.session(fn session ->
# Load the model once
session
|> Pythonx.exec("""
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
""")
# Run multiple inferences
texts = ["I love this product", "This is terrible", "Meh, it's okay"]
results = Enum.map(texts, fn text ->
{:ok, result} = Pythonx.eval(session, """
classifier('#{text}')[0]
""")
result
end)
results
end)
end
end
The session keeps the Python interpreter state alive across multiple exec and eval calls. This matters for ML workflows where loading a model takes seconds but inference takes milliseconds; you don't want to pay that startup cost on every request.
Asynchronous Execution
Python code can block the BEAM scheduler if it runs long computations. For CPU-intensive Python operations, use the async API:
defmodule MyApp.AsyncInference do
def predict_async(input_data) do
task = Pythonx.async_eval("""
import time
import json
# Simulate expensive computation
time.sleep(2)
data = json.loads('#{Jason.encode!(input_data)}')
prediction = sum(data['features']) * 0.5 # Dummy model
prediction
""")
# Do other work while Python computes
Logger.info("Prediction submitted, doing other work...")
# Wait for result when needed
case Pythonx.await(task, 10_000) do
{:ok, prediction} -> {:ok, prediction}
{:error, :timeout} -> {:error, "Prediction timed out"}
end
end
end
Under the hood, async_eval spawns the Python work on a dirty scheduler, preventing it from blocking regular BEAM processes. The timeout in await/2 is in milliseconds.
ML Model Integration: A Complete Example
A sentiment analysis service using a Hugging Face transformer model—this is the pattern I reach for most often in production.
defmodule MyApp.SentimentAnalyzer do
use GenServer
@model_name "distilbert-base-uncased-finetuned-sst-2-english"
def start_link(opts) do
GenServer.start_link(__MODULE__, opts, name: __MODULE__)
end
def analyze(text) when is_binary(text) do
GenServer.call(__MODULE__, {:analyze, text}, 30_000)
end
# GenServer Callbacks
@impl true
def init(_opts) do
# Initialize Python session with model loaded
session = Pythonx.start_session()
# Load model at startup (this takes a few seconds)
Pythonx.exec(session, """
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained('#{@model_name}')
tokenizer = AutoTokenizer.from_pretrained('#{@model_name}')
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
def analyze_sentiment(text):
result = classifier(text)[0]
return {
'label': result['label'],
'score': float(result['score'])
}
""")
{:ok, %{session: session}}
end
@impl true
def handle_call({:analyze, text}, _from, %{session: session} = state) do
# Escape the text for safe Python string embedding
escaped_text = String.replace(text, "'", "\\'")
result = Pythonx.eval(session, """
analyze_sentiment('#{escaped_text}')
""")
case result do
{:ok, analysis} ->
{:reply, {:ok, analysis}, state}
{:error, reason} ->
{:reply, {:error, reason}, state}
end
end
@impl true
def terminate(_reason, %{session: session}) do
Pythonx.stop_session(session)
end
end
Three things matter here.string-interpolation-risk The model loads once during init/1, amortizing the multi-second load time across all subsequent requests. The GenServer serializes access to the Python session—which is fine, because Python's GIL means only one thread can execute Python bytecode at a time anyway. And the terminate/2 callback ensures we clean up the Python session properly.
Data Conversion: Bridging Elixir and Python
The most friction in Python interop comes from data conversion. Pythonx handles primitives automatically, but tensors and complex data structures need explicit handling.
Working with Nx Tensors
Nx tensors and NumPy arrays are both contiguous blocks of memory with shape and dtype metadata; converting between them should be efficient.
defmodule MyApp.TensorBridge do
@doc """
Convert an Nx tensor to a NumPy array via shared memory.
"""
def nx_to_numpy(session, tensor, var_name) do
# Get tensor as binary
binary = Nx.to_binary(tensor)
shape = Nx.shape(tensor) |> Tuple.to_list()
dtype = nx_type_to_numpy(Nx.type(tensor))
# Encode binary as base64 for safe transport
encoded = Base.encode64(binary)
Pythonx.exec(session, """
import numpy as np
import base64
_binary = base64.b64decode('#{encoded}')
#{var_name} = np.frombuffer(_binary, dtype=np.#{dtype}).reshape(#{inspect(shape)})
""")
end
@doc """
Convert a NumPy array back to an Nx tensor.
"""
def numpy_to_nx(session, var_name) do
{:ok, result} = Pythonx.eval(session, """
import base64
_arr = #{var_name}
{
'data': base64.b64encode(_arr.tobytes()).decode('utf-8'),
'shape': list(_arr.shape),
'dtype': str(_arr.dtype)
}
""")
binary = Base.decode64!(result["data"])
shape = List.to_tuple(result["shape"])
type = numpy_type_to_nx(result["dtype"])
Nx.from_binary(binary, type) |> Nx.reshape(shape)
end
defp nx_type_to_numpy({:f, 32}), do: "float32"
defp nx_type_to_numpy({:f, 64}), do: "float64"
defp nx_type_to_numpy({:s, 32}), do: "int32"
defp nx_type_to_numpy({:s, 64}), do: "int64"
defp nx_type_to_numpy({:u, 8}), do: "uint8"
defp numpy_type_to_nx("float32"), do: {:f, 32}
defp numpy_type_to_nx("float64"), do: {:f, 64}
defp numpy_type_to_nx("int32"), do: {:s, 32}
defp numpy_type_to_nx("int64"), do: {:s, 64}
defp numpy_type_to_nx("uint8"), do: {:u, 8}
end
The base64 encoding adds overhead, but it's the safest way to pass binary data through string interpolation.base64-overhead For very large tensors in performance-critical paths, Pythonx offers lower-level APIs that can pass binary data directly.
Performance Considerations and Pooling
A single Python session can only execute one piece of code at a time. If your application needs to handle concurrent ML inference requests, you have two options: accept the serialization, or pool sessions.
Session Pooling with Poolboy
defmodule MyApp.PythonPool do
use Supervisor
@pool_name :python_pool
def start_link(opts) do
Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
end
def init(_opts) do
pool_config = [
name: {:local, @pool_name},
worker_module: MyApp.PythonWorker,
size: System.schedulers_online(),
max_overflow: 2
]
children = [
:poolboy.child_spec(@pool_name, pool_config, [])
]
Supervisor.init(children, strategy: :one_for_one)
end
def with_session(fun, timeout \\ 30_000) do
:poolboy.transaction(
@pool_name,
fn worker -> GenServer.call(worker, {:execute, fun}, timeout) end,
timeout
)
end
end
defmodule MyApp.PythonWorker do
use GenServer
def start_link(_opts) do
GenServer.start_link(__MODULE__, [])
end
def init(_) do
session = Pythonx.start_session()
# Pre-load common imports
Pythonx.exec(session, """
import numpy as np
import json
""")
{:ok, %{session: session}}
end
def handle_call({:execute, fun}, _from, %{session: session} = state) do
result = fun.(session)
{:reply, result, state}
end
def terminate(_reason, %{session: session}) do
Pythonx.stop_session(session)
end
end
Then calling into the pool is one line:poolboy-note
result = MyApp.PythonPool.with_session(fn session ->
Pythonx.eval(session, "np.random.rand(10).tolist()")
end)
The pool size of System.schedulers_online/0 is a reasonable default. Python work happens on dirty schedulers; having more Python sessions than CPU cores can actually hurt throughput due to context switching.dirty-scheduler-saturation
Performance Benchmarks
In my testing with a simple transformer model, here's what to expect:
- Session initialization: 50-100ms (without model loading)
- Model loading (distilbert): 2-4 secondsdistilbert-size
- Single inference: 20-50ms for short texts
- Data conversion overhead: ~1ms for tensors under 1MB
Amortize model loading by keeping sessions alive; pool sessions if you need concurrent inference. The per-request overhead of Pythonx itself is negligible compared to actual ML computation time.
Production Considerations
A few things I've learned running Pythonx in production:
Memory management matters. Each Python session consumes memory for the interpreter plus any loaded models. A distilbert model is ~250MB; five pooled sessions with that model loaded means 1.25GB of RAM just for inference. Monitor memory usage and size your pools to match.
Error handling needs care. Python exceptions become Elixir error tuples; the messages can be cryptic enough that you'll want the full Python traceback logged, not just the tuple. Wrap Pythonx calls in try/rescue blocks to capture it.
Cold starts are real.python-cold-start If you're running on Lambda or similar serverless infrastructure, the combination of BEAM startup plus Python environment initialization plus model loading can push you past timeout limits. Consider keeping instances warm or using provisioned concurrency.
Version pinning is non-negotiable. Python's ecosystem is notoriously fragile around version compatibility. Pin your package versions explicitly in your Pythonx configuration; test upgrades in staging before production.
When to Move Beyond Interop
As the Elixir ML ecosystem matures, consider migrating critical paths to native implementations. Bumblebee already supports many popular models; Nx's defn compiler can generate GPU-accelerated code that rivals PyTorch performance.
Use Pythonx to validate approaches quickly and to access capabilities that don't exist in Elixir yet. But keep an eye on the native ecosystem. The gap is closing faster than most people realize.
What makes Pythonx worth reaching for is the transition path. Start with Python for everything, then migrate hot paths to native Elixir as the tooling matures. Your Python code doesn't have to be rewritten all at once—it can coexist with native Elixir ML code indefinitely.
Bumblebee already runs distilbert natively, no Python required. The real question is which model your team depends on crosses that line next — and whether you'll notice when it does.