Lance integration for Daft.
# Install just the daft-lance extension
pip install daft-lance
# Install daft with the daft-lance extension
pip install 'daft[lance]'
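To confirm that the extension is importable after installation, a small check like the following may help. This is a minimal sketch: the helper name `is_installed` is hypothetical and not part of daft or daft-lance, and it assumes only that the import name is `daft_lance`, as used in the examples in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `daft_lance` is the import name used throughout this README.
    for name in ("daft", "daft_lance"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```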
from daft_lance import compact_files
compact_files("s3://bucket/my_dataset")
from daft_lance import create_scalar_index
create_scalar_index("s3://bucket/my_dataset", column="name", index_type="INVERTED")
from daft_lance import merge_columns_df
merge_columns_df(df, "s3://bucket/my_dataset")
The migration only requires replacing daft.io.lance with daft_lance.
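The replacement can also be scripted in Python, which sidesteps differences between sed implementations. The sketch below is an illustration, not a tool shipped with daft-lance: `rewrite_source` and `migrate_tree` are hypothetical names, and the code assumes the source files are UTF-8 encoded. It performs the same literal substitution of `daft.io.lance` with `daft_lance`, with a dry-run mode by default.

```python
import re
from pathlib import Path

# The literal text "daft.io.lance", with the dots escaped.
OLD_MODULE = re.compile(r"daft\.io\.lance")
NEW_MODULE = "daft_lance"


def rewrite_source(text: str) -> str:
    """Replace every occurrence of daft.io.lance with daft_lance."""
    return OLD_MODULE.sub(NEW_MODULE, text)


def migrate_tree(root: str, apply: bool = False) -> list[Path]:
    """Return the .py files under `root` that contain the old module path.

    With apply=False (the default) nothing is written, so this is a dry run.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.py")):
        original = path.read_text(encoding="utf-8")
        updated = rewrite_source(original)
        if updated != original:
            changed.append(path)
            if apply:
                path.write_text(updated, encoding="utf-8")
    return changed
```

Calling `migrate_tree(".")` lists the files that would change. Passing `apply=True` rewrites them in place.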
# Preview the rewritten files in the current directory and all subdirectories (prints to stdout, modifies nothing)
find . -type f -name "*.py" -exec sed 's/daft\.io\.lance/daft_lance/g' {} +
# Apply the changes in place (GNU sed syntax; BSD/macOS sed needs -i '' instead of -i)
find . -type f -name "*.py" -exec sed -i 's/daft\.io\.lance/daft_lance/g' {} +
The daft_lance extension supports Lance BLOB V2 by reading descriptors
into the following daft datatype. Note that daft.read_lance will NOT
materialize Lance BLOB V2 bytes.
{
kind: uint8,
position: uint64,
size: uint64,
blob_id: uint32,
blob_uri: string,
}
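When descriptor rows are pulled into Python, it can be useful to sanity-check them against the field types listed above. The following is a minimal sketch under stated assumptions: it assumes each descriptor arrives as a plain dict with exactly these five keys and no null fields (relax the checks if your dataset contains nulls). `validate_descriptor` is a hypothetical helper, not part of daft or daft_lance, and the example values are made up.

```python
# Field name -> bit width of the unsigned integer type in the descriptor struct.
UNSIGNED_FIELDS = {"kind": 8, "position": 64, "size": 64, "blob_id": 32}


def validate_descriptor(desc: dict) -> None:
    """Raise ValueError if `desc` does not fit the descriptor struct's field types."""
    expected = set(UNSIGNED_FIELDS) | {"blob_uri"}
    if set(desc) != expected:
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(desc)}")
    for name, bits in UNSIGNED_FIELDS.items():
        value = desc[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {type(value).__name__}")
        if not 0 <= value < 2**bits:
            raise ValueError(f"{name}={value} does not fit in uint{bits}")
    if not isinstance(desc["blob_uri"], str):
        raise ValueError("blob_uri must be a string")


# Hypothetical descriptor values, for illustration only.
example = {"kind": 0, "position": 0, "size": 1024, "blob_id": 1, "blob_uri": ""}
validate_descriptor(example)  # passes silently
```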
To materialize blobs, read the dataset with row IDs enabled and call take_blobs:
import lance
import daft
from daft_lance import take_blobs
ds = lance.dataset("s3://bucket/my_dataset")
df = daft.read_lance(ds.uri, default_scan_options={"with_row_id": True})
df = take_blobs(df, ds, "blob_column")
# each value is a lance.Blob — call .read() to fetch bytes
blobs = df.select("blob_column").to_pydict()["blob_column"]
data = blobs[0].read()
To write binary columns as Lance Blob V2, use the blob_columns opt-in:
import daft
df = daft.from_pydict({"id": [1, 2, 3], "data": [b"...", b"...", b"..."]})
df.write_lance("s3://bucket/my_dataset", blob_columns=["data"]).collect()
Requires uv.
uv sync
uv run pytest tests/ -v
Apache-2.0