12 changes: 7 additions & 5 deletions CLAUDE.md
@@ -80,21 +80,23 @@ granite-switch/

## Installation (local/dev)

+This project uses [uv](https://docs.astral.sh/uv/getting-started/installation/).
+
```bash
# Core package only (config)
-pip install -e .
+uv sync

# With HuggingFace backend
-pip install -e ".[hf]"
+uv sync --extra hf

# With vLLM backend
-pip install -e ".[vllm]"
+uv sync --extra vllm

# With compose tools
-pip install -e ".[compose]"
+uv sync --extra compose

# Everything (development)
-pip install -e ".[dev]"
+uv sync --extra dev
```

## Import Paths
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -9,10 +9,10 @@ Thank you for your interest in contributing to Granite Switch!
```bash
git clone https://github.com/<your-username>/granite-switch.git
cd granite-switch
-pip install -e ".[dev]"
+uv sync --extra dev
```
3. Create a feature branch and make your changes
-4. Run tests: `pytest tests/ -v`
+4. Run tests: `uv run pytest tests/ -v`
5. Submit a pull request

## Contribution Guidelines
28 changes: 18 additions & 10 deletions README.md
@@ -1,7 +1,7 @@
# Granite Switch — Build AI models like you build software

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
-[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

| [**Browse Adapters**](https://huggingface.co/collections/ibm-granite/granite-libraries) | [Models on HF](https://huggingface.co/ibm-granite/granite-switch-4.1-8b-preview) | [Tutorials](tutorials/README.md) |

@@ -23,21 +23,21 @@ Browse available libraries in the [Granite Libraries collection](https://hugging
```bash
python -m venv venv && source venv/bin/activate

-# Granite-Switch installation is based on your usecase:
-pip install "granite-switch[compose]" # Compose modular models
-pip install "granite-switch[hf]" # HuggingFace inference
-pip install "granite-switch[vllm]" # vLLM production inference (0.19.x)
-pip install "granite-switch[vllm20]" # vLLM 0.20+ (requires CUDA 13+)
-pip install "granite-switch[dev]" # Everything (uses vLLM 0.19.x by default)
-pip install "granite-switch[dev-vllm20]" # Dev environment with vLLM 0.20+
+# Install the extra for your use case:
+pip install "granite-switch[compose]" # Compose modular models
+pip install "granite-switch[hf]" # HuggingFace inference
+pip install "granite-switch[vllm]" # vLLM inference (CUDA 12.x)
+pip install "granite-switch[vllm20]" # vLLM 0.20+ (CUDA 13+)
```

-Requires Python 3.9+ and PyTorch 2.0+.
+Requires Python 3.10+ and PyTorch 2.0+.

> **vLLM version note:** This project currently defaults to vLLM 0.19.1 due to vLLM 0.20's
> dependency on CUDA 13.0+ (via PyTorch 2.11), which is incompatible with many existing
> environments running CUDA 12.x drivers. Use `.[vllm20]` if your environment supports CUDA 13+.
+
+> **Contributing?** See [CONTRIBUTING.md](CONTRIBUTING.md) for the dev environment setup with `uv`.

### Compose a Model

Compose a base Granite model with adapter libraries into a single deployable checkpoint:
@@ -62,10 +62,18 @@ For convenience, you can find already composed Granite Switch models for the Gra

### Run Inference

+> **Tip: pre-download the model for faster startup.** The first run will download several GB from Hugging Face, which can be slow. To download in advance using the fast transfer backend:
+> ```bash
+> uv pip install huggingface_hub[hf_transfer]
+> huggingface-cli login # one-time, if not already logged in
+> HF_HUB_ENABLE_HF_TRANSFER=1 hf download ibm-granite/granite-switch-4.1-3b-preview
+> ```
+> Subsequent runs will use the local cache automatically.
+
**vLLM + Mellea (recommended):**

```bash
-pip install mellea
+uv pip install mellea
# Example with the 3B model
python -m vllm.entrypoints.openai.api_server --model ibm-granite/granite-switch-4.1-3b-preview --port 8000
```
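
A minimal sketch of the rule in the vLLM version note in this README diff, in case it helps when reviewing the install lines: `vllm` targets CUDA 12.x and `vllm20` targets CUDA 13+. The helper name `pick_vllm_extra` and the idea of passing the CUDA version in as a string are assumptions for illustration only; nothing like this exists in the repository.

```bash
# Hypothetical helper (not part of granite-switch): map a CUDA version
# string such as "12.4" to the extra named in the README.
pick_vllm_extra() {
  cuda_major="${1%%.*}"          # keep everything before the first dot
  if [ "$cuda_major" -ge 13 ]; then
    echo "vllm20"                # vLLM 0.20+ (CUDA 13+)
  else
    echo "vllm"                  # default vLLM 0.19.x line (CUDA 12.x)
  fi
}

# Example: build the install command for a CUDA 12.4 environment.
# prints: pip install "granite-switch[vllm]"
echo "pip install \"granite-switch[$(pick_vllm_extra 12.4)]\""
```

The version is passed in explicitly rather than detected, so the sketch makes no claims about how a given machine reports its CUDA version.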
2 changes: 1 addition & 1 deletion docs/GIT_WORKFLOW.md
@@ -62,7 +62,7 @@ Fixes #123

Before committing:

-1. **Run tests**: `pytest tests/ -v`
+1. **Run tests**: `uv run pytest tests/ -v`
2. **Check comments match code** — stale comments are worse than no comments
3. **Update docs** if behavior changed

8 changes: 8 additions & 0 deletions pyproject.toml
@@ -70,6 +70,14 @@ conflicts = [
{ extra = "dev-vllm20" },
{ extra = "vllm" },
],
+[
+{ extra = "tutorials" },
+{ extra = "vllm20" },
+],
+[
+{ extra = "tutorials" },
+{ extra = "dev-vllm20" },
+],
]

[tool.setuptools.packages.find]
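
For illustration, the two pairs added in this hunk mean that `tutorials` can no longer be combined with `vllm20` or `dev-vllm20`. A rough sketch of that rule as a shell check follows; `extras_conflict` is a hypothetical helper that covers only the two pairs added here, and it is not how uv evaluates conflicts internally.

```bash
# Hypothetical helper: succeed (exit 0) when the two extras form one of
# the conflicting pairs added in this hunk, in either order.
extras_conflict() {
  case "$1+$2" in
    tutorials+vllm20|vllm20+tutorials) return 0 ;;
    tutorials+dev-vllm20|dev-vllm20+tutorials) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: report a combination that the new entries rule out.
if extras_conflict tutorials vllm20; then
  echo "tutorials and vllm20 cannot be installed together"
fi
```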
16 changes: 10 additions & 6 deletions tutorials/PREREQUISITES.md
@@ -19,30 +19,34 @@ Python 3.10+ is required.

### Base Installation

+Install [uv](https://docs.astral.sh/uv/getting-started/installation/), then:
+
```bash
-pip install granite-switch
+git clone https://github.com/generative-computing/granite-switch.git
+cd granite-switch
+uv sync
```

### HuggingFace Backend

For direct model inference with HuggingFace Transformers:

```bash
-pip install "granite-switch[hf,compose]"
+uv sync --extra hf
```

This includes:
- `transformers` for model loading and generation
- `torch` with CUDA support
- `peft` for LoRA operations
-- Compose tools for model building

### vLLM Backend

For production inference with vLLM:

```bash
-pip install "granite-switch[vllm]"
+uv sync --extra vllm # CUDA 12.x
+uv sync --extra vllm20 # CUDA 13+ (requires PyTorch 2.11+)
```

This includes:
@@ -54,15 +58,15 @@ This includes:
Mellea provides high-level intrinsic functions for adapter invocation:

```bash
-pip install mellea
+uv pip install mellea
```

### Notebook Dependencies

For running Jupyter notebooks:

```bash
-pip install jupyter chromadb tqdm httpx python-dotenv
+uv pip install jupyter chromadb tqdm httpx python-dotenv
```

## Model Access
4 changes: 2 additions & 2 deletions tutorials/guides/compare_inference_throughput.md
@@ -26,8 +26,8 @@ The notebook runs both servers sequentially on a single A100 GPU and produces
- Two GPUs (one per server) for simultaneous mode, or one GPU for sequential mode
- Install dependencies:
```bash
-pip install -e ".[vllm]"
-pip install mellea chromadb rich tqdm transformers httpx
+uv sync --extra vllm
+uv pip install mellea chromadb rich tqdm transformers httpx
```
- Build the ChromaDB index (once):
```bash
2 changes: 1 addition & 1 deletion tutorials/guides/mellea_with_granite_switch.md
@@ -39,7 +39,7 @@ See [PREREQUISITES.md](../PREREQUISITES.md) for detailed setup instructions.

```bash
# Install Mellea from source
-pip install "git+https://github.com/generative-computing/mellea.git@main"
+uv pip install "git+https://github.com/generative-computing/mellea.git@main"
```

## Quick Example