# Building gem2claude: A 48-Hour CLOUDEATHON API Bridge That Saved Me $100/Month

**Author:** kelexine  
**Date:** 2026-01-18  
**Category:** Engineering  
**Tags:** Rust, Reverse Engineering, Gemini, Claude, API, Open Source, Systems Programming  
**URL:** https://kelexine.is-a.dev/blog/building-gem2claude

---

## Introduction

The developer tools landscape is shifting rapidly. With the release of **Claude Code**, Anthropic set a new standard for agentic coding assistance. **Claude Code** is a standalone terminal utility that acts as a semi-autonomous developer: it reads your files, understands your project structure, and runs terminal commands.

However, there's a significant catch: **Cost**.

Running an agentic loop that reads files, "thinks" about architecture, and iteratively applies edits consumes a massive number of tokens. A single "refactor this module" command might read 50 files, generating tens of thousands of input tokens. For heavy daily usage, this can easily surpass **$100/month** in API credits.

Here was my dilemma: I already had a **Google Pro plan** from last year, which gave me access to Google's flagship **Gemini 3 series** models (competitors to the powerful **Claude 4.5 series**). Buying a paid Claude plan or API subscription on top of that would mean paying twice. I wanted the superior *developer experience* of the Claude Code CLI, but powered by the *compute* I was already paying for.

Meanwhile, these **Gemini 3 series** models are accessible via **Gemini Code Assist** (often included in Pro plans or free tiers for individuals), and feature a massive 1M+ token context window, perfect for digesting large codebases without needing RAG (Retrieval-Augmented Generation).

This led to an obvious question: **Can we make the Claude Code CLI talk to Gemini?**

The answer is yes. There were other tools attempting this, but many were bloated, slow, or required complex configuration. I wanted something targeted, efficient, and "plug-and-play." So I built **gem2claude**.

In this post, I'll take you through the 48-hour sprint of building a high-performance proxy server in **Rust** that bridges these two ecosystems. We'll dive deep into reverse-engineering internal APIs, translating server-sent events (SSE) on the fly, managing OAuth lifecycles in a thread-safe way, and building a system that feels completely native.

## The Architecture

At its core, `gem2claude` is a translation layer. It pretends to be the Anthropic API server, intercepting requests from the Claude Code CLI, translating them into Gemini's format, and streaming the results back.

### High-Level Data Flow

```mermaid
sequenceDiagram
    participant CLI as Claude Code CLI
    participant G2C as gem2claude (Proxy)
    participant GEM as Gemini API (Google)

    Note over CLI, G2C: Anthropic Protocol (JSON/SSE)
    
    CLI->>G2C: POST /v1/messages
    Note right of CLI: Includes system prompt, tools, previous turn
    
    G2C->>G2C: Translate Request Model
    G2C->>G2C: Check/Create Cache
    
    Note over G2C, GEM: Gemini Protocol (Protobuf/JSON)

    G2C->>GEM: POST /v1internal/...:streamGenerateContent
    Note right of G2C: Gemini Format (Content/Parts)
    
    loop Streaming Response
        GEM-->>G2C: Server-Sent Event (Chunk)
        G2C->>G2C: Parse & Translate to "content_block_delta"
        G2C-->>CLI: Server-Sent Event (Anthropic Format)
    end
    
    G2C-->>CLI: [DONE]
```

### Why Rust?

I chose Rust for this project for three key reasons, which proved crucial as the complexity grew:

1.  **Type Safety for Protocol Translation**: Mapping complex, deeply nested JSON structures between two different AI providers is error-prone. One wrong field type can cause the CLI to crash. Rust's `serde` framework and strong type system made it easy to define exact schemas for both Anthropic and Gemini APIs, catching validation errors at compile time rather than runtime.
2.  **Concurrency with Tokio**: I needed to handle multiple streaming connections simultaneously with low latency. The `gem2claude` proxy often needs to maintain a connection to Gemini while simultaneously parsing chunks and keeping the downstream CLI connection alive. Tokio and Axum provide a world-class async runtime that handles this effortlessly.
3.  **Minimal Overhead**: The bridge adds a hop in the network path, and Rust keeps that hop negligible (often sub-millisecond). When you're waiting for an LLM to think, you don't want your proxy adding 500ms of lag.
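
To illustrate the first point, here is a minimal, hypothetical sketch of how enums make protocol translation exhaustive. The variant names and the specific mapping choices are illustrative, not the project's actual schema; the point is that adding a new upstream variant breaks the build until the translation is updated.

```rust
// Hypothetical sketch: exhaustive matching between two providers' enums.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GeminiFinishReason {
    Stop,
    MaxTokens,
    Safety,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum AnthropicStopReason {
    EndTurn,
    MaxTokens,
    StopSequence,
}

fn map_stop_reason(reason: GeminiFinishReason) -> AnthropicStopReason {
    // No `_ => ...` arm: a new Gemini variant is a compile error here,
    // not a runtime crash in the CLI.
    match reason {
        GeminiFinishReason::Stop => AnthropicStopReason::EndTurn,
        GeminiFinishReason::MaxTokens => AnthropicStopReason::MaxTokens,
        GeminiFinishReason::Safety => AnthropicStopReason::StopSequence,
    }
}
```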

## Reverse Engineering the "Code Assist" API

One of the biggest challenges was that I didn't just want to use the public Vertex AI or AI Studio endpoints—I wanted to mimic the exact behavior of Google's internal "Code Assist" tools to ensure maximum compatibility and free tier access where applicable.

This involved digging directly into the source code of the official `@google-gemini/gemini-cli` tool.

### The Discovery: gemini-cli Reference

I analyzed the `@google-gemini/gemini-cli` source code (specifically `packages/core/src/code_assist/server.ts`) and found the smoking gun. The CLI doesn't use the public API at all. Instead, it talks to:

```typescript
// packages/core/src/code_assist/server.ts
export const CODE_ASSIST_ENDPOINT = 'https://cloudcode-pa.googleapis.com';
export const CODE_ASSIST_API_VERSION = 'v1internal';
```

My research confirmed that `cloudcode-pa.googleapis.com` is the internal backend powering **Google Cloud Code** plugins for VS Code and IntelliJ. It's designed to support IDE-native AI features, which explains why it supports "Thinking" tags and automatic caching out of the box—features critical for a smooth developer experience.

Critically, the authentication scope isn't the standard Gemini scope. In `packages/core/src/code_assist/oauth2.ts`, I found:

```typescript
// packages/core/src/code_assist/oauth2.ts
export const OAUTH_SCOPE = [
  'https://www.googleapis.com/auth/cloud-platform',
  'https://www.googleapis.com/auth/userinfo.email',
];
```

Using the `cloud-platform` scope allows the token to act with the full privileges of the user's GCP account, bypassing the restrictions of the public `generativelanguage` API keys.

Digging into this endpoint's behavior revealed three things that shaped the design:

1.  **Automatic Caching**: Unlike the public API, where you explicitly create `CachedContent` resources, this internal endpoint handles prefix caching automatically for IDE performance. This meant I could simplify my `CacheManager` to mostly be a pass-through monitoring layer.
2.  **Thinking Tags**: Gemini 3 models output a "thought" field in their JSON response or embedded `<think>` tags, which `gem2claude` catches and translates into Claude Code's `thinking_block_delta` format.
3.  **Project ID Resolution**: The CLI performs a specific "handshake" request (`loadCodeAssist`) to resolve the default Google Cloud Project ID. `gem2claude` emulates this exact call:
    ```json
    POST /v1internal:loadCodeAssist
    { "metadata": { "clientType": "GEMINI_CLI" } }
    ```
    This returns the `cloudaicompanionProject` ID needed for all subsequent calls.


## Implementation Deep Dive

Let's look at the critical components that make this bridge work.

### 1. The Stream Translator: Taming the Token Firehose

The heartbeat of the application is the `src/translation/streaming.rs` module. It transforms a stream of Gemini `GenerateContentResponse` chunks into Anthropic `MessageStreamEvent`s.

The hardest part was handling **Stateful Parsing** of "Thinking" blocks.

Gemini might send a stream chunk ending in the middle of a "thinking" tag. Because TCP guarantees order but not boundaries, a chunk might look like this:

**Chunk N:**
```json
{ "text": "I need to analyze <thi" }
```

**Chunk N+1:**
```json
{ "text": "nk> the file structure..." }
```

If we just piped this through blindly, the Claude CLI would display `<thi` and `nk>` as regular text in your terminal. I implemented a state machine in `StreamTranslator` that buffers partial tags and switches modes between `BlockType::Text` and `BlockType::Thinking`.

```rust
// Simplified logic from src/translation/streaming.rs
fn process_text_chunk(&mut self, text: &str) -> Vec<(BlockType, String)> {
    let mut segments = Vec::new();
    let mut full_text = self.thinking_buffer.clone() + text;
    self.thinking_buffer.clear();

    // Logic to detect <think>, </think>, and partial tags like <thi...
    // Switches self.in_thinking state
    // Returns properly segmented blocks
}
```

This ensures that `gem2claude` handles thinking blocks cleanly, mapping them onto Claude's reasoning blocks.
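
To make the state machine concrete, here is a self-contained sketch of the idea (field and type names are simplified stand-ins for the real `StreamTranslator`): buffer any suffix that could be the start of a split tag, emit everything else, and flip modes whenever a complete tag appears.

```rust
// Standalone sketch of the tag-splitting state machine described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BlockType {
    Text,
    Thinking,
}

#[derive(Default)]
struct StreamTranslator {
    in_thinking: bool,
    buffer: String, // holds a possibly-split tag across chunk boundaries
}

impl StreamTranslator {
    fn process_text_chunk(&mut self, text: &str) -> Vec<(BlockType, String)> {
        let mut segments = Vec::new();
        // Prepend whatever partial tag we held back from the last chunk.
        let mut rest = std::mem::take(&mut self.buffer) + text;

        loop {
            let (tag, block) = if self.in_thinking {
                ("</think>", BlockType::Thinking)
            } else {
                ("<think>", BlockType::Text)
            };

            if let Some(pos) = rest.find(tag) {
                // Emit everything before the complete tag, then flip modes.
                if pos > 0 {
                    segments.push((block, rest[..pos].to_string()));
                }
                rest = rest[pos + tag.len()..].to_string();
                self.in_thinking = !self.in_thinking;
            } else {
                // No complete tag: hold back any suffix that could be the
                // start of a split tag (e.g. "<thi"), emit the remainder.
                let hold = (1..tag.len())
                    .rev()
                    .find(|&n| rest.ends_with(&tag[..n]))
                    .unwrap_or(0);
                let emit_len = rest.len() - hold;
                if emit_len > 0 {
                    segments.push((block, rest[..emit_len].to_string()));
                }
                self.buffer = rest[emit_len..].to_string();
                return segments;
            }
        }
    }
}
```

Running the two chunks from the example through this sketch yields `I need to analyze ` as text and ` the file structure...` as thinking, with the split `<thi` / `nk>` tag never reaching the terminal.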

### 2. OAuth Lifecycle Management: Thread-Safety is Key

Initially, authentication credentials were loaded from `~/.gemini/oauth_creds.json`, which required the Gemini CLI to be installed and logged in via OAuth (a.k.a. "Login with Google") before `gem2claude` could run at all. That dependency was a hassle, so I implemented a custom OAuth flow that lets the user log in via a browser and stores the credentials in `~/.gem2claude/oauth_creds.json`.

Authentication is rarely simple. We needed a robust way to:
1.  Login via a browser (using a local listener callback).
2.  Store credentials securely in `~/.gem2claude/oauth_creds.json`.
3.  Refresh tokens automatically *before* they expire.
4.  Handle concurrency (preventing "thundering herd" where 50 requests try to refresh the token at once).

I implemented a **Double-Checked Locking** pattern in `src/oauth/manager.rs`. This allows extremely fast reads for valid tokens while serializing the refresh logic.

```rust
pub async fn get_token(&self) -> Result<String> {
    // 1. Fast path: Read lock
    {
        let creds = self.credentials.read().await;
        if !creds.is_expired() { return Ok(creds.access_token.clone()); }
    }
    
    // 2. Slow path: Write lock (Mutex) - serialize refreshes
    let _guard = self.refresh_lock.lock().await;
    
    // 3. Re-check (someone else might have refreshed while we wait)
    {
        let creds = self.credentials.read().await;
        if !creds.is_expired() { return Ok(creds.access_token.clone()); }
    }
    
    // 4. Actual Refresh
    self.refresh_token().await
}
```

This ensures thread safety and high performance even under heavy load, such as when Claude is reading multiple files in parallel.

### 3. Improving Resilience with Keep-Alives

Translating SSE isn't just about JSON bodies; it's about network behavior.

Gemini sends events separated by `\r\n\r\n`. Anthropic expects events formatted as:
```text
event: message_delta
data: { ... }

```
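
The re-framing itself boils down to two small string transformations. The helper names below are hypothetical, not the project's actual function names:

```rust
// Hypothetical helpers for the SSE re-framing described above.

// Split a raw Gemini buffer on its `\r\n\r\n` record separator.
fn split_gemini_events(raw: &str) -> Vec<&str> {
    raw.split("\r\n\r\n").filter(|s| !s.is_empty()).collect()
}

// Wrap a translated JSON payload in Anthropic's `event:`/`data:` framing,
// terminated by a blank line.
fn format_sse_event(event: &str, json_payload: &str) -> String {
    format!("event: {event}\ndata: {json_payload}\n\n")
}
```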

The bridge uses `async-stream` to create a generator that yields these formatted strings. A crucial detail I learned the hard way was adding **Keep-Alive Pings**.

If Gemini stops generating for 10 seconds (e.g., doing a heavy computation), the Claude CLI might conclude the connection is dead and time out. I added a `tokio::select!` loop to inject pings if the upstream is silent:

```rust
tokio::select! {
    chunk = gemini_stream.next() => { ... handle chunk ... },
    _ = tokio::time::sleep(Duration::from_secs(15)) => {
        // Keep the connection alive!
        yield Ok("event: ping\ndata: {\"type\": \"ping\"}\n\n".to_string());
    }
}
```

This simple addition made the bridge significantly more stable on complex prompts.

## Robust Rate Limiting and Handling 429s

One of the nuances of using an API (even a generous one like Gemini) is managing rate limits.

While Gemini allows significant throughput, bursts of traffic can trigger transient `429 Too Many Requests` errors.

To handle this gracefully, I implemented a `ModelAvailabilityService`. This service:
1.  **Tracks Model Health**: It monitors every API call. If a call fails with a 429 or 503, it marks that specific model as `StickyRetry` or `Terminal`.
2.  **Circuit Breaking**: If a model consistently fails, the service stops sending it traffic immediately, failing fast rather than hanging the CLI.
3.  **Metrics**: It exports `gemini_model_availability` metrics so I can set up alerts.
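
A minimal sketch of the per-model health tracking might look like the following. The state names mirror the ones above, but the thresholds and cooldowns here are made-up illustrations, not the service's real tuning:

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch of per-model circuit breaking.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelHealth {
    Healthy,
    StickyRetry { until: Instant }, // back off, retry after a cooldown
    Terminal,                       // breaker tripped: stop routing traffic
}

struct ModelState {
    health: ModelHealth,
    consecutive_failures: u32,
}

impl ModelState {
    fn new() -> Self {
        Self { health: ModelHealth::Healthy, consecutive_failures: 0 }
    }

    fn record_status(&mut self, status: u16) {
        match status {
            200..=299 => {
                self.consecutive_failures = 0;
                self.health = ModelHealth::Healthy;
            }
            429 | 503 => {
                self.consecutive_failures += 1;
                self.health = if self.consecutive_failures >= 5 {
                    ModelHealth::Terminal // fail fast instead of hanging the CLI
                } else {
                    ModelHealth::StickyRetry {
                        until: Instant::now() + Duration::from_secs(30),
                    }
                };
            }
            _ => {} // other errors don't affect availability in this sketch
        }
    }

    fn is_available(&self) -> bool {
        match self.health {
            ModelHealth::Healthy => true,
            ModelHealth::StickyRetry { until } => Instant::now() >= until,
            ModelHealth::Terminal => false,
        }
    }
}
```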

## Metrics & Observability

You can't optimize what you can't measure. I instrumented the entire application using a `prometheus` registry.

In `src/metrics/registry.rs`, we define counters for:
- `gemini_api_calls`: Usage tracking by model and status code.
- `gemini_token_usage`: Input/Output token counters (crucial for validating the savings).
- `gemini_model_availability`: Tracks if a model is "Healthy".
- `cache_operations`: Hits vs. Misses.

Exposing a `/metrics` endpoint allows me to scrape this data and visualize exactly how much "money" I'm saving.
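
As a toy, std-only illustration of the idea (the real code uses the `prometheus` crate, and the metric labels here are simplified): atomic counters rendered in Prometheus text exposition format.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Toy sketch of the token-usage counters behind /metrics.
struct TokenMetrics {
    input_tokens: AtomicU64,
    output_tokens: AtomicU64,
}

impl TokenMetrics {
    const fn new() -> Self {
        Self {
            input_tokens: AtomicU64::new(0),
            output_tokens: AtomicU64::new(0),
        }
    }

    // Called once per completed request with that request's usage.
    fn record(&self, input: u64, output: u64) {
        self.input_tokens.fetch_add(input, Ordering::Relaxed);
        self.output_tokens.fetch_add(output, Ordering::Relaxed);
    }

    // Render in Prometheus text exposition format for scraping.
    fn render(&self) -> String {
        format!(
            "gemini_token_usage{{direction=\"input\"}} {}\ngemini_token_usage{{direction=\"output\"}} {}\n",
            self.input_tokens.load(Ordering::Relaxed),
            self.output_tokens.load(Ordering::Relaxed),
        )
    }
}
```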

## Building a Production-Ready Deployment

Because `gem2claude` is a critical part of my workflow, it needs to be reliable. I set up a comprehensive CI/CD pipeline in GitHub Actions to ensure I don't break it with future updates.

## Challenges & Solutions

### The "Malformed Function Call" Issue

Gemini is powerful, but sometimes it gets creative with tool arguments. For instance, if a tool expects an integer `lines` parameter, Gemini might send `"10"` (string) or `10.0` (float). 

The Claude CLI is strict; if the types don't match, it crashes.

**Solution**: I added an interception layer in `StreamTranslator`. If we detect a `MALFORMED_FUNCTION_CALL` from Gemini, we construct a descriptive `invalid_request_error` event. This allows the CLI to potentially recover or at least fail gracefully.
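
Sketched out, the interception is a small guard in the stream path. The function name and error message below are hypothetical; the shape of the emitted event follows the `event:`/`data:` framing used elsewhere in the bridge:

```rust
// Hypothetical sketch: when Gemini reports MALFORMED_FUNCTION_CALL, surface
// a descriptive Anthropic-style error event instead of crashing the CLI.
fn error_event_for_finish_reason(finish_reason: &str) -> Option<String> {
    if finish_reason != "MALFORMED_FUNCTION_CALL" {
        return None; // normal finish reasons pass through untouched
    }
    let payload = r#"{"type":"error","error":{"type":"invalid_request_error","message":"Upstream model produced a malformed tool call"}}"#;
    Some(format!("event: error\ndata: {payload}\n\n"))
}
```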

### Image Handling

Claude supports base64 images. Gemini supports... mostly the same, but with specific limits.

I implemented a `Vision` module (`src/vision/`) that:
1.  Sniffs valid MIME types from the magic bytes.
2.  Validates sizes against Gemini's limits.
3.  Transforms the `ContentBlock::Image` into Gemini's `InlineData` format.

This enables `gem2claude` to be fully multimodal. You can paste a screenshot into the CLI, and Gemini sees it.
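
Magic-byte sniffing (step 1 above) reduces to matching well-known file signatures. A minimal sketch, covering a few common image formats:

```rust
// Sketch of MIME sniffing from magic bytes, as in the vision module.
fn sniff_mime(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0x89, b'P', b'N', b'G']) {
        Some("image/png")
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF8") {
        Some("image/gif")
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        // WebP is a RIFF container with a "WEBP" fourcc at offset 8.
        Some("image/webp")
    } else {
        None
    }
}
```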

## The Result: $0 Bill, 100% Productivity

After 48 hours of coding, the bridge is stable.

**Performance stats:**
- **Latency**: The overhead of the proxy is < 2ms locally.
- **Speed**: Gemini generations often outpace the terminal's render speed.
- **Cost**: My Anthropic bill for the last week of heavy coding? **$0.00**.

## How to Run It

I've invested heavily in the developer experience to make this easy for anyone to adopt.

### 1. Build from Source
```bash
git clone https://github.com/kelexine/gem2claude.git
cd gem2claude
cargo build --release
```

### 2. Login
Run the login command. This will pop open your default browser to authenticate with your Google account.
```bash
./target/release/gem2claude --login
```

### 3. Run the Server
```bash
./target/release/gem2claude
# Listening on 127.0.0.1:8080
```

### 4. Configure Claude Code
Tell the CLI to use your local proxy instead of Anthropic's servers. Note that `gem2claude` defaults to port 8080.

```bash
export ANTHROPIC_BASE_URL="http://localhost:8080"
export ANTHROPIC_AUTH_TOKEN="dummy"
claude
# Enjoy free coding!
```

## Conclusion

Building `gem2claude` was a lesson in the importance of open protocols. Because Anthropic and Google both adhere (mostly) to standard HTTP, JSON, and SSE patterns, we can bridge them with a bit of ingenuity and a lot of Rust.

It also highlights the power of **Rust** for developer tooling. The resulting binary is small, uses minimal RAM, and handles everything I throw at it without a stutter. It's the kind of tool you set up once and forget about—until you realize you haven't paid an API bill in months.

If you're tired of rationing your API credits but love the Claude Code workflow, give `gem2claude` a try.

## Credits

This project stands on the shoulders of giants. A special thanks to:
- The **Anthropic team** for building such an incredible CLI tool.
- The **Google Gemini team** for open-sourcing the Gemini CLI, which made this project possible.
- The **Rust community** for maintaining crates like `tokio`, `axum`, and `serde` that make systems programming a joy.
- **[You]**: For reading this far!

---

*Check out the code on [GitHub](https://github.com/kelexine/gem2claude).*

> **NOTE**
> This tool is not affiliated with Anthropic or Google. It is a personal project built for my own use, so use it at your own risk.

---

*This content is available at [kelexine.is-a.dev/blog/building-gem2claude](https://kelexine.is-a.dev/blog/building-gem2claude)*
