# 2.3. Ollama y prototipo

## 2.3. Instalación de infraestructura: Elige tu camino

Para ejecutar Llama, necesitamos un "motor". \
En este taller ofrecemos dos rutas de implementación. Ambas son válidas, pero atienden a perfiles distintos.

{% hint style="warning" %}
**CRITERIO DE DECISIÓN**

* Elige **Ruta A** si quieres **privacidad total** y tienes una laptop moderna (M1/M2/M3 o Windows con GPU).
* Elige **Ruta B** si tu equipo es antiguo o corporativo (bloqueado) y **priorizas la velocidad**.
  {% endhint %}

***

{% tabs %}
{% tab title="🛡️ Ruta A: Local (soberanía)" %}

#### Herramienta: LM Studio / Ollama

Esta ruta descarga el "cerebro" a tu disco duro. Funciona sin internet.

**Paso 1: Instalación del motor**

1. Descarga **LM Studio** desde su sitio oficial.
2. Instala y ejecuta la aplicación.

**Paso 2: Adquisición del modelo (Llama 3.2)**

1. En la barra de búsqueda (lupa), escribe: `llama 3.2 3b`.
2. Selecciona la opción **"Quantization q4\_k\_m"** (Balance ideal velocidad/calidad).
3. Clic en **Download**.

**Paso 3: Inferencia**

1. Ve a la pestaña de Chat (burbuja).
2. Selecciona el modelo cargado arriba al centro.
3. **Prueba de fuego:** Desconecta tu WiFi y escribe: *"Hola, ¿estás funcionando?"*

> **Resultado:** Tienes una IA privada, soberana y gratuita de por vida.
> {% endtab %}

{% tab title="⚡ Ruta B: Nube (velocidad)" %}

#### Herramienta: Groq (API)

Esta ruta utiliza servidores externos ultra-rápidos. Ideal para prototipar agentes complejos.

**Paso 1: Acceso**

1. Ingresa a la consola de **Groq**.
2. Crea una cuenta gratuita.

**Paso 2: Selección de modelo**

1. En el menú desplegable, selecciona `llama-3.2-70b-versatile` (Modelo de mayor capacidad).
2. Ajusta la "Temperatura" a `0` (para tareas precisas) o `0.7` (para creatividad).

**Paso 3: Inferencia**

Escribe tu prompt en el área de System y User.

> **Resultado:** Velocidad de 300 tokens/segundo. Ideal para procesar documentos largos en segundos.
> {% endtab %}
> {% endtabs %}

<br>

#### Validación de Instalación

Antes de pasar al siguiente módulo, asegúrate de cumplir este checklist:

* [ ] Tienes acceso a un modelo Llama (local o nube).
* [ ] Has enviado un "Hola Mundo" y recibido respuesta.
* [ ] Entientes la diferencia entre los parámetros de **temperatura** (creatividad) y **max tokens** (longitud).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://incmty.gitbook.io/llama/sesion-2/2.3.-ollama-y-prototipo.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.