Retrieval-Augmented Generation (RAG) with Pinecone

In this tutorial, you'll build a pipeline with Dagster that:

  • Loads data from GitHub and a documentation site
  • Converts the data into embeddings and tags it with metadata
  • Stores the data in a vector database
  • Retrieves relevant information to answer ad hoc questions
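
To make the four stages above concrete before any real services are involved, here is a minimal, self-contained sketch. It is not the tutorial's implementation: the sample documents are hypothetical, `embed` is a toy bag-of-words counter standing in for a real embedding model, and the in-memory list returned by `build_index` stands in for a vector database such as Pinecone.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in documents. The tutorial loads real data from
# GitHub and a documentation site instead.
DOCUMENTS = [
    {"source": "github", "text": "Assets fail to materialize when the partition key is missing."},
    {"source": "docs", "text": "Use dagster dev to start the local webserver."},
    {"source": "docs", "text": "Schedules run jobs at fixed intervals using cron strings."},
]


def embed(text):
    """Toy embedding: word counts (an assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Store each vector with its metadata; a list stands in for the vector database."""
    return [{"vector": embed(doc["text"]), "metadata": doc} for doc in documents]


def retrieve(index, question, top_k=1):
    """Return the metadata of the top_k records most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda rec: cosine(query, rec["vector"]), reverse=True)
    return [rec["metadata"] for rec in ranked[:top_k]]


index = build_index(DOCUMENTS)
best = retrieve(index, "How do I start the webserver?")[0]
print(best["source"], "-", best["text"])
```

In a full RAG pipeline, the retrieved text would then be passed to a language model as context for answering the question; this sketch stops at retrieval.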

Prerequisites

To follow the steps in this guide, you'll need:

  • Basic Python knowledge
  • Python 3.9+ installed on your system. Refer to the Installation guide for information.

Step 1: Set up your Dagster environment

First, set up a new Dagster project.

  1. Clone the Dagster repo and navigate to the project:

    git clone https://github.com/dagster-io/dagster.git
    cd dagster/examples/project_ask_ai_dagster

  2. Create and activate a virtual environment:

    uv venv dagster_tutorial
    source dagster_tutorial/bin/activate

  3. Install Dagster and the required dependencies:

    uv pip install -e ".[dev]"
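
If you want a quick sanity check after the install, the following sketch reports whether the active interpreter meets the Python 3.9+ prerequisite and whether the `dagster` package can be found. The helper name `check_environment` is hypothetical and not part of Dagster; it uses only the Python standard library.

```python
import sys
from importlib import metadata, util


def check_environment(package="dagster", min_python=(3, 9)):
    """Report whether the interpreter and the given package look usable."""
    installed = util.find_spec(package) is not None
    version = None
    if installed:
        try:
            # Distribution metadata can be missing even when the import works.
            version = metadata.version(package)
        except metadata.PackageNotFoundError:
            version = None
    return {
        "python_ok": sys.version_info >= min_python,
        "installed": installed,
        "version": version,
    }


print(check_environment())
```

Run it with the virtual environment activated. If `installed` is `False`, the most likely cause is that the install ran in a different environment.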

Step 2: Launch the Dagster webserver

To make sure Dagster and its dependencies were installed correctly, navigate to the project root directory and start the Dagster webserver:

dagster dev
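
Once the webserver is running, you can confirm from a second terminal that it responds. The sketch below assumes the UI is served at `http://localhost:3000`, the default address for `dagster dev`; adjust the URL if you started it on another port. The function name `webserver_is_up` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def webserver_is_up(url="http://localhost:3000", timeout=2.0):
    """Return True if an HTTP GET to the URL succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


print("Dagster UI reachable:", webserver_is_up())
```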

Next steps

  • Continue this tutorial with sources