swiftide

Table of Contents
- [About The Project](#about-the-project)
- [Example](#example)
- [Features](#features)
- [Vision](#vision)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
- [Usage and concepts](#usage-and-concepts)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)




Swiftide

Blazing fast data pipelines for Retrieval Augmented Generation written in Rust
Explore the docs »

API Docs · Report Bug · Request Feature

About The Project

Swiftide is a straightforward, easy-to-use, easy-to-extend asynchronous data ingestion and processing library. It is designed to be used in a RAG (Retrieval Augmented Generation) system. It is built to be fast and efficient, with a focus on parallel processing and asynchronous operations.

Swiftide grew out of frustration with the performance, stability, and ease of use of Python-based tooling. After the switch, ingestion time dropped from tens of minutes to a few seconds.

Swiftide is part of the bosun.ai project, an upcoming platform for autonomous code improvement.

We <3 feedback: project ideas, suggestions, and complaints are very welcome. Feel free to open an issue.


Example

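// Assumes `redis_url`, `qdrant_url`, and `openai_client` are defined earlier,
// and that this runs inside an async function that returns a `Result`.
IngestionPipeline::from_loader(FileLoader::new(".").with_extensions(&["rs"]))
    // Filter out nodes that are already in the Redis cache
    .filter_cached(Redis::try_from_url(
        redis_url,
        "swiftide-examples",
    )?)
    // Add metadata to each file before it is chunked
    .then(MetadataQACode::new(openai_client.clone()))
    // Split each file into Rust code chunks within the given size range
    .then_chunk(ChunkCode::try_for_language_and_chunk_size(
        "rust",
        10..2048,
    )?)
    // Embed the chunks in batches of 10
    .then_in_batch(10, Embed::new(openai_client.clone()))
    // Store the results in Qdrant
    .then_store_with(
        Qdrant::try_from_url(qdrant_url)?
            .batch_size(50)
            .vector_size(1536)
            .collection_name("swiftide-examples".to_string())
            .build()?,
    )
    .run()
    .await?;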


Features


Vision

Our goal is to create an extremely fast, extendable platform for data ingestion and querying to further the development of automated LLM applications, with an easy-to-use and easy-to-extend API.


Getting Started

Prerequisites

Make sure you have the Rust toolchain installed. rustup is the recommended approach.

To use OpenAI, an API key is required. Note that by default async_openai uses the OPENAI_API_KEY environment variable.
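A missing key is easier to diagnose before the pipeline starts than in the middle of a run. The following is a minimal sketch of such a check using only the standard library; the helper names `validate_api_key` and `openai_api_key` are hypothetical and are not part of Swiftide or async_openai.

```rust
use std::env;

/// Hypothetical helper: rejects a missing or blank key value and trims
/// surrounding whitespace from a valid one.
fn validate_api_key(raw: Option<String>) -> Result<String, String> {
    match raw {
        Some(key) if !key.trim().is_empty() => Ok(key.trim().to_string()),
        _ => Err("OPENAI_API_KEY is not set or is empty".to_string()),
    }
}

/// Hypothetical helper: reads OPENAI_API_KEY from the process environment
/// and validates it.
fn openai_api_key() -> Result<String, String> {
    validate_api_key(env::var("OPENAI_API_KEY").ok())
}
```

Calling `openai_api_key()` at the start of your program and returning early on `Err` gives a clear message instead of a failed request later on.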

Other integrations will need to be installed accordingly.

Installation

  1. Set up a new Rust project
  2. Add swiftide
    cargo add swiftide
    
  3. In your Cargo.toml, enable the features for the integrations you want to use, or enable ‘all’
  4. Write a pipeline (see our examples and documentation)


Usage and concepts

Before building your stream, you need to enable and configure any integrations required. See /examples.

A stream starts with a Loader that emits IngestionNodes. For instance, with the FileLoader each file is a node.

You can then slice and dice, augment, and filter nodes. Each kind of step in the pipeline is backed by its own trait, so you can extend the pipeline by implementing the trait for the kind of step you need.
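The idea of one trait per kind of step can be sketched with plain Rust. Everything below is a simplified, synchronous illustration: the `Node`, `Loader`, `Transformer`, and `Chunker` definitions and the `run` function are hypothetical stand-ins written for this example, not Swiftide's actual traits or signatures.

```rust
/// Hypothetical stand-in for a pipeline node.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    path: String,
    chunk: String,
}

/// A loader emits the initial nodes.
trait Loader {
    fn load(&self) -> Vec<Node>;
}

/// A transformer maps one node to one node.
trait Transformer {
    fn transform(&self, node: Node) -> Node;
}

/// A chunker splits one node into several nodes.
trait Chunker {
    fn chunk(&self, node: Node) -> Vec<Node>;
}

/// Loader that returns a fixed list of nodes.
struct StaticLoader(Vec<Node>);

impl Loader for StaticLoader {
    fn load(&self) -> Vec<Node> {
        self.0.clone()
    }
}

/// Transformer that uppercases the node content.
struct Uppercase;

impl Transformer for Uppercase {
    fn transform(&self, mut node: Node) -> Node {
        node.chunk = node.chunk.to_uppercase();
        node
    }
}

/// Chunker that emits one node per line of content.
struct LineChunker;

impl Chunker for LineChunker {
    fn chunk(&self, node: Node) -> Vec<Node> {
        node.chunk
            .lines()
            .map(|line| Node {
                path: node.path.clone(),
                chunk: line.to_string(),
            })
            .collect()
    }
}

/// Runs loader, then transformer, then chunker, in that order.
fn run(loader: &dyn Loader, transformer: &dyn Transformer, chunker: &dyn Chunker) -> Vec<Node> {
    loader
        .load()
        .into_iter()
        .map(|node| transformer.transform(node))
        .flat_map(|node| chunker.chunk(node))
        .collect()
}
```

Because each step is addressed only through its trait, a new kind of transformer or chunker can be added without touching `run`.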

IngestionNodes have a path, chunk and metadata. Currently metadata is copied over when chunking and always embedded when using the OpenAIEmbed transformer.
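To make the node shape concrete, here is a minimal sketch of a node with a path, a chunk, and metadata, where chunking copies the metadata to every resulting node. The `Node` struct, the fixed-size character splitting in `chunk_node`, and the `key: value` layout in `embeddable_text` are all assumptions made for illustration; Swiftide's real types and embedding format may differ.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for an ingestion node.
#[derive(Debug, Clone)]
struct Node {
    path: PathBuf,
    chunk: String,
    metadata: HashMap<String, String>,
}

/// Splits a node into pieces of at most `max_chars` characters.
/// Every piece receives a copy of the original path and metadata.
fn chunk_node(node: &Node, max_chars: usize) -> Vec<Node> {
    let chars: Vec<char> = node.chunk.chars().collect();
    chars
        .chunks(max_chars.max(1))
        .map(|piece| Node {
            path: node.path.clone(),
            chunk: piece.iter().collect(),
            metadata: node.metadata.clone(),
        })
        .collect()
}

/// Builds the text that would be sent to an embedding model:
/// metadata entries (sorted by key) followed by the chunk itself.
fn embeddable_text(node: &Node) -> String {
    let mut keys: Vec<&String> = node.metadata.keys().collect();
    keys.sort();
    let mut text = String::new();
    for key in keys {
        text.push_str(&format!("{}: {}\n", key, node.metadata[key]));
    }
    text.push_str(&node.chunk);
    text
}
```

One consequence of copying metadata on chunking is that metadata generated once per file is embedded again with each chunk of that file.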

Additionally, several generic transformers are implemented. They take implementers of SimplePrompt and EmbedModel to do their work.
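The pattern of a transformer that is generic over its model can be sketched as follows. The traits `PromptModel` and `EmbeddingBackend` are simplified, synchronous stand-ins invented for this example; they only mirror the roles of SimplePrompt and EmbedModel and do not reproduce their real signatures. `Summarize`, `EmbedAll`, `FakeModel`, and `LengthEmbedder` are likewise hypothetical.

```rust
/// Simplified stand-in for a model that answers a text prompt.
trait PromptModel {
    fn prompt(&self, prompt: &str) -> Result<String, String>;
}

/// Simplified stand-in for a model that embeds a batch of texts.
trait EmbeddingBackend {
    fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String>;
}

/// Generic transformer: works with any `PromptModel`.
struct Summarize<M: PromptModel> {
    model: M,
}

impl<M: PromptModel> Summarize<M> {
    fn run(&self, chunk: &str) -> Result<String, String> {
        self.model.prompt(&format!("Summarize this code:\n{}", chunk))
    }
}

/// Generic embedder: works with any `EmbeddingBackend`.
struct EmbedAll<E: EmbeddingBackend> {
    backend: E,
}

impl<E: EmbeddingBackend> EmbedAll<E> {
    fn run(&self, chunks: &[String]) -> Result<Vec<Vec<f32>>, String> {
        self.backend.embed(chunks)
    }
}

/// Fake model for local testing: echoes the prompt back with a marker.
struct FakeModel;

impl PromptModel for FakeModel {
    fn prompt(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[fake] {}", prompt))
    }
}

/// Fake embedder for local testing: one dimension holding the byte length.
struct LengthEmbedder;

impl EmbeddingBackend for LengthEmbedder {
    fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
        Ok(texts.iter().map(|t| vec![t.len() as f32]).collect())
    }
}
```

Swapping `FakeModel` for a real client only requires implementing the trait for that client; the transformer itself stays unchanged.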

[!NOTE] No integrations are enabled by default as some are code heavy. Either cherry-pick the integrations you need or use the “all” feature flag.

[!WARNING] Because Swiftide is fast, chunking before adding metadata quickly triggers rate limit errors on OpenAI, especially with faster models like 3.5-turbo. Be aware.
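One plausible reading of this warning is that a metadata step issues one LLM request per node, so chunking first multiplies the number of requests. The sketch below only does that arithmetic; the function `metadata_requests` and the one-request-per-node assumption are illustrative and not taken from Swiftide.

```rust
/// Estimates the number of LLM requests made by a metadata step,
/// assuming one request per node (an assumption for illustration).
///
/// If chunking runs first, the metadata step sees every chunk.
/// Otherwise it sees one node per file.
fn metadata_requests(files: usize, chunks_per_file: usize, chunk_first: bool) -> usize {
    if chunk_first {
        files * chunks_per_file
    } else {
        files
    }
}
```

With hypothetical numbers, 200 files at 15 chunks each means 200 requests when metadata is added first and 3000 when chunking comes first.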

For more examples, please refer to /examples and the documentation.


Roadmap

See the open issues for a full list of proposed features (and known issues).


Contributing

Swiftide is at a very early stage, and we are aware that it lacks features the wider community needs. Contributions are very welcome. :tada:

If you have a great idea, please fork the repo and create a pull request. You can also simply open an issue with the tag “enhancement”. Don’t forget to give the project a star! Thanks again!

If you just want to contribute (bless you!), see our issues.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'feat: Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


License

Distributed under the MIT License. See LICENSE for more information.
