The Codebase class is the primary entry point for programs that use Codegen.

Local Codebases

Construct a Codebase by passing in a path to a local git repository or any subfolder within it. The path must be within a git repository (i.e., the path itself or one of its parent directories must contain a .git folder).

from codegen import Codebase
from codegen.sdk.enums import ProgrammingLanguage

# Parse from a git repository root
codebase = Codebase("path/to/repository")

# Parse from a subfolder within a git repository
codebase = Codebase("path/to/repository/src/subfolder")

# Parse from current directory (must be within a git repo)
codebase = Codebase("./")

# Specify programming language (instead of inferring from file extensions)
codebase = Codebase("./", programming_language=ProgrammingLanguage.TYPESCRIPT)

By default, Codegen infers the programming language of the codebase and parses all of its files. To override the inferred language, pass the programming_language parameter with a value from the ProgrammingLanguage enum.
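
As a rough illustration of what inferring a language from file extensions can look like, the sketch below counts source files by extension and picks the most common language. The function name `guess_language` and the extension table are hypothetical and are not part of Codegen; its actual detection logic may differ.

```python
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical extension table, for illustration only (not Codegen's own mapping).
EXTENSION_TO_LANGUAGE = {
    ".py": "PYTHON",
    ".ts": "TYPESCRIPT",
    ".tsx": "TYPESCRIPT",
}


def guess_language(root: str) -> Optional[str]:
    """Return the language with the most matching files under root, or None."""
    counts = Counter(
        EXTENSION_TO_LANGUAGE[path.suffix]
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in EXTENSION_TO_LANGUAGE
    )
    if not counts:
        # No recognized source files were found.
        return None
    return counts.most_common(1)[0][0]
```

In a mixed repository a majority vote like this can pick the wrong language, which is the kind of case where passing programming_language explicitly is useful.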

The initial parse may take a few minutes for large codebases. This pre-computation enables constant-time operations afterward. Learn more here.
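
The requirement that the path sit inside a git repository can be checked before constructing a Codebase. The following is a minimal sketch using only the standard library; `find_git_root` is a hypothetical helper, not a Codegen function, and Codegen may resolve the repository differently.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Walk up from path and return the first directory containing .git, or None."""
    start = Path(path).resolve()
    for candidate in (start, *start.parents):
        # .git is usually a directory, but it is a file in worktrees and submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```

Calling `find_git_root("path/to/repository/src/subfolder")` first lets a script fail early with a clear message when the result is None.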

Remote Repositories

To fetch and parse a repository directly from GitHub, use the from_repo function.

import codegen
from codegen.sdk.enums import ProgrammingLanguage

# Fetch and parse a repository (defaults to /tmp/codegen/{repo_name})
codebase = codegen.from_repo('fastapi/fastapi')

# Customize temp directory, clone depth, specific commit, or programming language
codebase = codegen.from_repo(
    'fastapi/fastapi',
    tmp_dir='/custom/temp/dir',  # Optional: custom temp directory
    commit='786a8ada7ed0c7f9d8b04d49f24596865e4b7901',  # Optional: specific commit
    shallow=False,  # Optional: full clone instead of shallow
    programming_language=ProgrammingLanguage.PYTHON  # Optional: override language detection
)

By default, remote repositories are shallow-cloned into the /tmp/codegen/{repo_name} directory, which keeps the download small. Pass shallow=False for a full clone.
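
To make the defaults concrete, the sketch below builds (without running) the kind of git command a shallow clone corresponds to. Both helpers are hypothetical and not part of Codegen. The sketch assumes that {repo_name} is the part of 'owner/name' after the slash and that a shallow clone means `--depth 1`; Codegen's own clone logic may differ on both points.

```python
from pathlib import PurePosixPath
from typing import List


def default_clone_dir(repo: str, tmp_dir: str = "/tmp/codegen") -> PurePosixPath:
    """Assumption: {repo_name} is the part after the owner, e.g. 'fastapi'."""
    repo_name = repo.rsplit("/", 1)[-1]
    return PurePosixPath(tmp_dir) / repo_name


def git_clone_command(repo: str, shallow: bool = True) -> List[str]:
    """Build (but do not run) a git clone command for a GitHub 'owner/name' repo."""
    command = ["git", "clone"]
    if shallow:
        # --depth 1 fetches only the most recent commit instead of the full history.
        command += ["--depth", "1"]
    command += [f"https://github.com/{repo}.git", str(default_clone_dir(repo))]
    return command
```

A full clone is needed whenever a script has to inspect history, since a shallow clone omits earlier commits.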

Configuration Options

You can customize the behavior of your Codebase instance by passing a CodebaseConfig object. This allows you to configure secrets (like API keys) and toggle specific features:

from codegen import Codebase
from codegen.sdk.codebase.config import CodebaseConfig, GSFeatureFlags, Secrets

codebase = Codebase(
    "path/to/repository",
    config=CodebaseConfig(
        secrets=Secrets(
            openai_key="your-openai-key"  # For AI-powered features
        ),
        feature_flags=GSFeatureFlags(
            sync_enabled=True,  # Enable graph synchronization
            # ... add other feature flags as needed
        )
    )
)

The CodebaseConfig allows you to configure:

  • secrets: API keys and other sensitive information needed by the codebase
  • feature_flags: Toggle specific features like language engines, dependency management, and graph synchronization

For a complete list of available feature flags and configuration options, see the source code on GitHub.
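
Rather than hard-coding a key in source, a script can read it from the environment and pass the result to Secrets. The helper below is a minimal sketch; `read_api_key` is a hypothetical name, and the default variable `OPENAI_API_KEY` follows the OpenAI SDK convention rather than anything Codegen is known to read on its own.

```python
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in env_var, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

The returned value can then be supplied as `Secrets(openai_key=read_api_key())` in the configuration shown above.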

Advanced Initialization

For more complex scenarios, Codegen supports an advanced initialization mode using ProjectConfig. This allows for fine-grained control over:

  • Repository configuration
  • Base path and subdirectory filtering
  • Multiple project configurations

Here’s an example:

from codegen import Codebase
from codegen.git.repo_operator.local_repo_operator import LocalRepoOperator
from codegen.git.schemas.repo_config import BaseRepoConfig
from codegen.sdk.codebase.config import ProjectConfig
from codegen.sdk.enums import ProgrammingLanguage

codebase = Codebase(
    projects=[
        ProjectConfig(
            repo_operator=LocalRepoOperator(
                repo_path="/tmp/codegen-sdk",
                repo_config=BaseRepoConfig(),
                bot_commit=True
            ),
            programming_language=ProgrammingLanguage.TYPESCRIPT,
            base_path="src/codegen/sdk/typescript",
            subdirectories=["src/codegen/sdk/typescript"]
        )
    ]
)

For more details on advanced configuration options, see the source code on GitHub.

Supported Languages

Codegen currently supports:
