Parsing Codebases
The primary entrypoint to programs leveraging Codegen is the Codebase class.
Local Codebases
Construct a Codebase by passing in a path to a local git repository or any subfolder within it. The path must be inside a git repository (i.e., the directory itself or one of its parent directories must contain a .git folder).
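The git requirement above can be checked before constructing a Codebase. The following is a minimal sketch using only the standard library; `find_git_root` is a hypothetical helper written for illustration, not part of Codegen, and it does not claim to mirror how Codegen performs this check internally.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Return the nearest directory at or above `path` that contains a .git entry.

    Hypothetical helper for illustration; not part of the Codegen API.
    """
    current = Path(path).resolve()
    # Check the path itself first, then each ancestor up to the filesystem root.
    for candidate in (current, *current.parents):
        # .git is usually a folder, but can be a file (e.g. in worktrees).
        if (candidate / ".git").exists():
            return candidate
    return None
```

If the helper returns None, the path is not inside a git repository and, per the requirement above, would not be a valid argument for Codebase.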
By default, Codegen automatically infers the programming language of the codebase and parses all of its files. You can override this by passing the programming_language parameter with a value from the ProgrammingLanguage enum.
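As a rough sketch of the override described above: the function below wraps the call so that the imports only run when it is invoked. The `programming_language` parameter and the `ProgrammingLanguage` enum come from the text; the import path `codegen.sdk.enums`, the `TYPESCRIPT` member, and the wrapper name `parse_as_typescript` are assumptions that may differ between Codegen versions.

```python
def parse_as_typescript(repo_path: str):
    """Parse `repo_path` as TypeScript instead of relying on language inference.

    Hypothetical wrapper; requires the `codegen` package to be installed.
    """
    # Imports are deferred so this module loads even without codegen installed.
    from codegen import Codebase
    # Assumption: the enum's import path may vary by Codegen version.
    from codegen.sdk.enums import ProgrammingLanguage

    return Codebase(repo_path, programming_language=ProgrammingLanguage.TYPESCRIPT)
```

Omitting `programming_language` falls back to the default behavior, where Codegen infers the language itself.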
The initial parse may take a few minutes for large codebases. This pre-computation enables constant-time operations afterward.
Remote Repositories
To fetch and parse a repository directly from GitHub, use the from_repo function.
Remote repositories are cloned to the /tmp/codegen/{repo_name} directory by default. The clone is shallow by default for better performance.
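To make the default location and the shallow clone concrete, here is a small standard-library sketch. It only builds the target path and an equivalent `git clone --depth 1` command without running it. The helper names, the `owner/repo` input format, and the GitHub HTTPS URL shape are assumptions for illustration; this is not Codegen's internal implementation.

```python
from pathlib import Path
from typing import List


def default_clone_dir(repo_full_name: str, tmp_dir: str = "/tmp/codegen") -> Path:
    """Map an 'owner/repo' name to /tmp/codegen/{repo_name} (hypothetical helper)."""
    repo_name = repo_full_name.rstrip("/").split("/")[-1]
    return Path(tmp_dir) / repo_name


def shallow_clone_command(repo_full_name: str, tmp_dir: str = "/tmp/codegen") -> List[str]:
    """Build, but do not run, a shallow git clone command for the repository."""
    url = f"https://github.com/{repo_full_name}.git"
    target = default_clone_dir(repo_full_name, tmp_dir)
    # --depth 1 fetches only the most recent commit, which is what makes the clone shallow.
    return ["git", "clone", "--depth", "1", url, target.as_posix()]
```

A shallow clone skips most of the commit history, which is why it is faster; operations that need older commits would require a full clone instead.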
Configuration Options
You can customize the behavior of your Codebase instance by passing a CodebaseConfig object. This allows you to configure secrets (like API keys) and toggle specific features:
The CodebaseConfig allows you to configure:
- secrets: API keys and other sensitive information needed by the codebase
- feature_flags: Toggle specific features like language engines, dependency management, and graph synchronization
For a complete list of available feature flags and configuration options, see the source code on GitHub.
Advanced Initialization
For more complex scenarios, Codegen supports an advanced initialization mode using ProjectConfig. This allows for fine-grained control over:
- Repository configuration
- Base path and subdirectory filtering
- Multiple project configurations
Here’s an example:
For more details on advanced configuration options, see the source code on GitHub.
Supported Languages
Codegen currently supports: