Codegen provides three primary abstractions for working with your codebase’s file structure:

  • File - Represents a file in the codebase (e.g. README.md, package.json, etc.)
  • SourceFile - Represents a source code file (e.g. Python, TypeScript, React, etc.)
  • Directory - Represents a directory in the codebase

SourceFile is a subclass of File that provides additional functionality for source code files.

Accessing Files and Directories

You typically access files from the codebase object with the following APIs:

# Get a file from the codebase
file = codebase.get_file("path/to/file.py")

# Iterate over all files in the codebase
for file in codebase.files:
    pass

# Check if a file exists
exists = codebase.has_file("path/to/file.py")

Directory offers a similar API for accessing files and subdirectories:

# Get a directory
dir = codebase.get_directory("path/to/dir")

# Iterate over all files in the directory
for file in dir.files:
    pass

# Get the directory containing a file:
dir = file.directory

# Check if a directory exists
exists = codebase.has_directory("path/to/dir")

Differences between SourceFile and File

  • File - a general purpose class that represents any file in the codebase including non-code files like README.md, .env, .json, image files, etc.
  • SourceFile - a subclass of File that provides additional functionality for source code files written in languages supported by the codegen-sdk (Python, TypeScript, JavaScript, React).

Most use cases involve only SourceFile objects, since these contain code that the codegen-sdk can parse and manipulate. When you need to work with non-code files, use the File class.

By default, codebase.files returns only SourceFile objects. To include non-code files, pass the extensions="*" argument.

# Get all source files in the codebase
source_files = codebase.files

# Get all files in the codebase (including non-code files)
all_files = codebase.files(extensions="*")

When getting a file with codebase.get_file, files ending in .py, .js, .ts, .jsx, .tsx are returned as SourceFile objects while other files are returned as File objects.

You can use Python's isinstance function to check whether a file is a SourceFile:

py_file = codebase.get_file("path/to/file.py")
if isinstance(py_file, SourceFile):
    print(f"File {py_file.filepath} is a source file")

# prints: `File path/to/file.py is a source file`

mdx_file = codebase.get_file("path/to/file.mdx")
if not isinstance(mdx_file, SourceFile):
    print(f"File {mdx_file.filepath} is a non-code file")

# prints: `File path/to/file.mdx is a non-code file`
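
The extension-based rule described above can be pictured with a small stand-alone sketch. The classes `FileStub` and `SourceFileStub` and the helper `classify` are hypothetical stand-ins invented for this illustration; they are not Codegen classes, and only the extension list is taken from the text above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Extensions that the text above says are returned as SourceFile objects.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".jsx", ".tsx"}


@dataclass
class FileStub:
    """Hypothetical stand-in for a generic file (not the Codegen File class)."""
    filepath: str


@dataclass
class SourceFileStub(FileStub):
    """Hypothetical stand-in for a parsed source file."""


def classify(filepath: str) -> FileStub:
    """Return a SourceFileStub for known source extensions, else a FileStub."""
    if PurePosixPath(filepath).suffix in SOURCE_EXTENSIONS:
        return SourceFileStub(filepath)
    return FileStub(filepath)


py_file = classify("path/to/file.py")
mdx_file = classify("path/to/file.mdx")
print(isinstance(py_file, SourceFileStub))   # True
print(isinstance(mdx_file, SourceFileStub))  # False
print(isinstance(py_file, FileStub))         # True: the subclass is still a file
```

Because the source-file type is a subclass, code written against the generic file type keeps working for both kinds of object.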

Currently, the codebase object can only parse source code files of one language at a time. This means that if you want to work with both Python and TypeScript files, you will need to create two separate codebase objects.

Accessing Code

SourceFiles and Directories provide several APIs for accessing and iterating over their code.

See, for example:

# Get all functions in a file
for function in file.functions:
    print(f"Found function: {function.name}")
    print(f"Parameters: {[p.name for p in function.parameters]}")
    print(f"Return type: {function.return_type}")

# Get all classes
for cls in file.classes:
    print(f"Found class: {cls.name}")
    print(f"Methods: {[m.name for m in cls.methods]}")
    print(f"Attributes: {[a.name for a in cls.attributes]}")

# Get imports (can also do `file.import_statements`)
for imp in file.imports:
    print(f"Import from: {imp.module}")
    print(f"Imported symbol: {imp.imported_symbol}")

# Get specific symbols
main_function = file.get_function("main")
user_class = file.get_class("User")
config = file.get_global_var("CONFIG")

# Access code blocks
if main_function:
    for statement in main_function.code_block.statements:
        print(f"Statement type: {statement.statement_type}")

# Get local variables in a function
if main_function:
    local_vars = main_function.code_block.get_local_var_assignments()
    for var in local_vars:
        print(f"Local var: {var.name} = {var.value}")

Working with Non-Code Files (README, JSON, etc.)

By default, Codegen focuses on source code files (Python, TypeScript, etc.). However, you can access all files in your codebase, including documentation, configuration, and other non-code files like README.md, package.json, or .env:

# Get all files in the codebase (including README, docs, config files)
files = codebase.files(extensions="*")

# Print files that are not source code (documentation, config, etc)
for file in files:
    if not file.filepath.endswith(('.py', '.js', '.ts', '.jsx', '.tsx')):
        print(f"📄 Non-code file: {file.filepath}")

You can also filter for specific file types:

# Get only markdown documentation files
docs = codebase.files(extensions=[".md", ".mdx"])

# Get configuration files
config_files = codebase.files(extensions=[".json", ".yaml", ".toml"])
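
The three call styles shown so far (no argument, `"*"`, and a list of extensions) can be summarized in a stand-alone sketch. `select_files` is a hypothetical helper written only for this illustration and works on plain path strings; it is not how Codegen implements the filter.

```python
from pathlib import PurePosixPath

# Source extensions named earlier on this page.
SOURCE_EXTENSIONS = (".py", ".js", ".ts", ".jsx", ".tsx")


def select_files(paths, extensions=None):
    """Hypothetical filter mirroring the three call styles shown above."""
    if extensions == "*":
        return list(paths)  # every file, code or not
    wanted = SOURCE_EXTENSIONS if extensions is None else tuple(extensions)
    return [p for p in paths if PurePosixPath(p).suffix in wanted]


paths = ["src/app.py", "README.md", "docs/guide.mdx", "package.json"]
print(select_files(paths))                              # ['src/app.py']
print(select_files(paths, extensions="*"))              # all four paths
print(select_files(paths, extensions=[".md", ".mdx"]))  # ['README.md', 'docs/guide.mdx']
```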

Directory objects provide similar methods for accessing their files and subdirectories.

Raw Content and Metadata

# Grab raw file string content
content = file.content # For text files
print('Length:', len(content))
print('# of functions:', len(file.functions))

# Access file metadata
name = file.name # Base name without extension
extension = file.extension # File extension with dot
filepath = file.filepath # Full relative path
dir = file.directory # Parent directory

# Access directory metadata
name = dir.name # Directory name
path = dir.path # Full relative path from repository root
parent = dir.parent # Parent directory
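
To make the terms in these comments concrete, here is how the same pieces look for an example path using Python's standard `pathlib`. This only illustrates the terminology (base name, extension, parent); it does not call Codegen, and the example path is made up.

```python
from pathlib import PurePosixPath

path = PurePosixPath("src/utils/helpers.py")  # made-up example path

print(path.stem)         # helpers  (base name without extension)
print(path.suffix)       # .py      (extension, including the dot)
print(str(path))         # src/utils/helpers.py
print(str(path.parent))  # src/utils  (parent directory)
print(path.parent.name)  # utils    (directory name)
```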

Editing Files Directly

Files themselves are Editable objects, just like Functions and Classes.

Learn more about the Editable API.

This means they expose many useful operations, including:

# Get a file
file = codebase.get_file("path/to/file.py")

# Replace all instances of a string
file.replace("name", "new_name")
file.replace("name", "new_name", include_comments=False) # Don't edit comments

# Replace entire text of the file
file.edit('hello, world!')

# Get + delete all instances of a string
for editable in file.search("foo"):
    editable.remove()

# Insert text at the top of the file
file.insert_before("def main():\n    pass")
# ... or at the bottom
file.insert_after("def end():\n    pass")

# Delete the file
file.remove()
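
Because inserted text is ordinary source code, it can help to check a multi-line snippet before passing it to `insert_before` or `insert_after`. The following is a minimal sketch using only the standard library; `checked_snippet` is a hypothetical helper, not part of Codegen.

```python
import ast
import textwrap


def checked_snippet(source: str) -> str:
    """Dedent a snippet and raise SyntaxError if it is not valid Python."""
    snippet = textwrap.dedent(source).strip("\n") + "\n"
    ast.parse(snippet)  # raises SyntaxError (or its subclass IndentationError)
    return snippet


good = checked_snippet("""
    def main():
        pass
""")
print(repr(good))  # 'def main():\n    pass\n'

try:
    # The function body is not indented, so this snippet is rejected.
    checked_snippet("def main():\npass")
except SyntaxError as exc:
    print(f"rejected: {exc.msg}")
```

The returned string could then be handed to an insert call on the file; the check itself never touches the codebase.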

You can often make bulk modifications with the .edit(...) or .replace(...) methods.

Most useful operations will have bespoke APIs that handle edge cases, update references, etc.

Moving and Renaming Files

Files can be manipulated through methods like File.update_filepath(), File.rename(), and File.remove():

# Move/rename a file
file.update_filepath("/path/to/foo.py")  # Move to new location
file.rename("bar")  # Rename preserving extension, e.g. `bar.py`

# Remove a file (potentially destructive)
file.remove()

# Move all tests to a tests directory
for file in codebase.files:
    if 'test_' in file.name:
        # This will handle updating imports and other references
        file.update_filepath('tests/' + file.filepath.replace("test_", ""))

Removing files is a potentially breaking operation. Only remove files if they have no external usages.
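
Before running a bulk move like the one shown earlier in this section, it can be useful to preview the target paths. The sketch below works on plain path strings and uses a hypothetical `plan_test_moves` helper; it does not call Codegen or touch the file system. Keeping the original file name and skipping files already under `tests/` are assumptions made for this example, not a description of what the loop above does.

```python
from pathlib import PurePosixPath


def plan_test_moves(filepaths, target_dir="tests"):
    """Map each test file path to a proposed new path (dry run, no I/O)."""
    plan = {}
    for filepath in filepaths:
        path = PurePosixPath(filepath)
        if not path.name.startswith("test_"):
            continue  # not a test file by this naming convention
        if path.parts[0] == target_dir:
            continue  # already inside the target directory
        plan[filepath] = str(PurePosixPath(target_dir) / path)
    return plan


moves = plan_test_moves(["src/app.py", "src/test_app.py", "tests/test_db.py"])
for old, new in moves.items():
    print(f"{old} -> {new}")  # src/test_app.py -> tests/src/test_app.py
```

Reviewing the plan first makes it easier to spot collisions or unintended renames before any references are rewritten.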

Directories

Directories expose a similar API to the File class, with the addition of the subdirectories property.

# Get a directory
dir = codebase.get_directory("path/to/dir")

# Iterate over all directories in the codebase
for directory in codebase.directories:
    print(f"Found directory: {directory.path}")

# Check directory existence
exists = codebase.has_directory("path/to/dir")

# Access metadata
name = dir.name  # Directory name
path = dir.path  # Full path
parent = dir.parent  # Parent directory

# Get specific items
file = dir.get_file("file.py")
subdir = dir.get_subdirectory("subdir")

# Get all descendant subdirectories
subdirs = dir.subdirectories

# Get the parent directory
parent_dir = dir.parent

# Find all direct child directories
for subdir in dir.subdirectories:
    if subdir.parent == dir:
        print(f"Found child subdirectory: {subdir.path}")

# Move to new location
dir.update_filepath("new/path")

# Rename directory in place
dir.rename("new_name")

# Remove a directory and all contents (potentially destructive)
dir.remove()

Removing directories is a potentially destructive operation. Only remove directories if they have no external usages.
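
The difference between all nested subdirectories and direct children can be illustrated without Codegen, using path strings and the standard `pathlib` module. The directory list and the helper names `descendants` and `children` are made up for this sketch.

```python
from pathlib import PurePosixPath

# Made-up directory paths, relative to the repository root.
ALL_DIRS = ["src", "src/api", "src/api/v1", "src/utils", "tests"]


def descendants(directory, all_dirs):
    """Every directory nested anywhere below `directory`."""
    root = PurePosixPath(directory)
    return [d for d in all_dirs if root in PurePosixPath(d).parents]


def children(directory, all_dirs):
    """Only directories whose immediate parent is `directory`."""
    root = PurePosixPath(directory)
    return [d for d in all_dirs if PurePosixPath(d).parent == root]


print(descendants("src", ALL_DIRS))  # ['src/api', 'src/api/v1', 'src/utils']
print(children("src", ALL_DIRS))     # ['src/api', 'src/utils']
```

The child check compares each candidate's parent with the starting directory, which is the same comparison used when filtering a recursive list of subdirectories.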
