AST Implementation Notes
Analysis of Current String-Based Issues
Problems Identified
- Error-prone parsing: strchr/strstr can miss edge cases
- No source tracking: Errors don't show location in source
- Limited extensibility: Adding operators requires string surgery
- Type safety: No compile-time validation of AST structure
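As a small illustration of the first problem, the sketch below shows how a first-`=` search can misread a comparison as an assignment. The function `naive_assign_offset` is a hypothetical stand-in, not code taken from process_set.c, and the sample inputs are assumed for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for string-based assignment detection:
 * returns the offset of the first '=' in the line, or -1 if there is none. */
static long naive_assign_offset(const char *line) {
    const char *eq = strchr(line, '=');
    return eq ? (long)(eq - line) : -1;
}
```

With this helper, `naive_assign_offset("count = 1")` yields 6, the real assignment operator, but `naive_assign_offset("if count == 1")` yields 9, which is the first half of `==`. A tokenizer that emits `==` as a single token avoids that class of mistake.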
Key Files to Replace
- process_set.c:96-180 - Assignment parsing with strchr
- process_directive.c:50-94 - Command extraction with string ops
- process_multiline_directive_enhanced.c:49 - Line-by-line tokenization
AST Architecture Design
Node Hierarchy
AST_PROGRAM (root)
├── AST_DIRECTIVE (xmd commands)
│   ├── AST_ASSIGNMENT (var = value)
│   ├── AST_CONDITIONAL (if/elif/else)
│   └── AST_LOOP (for loops)
└── AST_EXPRESSION
    ├── AST_BINARY_OP (+, -, ==, etc)
    ├── AST_FUNCTION_CALL (import, exec)
    └── AST_LITERAL (strings, numbers)
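One possible C representation of this hierarchy is sketched below. The enum values follow the node names in the tree; everything else (`ast_location`, the field names, `ast_type_name`) is an assumption made for illustration and not an existing XMD definition. The location field is included because missing source tracking is one of the problems listed earlier.

```c
#include <stddef.h>

/* Node kinds, named after the hierarchy in these notes. */
typedef enum {
    AST_PROGRAM,
    AST_DIRECTIVE,
    AST_ASSIGNMENT,
    AST_CONDITIONAL,
    AST_LOOP,
    AST_EXPRESSION,
    AST_BINARY_OP,
    AST_FUNCTION_CALL,
    AST_LITERAL
} ast_node_type;

/* Where a node came from, so errors can report a position (assumed layout). */
typedef struct {
    const char *file;
    size_t line;    /* 1-based */
    size_t column;  /* 1-based */
} ast_location;

/* Generic node: a kind, a position, optional text, and owned children. */
typedef struct ast_node {
    ast_node_type type;
    ast_location loc;
    const char *text;            /* operator, identifier, or literal; may be NULL */
    struct ast_node **children;  /* NULL when child_count is 0 */
    size_t child_count;
} ast_node;

/* Printable name of a node kind, for diagnostics and test output. */
const char *ast_type_name(ast_node_type type) {
    switch (type) {
        case AST_PROGRAM:       return "AST_PROGRAM";
        case AST_DIRECTIVE:     return "AST_DIRECTIVE";
        case AST_ASSIGNMENT:    return "AST_ASSIGNMENT";
        case AST_CONDITIONAL:   return "AST_CONDITIONAL";
        case AST_LOOP:          return "AST_LOOP";
        case AST_EXPRESSION:    return "AST_EXPRESSION";
        case AST_BINARY_OP:     return "AST_BINARY_OP";
        case AST_FUNCTION_CALL: return "AST_FUNCTION_CALL";
        case AST_LITERAL:       return "AST_LITERAL";
    }
    return "AST_UNKNOWN";
}
```

A single generic node with a `type` tag keeps traversal and freeing code uniform. A tagged union with per-kind fields would give stronger compile-time checking at the cost of more boilerplate.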
Memory Management Strategy
- Each node owns its children (tree ownership)
- Reference counting for shared subtrees
- Single ast_free() call frees entire tree
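The three points above could fit together roughly as follows. This is a minimal sketch under stated assumptions: the node is pared down to the fields that matter for ownership, and `ast_new`, `ast_retain`, `ast_add_child`, and the `ast_live_nodes` debug counter are hypothetical names. Only `ast_free()` is named in these notes.

```c
#include <stdlib.h>

/* Pared-down node: only the fields needed to show ownership. */
typedef struct ast_node {
    int refcount;                /* number of owners; 1 for an unshared node */
    struct ast_node **children;  /* child pointers owned by this node */
    size_t child_count;
} ast_node;

/* Debug aid (assumption): counts nodes that are allocated but not yet freed. */
static int ast_live_nodes = 0;

/* Allocate a node with a single owner, the caller. Returns NULL on failure. */
ast_node *ast_new(void) {
    ast_node *n = calloc(1, sizeof *n);
    if (n) {
        n->refcount = 1;
        ast_live_nodes++;
    }
    return n;
}

/* Take an extra reference so a subtree can hang under a second parent. */
ast_node *ast_retain(ast_node *n) {
    if (n) n->refcount++;
    return n;
}

/* Append a child; the parent takes over the caller's reference.
 * Returns 0 on success, -1 if the child array could not grow. */
int ast_add_child(ast_node *parent, ast_node *child) {
    ast_node **grown = realloc(parent->children,
                               (parent->child_count + 1) * sizeof *grown);
    if (!grown) return -1;
    grown[parent->child_count++] = child;
    parent->children = grown;
    return 0;
}

/* Drop one reference; when the last one goes, release the children too. */
void ast_free(ast_node *n) {
    if (!n || --n->refcount > 0) return;
    for (size_t i = 0; i < n->child_count; i++)
        ast_free(n->children[i]);
    free(n->children);
    free(n);
    ast_live_nodes--;
}
```

A subtree shared by two parents is attached once directly and once through `ast_retain()`. Calling `ast_free()` on the root then releases it exactly once, when its second parent lets go.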
Parser Strategy
- Recursive descent with operator precedence
- Left-to-right associativity for same precedence
- Error recovery with meaningful messages
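The sketch below shows one common way to get precedence and left-to-right associativity out of recursive descent (often called precedence climbing). To stay short it evaluates integer expressions directly instead of building AST nodes, and it records only the first error with its column instead of resynchronizing. The `parser` struct, the function names, and the operator set are assumptions for illustration.

```c
#include <ctype.h>
#include <stdio.h>

typedef struct {
    const char *src;  /* start of input, used to compute columns */
    const char *pos;  /* current read position */
    char error[64];   /* first error message; empty if none */
} parser;

static void skip_ws(parser *p) {
    while (isspace((unsigned char)*p->pos)) p->pos++;
}

/* Record the first error only, tagged with a 1-based column. */
static void fail(parser *p, const char *msg) {
    if (p->error[0] == '\0')
        snprintf(p->error, sizeof p->error, "column %d: %s",
                 (int)(p->pos - p->src) + 1, msg);
}

/* Precedence of the operator at the cursor (0 if none); *len is its length. */
static int peek_op(parser *p, int *len) {
    skip_ws(p);
    *len = 1;
    if (p->pos[0] == '=' && p->pos[1] == '=') { *len = 2; return 1; }
    switch (*p->pos) {
        case '+': case '-': return 2;
        case '*': case '/': return 3;
        default:            return 0;
    }
}

static long parse_expr(parser *p, int min_prec);

/* primary := NUMBER | '(' expr ')' */
static long parse_primary(parser *p) {
    skip_ws(p);
    if (*p->pos == '(') {
        p->pos++;
        long v = parse_expr(p, 1);
        skip_ws(p);
        if (*p->pos == ')') p->pos++;
        else fail(p, "expected ')'");
        return v;
    }
    if (!isdigit((unsigned char)*p->pos)) {
        fail(p, "expected number or '('");
        return 0;
    }
    long v = 0;
    while (isdigit((unsigned char)*p->pos)) v = v * 10 + (*p->pos++ - '0');
    return v;
}

/* Consume operators whose precedence is at least min_prec. */
static long parse_expr(parser *p, int min_prec) {
    long lhs = parse_primary(p);
    for (;;) {
        int len;
        int prec = peek_op(p, &len);
        if (prec == 0 || prec < min_prec || p->error[0]) return lhs;
        char op = *p->pos;
        p->pos += len;
        /* prec + 1 keeps equal-precedence operators out of the right
         * operand, which makes them associate left to right. */
        long rhs = parse_expr(p, prec + 1);
        switch (op) {
            case '+': lhs += rhs; break;
            case '-': lhs -= rhs; break;
            case '*': lhs *= rhs; break;
            case '/':
                if (rhs == 0) { fail(p, "division by zero"); return 0; }
                lhs /= rhs;
                break;
            case '=': lhs = (lhs == rhs); break;
        }
    }
}

/* Parse and evaluate a whole string; check p->error afterwards. */
static long evaluate(parser *p, const char *text) {
    p->src = p->pos = text;
    p->error[0] = '\0';
    long v = parse_expr(p, 1);
    skip_ws(p);
    if (*p->pos != '\0') fail(p, "unexpected trailing input");
    return v;
}
```

With this sketch, `10 - 4 - 3` evaluates to 3 (grouped as `(10 - 4) - 3`), and `2 + * 3` leaves `column 5: expected number or '('` in `error`. In an AST-building version, the `switch` would allocate an AST_BINARY_OP node instead of computing a value.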
Integration Points
- Replace process_directive() with ast_parse_directive()
- Replace variable assignment logic with ast_evaluate()
- Maintain same external API initially
Implementation Order
- Core AST nodes and memory management
- Enhanced lexer with all token types
- Recursive descent parser
- AST evaluator/interpreter
- Integration with existing processor
- String parser removal
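For the lexer step, the sketch below shows how line and column tracking could be built in from the start, so that later stages can report error locations. The token set is deliberately small and every name here (`token_type`, `lexer_init`, `lexer_next`, and so on) is an assumption, not an existing XMD interface. String escapes are not handled.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    TOK_IDENT, TOK_NUMBER, TOK_STRING,
    TOK_ASSIGN,  /* =  */
    TOK_EQ,      /* == */
    TOK_OP,      /* + - * / */
    TOK_EOF, TOK_ERROR
} token_type;

typedef struct {
    token_type type;
    const char *start;  /* points into the source buffer, not copied */
    size_t length;
    size_t line;        /* 1-based position of the first character */
    size_t column;
} token;

typedef struct {
    const char *cur;
    size_t line;
    size_t column;
} lexer;

void lexer_init(lexer *lx, const char *src) {
    lx->cur = src;
    lx->line = 1;
    lx->column = 1;
}

/* Step over one character, keeping line and column in sync. */
static void advance(lexer *lx) {
    if (*lx->cur == '\n') { lx->line++; lx->column = 1; }
    else lx->column++;
    lx->cur++;
}

/* Return the next token; TOK_EOF repeats once the input is exhausted. */
token lexer_next(lexer *lx) {
    while (isspace((unsigned char)*lx->cur)) advance(lx);

    token t = { .type = TOK_EOF, .start = lx->cur, .length = 0,
                .line = lx->line, .column = lx->column };
    char c = *lx->cur;
    if (c == '\0') return t;

    if (isalpha((unsigned char)c) || c == '_') {
        t.type = TOK_IDENT;
        while (isalnum((unsigned char)*lx->cur) || *lx->cur == '_') advance(lx);
    } else if (isdigit((unsigned char)c)) {
        t.type = TOK_NUMBER;
        while (isdigit((unsigned char)*lx->cur)) advance(lx);
    } else if (c == '"') {
        t.type = TOK_STRING;
        advance(lx);                                   /* opening quote */
        while (*lx->cur && *lx->cur != '"') advance(lx);
        if (*lx->cur == '"') advance(lx);              /* closing quote */
        else t.type = TOK_ERROR;                       /* unterminated string */
    } else if (c == '=') {
        advance(lx);
        if (*lx->cur == '=') { advance(lx); t.type = TOK_EQ; }
        else t.type = TOK_ASSIGN;
    } else if (c == '+' || c == '-' || c == '*' || c == '/') {
        advance(lx);
        t.type = TOK_OP;
    } else {
        advance(lx);
        t.type = TOK_ERROR;                            /* unknown character */
    }
    t.length = (size_t)(lx->cur - t.start);
    return t;
}
```

Because `=` inside a quoted string is consumed as part of a TOK_STRING and `==` is a single TOK_EQ, the parser never has to guess which `=` is the assignment operator.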
Testing Strategy
- Unit tests for each AST node type
- Parser tests with valid/invalid syntax
- Integration tests with existing XMD files
- Performance benchmarks vs string parsing