MacroSplitter
Starting with version 0.2, MacroSplitter includes an optional 6502 peephole optimizer that can remove redundant load and store instructions from the compiler output, resulting in smaller and faster code.
Starting with version 0.2, MacroSplitter can also perform the macro expansion step itself via the -M flag, replacing the cpp.exe invocation in the build pipeline. When used this way, it reads the compiler's .c2 output directly, expands the MACROS.H definitions, and emits annotation comments showing the original macro call before each expanded block. This makes debugging the generated assembly much easier.
%OSDK%\bin\MacroSplitter [switches] source_file destination_file
This tool is normally invoked automatically by the OSDK build system (make.bat) during the C compilation phase.
-O            => Enable peephole optimization of the assembly output
-M<macrofile> => Expand macros from <macrofile> (replaces cpp.exe macro step)
The -M flag tells MacroSplitter to read the compiler's .c2 output directly and expand all MACROS.H definitions internally. It supports all macro types (object-like, function-like, nested), handles parenthesized arguments like (ap) and (fp), and inserts annotation comments before each expanded block:
; === ENTER(1,6) ===
ldx #6
lda #1
jsr enter
; === MOVW_YD((ap),0,tmp0) ===
ldy #0
lda (ap),y
sta tmp0
iny
lda (ap),y
sta tmp0+1
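The handling of parenthesized arguments and annotation comments can be illustrated with a minimal Python sketch. This is not MacroSplitter's actual code: the function names split_macro_args and expand_call are hypothetical, and the MOVW_YD body in the macro table is an assumption inferred from the expanded output above, not a copy of MACROS.H.

```python
import re

# Hypothetical macro table: name -> (parameter names, body lines).
# The real MACROS.H definition may differ; this body is an assumption.
MACROS = {
    "MOVW_YD": (
        ["ptr", "off", "dst"],
        ["ldy #off", "lda ptr,y", "sta dst", "iny", "lda ptr,y", "sta dst+1"],
    ),
}

def split_macro_args(arg_text):
    """Split an argument list on top-level commas only, so that a
    parenthesized argument such as (ap) stays in one piece."""
    args, depth, current = [], 0, ""
    for ch in arg_text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    args.append(current.strip())
    return args

def expand_call(name, arg_text, macros=MACROS):
    """Return the annotation comment followed by the expanded body."""
    params, body = macros[name]
    mapping = dict(zip(params, split_macro_args(arg_text)))
    expanded = ["; === %s(%s) ===" % (name, arg_text)]
    for template in body:
        # Replace whole identifiers only, in a single pass per line.
        expanded.append(
            re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), template)
        )
    return expanded
```

Calling expand_call("MOVW_YD", "(ap),0,tmp0") with this assumed table yields the annotation line followed by the six instructions of the second block above.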
To enable the built-in macro expansion for your project, add the following to your osdk_config.bat file:
SET OSDKMACROEXPAND=1
This can be combined with the optimizer:
SET OSDKMACROEXPAND=1
SET OSDKMACRO=-O
When OSDKMACROEXPAND is not set (the default), the old pipeline using cpp.exe for macro expansion is used, and MacroSplitter only performs line splitting.
The optimizer applies several safe peephole patterns to the assembly output. These patterns target the redundant code sequences that the C compiler typically generates through macro expansion.
Pattern 1 - Self-store elimination: When a value is loaded and immediately stored back to the same location, the store is redundant and is removed.
Before:          After:
lda tmp0         lda tmp0
sta tmp0         (removed)
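As an illustration only, self-store elimination could be sketched in Python as below. The helper names parse and remove_self_stores are hypothetical, and the sketch assumes one instruction per line with no label on the same line; it is not MacroSplitter's real implementation.

```python
# Store mnemonic that writes back the register set by each load mnemonic.
STORE_FOR_LOAD = {"lda": "sta", "ldx": "stx", "ldy": "sty"}

def parse(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments."""
    parts = line.split(";", 1)[0].split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return mnemonic, operand

def remove_self_stores(lines):
    """Drop a store that immediately follows a load of the same register
    from the same operand."""
    out = []
    for line in lines:
        if out:
            prev_op, prev_arg = parse(out[-1])
            op, arg = parse(line)
            if STORE_FOR_LOAD.get(prev_op) == op and prev_arg == arg:
                continue  # redundant store: memory already holds this value
        out.append(line)
    return out
```

The comparison is purely textual, so any line that is not an instruction (a label, a directive, a comment) between the load and the store prevents the match.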
Pattern 2 - Load-after-store elimination: When a value is stored to a location and immediately loaded back from the same location, the load is redundant and is removed.
Before:          After:
sta tmp0         sta tmp0
lda tmp0         (removed)
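A comparable sketch for load-after-store elimination follows. Again, parse and remove_loads_after_store are hypothetical names, and the code assumes one instruction per line; the real optimizer may be implemented quite differently.

```python
# Load mnemonic that reads back the register written by each store mnemonic.
LOAD_FOR_STORE = {"sta": "lda", "stx": "ldx", "sty": "ldy"}

def parse(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments."""
    parts = line.split(";", 1)[0].split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return mnemonic, operand

def remove_loads_after_store(lines):
    """Drop a load that immediately follows a store of the same register
    to the same operand: the register already holds that value."""
    out = []
    for line in lines:
        if out:
            prev_op, prev_arg = parse(out[-1])
            op, arg = parse(line)
            if LOAD_FOR_STORE.get(prev_op) == op and prev_arg == arg:
                continue  # redundant load
        out.append(line)
    return out
```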
Pattern 3 - Dead store elimination: When a value is stored to a location and immediately stored again to the same location, the first store is dead and is removed. This pattern will not be applied to writes targeting the Oric I/O page ($03xx) since each write to that page has a hardware side effect.
Before:          After:
sta tmp0         (removed)
sta tmp0         sta tmp0
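The I/O page exclusion can be sketched as follows. This is a simplified illustration with hypothetical names (is_io_page, remove_dead_stores). It only recognizes literal hexadecimal operands such as $0300; how MacroSplitter itself detects I/O addresses, including symbolic ones, is not documented here and is not modeled.

```python
import re

STORES = {"sta", "stx", "sty"}

def parse(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments."""
    parts = line.split(";", 1)[0].split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return mnemonic, operand

def is_io_page(operand):
    """True when the operand is a literal hex address in $0300-$03FF.
    Symbolic names are not resolved in this sketch (an assumption)."""
    base = operand.split(",")[0].strip()  # ignore ,x / ,y index suffixes
    m = re.fullmatch(r"\$([0-9a-fA-F]{1,4})", base)
    return bool(m) and 0x0300 <= int(m.group(1), 16) <= 0x03FF

def remove_dead_stores(lines):
    """Drop a store that is immediately overwritten by another store to
    the same operand, unless the target is on the I/O page."""
    out = []
    for i, line in enumerate(lines):
        op, arg = parse(line)
        if i + 1 < len(lines):
            next_op, next_arg = parse(lines[i + 1])
            if (op in STORES and next_op in STORES
                    and arg == next_arg and not is_io_page(arg)):
                continue  # the next store overwrites this one
        out.append(line)
    return out
```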
Pattern 4 - Tail call optimization: When a subroutine call is immediately followed by a return, the call can be converted to a jump.
Before:          After:
jsr MyRoutine    jmp MyRoutine
rts              (removed)
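A small Python sketch of this rewrite is given below. The names parse and convert_tail_calls are hypothetical, and the code assumes the jsr and rts sit on directly adjacent lines with no label attached to the rts.

```python
def parse(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments."""
    parts = line.split(";", 1)[0].split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return mnemonic, operand

def convert_tail_calls(lines):
    """Rewrite 'jsr X' directly followed by 'rts' into a single 'jmp X'."""
    out = []
    i = 0
    while i < len(lines):
        op, arg = parse(lines[i])
        next_op = parse(lines[i + 1])[0] if i + 1 < len(lines) else ""
        if op == "jsr" and next_op == "rts":
            # Keep the original indentation of the jsr line.
            indent = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
            out.append(indent + "jmp " + arg)
            i += 2  # skip the rts as well
        else:
            out.append(lines[i])
            i += 1
    return out
```

The jump saves one byte and avoids pushing and popping a return address, since the called routine's own rts returns straight to the original caller.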
Pattern 5 - Cross-register transfer: When a value is stored from one register and later loaded into a different register from the same address, the load can be replaced with a register transfer instruction.
Before:          After:
stx tmp0         stx tmp0
sta tmp0+1       sta tmp0+1
lda tmp0         txa
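One conservative way to implement such a rewrite is sketched below. It assumes, as a simplification, that only plain stores to other operands may sit between the store and the load, and that different operand texts name different memory locations. The names TRANSFER, is_plain and replace_cross_register_loads are hypothetical; the rules MacroSplitter actually applies may be broader or narrower.

```python
STORES = {"sta", "stx", "sty"}
LOADS = {"lda", "ldx", "ldy"}

# (store mnemonic, load mnemonic) -> transfer instruction.
# The 6502 has no direct X<->Y transfer, so those pairs are absent.
TRANSFER = {
    ("stx", "lda"): "txa",
    ("sty", "lda"): "tya",
    ("sta", "ldx"): "tax",
    ("sta", "ldy"): "tay",
}

def parse(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments."""
    parts = line.split(";", 1)[0].split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return mnemonic, operand

def is_plain(operand):
    """True for direct operands: no immediate, indexed or indirect forms."""
    return operand != "" and not any(c in operand for c in "#,()")

def replace_cross_register_loads(lines):
    out = list(lines)
    for j, line in enumerate(out):
        load_op, target = parse(line)
        if load_op not in LOADS or not is_plain(target):
            continue
        # Walk backwards over plain stores; stop at anything else.
        for i in range(j - 1, -1, -1):
            op, arg = parse(out[i])
            if op not in STORES or not is_plain(arg):
                break  # registers may have changed, give up
            if arg == target:
                transfer = TRANSFER.get((op, load_op))
                if transfer:
                    out[j] = transfer
                break
    return out
```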
Pattern 6 - Same-immediate reload: When an immediate value is loaded, stored somewhere, and then the same immediate is loaded again, the second load is redundant. The optimizer resolves #<(N) and #>(N) expressions for numeric literals.
Before:                  After:
lda #0                   lda #0
sta _gDelayStream        sta _gDelayStream
lda #0                   (removed)
sta _gDelayStream+1      sta _gDelayStream+1
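The resolution of #<(N) and #>(N) can be sketched as follows. This is an illustrative Python model, not the tool's code: immediate_key and remove_same_immediate are hypothetical names, numeric literals are assumed to be decimal or $-prefixed hexadecimal, and any instruction other than a load or store is treated as invalidating what is known about the registers.

```python
import re

LOADS = {"lda", "ldx", "ldy"}
STORES = {"sta", "stx", "sty"}

def parse(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments."""
    parts = line.split(";", 1)[0].split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return mnemonic, operand

def parse_number(text):
    """Parse a decimal or $hex literal; return None for anything else."""
    text = text.strip()
    try:
        return int(text[1:], 16) if text.startswith("$") else int(text, 10)
    except ValueError:
        return None

def immediate_key(operand):
    """Return the byte value of an immediate operand, the operand text if
    it is immediate but not numeric, or None if it is not immediate."""
    m = re.fullmatch(r"#([<>])\((.+)\)", operand)
    if m:
        n = parse_number(m.group(2))
        if n is None:
            return operand
        return n & 0xFF if m.group(1) == "<" else (n >> 8) & 0xFF
    if operand.startswith("#"):
        n = parse_number(operand[1:])
        return n & 0xFF if n is not None else operand
    return None

def remove_same_immediate(lines):
    out = []
    known = {}  # load mnemonic -> immediate currently held by that register
    for line in lines:
        op, arg = parse(line)
        if op in LOADS:
            key = immediate_key(arg)
            if key is not None and known.get(op) == key:
                continue  # register already holds this immediate
            known[op] = key
        elif op not in STORES:
            known.clear()  # label, directive or other instruction: barrier
        out.append(line)
    return out
```

With this model, lda #<(256) followed by a store and then lda #0 drops the second load, because the low byte of 256 is 0.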
All load/store patterns work with all three load/store instruction pairs: lda/sta, ldx/stx, and ldy/sty. The optimizer treats labels and directives as barriers and never optimizes across them.
The optimizer runs in multiple passes to handle cascading eliminations where removing one instruction creates a new optimization opportunity.
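A cascading elimination can be demonstrated with a small fixed-point loop. The sketch below is an assumption about how such a driver might look; optimize, dead_store_pass and load_after_store_pass are hypothetical names, and only the accumulator pair lda/sta is handled to keep the example short.

```python
def parse(line):
    """Split an assembly line into (mnemonic, operand), ignoring comments."""
    parts = line.split(";", 1)[0].split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return mnemonic, operand

def dead_store_pass(lines):
    """Remove 'sta X' when the next line is also 'sta X'."""
    out = []
    for i, line in enumerate(lines):
        nxt = parse(lines[i + 1]) if i + 1 < len(lines) else ("", "")
        if parse(line)[0] == "sta" and parse(line) == nxt:
            continue
        out.append(line)
    return out

def load_after_store_pass(lines):
    """Remove 'lda X' when the previous kept line is 'sta X'."""
    out = []
    for line in lines:
        op, arg = parse(line)
        if out and op == "lda" and parse(out[-1]) == ("sta", arg):
            continue
        out.append(line)
    return out

def optimize(lines, passes, max_rounds=100):
    """Apply every pass repeatedly until a full round changes nothing."""
    for _ in range(max_rounds):
        before = lines
        for run_pass in passes:
            lines = run_pass(lines)
        if lines == before:
            break
    return lines
```

For the input sta tmp0, lda tmp0, sta tmp0, the first round removes only the load. That leaves two adjacent stores, which the dead-store pass can remove in the second round.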
The optimizer uses the existing OSDKVERBOSITY environment variable to control its output level:
Level 0   : Silent - no optimizer output
Level 1-2 : Summary - one line showing total removed instructions
Level 3+  : Detailed - prints each individual elimination with line number
Example output at verbosity level 3:
MacroSplitter: [line 10] Self-store eliminated: sta tmp0
MacroSplitter: [line 31] Load-after-store eliminated: lda tmp0
MacroSplitter: Optimizer removed 25 redundant instructions
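The mapping from verbosity level to output can be modeled with a short Python function. The name optimizer_report and the tuple layout of the elimination records are hypothetical; the message formats are copied from the example output above, and the sketch assumes the summary line is also printed at level 3 and higher, as that example suggests.

```python
def optimizer_report(verbosity, eliminations):
    """Build the optimizer's report lines for a given OSDKVERBOSITY level.

    eliminations is a list of (line_number, pattern_name, instruction)
    tuples, a hypothetical record format chosen for this sketch.
    """
    report = []
    if verbosity >= 3:
        # Detailed: one line per eliminated instruction.
        for line_number, pattern_name, instruction in eliminations:
            report.append(
                "MacroSplitter: [line %d] %s eliminated: %s"
                % (line_number, pattern_name, instruction)
            )
    if verbosity >= 1:
        # Summary: total count only.
        report.append(
            "MacroSplitter: Optimizer removed %d redundant instructions"
            % len(eliminations)
        )
    return report
```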
No known problems. Please report any issue on the Cross development tools forum.
