It was quite simple, to begin with. Instead of creating a standard recursive descent parser, I decided to take advantage of BASIC's 'line number, then command list' structure, and create a simple regex to parse each line. I added a couple of checks to ensure that the command separator (the colon) wasn't misinterpreted when it appeared within quoted strings.
CodeLine.parse = function(str) {
  var split = str.match(/^\s*(\d+)\s+(.*)\s*$/m);
  if (split) {
    var line = new CodeLine(split[1]);
    var colon = split[2].split(':');
    for (var instr = 0; instr < colon.length; ++instr) {
      var numOfQuotes = (colon[instr].split('"').length - 1);
      if ((numOfQuotes & 1) === 1) {
        // there's an odd number of quotes, so we failed to correctly split
        // join this and the next text together to rebuild the string (BUGWARN)
        var cmd = colon[instr] + ":" + colon[instr+1];
        line.addInstruction(CodeInstruction.parse(cmd)).lineNumber = line.lineNumber;
        ++instr;
      } else {
        line.addInstruction(CodeInstruction.parse(colon[instr])).lineNumber = line.lineNumber;
      }
    }
    return line;
  }
  return null;
}
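The BUGWARN case, a quoted string containing more than one colon, can be avoided by scanning the text once and tracking whether the scan is currently inside quotes. The following is only a minimal sketch: splitStatements is a hypothetical helper, not part of the interpreter, and it assumes that BASIC strings have no escape sequence for an embedded quote.

```javascript
// Hypothetical helper, not part of the original interpreter.
// Splits a BASIC statement list on colons, ignoring colons inside "quoted strings".
// Assumption: a quote character always opens or closes a string (no escapes).
function splitStatements(text) {
  var parts = [];
  var current = '';
  var inQuotes = false;
  for (var i = 0; i < text.length; ++i) {
    var ch = text.charAt(i);
    if (ch === '"') {
      inQuotes = !inQuotes;        // toggle on every quote character
      current += ch;
    } else if (ch === ':' && !inQuotes) {
      parts.push(current);         // a real separator: close off this statement
      current = '';
    } else {
      current += ch;
    }
  }
  parts.push(current);             // whatever remains is the last statement
  return parts;
}

console.log(splitStatements('PRINT "a:b:c": GOTO 10'));
// → [ 'PRINT "a:b:c"', ' GOTO 10' ]
```

Each piece could then be handed to CodeInstruction.parse as before, without the quote-counting check.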
From here, I studied the syntax of each BASIC command to create one, or sometimes two, regexes to differentiate between them.
if (args = str.match(/^goto\s+(\d+)/i)) {
  this.method = new cbGoto(args[1]);
...
} else if (args = str.match(/^for\s+(\w+)\s*=(.*)\s*to\s*(.*)\s*step\s+(.*)/i)) {
  // The STEP form is tested first: the shorter pattern below also matches it,
  // and would swallow "step n" into the end expression.
  this.method = new cbForLoop(args[1], args[2], args[3], args[4]);
} else if (args = str.match(/^for\s+(\w+)\s*=(.*)\s*to\s*(.*)/i)) {
  this.method = new cbForLoop(args[1], args[2], args[3], 1);
...
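With a chain of regexes like this, the order of the tests matters: the plain FOR pattern also matches a FOR line that has a STEP clause, so the more specific pattern has to be tried first. One way to make that ordering explicit is to keep the patterns in a list. This is only a sketch; the rules table and the classify function are hypothetical names, and it returns a plain descriptor rather than building cbGoto or cbForLoop objects.

```javascript
// Hypothetical rule table: the names are illustrative, not the interpreter's own.
// Rules are tried top to bottom, so more specific patterns come first.
var rules = [
  { name: 'goto',     pattern: /^goto\s+(\d+)/i },
  { name: 'for-step', pattern: /^for\s+(\w+)\s*=(.*)\s*to\s*(.*)\s*step\s+(.*)/i },
  { name: 'for',      pattern: /^for\s+(\w+)\s*=(.*)\s*to\s*(.*)/i }
];

// Returns the first matching rule's name and captured groups, or null.
function classify(str) {
  for (var i = 0; i < rules.length; ++i) {
    var args = str.match(rules[i].pattern);
    if (args) {
      return { name: rules[i].name, args: args.slice(1) };
    }
  }
  return null;
}

console.log(classify('FOR I = 1 TO 10 STEP 2').name); // → for-step
console.log(classify('FOR I = 1 TO 10').name);        // → for
```

The captured start and end expressions still carry their surrounding whitespace, so they would need trimming before being evaluated.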
When combined, these steps build up a program parse tree, ready for evaluation. As each command is encountered, its callback function is invoked with the current program state, and the operation is performed. To handle the GOSUB library, I simply added a system-level callback at the appropriate line number to invoke the HTML5-specific code.
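To make the evaluation step concrete, here is a rough sketch of how such a loop might look. Everything in it is an assumption for illustration: the run function, the shape of the state object, and the gosub helper are hypothetical, and user-written subroutines and RETURN are left out so that only the system-level callbacks are shown.

```javascript
// Hypothetical sketch: every name here is illustrative, not the interpreter's code.
// program maps line numbers to arrays of callbacks, each taking the state.
// systemCalls maps library line numbers to native callbacks.
function run(program, systemCalls) {
  var order = Object.keys(program).map(Number).sort(function (a, b) { return a - b; });
  var state = {
    output: [],
    next: null,                          // line number requested by a GOTO, if any
    gosub: function (lineNumber) {
      // A system-level callback registered at this line number stands in
      // for BASIC code; it runs and returns immediately.
      if (systemCalls[lineNumber]) {
        systemCalls[lineNumber](state);
      } else {
        throw new Error('No subroutine at line ' + lineNumber);
      }
    }
  };

  var index = 0;
  while (index < order.length) {
    var instructions = program[order[index]];
    state.next = null;
    // Run each command on the line until one of them requests a jump.
    for (var i = 0; i < instructions.length && state.next === null; ++i) {
      instructions[i](state);
    }
    index = (state.next === null) ? index + 1 : order.indexOf(state.next);
    if (index < 0) { throw new Error('Undefined line ' + state.next); }
  }
  return state.output;
}

var output = run({
  10: [function (s) { s.gosub(100); }],
  20: [function (s) { s.next = 40; }],
  30: [function (s) { s.output.push('skipped'); }],
  40: [function (s) { s.output.push('done'); }]
}, {
  100: function (s) { s.output.push('system call'); }
});
console.log(output); // → [ 'system call', 'done' ]
```

In the browser, the callback registered at line 100 would be the place to call into the HTML5-specific code.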
Et voilà!
Well, nearly! I still need to properly parse the expressions... And that's where I am now. Awaiting some time to write/discover/incorporate a parser.