i'm redoing all the parsing lol

11 months ago · cd51f4cce1
parent d152c4092a
commit cd51f4cce1
7 changed files with 487 additions and 30 deletions
--- a/README.md
+++ b/README.md
@ -1,3 +1,98 @@
 # clyde
-A command-line shell. Initial development is for Windows because Windows has the worst starting point when it comes to CLI shells.
+A command-line shell. Initial development is for Windows because Windows has
 the worst starting point when it comes to CLI shells. Assuming nothing here
 works and it doesn't even compile.
 ## Background
 The needs of this project reflect my professional experience working on
 multiplayer video games. Prior to working in the games industry, I worked in
 the web industry, primarily writing server software. The working practices of
 programmers in these industries differ substantially. The large differences in
 these working practices create a substantial and difficult social divide
 within studios that make multiplayer games. These social differences make an
 already-difficult category of software even more difficult to develop. The
 goal of this project is to reduce the tooling divide that exists between
 client and server developers at multiplayer game studios, with the expectation
 that shrinking this tooling gap can reduce the social divide that exists
 between game developers and server/infrastructure developers at game studios.
 ### Windows is a hard requirement
 The first gap that appears within the greater software developer landscape is
 the question of using Windows at all. Let us dispense with this question
 immediately: Windows is a necesary tool within the game development industry.
 Supporting Windows is a hard requirement of this project.
 Many necessary tools, such as tools provided by game console manufacturers and
 tools that integrate with the dominant game engines (Unity and Unreal) only
 support Windows. If your reaction to this is "don't use those tools" or "make
 your game in Godot", please just stop. This is a very common reaction from
 professional programmers who have never worked in gamedev, who don't
 understand the constraints that most game developers face. Unfortunately a
 large portion of programming discussion happens on Hacker News, Reddit, and
 Lobsters, each of which uses a democratic structure.
 I will not attempt to convince you that Windows is broadly
 unavoidable in professional game development and that Windows is a necessary
 tool for game development studios, even studios that employ engineers for whom
 Windows is not necessary for their individual role. If you want to advocate
 for Linux superiority, please go bother Nintendo, Sony, Microsoft, etc, not
 individual game developers. If you really can't get over this and feel the
 need to yell at me, please walk a few hundred yards into a dense forest and
 scream your complaints into the wilderness to feel better.
 The command-line environments built into Windows are incredibly primitive.
 These environments are so primitive that millions of professional programmers
 who work primarily on Windows are led to believe that command-line
 environments themselves are *inherently* primitive.
 . The limitations of built-in
 command-line environments on Windows is so severe that there are entire
 categories of professional computer programmers who believe that they are "not
 terminal people", often without realizing that the terminal on Linux and MacOS
 is a vastly different experience and different tooling ecosystem than it is on
 Windows. Many professional Windows developers believe that CLI environments
 are inherently primitive, because their only exposure to CLI environments is
 through the built-in Windows shells.
 Windows ships with two built-in
 shells. The first shell is the Command Prompt, which we'll refer to as
 [cmd.exe](https://en.wikipedia.org/wiki/Cmd.exe) as it is colloquially known,
 even when running inside of
 [terminal.exe](https://en.wikipedia.org/wiki/Windows_Terminal). The second
 shell is [PowerShell](https://en.wikipedia.org/wiki/PowerShell).
 ### Insufficiency of cmd.exe
 The insufficiency of cmd.exe is widely understood to people with experience
 with Unix-like operating systems such as Linux and MacOS, which typically
 include either [bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) or
 [zsh](https://en.wikipedia.org/wiki/Z_shell). This insufficiency is severe
 enough that it drives people off of Windows entirely.
 In some industries, such as games, Windows is a required development
 environment, because many tools are available only on Windows. Free software
 purists who want to insist that you can make video games on Linux, please feel
 free to leave now, this project is not for you.
 ## Terimology
 - **executable**: a shell built-in or an executable file (i.e., an .exe file)
 ## examples
    $> a
 Find an executable named `a` somewhere on `PATH` and execute it in the
 foreground.
    $> ./a
 Use an executable named `a` within the current directory and execute it in the
 foreground. `a` must be a file in the current directory. If `a` would refer to
 both a shell built-in and a file in the current directory, use the file in the
 current directory.
    $> 
--- a/src/error.rs
+++ b/src/error.rs
@ -1,4 +1,4 @@
-use crate::lex::Topoglyph;
+use crate::lex::{Token, Topoglyph};
 use std::io;
 use thiserror::Error;
 use windows::Win32::Foundation::{GetLastError, BOOL};
@ -44,8 +44,23 @@ impl LexError {
 #[derive(Debug, Error)]
 pub enum ParseError {
-    #[error("Unexpected Token")]
+    #[error("lex error")]
-    UnexpectedToken,
+    LexError(#[from] LexError),
    #[error("Unexpected Token: {0:?}")]
    UnexpectedToken(Token),
    #[error("Illegal attempt to climb parse tree while already at root")]
    AtRootAlready,
    #[error("Illegal attempt to climb barse tree when target parent has already been dropped")]
    ParentIsGone,
    #[error("Illegal attempt to double-borrow a node")]
    BorrowError(#[from] std::cell::BorrowMutError),
    #[error("Illegal attempt to push a value as a child to a terminal value")]
    PushOntoTerminal,
 }
 impl Error {
--- a/src/lex.rs
+++ b/src/lex.rs
@ -22,7 +22,8 @@ fn is_keyword(s: &str) -> bool {
    }
 }
-/// The position of a specific glyph within a corpus of text
+/// The position of a specific glyph within a corpus of text. We use this for rendering error
 /// messages and communicating to the user the location of errors.
 #[derive(PartialEq, Clone, Copy)]
 pub struct Position {
    /// The visual line in which this glyph appears in the source text
@ -37,12 +38,16 @@ impl Position {
        Self { line: 0, column: 0 }
    }
    /// Increments position by column, going from the current line,column position to the next
    /// column on the same line.
    fn incr(&mut self) -> Position {
        let p = *self;
        self.column += 1;
        p
    }
    /// Increments the position by line, going from the current line,column position to the
    /// beginning of the next line.
    fn incr_line(&mut self) -> Position {
        let p = *self;
        self.column = 0;
@ -105,7 +110,8 @@ impl<'text> Topoglypher<'text> {
        }
    }
-    fn feed(&mut self, n: usize) -> bool {
+    /// reads the next n characters from the source text into our lookahead buffer
    fn fill_lookahead(&mut self, n: usize) -> bool {
        while self.lookahead.len() < n {
            let c = match self.source.next() {
                Some(c) => c,
@ -132,19 +138,26 @@ impl<'text> Topoglypher<'text> {
        self.lookahead.len() == n
    }
    /// returns a reference to the next character from the source text, advancing our internal
    /// lookahead buffer if necessary. Returns None if we're already at the end of our source text.
    fn peek(&mut self) -> Option<&Topoglyph> {
        self.peek_at(0)
    }
    /// takes the next character from our input text
    fn pop(&mut self) -> Result<Topoglyph, LexError> {
        self.next().ok_or(LexError::UnexpectedEOF)
    }
    /// returns a reference to a character in our lookahead buffer at a given position. This allows
    /// us to perform a lookahead read without consuming any tokens, maintaining our current
    /// position and keeping our unconsumed characters safe.
    fn peek_at(&mut self, idx: usize) -> Option<&Topoglyph> {
-        self.feed(idx + 1);
+        self.fill_lookahead(idx + 1);
        self.lookahead.get(idx)
    }
    /// checks whether or not the next character in our source text matches some predicate
    fn next_is<F>(&mut self, pred: F) -> bool
    where
        F: FnOnce(&Topoglyph) -> bool,
@ -152,10 +165,17 @@ impl<'text> Topoglypher<'text> {
        self.peek().map(pred).unwrap_or(false)
    }
-    fn is_empty(&mut self) -> bool {
+    /// checks whether or not we're already at the end of our input text. If we're already at the
    /// end of our input text, we do not expect any future reads to produce new characters.
    fn at_eof(&mut self) -> bool {
        self.peek().is_none()
    }
    /// discards characters from our current position so long as the upcoming characters match some
    /// predicate. This is called yeet_while instead of skip_while in order to avoid conflicting
    /// with the
    /// [skip_while](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.skip_while)
    /// method of the stdlib Iterator trait.
    pub fn yeet_while<F>(&mut self, mut pred: F)
    where
        F: FnMut(&Topoglyph) -> bool,
@ -211,7 +231,7 @@ impl<'text> Iterator for Topoglypher<'text> {
    type Item = Topoglyph;
    fn next(&mut self) -> Option<Self::Item> {
-        self.feed(1);
+        self.fill_lookahead(1);
        self.lookahead.pop_front()
    }
 }
@ -219,7 +239,7 @@ impl<'text> Iterator for Topoglypher<'text> {
 /// A Lexeme is the text of a given Token, without respect to that Token's type, but with respect
 /// to where the text appears relative to some source code. This is, simply, a string that contains
 /// the addresses of each of its characters with respect to some source text.
-#[derive(PartialEq)]
+#[derive(PartialEq, Clone)]
 pub struct Lexeme {
    elems: Vec<Topoglyph>,
 }
@ -272,18 +292,18 @@ impl From<Vec<Topoglyph>> for Lexeme {
 }
 #[allow(dead_code)]
-#[derive(Debug, PartialEq)]
+#[derive(Debug, PartialEq, Clone)]
 pub enum Token {
-    BareString(Lexeme),
+    String(Lexeme),
    Glob(Lexeme),
 }
-struct Lexer<'text> {
+pub struct Tokenizer<'text> {
    source: Topoglypher<'text>,
 }
-impl<'text> Lexer<'text> {
+impl<'text> Tokenizer<'text> {
-    fn new(text: &'text str) -> Self {
+    pub fn new(text: &'text str) -> Self {
        Self {
            source: Topoglypher::new(text),
        }
@ -323,7 +343,7 @@ impl<'text> Lexer<'text> {
        if progress.is_empty() {
            Err(LexError::UnexpectedEOF)
        } else {
-            Ok(Token::BareString(progress.into()))
+            Ok(Token::String(progress.into()))
        }
    }
@ -361,7 +381,7 @@ impl<'text> Lexer<'text> {
    }
 }
-impl<'text> Iterator for Lexer<'text> {
+impl<'text> Iterator for Tokenizer<'text> {
    type Item = Result<Token, LexError>;
    fn next(&mut self) -> Option<Self::Item> {
@ -369,6 +389,55 @@ impl<'text> Iterator for Lexer<'text> {
    }
 }
 pub fn lex(source: &str) -> Result<Vec<Token>, LexError> {
    Tokenizer::new(source).collect()
 }
 pub struct Lexer<'text> {
    source: Tokenizer<'text>,
    lookahead: VecDeque<Token>,
 }
 impl<'text> Lexer<'text> {
    pub fn new(source: &'text str) -> Self {
        Self {
            source: Tokenizer::new(source),
            lookahead: VecDeque::new(),
        }
    }
    fn fill_lookahead(&mut self, n: usize) -> Result<bool, LexError> {
        while self.lookahead.len() < n {
            let token = match self.source.next() {
                Some(res) => res?,
                None => return Ok(false),
            };
            self.lookahead.push_back(token);
        }
        Ok(true)
    }
    pub fn peek_at(&mut self, idx: usize) -> Result<Option<&Token>, LexError> {
        self.fill_lookahead(idx + 1)?;
        Ok(None)
    }
    pub fn peek(&mut self) -> Result<Option<&Token>, LexError> {
        self.peek_at(0)
    }
 }
 impl<'text> Iterator for Lexer<'text> {
    type Item = Result<Token, LexError>;
    fn next(&mut self) -> Option<Self::Item> {
        match self.lookahead.pop_front() {
            Some(token) => Some(Ok(token)),
            None => self.source.next(),
        }
    }
 }
 #[cfg(test)]
 mod tests {
    use super::*;
@ -379,7 +448,7 @@ mod tests {
            #[test]
            fn $name() {
                println!("testing that we can lex the following input text:\n\t{}", $line);
-                let lexer = Lexer::new($line);
+                let lexer = Tokenizer::new($line);
                let tokens: Result<Vec<Token>, LexError> = lexer.collect();
                match tokens {
                    Ok(tokens) => {
@ -400,8 +469,7 @@ mod tests {
            #[test]
            fn $name() {
                println!("testing that we will fail to lex the following input text:\n\t{}", $line);
-                let lexer = Lexer::new($line);
+                let tokens = lex($line);
                let tokens: Result<Vec<Token>, LexError> = lexer.collect();
                match tokens {
                    Ok(tokens) => {
                        println!("output tokens: {tokens:?}");
--- a/src/log.rs
+++ b/src/log.rs
@ -1,5 +1,5 @@
 use crate::error::Error;
-pub use log::{debug, error, info, set_logger, set_max_level, trace, warn, LevelFilter};
+pub use log::{debug, info, set_logger, set_max_level, warn, LevelFilter};
 use std::{
    fs::File,
@ -49,19 +49,19 @@ where
                match record.level() {
                    log::Level::Error => {
                        _ = write!(out, "\x1b[31m{}\x1b[0m\n", record.args());
-                    },
+                    }
                    log::Level::Warn => {
                        _ = write!(out, "\x1b[33m{}\x1b[0m\n", record.args());
-                    },
+                    }
                    log::Level::Info => {
                        _ = write!(out, "\x1b[37m{}\x1b[0m\n", record.args());
-                    },
+                    }
                    log::Level::Debug => {
                        _ = write!(out, "\x1b[90m{}\x1b[0m\n", record.args());
-                    },
+                    }
                    log::Level::Trace => {
                        _ = write!(out, "\x1b[36m{}\x1b[0m\n", record.args());
-                    },
+                    }
                }
            }
        }
--- a/src/main.rs
+++ b/src/main.rs
@ -7,6 +7,7 @@ mod line;
 mod log;
 mod output;
 mod parse;
 mod parse2;
 mod prompt;
 mod shell;
@ -181,3 +182,7 @@ fn main() -> Result<()> {
        }
    }
 }
 /*
 */
--- a/src/parse.rs
+++ b/src/parse.rs
@ -47,10 +47,6 @@ impl Node {
            children: Vec::new(),
        }
    }
    // pub fn visit(self) -> Tree {
    //     self.into()
    // }
 }
 impl fmt::Debug for Node {
--- a/src/parse2.rs
+++ b/src/parse2.rs
@ -0,0 +1,278 @@
 use crate::error::ParseError;
 use crate::lex::{Lexer, Token};
 use std::{
    cell::RefCell,
    collections::VecDeque,
    rc::{Rc, Weak},
 };
 #[derive(PartialEq)]
 pub enum Value {
    /// The start symbol of our parse tree. Each parse tree is rooted in a node whose value is the
    /// start symbol. This is the only node in the tree that should utilize the start symbol.
    Start,
    Statement,
    Terminal(Token),
 }
 impl Value {
    fn is_terminal(&self) -> bool {
        matches!(self, Value::Terminal(_))
    }
 }
 /// A node in a parse tree.
 pub struct Node {
    /// A node may or may not have a parent node. If a node does not have a parent node, that node
    /// is the root node of a tree.
    parent: Option<Weak<Node>>,
    /// The value of the element at this node
    value: Value,
    /// A node may or may not have children. Since an empty vector is a valid vector, a node
    /// without children is represented as having an empty children vector. A node having an empty
    /// list of children is a leaf node in a tree.
    children: RefCell<Vec<Rc<Node>>>,
 }
 impl Node {
    fn new() -> Cursor {
        let root = Node {
            parent: None,
            value: Value::Start,
            children: RefCell::new(Vec::new()),
        };
        let root = Rc::new(root);
        Cursor {
            target: Rc::clone(&root),
            root,
        }
    }
 }
 /// Cursor values expose access to a parse tree.
 struct Cursor {
    target: Rc<Node>,
    root: Rc<Node>,
 }
 impl Cursor {
    /// Climbs one level up a parse tree. The cursor is re-pointed from its current target node to
    /// the parent of its current target node. This method fails if the cursor is already at the
    /// root node of the parse tree.
    fn up(&mut self) -> Result<(), ParseError> {
        match &self.target.parent {
            None => Err(ParseError::AtRootAlready),
            Some(parent) => match parent.upgrade() {
                Some(parent) => {
                    self.target = parent;
                    Ok(())
                }
                None => Err(ParseError::ParentIsGone),
            },
        }
    }
    /// Adds a value to the children of the current target node, then descends to select that
    /// child.
    fn push(&mut self, v: Value) -> Result<(), ParseError> {
        if self.target.value.is_terminal() {
            return Err(ParseError::PushOntoTerminal);
        }
        let node = Node {
            parent: Some(Rc::downgrade(&self.target)),
            value: v,
            children: RefCell::new(Vec::new()),
        };
        let node = Rc::new(node);
        self.target
            .children
            .try_borrow_mut()?
            .push(Rc::clone(&node));
        self.target = node;
        Ok(())
    }
    fn is_root(&self) -> bool {
        self.target.parent.is_none()
    }
    fn into_root(self) -> Rc<Node> {
        Rc::clone(&self.root)
    }
    fn value(&self) -> &Value {
        &self.target.value
    }
 }
 struct Parser<'text> {
    source: Lexer<'text>,
    cursor: Cursor,
 }
 impl<'text> Parser<'text> {
    pub fn new(source: Lexer<'text>) -> Self {
        Self {
            source,
            cursor: Node::new(),
        }
    }
    pub fn parse(mut self) -> Result<Rc<Node>, ParseError> {
        while self.step()? {}
        Ok(self.cursor.into_root())
    }
    fn step(&mut self) -> Result<bool, ParseError> {
        match self.cursor.value() {
            Value::Start => self.step_start(),
            Value::Statement => self.step_statement(),
            Value::Terminal(_) => panic!(),
        }
    }
    fn step_start(&mut self) -> Result<bool, ParseError> {
        assert!(matches!(self.cursor.value(), Value::Start));
        match self.source.peek()? {
            Some(Token::String(_)) => {
                self.cursor.push(Value::Statement)?;
                let token = self.source.next().unwrap()?;
                self.cursor.push(Value::Terminal(token))?;
                self.cursor.up()?;
                Ok(true)
            }
            Some(Token::Glob(_)) => {
                let token = self.source.next().unwrap()?;
                Err(ParseError::UnexpectedToken(token))
            }
            None => Ok(false),
        }
    }
    fn step_statement(&mut self) -> Result<bool, ParseError> {
        assert!(matches!(self.cursor.value(), Value::Statement));
        match self.source.peek()? {
            Some(Token::String(_) | Token::Glob(_)) => {
                let token = self.source.next().unwrap()?;
                self.cursor.push(Value::Terminal(token))?;
                self.cursor.up()?;
                Ok(true)
            }
            None => Ok(false),
        }
    }
 }
 fn parse(source: &str) -> Result<Rc<Node>, ParseError> {
    let tokens = Lexer::new(source);
    let parser = Parser::new(tokens);
    parser.parse()
 }
 #[cfg(test)]
 mod test {
    use super::*;
    use crate::lex::lex;
    #[test]
    fn root() {
        let mut cursor = Node::new();
        assert!(cursor.up().is_err());
        assert!(cursor.target.value == Value::Start);
        assert!(cursor.is_root());
    }
    #[test]
    fn single_val() {
        let mut cursor = Node::new();
        let mut tokens = lex("  ls  ").unwrap();
        let ls = tokens.pop().unwrap();
        assert!(cursor.push(Value::Statement).is_ok());
        assert!(cursor.push(Value::Terminal(ls.clone())).is_ok());
        assert!(cursor.push(Value::Terminal(ls.clone())).is_err());
        assert!(cursor.target.value == Value::Terminal(ls));
        assert!(!cursor.is_root());
        assert!(cursor.up().is_ok());
        assert!(cursor.up().is_ok());
        assert!(cursor.is_root());
        assert!(cursor.up().is_err());
        assert!(cursor.value() == &Value::Start);
        let root = cursor.into_root();
        assert!(root.value == Value::Start);
    }
    #[test]
    fn test_parse() -> Result<(), ParseError> {
        parse("ls")?;
        // parse("*")?;
        // parse("x* ls")?;
        Ok(())
    }
 }
 /*
 > ls
  start
    statement
      ls
 > ls ;
  start
    statement
      ls
      ;
 > ls ; ls
  start
    statement
      ls
      ;
    statement
      ls
 > ls one two three
  start
    statement
      ls
      one
      two
      three
 > ls > files.txt ; echo files.txt
  start
    statement
      ls
      >
      files.txt
      ;
    statement
      echo
      files.txt
 > if exists ~/.vimrc : echo you have a vimrc
 > if $x == 3: echo hi
  start
    if
      expression
        $x
        ==
        3
      :
      statement
        echo
        hi
 */