i'm redoing all the parsing lol

main
Jordan Orelli 10 months ago
parent d152c4092a
commit cd51f4cce1

@ -1,3 +1,98 @@
# clyde # clyde
A command-line shell. Initial development is for Windows because Windows has the worst starting point when it comes to CLI shells. A command-line shell. Initial development is for Windows because Windows has
the worst starting point when it comes to CLI shells. Assume that nothing here
works; it doesn't even compile.
## Background
The needs of this project reflect my professional experience working on
multiplayer video games. Prior to working in the games industry, I worked in
the web industry, primarily writing server software. The working practices of
programmers in these industries differ substantially. The large differences in
these working practices create a substantial and difficult social divide
within studios that make multiplayer games. These social differences make an
already-difficult category of software even more difficult to develop. The
goal of this project is to reduce the tooling divide that exists between
client and server developers at multiplayer game studios, with the expectation
that shrinking this tooling gap can reduce the social divide that exists
between game developers and server/infrastructure developers at game studios.
### Windows is a hard requirement
The first gap that appears within the greater software developer landscape is
the question of using Windows at all. Let us dispense with this question
immediately: Windows is a necessary tool within the game development industry.
Supporting Windows is a hard requirement of this project.
Many necessary tools, such as tools provided by game console manufacturers and
tools that integrate with the dominant game engines (Unity and Unreal) only
support Windows. If your reaction to this is "don't use those tools" or "make
your game in Godot", please just stop. This is a very common reaction from
professional programmers who have never worked in gamedev, who don't
understand the constraints that most game developers face. Unfortunately a
large portion of programming discussion happens on Hacker News, Reddit, and
Lobsters, each of which uses a democratic structure.
I will not attempt to convince you that Windows is broadly
unavoidable in professional game development and that Windows is a necessary
tool for game development studios, even studios that employ engineers for whom
Windows is not necessary for their individual role. If you want to advocate
for Linux superiority, please go bother Nintendo, Sony, Microsoft, etc, not
individual game developers. If you really can't get over this and feel the
need to yell at me, please walk a few hundred yards into a dense forest and
scream your complaints into the wilderness to feel better.
The command-line environments built into Windows are incredibly primitive.
These environments are so primitive that millions of professional programmers
who work primarily on Windows are led to believe that command-line
environments themselves are *inherently* primitive. The limitations of built-in
command-line environments on Windows are so severe that there are entire
categories of professional computer programmers who believe that they are "not
terminal people", often without realizing that the terminal on Linux and MacOS
is a vastly different experience and different tooling ecosystem than it is on
Windows. Many professional Windows developers believe that CLI environments
are inherently primitive, because their only exposure to CLI environments is
through the built-in Windows shells.
Windows ships with two built-in
shells. The first shell is the Command Prompt, which we'll refer to as
[cmd.exe](https://en.wikipedia.org/wiki/Cmd.exe) as it is colloquially known,
even when running inside of
[terminal.exe](https://en.wikipedia.org/wiki/Windows_Terminal). The second
shell is [PowerShell](https://en.wikipedia.org/wiki/PowerShell).
### Insufficiency of cmd.exe
The insufficiency of cmd.exe is widely understood by people with experience
with Unix-like operating systems such as Linux and MacOS, which typically
include either [bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) or
[zsh](https://en.wikipedia.org/wiki/Z_shell). This insufficiency is severe
enough that it drives people off of Windows entirely.
In some industries, such as games, Windows is a required development
environment, because many tools are available only on Windows. Free software
purists who want to insist that you can make video games on Linux, please feel
free to leave now, this project is not for you.
## Terminology
- **executable**: a shell built-in or an executable file (i.e., an .exe file)
## Examples
$> a
Find an executable named `a` somewhere on `PATH` and execute it in the
foreground.
$> ./a
Use an executable named `a` within the current directory and execute it in the
foreground. `a` must be a file in the current directory. If `a` would refer to
both a shell built-in and a file in the current directory, use the file in the
current directory.
$>
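
The two lookup rules above (`a` versus `./a`) could be sketched roughly as follows. This is a minimal illustration rather than code from this commit: `Lookup`, `classify`, and `search_path` are hypothetical names, and the sketch ignores shell built-ins and Windows details such as `PATHEXT`.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// How a command word should be resolved (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Lookup {
    /// A bare name such as `a`: search each directory listed in PATH.
    SearchPath(String),
    /// A name with an explicit directory such as `./a`: use that file directly.
    Direct(PathBuf),
}

/// Decides which lookup rule applies to a command word.
/// Assumption: any path separator in the word means "do not search PATH".
fn classify(word: &str) -> Lookup {
    if word.contains('/') || word.contains('\\') {
        Lookup::Direct(PathBuf::from(word))
    } else {
        Lookup::SearchPath(word.to_string())
    }
}

/// Returns the first file named `name` found in the directories of `path_var`.
fn search_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

fn main() {
    println!("{:?}", classify("a"));
    println!("{:?}", classify("./a"));
    if let Some(path_var) = env::var_os("PATH") {
        println!("{:?}", search_path("a", &path_var));
    }
}
```

Keeping the classification separate from the filesystem search makes the rule "a file in the current directory wins over a built-in" easy to test without touching the disk.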

@ -1,4 +1,4 @@
use crate::lex::Topoglyph; use crate::lex::{Token, Topoglyph};
use std::io; use std::io;
use thiserror::Error; use thiserror::Error;
use windows::Win32::Foundation::{GetLastError, BOOL}; use windows::Win32::Foundation::{GetLastError, BOOL};
@ -44,8 +44,23 @@ impl LexError {
#[derive(Debug, Error)] #[derive(Debug, Error)]
pub enum ParseError { pub enum ParseError {
#[error("Unexpected Token")] #[error("lex error")]
UnexpectedToken, LexError(#[from] LexError),
#[error("Unexpected Token: {0:?}")]
UnexpectedToken(Token),
#[error("Illegal attempt to climb parse tree while already at root")]
AtRootAlready,
#[error("Illegal attempt to climb parse tree when target parent has already been dropped")]
ParentIsGone,
#[error("Illegal attempt to double-borrow a node")]
BorrowError(#[from] std::cell::BorrowMutError),
#[error("Illegal attempt to push a value as a child to a terminal value")]
PushOntoTerminal,
} }
impl Error { impl Error {

@ -22,7 +22,8 @@ fn is_keyword(s: &str) -> bool {
} }
} }
/// The position of a specific glyph within a corpus of text /// The position of a specific glyph within a corpus of text. We use this for rendering error
/// messages and communicating to the user the location of errors.
#[derive(PartialEq, Clone, Copy)] #[derive(PartialEq, Clone, Copy)]
pub struct Position { pub struct Position {
/// The visual line in which this glyph appears in the source text /// The visual line in which this glyph appears in the source text
@ -37,12 +38,16 @@ impl Position {
Self { line: 0, column: 0 } Self { line: 0, column: 0 }
} }
/// Increments position by column, going from the current line,column position to the next
/// column on the same line.
fn incr(&mut self) -> Position { fn incr(&mut self) -> Position {
let p = *self; let p = *self;
self.column += 1; self.column += 1;
p p
} }
/// Increments the position by line, going from the current line,column position to the
/// beginning of the next line.
fn incr_line(&mut self) -> Position { fn incr_line(&mut self) -> Position {
let p = *self; let p = *self;
self.column = 0; self.column = 0;
@ -105,7 +110,8 @@ impl<'text> Topoglypher<'text> {
} }
} }
fn feed(&mut self, n: usize) -> bool { /// reads the next n characters from the source text into our lookahead buffer
fn fill_lookahead(&mut self, n: usize) -> bool {
while self.lookahead.len() < n { while self.lookahead.len() < n {
let c = match self.source.next() { let c = match self.source.next() {
Some(c) => c, Some(c) => c,
@ -132,19 +138,26 @@ impl<'text> Topoglypher<'text> {
self.lookahead.len() == n self.lookahead.len() == n
} }
/// returns a reference to the next character from the source text, advancing our internal
/// lookahead buffer if necessary. Returns None if we're already at the end of our source text.
fn peek(&mut self) -> Option<&Topoglyph> { fn peek(&mut self) -> Option<&Topoglyph> {
self.peek_at(0) self.peek_at(0)
} }
/// takes the next character from our input text
fn pop(&mut self) -> Result<Topoglyph, LexError> { fn pop(&mut self) -> Result<Topoglyph, LexError> {
self.next().ok_or(LexError::UnexpectedEOF) self.next().ok_or(LexError::UnexpectedEOF)
} }
/// returns a reference to a character in our lookahead buffer at a given position. This allows
/// us to perform a lookahead read without consuming any characters, maintaining our current
/// position and keeping our unconsumed characters safe.
fn peek_at(&mut self, idx: usize) -> Option<&Topoglyph> { fn peek_at(&mut self, idx: usize) -> Option<&Topoglyph> {
self.feed(idx + 1); self.fill_lookahead(idx + 1);
self.lookahead.get(idx) self.lookahead.get(idx)
} }
/// checks whether or not the next character in our source text matches some predicate
fn next_is<F>(&mut self, pred: F) -> bool fn next_is<F>(&mut self, pred: F) -> bool
where where
F: FnOnce(&Topoglyph) -> bool, F: FnOnce(&Topoglyph) -> bool,
@ -152,10 +165,17 @@ impl<'text> Topoglypher<'text> {
self.peek().map(pred).unwrap_or(false) self.peek().map(pred).unwrap_or(false)
} }
fn is_empty(&mut self) -> bool { /// checks whether or not we're already at the end of our input text. If we're already at the
/// end of our input text, we do not expect any future reads to produce new characters.
fn at_eof(&mut self) -> bool {
self.peek().is_none() self.peek().is_none()
} }
/// discards characters from our current position so long as the upcoming characters match some
/// predicate. This is called yeet_while instead of skip_while in order to avoid conflicting
/// with the
/// [skip_while](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.skip_while)
/// method of the stdlib Iterator trait.
pub fn yeet_while<F>(&mut self, mut pred: F) pub fn yeet_while<F>(&mut self, mut pred: F)
where where
F: FnMut(&Topoglyph) -> bool, F: FnMut(&Topoglyph) -> bool,
@ -211,7 +231,7 @@ impl<'text> Iterator for Topoglypher<'text> {
type Item = Topoglyph; type Item = Topoglyph;
fn next(&mut self) -> Option<Self::Item> { fn next(&mut self) -> Option<Self::Item> {
self.feed(1); self.fill_lookahead(1);
self.lookahead.pop_front() self.lookahead.pop_front()
} }
} }
@ -219,7 +239,7 @@ impl<'text> Iterator for Topoglypher<'text> {
/// A Lexeme is the text of a given Token, without respect to that Token's type, but with respect /// A Lexeme is the text of a given Token, without respect to that Token's type, but with respect
/// to where the text appears relative to some source code. This is, simply, a string that contains /// to where the text appears relative to some source code. This is, simply, a string that contains
/// the addresses of each of its characters with respect to some source text. /// the addresses of each of its characters with respect to some source text.
#[derive(PartialEq)] #[derive(PartialEq, Clone)]
pub struct Lexeme { pub struct Lexeme {
elems: Vec<Topoglyph>, elems: Vec<Topoglyph>,
} }
@ -272,18 +292,18 @@ impl From<Vec<Topoglyph>> for Lexeme {
} }
#[allow(dead_code)] #[allow(dead_code)]
#[derive(Debug, PartialEq)] #[derive(Debug, PartialEq, Clone)]
pub enum Token { pub enum Token {
BareString(Lexeme), String(Lexeme),
Glob(Lexeme), Glob(Lexeme),
} }
struct Lexer<'text> { pub struct Tokenizer<'text> {
source: Topoglypher<'text>, source: Topoglypher<'text>,
} }
impl<'text> Lexer<'text> { impl<'text> Tokenizer<'text> {
fn new(text: &'text str) -> Self { pub fn new(text: &'text str) -> Self {
Self { Self {
source: Topoglypher::new(text), source: Topoglypher::new(text),
} }
@ -323,7 +343,7 @@ impl<'text> Lexer<'text> {
if progress.is_empty() { if progress.is_empty() {
Err(LexError::UnexpectedEOF) Err(LexError::UnexpectedEOF)
} else { } else {
Ok(Token::BareString(progress.into())) Ok(Token::String(progress.into()))
} }
} }
@ -361,7 +381,7 @@ impl<'text> Lexer<'text> {
} }
} }
impl<'text> Iterator for Lexer<'text> { impl<'text> Iterator for Tokenizer<'text> {
type Item = Result<Token, LexError>; type Item = Result<Token, LexError>;
fn next(&mut self) -> Option<Self::Item> { fn next(&mut self) -> Option<Self::Item> {
@ -369,6 +389,55 @@ impl<'text> Iterator for Lexer<'text> {
} }
} }
pub fn lex(source: &str) -> Result<Vec<Token>, LexError> {
Tokenizer::new(source).collect()
}
pub struct Lexer<'text> {
source: Tokenizer<'text>,
lookahead: VecDeque<Token>,
}
impl<'text> Lexer<'text> {
pub fn new(source: &'text str) -> Self {
Self {
source: Tokenizer::new(source),
lookahead: VecDeque::new(),
}
}
fn fill_lookahead(&mut self, n: usize) -> Result<bool, LexError> {
while self.lookahead.len() < n {
let token = match self.source.next() {
Some(res) => res?,
None => return Ok(false),
};
self.lookahead.push_back(token);
}
Ok(true)
}
pub fn peek_at(&mut self, idx: usize) -> Result<Option<&Token>, LexError> {
self.fill_lookahead(idx + 1)?;
Ok(self.lookahead.get(idx))
}
pub fn peek(&mut self) -> Result<Option<&Token>, LexError> {
self.peek_at(0)
}
}
impl<'text> Iterator for Lexer<'text> {
type Item = Result<Token, LexError>;
fn next(&mut self) -> Option<Self::Item> {
match self.lookahead.pop_front() {
Some(token) => Some(Ok(token)),
None => self.source.next(),
}
}
}
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
@ -379,7 +448,7 @@ mod tests {
#[test] #[test]
fn $name() { fn $name() {
println!("testing that we can lex the following input text:\n\t{}", $line); println!("testing that we can lex the following input text:\n\t{}", $line);
let lexer = Lexer::new($line); let lexer = Tokenizer::new($line);
let tokens: Result<Vec<Token>, LexError> = lexer.collect(); let tokens: Result<Vec<Token>, LexError> = lexer.collect();
match tokens { match tokens {
Ok(tokens) => { Ok(tokens) => {
@ -400,8 +469,7 @@ mod tests {
#[test] #[test]
fn $name() { fn $name() {
println!("testing that we will fail to lex the following input text:\n\t{}", $line); println!("testing that we will fail to lex the following input text:\n\t{}", $line);
let lexer = Lexer::new($line); let tokens = lex($line);
let tokens: Result<Vec<Token>, LexError> = lexer.collect();
match tokens { match tokens {
Ok(tokens) => { Ok(tokens) => {
println!("output tokens: {tokens:?}"); println!("output tokens: {tokens:?}");
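
The lookahead pattern that `Topoglypher` and the new `Lexer` both use in this diff (fill a `VecDeque` on demand, peek by index, pop from the front) can be sketched in isolation. The `Lookahead` type below is a hypothetical generic stand-in written for illustration, not code from this commit, and it drops the error handling that the real `Lexer` threads through `Result`.

```rust
use std::collections::VecDeque;

/// Hypothetical generic wrapper that adds a lookahead buffer to any iterator.
struct Lookahead<I: Iterator> {
    source: I,
    buffer: VecDeque<I::Item>,
}

impl<I: Iterator> Lookahead<I> {
    fn new(source: I) -> Self {
        Self {
            source,
            buffer: VecDeque::new(),
        }
    }

    /// Pulls items from the source until the buffer holds at least `n` items.
    /// Returns false if the source ran out first.
    fn fill_lookahead(&mut self, n: usize) -> bool {
        while self.buffer.len() < n {
            match self.source.next() {
                Some(item) => self.buffer.push_back(item),
                None => return false,
            }
        }
        true
    }

    /// Returns a reference to the item `idx` positions ahead without consuming it.
    fn peek_at(&mut self, idx: usize) -> Option<&I::Item> {
        self.fill_lookahead(idx + 1);
        self.buffer.get(idx)
    }

    /// Returns a reference to the next item without consuming it.
    fn peek(&mut self) -> Option<&I::Item> {
        self.peek_at(0)
    }
}

impl<I: Iterator> Iterator for Lookahead<I> {
    type Item = I::Item;

    /// Consumes the next item, draining the buffer before the source.
    fn next(&mut self) -> Option<Self::Item> {
        self.fill_lookahead(1);
        self.buffer.pop_front()
    }
}

fn main() {
    let mut chars = Lookahead::new("ls".chars());
    println!("{:?}", chars.peek_at(1)); // Some('s')
    println!("{:?}", chars.next()); // Some('l')
    println!("{:?}", chars.peek()); // Some('s')
}
```

Note that `peek_at` returns whatever `buffer.get(idx)` finds after the fill, so a peek past the end of input yields `None` while an in-range peek yields the buffered item.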

@ -1,5 +1,5 @@
use crate::error::Error; use crate::error::Error;
pub use log::{debug, error, info, set_logger, set_max_level, trace, warn, LevelFilter}; pub use log::{debug, info, set_logger, set_max_level, warn, LevelFilter};
use std::{ use std::{
fs::File, fs::File,
@ -49,19 +49,19 @@ where
match record.level() { match record.level() {
log::Level::Error => { log::Level::Error => {
_ = write!(out, "\x1b[31m{}\x1b[0m\n", record.args()); _ = write!(out, "\x1b[31m{}\x1b[0m\n", record.args());
}, }
log::Level::Warn => { log::Level::Warn => {
_ = write!(out, "\x1b[33m{}\x1b[0m\n", record.args()); _ = write!(out, "\x1b[33m{}\x1b[0m\n", record.args());
}, }
log::Level::Info => { log::Level::Info => {
_ = write!(out, "\x1b[37m{}\x1b[0m\n", record.args()); _ = write!(out, "\x1b[37m{}\x1b[0m\n", record.args());
}, }
log::Level::Debug => { log::Level::Debug => {
_ = write!(out, "\x1b[90m{}\x1b[0m\n", record.args()); _ = write!(out, "\x1b[90m{}\x1b[0m\n", record.args());
}, }
log::Level::Trace => { log::Level::Trace => {
_ = write!(out, "\x1b[36m{}\x1b[0m\n", record.args()); _ = write!(out, "\x1b[36m{}\x1b[0m\n", record.args());
}, }
} }
} }
} }

@ -7,6 +7,7 @@ mod line;
mod log; mod log;
mod output; mod output;
mod parse; mod parse;
mod parse2;
mod prompt; mod prompt;
mod shell; mod shell;
@ -181,3 +182,7 @@ fn main() -> Result<()> {
} }
} }
} }
/*
*/

@ -47,10 +47,6 @@ impl Node {
children: Vec::new(), children: Vec::new(),
} }
} }
// pub fn visit(self) -> Tree {
// self.into()
// }
} }
impl fmt::Debug for Node { impl fmt::Debug for Node {

@ -0,0 +1,278 @@
use crate::error::ParseError;
use crate::lex::{Lexer, Token};
use std::{
cell::RefCell,
collections::VecDeque,
rc::{Rc, Weak},
};
#[derive(PartialEq)]
pub enum Value {
/// The start symbol of our parse tree. Each parse tree is rooted in a node whose value is the
/// start symbol. This is the only node in the tree that should utilize the start symbol.
Start,
Statement,
Terminal(Token),
}
impl Value {
fn is_terminal(&self) -> bool {
matches!(self, Value::Terminal(_))
}
}
/// A node in a parse tree.
pub struct Node {
/// A node may or may not have a parent node. If a node does not have a parent node, that node
/// is the root node of a tree.
parent: Option<Weak<Node>>,
/// The value of the element at this node
value: Value,
/// A node may or may not have children. Since an empty vector is a valid vector, a node
/// without children is represented as having an empty children vector. A node having an empty
/// list of children is a leaf node in a tree.
children: RefCell<Vec<Rc<Node>>>,
}
impl Node {
fn new() -> Cursor {
let root = Node {
parent: None,
value: Value::Start,
children: RefCell::new(Vec::new()),
};
let root = Rc::new(root);
Cursor {
target: Rc::clone(&root),
root,
}
}
}
/// Cursor values expose access to a parse tree.
struct Cursor {
target: Rc<Node>,
root: Rc<Node>,
}
impl Cursor {
/// Climbs one level up a parse tree. The cursor is re-pointed from its current target node to
/// the parent of its current target node. This method fails if the cursor is already at the
/// root node of the parse tree.
fn up(&mut self) -> Result<(), ParseError> {
match &self.target.parent {
None => Err(ParseError::AtRootAlready),
Some(parent) => match parent.upgrade() {
Some(parent) => {
self.target = parent;
Ok(())
}
None => Err(ParseError::ParentIsGone),
},
}
}
/// Adds a value to the children of the current target node, then descends to select that
/// child.
fn push(&mut self, v: Value) -> Result<(), ParseError> {
if self.target.value.is_terminal() {
return Err(ParseError::PushOntoTerminal);
}
let node = Node {
parent: Some(Rc::downgrade(&self.target)),
value: v,
children: RefCell::new(Vec::new()),
};
let node = Rc::new(node);
self.target
.children
.try_borrow_mut()?
.push(Rc::clone(&node));
self.target = node;
Ok(())
}
fn is_root(&self) -> bool {
self.target.parent.is_none()
}
fn into_root(self) -> Rc<Node> {
Rc::clone(&self.root)
}
fn value(&self) -> &Value {
&self.target.value
}
}
struct Parser<'text> {
source: Lexer<'text>,
cursor: Cursor,
}
impl<'text> Parser<'text> {
pub fn new(source: Lexer<'text>) -> Self {
Self {
source,
cursor: Node::new(),
}
}
pub fn parse(mut self) -> Result<Rc<Node>, ParseError> {
while self.step()? {}
Ok(self.cursor.into_root())
}
fn step(&mut self) -> Result<bool, ParseError> {
match self.cursor.value() {
Value::Start => self.step_start(),
Value::Statement => self.step_statement(),
Value::Terminal(_) => panic!(),
}
}
fn step_start(&mut self) -> Result<bool, ParseError> {
assert!(matches!(self.cursor.value(), Value::Start));
match self.source.peek()? {
Some(Token::String(_)) => {
self.cursor.push(Value::Statement)?;
let token = self.source.next().unwrap()?;
self.cursor.push(Value::Terminal(token))?;
self.cursor.up()?;
Ok(true)
}
Some(Token::Glob(_)) => {
let token = self.source.next().unwrap()?;
Err(ParseError::UnexpectedToken(token))
}
None => Ok(false),
}
}
fn step_statement(&mut self) -> Result<bool, ParseError> {
assert!(matches!(self.cursor.value(), Value::Statement));
match self.source.peek()? {
Some(Token::String(_) | Token::Glob(_)) => {
let token = self.source.next().unwrap()?;
self.cursor.push(Value::Terminal(token))?;
self.cursor.up()?;
Ok(true)
}
None => Ok(false),
}
}
}
fn parse(source: &str) -> Result<Rc<Node>, ParseError> {
let tokens = Lexer::new(source);
let parser = Parser::new(tokens);
parser.parse()
}
#[cfg(test)]
mod test {
use super::*;
use crate::lex::lex;
#[test]
fn root() {
let mut cursor = Node::new();
assert!(cursor.up().is_err());
assert!(cursor.target.value == Value::Start);
assert!(cursor.is_root());
}
#[test]
fn single_val() {
let mut cursor = Node::new();
let mut tokens = lex(" ls ").unwrap();
let ls = tokens.pop().unwrap();
assert!(cursor.push(Value::Statement).is_ok());
assert!(cursor.push(Value::Terminal(ls.clone())).is_ok());
assert!(cursor.push(Value::Terminal(ls.clone())).is_err());
assert!(cursor.target.value == Value::Terminal(ls));
assert!(!cursor.is_root());
assert!(cursor.up().is_ok());
assert!(cursor.up().is_ok());
assert!(cursor.is_root());
assert!(cursor.up().is_err());
assert!(cursor.value() == &Value::Start);
let root = cursor.into_root();
assert!(root.value == Value::Start);
}
#[test]
fn test_parse() -> Result<(), ParseError> {
parse("ls")?;
// parse("*")?;
// parse("x* ls")?;
Ok(())
}
}
/*
> ls
start
statement
ls
> ls ;
start
statement
ls
;
> ls ; ls
start
statement
ls
;
statement
ls
> ls one two three
start
statement
ls
one
two
three
> ls > files.txt ; echo files.txt
start
statement
ls
>
files.txt
;
statement
echo
files.txt
> if exists ~/.vimrc : echo you have a vimrc
> if $x == 3: echo hi
start
if
expression
$x
==
3
:
statement
echo
hi
*/
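
The tree listings in the comment above can be reproduced with a small standalone sketch. It assumes a simplified node that stores a plain string label instead of the `Value` enum, and it assumes two-space indentation per level; `Node::leaf`, `render`, and `example` are hypothetical helpers, not part of this commit.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Simplified parse-tree node: a label plus shared, mutable children.
struct Node {
    label: String,
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    /// Creates a node with no children.
    fn leaf(label: &str) -> Rc<Node> {
        Rc::new(Node {
            label: label.to_string(),
            children: RefCell::new(Vec::new()),
        })
    }

    /// Appends a child to this node.
    fn add(&self, child: Rc<Node>) {
        self.children.borrow_mut().push(child);
    }
}

/// Writes `node` and its descendants to `out`, one label per line,
/// indented two spaces per level of depth.
fn render(node: &Node, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&node.label);
    out.push('\n');
    for child in node.children.borrow().iter() {
        render(child, depth + 1, out);
    }
}

/// Builds the tree for `ls one two three` by hand and renders it.
fn example() -> String {
    let root = Node::leaf("start");
    let statement = Node::leaf("statement");
    for word in ["ls", "one", "two", "three"] {
        statement.add(Node::leaf(word));
    }
    root.add(statement);

    let mut out = String::new();
    render(&root, 0, &mut out);
    out
}

fn main() {
    print!("{}", example());
}
```

A renderer like this could back a `fmt::Debug` implementation for the real `Node`, which would let parser tests compare output against the listings in the comment.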