ceptr
Semantic Tree Regular Expressions

Semtrex provides a language for pattern matching on semantic trees similar to regular expressions for matching on text strings.

Files

file  semtrex.c
 semtrex implementation
 
file  semtrex.h
 Semantic tree regular expression header file.
 

Data Structures

struct  SgroupOpen
 
struct  SgroupClose
 
struct  Svalue
 
struct  Sliteral
 
union  STypeData
 
struct  SState
 

Macros

#define isTransitionPop(t)   (t<0)
 
#define isTransitionNext(t)   (t==0)
 
#define LITERAL_NOT   0x01
 
#define LITERAL_SET   0x02
 
#define _sl(t, s)   __sl(t,false,1,s)
 macro to add a single symbol literal to semtrex tree
 
#define _sln(t, s)   __sl(t,true,1,s)
 macro to add a single symbol literal not to semtrex tree
 
#define DS(l, t)   {char buf[1000];puts("\n" #l ":");_dump_semtrex(G_sem,t,buf);puts(buf);}
 debugging macro for quickly dumping out a semtrex text string
 

Typedefs

typedef int StateType
 
typedef int TransitionType
 
typedef struct SState SState
 
typedef struct SgroupOpen SgroupOpen
 
typedef struct SgroupClose SgroupClose
 
typedef struct Svalue Svalue
 
typedef struct Sliteral Sliteral
 
typedef union STypeData STypeData
 

Enumerations

enum  StateType {
  StateSymbol, StateAny, StateValue, StateSplit,
  StateMatch, StateGroupOpen, StateGroupClose, StateDescend,
  StateWalk, StateNot
}
 
enum  { TransitionDown =1, TransitionNone =0x8000 }
 

Functions

SState * _stx_makeFA (T *s, int *statesP)
 
void _stx_freeFA (SState *s)
 
int _t_match (T *semtrex, T *t)
 
int _t_matchr (T *semtrex, T *t, T **r)
 
T * _stx_get_matched_node (Symbol s, T *match_results, T *match_tree, int *sibs)
 
void _stx_replace (T *semtrex, T *t, T *replace)
 
T * _t_get_match (T *result, Symbol group)
 
T * __t_embody_from_match (SemTable *sem, T *match, T *t)
 
T * _t_embody_from_match (SemTable *sem, T *match, Symbol group, T *t)
 
char * _dump_semtrex (SemTable *sem, T *s, char *buf)
 
T * makeASCIITree (char *c)
 
T * parseSemtrex (SemTable *sem, char *stx)
 
T * _stx_results2sem_map (SemTable *sem, T *match_results, T *match_tree)
 
T * __stxcv (T *stxx, char c)
 
T * __stxcvm (T *stxx, int not, int count,...)
 
T * asciiT_toi (T *asciiT, T *match, T *t, Symbol s)
 
T * asciiT_tol (T *asciiT, T *match, T *t, Symbol s)
 
T * asciiT_tof (T *asciiT, T *match, T *t, Symbol s)
 
T * asciiT_tos (T *asciiT, T *match, T *t, Symbol s)
 
T * asciiT_toc (T *asciiT, T *match, T *t, Symbol s)
 
T * wrap (T *tokens, T *results, Symbol contents_s, Symbol open_s)
 
T * __sl (T *p, bool not, int count,...)
 
void __stx_dump (SState *s, char *buf)
 
char * _stx_dump (SState *s, char *buf)
 
void stx_dump (T *s)
 

Variables

SState * G_cur_stx_state
 
char G_stx_dump_buf [100000]
 

Detailed Description

Semtrex provides a language for pattern matching on semantic trees similar to regular expressions for matching on text strings.

Semtrex explained

Semantic Tree Regular Expressions (Semtrex for short) provide a matching language for semantic trees that will feel familiar to anyone who has used regular expressions for matching on strings.
A Semtrex is itself a semantic tree. We have also created a linear textual representation of semtrex trees, and a parser for it, to make them easier to create until we build a better UI for processing trees in general.
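To make the idea of a pattern that is itself a tree more concrete, here is a minimal sketch of a toy matcher. It is not the ceptr implementation and does not use the ceptr API: ToyNode, ToyPat and toy_match are hypothetical names, symbols are plain integers, and only symbol literals, "any" and a fixed sequence of children are supported. A pattern with no children is assumed to leave the children of the matched node unconstrained.

```c
#include <stddef.h>

/* Hypothetical toy tree node: a symbol plus an array of children. */
typedef struct ToyNode {
    int symbol;
    size_t child_count;
    struct ToyNode **children;
} ToyNode;

typedef enum { PAT_SYMBOL, PAT_ANY } ToyPatKind;

/* Hypothetical toy pattern node. The pattern is a tree too. */
typedef struct ToyPat {
    ToyPatKind kind;
    int symbol;               /* used only when kind == PAT_SYMBOL */
    size_t child_count;       /* 0 means: do not constrain the children */
    struct ToyPat **children;
} ToyPat;

/* Returns 1 if node n matches pattern p, 0 otherwise. */
int toy_match(const ToyPat *p, const ToyNode *n) {
    size_t i;
    if (p->kind == PAT_SYMBOL && p->symbol != n->symbol) return 0;
    if (p->child_count == 0) return 1;
    /* The child patterns must match the children one for one, in order. */
    if (p->child_count != n->child_count) return 0;
    for (i = 0; i < p->child_count; i++) {
        if (!toy_match(p->children[i], n->children[i])) return 0;
    }
    return 1;
}
```

In the textual notation, a pattern with a root literal A and the child patterns B and "any" would be written roughly as /A/(B,.). The real matcher compiles the pattern into a finite state automaton rather than recursing like this sketch.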
Examples (with tree version and the parts of a sample tree that match in red)

Here's a BNF for the textual representation of semtrex trees:
<semtrex> ::= <root> [ "/" <child>...]
<root> ::= "/" <element>
<child> ::= <element> | <optional> | <semtrex>
<element> ::= <walk> | <symbol_literal> | <value_literal> | <symbol_literal_set> | <value_literal_set> | <any> | <capture> | <siblings>
<optional> ::= <or> | <zero_or_more> | <one_or_more> | <zero_or_one>
<walk> ::= "%" <semtrex>
<one_or_more> ::= <semtrex> "+"
<zero_or_more> ::= <semtrex> "*"
<zero_or_one> ::= <semtrex> "?"
<sequence> ::= <semtrex> ["," <semtrex>]...
<siblings> ::= "(" <sequence> ")"
<or> ::= <semtrex> "|" <semtrex>
<symbol_literal> ::= ["!"] <symbol>
<symbol_literal_set> ::= ["!"] "{" <symbol> ["," <symbol>]... "}"
<value> ::= <string> | <char> | <int> | <float>
<value_literal> ::= <symbol> ["!"] "=" <value>
<value_literal_set> ::= <symbol> ["!"] "={" <value> ["," <value> ]... "}"
<any> ::= "."
<capture> ::= "<" <symbol> ":" <semtrex> ">"
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<letter> ::= a | b ... | z | A | B ... | Z
<special> ::= "_"
<symbol> ::= (<letter> | <digit> | <special>)+
<char> ::= "'" (<letter> | <digit> | <special>) "'"
<string> ::= "\"" (<letter> | <digit> | <special>)+ "\""
<float> ::= <digit>+ ["." <digit>+ ]
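The lexical rules at the end of the grammar are simple enough to check by hand. The following is a minimal sketch, assuming ASCII input, of recognizers for the <symbol> and <float> rules exactly as written above. The function names is_symbol and is_float are hypothetical and are not part of semtrex.c.

```c
#include <stdbool.h>
#include <stddef.h>

static bool is_digit_char(char c) { return c >= '0' && c <= '9'; }

static bool is_letter_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* <symbol> ::= (<letter> | <digit> | <special>)+  where <special> is "_" */
bool is_symbol(const char *s) {
    if (s == NULL || *s == '\0') return false;   /* "+" requires at least one character */
    for (; *s != '\0'; s++) {
        if (!is_letter_char(*s) && !is_digit_char(*s) && *s != '_') return false;
    }
    return true;
}

/* <float> ::= <digit>+ ["." <digit>+ ] */
bool is_float(const char *s) {
    if (s == NULL || !is_digit_char(*s)) return false;
    while (is_digit_char(*s)) s++;
    if (*s == '\0') return true;                 /* no fractional part */
    if (*s != '.') return false;
    s++;
    if (!is_digit_char(*s)) return false;        /* "." must be followed by a digit */
    while (is_digit_char(*s)) s++;
    return *s == '\0';
}
```

Note that, read literally, the grammar lets a symbol start with a digit, so a string such as 123 satisfies both rules. This sketch does not try to resolve that ambiguity.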

Typedef Documentation

typedef struct SgroupClose SgroupClose

data for group close FSA state

typedef struct SgroupOpen SgroupOpen

data for group open FSA state

typedef union STypeData STypeData

Different state types need to store different kinds of values so we put them in a union
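As an illustration of that design, here is a minimal sketch of a tagged union in C. All type and field names (ToyState, ToyTypeData, group_id and so on) are hypothetical; the actual members of SState, STypeData, SgroupOpen and SgroupClose are not listed on this page and may differ.

```c
#include <stddef.h>

/* Hypothetical state tags, loosely modelled on the StateType enumeration. */
typedef enum { TOY_SYMBOL, TOY_GROUP_OPEN, TOY_GROUP_CLOSE, TOY_MATCH } ToyStateType;

/* Hypothetical per-type payloads. */
typedef struct { int symbol; int negate; } ToyLiteralData;
typedef struct { int group_id; } ToyOpenData;
typedef struct { int group_id; } ToyCloseData;

/* Only one member is meaningful at a time, so they can share storage. */
typedef union {
    ToyLiteralData literal;
    ToyOpenData open;
    ToyCloseData close;
} ToyTypeData;

typedef struct ToyState {
    ToyStateType type;      /* tag: says which union member is valid */
    ToyTypeData data;
    struct ToyState *out;   /* next state, or NULL */
} ToyState;

/* Returns the group id for group open/close states, -1 for any other state. */
int toy_group_id(const ToyState *s) {
    switch (s->type) {
    case TOY_GROUP_OPEN:  return s->data.open.group_id;
    case TOY_GROUP_CLOSE: return s->data.close.group_id;
    default:              return -1;
    }
}
```

The tag must always be checked before a union member is read, which is why the accessor switches on type first.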

Enumeration Type Documentation

anonymous enum

The possible transitions in the match tree when advancing through states in the FSA.

In an old-fashioned regex, the transition is implicit because it is always "NextCharacter". For semtrex we need to expand the possibilities. NOTE: the actual value stored in the SState structure may be a negative number less than -1, because you can pop up multiple levels at once, but you never go down more than one level at a time.

Definition at line 30 of file semtrex.h.
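The encoding can be sketched by tracking only the depth of the cursor in the tree being matched. The enum values and the two macros below are copied from this page (with extra parentheses added around the macro argument). The helper depth_after is hypothetical, and treating TransitionNone as "stay where you are" is an assumption made for this sketch.

```c
typedef int TransitionType;

enum { TransitionDown = 1, TransitionNone = 0x8000 };

#define isTransitionPop(t)  ((t) < 0)
#define isTransitionNext(t) ((t) == 0)

/* Hypothetical helper: returns the cursor depth after applying transition t
 * at the given depth (the root is depth 0), or -1 if the transition is not
 * valid, for example when it would pop above the root. */
int depth_after(int depth, TransitionType t) {
    if (t == TransitionNone) return depth;        /* assumed: no movement */
    if (isTransitionPop(t)) {
        int d = depth + t;                        /* t is negative: -2 pops two levels */
        return d < 0 ? -1 : d;
    }
    if (isTransitionNext(t)) return depth;        /* next sibling, same level */
    if (t == TransitionDown) return depth + 1;    /* descend exactly one level */
    return -1;                                    /* no other positive value is defined */
}
```

For example, a transition of -2 applied at depth 3 leaves the cursor at depth 1, while the only downward move is a single step of TransitionDown.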

Function Documentation

T* __sl ( T * p,
bool  not,
int  count,
  ... 
)

utility function to create a semtrex literal symbol set

Definition at line 1279 of file semtrex.c.

char* _dump_semtrex ( SemTable * sem,
T * s,
char *  buf 
)

convert a semtrex tree into linear text format

Parameters
[in]  sem  the semantic contexts
[in]  s  the semtrex tree
[in]  buf  the string buffer to fill
Returns
the buffer

Definition at line 1096 of file semtrex.c.

void _stx_freeFA ( SState * s)

free the memory allocated by an FSA

Definition at line 367 of file semtrex.c.

SState* _stx_makeFA ( T * t,
int *  statesP 
)

wrapper function for building the finite state automaton recursively and patching it to the final match state

Definition at line 323 of file semtrex.c.

T* _stx_results2sem_map ( SemTable * sem,
T * match_results,
T * match_tree 
)

create SEMANTIC_MAP tree from a SEMTREX_MATCH tree for use in filling templates

Parameters
[in]  match_results  results tree from _t_matchr
[in]  match_tree  the tree the semtrex was matched against
Returns
a SEMANTIC_MAP tree

Definition at line 2029 of file semtrex.c.

T* _t_embody_from_match ( SemTable * sem,
T * match,
Symbol  group,
T * t 
)

create a new tree based on the matched elements from a semtrex match

Parameters
[in]  sem  semantic context
[in]  match  match results from a call to _t_matchr
[in]  group  the symbol of the group in the match to embody
[in]  t  the matching tree

Definition at line 873 of file semtrex.c.

T* _t_get_match ( T * match,
Symbol  group 
)

extract the portion of a semtrex match results that corresponds with a given group symbol

Parameters
[in]  result  match results from the _t_matchr call
[in]  group  the uid from the semtrex group that you want the result for
Returns
T of the match or NULL if no such match found

Definition at line 848 of file semtrex.c.

int _t_match ( T * semtrex,
T * t 
)

Match a tree against a semtrex

Parameters
[in]  semtrex  the semtrex pattern tree
[in]  t  the tree to match against the pattern
Returns
1 if the tree matched, 0 otherwise

Definition at line 809 of file semtrex.c.

int _t_matchr ( T * semtrex,
T * t,
T **  rP 
)

Match a tree against a semtrex and get back match results

Parameters
[in]  semtrex  the semtrex pattern tree
[in]  t  the tree to match against the pattern
[in,out]  rP  a pointer to a T to be filled with a match results tree
Returns
1 if the tree matched, 0 otherwise

Definition at line 798 of file semtrex.c.

T* asciiT_toc ( T * asciiT,
T * match,
T * t,
Symbol  s 
)

convert ascii tokens from a match to a char and add them to the given tree

Definition at line 1270 of file semtrex.c.

T* asciiT_tof ( T * asciiT,
T * match,
T * t,
Symbol  s 
)

convert ascii tokens from a match to a float and add them to the given tree

Definition at line 1251 of file semtrex.c.

T* asciiT_toi ( T * asciiT,
T * match,
T * t,
Symbol  s 
)

convert ascii tokens from a match to an integer and add them to the given tree

Definition at line 1233 of file semtrex.c.

T* asciiT_tol ( T * asciiT,
T * match,
T * t,
Symbol  s 
)

convert ascii tokens from a match to a 64 bit integer and add them to the given tree

Definition at line 1242 of file semtrex.c.

T* asciiT_tos ( T * asciiT,
T * match,
T * t,
Symbol  s 
)

convert ascii tokens from a match to a string and add them to the given tree

Definition at line 1261 of file semtrex.c.

T* makeASCIITree ( char *  c)

convert a cstring to an ASCII_CHARS tree

Parameters
[in]  c  string
Returns
T ASCII_CHARS tree

Definition at line 1206 of file semtrex.c.

T* parseSemtrex ( SemTable * sem,
char *  stx 
)

convert a cstring to semtrex tree

Parameters
[in]  sem  semantic context for searching for symbols
[in]  stx  the cstring representation of a semtrex tree
Returns
T semtrex tree
Todo:
this should be implemented using template filling?

Definition at line 1298 of file semtrex.c.