望尽海 / Fregex

> gcc -Os -c re.c
> size re.o
    text     data     bss     dec     hex filename
    2319        0     544    2863     b2f re.o
    

For ARM/Thumb using GCC 4.8.1, it's around 1.5 kB of code and less RAM:

> arm-none-eabi-gcc -Os -mthumb -c re.c
> size re.o
    text     data     bss     dec     hex filename
    1418        0     280    1698     6a2 re.o

For 8-bit AVR using AVR-GCC 4.8.1, it's around 2 kB of code and less RAM:

> avr-gcc -Os -c re.c
> size re.o
    text     data     bss     dec     hex filename
    2128        0     130    2258     8d2 re.o
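
As a reading aid for the listings above: the dec and hex columns are the sum of the text, data and bss sections, where text is roughly the code footprint and bss is zero-initialized RAM. The sketch below only reproduces that arithmetic; the struct and function names are hypothetical and not part of the library.

```c
#include <stdio.h>

/* Section sizes in bytes, as reported by `size re.o`. */
struct section_sizes {
    unsigned text; /* code and constants, usually placed in ROM/flash */
    unsigned data; /* initialized variables */
    unsigned bss;  /* zero-initialized variables, occupy RAM */
};

/* The "dec" column is the sum of the three sections. */
unsigned total_size(struct section_sizes s)
{
    return s.text + s.data + s.bss;
}

/* Print one row in the same column order as the listings:
 * text, data, bss, total in decimal, total in hex, file name. */
void print_row(const char *name, struct section_sizes s)
{
    unsigned total = total_size(s);
    printf("%8u %8u %8u %8u %8x %s\n",
           s.text, s.data, s.bss, total, total, name);
}
```

For the first listing, 2319 + 0 + 544 gives 2863, which is b2f in hexadecimal, matching the dec and hex columns.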

API

This is the public / exported API:

/* Typedef'd pointer to hide implementation details. */
typedef struct regex_t* re_t;

/* Compiles regex string pattern to a regex_t-array. */
re_t re_compile(const char* pattern);

/* Finds matches of the compiled pattern inside text. */
int  re_matchp(re_t pattern, const char* text);

/* Finds matches of pattern inside text (compiles first automatically). */
int  re_match(const char* pattern, const char* text);

Supported regex-operators

The following features / regex-operators are supported by this library.

NOTE: inverted character classes are buggy - see the test harness for concrete examples.

  • . Dot, matches any character
  • ^ Start anchor, matches beginning of string
  • $ End anchor, matches end of string
  • * Asterisk, match zero or more (greedy)
  • + Plus, match one or more (greedy)
  • ? Question, match zero or one (non-greedy) NOTE: The question-mark operator currently has a bug.
  • [abc] Character class, match if one of {'a', 'b', 'c'}
  • [^abc] Inverted class, match if NOT one of {'a', 'b', 'c'} NOTE: This feature is currently broken for some uses of character ranges!
  • [a-zA-Z] Character ranges, the character set of the ranges { a-z | A-Z }
  • \s Whitespace, \t \f \r \n \v and spaces
  • \S Non-whitespace
  • \w Alphanumeric, [a-zA-Z0-9_]
  • \W Non-alphanumeric
  • \d Digits, [0-9]
  • \D Non-digits
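
To make the class semantics in the list above concrete, here is a minimal, self-contained sketch of how membership in [abc], [^abc] and [a-zA-Z] could be tested, and how \s, \w and \d map onto <ctype.h>. All function names are hypothetical; this is not the library's code and it does not reproduce the inverted-class bug noted above.

```c
#include <ctype.h>

/* Hypothetical helper, not part of the library: returns 1 if c is matched
 * by the class body `cls` (the text between '[' and ']'), else 0.
 * A leading '^' inverts the class; "a-z" denotes an inclusive range. */
int class_matches(const char *cls, char c)
{
    int inverted = (cls[0] == '^');
    int found = 0;
    const char *p = inverted ? cls + 1 : cls;

    while (*p != '\0') {
        if (p[1] == '-' && p[2] != '\0') {
            /* Range such as a-z: compare against both endpoints. */
            if (c >= p[0] && c <= p[2]) found = 1;
            p += 3;
        } else {
            /* Single literal member. */
            if (c == p[0]) found = 1;
            p += 1;
        }
    }
    return inverted ? !found : found;
}

/* Meta-character classes expressed with <ctype.h>. The cast to
 * unsigned char avoids undefined behavior for negative char values. */
int is_space_class(char c) { return isspace((unsigned char)c) != 0; }        /* \s */
int is_word_class(char c)  { return isalnum((unsigned char)c) || c == '_'; } /* \w */
int is_digit_class(char c) { return isdigit((unsigned char)c) != 0; }        /* \d */
```

The negated forms \S, \W and \D would simply be the logical negation of the corresponding helper.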

Usage

Compile a regex from ASCII-string (char-array) to a custom pattern structure using re_compile().

Search a text-string for a regex and get an index into the string, using re_match() or re_matchp().

The returned index points to the first position in the string where the regex pattern matches.

If the regular expression doesn't match, the matching function returns an index of -1 to indicate failure.
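
The return convention described above (index of the first match, or -1) can be illustrated with a small stand-alone matcher in the style of Rob Pike's well-known minimal regex code. This is a sketch under stated assumptions: it supports only literals, '.', '^', '$' and '*', the name sketch_match is hypothetical, and it is not the implementation used by this library.

```c
/* Forward declaration: match_here and match_star call each other. */
static int match_here(const char *pat, const char *text);

/* Match zero or more occurrences of c, followed by the rest of the
 * pattern. Tries the shortest run first, which is sufficient to decide
 * whether a match exists at this position. */
static int match_star(char c, const char *pat, const char *text)
{
    do {
        if (match_here(pat, text)) return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Match the pattern at exactly the current text position. */
static int match_here(const char *pat, const char *text)
{
    if (pat[0] == '\0') return 1;
    if (pat[1] == '*') return match_star(pat[0], pat + 2, text);
    if (pat[0] == '$' && pat[1] == '\0') return *text == '\0';
    if (*text != '\0' && (pat[0] == '.' || pat[0] == *text))
        return match_here(pat + 1, text + 1);
    return 0;
}

/* Hypothetical name: returns the index of the first position in text
 * where the pattern matches, or -1 if it matches nowhere. */
int sketch_match(const char *pat, const char *text)
{
    int idx = 0;

    /* An anchored pattern can only match at index 0. */
    if (pat[0] == '^')
        return match_here(pat + 1, text) ? 0 : -1;

    /* Otherwise try every start position, including the empty tail. */
    do {
        if (match_here(pat, text)) return idx;
        idx++;
    } while (*text++ != '\0');
    return -1;
}
```

With this sketch, sketch_match("world", "hello world") yields 6 and sketch_match("xyz", "abc") yields -1, mirroring how a caller would test the result of re_match() against -1.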

Examples

Example of usage:

/* Standard null-terminated C-string to search: */
const char* string_to_search = "ahem.. 'hello world !' ..";

/* Compile a simple regular expression using character classes, meta-char and greedy + non-greedy quantifiers: */
re_t pattern = re_compile("[Hh]ello [Ww]orld\\s*[!]?");

/* Check if the regex matches the text: */
int match_idx = re_matchp(pattern, string_to_search);
if (match_idx != -1)
{
  printf("match at idx %d.\n", match_idx);
}

For more usage examples, I encourage you to look at the code in the tests folder.

TODO

  • Fix the implementation of inverted character classes.
  • Fix implementation of branches (|), and see if that can lead us closer to groups as well, e.g. (a|b)+.
  • Add example.c that demonstrates usage.
  • Add tests/test_perf.c for performance and time measurements.
  • Testing: Improve pattern rejection testing.

FAQ

  • Q: What differentiates this library from other C regex implementations?

    A: Well, the small size, for one: fewer than 500 lines of C code compiling to 2-3 kB of ROM and using very little RAM.

License

All material in this repository is in the public domain.

MIT License

Copyright (c) 2021 望尽海

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

A simplified regular-expression library in C, adapted from tiny-regex-c.