A highly efficient collection of text manipulation functions written in C++ and based on the .NET Framework. It was developed as an additional tool in the course of my master's thesis at the Graz University of Technology.
You can specify an arbitrary number of source text (.txt) files to work with. Command line flags start with a “-”, whereas source files do not. Command line flags are separated by spaces and can occur in any order. The usage options are presented in greater detail below.
Note: TextTools is not being maintained anymore. Nevertheless, you can still download the last version in the download section below.
Using the command line version is pretty straightforward. Here is a simple example of how you can call Texttools from the command line:
Texttools -w=my_output_file.txt -s=asc -e=; -ed=my_extracted_lines.txt source.txt
This command does the following:
- reads in the file “source.txt” (blank lines are excluded by default)
- sorts the lines read from “source.txt” in ascending lexicographic order (default). Can be turned OFF by using the flag -b=0
- extracts all characters from each line up to the delimiter “;”. If the delimiter is not found, the entire line is extracted (-e=;)
- writes the extracted lines to the file “my_extracted_lines.txt” (-ed=my_extr…)
- writes the sorted lines from the second step back to “my_output_file.txt” (-w=my_out…)
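The steps above can be sketched in a few lines of standard C++. This is only an illustration of the described behaviour, not the actual Texttools source: the function names extract_until and sorted_non_blank are hypothetical, and "blank" is assumed to mean an empty line.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the part of `line` before the first occurrence of `delim`.
// If `delim` is empty or does not occur, the whole line is returned.
std::string extract_until(const std::string& line, const std::string& delim) {
    if (delim.empty()) return line;
    const std::size_t pos = line.find(delim);
    return pos == std::string::npos ? line : line.substr(0, pos);
}

// Drops empty lines (assumed meaning of "blank") and sorts the remaining
// lines in ascending lexicographic order.
std::vector<std::string> sorted_non_blank(std::vector<std::string> lines) {
    lines.erase(std::remove_if(lines.begin(), lines.end(),
                               [](const std::string& s) { return s.empty(); }),
                lines.end());
    std::sort(lines.begin(), lines.end());
    return lines;
}
```

Applying extract_until to every element returned by sorted_non_blank would yield the content of the extraction file, while the sorted lines themselves correspond to the output file.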
You can always get the flag overview by calling Texttools.exe without any flags:
[-p= [-ps=src] [-pd=dest]]
- if 0, statistics will NOT be displayed; statistics are turned ON by default
- extracts all (sub)strings from [src_files] up to the delimiter specified.
- Use -ed to specify which destination filename to use, otherwise DEFAULT_EXTRACTION_FILE will be used
- rotates strings from [src_files] up to the given delimiter. If the delimiter is empty, the entire line will be rotated.
- Use -od to specify which destination filename to use, otherwise DEFAULT_ROTATION_FILE will be used
- replaces (sub)strings from [src_files] based on the rules file specified by the flag -ps.
- Use -ps to specify which source filename to use, otherwise DEFAULT_RULE_FILE will be used.
- Use -pd to specify which destination filename to use, otherwise DEFAULT_REPLACED_FILE will be used.
- used to sort the list read from [src_files]. NOTE that once the -s flag is used, the internal list is sorted and the program uses the sorted list from that point onwards!
- if 0 or empty, the list will NOT be sorted
- if asc, the list is sorted lexicographically ASCending
- if desc, the list is sorted lexicographically DESCending
- otherwise ERROR
- used to uniquify the list, i.e. removes all duplicate entries
- if empty, simply removes all “equal” strings,
- otherwise takes a predicate based on the flag specified [NOT AVAILABLE YET]
- uses -ud, if specified, as the destination filename to write the removed lines to.
- if empty
- otherwise dest_filename is used
- used to write the list generated while reading [src_files] to the file specified.
- If empty or not existent
- If s, the program will automatically write the output lines to files named after each line’s leading character, i.e. a-z.txt and other.txt are used.
- otherwise the specified filename is taken (if it does not already exist)
- If c, the output lines will be printed to the command line.
- removes expression specified from [src_files] [NOT AVAILABLE YET]
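As an illustration of the uniquify option described in the flag overview, the following minimal sketch removes exact duplicates and collects the removed lines separately, so they could be written to a second file. It is not taken from the Texttools source: the names UniquifyResult and uniquify are hypothetical, and keeping the first occurrence of each line is an assumption.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Result of a uniquify pass: the surviving lines and the duplicates dropped.
struct UniquifyResult {
    std::vector<std::string> kept;
    std::vector<std::string> removed;
};

// Keeps the first occurrence of every line (an assumption) and collects
// each later exact duplicate in `removed`. The input order is preserved.
UniquifyResult uniquify(const std::vector<std::string>& lines) {
    UniquifyResult result;
    std::unordered_set<std::string> seen;
    for (const std::string& line : lines) {
        if (seen.insert(line).second) {
            result.kept.push_back(line);     // first time this line is seen
        } else {
            result.removed.push_back(line);  // exact duplicate
        }
    }
    return result;
}
```

If the list has already been sorted, std::unique on the sorted vector would achieve the same result without the extra set, at the cost of not recording which lines were dropped.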
List of default filenames
The rules file (specified by the flag -ps) must have the following format:
old_expr | new_expr [ | begin_env [ | end_env ] ]
Expressions can simply be replaced by other expressions in the context of the entire line.
With Texttools you can also replace expressions inside so-called environments.
An environment has a special beginning and end (delimiters). Each delimiter can be either a single character or an arbitrarily long string.
The environment is optional: If you do not specify an environment Texttools will simply replace ALL occurrences of old_expr by new_expr.
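To make the difference between plain and environment-restricted replacement concrete, here is a minimal sketch. It is an interpretation of the description above, not the Texttools implementation: the function names replace_all and replace_in_env are hypothetical, an environment is assumed to be the text between one begin delimiter and the next end delimiter, and a rule with only one delimiter is assumed to fall back to plain replacement. The special begin-of-line and end-of-line markers are not handled here.

```cpp
#include <string>

// Replaces every occurrence of `old_expr` in `text` with `new_expr`.
std::string replace_all(std::string text, const std::string& old_expr,
                        const std::string& new_expr) {
    if (old_expr.empty()) return text;  // nothing to search for
    std::size_t pos = 0;
    while ((pos = text.find(old_expr, pos)) != std::string::npos) {
        text.replace(pos, old_expr.size(), new_expr);
        pos += new_expr.size();  // continue after the inserted text
    }
    return text;
}

// Replaces `old_expr` only inside regions that start after `begin_env` and
// end before the next `end_env`. Text outside such regions is left untouched.
std::string replace_in_env(const std::string& line, const std::string& old_expr,
                           const std::string& new_expr,
                           const std::string& begin_env,
                           const std::string& end_env) {
    if (begin_env.empty() || end_env.empty()) {
        return replace_all(line, old_expr, new_expr);  // no environment given
    }
    std::string out;
    std::size_t cursor = 0;
    while (true) {
        const std::size_t open = line.find(begin_env, cursor);
        if (open == std::string::npos) break;
        const std::size_t body = open + begin_env.size();
        const std::size_t close = line.find(end_env, body);
        if (close == std::string::npos) break;  // unterminated: copy the rest as is
        out.append(line, cursor, body - cursor);  // text up to and including begin_env
        out += replace_all(line.substr(body, close - body), old_expr, new_expr);
        out += end_env;
        cursor = close + end_env.size();
    }
    out.append(line, cursor, std::string::npos);  // remainder after the last environment
    return out;
}
```

With “[” and “]” as delimiters, only the TEST inside the brackets of “TEST [TEST] TEST” would be replaced, whereas replace_all would change all three occurrences.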
When specifying an environment you have the following options:
... [ | (EMPTY || *#*BOL*#* || left_delim) [ | (EMPTY || *#*EOL*#* || right_delim) ] ]
*#*BOL*#* stands for “begin of line”, meaning that old_expr must be the first word in the current line.
*#*EOL*#* stands for “end of line”, meaning that old_expr must be the last word in the current line.
TEST|new will replace ALL occurrences of TEST with new in each line and is the same as
TEST | new | *#*BOL*#* | *#*EOL*#*
TEST|new|*#*BOL*#* will replace ALL occurrences of TEST with new in each line only if TEST is the first word in the current line
TEST|new||*#*EOL*#* will replace ALL occurrences of TEST with new in each line only if TEST is the last word in the current line
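A rule line in this format could be split into its fields roughly as follows. This is a sketch under stated assumptions, not the parser used by Texttools: the names Rule, trim and parse_rule are hypothetical, and whitespace around the “|” separators is assumed to be insignificant because the examples above show rules written both with and without it.

```cpp
#include <string>
#include <vector>

// One parsed line of the rules file. Field names are illustrative.
struct Rule {
    std::string old_expr;
    std::string new_expr;
    std::string begin_env;  // empty if not given
    std::string end_env;    // empty if not given
};

// Removes leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits `line` on '|' and fills `rule`. Returns false if the line has
// fewer than two or more than four fields, or if old_expr is empty.
bool parse_rule(const std::string& line, Rule& rule) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t bar = line.find('|', start);
        if (bar == std::string::npos) {
            fields.push_back(trim(line.substr(start)));
            break;
        }
        fields.push_back(trim(line.substr(start, bar - start)));
        start = bar + 1;
    }
    if (fields.size() < 2 || fields.size() > 4 || fields[0].empty()) {
        return false;
    }
    rule.old_expr = fields[0];
    rule.new_expr = fields[1];
    rule.begin_env = fields.size() > 2 ? fields[2] : std::string();
    rule.end_env = fields.size() > 3 ? fields[3] : std::string();
    return true;
}
```

The special markers are stored verbatim in begin_env and end_env, so the code that applies a rule can compare them against the begin-of-line and end-of-line markers before treating them as ordinary delimiters.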