pawk: "Like Awk, but in Python"

About

The venerable tool Awk has shown us how useful it is to write quick one-liners that process files one line at a time.

Awk's only downside for me is that I don't use it enough to remember the Awk language from one time to the next.

Wouldn't it be nice to have an Awk-inspired tool, but where the language is Python? Where it knows how to read csv and parquet files? Where it has a few synonyms for the command-line flags so that what we type is more likely to work on the first try? This is what "pawk" is about.

Overview

At its heart the tool will read a file, split each line into words, and give that to Python code you provide.

You can also specify code to run before or after the loop. Since your code is put directly into the program, you can use continue or break to go to the next line or stop processing.
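To make the "your code is put directly into the program" idea concrete, here is a minimal sketch of how a tool like this could assemble begin/each/end snippets into a single loop. This is an illustration under assumptions, not pawk's actual source: the names `build_program` and `run` are hypothetical, and the real tool may differ in how it splits lines and names its variables.

```python
import textwrap


def build_program(begin: str, each: str, end: str) -> str:
    """Assemble user snippets into one Python source string (hypothetical helper)."""
    # Indent the per-line snippet so it sits inside the for loop.
    body = textwrap.indent(each, "    ")
    return (
        f"{begin}\n"
        "for line in lines:\n"
        "    words = line.split()\n"
        f"{body}\n"
        f"{end}\n"
    )


def run(lines, begin="pass", each="pass", end="pass"):
    """Execute the assembled program and return its namespace (hypothetical helper)."""
    namespace = {"lines": lines, "out": []}
    exec(build_program(begin, each, end), namespace)
    return namespace


result = run(
    ["alpha beta", "# skip me", "gamma", "STOP", "never reached"],
    begin="count = 0",
    # continue and break work because the snippet is pasted inside the loop body.
    each='if line.startswith("#"): continue\nif line == "STOP": break\ncount += len(words)',
    end="out.append(count)",
)
print(result["out"])  # [3]
```

Because the snippet becomes the literal body of the `for` loop, `continue` skips the comment line and `break` stops at `STOP`, so only three words are counted.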

The program will also parse csv, tsv, json, toml, yaml or parquet files given as input, so you don't have to worry about things like commas in quoted strings in your csv. It also imports datetime, defaultdict, re, and json for you.
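The point about commas in quoted strings is easy to see with Python's standard `csv` module. The sketch below only illustrates the problem that a format-aware reader solves; it does not claim to show how pawk itself reads csv files, and the sample data is made up.

```python
import csv
import io

# Made-up sample: both fields of the data row contain a comma inside quotes.
raw = 'name,motto\n"Smith, Jane","Hello, world"\n'

# Naive approach: splitting on commas breaks the quoted fields apart.
naive = [row.split(",") for row in raw.splitlines()]

# csv.reader understands quoting and keeps each field whole.
parsed = list(csv.reader(io.StringIO(raw)))
header, *rows = parsed

print(len(naive[1]))  # 4
print(header)         # ['name', 'motto']
print(rows[0])        # ['Smith, Jane', 'Hello, world']
```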

To save you from having to initialize them, the variables a to z are set to 0 at startup. You can of course override this.
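One plausible way to get this behavior, shown here as a sketch rather than as pawk's real implementation, is to seed the namespace in which the user code runs with every lowercase letter bound to 0. The `namespace` dictionary and the sample lines are assumptions made for the illustration.

```python
import string

# Pre-bind a..z to 0 so snippets like "c += 1" work without initialization.
namespace = {letter: 0 for letter in string.ascii_lowercase}

lines = ["one", "two", "three"]
for line in lines:
    namespace["line"] = line
    exec("c += 1", namespace)

# A snippet can still override a default, for example with a list.
exec("d = [line]", namespace)

print(namespace["c"])  # 3
print(namespace["d"])  # ['three']
print(namespace["z"])  # 0
```

Starting at 0 also makes a letter usable as a flag: `a = not a` flips it to `True` on first use, which is what the code-fence example under "Usage examples" relies on.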

There are a few more niceties; use pawk --help to see a list.

Usage examples

# Print $PATH on separate lines
echo "$PATH" | pawk -F: 'print("\n".join(words))'

# Count lines
cat README.md | pawk --begin 'c=0' --each 'c+=1' --end 'print(f"line count: {c}")'

# Count lines again, relying on the auto-initialization of single-letter variables (to 0).
# Also shows that --file is an option.
pawk --file README.md 'c+=1' --end 'print(f"line count: {c}")'

# Multi-line commands work, but you have to obey Python's indentation rules.
pawk --file README.md --each 'if line.startswith("```"): a = not a; continue
if a:
  print("    " + line)
else:
  print(line)'

# Format text into two columns
pawk 'if not d: old_line=line;d=1' 'else: print(f"{old_line[:40]:<40}{line[:40]}");d=0 ' --last 'if d: print(old_line)'

# Show the columns in a Parquet file
pawk --file tests/data/delta_byte_array.parquet --last 'print(header)'

# Look for a regular expression
pawk --file tests/data/numbers.csv 'if re.match(r"o.*", words[1]): print(line)'

# Count distinct words
pawk --file README.md --begin 'd=defaultdict(int)' 'for w in words: d[w]+=1' \
     --end 'print(f"distinct words: {len(d)}")'

# Read a specific key in a JSON file
pawk --file tests/data/onejson.json --print 'word["two"]'

# Process an array of JSON objects
pawk --file tests/data/jsonarray.json --each 'a += word["age"]' \
     --end 'print(f"sum of ages: {a}")'

# Read a key in a yaml file (from stdin)
cat tests/data/planets.yaml | pawk --mode yaml --print 'word["planet"]'

Installation

Create a virtual environment and install the requirements with pip:

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install --upgrade pip setuptools wheel
$ pip install -r requirements.txt
$ deactivate

Then put a symlink to pawk somewhere in your PATH. Something like this:

$ ln -s $(pwd)/pawk ~/bin/pawk

You can then run pawk directly. Try the help command!

$ pawk --help

Notes

There is another tool also called "Pawk". We had the same inspiration, but the two efforts are otherwise separate.
