Commit

arrays
pmitev committed Sep 2, 2024
1 parent 515353b commit 310e440
Showing 2 changed files with 89 additions and 1 deletion.
88 changes: 88 additions & 0 deletions docs/arrays.md
@@ -1,4 +1,92 @@
# Awk arrays

Here is a simple array definition and a way to scan its elements.

```awk
#!/usr/bin/awk -f
BEGIN {
    D["a"] = "A"
    D["b"] = "B"
    D["c"] = "C"
    for (i in D) {          # loop over the indices
        print i " : " D[i]
    }
}
```

Output:
```
a : A
b : B
c : C
```
> Note: by default, when a `for (i in D)` loop traverses an array, the order is undefined: the awk implementation decides it, usually based on how arrays are stored internally, and it can vary from one version of awk to the next. To control the scanning order in gawk, see [Predefined Array Scanning order](https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html).

## Multidimensional arrays - [docs](https://www.gnu.org/software/gawk/manual/html_node/Multidimensional.html)
A multidimensional array is an array in which an element is identified by a sequence of indices instead of a single index. For example, a two-dimensional array requires two indices. The usual way (in many languages, including awk) to refer to an element of a two-dimensional array named grid is with `grid[x,y]`.

```awk
#!/usr/bin/awk -f
BEGIN {
    D["a","A"] = "aA"
    D["a","B"] = "aB"
    D["a","C"] = "aC"
    D["b","A"] = "bA"
    for (i in D) {          # loop over the combined index (first SUBSEP second)
        print i " : " D[i]
        print "--------"
    }
}
```
Output:
```
aA : aA
--------
aB : aB
--------
aC : aC
--------
bA : bA
--------
```

> If you look carefully, `i` iterates over a single string made by concatenating both indices, separated by the value of the built-in variable `SUBSEP` (by default `"\034"`, a non-printing character, which is why the index is printed as `aA`). In other words, the combined string is used as a single index into an ordinary, one-dimensional array. This makes it somewhat difficult to iterate over the second index, but the approach can be useful in some specific solutions like [Manipulating the output from a genome analysis - vcf and gff](./Case_studies/manipulating_vcf.md).

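
A common workaround that works in any POSIX awk is to split the combined index back into its parts with `split()` and `SUBSEP`, and to check whether a pair of indices exists with `(x, y) in D`. A minimal sketch reusing the array above; keeping only the elements whose first index is `"a"` is just an illustration:

```shell
#!/bin/sh
awk 'BEGIN {
    D["a","A"] = "aA"; D["a","B"] = "aB"; D["a","C"] = "aC"; D["b","A"] = "bA"

    for (k in D) {
        split(k, idx, SUBSEP)        # idx[1] = first index, idx[2] = second index
        if (idx[1] == "a")           # scan the second index for a fixed first index
            print idx[1] "," idx[2] " : " D[k]
    }

    # Membership test for a pair of indices; unlike referencing D["b","B"], it does not create the element
    if (("b", "A") in D)    print "(b,A) exists"
    if (!(("b", "B") in D)) print "(b,B) does not exist"
}'
```

The loop still visits the elements in an undefined order, so pipe its output through `sort` (or use gawk's `PROCINFO["sorted_in"]`) when the order matters.
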
## Array of arrays - [docs](https://www.gnu.org/software/gawk/manual/html_node/Arrays-of-Arrays.html)

The so-called "arrays of arrays", a gawk extension not available in POSIX awk, are easier to scan (iterate over) than the [multidimensional array](https://www.gnu.org/software/gawk/manual/html_node/Multidimensional.html) implementation above.

```awk
#!/usr/bin/awk -f
# Requires gawk: arrays of arrays are a gawk extension (run with "gawk -f" if awk is not gawk)
BEGIN {
    D["a"]["A"] = "aA"
    D["a"]["B"] = "aB"
    D["a"]["C"] = "aC"
    D["b"]["A"] = "bA"
    for (i in D) {              # loop over the first index
        for (j in D[i])         # loop over the second index
            print i "," j " : " D[i][j]
        print "--------"
    }
}
```
Output:
```
a,A : aA
a,B : aB
a,C : aC
--------
b,A : bA
--------
```

## More

If you want to learn, or just check, what other "tricks" can be done with arrays in awk, here is a suggested tutorial on the topic; look for the "AWK tips and tricks" section on the page.

!!! quote
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -6,6 +6,7 @@ nav:
- 2.Teasing with grep: 2.Teasing_with_grep.md
- 3.Shell we awk: 3.Shell_we_awk.md
- 4.Brief commands: 4.Brief_commands.md
- 4.a Awk arrays: arrays.md
- 5.String manipulation: 5.String_manipulation.md
- 6.One line programs: 6.One_line_programs.md
- Python vs. awk: Python_vs_awk.md
@@ -34,7 +35,6 @@ nav:
- Fixed size fields: Other/Fixed_size_fields.md
- Backreferences: Other/Backreferences.md
- Multi-Line records: https://www.gnu.org/software/gawk/manual/html_node/Multiple-Line.html
- Awk arrays: arrays.md
- Awk or Bash: awk_bash.md
- Localization problems: Other/Localization.md
- Awk vs. nawk vs. gawk: http://www.thegeekstuff.com/2011/06/awk-nawk-gawk/