Web Hosting Talk

finding text between two characters


innova
12-20-2004, 06:36 PM
Hi,

What is the most efficient way (in the bash shell) to grab a block of text from a file? My particular problem is this:

[Section1]
var1=value1
var2=value2

[Section2]
var3=value3
var4=value4

I want to be able to store the block that contains JUST the data from Section1 as a variable. Thus, I figure that I need to grab all the text between '[Section1]' and the following '[' that starts the next section.

Alternatively, I could do something similar to what Apache does and close each section with an explicit closing tag. Either way, it's the same basic problem.

For the sake of speed, I would like to avoid using arrays if possible, as they are not too fast in bash. I know I could just do line-by-line processing into an array and filter out the unnecessary parts, but I suspect there is a much cleaner way. I would also like to avoid an external language (Perl, PHP, etc.). Builtins and common external programs like sed/awk/tr are fine. Does anyone have an idea?

runesolutions
12-20-2004, 07:40 PM
I can't remember where this came from (it's not my own work; thanks to whoever wrote it), but I wanted to do something similar some time ago (though I didn't in the end).

This is an awk script that parses ini files:


#
# parseini --- parses 'ini' style configuration files.
#
# Usage:
# awk -f parseini S=<section> P=<param> <ini file>
#
# if section is an empty string, then we use the default section
#
# example ini file:
#
# fruit = apple
# fruit = pear
# multiline = this is a multiline \
# parameter
#
# # this is a comment
#
# [colors]
# red = yes
# green = no
# blue = maybe
#
# [ocean]
# fish = red
# fish = blue
#
# example usage:
# > awk -f parseini S=ocean P=fish testfile.ini
# would return:
# red
# blue
#

BEGIN {
    readlines = 1
    implied = 1
}

# remove lines starting with #, but not #!
# (a bare "#" line also counts as a comment)
/^#([^!]|$)/ { next }

# skip blank lines
/^[ \r\t]*$/ { next }

# we want to read the lines of the matched section
# and disable reading for other sections
/^\[.+\][ \r\t]*$/ {
    continueline = 0
    if (S && implied) {
        # first section header seen: discard the default-section lines
        nline = 0
        implied = 0
    }
    if (S && match($0, "^\\[" S "\\][ \n]*")) {
        # we found the section, so start reading.
        readlines = 1
    }
    else {
        # a different section, so stop reading lines
        readlines = 0
    }
    next
}

# when reading, store lines.

{
    if (!readlines) next
    line[nline++] = $0
    if ($0 ~ /\\[ \r\t]*$/)
        continueline = 1
    else
        continueline = 0
}

# process the read lines, matching parameters

END {
    # if section is set but implied is still true, the file
    # contained no section headers at all, so the requested
    # section was never found: discard everything
    if (S && implied) {
        nline = 0
    }

    # if have P then find P in read lines and get values
    if (P) {
        MATCH = "^[ \r\t]*" P "[ \r\t]*="
        continueline = 0
        for (x = 0; x < nline; ++x) {
            v = line[x]
            if (continueline) {
                sub(/[ \r\t]+$/, "", v)
                if (v ~ /\\$/) {
                    v = substr(v, 1, length(v)-1)
                    sub(/[ \r\t]+$/, "", v)
                }
                else {
                    # no trailing backslash: the continuation ends here
                    continueline = 0
                }
                if (v != "") value[nvalue++] = v
            }
            else if (v ~ MATCH) {
                sub(MATCH, "", v)
                sub(/^[ \r\t]+/, "", v)
                sub(/[ \r\t]+$/, "", v)
                if (v ~ /\\$/) {
                    continueline = 1
                    v = substr(v, 1, length(v)-1)
                    sub(/[ \r\t]+$/, "", v)
                }
                if (v != "") value[nvalue++] = v
            }
        }
        # copy parameter definition to output array
        nline = nvalue
        for (x = 0; x < nvalue; ++x)
            line[x] = value[x]
    }

    # trim all leading & trailing whitespace
    # (continuation lines included)

    for (x = 0; x < nline; ++x) {
        sub(/^[ \r\t]+/, "", line[x])
        sub(/[ \r\t]+$/, "", line[x])
    }

    # output the final result
    for (x = 0; x < nline; ++x)
        print line[x]

    if (nline) exit 0
    else exit 1
}


This may help.
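
If all that is needed is the original request (the body of one section captured in a shell variable), a much shorter awk program may be enough. What follows is only a sketch: the sample file, the get_section function name and the section names are made up for illustration, and it assumes each header sits alone on its line with no trailing whitespace.

```shell
# Hypothetical sample file, modelled on the example in the question.
ini=$(mktemp)
trap 'rm -f "$ini"' EXIT
cat > "$ini" <<'EOF'
[Section1]
var1=value1
var2=value2

[Section2]
var3=value3
var4=value4
EOF

# get_section FILE NAME (hypothetical helper):
# print the non-blank lines of [NAME], stopping at the next "[" header.
get_section() {
    awk -v want="[$2]" '
        /^\[/ { inside = ($0 == want); next }   # every header switches the flag on or off
        inside && NF                            # print non-blank lines while inside
    ' "$1"
}

# Command substitution stores the block in a plain variable, no arrays needed.
section1=$(get_section "$ini" Section1)
printf '%s\n' "$section1"
```

The flag approach reads the file once and needs no arrays on the bash side, which seems to fit the constraints in the question. A section name that does not exist simply yields an empty variable.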

Burhan
12-21-2004, 02:20 AM
Quoting innova:
"I want to be able to store the block that contains JUST the data from Section1 as a variable. Thus, I figure that I need to grab all the text between '[Section1]' and the following '[' that starts the next section."


Couldn't grep be used for this?
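
As far as I know, grep matches one line at a time, so on its own it can only approximate this: the -A option (available in GNU and BSD grep, not required by POSIX) prints a fixed number of lines after a match, which only works when the section length is known in advance. A sed address range is probably a closer fit, because it runs from the header to the next line starting with '['. Here is a rough sketch of both; the sample file and variable names are invented for illustration.

```shell
# Hypothetical sample file, modelled on the example in the question.
ini=$(mktemp)
trap 'rm -f "$ini"' EXIT
cat > "$ini" <<'EOF'
[Section1]
var1=value1
var2=value2

[Section2]
var3=value3
var4=value4
EOF

# grep: works only because we happen to know Section1 has exactly 2 lines.
# -F treats the brackets literally; tail drops the header line itself.
with_grep=$(grep -F -A 2 '[Section1]' "$ini" | tail -n 2)

# sed: select the range from the [Section1] header up to the next header
# (or end of file), then delete header lines and blank lines inside it.
with_sed=$(sed -n '/^\[Section1\]$/,/^\[/{
    /^\[/d
    /^$/d
    p
}' "$ini")

printf 'grep:\n%s\n' "$with_grep"
printf 'sed:\n%s\n' "$with_sed"
```

The sed version keeps working if lines are added to or removed from the section, while the grep version would need its line count updated by hand.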