Viewing Category: Shell Programming

A Large Prime, Please

I wanted a largish prime number with which I could test a hash table that uses the modulo of the prime to select a bucket. Anyway, there's a handy command-line option in OpenSSL to determine which inputs are prime. So, say I wanted a prime a bit bigger than 5 million:

for N in $(seq 5000000 5000100); do
  openssl prime $N | awk '/is prime/ {print "ibase=16;"$1}' | bc
done

The command returns 5000011, 5000077, 5000081 and 5000087. All lovely primes. I found this trick on an OpenSSL Command-Line Howto page. You might also be surprised to know that the seq command from GNU core utilities doesn't exist on Mac OS X. Here's a little hack that Fredrik Rodland created to mimic seq with jot.
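
If you're on a Mac without GNU coreutils installed, the same loop can be fed by jot instead; here's a rough equivalent, assuming BSD jot's reps/begin/end argument order (where - lets jot infer the count from the range):

for N in $(jot - 5000000 5000100); do
  openssl prime $N | awk '/is prime/ {print "ibase=16;"$1}' | bc
done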

External IP in Shell Script

I saw a tip from @linuxalive about obtaining the apparent public IP address from the current machine. In a blog post last month, Racker Hacker wrote about his frustration viewing the HTML returned by the service DynDNS offers at checkip.dyndns.org. He created a site at icanhazip.com that echoes only the IP address in the server response. (A commenter on that post mentioned whatismyip.org for the same result.)

With these HTTP services, it's possible to include the result in shell scripts or programs that do something useful. Here are two examples using Wget and cURL:

wget -O - -q icanhazip.com
curl whatismyip.org

On a system without HTTP-aware apps, it's possible to get the same result with nc (Netcat):

H=icanhazip.com
echo -e 'GET / HTTP/1.0\nHost: '$H'\n' | nc $H 80

Of course, you'll see the HTTP response headers using the last method. To show only the IP address, pipe it through grep:

grep -o -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'
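
Putting the two together, a copy-and-paste version might look like this (assuming the service returns the bare address on a line by itself, so the anchored pattern matches only that line):

H=icanhazip.com
echo -e 'GET / HTTP/1.0\nHost: '$H'\n' | nc $H 80 \
  | grep -o -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'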

Surely that was more than you wanted to know. ;)

Firewall Blacklist Tool

A few months ago I read an article on nixCraft about blocking IPs by country. It's a bit harsh to block an entire country, but some machines just don't need to be accessible to the entire world. I've also had the need to quickly drop packets from an IP that is running an attack. Here are my design goals:

  • Easy to add/edit/delete blocks, but also maintain version control (history/log).
  • Ability to add notes about why the IP was blocked (password guessing, vulnerability scan, flood, etc.).
  • Automatic updating of data files.
  • Reporting on rules in place and packets dropped.
  • Easy installation to multiple machines.
  • Integrate nicely with the existing iptables configuration on RHEL/CentOS.
  • Packaged as an RPM to properly check for system dependencies.
  • Authenticate downloaded country data by computing a message digest hash (see the sketch after this list).
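
The project's actual verification code isn't shown in this post, but as a rough sketch of that last goal -- assuming the zone data comes from IPdeny and that a checksum file is published alongside it (both URLs below are assumptions) -- it might look something like:

# Hypothetical check of a downloaded zone file against a published MD5 list;
# the URLs and file layout are assumptions, not taken from the blacklist source.
ZONE=cn.zone
cd /etc/blacklist
wget -q "http://www.ipdeny.com/ipblocks/data/countries/${ZONE}"
wget -q -O MD5SUM "http://www.ipdeny.com/ipblocks/data/countries/MD5SUM"
grep " ${ZONE}\$" MD5SUM | md5sum -c - || echo "Checksum mismatch for ${ZONE}" >&2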

For the impatient, here's a link to the project source directory: blacklist

The project includes an installer, appropriately named install, that will: 1) create a data directory (/etc/blacklist), 2) create a cron file (/etc/cron.daily/blacklist), 3) create a sample rules file (/etc/sysconfig/blacklist), and 4) copy the main application script (/usr/local/sbin/blacklist). Beyond the typical Linux shell tools, blacklist requires wget and perl. The script could be rewritten to use curl and gawk, if one were so motivated.

To integrate blacklist with the standard firewall configuration, insert the following lines into /etc/sysconfig/iptables:

:Blacklist - [0:0]
# Rules are inserted into this chain by /usr/local/sbin/blacklist
# Rules are defined in /etc/sysconfig/blacklist
:Blocked - [0:0]
-A Blocked -p tcp -j LOG --log-prefix "Blocked: "
-A Blocked -j DROP
:Firewall - [0:0]
-A Firewall -p tcp -j Blacklist

The rule syntax takes two basic forms: a source IPv4 address (with an optional destination TCP port) and a two-character country code (see ISO 3166-2 and the IPdeny Country List). When blacklist reads a country code rule, it injects the data collected from the downloaded zone file.
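
The post doesn't reproduce the rules file itself, so here is a purely hypothetical /etc/sysconfig/blacklist showing the two forms described above (the addresses, port, and comment style are made up for illustration):

# Hypothetical rules file -- format illustrated, not copied from the project
# Single host, all ports (password guessing)
192.0.2.15
# Single host, TCP port 22 only (vulnerability scan)
198.51.100.7 22
# Everything from one country, by two-character code
CN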

When the iptables rules are in place, a simple summary can be displayed using the blacklist -s option. Here's a sample summary:

/usr/local/sbin/blacklist -s
4528 rules, 51 pkts, 2596 bytes

Details of each dropped packet are logged to /var/log/messages by the LOG rule in the Blocked chain.

The cron script will purge and refresh the zone files every week. There's probably no reason to pull new data any more often; however, it can be done manually with the blacklist -p option.

Bundling this up as an RPM package is on the TODO list. I haven't built an RPM from scratch in many years. If you are interested in using blacklist on your systems via an RPM package, please let me know.

Tomcat Monitoring and Startup Via Cron

Over the last week, my virtual private server needed to be restarted a couple of times. Once, I was there to see the restart and manually ran the script to bring Tomcat up. Another time I wasn't. Since I run Tomcat from a plain user account, it doesn't start up with the server itself using the SysV-style init scripts from /etc/init.d. Many years ago, I created a cronjob to check on an Eggdrop IRC bot that sometimes died or went haywire. The same solution works fine for Tomcat.

The following shell script (re)starts Tomcat, if needed. It searches for all processes with the command name java. Any found processes are output with a user-defined format that includes just two fields and no header. The next command in the pipeline filters out lines that do not start with the appropriate username -- the one that kicked off the cronjob. Any lines making it to the second grep are matched against the Tomcat class name. Lastly, wc counts the number of lines, which should accurately reflect the number of Tomcat instances started by this user. Currently, no other user accounts would start an instance of Tomcat, but it's best to prepare for that possibility.

#!/bin/sh

export CATALINA_HOME=$HOME/server/tomcat
export JRE_HOME=$HOME/java/jre/default

PROCS=`/bin/ps -C java -o euser=,args= | grep -E "^$USER" | grep -o -F org.apache.catalina.startup.Bootstrap | wc -l`

case $PROCS in
  0)
    echo "Tomcat does not appear to be running. Starting Tomcat..."
    $CATALINA_HOME/bin/catalina.sh start
    exit 1
    ;;
  1)
    exit 0
    ;;
  *)
    echo "More than one Tomcat appears to be running. Restarting Tomcat..."
    $CATALINA_HOME/bin/catalina.sh stop && $CATALINA_HOME/bin/catalina.sh start
    exit 2
    ;;
esac

The crontab for the plain user account will run the script above every 5 minutes, which seems pretty reasonable.

0-59/5 * * * * $HOME/bin/check-tomcat.sh

While working on this VPS, I decided to update the JRE to Java 6 update 10 because I've heard that some operations, such as CFC instantiation, are faster. It seems faster, but I don't have any actual performance data to prove it.

Resetting PostgreSQL Sequences After Import

In the last blog post about migrating from MSSQL to PostgreSQL, I left one detail for later -- resetting all the sequences after importing data. I created two PL/pgSQL functions to do the work of selecting the maximum value in the primary key and updating the sequence.

CREATE OR REPLACE FUNCTION get_pk_max(table_name VARCHAR(100), pk_name VARCHAR(100))
RETURNS INTEGER AS $$
DECLARE i INTEGER;
DECLARE q TEXT;
BEGIN
  q := 'SELECT max(' || pk_name || ') FROM ' || table_name;
  EXECUTE(q) INTO i;
  IF i IS NULL THEN
    RETURN 0;
  ELSE
    RETURN i;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION reset_seq(table_name VARCHAR(100), pk_name VARCHAR(100), seq_name VARCHAR(100))
RETURNS INTEGER AS $$
DECLARE i INTEGER;
BEGIN
  SELECT get_pk_max(table_name, pk_name) INTO i;
  RETURN setval(seq_name, i + 1, false);
END;
$$ LANGUAGE plpgsql;

The reset_seq() function is called from the PostgreSQL client for each table by a shell script:

for T in `cat ${DB_INFO_DIR}/tables`; do
  if [ -f ${DB_INFO_DIR}/${T}.sequence ]; then
    echo "Processing $T"
    PK=`gawk 'BEGIN { RS="\000"; FS="\n" }; { print $1 }' ${DB_INFO_DIR}/${T}.sequence`
    SEQ=`gawk 'BEGIN { RS="\000"; FS="\n" }; { print $2 }' ${DB_INFO_DIR}/${T}.sequence`
    psql -c "SELECT reset_seq('ll_${T}', '${PK}', '${SEQ}');"
  else
    echo "No sequence information for $T"
  fi
done

This loops over every table name in the new database. The table name, prefixed with ll_, is the first argument to reset_seq(). The second and third arguments are read from a file containing two lines; the first line contains the name of the table's primary key, and the second line contains the name of the sequence.
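
As an illustration of that file format, a hypothetical users.sequence file (the key and sequence names here are made up) would contain just two lines:

user_id
ll_users_user_id_seq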

The shell script executes very quickly, resetting all the sequences in the database in a few seconds.

MSSQL 2000 to PostgreSQL 8 Migration Project

The following are notes on a migration from Microsoft SQL Server 2000 to PostgreSQL 8. The old database has 60 tables, some with a few million rows, and about 7 GB of data in total. There were several issues to deal with:

  • The order of the columns in the tables of each schema isn't consistent. The older database has had columns added over time, while the new database schema has been created from the most current SQL scripts.
  • The data exported using bcp will need reformatting to be imported using psql.
  • Fields with NULL values will need special consideration. The bcp utility is not consistent in the use of 0x00 (the null character) to indicate a null value.
  • The MSSQL database has a mix of character encodings; the PostgreSQL database uses UTF8.
  • The import must be automated so it can be performed after the existing database is taken offline, and before the new one is put into production. This ensures that all user information is consistent.
  • The data will not be imported in a single transaction. Therefore, the order of the import must be correct for the foreign key relationships that exist.
  • The indexes and sequences of the new database must be set. (TODO)

The MSSQL database is hosted on a Windows server in a remote facility. I can open a VNC connection through an SSH tunnel, but the responsiveness is unacceptable. Instead, I use the Cygwin port of the OpenSSH daemon, running as a service, and providing a bash shell.

The first step in preparing to export the data is to get a list of table names. I used the isql utility to connect to the server:

isql -Q "select name from sysobjects where type = 'U'" \
  -S $MSSQL_SERVER -U $MSSQL_USERNAME -P $MSSQL_PASSWORD \
  | grep -Eo 'tbl_[a-z_]+' \
  | sed -s 's/tbl_//g' \
  | sort > tables

The result is a file containing one table name (with the tbl_ prefix removed) per line. I copied the file to tables_to_export and removed the tables that were not necessary to export. Incidentally, I don't know if the Microsoft documentation on these utilities will be accessible in the future. To be safe, I've created PDF archives of the pages on MSDN. The following bit of shell script loops over each line in the list of tables to be exported:

TEMPFILE=`mktemp`
for T in `cat tables_to_export`; do
  bcp "tbl_${T}" format $TEMPFILE -f "${EXPORT_DIR}/${T}.fmt" -c -k \
    -S $MSSQL_SERVER -U $MSSQL_USERNAME -P $MSSQL_PASSWORD
  bcp "tbl_${T}" out "${EXPORT_DIR}/${T}" -o "${EXPORT_DIR}/${T}.out" -c -k \
    -r '_~R~_' -t '_~F~_' \
    -S $MSSQL_SERVER -U $MSSQL_USERNAME -P $MSSQL_PASSWORD
done

There are two runs of bcp for each table; the first documents the format of the table being exported, and the second performs the actual data export. The command-line switches -c and -k specify character data and tell bcp to keep null values rather than substituting column defaults. I selected _~R~_ and _~F~_ for the row and field terminators, rather than \n and \t, because the data itself contains newlines and tabs. After completing the export, I created a tarball and transferred it to an administration shell account on the same network as the new server. This little detail is important -- the new PostgreSQL server does not have file system access to the data files. See the documentation on the COPY command for more information.
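
The bundling and transfer step isn't shown in the original notes; a minimal sketch (the archive name, account, and host are placeholders) could be:

# Bundle the exported data and format files, then copy to the admin shell account
tar czf /tmp/mssql-export.tar.gz -C "${EXPORT_DIR}" .
scp /tmp/mssql-export.tar.gz admin@shellhost:/tmp/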

The tarball is extracted to the local /tmp directory, rather than the NFS-mounted home directory for speed -- an important consideration when processing large files. The first task is to escape the embedded backslash, new line, and tab characters, convert the nulls, and then strip out the row and field terminators. A fairly straightforward Gawk program does the work:

BEGIN {
  RS="_~R~_"
  FS="_~F~_"
  OFS="\t"
}
{
  gsub(/\\/, "\\\\")
  gsub(/\000/, "\\N")
  gsub(/\r\n/, "\\n")
  gsub(/\n/, "\\n")
  gsub(/\t/, "\\t")
  $1 = $1
  print $0
}

The data in some files required extra massaging, such as inserting missing null values or removing obsolete fields. For each data file that requires extra processing, I created a file of the same name containing additional gawk rules. The processing script checks for the extra rule files and applies them when found. The new files are created with .tab extensions, indicating that the file is ready for import.
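
That processing script isn't reproduced in the post; a minimal sketch of the idea, with made-up file names (base.awk holding the rules above, extra/ holding the per-table additions), might be:

for T in `cat tables_to_export`; do
  RULES="-f base.awk"
  # Apply table-specific rules when an extra rule file exists
  if [ -f "extra/${T}.awk" ]; then
    RULES="$RULES -f extra/${T}.awk"
  fi
  gawk $RULES "/tmp/export/${T}" > "/tmp/export/${T}.tab"
done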

The result is a set of data files that can be used by psql. However, the order of the columns must be dealt with. For that, I wrote a script to loop over all of the exported table names (bundled into the tarball) and compare the column order to the new database:

# Create a column list using bcp format files
for T in `cat ${DBOLD_DIR}/tables_to_export`; do
  gawk 'BEGIN { FS="[ ]+" } /[ ]+/ { print $7 }' \
    "${DBOLD_DATADIR}/${T}.fmt" > "${DBOLD_WORK}/${T}.columns"
done

# Create a list of all the tables in the new database
psql -c '\dt *' | gawk 'BEGIN { FS=" " } /public/ { print $3 }' \
  | sed -e 's/^ll_//' > $DBNEW_TABLES

# Create column list for each table in the new database
for T in `cat $DBNEW_TABLES`; do
  psql -c "\\d ll_${T}" \
    | gawk 'BEGIN { FS=" " } /Column/ { next } /^[ ][a-z]/ { print $1 }' \
    > "${DBNEW_WORK}/${T}.columns"
done

# Compare each exported table to the new table
for T in `cat ${DBOLD_DIR}/tables_to_export`; do
  echo "Table: $T"
  if [ ! -f ${DBNEW_DIR}/${T}.columns ]; then
    echo "Does not exist in new database."
  else
    # Quietly check for differences
    diff -w -i -q "${DBNEW_DIR}/${T}.columns" "${DBOLD_DIR}/${T}.columns" > /dev/null
    if [ $? != 0 ]; then
      # Show differences side-by-side
      diff -w -i -y "${DBNEW_DIR}/${T}.columns" "${DBOLD_DIR}/${T}.columns"
    else
      echo "No differences."
    fi
  fi
  echo
done

I went to the effort to programmatically compare the old tables to the new tables for two reasons: 1) it's highly likely that I would have made errors doing it by visual inspection, and 2) the resulting output is used to create a column list used during the import of the data. These column list files are placed into a directory that will be checked by the import script. If a file is found matching the name of the table, the column list is used instead of a straight copy.

I don't yet have a tool for examining the SQL scripts, and determining the dependency order. Fortunately, this only needs to be done once (for this project). I created a file containing the table names in the order that they must be imported.

Finally, the data can be imported!

for T in `cat ${DB_IMPORT_INFO_DIR}/table_import_order`; do
  echo "Processing $T"
  # If the column order is special, use the column list
  if [ -f ${DB_IMPORT_INFO_DIR}/${T} ]; then
    Q=`head -n 1 ${DB_IMPORT_INFO_DIR}/${T}`
    psql -c "COPY ll_${T} (${Q}) FROM STDIN" < ${DB_DATA_DIR}/${T}.tab
  # Otherwise, perform a straight copy
  else
    psql -c "COPY ll_${T} FROM STDIN" < ${DB_DATA_DIR}/${T}.tab
  fi
done

I should note that I set the PGCLIENTENCODING environment variable to LATIN1 in the import script. This makes the conversion to UTF8 explicit and eliminates any byte sequence errors. Initially, I tried setting \encoding LATIN1 in the ~/.psqlrc file, as well as specifying -v encoding=LATIN1 on the psql command line -- neither of these methods causes the correct character encoding to be set during the import.
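
For reference, the relevant line in the import script is just a plain environment export (placement at the top of the script is my assumption):

# Tell psql the data files are LATIN1 so the server converts to UTF8 on COPY
export PGCLIENTENCODING=LATIN1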

As indicated above, the SQL to set the indexes and sequences needs to be completed. Other than that, the project is nearly complete.

List of LDAP User Accounts

While working on servers that reference user account information from an OpenLDAP server, I encountered the need to query for the list of user accounts. This is quick and easy with a GUI LDAP tool like Apache Directory Studio. However, I needed the information as part of a package of administration shell scripts. My solution is in two parts: run the query using ldapsearch, then parse the data with gawk. Here's the command line for part one:

ldapsearch -LLL -x -W -D cn=admin,dc=company,dc=com \
  -b ou=users,dc=company,dc=com \
  -s children uidNumber uid \
  | gawk -f parse.awk \
  | sort -n

The gawk program is simple enough.

BEGIN {
  RS = ""
  FS = "\n"
}
{
  # The ldapsearch output fields are not in a consistent order
  # Each field must be evaluated for its attribute name
  for (i = 1; i <= 3; i++) {
    if ($i ~ /uidNumber: /)
      uidNumber = gensub(/uidNumber: /, "", "g", $i)
    if ($i ~ /uid: /)
      uid = gensub(/uid: /, "", "g", $i)
  }
  # Only print user accounts
  if (uidNumber >= 5000 && uidNumber < 6000)
    print uidNumber, uid
}

The next enhancement would be to add a filter to show which accounts are active. The sort could be done internally and the next available uidNumber could be the single result of the script.
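
As a sketch of that last idea (leaving the active-account filter aside), the next available uidNumber could be derived by extending the same pipeline:

ldapsearch -LLL -x -W -D cn=admin,dc=company,dc=com \
  -b ou=users,dc=company,dc=com \
  -s children uidNumber uid \
  | gawk -f parse.awk \
  | sort -n \
  | tail -n 1 \
  | gawk '{ print $1 + 1 }'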

Postfix Queue Cleaner

In my never-ending fight against spam and backscatter, I updated my Postfix mail queue cleaning script. It now performs two actions: list or delete. It also accepts multiple patterns to match against e-mail addresses. There are still a couple of issues, such as when an envelope specifies more than one recipient, but to improve it any more, I'd really have to port it to Perl. I might do that some day, but it's been several years since I've hacked with Perl.

Here's the updated version of the bash shell script:

#!/bin/bash

usage() {
  echo "Usage: $0 {list|delete} pattern [pattern]"
}

if [ $# -lt 2 ]; then
  usage
  exit 1
fi

COMMAND=$1
shift
PATTERNS="$*"

list() {
  for P in $PATTERNS; do
    mailq | grep -E "^ {4,}.*$P" | tr -d ' '
  done
}

delete() {
  DELIDS=`mktemp /tmp/delids.XXXXXX`
  for P in $PATTERNS; do
    mailq | grep -B 2 -E "^ {4,}.*$P" | grep -i -E '^[0-9A-F]' | \
      cut -d ' ' -f 1 > $DELIDS
    postsuper -d - < $DELIDS
  done
}

case "$COMMAND" in
  list)
    list
    ;;
  delete)
    delete
    ;;
  *)
    usage
    exit 1
    ;;
esac

Bash Here Documents

I've been working on a kickstart script that nearly completely configures a cluster node. I encountered a bit of trouble with my sed scripting that modifies /etc/mail/sendmail.mc such that the node's e-mails are properly masqueraded for delivery to an administrative mailbox on a remote network. The issue was with special character escaping in code such as this:

#!/bin/bash

RULES=$(cat <<EOF
s/\(`something'\)/(`otherthing')/;
s/^$/dnl/g;
EOF
)
sed -r -e "$RULES" file.txt

Normally, the here document is processed for variable substitution. It's possible to disable this by surrounding the limit string with single quotes. However, that still doesn't fix all the crazy quoting substitution. Argh.
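
For reference, that quoted-delimiter form looks like the sketch below; it keeps the backticks and $ literal inside the here document, though as noted it didn't untangle all of the quoting in my case:

RULES=$(cat <<'EOF'
s/\(`something'\)/(`otherthing')/;
s/^$/dnl/g;
EOF
)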

After much frustration, I believe the easiest way to deal with this is like so:

#!/bin/bash

RULES=$(
printf "%s\n" 's/\(`something'\''\)/(`otherthing'\'')/;'
printf "%s\n" 's/^$/dnl/g;'
)
sed -r -e "$RULES" file.txt