Searching for strings
The search for a given string in a file can be done with the grep program (a case-insensitive search can be enabled with the -i option). Let's search for test in the file myfile:
grep -i "test" myfile
Instead of a single filename we can also use wildcards. If we want to perform recursive searches we use the -r flag.
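For example, a recursive, case-insensitive search through the current directory for the same term might look like this (the pattern and the starting directory are placeholders):
grep -ri "test" .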
Another possibility is to output lines after the match. This can be done with the -A option. By default this option is set to 0, i.e. no lines after the matching one are printed. The following command will print the three lines following each match:
grep -A 3 "test" myfile
While -A means after, -B means before and -C means context around the match. These two options are used in the same way as the -A option.
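For instance, to print the three lines before each match or three lines of surrounding context in myfile:
grep -B 3 "test" myfile
grep -C 3 "test" myfile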
In order to print only the name of the file in which the string has been matched, we have to specify the -l parameter.
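A small example, assuming we search all .txt files in the current directory and only want the names of the files that contain the pattern:
grep -l "test" *.txt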
If we are not interested in finding strings in files, but rather strings in filenames, then we should use the find program. An example would be:
find -iname "myfilename"
Here the -iname option specifies a case-insensitive search for the given filename.
Changing contents on the fly
The stream editor program (sed) is a powerful tool that lets us change file contents on the fly. Let's have a look at a simple example:
sed 's/.$//' filename
In this example we see two features. First of all, the syntax builds upon regular expressions: the first argument contains the matching expression and the replacement, separated by slashes (/), with the command before the first and the options after the third, i.e. last, slash. The second argument is the filename of the input stream. If we do not specify an output stream (or pipe the result), the output is redirected to the standard output (the shell). The example above removes the last character of every line.
A more complicated example is the following:
sed '/./=' thegeekstuff.txt | sed 'N; s/\n/ /'
Here we add line numbers to all non-empty lines. Another possibility to change file contents is the awk program. It allows us, for instance, to remove all duplicate lines from a file:
awk '!($0 in array) { array[$0]; print }' myfile
AWK is also a complete programming language, so it is possible to do very complex things with very few words.
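As a small illustration (the file numbers.txt and the use of the first column are just assumptions), the following one-liner sums up the values in the first column and prints the total at the end:
awk '{ sum += $1 } END { print sum }' numbers.txt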
Extract files
Creating and extracting tarballs is a common task, and for everyday use we only need to know three basic commands. Create a new tar archive (here named myarchive.tar) with the contents of the relative (local) directory dirname:
tar cvf myarchive.tar dirname/
Extract from an existing tar archive (here named myarchive.tar) to the current directory:
tar xvf myarchive.tar
And of course sometimes we want to have a look at the contents of a tarball first. In such cases we list the contents of an existing tar archive:
tar tvf myarchive.tar
A generic stopwatch
If we want a simple and straightforward way to measure the performance of any program, we can use the time command. However, we should note that bash provides its own built-in version of this command, which behaves differently. To get the external program we should therefore call it the following way:
/usr/bin/time
There is a list of possible arguments, but the simplest case is to pass just the target program, i.e. the application that should be measured, as an argument.
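In its simplest form this might look as follows (the measured command is just an example); the output reports the elapsed, user and system time:
/usr/bin/time sleep 2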
DNS lookup
The nslookup
tool can be used to make all sorts of DNS lookups. The first and most important command snippet
is the following:
nslookup redhat.com
This little snippet gives us the IPs, names and addresses of redhat.com. However, sometimes we want to run a specific type of query against the DNS system. One example would be:
nslookup -query=mx redhat.com
Here we additionally specified the -query parameter with the value mx (Mail eXchange). The answer might look like the following:
Server: 192.168.19.2
Address: 192.168.19.2#53
Non-authoritative answer:
redhat.com mail exchanger = 10 mx2.redhat.com.
redhat.com mail exchanger = 5 mx1.redhat.com.
Authoritative answers can be found from:
mx2.redhat.com internet address = 66.187.233.33
mx1.redhat.com internet address = 209.132.183.28
Here we see the mail exchange servers as set in the DNS system of redhat.com, together with their preferences (5 and 10); lower numbers are preferred.
Additionally we can write a lot of other queries, for example:
- soa, start of authority, which provides the authoritative information about the domain
- ns, name server, maps a domain name to a list of DNS servers
- any, to view all the available DNS records
We can also do a reverse lookup by entering an IP instead of a name. Other popular features include the specification of a port and changing the timeout interval to wait for a reply. Examples of such commands are:
nslookup -port=56 redhat.com
nslookup -timeout=10 redhat.com
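For completeness, a reverse lookup (here reusing one of the IPs from the answer above) and one of the query types from the list might look like this:
nslookup 209.132.183.28
nslookup -query=soa redhat.com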
Disk usage of files and folders
We can use the du command to retrieve information about file and folder sizes. An important parameter is -a, which shows the disk usage of all files and directories below the current location. Without it we only get information about the directories.
In order to interpret the sizes more easily we can add unit information (K for kilobytes, M for megabytes and so on). By using the -h parameter we enable human-readable output, i.e. output that includes units.
Sometimes we are only interested in the total sum. To display only the total we use the -s parameter.
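For example, to print only the human-readable total of a directory (dirname is a placeholder):
du -sh dirname/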
Since the final count (and every other value) is determined by the number of allocated blocks, we could be interested in how many blocks would be used with a different block size. Changing the block size is possible: all we need to do is use du --block-size=2048, where 2048 is the size of one block in bytes. If we combine some of the options already discussed we might end up with the following command:
du -ahc --block-size=2048
Here we display all entries in a human-readable form and also print the grand total in the output using -c.
Additionally we can tell the program to display everything in bytes (instead of blocks) and with their modification time. We can
also customize the display style or exclude certain files using a certain mask. One example would be:
du -cbha --exclude="*.txt"
Disk usage of file systems
The df command offers similar options to the du command. By default the program already gives us some valuable information on the file systems, their mount points, their disk usage, and various other things. By using the -a option we can display information about all file systems. Again we can specify the block size, here by using -B:
df -B 100
Similarly the option -h is used to tell the program that all sizes should be displayed with units, i.e. making the output human-readable. The grand total can be retrieved by using the --total parameter:
df -h --total
So far we used df to print sizes in terms of blocks. If information in terms of inodes is desired, the option -i should be used. In computing, an inode (index node) is a data structure that stores all the information about a file system object (file, directory, device node, socket, pipe, etc.), except its data content and file name.
Additionally we might want to get information about the type of file system. This is possible by using the option -T
. An
example that shows the number of inodes and the type of file system is the following:
df -Ti
As with du we can also exclude certain items from the list. Here our exclusion (or inclusion) rule is mainly focused on the type of file system. We can create a white list (only include file systems of the following type) by using the -t parameter, or a black list (exclude file systems of the following type) by using the -x option.
# only show file systems with type ext2
df -t ext2
# exclude all file systems with type ext2
df -x ext2
Information on symbols
To gather information on the symbols that are used in an object file or an executable, we can use the nm
command. By
default we are already getting a lot of interesting information from this program. We get:
- The virtual address of the symbol
- A character which depicts the symbol type; if the character is lower case the symbol is local, if it is upper case the symbol is global (external)
- Of course the name of the symbol
There are various characters that identify symbol types. A short list includes the following:
- A Global absolute symbol
- a Local absolute symbol
- B Global bss symbol
- b Local bss symbol
- D Global data symbol
- d Local data symbol
- f Source file name symbol
- L Global thread-local symbol (TLS)
- l Static thread-local symbol (TLS)
- T Global text symbol
- t Local text symbol
- U Undefined symbol
Let's have a short look at the default (very trivial) syntax of this little helper:
nm myobject.o
nm someexecutable
The default argument (if we do not specify any object or executable) is a.out. Combined with wildcards and piped to grep we can search a set of objects or executables for a set of names. Let's have a look at one example:
nm -A ./*.o | grep func
Here the -A option prefixes every line with the file name, so we search through all objects in the current directory and can still see where each symbol comes from. Additionally we only print out those results in which the name func is found.
Sometimes we have a lot of results and therefore need a way to sort them. We can use the flag -n, so that the output is sorted with the undefined symbols first and then according to the addresses. Sorting can help in the process of debugging a problem. Another way of sorting is by using --size-sort, which sorts the results by their size.
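For example, applied to the object file used above:
nm -n myobject.o
nm --size-sort myobject.o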
Information on addresses is usually not enough - what we additionally care about are sizes. By using the -S option we additionally get the size of each defined symbol. Consider the following example, which searches for dmw in all objects of the current directory:
nm -S ./*.o | grep dmw
If we want to get information only about the external symbols of an object or executable we can use the -g flag.
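Again using the object file from above as an example:
nm -g myobject.o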
Downloading files
By using the program cURL we can transfer data using the URL syntax. cURL supports various protocols like DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. Downloading a single file is as easy as:
curl http://www.google.de
Here the output is written to the command line. If we want to save the content of the file we just have to redirect it to a specific file:
curl http://www.google.de > index.html
However, curl also provides more direct ways to do this by using flags like -o or -O. While the first one expects a filename to be chosen by the user (via the command line arguments), the second one chooses a filename automatically. The choice usually depends on the filename specified in the URL.
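Reusing the URLs from this section, the two variants might look like this:
curl -o index.html http://www.google.de
curl -O http://www.gnu.org/software/gettext/manual/gettext.html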
cURL also allows us to download multiple files. All we have to do is specify the files separately, like:
curl -O URL1 -O URL2 -O URL3 -o FILENAME4 URL4 ...
The program understands the protocols it supports quite well, so it also knows about status codes of the HTTP protocol. By default redirects are not followed, i.e. in general we will not get the same result as in the browser (in the browser, going to google.com results in a redirect to a localized page such as google.de). We can, however, specify the -L option to follow HTTP redirects:
curl -L http://www.google.com
If a previous download (of a large file) stopped for some reason, then we can continue it by using the -C flag, which takes an offset or - to let curl figure out the offset itself. It is important to use the same file parameters (the same name for a manual choice, or the automatic choice again) for this to work.
Maybe the file download did not work due to some bandwidth limitation of your provider (some people just have a limited quota per day, so fully using their bandwidth might exceed that quota too early). Here we can limit the bandwidth of the download by using the --limit-rate flag:
curl --limit-rate 1000B -C - -O http://www.gnu.org/software/gettext/manual/gettext.html
In this example we set the bandwidth to 1000 bytes per second. We also continue a previous download and let the program choose the corresponding file name (gettext.html in this case).
Another nice use-case is the usage of -z
to start the download only if the file has been modified after a particular time.
By using a negative date, i.e. the date starts with a minus sign, we will start the download only if the file has been modified before a
particular time. Here is an example for starting the download only if the file has been modified after the given date (31st of December
2010):
curl -z 31-Dec-10 ftp://example.com/somefile
Some URLs are protected by an HTTP username / password scheme. Again, this can be handled with cURL parameters. Here we just use the -u option to enter username and password separated by a colon:
curl -u username:password URL
This is also needed to log in to a secured FTP server. Additionally it is possible to upload files to the server by using the option -T. We can either upload a single file (simply by specifying the local path to the file and the URL of the directory) or multiple files. Both ways are displayed below:
curl -u ftpuser:ftppass -T file ftp://example.com/
curl -u ftpuser:ftppass -T "{file1,file2,...,fileN}" ftp://example.com/
More information can be obtained by using the verbose mode (option -v) or the trace option (--trace). The latter enables a lot of interesting output to be displayed by cURL.
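For example (the trace file name is just a chosen placeholder):
curl -v http://www.google.de
curl --trace curl_trace.txt http://www.google.de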
Another really important option can be set with -x
. Here we have the ability to specify a proxy server to be used for the
request. The proxy server will then execute the request itself:
curl -x proxyserver.test.com:3128 http://www.google.de
Recursive downloads with wget
An alternative to curl is the wget program. While cURL builds upon the popular libcurl library, which provides an API for uploads and downloads over various protocols, wget is just a command-line tool without any API. There is one main advantage of using wget:
- wget supports recursive download, while curl doesn't.
On the other hand cURL supports a lot more protocols that wget lacks support for, for example SCP, SFTP, TFTP, Telnet, LDAP(S), FILE, POP3, IMAP, SMTP, RTMP and RTSP. However, both can be used to download files using FTP and HTTP(S) and to send HTTP POST requests.
The following example downloads the file and stores it under the same name as on the remote server:
wget http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
This is actually a difference to cURL, where we had to specify the -O flag to save the file under the same name as on the remote server (otherwise the transfer was redirected to the standard output, i.e. the console). The -O flag is also present in wget, but here it allows us to specify a new file name for the downloaded file.
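For instance, to store the tarball from above under a different local name (the name is a placeholder):
wget -O my-strx25.tar.bz2 http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2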
Quite similar is also the --limit-rate
flag:
wget --limit-rate=200k http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
With -c a previously cancelled download will be resumed. Additionally we can perform a download in the background by using -b.
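A combined sketch, reusing the tarball URL from above:
wget -c -b http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2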
A nice feature of wget is the possibility to send a custom user agent string to the server. Let's have a look:
wget --user-agent="Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.10.289 Version/12.00" URL
We can use this feature to mask our download as if it were performed by a (popular) web browser. Using the --spider option we can test various scenarios (a simple example follows the list):
- Checking the status before scheduling a download.
- Monitoring whether a website is available or not at certain intervals.
- Going through links from a list (like our bookmarks) to check which entries are still available.
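A minimal example of such a check (the URL is a placeholder):
wget --spider http://www.example.com/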
A great feature is the option to download a complete webpage (including external sources). This is the recursive part of the program. This can be done by entering the following command:
wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
The --mirror
flag activates the mirroring mode, while -p
downloads all external files that are included in the given
HTML page. With --convert-links
all references to external sources (images, scripts, ...) in the document will be converted to the
downloaded local version. The -P
just states that we are specifying a directory as target, not a file.
The last scenario that is easy to imagine and solve with wget is the task of downloading only certain file types, using the flags -r and -A. Usually we want to scan a webpage for a certain type of linked document and then download those linked resources.
wget -r -A.pdf http://example.com/some-page-with-pdf(s)
Here we scan the webpage for all files with the extension pdf and download the files that have been found.
Information about the process or user using a file
The fuser command allows us to identify which processes are using a particular file or directory. The most basic invocation has just a directory (which can also be the current directory, given as .) as its argument.
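The most basic invocation might therefore simply be:
fuser .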
If we perform this command we see that the output consists of process IDs followed by a character. This character indicates the type of access. The type of access can be any one of the following:
- c current directory
- e executable being run
- f open file (usually omitted)
- F open file for writing (usually omitted)
- r root directory
- m mapped file or shared library
To display detailed information in the output we have to use the verbose option -v. If we use the program on an executable instead of a directory or ordinary file, we can see which user is running the program. Another option is to look at other resources (namespaces) with the option -n. The following command looks for TCP port 5000:
fuser -v -n tcp 5000
Here we would get information about the process (name and ID) and the user running the process that is using this resource. We can also kill all processes that are using the requested file or resource. If the program socket_serv should be killed, we could do that like this:
fuser -v -k socket_serv
This is just another way to kill a process. Other ways involve kill (by PID), xkill (by clicking on a window), killall (by name) and pkill (by pattern). With fuser we can also kill processes interactively. The statement for doing this is simply:
fuser -v -k -i socket_serv
Suppose we want to forcefully delete a file that is still being used by many processes. In that case, we can use this utility to kill all the processes (or selected processes) that are using that file.
Change owner and group
The concept of owners and groups for files is fundamental to Linux. Every file is associated with an owner and a group. We can use the chown and chgrp commands to change the owner or the group of a particular file or directory.
To change the owner of a specific file one has to enter the following command:
chown OWNER FILE
This changes the owner of the file FILE to OWNER. If we want to change only the group of the file, we can do that by placing a colon in front of the group and omitting the owner:
chown :GROUP FILE
This looks quite nice and has one direct consequence: if we want to change both the owner and the group of the file, we can do that by separating the owner from the group with a colon, like:
chown OWNER:GROUP FILE
When the chown command is issued on a symbolic link to change the owner or the group, it is the referent of the symbolic link that is affected. This means that the owner and group of the original file are changed. This is the default behavior of the program. To change the link itself instead, we have to use the special -h flag.
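A short sketch (OWNER and the link name are placeholders):
chown -h OWNER somelink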
What if we want to change the ownership only if the file is currently owned by a specific user? This is possible, too, using the following syntax:
chown --from=CURRENT_OWNER OWNER FILE
Here we use the --from option to set our constraint. We can also use it with groups, or with users and groups together. The syntax is the same as in the first examples, i.e. a colon separates the user from the group.
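For example, constraining on both the current owner and the current group (all names are placeholders):
chown --from=CURRENT_OWNER:CURRENT_GROUP OWNER:GROUP FILE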
Sometimes we want to copy the owner and group from one file to another. Using the --reference option we can use any file we want as the source. Consider the following example:
chown --reference=SOURCE_FILE TARGET_FILE
Another often used option is -R, which changes the ownership of files recursively. Other quite popular options include -H, which (in combination with -R) follows command-line arguments that are symbolic links to directories, and -v, which enables the verbose mode.