This Markdown:

The purpose of this markdown is to provide problems that challenge you to apply what you have learned from the tutorial sheet.

Each section has multiple problems, each with a solution. Note that there is often more than one way to solve a problem, so if you solve something another way, that is great!

Click each tab to go to the exercises for that section. The exercises increase in difficulty from tab to tab, because each section builds on the ones before it.

How to use this Markdown

Throughout this Markdown there are prompts followed by buttons that reveal a solution. The idea is that you try to solve the problem first, then look at the solution.

For example, how would you print “Gengar is one of my favorite Pokémon” to the console?

As you go through the other sections, do your best to solve the challenges!

Basic Commands

Problem 1: Create a new file

  • What is your current working directory?
  • Change location to your tutorial directory.
  • Use a command to create a new file called sharkObservations.txt.
  • Use a command to confirm that the file was created.
Problem 2: Making directories and moving files

  • Create a new directory called sharkData.
  • Use a command to confirm the directory was created.
  • Move the sharkObservations.txt file to the directory called sharkData.
  • Change your current working directory to sharkData and confirm the file is there.
Problem 3: Removing and renaming files

  • Make sure you are in the directory called sharkData.
  • Create a new file in the sharkData directory called moreSharkObservations.txt.
  • Delete the sharkObservations.txt file from the sharkData directory.
  • Rename moreSharkObservations.txt to SharkObservations_v2.txt.
  • Confirm the file is there.
Problem 4

  • What does ls -thor do?
  • Change to any other directory on your computer.
  • Find the oldest file. How big is it?
  • Without typing the full path, go back to the previous directory you were in.
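Below is one possible way to work through this problem. The flag descriptions match GNU ls and macOS ls. /tmp is only an example of "any other directory". The start_dir variable is made up for this sketch, so you can confirm you came back.

```bash
# -t sorts by modification time (newest first), -h shows human-readable sizes,
# -o is a long listing without the group column, and -r reverses the order,
# so the oldest file is listed first (right after the "total" line)
ls -thor

start_dir=$(pwd)   # remember where we started, to confirm we came back
cd /tmp            # any other directory works
ls -thor           # oldest file at the top; its size is in the size column
cd -               # "cd -" returns to the previous directory ($OLDPWD) and prints it
```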
Problem 5

Remember that the command whoami gives you your username. Can you write a line of code that prints your username and your current directory in a sentence?

  • This one might be hard unless you first go through the later section on variables and command output.
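One possible answer uses command substitution, $( ... ). It runs a command and inserts the command's output into the string:

```bash
# $(whoami) and $(pwd) are replaced by the output of those commands
echo "My username is $(whoami) and my current directory is $(pwd)"
```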

Working with cat

Your workshop folder has a file called elementarySchoolShark.txt. We will use that for this exercise. You should check that you are in the folder that contains it before starting.

Problem 1

Move the elementarySchoolShark.txt file from your workshop examples download to your working directory.

Then display only the first 5 lines of the elementarySchoolShark.txt file.

Problem 2

Extract only the names of the sharks from the elementarySchoolShark.txt file and write them to another file called sharkNames.txt.

Then view the first two lines of the new file.

Problem 3

Extract rows 2-4 of the elementarySchoolShark.txt file and save the output as sharkFactSubset.txt.

Then view the new file.

Problem 4

You often need to mix things you just learned with things you know. For this problem, you just learned that wc -l will count the lines of something. How could you use this with cat to count the number of lines in elementarySchoolShark.txt?

After you figure it out, also count the number of lines in sharkFactSubset.txt.
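This sketch covers Problems 3 and 4 using only head, tail, cat and wc. The real elementarySchoolShark.txt is not reproduced here. The block therefore builds a hypothetical eight-line stand-in in a scratch folder, so your real file is never touched. Run the same commands on the real file.

```bash
# Work in a scratch folder with a hypothetical eight-line stand-in file
cd "$(mktemp -d)"
printf 'row %s\n' 1 2 3 4 5 6 7 8 > elementarySchoolShark.txt

# Rows 2-4: take the first 4 lines, then keep the last 3 of those
head -n 4 elementarySchoolShark.txt | tail -n 3 > sharkFactSubset.txt
cat sharkFactSubset.txt

# Piping cat into wc -l counts the lines
cat elementarySchoolShark.txt | wc -l
cat sharkFactSubset.txt | wc -l
```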

Problem 5

Append a new line of data to elementarySchoolShark.txt with Oceanic whitetip shark in the name field and this fact: Oceanic whitetip sharks are known for their distinctive long, pointed fins, which can extend up to 1.5 meters (4.9 feet) in length.

Then count the number of lines.

Also view just the names of the shark species.

Working with grep

Here are some practice problems for working with grep.

Problem 1

Let’s start simply. Search for lines that contain the word “mako”:

Problem 2

Search for lines that start with the letter “t”:

Problem 3

Let’s pretend we are searching for a sequence motif: search for lines ending with “gaagcatt” in the example.fasta file.

Note that the -B [number] option also prints [number] lines above each match, so -B 1 includes the line directly above it.
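This sketch covers Problems 2 and 3. example.fasta is not reproduced here, so the block writes a small hypothetical FASTA file. It has one sequence per line and lowercase genus_species headers, as the sample output in the loops section suggests. The -B 1 trick only returns the header if each sequence sits on a single line below it.

```bash
# Hypothetical stand-in for example.fasta (one sequence per line)
fasta=$(mktemp)
printf '%s\n' '>octopus_vulgaris' 'atgcgaagcatt' \
              '>architeuthis_dux' 'ttgacccgggaa' > "$fasta"

# ^ anchors the pattern to the start of a line: lines starting with t
grep '^t' "$fasta"

# $ anchors the pattern to the end of a line; -B 1 adds the header above
grep -B 1 'gaagcatt$' "$fasta"
```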

Problem 4

Similar to Problem 3, you can use -A [number] to include lines below a match (-A 1 includes one line below). Armed with that knowledge, can you return all the octopus sequences from the example.fasta file?
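Here is a sketch using a hypothetical FASTA file with one sequence per line. A plain substring search for "octopus" would also match a header such as tremoctopus_violaceus. Anchor the pattern (for example '^>octopus') if you only want the genus Octopus.

```bash
# Hypothetical stand-in for example.fasta (one sequence per line)
fasta=$(mktemp)
printf '%s\n' '>octopus_vulgaris' 'atgcgaagcatt' \
              '>architeuthis_dux' 'ttgacccgggaa' \
              '>octopus_bimaculatus' 'ggcatttacgat' > "$fasta"

# -A 1 prints each matching header plus the sequence line below it,
# and -i ignores case in case some headers are capitalized.
# grep prints "--" between non-adjacent groups; the second grep removes it
grep -i -A 1 'octopus' "$fasta" | grep -v '^--$'
```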

Problem 5

Building on Problem 4, we can write a more complex grep. The -v option inverts the match, so it excludes lines that match a pattern. Here are all the genera containing the stem teuthis in our FASTA file: Architeuthis, Bathyteuthis, Brachioteuthis, Chiroteuthis, Histioteuthis, Joubiniteuthis, Mastigoteuthis, Octopoteuthis, Onychoteuthis, Opisthoteuthis, Pholidoteuthis, Pterygioteuthis, Selenoteuthis, Sepioteuthis, Thysanoteuthis

Can you return all sequences from the example.fasta file except those from Octopoteuthis, Onychoteuthis, and Opisthoteuthis?
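grep -v on its own removes only the excluded header lines and leaves their sequences behind. This sketch therefore adds paste and tr to keep each header and its sequence together while filtering. It assumes single-line sequences, and the toy FASTA file is hypothetical.

```bash
# Hypothetical stand-in for example.fasta (one sequence per line)
fasta=$(mktemp)
printf '%s\n' '>octopoteuthis_sicula' 'atgcgaagcatt' \
              '>architeuthis_dux' 'ttgacccgggaa' \
              '>onychoteuthis_banksii' 'ggcatttacgat' > "$fasta"

# paste - - joins every header with the line below it (tab-separated),
# grep -v -i -E drops records that mention any of the three genera,
# and tr turns the tab back into a newline to restore the FASTA layout
paste - - < "$fasta" \
  | grep -v -i -E 'octopoteuthis|onychoteuthis|opisthoteuthis' \
  | tr '\t' '\n'
```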

Problem 6

Can you count how many sequences are in the example.fasta?

Problem 7

Can you find all the lines in a file that contain the word mouth, followed by any two words, and then the word plankton?

To do this you will need to use the -E option with grep to enable extended regular expressions (also known as extended regex or ERE). By default, grep uses basic regular expressions (BRE), which have a more limited set of pattern-matching features. With extended regular expressions, you can use additional metacharacters and quantifiers, such as + (one or more) and | (or), to specify more complex patterns.

In this case you can use .+ to stand in for a word. Strictly, .+ matches one or more of any character, including spaces, so it can also match several words; [^ ]+ is a stricter stand-in for a single word.
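The difference between .+ and [^ ]+ shows up on a couple of hypothetical example lines. The notes file and its sentences are made up for this sketch.

```bash
# Hypothetical example lines
notes=$(mktemp)
printf '%s\n' 'whale sharks open their mouth to filter plankton' \
              'the mouth of a basking shark traps plankton' \
              'mako sharks have sharp teeth' > "$notes"

# .+ can also swallow spaces, so this matches the first two lines
grep -E 'mouth .+ .+ plankton' "$notes"

# [^ ]+ (one or more non-space characters) matches exactly one word,
# so this matches only the first line
grep -E 'mouth [^ ]+ [^ ]+ plankton' "$notes"
```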

Problem 8

We can use sort with grep to return a sorted output.
Can you return a sorted elementarySchoolShark.txt output that includes only hammerhead, tiger, or mako sharks?
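Here is a sketch against a hypothetical stand-in file. The real file's layout is not shown here, so the lines below only contain shark names.

```bash
# Hypothetical stand-in for elementarySchoolShark.txt
sharks=$(mktemp)
printf '%s\n' 'Tiger shark' 'Great white shark' 'Hammerhead shark' \
              'Shortfin mako shark' 'Whale shark' > "$sharks"

# -E allows "|" (or) between alternatives; -i ignores capitalization
grep -i -E 'hammerhead|tiger|mako' "$sharks" | sort
```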

Problem 9

This one is a bit more of a challenge. Building on the above, can you count the number of A, C, T, and G bases in a FASTA file with just one line of piped grep and other commands? You will need to combine your experience from the above examples AND
use one more command, uniq, which collapses repeated entries into one. Note that uniq only collapses adjacent duplicates, so its input needs to be sorted first. You can supply options such as -c to uniq to count how many times each entry occurs.
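Here is one possible pipeline, run on a hypothetical two-record FASTA file. Lowercase and uppercase bases are counted separately. Add tr '[:lower:]' '[:upper:]' before sort if your file mixes cases.

```bash
# Hypothetical stand-in for a FASTA file
fasta=$(mktemp)
printf '%s\n' '>seq_one' 'aacg' '>seq_two' 'tta' > "$fasta"

# grep -v '^>' drops the headers, grep -o . prints each character on its own line,
# sort groups identical bases together, and uniq -c counts each group
grep -v '^>' "$fasta" | grep -o . | sort | uniq -c
```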

Working with sed

Problem 1

Write a sed command that deletes lines that include the word plates from the elementarySchoolShark.txt file. Hint: the d command deletes.

Problem 2

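Let’s build on the above. The N command appends the next line (here, the sequence below a header) to the current one, so a following d deletes both. How might you delete all the Octopus sequences from example.fasta using sed?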
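Here is a sketch assuming lowercase genus_species headers, each followed by a single sequence line. The toy file is hypothetical. Anchoring the pattern at '^>octopus' keeps genera such as Tremoctopus from matching.

```bash
# Hypothetical stand-in for example.fasta
fasta=$(mktemp)
printf '%s\n' '>octopus_vulgaris' 'atgcgaagcatt' \
              '>architeuthis_dux' 'ttgacccgggaa' > "$fasta"

# On a header starting with >octopus, N appends the next line (its sequence)
# and d then deletes the header and the sequence together
sed '/^>octopus/{N;d;}' "$fasta"
```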

Problem 3

I just discovered that my Sepia latimanus samples were misidentified and are in fact Sepia mestus! How could I rename the sequences in example.fasta using sed?
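Here is a sketch assuming headers written as sepia_latimanus. Adjust the pattern to match how the names are actually written in your file. The toy file and the "_renamed.fasta" output name are hypothetical.

```bash
# Hypothetical stand-in for example.fasta
fasta=$(mktemp)
printf '%s\n' '>sepia_latimanus' 'atgcgaagcatt' \
              '>architeuthis_dux' 'ttgacccgggaa' > "$fasta"

# s/old/new/ replaces the first match on each line. Writing to a new file
# leaves the original untouched (sed -i edits in place, but its syntax
# differs between GNU sed and macOS sed)
sed 's/sepia_latimanus/sepia_mestus/' "$fasta" > "${fasta}_renamed.fasta"
cat "${fasta}_renamed.fasta"
```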

Problem 4

In sed, y/acgtu/ACGTU/ transliterates the lowercase bases a, c, g, t, and u into their uppercase versions. Can you write a sed command that makes all the sequences uppercase, but not the sequence headers?

Problem 5

Replace all the thymine in example.fasta with uracil.
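Problems 4 and 5 can share one trick: restrict the command to lines that are not headers. The toy file below is hypothetical. Its header contains the letter t on purpose, to show that the header is left alone.

```bash
# Hypothetical stand-in for example.fasta
fasta=$(mktemp)
printf '%s\n' '>tremoctopus_violaceus' 'atgcttag' > "$fasta"

# The address /^>/ selects header lines and ! inverts it, so the
# command only runs on sequence lines and the headers keep their case
sed '/^>/!y/acgtu/ACGTU/' "$fasta"

# Same idea for thymine -> uracil, for lowercase and uppercase bases
sed '/^>/!y/tT/uU/' "$fasta"
```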

Variables

Problem 1

Declare a variable called hobby and assign a hobby you like to it. Then, print out a message that says “Hello, I have many hobbies including [your hobby]”.

Problem 2

Declare a variable called age and assign your age to it. Then, print out a message that says “In 5 years I will be [age+5] years old”.
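Here is one possible answer to Problems 1 and 2. The hobby and age values are placeholders to replace with your own.

```bash
hobby="snorkeling"   # placeholder: use your own hobby
echo "Hello, I have many hobbies including $hobby"

age=30               # placeholder: use your own age
# $(( ... )) performs integer arithmetic
echo "In 5 years I will be $((age + 5)) years old"
```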

Problem 3

Declare a variable called PI and assign the value of pi (3.14159) to it. Then, calculate the circumference of a circle with a radius of 5 using the formula C = 2 * PI * r.
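Bash arithmetic only handles integers, so this sketch hands the floating-point math to awk. bc also works if it is installed. The radius and circumference variable names are chosen for illustration.

```bash
PI=3.14159
radius=5
# awk does the floating-point math; print uses awk's default number format
circumference=$(awk -v pi="$PI" -v r="$radius" 'BEGIN { print 2 * pi * r }')
echo "The circumference is $circumference"
```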

Problem 4

Make a variable that points to another folder, and then list all the files in that folder, including their sizes, ordered from oldest to newest.

Problem 5

Create a variable called sequence and another variable called fasta. Assign the sequence “gaag” to sequence and the name of the example.fasta file to fasta. Search the FASTA file using the variables and return the matching sequences and their headers.
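Here is a sketch that runs in a scratch folder so it cannot overwrite your real example.fasta. The toy file contents are hypothetical, with one sequence per line.

```bash
# Scratch folder with a hypothetical example.fasta
cd "$(mktemp -d)"
printf '%s\n' '>octopus_vulgaris' 'atgcgaagcatt' \
              '>architeuthis_dux' 'ttgacccgggaa' > example.fasta

sequence="gaag"
fasta="example.fasta"
# Quote the variables so grep receives each one as a single argument;
# -B 1 also prints the header above each matching sequence line
grep -B 1 "$sequence" "$fasta"
```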

Problem 6

Create an array named fishes that contains five different fish. Then, print the length of the array.

Problem 7

Create an array named fishes that contains five different fish. Then, print the first fish in the array.

Problem 8

Create an array named fishes that contains five different fish. Then, sort it alphabetically.
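Here is one way to handle Problems 6-8 with the same array. The fish names are arbitrary examples.

```bash
fishes=("salmon" "tuna" "cod" "guppy" "betta")

# ${#fishes[@]} is the number of elements
echo "${#fishes[@]}"                 # 5
# Bash arrays are zero-indexed, so the first fish is element 0
echo "${fishes[0]}"                  # salmon
# printf prints one element per line so sort can order them alphabetically
printf '%s\n' "${fishes[@]}" | sort
```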

Conditional statements

Problem 1

Write a script that takes an integer as input and determines whether it is even or odd.
To take user input, use this line: read -p "Enter an integer: " num
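This sketch puts the check in a function (the name check_parity is made up here) so the logic can be tried without typing input. The read line from the problem is shown in a comment for interactive use, for example in a script saved as evenOdd.sh.

```bash
# Prints whether the value passed as the first argument is even or odd
check_parity() {
  local num=$1
  # Reject anything that is not an optional minus sign followed by digits
  if ! [[ $num =~ ^-?[0-9]+$ ]]; then
    echo "$num is not an integer"
  elif (( num % 2 == 0 )); then
    echo "$num is even"
  else
    echo "$num is odd"
  fi
}

# Interactive use in a script:
#   read -p "Enter an integer: " num
#   check_parity "$num"
check_parity 7
```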

Problem 2

Write a conditional statement that checks whether elementarySchoolShark.txt exists and is readable. If yes, print “The file exists and is readable.” If not, print “The file does not exist or is not readable.”

Problem 3

Write a conditional statement that checks whether evenOdd.sh, which we created in Problem 1, is executable. If yes, print “The file is executable.” If not, print “The file is not executable.”
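Here is a sketch for Problems 2 and 3. The checks are wrapped in two functions so any file name can be passed in. The names check_readable and check_executable are hypothetical.

```bash
# -f: exists and is a regular file; -r: readable by you
check_readable() {
  if [[ -f $1 && -r $1 ]]; then
    echo "The file exists and is readable."
  else
    echo "The file does not exist or is not readable."
  fi
}

# -x: executable by you (-f excludes directories, which also pass -x)
check_executable() {
  if [[ -f $1 && -x $1 ]]; then
    echo "The file is executable."
  else
    echo "The file is not executable."
  fi
}

check_readable elementarySchoolShark.txt
check_executable evenOdd.sh
```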

Problem 4

Write a conditional statement that checks whether a FASTA file has sequences. If it does, print a message that includes the number of sequences; if not, print a message saying it is empty. You can use example.fasta for this.
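This sketch counts header lines (those starting with ">") as sequences. It also reports a missing file separately, which goes slightly beyond the problem. The function name describe_fasta is made up here.

```bash
describe_fasta() {
  local fasta=$1 count
  if [[ ! -f $fasta ]]; then
    echo "$fasta does not exist"
  else
    # grep -c counts header lines; it exits with status 1 when it finds none
    count=$(grep -c '^>' "$fasta" || true)
    if (( count > 0 )); then
      echo "$fasta contains $count sequences"
    else
      echo "$fasta is empty"
    fi
  fi
}

describe_fasta example.fasta
```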

Problem 5

Create a variable that contains a filename.
Then write a conditional statement that checks whether the file has the extension .fasta.
If it does, print a message that includes the filename and says so.
If it does not, print a message with the filename saying that it does not, and also fix the extension to be .fasta.

You can create a file for this, or use an existing file if you don’t mind its extension potentially being changed.

Problem 6

This one kicks it up a notch. Create a file called squid.seq.s1.r.fa (that’s a lot of dots!) with some sequence data. Then write a conditional statement that checks whether the file has the extension .fasta, but do this in a way that looks only at the text after the last “.”.

To do this we will learn two new things: “%” and “##”.

  • The % character signals that we want to remove a suffix pattern from the variable value. It removes the shortest matching suffix.

  • In contrast, the ## characters signal that we want to remove the longest matching prefix pattern from the variable value. (For completeness, # removes the shortest matching prefix and %% the longest matching suffix.)

    Note that this only works with shell variables, because it is part of the shell's parameter expansion.

    You normally use syntax like this to use these characters:

filename="example.file.txt"
# ##*. removes the longest prefix ending in "." (everything up to the last dot)
extension="${filename##*.}"
echo "The file extension is $extension"

With that massive hint, see if you can check whether a file is a .fasta, rename it if necessary, and print out a message. This is nice because you don’t need to worry about how many “.” characters a filename contains.
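This sketch works on a hypothetical test file in a scratch folder. It combines ## (to read the extension) with % (to strip only the last extension before renaming). A name with no dot at all would come back unchanged from ${filename##*.}, so add a check for that if it matters to you.

```bash
# Hypothetical test file in a scratch folder (several dots in its name)
dir=$(mktemp -d)
filename="$dir/squid.seq.s1.r.fa"
printf '%s\n' '>squid_example' 'atgcgaagcatt' > "$filename"

# ##*. removes the longest prefix ending in "." -> the text after the last dot
extension="${filename##*.}"

if [[ $extension == "fasta" ]]; then
  echo "$filename already has the .fasta extension"
else
  # %.* removes the shortest suffix starting with "." -> only the last extension
  newname="${filename%.*}.fasta"
  mv "$filename" "$newname"
  echo "$filename had the .$extension extension, so it was renamed to $newname"
fi
```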

Loops

Problem 1

Create an array of numbers. Find the average of all the numbers in the array and print it.
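Here is one possible solution. The loop adds up the numbers with bash integer arithmetic. Because bash can only do integer division, awk does the final division to two decimal places. The numbers are arbitrary examples.

```bash
numbers=(4 8 15 16 23 42)

sum=0
for n in "${numbers[@]}"; do
  sum=$(( sum + n ))
done

# Bash only does integer division, so awk computes the decimal average
average=$(awk -v s="$sum" -v c="${#numbers[@]}" 'BEGIN { printf "%.2f", s / c }')
echo "The average is $average"
```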

Problem 2

Create an array called fruits that includes: “apple” “banana” “orange” “kiwi” “mango”. Loop through this array of strings, count the length of each string, and print the length of each string to the terminal.

Problem 3

Use a loop to return the name of each sequence and the total of each base (how many A, C, T, and G). Your output for each species should look like this:
Sequence: “tremoctopus_violaceus”
Number of A’s: 47
Number of T’s: 38
Number of C’s: 22
Number of G’s: 23

This one is challenging, try doing bits of the loop at a time and building out. For example return all the headers or all the sequence values first. Then put it together.
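Here is a sketch of one approach. It assumes each sequence sits on a single line under its header, so the loop reads lines in pairs. The function name base_report and the toy file are hypothetical. Each count uses ${seq//[^a]/}, which deletes every character that is not an "a", so the length of what remains is the count.

```bash
base_report() {
  local header seq a c g t
  # Read two lines at a time: a header, then its sequence
  while read -r header && read -r seq; do
    seq=$(printf '%s' "$seq" | tr '[:upper:]' '[:lower:]')
    a=${seq//[^a]/}; t=${seq//[^t]/}; c=${seq//[^c]/}; g=${seq//[^g]/}
    echo "Sequence: \"${header#>}\""
    echo "Number of A's: ${#a}"
    echo "Number of T's: ${#t}"
    echo "Number of C's: ${#c}"
    echo "Number of G's: ${#g}"
  done < "$1"
}

# Hypothetical stand-in for example.fasta
fasta=$(mktemp)
printf '%s\n' '>tremoctopus_violaceus' 'aacgt' '>architeuthis_dux' 'GGTAC' > "$fasta"
base_report "$fasta"
```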

Problem 4

Create an array of three directories on your computer. Loop through these and print the number of files in each along with the name.
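Here is a sketch using find. It counts only regular files directly inside each folder, including hidden ones, which ls skips by default. The function name report_file_counts and the three example folders are assumptions; replace them with your own.

```bash
# Prints how many regular files sit directly inside each directory given
report_file_counts() {
  local dir count
  for dir in "$@"; do
    # -maxdepth 1 stays in the top level; -type f skips subdirectories
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "$dir: $count files"
  done
}

# Example directories; replace them with three folders on your computer
dirs=("$HOME" "/tmp" "/usr/bin")
report_file_counts "${dirs[@]}"
```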

Problem 5

You can use $RANDOM to generate random numbers. Generate an array of random numbers, then loop through the array and print the numbers in descending order.
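Here is one possible approach. The array is filled in a loop, sorted numerically in reverse with sort -nr, and then printed in another loop. Five numbers is an arbitrary choice.

```bash
numbers=()
for i in 1 2 3 4 5; do
  numbers+=("$RANDOM")   # $RANDOM is an integer from 0 to 32767
done

# sort -n sorts numerically and -r reverses, giving descending order
for n in $(printf '%s\n' "${numbers[@]}" | sort -nr); do
  echo "$n"
done
```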

Problem 6

Make an array of three directories that contain files with the same extension, and then print the files with that extension within those directories and all their subdirectories. I’ll use .pdf for this.

To do this we can use the find command to search for .pdf files in each directory and its subdirectories. The find command takes several options, including -type f to search only for files (not directories), -name "*.pdf" to match files with a .pdf extension, and -print to print the names of the matching files to the console.
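Here is a sketch. The function name list_pdfs and the three folders are examples to replace with your own. -name is case-sensitive, so use -iname to also catch .PDF files.

```bash
list_pdfs() {
  local dir
  for dir in "$@"; do
    [[ -d $dir ]] || continue     # skip folders that don't exist
    echo "PDF files in $dir:"
    # find searches the folder and all its subdirectories
    find "$dir" -type f -name "*.pdf" -print
  done
}

# Example folders; replace them with your own
dirs=("$HOME/Documents" "$HOME/Downloads" "$HOME/Desktop")
list_pdfs "${dirs[@]}"
```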

Problem 7

We need to loop through two arrays and print only colors that start with “r” and sizes that contain the letter “m”.

Your arrays are: colors=("red" "red" "red" "red" "ruby" "ruby" "ruby" "ruby" "blue" "green" "yellow" "orange" "purple" "pink" "teal" "cyan" "magenta" "navy" "olive" "maroon")

sizes=("small" "small" "medium" "large" "large" "medium" "small" "medium" "large" "small" "medium" "large" "small" "large" "medium" "medium" "small" "large" "large" "medium")
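This sketch reads the problem as pairing the two arrays by position, since both have 20 entries. If you instead want to filter each array on its own, use two separate loops. Note that both "small" and "medium" contain an m. The function name filter_pairs is made up here.

```bash
colors=("red" "red" "red" "red" "ruby" "ruby" "ruby" "ruby" "blue" "green"
        "yellow" "orange" "purple" "pink" "teal" "cyan" "magenta" "navy" "olive" "maroon")
sizes=("small" "small" "medium" "large" "large" "medium" "small" "medium" "large" "small"
       "medium" "large" "small" "large" "medium" "medium" "small" "large" "large" "medium")

filter_pairs() {
  local i
  # "${!colors[@]}" expands to the indices 0..19, so the same index
  # picks the matching entry from both arrays
  for i in "${!colors[@]}"; do
    if [[ ${colors[i]} == r* && ${sizes[i]} == *m* ]]; then
      echo "${colors[i]} ${sizes[i]}"
    fi
  done
}

filter_pairs
```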

Functions

Problem 1

Write a function “add_numbers” that takes two numbers as arguments and returns (prints) their sum.
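Here is one possible solution. In bash, return only sets an exit status (0-255). The usual way to hand back a value is therefore to print it and capture it with $( ).

```bash
add_numbers() {
  echo $(( $1 + $2 ))
}

total=$(add_numbers 3 4)
echo "The sum is $total"
```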

Problem 2

Write a function “reverse_string” that takes a string as an argument and prints it in reverse order. Hint: the rev command can reverse a printed string.
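Here is one possible solution using rev, as the hint suggests. rev ships with macOS and most Linux distributions, though some minimal images leave it out.

```bash
reverse_string() {
  # printf is safer than echo for arbitrary strings (e.g. ones starting with -)
  printf '%s\n' "$1" | rev
}

reverse_string "hammerhead"
```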

Problem 3

Write a function “is_palindrome” that takes a string as an argument, checks whether it is a palindrome, and prints a message saying whether it is or not.
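Here is one possible solution that reuses rev. Lowercasing first, so that "Racecar" counts, is a design choice the problem does not require.

```bash
is_palindrome() {
  local lower reversed
  lower=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
  reversed=$(printf '%s\n' "$lower" | rev)
  # Quoting the right-hand side makes == a plain string comparison
  if [[ $lower == "$reversed" ]]; then
    echo "$1 is a palindrome"
  else
    echo "$1 is not a palindrome"
  fi
}

is_palindrome "Racecar"
is_palindrome "shark"
```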

Problem 4

Write a function that takes a string as an argument and converts all the characters to uppercase.

Problem 5

Write a function that calculates the GC content of a FASTA file.

Problem 6

Write a function (or functions) that reverse complements a FASTA file. There are multiple solutions to this. I will supply a two-function solution; you are welcome to try alternative approaches.
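Here is an independent two-function sketch, not the author's solution. One function reverse-complements a sequence read from standard input. The other walks the FASTA file and joins wrapped sequence lines before reversing them. The function names and the toy file are hypothetical, and characters other than A, C, G, and T are passed through unchanged.

```bash
# Reads a DNA sequence on stdin and prints its reverse complement
revcomp_seq() {
  rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Prints each record of a FASTA file with its sequence reverse-complemented
revcomp_fasta() {
  local line header="" seq=""
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == ">"* ]]; then
      if [[ -n $header ]]; then           # finish the previous record
        printf '%s\n' "$header"
        printf '%s\n' "$seq" | revcomp_seq
      fi
      header=$line
      seq=""
    else
      seq+=$line                          # join wrapped sequence lines
    fi
  done < "$1"
  if [[ -n $header ]]; then               # finish the last record
    printf '%s\n' "$header"
    printf '%s\n' "$seq" | revcomp_seq
  fi
}

# Hypothetical stand-in FASTA file (the first sequence is wrapped over two lines)
fasta=$(mktemp)
printf '%s\n' '>s1' 'aacg' 'ta' '>s2' 'GGCA' > "$fasta"
revcomp_fasta "$fasta"
```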

Working with awk

Try to remember everything you have seen:

Problem 1

Write an awk command that prints the first column of a file separated by a hyphen using the file elementarySchoolShark.txt

Problem 2

Write an awk command that counts the number of lines in a file, using elementarySchoolShark.txt.

Problem 3

Write an awk command that prints the lines of elementarySchoolShark.txt that contain both “shark” and “goblin”.
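Here is a sketch for Problems 2 and 3, run against a hypothetical stand-in file. Lowercasing each line with tolower makes "Goblin shark" match as well as "goblin shark".

```bash
# Hypothetical stand-in for elementarySchoolShark.txt
sharks=$(mktemp)
printf '%s\n' 'Goblin shark' 'Tiger shark' 'Whale shark' > "$sharks"

# NR is the current record (line) number, so at END it holds the line count
awk 'END { print NR }' "$sharks"

# Two conditions joined with && must both be true for the line to be printed
awk 'tolower($0) ~ /shark/ && tolower($0) ~ /goblin/' "$sharks"
```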

Problem 4

Use awk to print the number of sequences in a FASTA file.

Problem 5

Print the length of each sequence in a FASTA file.

Problem 6

This builds on the last example. Print the sequence ID and length of each sequence in a FASTA file.
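Here is a sketch covering Problems 5 and 6. Each header starts a new record, and sequence lines add to a running length, so wrapped sequences are handled too. The wrapper function seq_lengths and the toy file are hypothetical.

```bash
seq_lengths() {
  # At each header, print the previous record's ID and length, then reset
  awk '/^>/ { if (id != "") print id "\t" len; id = substr($0, 2); len = 0; next }
       { len += length($0) }
       END { if (id != "") print id "\t" len }' "$1"
}

# Hypothetical stand-in FASTA file (the first sequence is wrapped)
fasta=$(mktemp)
printf '%s\n' '>seq_one' 'aacg' 'tt' '>seq_two' 'ggc' > "$fasta"
seq_lengths "$fasta"
```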

Problem 7

Print the GC content for each sequence in a FASTA file. To do this, use printf with this format string, in logic like the above: "%s\t%.2f\n"

  • %s is a placeholder for a string value

  • \t inserts a tab character

  • %.2f is a placeholder for a floating-point number with two decimal places

  • \n inserts a newline character.

So, %s\t%.2f\n formats a string followed by a tab character and a floating-point number with two decimal places, and ends with a newline character.
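Here is a sketch that follows the same record logic as the sequence-length example. It assumes GC content should be reported as a percentage of each sequence's length; drop the 100 * to get a fraction instead. gsub returns the number of replacements it made, which is used here as the G/C count. The wrapper name gc_content and the toy file are hypothetical.

```bash
gc_content() {
  awk '
    function report() { if (id != "") printf "%s\t%.2f\n", id, (len ? 100 * gc / len : 0) }
    /^>/ { report(); id = substr($0, 2); gc = 0; len = 0; next }
    { len += length($0); gc += gsub(/[GCgc]/, "") }
    END { report() }
  ' "$1"
}

# Hypothetical stand-in FASTA file
fasta=$(mktemp)
printf '%s\n' '>seq_one' 'GGCC' 'AT' '>seq_two' 'atat' > "$fasta"
gc_content "$fasta"
```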

File compression

File compression is common. There are no practice problems for this section, but review this cheat sheet by Rick White from his introduction to scripting course at UNC Charlotte if you ever get stuck.

# Compress a file
gzip <filename>
# results in <filename>.gz

# Uncompress a file compressed by gzip (.gz)
gunzip <filename>.gz
# results in <filename>

# Compress a folder into a gzip-compressed tar archive (.tar.gz)
tar -cvzf <foldername.tar.gz> <foldername>
# Arguments are
#   -z: Compress the archive with gzip.
#   -c: Create a new .tar archive file.
#   -v: Verbosely show the .tar file progress.
#   -f: Use the archive file name that follows (keep -f last in the group).

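# List the contents of a .tar.gz archive
tar -tvf <foldername.tar.gz>

# Print a gzipped file without decompressing it on disk
# (gzcat is available on macOS; on Linux, zcat or gzip -dc do the same)
gzcat <filename.gz> | more
gzcat <filename.gz> | less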

# Uncompress and extract a .tar.gz archive
tar -zxvf <foldername.tar.gz>
# Arguments are
#   -z: Decompress with gzip.
#   -x: Extract files from the archive.
#   -v: Verbosely show the .tar file progress.
#   -f: Use the archive file name that follows.

# Compress a folder into a bzip2-compressed tar archive (.tar.bz2, more compression than gzip)
tar -cvjf <foldername.tar.bz2> <foldername>
# Arguments are
#   -j: Compress the archive with bzip2.
#   -c: Create a new .tar archive file.
#   -v: Verbosely show the .tar file progress.
#   -f: Use the archive file name that follows.

# Uncompress and extract a .tar.bz2 archive
tar -xjvf <foldername.tar.bz2>
# Arguments are
#   -j: Decompress with bzip2.
#   -x: Extract files from the archive.
#   -v: Verbosely show the .tar file progress.
#   -f: Use the archive file name that follows.