We will start our transcriptomic analysis with the first tool, TopHat. TopHat is used to align reads against a reference genome, allowing for spliced alignments. We will illustrate it with a data set, which I'm going to show you. The data set consists of three replicates each for the control and for a test condition. Let's call it the disease data set. Each one of these samples consists of [INAUDIBLE]. Let's see how long the reads are: zcat Data/Ctrl1_1.fastq.gz, and let's look only at the top ten lines. So these are fairly short ones. They are 48 base pairs long. So we have two conditions, with three replicates each. They are [INAUDIBLE]. And we will have to align them with TopHat.
So first, let's see what the options are and what the command line usage is for TopHat. We do so just like we did during the previous lecture: we'll run tophat and store the output in a log file, so we can view it. Okay. As you can see here, the basic idea is that you use tophat, followed by a number of options, followed by a Bowtie index, just as we used with Bowtie, and then the read sets: the reads for mate number one, for mate number two, and so on.
Some of the options that one might find relevant are related to, for instance, the minimum intron length, the maximum intron length, and the minimum length of an anchor. So they refer to the alignment. Then a particularly useful option is the number of threads, -p. By default, TopHat is single threaded, so if you want to speed up the calculation, you can use multiple threads. Then we can give it a transcriptome annotation with -G, that is, a GTF file of annotations, as well as a transcriptome index. We can also specify known junctions to help guide the discovery of additional splice junctions. And then, if we have information about the insert distance, the distance between the mate pairs, we can provide that for increased specificity as well. So -r is the average inner distance between mates, and then we can specify the standard deviation as well.
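
The read-length check described above can be sketched roughly as follows. This is only an illustration: the FASTQ file here is a tiny synthetic stand-in created on the spot (the file name and read names are hypothetical), and `gzip -dc` is used as a portable equivalent of `zcat`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for Data/Ctrl1_1.fastq.gz: three synthetic 48-bp reads.
workdir=$(mktemp -d)
fq="$workdir/demo_1.fastq.gz"
seq48=$(printf 'ACGT%.0s' $(seq 1 12))     # 12 x "ACGT" = 48 bases
qual48=$(printf 'IIII%.0s' $(seq 1 12))    # 48 matching quality characters
for i in 1 2 3; do
    printf '@read%s/1\n%s\n+\n%s\n' "$i" "$seq48" "$qual48"
done | gzip -c > "$fq"

# Same idea as zcat piped into head: show only the top ten lines.
gzip -dc "$fq" | head -n 10

# A FASTQ record is four lines; the sequence is the second line of each record.
read_len=$(gzip -dc "$fq" | awk 'NR % 4 == 2 { print length($0); exit }')
echo "read length: $read_len"
```

On the real data, the same two pipelines would simply be pointed at Data/Ctrl1_1.fastq.gz instead of the synthetic file.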
In general, the defaults are fairly well suited. And then there are some options for finding splice junctions and so on. So there's a fairly large set of options. However, in general, we only need a few, because most of these options have already been calibrated for the best performance.
So let's see what we can remember. I'm going to say head tophat.log. We would like to align [INAUDIBLE]. And let's try to do that for this one. We would have to write tophat, or tophat2, then a number of options, let's say -p 10. Then maybe we would like to write an output directory, and let's say that Test1.tophat is just the name. Then the [INAUDIBLE] index, /data1/igm3 and so on. As you can see, it's very easy to get lost in the details, especially when we start putting in the read files and we have to write the full paths.
So an alternative to just typing the command on the command line is to create a batch file, a very simple shell script. Let me show you how you can do that. Let's open something called com.tophat, so it'll be a shell script, and we would like to write a command here. First of all, let's create the output directory: mkdir. We would like to create a directory that's called, let's say, Tophat. First of all, let me show you where we are. We're in Coursera, which has Data and Results, a little bit of cheating there. And then I'm going to make a directory Tophat. Okay, and that's where I'm going to put the results for each of the six samples, separately, each in its own directory. Okay. So we're going to make this directory, this one under Tophat.
And now we'll write tophat2. And we make the directory with -p, so that in case it already exists, mkdir does not complain. Then the output directory would be this directory. Okay. We would like ten threads, to make this multi-threaded. Let's say that we want to set the maximum number of hits, max multihits. Okay? Let's say ten; by default it is 20.
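
As a rough sketch of the directory setup just described, assuming one subdirectory per sample under Tophat: Ctrl1 and Test1 are names that appear in the lecture, while the other four sample names and the temporary base directory are assumptions made only for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical layout: a Tophat/ directory with one subdirectory per sample.
# Ctrl1 and Test1 appear in the lecture; the other four names are assumed.
base=$(mktemp -d)          # stand-in for the working directory
for sample in Ctrl1 Ctrl2 Ctrl3 Test1 Test2 Test3; do
    mkdir -p "$base/Tophat/$sample"
done

# Running mkdir -p again is harmless: an existing directory is left untouched
# and no error is reported.
mkdir -p "$base/Tophat/Test1"

n_dirs=$(ls "$base/Tophat" | wc -l)
echo "sample directories: $n_dirs"
```

Without -p, the second mkdir on Test1 would fail with a "File exists" error, which is why the flag is convenient in a script that may be rerun.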
We can split the line if it becomes too long. We want to give it an annotation file. And then, remember that, okay, data1/igm3. These are fairly long file names, as you can see. So that's the annotation. We also have to provide an index, the [INAUDIBLE] index. And that comes from the same location. Okay. As you can tell, it is very tedious to just type these things out.
If we know the information about the inner mate distance, the average, let's put it here. Let's say that it's 120. And next, the standard deviation. Let's say that that's 30, but these are generally not needed. Okay? We have to give the Bowtie index: data1, and I remember this one, igm3. Okay, and lastly, we're going to give the data sets, okay? And this is going to be Test1_1.fastq.gz, and the second file. So then we're going to take a look at this and see how we can improve it, how we can make it easier.
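
Putting the pieces together, a script like com.tophat might look roughly like the sketch below. The option values (10 threads, 10 multihits, inner distance 120, standard deviation 30) come from the lecture; every path is a hypothetical placeholder, since the real locations are not spelled out in full, and the name of the second read file is assumed. The sketch only prints the command unless RUN_TOPHAT=1 is set, which is an illustrative switch and not part of TopHat.

```shell
#!/usr/bin/env bash
set -eu

# All paths below are hypothetical placeholders; substitute the real locations.
outdir="Tophat/Test1"                      # one output directory per sample
annotation="/path/to/genes.gtf"            # GTF annotation file (assumed name)
tx_index="/path/to/transcriptome/known"    # transcriptome index prefix (assumed)
bowtie_index="/path/to/bowtie2/genome"     # Bowtie index prefix (assumed)
reads1="Data/Test1_1.fastq.gz"             # mate 1
reads2="Data/Test1_2.fastq.gz"             # mate 2 (file name assumed)

# -p makes mkdir succeed even if the directory already exists.
mkdir -p "$outdir"

# Collect the command as positional parameters so it can be printed or run.
# Backslashes split the long line, as mentioned in the lecture.
set -- tophat2 -o "$outdir" -p 10 -g 10 \
    -G "$annotation" "--transcriptome-index=$tx_index" \
    -r 120 --mate-std-dev 30 \
    "$bowtie_index" "$reads1" "$reads2"

cmd="$*"
echo "$cmd"

# Dry run by default; set RUN_TOPHAT=1 to actually launch the alignment.
if [ "${RUN_TOPHAT:-0}" = "1" ]; then
    "$@"
fi
```

Keeping the paths in variables at the top means that only a few lines have to change from one sample to the next, which is one way to reduce the typing the lecture complains about.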