In this section, we will consider the use of graphs in public health and epidemiologic work. Graphs are one of the most widely used forms of data visualization in public health and epidemiology. Why do we use graphs? They display data in visual form. They allow us to highlight patterns and differences that are harder to discern when the data are presented as a list or a table. They are an effective way to communicate with a wide variety of audiences. As public health professionals, we have a responsibility to convey data in a way that is accessible to our target audience, but also in a way that is not biased or misleading.
Therefore, consider the following best practices in creating a graph. Every line or shape should have a purpose; everything on your graph should convey something. Ensure that labels and titles are all clear and self-explanatory. All symbols and abbreviations should be defined. Consider your message, and make an effort to convey this message in a straightforward and objective manner. Keep in mind that decisions made in the type or style of graph can fundamentally change the story that is conveyed by the graph or figure.
The important components of a graph are as follows. A title that clearly delineates the data being depicted. A title for every axis, letting the audience know what is being shown on each of the axes. A legend that describes the different components that are depicted on the graph. And data sources: if these are not indicated in the title or in the legend, they must be placed in a separate text box. Essentially, the graph should be able to stand alone. If it is pulled away from your manuscript or your report, someone should know what the graph is, where the data come from, and what story you are trying to tell.
There are different types of graphs, and different types can highlight different points or trends in your data.
Compare the two graphs depicted here, the line graph at the top and the bar graph at the bottom, and think about the differences in the messages that are conveyed by each. Keep in mind, however, that certain types of graphs may not be appropriate for certain types of data. Is it appropriate to connect the dots across the different series in a line graph, for example? When I consider these two graphs, the bar graph strikes me as an excellent way to provide comparisons among the series at each point across time, or whatever it is that the x-axis is depicting. The line graph, in contrast, is an excellent way of showing how the data connect across the x-axis categories. If these categories of 1, 2, 3, 4 and 5 are different points in time, it may be very appropriate to connect these dots and show that as a trend. However, if they represent discrete things, such as, say, types of cars, it may not be appropriate to connect the dots, because the space in between two categories has no meaning.
Many graphs use dual axes. What do we mean by a dual axis? Take a look at the example provided here. On the left side is the first y-axis, where you can see data depicted in thousands. In this graph, that is a depiction of violent crime events. On the right side is the second y-axis. Here you see a different scale, for the number of practitioners, and that is displayed as the red line across this graph. Why might one use a dual axis, and in what cases is using a dual axis appropriate? A dual axis is a useful way to depict two very different types of data, for example, if you have opposite trend directions or, as in this case, very different scales. You can see the y-axis on the left is in the thousands, while the y-axis on the right is in single and low double digit numbers. It would not be possible to even see the red line if we tried to show it on the same axis as the bars that are plotted against the left-side y-axis.
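To put rough numbers on why the second axis is needed, here is a minimal sketch in Python. The values are hypothetical, chosen only to mimic the scales just described (one series in the thousands, one in single and low double digits), and the helper name `height_fraction` is made up for this illustration. It computes how much of the plot height the smaller series would reach on a shared axis compared with an axis of its own.

```python
# Hypothetical values, chosen only to mimic the two scales in the example.
events = [3200, 2900, 3100, 2700, 2500]   # bar series, values in the thousands
practitioners = [4, 7, 9, 12, 15]         # line series, single and low double digits


def height_fraction(values, axis_max):
    """Fraction of the axis height reached by the largest value in a series."""
    return max(values) / axis_max


# On a shared axis, the axis must be tall enough for the larger series.
shared_max = max(max(events), max(practitioners))
shared = height_fraction(practitioners, shared_max)

# On its own (second) axis, the smaller series can use the full height.
own = height_fraction(practitioners, max(practitioners))

print(f"shared axis: line peaks at {shared:.1%} of the plot height")
print(f"own axis:    line peaks at {own:.1%} of the plot height")
```

With numbers like these, the line would sit in well under 1 percent of the plot height on a shared axis, which is why it would be effectively invisible there.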
Another interesting component that is often used in graphs is the broken axis. Take a look at the two graphs depicted here. They depict the same data, but the top figure does not use a broken axis while the bottom one does. Why did the graph maker choose to break the y-axis in the bottom graph? Clearly, in the top graph the value in the first column is dwarfing the values in the subsequent categories. As you can see, once you see the value for the first category, the loggerhead turtle, you can barely discern any differences among the subsequent three bars for the desert tortoise, the painted turtle, and the snapping turtle. However, consider your audience. Is the broken axis clear, or does it convey a misleading message? In the bottom graph, it is much easier to see the differences between the bars for the desert tortoise, painted turtle and snapping turtle. However, an individual who does not fully understand the broken axis might think that the difference between the loggerhead turtle and the desert tortoise is actually not that large.
Let's think a little bit more about the axes themselves. The first question: does starting at 0 matter? Look at the top graph carefully. This is a graph that shows the average number of weekly hours worked in one's main job in different countries. You can see that they're highlighting the number of hours worked, for example, in Germany, and comparing that to the number of hours worked in, say, Italy or France. And you can see visually that there might be quite a difference in the number of hours worked across the different countries. Now, the bottom graph has taken the exact same data, but redrawn it with an x-axis that starts at zero. Does your inference change? For me, it certainly does. The difference in the number of hours worked in, say, Romania, the UK and Germany compared with Italy or France doesn't look nearly as marked in the bottom graph as it does in the top one.
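The effect of where the axis starts can be sketched with a little arithmetic. In the Python sketch below, the two values and the truncated starting point are hypothetical and are not read from the graphs shown here; the function name `drawn_ratio` is also made up for this illustration. It compares how many times longer one bar is drawn than another when the axis starts at zero and when it starts just below the smaller value.

```python
# Hypothetical weekly hours for two countries; not taken from the actual graph.
hours_low = 35.0
hours_high = 41.0


def drawn_ratio(high, low, axis_start):
    """How many times longer the bar for `high` is drawn than the bar for `low`."""
    return (high - axis_start) / (low - axis_start)


ratio_from_zero = drawn_ratio(hours_high, hours_low, axis_start=0.0)
ratio_truncated = drawn_ratio(hours_high, hours_low, axis_start=34.0)

print(f"axis starts at 0:  longer bar is {ratio_from_zero:.2f} times the shorter one")
print(f"axis starts at 34: longer bar is {ratio_truncated:.2f} times the shorter one")
```

When the axis starts at zero, the drawn ratio equals the ratio of the underlying values. When the axis is truncated, the same two values are drawn several times further apart than the data themselves would suggest.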
You are conveying the same information, but ultimately passing along a very different message in the two graphs.
Now, can axes be inverted? Look at this graph. Stepping back, this is a graph that looks at gun deaths in Florida from the early 1990s through the mid 2010s. You can see that they have highlighted a change in the year 2005, when a particular piece of legislation was enacted. When you look at this graph, what inferences do you make about the impact of the highlighted legislation on firearm homicides? Now, please note that the y-axis is inverted. Does this graph clearly depict that there was actually an increase in the number of firearm deaths after the legislation was implemented? Or does it appear that, by inverting an axis, they have somewhat misleadingly implied that perhaps the legislation decreased deaths? Depending on the message or the goal of the maker of this figure, you can see how the decision to invert or not invert the axis conveys very different information.
A slightly more nuanced, or perhaps more difficult, choice is the choice of scale. One can use a linear scale or a logarithmic scale on the axis. As a reminder, a linear scale depicts exact counts or rates. In this case, in the top graph, the linear scale depicts some sort of number that goes between approximately 0 and approximately 4,000. You can depict the same information on a logarithmic scale, which depicts counts or rates on an exponential scale. Without thinking about what this graph is talking about, think about what message is being conveyed in each of these graphs. If there is in fact a constant rate of growth each year in whatever it is that these graphs are conveying, which graph more accurately depicts that? Using the linear scale could potentially even induce panic or worry in someone who might think there is a dramatic increase in the events being depicted in the later years of this graph.
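A short numerical sketch in Python shows why a constant growth rate looks so different on the two scales. The starting value, growth rate, and number of years are hypothetical, picked only so the series stays below roughly 4,000 like the one described here. The year-to-year step on a linear axis keeps getting larger, while the year-to-year step on a base-10 logarithmic axis stays the same.

```python
import math

# Hypothetical series: a constant 20% growth per year from a starting value of 100.
start, rate, n_years = 100.0, 0.20, 21
values = [start * (1 + rate) ** t for t in range(n_years)]

# Vertical distance between consecutive years on a linear axis.
linear_steps = [b - a for a, b in zip(values, values[1:])]

# Vertical distance between consecutive years on a base-10 logarithmic axis.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print(f"linear step, first vs last year: {linear_steps[0]:.0f} vs {linear_steps[-1]:.0f}")
print(f"log10 step,  first vs last year: {log_steps[0]:.4f} vs {log_steps[-1]:.4f}")
```

Each log step equals log10(1 + rate), so on the logarithmic axis a constant rate of growth plots as a straight line, while on the linear axis the same series curves sharply upward in the later years.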
In contrast, the logarithmic scale depicts a much more constant and steady rate of growth.
Color selection and pattern selection must be done carefully. One important question is, are the chosen colors easy to differentiate and are they easy to see? Take a look at the graph shown here. Two things jump out at me. The difference in the blues between Region E and Region A might actually be difficult for some people to discern. And for some computer monitors and some individuals, the yellow in Region D might be very difficult to see. It's important to think about these things when selecting colors. Often this means overriding the default of a computer program. Consider the setting in which your graph will be displayed, and therefore how the graph actually looks in that setting. Most of us have had the experience of a graph looking one way on our own computer, but completely different on the screen where it is presented.
Let's take a moment and practice critiquing a graph. This graph was published by an economics institute in a high income country. Given the best practices in graph development that we have been discussing, think about some of the positive aspects of this graph, as well as some potential areas for improvement. One positive aspect that came to my mind is that the graph brings attention to risk factors, which can actually be actionable targets in public health practice. Second, it shows in quantitative order the burden of different risk factors across high income countries. So it's easy to pull out from this graph which are the more important risk factors that one might want to work on first. Third, the y-axis starts at zero, thus giving a clear depiction of the comparisons that are being made. Now, some areas for improvement that I thought of. The y-axis label is not clear, nor is the abbreviation used in the title. A non-public health expert might ask, what is a DALY?
There is no way to know from this graph that the disability adjusted life year is a composite measure that captures both mortality and the effect of disability on an individual's life. Several of the labels on the x-axis are cut off. I am unclear, for example, in the fourth x-axis category, what "blood" is depicting. Finally, is the message directed to individuals, i.e., personal responsibility to change behavior, or is the message directed to public health officials at a community level? The graph makes no differentiation, for example, by access to health services, where someone might live, or what ability they have to change any of these things. As we think about health and health equity, this is an important consideration.
In summary, data visualization is the presentation of data in pictorial or graphical format. Graphs are one of the most widely used forms of data visualization in public health and in epidemiology. Decisions made in the type or style of graph can fundamentally change the story that is being conveyed by the graph or figure. So never forget, a picture is worth 1,000 words. Thank you very much for joining us in this course.