SAS and R: Data Management, Statistical Analysis, and Graphics (second edition)

Ken Kleinman and Nicholas J. Horton

Reviews of the first edition


Amazon.com

  1. I am a long time SAS user who is surrounded by R experts. As such, I have been looking, for years, for a dictionary to translate between R and SAS. That is what this book is designed to do and it is absolutely excellent for this purpose. It covers all the SAS data manipulation and graphics procedures and functions that I use all the time and it shows how to do them in R. Happily the book is very up to date and the most modern (9.21) SAS graphics procedures (like sgplot) are covered.

    The organization and indexing are fantastic. There is a table of contents, an index with SAS vocabulary, an index of R vocabulary and an overall index. Using these tools you can quickly find the procedure/methods that you want to accomplish and get parallel code snippets in both languages along with annotation to say what the differences are between the two implementations. In addition to the pure dictionary organization there are extended examples working through the analysis and visualization of a large data set.

    The book is not a textbook on the fundamental differences between R and SAS, like the different approach to objects and data sets. For a real text on that take a look at R for SAS and SPSS Users (Statistics and Computing).
    Amazon will not let me post the link to the book's website but if you search the web for the authors' last names and Smith (as in the liberal arts college in Northampton Mass) you should be able to see it. There you will find a PDF with the table of contents, code snippets and lots of supporting material.

    This book is a must for people moving to R from SAS (or the other direction) and it should be excellent for people needing a dictionary to find functions/procedures to do data manipulation and graphic(s) tasks in either language (5 stars).

  2. This book is a really helpful reference. I'm the author of "R for SAS and SPSS Users", and I thought you might be interested in how these two books differ.

    "SAS and R" is a well-crafted dictionary of how to do things in both SAS and R. For each topic the authors clearly and concisely show how to perform that task in SAS, then in R. They typically provide a paragraph of description for each. The brevity of explanation allows the authors to cover a much wider range of topics. If you needed to know more about a topic, at least they have given you a good start and you'll know what SAS statements or R functions to pursue. That's very helpful information, especially in R. Each chapter concludes with example programs with output which demonstrate the topics covered. Output for both packages is shown. The book does include brief introductions to both SAS and R in the appendices but, as the authors state in the preface, their book is not meant to be read cover to cover. However, unlike a standard dictionary, the entries are organized by category, so reading several entries in a row is usually helpful.

    "R for SAS and SPSS Users" is a step-by-step introductory text, meant to be read in order. I assume you already know SAS or SPSS, and the only discussion of them is used to help you learn R. Rather than a paragraph of explanation per topic, I typically provide several pages, stepping through complete example programs, and pointing out where beginners typically make mistakes. However, given that added explanation, the range of topics is narrower. I do include programs in all three at the end of each topic, but I provide detailed explanations for only the R programs. To save space, I show only the R output. While I include some redundancy to facilitate using it as a reference, it is important to read it through at least once.

    So for someone learning R, these books complement each other well. I recommend starting with "R for SAS and SPSS Users" to build a solid understanding of R, then use "SAS and R" to look up the many topics that the other book did not cover.

    For someone learning SAS, I recommend reading a book devoted to that topic, such as, "The Little SAS Book: A Primer", then using "SAS and R" to look up the many topics that book does not cover. "R for SAS and SPSS Users" is not a good choice for learning SAS or SPSS.

    In either case, you'll probably need additional books devoted to the particular methods of analysis you need (5 stars).

  3. For those of us who are very lazy, it is much easier to learn R by cook booking from SAS. Great book (5 stars).

  4. I'm an R programmer who has some familiarity with SAS. I knew early on that SAS is a mountain to climb, and I was looking for something that would assist me in handling tasks between the two systems. This book is the one. Excellent examples and numerous explanations make this a no-brainer for people using either system and wanting to learn the other (5 stars).

statisticalanalysisconsulting.com

Book review: SAS and R by Ken Kleinman and Nicholas J. Horton

By Peter Flom, May 10, 2010 2:26 pm

There are many books that teach you to use SAS or that teach you to use R. There is at least one book that teaches R to people who know SAS or SPSS (R for SAS and SPSS users by Robert Muenchen, and it’s very good).

Most of those try to teach you to use the program from the ground up, as it were. If we make an analogy to books about learning languages like French, these would be textbooks. The book I am reviewing today is very different. SAS and R: Data Management, Statistical Analysis and Graphics by Ken Kleinman and Nicholas J. Horton is more like an English-French dictionary, or perhaps a phrase book. Rather than try to teach the languages, textbook style, Kleinman and Horton try to list various tasks you might have, and how to do them in SAS and R. This is not a book to get if you know nothing about one of these languages. Nor is it a book to get if you want a formal course in a language (it does have two appendixes, one for SAS and one for R). But it is a very good book indeed if you know some SAS and some R, and have some tasks you need to accomplish in one or the other, or a task that you know how to do in one and want to do in another.

What makes or breaks a book like this is two things: First, the authors have to know what they are doing. They do. I learned a lot about both programs, just browsing through the book. Second, it has to be possible to find the material you want, when you want it. Here, too, this book is excellent. This is because of the extensive table of contents (7 pages) and three indexes: One for concepts, one for SAS commands, and one for R commands (33 pages in all).

I am sure I will use this book a lot, both to browse through and to find particular PROCs and functions and ways to do things.

Teaching Statistics in the Health Sciences

May, 2013 by Robert Alan Greevy, Jr, PhD (Associate Professor of Biostatistics Vanderbilt University School of Medicine)

There is a strong selection bias in my picking a book to review. I do not want to invest time in a book I am sure I will not like. I had made such a snap judgment with SAS and R and had stuck it on my shelf to collect dust. But to my surprise, it never stayed on the shelf long enough to do so. I kept pulling it down to find a command I needed. After finally giving in and reading it, I cannot believe I waited this long. What was I thinking?

What I was thinking was that in this Google era, it is hard to write a reference book worth its shelf space. When is the last time you bought a dictionary? Encyclopedia Britannica stopped even selling its print edition last year. This is especially true for programming languages where so much information is available for free online. Indeed, there is nothing in SAS and R that is not online somewhere, but the key is how hard is it to find the information you need. By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. If a future edition adds Stata to the mix, it will be my personal grand slam. But the first edition is already a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf.

A strength of the book is that it emphasizes breadth over depth. It does not show all the nuances of merging two datasets or running a quantile regression or forming hierarchical clusters, but it shows you enough to get you started. You may still go to Google to track down a nuance you need, but that search will be much faster and more successful with SAS and R having already revealed the foundation of the commands you need. In addition, each chapter ends with a section of simple worked examples for the commands in that chapter. As anyone who has waded through online documentation knows, there are times when you would give anything for a simple working example to start from. SAS and R provides that.

The authors maintain an active blog, which shares the title of the book and presents numerous additional topics and examples. The blog is at http://sas-and-r.blogspot.com/. The authors recently announced on their blog that they are working on a second edition of SAS and R due this Fall. Given the excellent work they have done on the first edition, I am very much looking forward to the second.

Technometrics

May, 2011 by Charles E. Heckler

As the authors point out in the Introduction, the book functions like an English-French dictionary. The material is organized by task. By looking up a particular task you wish to perform, R and SAS code are presented and briefly explained. For example, suppose you want to reshape a dataset from wide format (e.g. measurements taken at different times are organized by columns) to long format (e.g. all the measurements are in a single column with another identifying the time). It is easy to find the section in the text which gives several ways to do this in both SAS and R. As they gain experience, most users of these languages learn 'tricks' to accomplish tasks. Because the authors often present alternative ways to do a task, this book can be a great source of diverse and elegant solutions even to inexperienced users. Each task is cross-referenced to other tasks. There is a single application dataset, the HELP (Health Evaluation and Linkage to Primary Care) study (Samet 2003) that is used throughout. This dataset is used to illustrate everything from descriptive statistics to survival and repeated measures analysis. The HELP dataset is available on the book website. Many of the topics in the book have cross-references to detailed applications using the HELP data. The book has a comprehensive website containing the code, datasets, a FAQ, blog and errata list with a link to report new errors.

The book is organized into broad task categories, including data management, basic descriptive and inferential procedures, least squares and generalized linear models, linear mixed models, survival analysis, multiple imputation, multivariate procedures, including recursive partitioning. There is also a lengthy chapter devoted to graphics with similar breadth that also provides solutions to those irritating aspects of graphics production such as margin control and legends. The end of the book is very useful, where there are good introductions to SAS and R, as well as separate subject, SAS and R indices. These indices are invaluable for finding a topic when you are unsure of exactly how to phrase it.

Obviously, there is great breadth and scope of the material in this book. However, it should not be depended upon as a guide to sound statistical practice. Some advice is given, but the book simply cannot cover all one needs to know to apply and interpret, for example, regression diagnostics. I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. People proficient in SAS who are new to R might also find R for SAS and SPSS Users (Muenchen 2009) helpful because it is more tutorial in nature.

Journal of Statistical Software

January 2011, Volume 37, Book Review 3 by Jeroen Ooms

The book SAS and R arose from the popular blog with the same name and is the first in a series of currently three books by Kleinman and Horton about statistical computing in SAS and R. This book features an extensive list of techniques and worked examples in data management, statistical analysis, and graphics, illustrated in both R and SAS. In addition it has two appendices with brief introductions to both systems. The book has not been written to be read cover to cover; it rather is a convenient reference text to quickly learn by example how to perform common tasks in both software packages. To navigate through the examples, the book has a comprehensive table of contents and three indices: a detailed subject index in English, a SAS index organized by SAS syntax, and an R index organized by R syntax.

The authors affirm that "the book functions in the same way that an English-French dictionary informs users of both the equivalent nouns and verbs in the two languages as well as differences and grammar." Therefore, it is probably not the best introductory text to statistical computing or either software package. A basic understanding of the general concepts in statistics and programming seems to be assumed and is required to understand the examples. However, for the reader that meets these requirements, the book provides a powerful starting point to a wide variety of statistical techniques available in SAS and R. The multiple indices effectively locate the appropriate sections, especially if one already has some experience with either SAS or R. Also it is clear and pleasant that the authors are extremely proficient and experienced with both languages. Although they "do not claim to provide the most elegant solution", the quality of the code is actually one of the strong points of the book. The examples are clear and understandable and the code is equivalent and readable.

The main chapters in the book cover respectively data management, statistical analysis, and graphics; however, the scope of these chapters is somewhat selective and traditional with a preference for methods relevant to biostatistics applications. For example, the first chapter discusses the usual variable manipulation and reading/writing to several data formats e.g. CSV and XML, but it does not treat databases at all other than mentioning that both systems have SQL interfaces. The major part of the statistical analysis chapters is focused on regression related methods, e.g. ANOVA, GLM, GAM, time series, survival analysis and mixed models. The book does not cover more exotic multivariate techniques, like PCA, SEM, or network analysis to name a few. The chapter on graphics is very detailed, but most of the examples are again the usual suspects: histograms, scatterplots, and smooth lines. In conclusion, this book does exactly what it promises: it facilitates a translation between SAS and R, without getting overly detailed or technical. It is mainly useful as a starting point for those who already know either R or SAS, and want to learn the other language, without going over extensive manuals or introductory texts.

Significance magazine

Kleinman and Horton have previously published two books, Using SAS for Data Management, Statistical Analysis, and Graphics and Using R for Data Management, Statistical Analysis, and Graphics, with both books containing a complete description of the statistical methods, applicable in both SAS and R, that are most often used by statistical analysts, researchers and data analysts. This third book serves users who know both SAS and R, and also gives users with SAS knowledge a familiarity with R programming, and vice versa. It is an excellent text that is designed to translate SAS to R. The authors explain that SAS and R are fundamentally distinct and that an enumeration of their differences would be counter-productive. New users need to bear some of these differences in mind.

For statisticians with knowledge of both SAS and R programming this book provides a useful resource to understand the differences between SAS and R codes and can be used for browsing and for finding particular SAS and R functions to perform common tasks. The book will strengthen the analytical abilities of relatively new users of either system by providing them with a concise reference manual and annotated examples executed in both packages. Professional analysts as well as statisticians, epidemiologists and others who are engaged in research or data analysis will find this book very useful. The book is comprehensive and covers an extensive list of statistical techniques from data management to graphics procedures, cross-referencing, indexing and good worked examples in SAS and R at the end of each chapter.