class: center, middle, inverse, title-slide # Making sense of CRAN ## Package and collaboration networks ###
Ioannis Kosmidis
Reader in Data Science
### University of Warwick & The Alan Turing Institute ###
useR! 2019
12 July 2019
--- <!-- 13 July 2019 --> <style type="text/css"> .huge .remark-code { /*Change made here*/ font-size: 200% !important; } .small .remark-code { /*Change made here*/ font-size: 70% !important; } .tiny .remark-code { /*Change made here*/ font-size: 50% !important; } .tiny-tex { /*Change made here*/ font-size: 50% !important; } </style> ## CRAN today, 12 July 2019 CRAN has <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M192 256c61.9 0 112-50.1 112-112S253.9 32 192 32 80 82.1 80 144s50.1 112 112 112zm76.8 32h-8.3c-20.8 10-43.9 16-68.5 16s-47.6-6-68.5-16h-8.3C51.6 288 0 339.6 0 403.2V432c0 26.5 21.5 48 48 48h288c26.5 0 48-21.5 48-48v-28.8c0-63.6-51.6-115.2-115.2-115.2zM480 256c53 0 96-43 96-96s-43-96-96-96-96 43-96 96 43 96 96 96zm48 32h-3.8c-13.9 4.8-28.6 8-44.2 8s-30.3-3.2-44.2-8H432c-20.4 0-39.2 5.9-55.7 15.4 24.4 26.3 39.7 61.2 39.7 99.8v38.4c0 2.2-.5 4.3-.6 6.4H592c26.5 0 48-21.5 48-48 0-61.9-50.1-112-112-112z"/></svg> 20395 authors, contributing in <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9zm585.1 102.8L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 1.3-8.8 1.9-13.3 1.9z"/></svg> 14523 packages <br/> <br/> CRAN became <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M384 320H256c-17.67 0-32 14.33-32 32v128c0 17.67 14.33 32 32 32h128c17.67 0 32-14.33 32-32V352c0-17.67-14.33-32-32-32zM192 32c0-17.67-14.33-32-32-32H32C14.33 0 0 14.33 0 32v128c0 17.67 14.33 32 32 32h95.72l73.16 128.04C211.98 300.98 232.4 288 256 288h.28L192 
175.51V128h224V64H192V32zM608 0H480c-17.67 0-32 14.33-32 32v128c0 17.67 14.33 32 32 32h128c17.67 0 32-14.33 32-32V32c0-17.67-14.33-32-32-32z"/></svg> a rich and diverse software ecosystem <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M448 73.143v45.714C448 159.143 347.667 192 224 192S0 159.143 0 118.857V73.143C0 32.857 100.333 0 224 0s224 32.857 224 73.143zM448 176v102.857C448 319.143 347.667 352 224 352S0 319.143 0 278.857V176c48.125 33.143 136.208 48.572 224 48.572S399.874 209.143 448 176zm0 160v102.857C448 479.143 347.667 512 224 512S0 479.143 0 438.857V336c48.125 33.143 136.208 48.572 224 48.572S399.874 369.143 448 336z"/></svg> a large database of authors, tools and knowledge which are naturally linked to each other <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M304 192v32c0 6.6-5.4 12-12 12H124c-6.6 0-12-5.4-12-12v-32c0-6.6 5.4-12 12-12h168c6.6 0 12 5.4 12 12zm201 284.7L476.7 505c-9.4 9.4-24.6 9.4-33.9 0L343 405.3c-4.5-4.5-7-10.6-7-17V372c-35.3 27.6-79.7 44-128 44C93.1 416 0 322.9 0 208S93.1 0 208 0s208 93.1 208 208c0 48.3-16.4 92.7-44 128h16.3c6.4 0 12.5 2.5 17 7l99.7 99.7c9.3 9.4 9.3 24.6 0 34zM344 208c0-75.2-60.8-136-136-136S72 132.8 72 208s60.8 136 136 136 136-60.8 136-136z"/></svg> hard to explore and keep track of --- class: inverse, center, middle # Exploring CRAN --- ## Exploring CRAN <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 
64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"/></svg> `` <!-- Aggregates information about new, updated and removed packages from CRAN (live feed) --> <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9zm585.1 102.8L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 1.3-8.8 1.9-13.3 1.9z"/></svg> `cranlogs` <!-- Gabor Csardi --> <!-- API for package download counts from RStudio CRAN mirror and badges []( --> <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"/></svg> `` (metacran) <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path 
d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"/></svg> `` <!-- Stefan Schliebs --> <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9zm585.1 102.8L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 1.3-8.8 1.9-13.3 1.9z"/></svg> `CRANsearcher` <!-- Becca Krouse, Agustin Calatroni --> <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9zm585.1 102.8L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 
1.3-8.8 1.9-13.3 1.9z"/></svg> `RWsearch` <!-- Patrice Kiener -->

---

## CRAN package database

---

## tools::CRAN_package_db

Returns a character data frame with DESCRIPTION metadata for the packages currently on CRAN

```r
p_db <- tools::CRAN_package_db()
names(p_db)
## [1] "Package" "Version" "Priority"
## [4] "Depends" "Imports" "LinkingTo"
## [7] "Suggests" "Enhances" "License"
## [10] "License_is_FOSS" "License_restricts_use" "OS_type"
## [13] "Archs" "MD5sum" "NeedsCompilation"
## [16] "Additional_repositories" "Author" "Authors@R"
## [19] "Biarch" "BugReports" "BuildKeepEmpty"
## [22] "BuildManual" "BuildResaveData" "BuildVignettes"
## [25] "Built" "ByteCompile" "Classification/ACM"
## [28] "Classification/ACM-2012" "Classification/JEL" "Classification/MSC"
## [31] "Classification/MSC-2010" "Collate" "Collate.unix"
## [34] "" "Contact" "Copyright"
## [37] "Date" "Description" "Encoding"
## [40] "KeepSource" "Language" "LazyData"
## [43] "LazyDataCompression" "LazyLoad" "MailingList"
## [46] "Maintainer" "Note" "Packaged"
## [49] "RdMacros" "SysDataCompression" "SystemRequirements"
## [52] "Title" "Type" "URL"
## [55] "VignetteBuilder" "ZipData" "Published"
## [58] "Path" "X-CRAN-Comment" "Reverse depends"
## [61] "Reverse imports" "Reverse linking to" "Reverse suggests"
## [64] "Reverse enhances" "MD5sum"
```

---

## Why bother making <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9zm585.1 102.8L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 1.3-8.8 1.9-13.3 1.9z"/></svg> 
[cranly](https:/ ### Scientific reasons CRAN is a network, not only physically but also mathematically Tools for discovering and understanding interconnections in CRAN Research on modelling of software networks ### Less altruistic reasons Keep track of who/what/why links to my R packages Do what is at <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"/></svg> []( Simplify finding referees for my editorial work (seriously!) 
--- class: inverse, center, middle # cranly <svg style="height:0.8em;top:.04em;position:relative;fill:#f7f7f7;" viewBox="0 0 640 512"><path d="M53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9zm585.1 102.8L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 1.3-8.8 1.9-13.3 1.9z"/></svg> --- ## cranly <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9zm585.1 102.8L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 1.3-8.8 1.9-13.3 1.9z"/></svg> <table> <tbody> <tr> <td style="text-align:left;"> clean_CRAN_db </td> <td style="text-align:left;"> clean up the CRAN package database </td> </tr> <tr> <td style="text-align:left;"> build_network </td> <td style="text-align:left;"> build networks out of it </td> </tr> <tr> <td style="text-align:left;"> build_dependence_tree </td> <td style="text-align:left;"> build package dependence trees </td> </tr> <tr> <td style="text-align:left;"> subset </td> <td style="text-align:left;"> subset cranly networks </td> </tr> <tr> <td style="text-align:left;"> summary </td> <td style="text-align:left;"> summarize cranly networks </td> </tr> <tr> <td style="text-align:left;"> plot </td> <td style="text-align:left;"> visualize cranly 
networks or summaries of those </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:left;"> various extractor functions </td> </tr> </tbody> </table> --- ## The DESCRIPTION file .tiny[ ```r (lubridate_desc <- packageDescription("lubridate")) ## Package: lubridate ## Type: Package ## Version: 1.7.4 ## Title: Make Dealing with Dates a Little Easier ## Description: Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of ## components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span ## objects. The 'lubridate' package has a consistent and memorable syntax that makes working with dates easy and fun. Parts of ## the 'CCTZ' source code, released under the Apache 2.0 License, are included in this package. See ## <> for more details. ## Authors@R: c( person("Vitalie", "Spinu", email = "", role = c("aut","cre")), person("Garrett", "Grolemund", role = "aut"), ## person("Hadley", "Wickham", role = "aut"), person("Ian", "Lyttle", role="ctb"), person("Imanuel", "Constigan", role = "ctb"), ## person("Jason", "Law", role="ctb"), person("Doug","Mitarotonda", role="ctb"), person("Joseph", "Larmarange", role="ctb"), ## person("Jonathan", "Boiser", role="ctb"), person("Chel Hee", "Lee", role = "ctb") ) ## Maintainer: Vitalie Spinu <> ## License: GPL (>= 2) ## Depends: methods, R (>= 3.0.0) ## Imports: stringr, Rcpp (>= 0.12.13), ## LinkingTo: Rcpp, ## Suggests: testthat, knitr, covr ## Enhances: chron, fts, timeSeries, timeDate, tis, tseries, xts, zoo ## SystemRequirements: A system with zoneinfo data (e.g. /usr/share/zoneinfo) as well as a recent-enough C++11 compiler (such as g++-4.8 or ## later). On Windows the zoneinfo included with R is used. 
## VignetteBuilder: knitr ## LazyData: true ## Collate: 'Dates.r' 'POSIXt.r' 'RcppExports.R' 'util.r' 'parse.r' 'timespans.r' 'intervals.r' 'difftimes.r' 'durations.r' 'periods.r' ..... ## RoxygenNote: 6.0.1 ## URL:, ## BugReports: ## NeedsCompilation: yes ## Packaged: 2018-04-10 15:18:02 UTC; vspinu ## Author: Vitalie Spinu [aut, cre], Garrett Grolemund [aut], Hadley Wickham [aut], Ian Lyttle [ctb], Imanuel Constigan [ctb], Jason Law [ctb], ## Doug Mitarotonda [ctb], Joseph Larmarange [ctb], Jonathan Boiser [ctb], Chel Hee Lee [ctb] ## Repository: CRAN ## Date/Publication: 2018-04-11 10:08:43 UTC ## Built: R 3.6.0; x86_64-apple-darwin15.6.0; 2019-04-26 22:29:01 UTC; unix ## ## -- File: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/lubridate/Meta/package.rds ``` ] ```r options(width = wdt) ``` --- ## Package directives network .pull-left[ Package directives .tiny[ ```r lubridate_desc[c("Suggests", "Imports", "Depends", "Enhances", "LinkingTo")] ## $Suggests ## [1] "testthat, knitr, covr" ## ## $Imports ## [1] "stringr, Rcpp (>= 0.12.13)," ## ## $Depends ## [1] "methods, R (>= 3.0.0)" ## ## $Enhances ## [1] "chron, fts, timeSeries, timeDate, tis, tseries, xts, zoo" ## ## $LinkingTo ## [1] "Rcpp," ``` ] link packages to each other define a **package directives network** ] .pull-right[ <!-- --> ] --- ## Author collaboration network Author field specifies a group of collaborators ```r library("magrittr") lubridate_desc$Author %>% strwrap(70) ## [1] "Vitalie Spinu [aut, cre], Garrett Grolemund [aut], Hadley Wickham" ## [2] "[aut], Ian Lyttle [ctb], Imanuel Constigan [ctb], Jason Law [ctb]," ## [3] "Doug Mitarotonda [ctb], Joseph Larmarange [ctb], Jonathan Boiser" ## [4] "[ctb], Chel Hee Lee [ctb]" ``` Author fields from all DESCRIPTION files define the **CRAN collaboration network** --- class: inverse, center, middle # Cleaning up the CRAN package database --- ## Regex, Regex, Regex Need a systematic way of extracting package and author names from 
DESCRIPTION ```r p_db$Author[grep("guidance", p_db$Author)][1] %>% strwrap(70) ## [1] "Ravi Varadhan [aut, cph, trl], Paul Gilbert [aut, cre], Marcos Raydan" ## [2] "[ctb] (with co-authors, wrote original algorithms in fortran. These" ## [3] "provided some guidance for implementing R code in the BB package.)," ## [4] "JM Martinez [ctb] (with co-authors, wrote original algorithms in" ## [5] "fortran. These provided some guidance for implementing R code in the" ## [6] "BB package.), EG Birgin [ctb] (with co-authors, wrote original" ## [7] "algorithms in fortran. These provided some guidance for implementing" ## [8] "R code in the BB package.), W LaCruz [ctb] (with co-authors, wrote" ## [9] "original algorithms in fortran. These provided some guidance for" ## [10] "implementing R code in the BB package.)" ``` ```r p_db$Author[grep("Queen", p_db$Author)][1] %>% strwrap(70) ## [1] "Alex M Chubaty [aut, cre], Her Majesty the Queen in Right of Canada," ## [2] "as represented by the Minister of Natural Resources Canada [cph]" ``` `cranly::clean_up_directives` `cranly::clean_up_author` --- ## clean_up_author ```r library("cranly") p_db$Author[grep("guidance", p_db$Author)][1] ## [1] "Ravi Varadhan [aut, cph, trl],\n Paul Gilbert [aut, cre],\n Marcos Raydan [ctb] (with co-authors, wrote original algorithms in\n fortran. These provided some guidance for implementing R code in\n the BB package.),\n JM Martinez [ctb] (with co-authors, wrote original algorithms in\n fortran. 
These provided some guidance for implementing R code in\n the BB package.),\n EG Birgin [ctb] (with co-authors, wrote original algorithms in fortran.\n These provided some guidance for implementing R code in the BB\n package.),\n W LaCruz [ctb] (with co-authors, wrote original algorithms in fortran.\n These provided some guidance for implementing R code in the BB\n package.)" p_db$Author[grep("guidance", p_db$Author)][1] %>% clean_up_author() ## [[1]] ## [1] "Ravi Varadhan" "Paul Gilbert" "Marcos Raydan" "JM Martinez" ## [5] "EG Birgin" "W LaCruz" ``` ```r p_db$Author[grep("Queen", p_db$Author)][1] ## [1] "Alex M Chubaty [aut, cre],\n Her Majesty the Queen in Right of Canada, as represented by the\n Minister of Natural Resources Canada [cph]" p_db$Author[grep("Queen", p_db$Author)][1] %>% clean_up_author() ## [[1]] ## [1] "Alex M Chubaty" ``` --- ## clean_up_directives ```r packageDescription("tidyverse")$Imports ## [1] "broom (>= 0.4.2), cli (>= 1.0.0), crayon (>= 1.3.4), dplyr (>=\n0.7.4), dbplyr (>= 1.1.0), forcats (>= 0.2.0), ggplot2 (>=\n2.2.1), haven (>= 1.1.0), hms (>= 0.3), httr (>= 1.3.1),\njsonlite (>= 1.5), lubridate (>= 1.7.1), magrittr (>= 1.5),\nmodelr (>= 0.1.1), purrr (>= 0.2.4), readr (>= 1.1.1), readxl\n(>= 1.0.0), reprex (>= 0.1.1), rlang (>= 0.1.4), rstudioapi (>=\n0.7), rvest (>= 0.3.2), stringr (>= 1.2.0), tibble (>= 1.3.4),\ntidyr (>= 0.7.2), xml2 (>= 1.1.1)" packageDescription("tidyverse")$Imports %>% clean_up_directives() ## [[1]] ## [1] "broom" "cli" "crayon" "dplyr" "dbplyr" ## [6] "forcats" "ggplot2" "haven" "hms" "httr" ## [11] "jsonlite" "lubridate" "magrittr" "modelr" "purrr" ## [16] "readr" "readxl" "reprex" "rlang" "rstudioapi" ## [21] "rvest" "stringr" "tibble" "tidyr" "xml2" ``` --- ## A clean CRAN package database ```r cran_db <- clean_CRAN_db(p_db) ``` ```r class(cran_db) ## [1] "cranly_db" "data.frame" ``` ```r cran_db[cran_db$package == "lubridate", "imports"] ## [[1]] ## [1] "stringr" "Rcpp" cran_db[cran_db$package == 
"lubridate", "author"] ## [[1]] ## [1] "Vitalie Spinu" "Garrett Grolemund" "Hadley Wickham" ## [4] "Ian Lyttle" "Imanuel Constigan" "Jason Law" ## [7] "Doug Mitarotonda" "Joseph Larmarange" "Jonathan Boiser" ## [10] "Chel Hee Lee" ``` --- class: inverse, center, middle # cranly networks --- ## build_network `build_network` organises the information in `cranly_db` objects in networks Supports two network perspectives under the same interface: author and package <table> <thead> <tr> <th style="text-align:left;"> network </th> <th style="text-align:left;"> perspective </th> <th style="text-align:left;"> nodes </th> <th style="text-align:left;"> edge formation </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> author collaboration network </td> <td style="text-align:left;"> author </td> <td style="text-align:left;"> authors </td> <td style="text-align:left;"> same author field </td> </tr> <tr> <td style="text-align:left;"> package directives network </td> <td style="text-align:left;"> package </td> <td style="text-align:left;"> packages </td> <td style="text-align:left;"> from directives fields </td> </tr> </tbody> </table> --- ## build_network ```r pkg_net <- build_network(cran_db, perspective = "package") aut_net <- build_network(cran_db, perspective = "author") ``` ```r str(pkg_net, 1) ## List of 2 ## $ edges:'data.frame': 94429 obs. of 3 variables: ## $ nodes:'data.frame': 15017 obs. of 65 variables: ## - attr(*, "class")= chr [1:2] "cranly_network" "list" ## - attr(*, "timestamp")= POSIXct[1:1], format: "2019-07-12 13:30:37" ## - attr(*, "perspective")= chr "package" str(aut_net, 1) ## List of 2 ## $ edges:'data.frame': 106308 obs. of 10 variables: ## $ nodes:'data.frame': 20395 obs. 
of 2 variables: ## - attr(*, "class")= chr [1:2] "cranly_network" "list" ## - attr(*, "timestamp")= POSIXct[1:1], format: "2019-07-12 13:30:37" ## - attr(*, "perspective")= chr "author" ``` --- ## Interrogating cranly_network objects ```r pkg_net %>% author_of("data.table") ## [1] "Matt Dowle" "Arun Srinivasan" "Jan Gorecki" ## [4] "Michael Chirico" "Pasha Stetsenko" "Tom Short" ## [7] "Steve Lianoglou" "Eduard Antonyan" "Markus Bonsch" ## [10] "Hugh Parsonage" "Scott Ritchie" "Kun Ren" ## [13] "Xianying Tan" "Rick Saporta" "Otto Seiskari" ## [16] "Xianghui Dong" "Michel Lang" "Watal Iwasaki" ## [19] "Seth Wenchel" "Karl Broman" "Tobias Schmidt" ## [22] "David Arenburg" "Ethan Smith" "Francois Cocquemas" ## [25] "Matthieu Gomez" "Philippe Chataignon" "Declan Groves" ## [28] "Daniel Possenriede" "Felipe Parages" "Denes Toth" ## [31] "Mus Yaramaz-David" "Ayappan Perumal" "James Sams" ## [34] "Martin Morgan" "Michael Quinn" "@javrucebo" ## [37] "@marc-outins" pkg_net %>% version_of("dplyr") ## [1] "0.8.3" "0.3.2" "0.1.0" "0.1.4" "0.3.0" pkg_net %>% package_with("trackeR") ## [1] "BayesianAnimalTracker" "NutrienTrackeR" "trackeR" ## [4] "trackeRapp" ``` --- ```r aut_net %>% author_with("Ioannis") ## [1] "Ioannis Kosmidis" "Ioannis N Athanasiadis" ## [3] "Ioannis Tsamardinos" pkg_net %>% maintainer_of("colorspace") ## [1] "Achim Zeileis" pkg_net %>% package_by("Kosmidis") ## [1] "betareg" "brglm" "brglm2" "cranly" ## [5] "enrichwith" "PlackettLuce" "profileModel" "semnar" ## [9] "trackeR" "trackeRapp" pkg_net %>% title_of("semnar") ## [1] "Constructing and Interacting with Databases of Presentations" pkg_net %>% email_with("") ## [1] "" "" ## [3] "" "" ## [5] "" "" ## [7] "" "" ## [9] "" "" ## [11] "" "" ``` --- ## Distribution of release dates of CRAN packages ```r pkg_net %>% release_date_of(Inf) %>% hist(breaks = 50, main = "", freq = TRUE, xlab = "date") ``` <!-- --> --- class: inverse, center, middle # Visualising cranly networks --- ## Visualising package 
directives networks

```r
plot(pkg_net, author = "Ioannis Kosmidis", legend = FALSE, title = FALSE, width = "100%")
```
---

## Visualising collaboration networks

```r
plot(aut_net, author = "Brian Ripley", legend = FALSE, title = FALSE, width = "100%")
```
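
Edge formation in a collaboration network can be illustrated on toy data: every pair of authors listed in the same Author field is joined by an edge. This is a minimal base-R sketch, not cranly's implementation; the `authors` list and the `pair_up` helper are hypothetical and stand in for the cleaned `author` column of a `cranly_db` object.

```r
# Hypothetical cleaned author lists, one element per package
authors <- list(
  pkgA = c("Ann", "Bob", "Cy"),
  pkgB = c("Bob", "Dee"),
  pkgC = c("Eve")
)

# Hypothetical helper: all pairs of co-authors of one package as an edge list
pair_up <- function(a, pkg) {
  if (length(a) < 2) return(NULL)  # single-author packages add no edges
  pairs <- t(combn(sort(a), 2))    # one row per unordered pair of authors
  data.frame(from = pairs[, 1], to = pairs[, 2], package = pkg,
             stringsAsFactors = FALSE)
}

# Stack the per-package edge lists; NULL entries are dropped by rbind
edges <- do.call(rbind, Map(pair_up, authors, names(authors)))
rownames(edges) <- NULL

# Nodes are all distinct authors, including those without collaborators
nodes <- sort(unique(unlist(authors)))

edges
nodes
```

Authors without co-authors (here "Eve") still appear as nodes, only without edges.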
---
class: inverse, center, middle

# Network summaries

---

## Summaries

Access to a range of network summaries

`n_packages`, `n_imports`, `n_suggests`, `n_imported_by`, `n_linked_by`, etc.

`n_collaborators`

`betweenness`, `closeness`, `page_rank`, `degree`, `eigen_centrality`

```r
aut_summaries <- summary(aut_net)
optional_pkg_net <- subset(pkg_net, recommended = FALSE, base = FALSE)
optional_pkg_summaries <- summary(optional_pkg_net)
```

---

.pull-left[

```r
aut_summaries %>% plot(according_to = "n_packages")
```

]

.pull-right[

```r
optional_pkg_summaries %>% plot(according_to = "n_imported_by")
```

]

---
class: inverse, center, middle

# Dependence trees

---

### Package dependence tree

Computes what else needs to be installed with the package

Constructed using a neat recursion (see `?compute_dependence_tree`)

### Package dependence index

aka "baggage" index

`$$-\frac{\sum_{i \in C_p; i \ne p} \frac{1}{N_i} g_i}{\sum_{i \in C_p; i \ne p} \frac{1}{N_i}}$$`

- `\(C_p\)` is the dependence tree for the package(s) `\(p\)`
- `\(N_i\)` is the total number of packages that depend on, link to or import package `\(i\)`
- `\(g_i\)` is the generation at which package `\(i\)` appears in the dependence tree of package(s) `\(p\)`

---

```r
tibble_tree <- build_dependence_tree(pkg_net, "tibble")
plot(tibble_tree, title = FALSE, legend = TRUE)
```
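
The recursion and the dependence index can be sketched on a toy example. This is a base-R illustration under stated assumptions, not cranly's code: the `deps` list, the `dependence_tree` helper and the counts in `N` are hypothetical, and generations are assumed to be recorded as 0 for the package itself, -1 for its direct dependencies, -2 for theirs, and so on, so that the leading minus sign in the index yields a non-negative value.

```r
# Hypothetical hard dependencies (Depends, Imports, LinkingTo) of toy packages
deps <- list(
  p = c("a", "b"),
  a = c("c"),
  b = c("c"),
  c = character(0)
)

# Hypothetical helper: recursively collect the dependence tree of `pkg`,
# recording the generation at which each package is reached
dependence_tree <- function(pkg, generation = 0) {
  out <- data.frame(package = pkg, generation = generation,
                    stringsAsFactors = FALSE)
  for (d in deps[[pkg]]) {
    out <- rbind(out, dependence_tree(d, generation - 1))
  }
  out
}

tree <- dependence_tree("p")

# A package reached through several paths keeps its deepest generation
# (an assumption made for this sketch)
g <- vapply(split(tree$generation, tree$package), min, numeric(1))
g <- g[names(g) != "p"]  # the index sums over i != p

# Hypothetical N_i: packages that depend on, link to or import package i
N <- c(a = 1, b = 4, c = 10)

# Weighted average of generations with weights 1 / N_i, sign flipped
w <- 1 / N[names(g)]
index <- -sum(w * g) / sum(w)
index
```

Widely used dependencies (large `N_i`) get small weights, so relying on them adds little "baggage"; rarely used dependencies that sit deep in the tree push the index up.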
--- ## Current work .pull-left[ Network models for software networks Modelling of dependence trees Topic analysis e.g. `description_of` and `title_of` <br/> <br/> shiny interface Licence compatibility checks ] .pull-right[ .tiny[ ```r as.igraph(pkg_net) ## IGRAPH 417bacc DN-- 15017 94429 -- ## + attr: name (v/c), version (v/c), author (v/x), date ## | (v/n), url (v/c), license (v/c), maintainer (v/x), ## | type (e/c) ## + edges from 417bacc (vertex names): ## [1] httr ->abbyyR XML ->abbyyR ## [3] curl ->abbyyR readr ->abbyyR ## [5] plyr ->abbyyR progress->abbyyR ## [7] graphics->ABC.RAP stats ->ABC.RAP ## [9] utils ->ABC.RAP plotrix ->ABCanalysis ## [11] Rcpp ->ABCoptim graphics->ABCoptim ## + ... omitted several edges ``` ]  .tiny-text[ by [David A. Wheeler (2007)]( ] ] --- ## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M512 144v288c0 26.5-21.5 48-48 48H48c-26.5 0-48-21.5-48-48V144c0-26.5 21.5-48 48-48h88l12.3-32.9c7-18.7 24.9-31.1 44.9-31.1h125.5c20 0 37.9 12.4 44.9 31.1L376 96h88c26.5 0 48 21.5 48 48zM376 288c0-66.2-53.8-120-120-120s-120 53.8-120 120 53.8 120 120 120 120-53.8 120-120zm-32 0c0 48.5-39.5 88-88 88s-88-39.5-88-88 39.5-88 88-88 88 39.5 88 88z"/></svg> `cranly: Package Directives and Collaboration Networks in CRAN` []( []( []( .pull-left[ Install from CRAN: .small[ ```r install.packages("cranly") ``` ] ] .pull-right[ Install development version from GitHub: .small[ ```r # install.packages("devtools") devtools::install_github("ikosmidis/cranly", ref = "develop") ``` ] ] <br/> <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 
64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"/></svg> []( <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> [ikosmidis]( <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 
0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> [ikosmidis_]( (do not forget the underscore!)