| Property | Value |
|
?:about
|
|
|
?:abstract
|
-
Break down URLs into their components
(xsd:string)
|
|
?:alternateName
|
-
A Fast 'WHATWG' Compliant URL Parser
(xsd:string)
|
|
?:associatedTask
|
|
|
?:codeRepository
|
|
|
?:contributor
|
|
|
?:dateModified
|
|
|
?:datePublished
|
|
|
?:description
|
-
A wrapper for ‘ada-url’, a ‘WHATWG’ compliant and fast URL parser written in modern ‘C++’. Also contains auxiliary functions such as a public suffix extractor.

Keywords
URL Parsing, Webtracking Data, Webscraping

Use Cases
URL parsing is an important step in the analysis of webtracking data, e.g. GESIS Web Tracking. The technique has been applied in various social science publications that do not use this package, e.g. de León et al. (2023). The package itself was used in various webscraping projects for communication research, e.g. paperboy.

Input Data
The input data has to be a vector of URLs and looks like this:
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1")
urls
[1] "https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1"

Output Data
The output data is a data frame of parsed URLs.

Hardware Requirements
adaR runs on any hardware that can run R.

Environment Setup
With R installed:
install.packages("adaR")

How to Use
Please refer to the “Introduction to adaR” for a comprehensive introduction to the package. The main function of this package is ada_url_parse(), which decomposes a URL into its components.
library(adaR)
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1",
"https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html",
"https://www.sueddeutsche.de/thema/Fu%C3%9Fball-EM")
ada_url_parse(urls)
href
1 https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1
2 https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html
3 https://www.sueddeutsche.de/thema/Fußball-EM
protocol username password host hostname port
1 https: www.google.de www.google.de
2 https: www.nytimes.com www.nytimes.com
3 https: www.sueddeutsche.de www.sueddeutsche.de
pathname
1 /search
2 /2024/06/19/world/africa/sudan-darfur-takeaways.html
3 /thema/Fußball-EM
search hash
1 ?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1
2
3

Technical Details
See the official CRAN page for further information about technical details.

Contact Details
Maintainer: David Schoch david@schochastics.net
Issue Tracker: https://github.com/gesistsa/adaR/issues
(xsd:string)
|
|
?:downloadURL
|
|
|
?:format
|
-
SCRIPTS
(de)
-
SCRIPTS
(en)
|
|
is
?:hasPart
of
|
|
|
?:license
|
|
|
?:linksDocumentation
|
|
|
?:name
|
|
|
?:portalUrl
|
|
|
?:programmingLanguage
|
|
|
?:sourceInfo
|
-
GESIS-Methods Hub
(xsd:string)
|
|
rdf:type
|
|
|
?:version
|
-
23795fd51cd666b3e5c2e6161cbcacecd3c50e96
(xsd:string)
|