An extension of Mill.jl to convert URLs into Mill structures
A simple library implementing the representation of URLs from the paper "Nested Multiple Instance Learning in Modelling of HTTP network traffic" by Tomas Pevny and Marek Dedic (2020).
Example:
julia> using Url2Mill
julia> ds = url2mill("st.360buyimg.com/m/css/2014/index/home_2017_5_9.css?v=jd201705182030")
ProductNode  # 1 obs, 152 bytes
  ├── hostname: BagNode  # 1 obs, 104 bytes
  │               ╰── ArrayNode(2053×3 NGramMatrix with Int64 elements)  # 3 obs, 166 bytes
  ├────── path: BagNode  # 1 obs, 104 bytes
  │               ╰── ArrayNode(2053×5 NGramMatrix with Int64 elements)  # 5 obs, 214 bytes
  ╰───── query: BagNode  # 1 obs, 136 bytes
                  ╰── ProductNode  # 1 obs, 64 bytes
                        ├──── key: ArrayNode(2053×1 NGramMatrix with Int64 elements)  # 1 o ⋯
                        ╰── value: ArrayNode(2053×1 NGramMatrix with Int64 elements)  # 1 o ⋯
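The observation counts in this output (3 for hostname, 5 for path, 1 for query) are consistent with the URL being split on '.', '/' and '&'/'=' respectively. The sketch below illustrates that decomposition using only Base Julia. The function split_url is hypothetical and is not part of Url2Mill; it assumes a URL without a scheme prefix such as "http://", as in the example above, and the library's actual parsing may differ.

```julia
# Hypothetical helper (not part of Url2Mill): split a URL string into the
# three parts that appear as hostname, path and query in the printed tree.
function split_url(url::AbstractString)
    # Separate the query string (everything after the first '?'), if any.
    parts = split(url, '?'; limit = 2)
    location = parts[1]
    query = length(parts) == 2 ? parts[2] : ""

    # The first '/'-separated token is the hostname, the rest form the path.
    tokens = split(location, '/'; keepempty = false)
    hostname = isempty(tokens) ? String[] : String.(split(tokens[1], '.'))
    path = String.(tokens[2:end])

    # Each "key=value" item of the query becomes one (key, value) tuple.
    kvs = Tuple{String,String}[]
    for item in split(query, '&'; keepempty = false)
        kv = split(item, '='; limit = 2)
        push!(kvs, (String(kv[1]), length(kv) == 2 ? String(kv[2]) : ""))
    end
    return (hostname = hostname, path = path, query = kvs)
end

parts = split_url("st.360buyimg.com/m/css/2014/index/home_2017_5_9.css?v=jd201705182030")
```

Here parts.hostname holds three tokens, parts.path holds five, and parts.query holds a single key/value pair, matching the sizes of the three bags shown above.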
To represent the n-grams of strings directly as sparse arrays (SparseMatrixCSC from SparseArrays), pass use_sparse_arrays = true:
julia> ds = url2mill("st.360buyimg.com/m/css/2014/index/home_2017_5_9.css?v=jd201705182030";use_sparse_arrays = true)
ProductNode  # 1 obs, 184 bytes
  ├── hostname: BagNode  # 1 obs, 112 bytes
  │               ╰── ArrayNode(2053×3 SparseMatrixCSC with Int64 elements)  # 3 obs, 552 b ⋯
  ├────── path: BagNode  # 1 obs, 112 bytes
  │               ╰── ArrayNode(2053×5 SparseMatrixCSC with Int64 elements)  # 5 obs, 888 b ⋯
  ╰───── query: BagNode  # 1 obs, 152 bytes
                  ╰── ProductNode  # 1 obs, 80 bytes
                        ├──── key: ArrayNode(2053×1 SparseMatrixCSC with Int64 elements)  # ⋯
                        ╰── value: ArrayNode(2053×1 SparseMatrixCSC with Int64 elements)  # ⋯
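To give a rough idea of what a 2053-row sparse n-gram matrix contains, the sketch below counts byte trigrams of each string and buckets them into 2053 rows, one column per string. This is an illustration only: ngram_columns is a hypothetical name, Base.hash stands in for the indexing scheme that Mill actually uses, and no padding of strings is applied, so the values will not match those produced by url2mill.

```julia
using SparseArrays

# Illustrative only (not Url2Mill or Mill code): bucket the byte n-grams of
# each string into `m` rows, producing one sparse column per string.
function ngram_columns(strings::Vector{String}; n::Int = 3, m::Int = 2053)
    rows, cols, vals = Int[], Int[], Int[]
    for (j, s) in enumerate(strings)
        bytes = codeunits(s)
        for i in 1:(length(bytes) - n + 1)
            gram = bytes[i:i+n-1]
            # Base.hash is a stand-in for Mill's own n-gram indexing.
            push!(rows, Int(hash(gram) % UInt(m)) + 1)
            push!(cols, j)
            push!(vals, 1)
        end
    end
    # sparse() sums the values of duplicate (row, col) entries, giving counts.
    return sparse(rows, cols, vals, m, length(strings))
end

X = ngram_columns(["st", "360buyimg", "com"])
```

The result X is a 2053×3 SparseMatrixCSC{Int64, Int64}, the same shape as the hostname matrix shown above. Only the few rows hit by a trigram are stored, which is why sparse storage suits such wide but mostly empty columns.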