
CTUAvastLab/Url2Mill.jl


Url2Mill.jl

An extension of Mill.jl for converting URLs into Mill structures.

A small library implementing the URL representation from the paper "Nested Multiple Instance Learning in Modelling of HTTP network traffic" by Tomas Pevny and Marek Dedic (2020).

Example:

```julia
julia> using Url2Mill

julia> ds = url2mill("st.360buyimg.com/m/css/2014/index/home_2017_5_9.css?v=jd201705182030")
ProductNode  # 1 obs, 152 bytes
  ├── hostname: BagNode  # 1 obs, 104 bytes
  │               ╰── ArrayNode(2053×3 NGramMatrix with Int64 elements)  # 3 obs, 166 bytes
  ├────── path: BagNode  # 1 obs, 104 bytes
  │               ╰── ArrayNode(2053×5 NGramMatrix with Int64 elements)  # 5 obs, 214 bytes
  ╰───── query: BagNode  # 1 obs, 136 bytes
                  ╰── ProductNode  # 1 obs, 64 bytes
                        ├──── key: ArrayNode(2053×1 NGramMatrix with Int64 elements)  # 1 o ⋯
                        ╰── value: ArrayNode(2053×1 NGramMatrix with Int64 elements)  # 1 o ⋯
```
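
The tree can be read as the URL cut into three bags: the hostname split at dots (3 instances: st, 360buyimg, com), the path split at slashes (5 instances), and the query split into key/value pairs (1 instance, v=jd201705182030). Below is a minimal plain-Julia sketch of that decomposition. It assumes these splitting rules because they reproduce the observation counts shown above. split_url is a hypothetical helper for illustration, not part of the Url2Mill API, and the package itself may tokenize differently.

```julia
# Hypothetical helper, NOT part of Url2Mill: assumes the hostname is split at '.',
# the path at '/', and the query at '&' into key=value pairs.
function split_url(url::AbstractString)
    # Everything after the first '?' is the query string.
    base, query = occursin('?', url) ? split(url, '?'; limit = 2) : (url, "")
    # The hostname ends at the first '/'; the remainder is the path.
    host, path = occursin('/', base) ? split(base, '/'; limit = 2) : (base, "")
    hostname = String.(split(host, '.'; keepempty = false))
    pathparts = String.(split(path, '/'; keepempty = false))
    # Each query parameter becomes a (key, value) pair; a missing value becomes "".
    query_pairs = Tuple{String,String}[]
    for param in split(query, '&'; keepempty = false)
        kv = split(param, '='; limit = 2)
        push!(query_pairs, (String(kv[1]), length(kv) == 2 ? String(kv[2]) : ""))
    end
    return (hostname = hostname, path = pathparts, query = query_pairs)
end

parts = split_url("st.360buyimg.com/m/css/2014/index/home_2017_5_9.css?v=jd201705182030")
println(parts.hostname)  # ["st", "360buyimg", "com"]
println(parts.path)      # ["m", "css", "2014", "index", "home_2017_5_9.css"]
println(parts.query)     # a single (key, value) pair for "v"
```

Each part then becomes one instance (one column) inside the corresponding BagNode, which is why the hostname matrix has 3 columns and the path matrix has 5.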

If you want to represent the strings by their n-grams directly as SparseArrays matrices (SparseMatrixCSC) instead of Mill's NGramMatrix, pass the keyword argument use_sparse_arrays = true:

```julia
julia> ds = url2mill("st.360buyimg.com/m/css/2014/index/home_2017_5_9.css?v=jd201705182030"; use_sparse_arrays = true)
ProductNode  # 1 obs, 184 bytes
  ├── hostname: BagNode  # 1 obs, 112 bytes
  │               ╰── ArrayNode(2053×3 SparseMatrixCSC with Int64 elements)  # 3 obs, 552 b ⋯
  ├────── path: BagNode  # 1 obs, 112 bytes
  │               ╰── ArrayNode(2053×5 SparseMatrixCSC with Int64 elements)  # 5 obs, 888 b ⋯
  ╰───── query: BagNode  # 1 obs, 152 bytes
                  ╰── ProductNode  # 1 obs, 80 bytes
                        ├──── key: ArrayNode(2053×1 SparseMatrixCSC with Int64 elements)  # ⋯
                        ╰── value: ArrayNode(2053×1 SparseMatrixCSC with Int64 elements)  # ⋯
```
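
In both variants each string becomes a column of n-gram counts hashed into 2053 rows. The sketch below illustrates the idea with character trigrams, using only the SparseArrays standard library. trigram_vector, the padding characters, and the use of Base.hash are illustrative assumptions; the sketch will not reproduce the exact row indices that Mill.jl computes.

```julia
using SparseArrays

# Illustrative hashing scheme (an assumption), not Mill.jl's exact NGramMatrix encoding.
const NBUCKETS = 2053  # row count seen in the printed trees

function trigram_vector(s::AbstractString; n::Int = 3, m::Int = NBUCKETS)
    # Pad both ends with control characters so that prefixes and suffixes form n-grams too.
    chars = vcat(fill('\x02', n - 1), collect(s), fill('\x03', n - 1))
    idx = Int[]
    for i in 1:(length(chars) - n + 1)
        gram = String(chars[i:i+n-1])
        push!(idx, Int(hash(gram) % UInt(m)) + 1)  # bucket index in 1:m
    end
    # sparsevec sums values at repeated indices, so each stored entry is an n-gram count.
    return sparsevec(idx, ones(Int, length(idx)), m)
end

# Stack one column per hostname part, mirroring the 2053×3 hostname matrix above.
M = hcat(trigram_vector.(["st", "360buyimg", "com"])...)
println(size(M))  # (2053, 3)
```

Because the number of rows is fixed, distinct trigrams can collide in the same row. This trades a small loss of information for a constant input dimension, independent of the vocabulary of URL tokens.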
