IvanEOD/osrs-wiki-scraper


OSRS Wiki Scraper


An OSRS Wiki scraper in Kotlin designed to easily scrape data YOU want from the OSRS Wiki.


About the Project

  • This project includes several examples of using the scraper to get customized data, but it is primarily a framework that lets programmers create their own objects and methods to scrape the data they want.

  • It was created for programmers as a replacement for the no-longer-maintained OsrsBox project.

  • To create your own scraping methods efficiently, you will need a basic understanding of Lua, Kotlin, and MediaWiki.

Built With

  • Kotlin
  • Lua
  • Gradle
  • IntelliJ IDEA


OsrsWiki (OsrsWiki.kt)

The OsrsWiki class is the main class of the project. It provides methods to scrape data from the OSRS Wiki.

       OsrsWiki Builder:


   val wiki = OsrsWiki.builder()
       .withCookieManager(CookieManager())
       .withProxy(Proxy())
       .withUserAgent("Custom User Agent")
       .withScribuntoSessionCount(10)
       .build()
  • Optionally set a custom cookie manager.

    • .withCookieManager( CookieManager() )
  • Optionally set a custom proxy.

    • .withProxy( Proxy() )
  • Optionally set a custom user agent.

    • .withUserAgent( "Custom User Agent" )
  • Optionally set the default number of Scribunto sessions used for bulk Scribunto requests.

    • .withScribuntoSessionCount( 10 )


       Premade data parsing methods:


  • Get page titles from Item IDs:

    • wiki.getItemPageTitlesFromIds(11832, 11834, 11836) // ["Bandos chestplate", "Bandos tassets", "Bandos boots"]
  • Get page titles from NPC IDs:

    •  wiki.getNpcPageTitlesFromIds(1399, 2639) // ["King Roald", "Robert The Strong"]
  • Get all Item titles:

    • wiki.getAllItemTitles() // ["Abyssal whip", "Abyssal bludgeon", "Abyssal dagger", ...]
  • Get all NPC titles:

    • wiki.getAllNpcTitles() // ["Abyssal demon", "Abyssal leech", "Abyssal lurker", ...]
  • Get ItemDetails by name(s) or all:

    • wiki.getItemDetails("Bandos chestplate", "Bandos tassets", "Bandos boots") // Map<String, List<ItemDetails>>
      wiki.getAllItemDetails() // Map<String, List<ItemDetails>>
  • Get NpcDetails by name(s) or all:

    • wiki.getNpcDetails("King Roald", "Robert The Strong") // Map<String, List<NpcDetails>>
      wiki.getAllNpcDetails() // Map<String, List<NpcDetails>>
  • Get MonsterDetails by name(s) or all:

    • wiki.getMonsterDetails("Abyssal demon", "Abyssal leech", "Abyssal lurker") // Map<String, List<MonsterDetails>>
      wiki.getAllMonsterDetails() // Map<String, List<MonsterDetails>>
  • Get QuestRequirements for all quests:

    • wiki.getQuestRequirements() // Map<String, List<QuestRequirement>>
  • Get VarbitDetails for all varbits on the Wiki:

    • wiki.getVarbitDetails() // Map<Int, VarbitDetails>
  • Get ProductionDetails for all items with production data:

    • wiki.getProductionDetails() // Map<String, ProductionDetails>
  • Get ItemPrice for Item ID:

    • wiki.getItemPrice(11832) // WikiItemPrice?
  • Get all LocLineDetails:

    • wiki.getAllLocLineDetails() // Map<String, List<LocLineDetails>>
  • Get Slayer Monsters and their Task IDs:

    • wiki.getSlayerMonstersAndTaskIds() // Map<String, Int>
  • Get Slayer Masters that assign task:

    • wiki.getSlayerMastersThatAssign("Ghouls") // ["Mazchna", "Vannaka"]


       Standard data parsing methods:


  • Get all titles in a category:

    •   wiki.getTitlesInCategory("Items", "Monsters") // List<String>
  • Get all titles using any (one or more) of the specified template(s):

    •    wiki.getAllTitlesUsingTemplate("Infobox Item", "Infobox Bonuses") // List<String>
  • Get all titles using all of the specified template(s):

    •    wiki.getAllTitlesUsingTheseTemplates("Infobox Item", "Infobox Bonuses") // List<String>
  • Get all template names present on a page:

    •    wiki.getNamesOfTemplatesOnPage("Baby chinchompa") // List<String>
  • Get all uses of a template across the entire Wiki:

    •    wiki.getAllTemplateUses("Infobox Item") // Map<String, List<JsonObject>>
  • Get all data for specified template(s) on a page:

    •    wiki.getTemplateDataOnPage("Baby chinchompa", "Infobox Item", "Infobox Bonuses") // Map<String, List<JsonObject>>
  • Get all data for all templates on a page:

    •    wiki.getAllTemplateDataOnPage("Baby chinchompa") // Map<String, List<JsonObject>>
  • Get all titles in categories with revisions since a specified date:

    •   val threeDaysAgo = Date.from(Instant.now().minus(3, ChronoUnit.DAYS)) 
        wiki.getAllTitlesWithRevisionsSince(threeDaysAgo, "Items") // List<String>
  • Get last revision timestamp for title(s):

    •   wiki.getLastRevisionTimestamp("Baby chinchompa", "Black chinchompa") // Map<String, String>
        wiki.getLastRevisionTimestamp(listOf("Baby chinchompa", "Black chinchompa")) // Map<String, String>
  • Dynamic Page List (DPL3) query:

    • val query = mapOf(
          "category" to "Items",
          "count" to 10,
          "include" to "{Infobox Item}",    
      )
      val response = wiki.dplAsk(query) // JsonElement
    • DPL3 queries are explained further below, under Scribunto Session, and in the DPL3 documentation.
  • MediaWiki Semantic Search:

    • val query = listOf(
          "[[Location JSON::+]]",
          "?#-=title",
          "?Production JSON",
      )
      val response = wiki.smwAsk(query) // JsonElement
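
The difference between getAllTitlesUsingTemplate (any of the templates) and getAllTitlesUsingTheseTemplates (all of the templates) in the list above is the difference between a set union and a set intersection. The following is a minimal sketch in plain Kotlin, assuming a hypothetical in-memory index from template name to page titles. The helper names and the sample data are illustrative only and are not the library's implementation or real wiki data.

```kotlin
// Hypothetical helper: titles that use ANY of the given templates (set union).
fun titlesUsingAny(index: Map<String, Set<String>>, vararg templates: String): Set<String> =
    templates.flatMap { index[it].orEmpty() }.toSet()

// Hypothetical helper: titles that use ALL of the given templates (set intersection).
fun titlesUsingAll(index: Map<String, Set<String>>, vararg templates: String): Set<String> =
    templates
        .map { index[it].orEmpty() }
        .reduceOrNull { acc, titles -> acc intersect titles }
        .orEmpty()

// Illustrative data only; these are not real wiki pages.
val sampleIndex = mapOf(
    "Infobox Item" to setOf("Page A", "Page B"),
    "Infobox Bonuses" to setOf("Page B", "Page C"),
)
```

With this sample index, the "any" helper yields all three pages, while the "all" helper yields only the page that appears under both templates.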


Scribunto Session (ScribuntoSession.kt)

The Scribunto Session connects to the MediaWiki API and allows for the execution of Lua scripts on the Wiki.

Why is that useful?

  • Executing custom Lua scripts on the Wiki.
  • Loading data from the Wiki Lua modules.
  • Using the DPL3 query language to query the Wiki.
  • Controlling the format and the volume of the data returned by the Wiki.

       Creating a Scribunto Session:


val session = wiki.createScribuntoSession {
    withoutDefaultCode()
    withWikiModule("ModuleName")
    withCode("print('Hello World')")
    withCode {
        /* Use the Lua Builder */
    }
}
  • Optionally disable the default code included in the session. You can add your own code with the withCode function.
    • .withoutDefaultCode()
  • Optionally set the module the session will use. The default is "Var", chosen only because it is a small module.
    • .withWikiModule("ModuleName")
  • Optionally add code to persist in the session.
    • .withCode("print('Hello World')")
      .withCode { /* Use the Lua Builder */ }
    • See LuaBuilder.kt for more information on the Lua Builder.



       Using a Scribunto Session:


  • Send a request with a string of Lua code:

    •   session.sendRequest("print(\"Hello World\")") // Pair<Boolean, JsonElement>
  • Send a request with a LuaBuilder instance:

    •   session.sendRequest {
          /* Use the Lua Builder */
        }
        // Pair<Boolean, JsonElement>
  • Pass true as the first parameter to automatically refresh the Scribunto Session as part of the request:

    •   session.sendRequest(true, "print(\"Hello World\")") // Pair<Boolean, JsonElement>
        session.sendRequest(true) {
              /* Use the Lua Builder */
        }    
        // Pair<Boolean, JsonElement>
  • sendRequest returns a Pair<Boolean, JsonElement>: the first value indicates whether the request succeeded, and the second is the response from the Wiki's print return field.

  • To get a value back from the wiki use the Lua print function.

  • The default Lua code includes a printReturn function that returns its input value as a JSON string.

    • {
          "success": true,
          "message": "Only present if success is false",
          "printReturn": "{\"json\": \"value\"}"        
      }
  • The default code sent to the Wiki can be found here: Scribunto.lua

  • The session uses the same Session ID for each request. The wiki will continue to add the code in the requests to the session until the session is refreshed or the session expires.

  • The session refreshes automatically when it expires.

  • If the session has failed too many requests since the last refresh it will automatically refresh.

  • The session can be refreshed manually:

    •   session.refresh() 
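
The refresh rules above can be pictured as a small state machine. The sketch below is an illustration only: the class name, the default failure threshold of 3, and the integer session ID are assumptions made for the example, not the behavior of ScribuntoSession.kt.

```kotlin
// Hypothetical model of the refresh rules; not the library's ScribuntoSession.
class RefreshingSession(private val maxFailures: Int = 3) {
    // Stand-in for the Session ID that is reused across requests.
    var sessionId: Int = 1
        private set

    var failuresSinceRefresh: Int = 0
        private set

    // Start a new session: new ID, failure counter reset.
    fun refresh() {
        sessionId += 1
        failuresSinceRefresh = 0
    }

    // Record the outcome of one request; refresh once too many have failed.
    fun record(success: Boolean) {
        if (success) return
        failuresSinceRefresh += 1
        if (failuresSinceRefresh >= maxFailures) refresh()
    }
}
```

Because code accumulates in a session until it is refreshed, resetting the failure counter together with the session ID keeps the two in step.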


Lua Builder (LuaBuilder.kt)

The Lua Builder is a DSL for easily creating Lua code from Kotlin.

This is not intended to be a full Lua interpreter or converter, but rather a tool to make it easier to create Lua code.

  • You can create a LuaScope instance with the lua function:

    •   lua {
          /* Use the Lua Builder */
        }
  • The LuaScope will convert values to a Lua representation.

  • The supported value types are:

    • String
    • Number
    • Date
    • Boolean
    • Map<*, *> (* values may be any of the above types)
    • Iterable<*> (* values may be any of the above types)
  • To set a key's value use `=` like "key" `=` "value".

  • There are two types of LuaScope with slight differences.

    • The LuaGlobalScope

      • This is the default scope and only allows String keys.
      • Keys in this scope can use .local() to prepend "local" to the key, making it a local variable.
        • "myValue".local() will output local myValue
    • The LuaTableScope

      • This scope allows String, Number, Boolean, and Date keys.
      • These keys cannot use .local() because they are entries in a table.

      I don't know what is going on with the formatting in this table, I'm sorry, I tried! 🙃

Kotlin:
    "myValue" `=` "value"
Lua Output:
    myValue = "value"

Kotlin:
    "myValue".local() `=` "value"
Lua Output:
    local myValue = "value"

Kotlin:
    "myModule" `=` require("ModuleName")
Lua Output:
    myModule = require("ModuleName")

Kotlin:
    +"print('This code is just added as is to the Lua script')"
Lua Output:
    print('This code is just added as is to the Lua script')

Kotlin:
    "myTable" `=` {
        "myKey" `=` "myValue"
        48 `=` Date()
        Date() `=` "myValue"
        1.0 `=` 1
        true `=` "myTrueValue"
        "something" `=` true
        "myListInLua" `=` listOf("a", "b", "c")
        "myMapInLua" `=` mapOf("a" to "b", "c" to "d")
    }

    Inside the brackets is LuaTableScope which allows values other than String to be keys.
Lua Output:
    myTable = {
        ["myKey"] = "myValue",
        [48] = "2022-12-21 17:33:09",
        ["2022-12-21 17:33:09"] = "myValue",
        [1.0] = 1,
        [true] = "myTrueValue",
        ["something"] = true,
        ["myListInLua"] = {"a", "b", "c"},
        ["myMapInLua"] = {
            ["a"] = "b",
            ["c"] = "d"
        }
    }
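
To make the value conversion concrete, here is a minimal sketch of how Kotlin values could be rendered as Lua literals for the supported types listed above. The function name toLua is hypothetical, and the date pattern is an assumption inferred from the sample output ("2022-12-21 17:33:09"). LuaBuilder.kt may do this differently.

```kotlin
import java.text.SimpleDateFormat
import java.util.Date

// Hypothetical converter: renders a Kotlin value as a Lua literal.
// The date pattern is an assumption based on the sample output in this README.
fun toLua(value: Any?): String = when (value) {
    null -> "nil"
    // Escape backslashes first, then double quotes.
    is String -> "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
    is Number, is Boolean -> value.toString()
    is Date -> "\"" + SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(value) + "\""
    // Maps become Lua tables with explicit [key] = value entries.
    is Map<*, *> -> value.entries.joinToString(", ", "{", "}") { (k, v) -> "[${toLua(k)}] = ${toLua(v)}" }
    // Iterables become Lua array-style tables.
    is Iterable<*> -> value.joinToString(", ", "{", "}") { toLua(it) }
    else -> error("Unsupported value: $value")
}
```

Keys are passed through the same function as values, which is why a Boolean or Number key comes out unquoted inside the square brackets.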


Utility Classes

Some notable classes used by and made available by the scraper.

VersionedMap is used to parse versioned template data from the wiki and to identify images and page references within each value.
  • The best way to obtain this is by calling .toVersionedMap() on a JsonObject received from the wiki.
        val versionedMap = jsonObject.toVersionedMap()
  • The VersionedMap will create a TemplatePropertyData for each key:
data class TemplatePropertyData(
  val name: String,
  val key: String,
  val isWikiKey: Boolean,
  val version: Int,
  val value: String
)
  • Example Template Data:
{
  "id1" : 111,
  "id2" : 222,
  "id3" : 333
}
  • Would create these property data classes:
TemplatePropertyData(name="id1", key="id", isWikiKey=true, version=1, value="111")
TemplatePropertyData(name="id2", key="id", isWikiKey=true, version=2, value="222")
TemplatePropertyData(name="id3", key="id", isWikiKey=true, version=3, value="333")
  • You can check how many versions a template has with versionedMap.versions
  • By default, getting a property without the version will return Version 0.
  • Version 0 is all values combined, or in a single versioned property, the value itself.
  • You can also use the original key if you know it and are expecting it.
  • id3 will work the same as ["id", 3]
  • If a template has multiple versions, some values may be the same across all versions, and will not have a versioned key.
  • So if a version of a key is requested that does not exist, it will return the first or only value available.
  • You can get a full map of a specific version, or a list containing a map for each individual version.
val versionCount = versionedMap.versions    // 3

val id = versionedMap["id"]                 // "111, 222, 333"
val id1 = versionedMap["id", 1]             // "111"
val id2 = versionedMap["id", 2]             // "222"
val id5 = versionedMap["id", 5]             // "111"

val version2 = versionedMap.getVersion(2)   // Map<String, String>
val allVersions = versionedMap.getIndividualVersions()  // List<Map<String, String>>
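
The lookup rules described above (a trailing number marks a version, version 0 combines all values, and a missing version falls back to the first value) can be sketched without the library. MiniVersionedMap is a hypothetical stand-in written for illustration. It takes plain strings instead of a JsonObject, and it assumes that versions reports the highest version number seen.

```kotlin
// Hypothetical re-implementation of the lookup rules; not the library's VersionedMap.
class MiniVersionedMap(raw: Map<String, String>) {
    // key -> (version -> value); unversioned keys are stored under version 0.
    private val data: Map<String, Map<Int, String>> = raw.entries
        .groupBy({ splitKey(it.key).first }, { splitKey(it.key).second to it.value })
        .mapValues { (_, pairs) -> pairs.toMap() }

    // Assumption: "versions" is the highest version number seen.
    val versions: Int = data.values.flatMap { it.keys }.maxOrNull() ?: 0

    // Accepts a base key ("id") or an original wiki key ("id3").
    operator fun get(key: String): String? {
        if (key in data) return get(key, 0)
        val (base, version) = splitKey(key)
        return if (version > 0) get(base, version) else null
    }

    operator fun get(key: String, version: Int): String? {
        val byVersion = data[key] ?: return null
        // Version 0 means "all values combined".
        if (version == 0) return byVersion.values.joinToString(", ")
        // A missing version falls back to the first (or only) value.
        return byVersion[version] ?: byVersion.values.first()
    }

    private companion object {
        val VERSIONED = Regex("""^(.+?)(\d+)$""")

        // "id3" -> ("id", 3); "examine" -> ("examine", 0)
        fun splitKey(name: String): Pair<String, Int> {
            val match = VERSIONED.matchEntire(name) ?: return name to 0
            return match.groupValues[1] to match.groupValues[2].toInt()
        }
    }
}
```

Storing unversioned keys under version 0 lets the same fallback path serve both "value shared across all versions" and "requested version does not exist".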


TitleQueue is used for efficiently scraping data by titles in bulk.

  • If the response is too long, the Wiki will return an error. If this happens, you may need to lower the chunk size.

  • Create a new queue with the list of titles and the chunk size. (The default size is 100)

    • val titles = wiki.getAll
      val queue = TitleQueue(titles, 50)
  • Then call queue.execute { /* Your code here */ } to execute the queue.

    • The block inside the execute function is suspending.

    • The parameter passed to the block is a list of titles to be processed.

    • The block should return only the titles that failed to be processed; these are re-added to the queue.

    • val processedResults = mutableMapOf<String, String>()
      queue.execute { titlesChunk ->
        // Process the titles here adding any data to your results, and returning any failed titles.
        // No data is returned from execute.
      }
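
The chunk-and-retry behavior described above can be sketched in plain Kotlin. MiniTitleQueue is a hypothetical, simplified model: its block is a regular (non-suspending) lambda so the example needs no coroutine library, and maxAttempts is an assumed safety limit that the real TitleQueue may not have.

```kotlin
// Hypothetical, non-suspending model of the chunk-and-retry loop; not the library's TitleQueue.
class MiniTitleQueue(titles: List<String>, private val chunkSize: Int = 100) {
    private val pending = ArrayDeque(titles)
    private val attempts = mutableMapOf<String, Int>()

    // Runs `block` on successive chunks. Titles returned by `block` are treated
    // as failed and re-queued until they have been attempted `maxAttempts` times.
    fun execute(maxAttempts: Int = 3, block: (List<String>) -> List<String>) {
        while (pending.isNotEmpty()) {
            // Take up to chunkSize titles from the front of the queue.
            val chunk = List(minOf(chunkSize, pending.size)) { pending.removeFirst() }
            for (title in block(chunk)) {
                val count = (attempts[title] ?: 0) + 1
                attempts[title] = count
                if (count < maxAttempts) pending.addLast(title)
            }
        }
    }
}
```

As in the description above, execute returns nothing; results are collected by the caller inside the block, and only failed titles flow back into the queue.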


Useful References

Some useful references to assist in using this project.


