
CapitolHillTreebank

SanghounSong edited this page Jan 4, 2017 · 2 revisions

Mostly scribed by Joshua Crowgey! I just created this page.

Francis: Treebanking is important but only Dan does it

Dan: Montse used to do it too

Francis: ... but we need a reminder/walkthrough of how. Also, I need a reminder of a few things. E.g., I know I want the tree that has X type in it, so I go through the trees, and I'd like to save some of my steps so that I can repeat them. But let's start with a walkthrough:

Dan: Ok, what I will show you is how I run through the steps of modifying the grammar, updating a treebank, inspecting the results, and repeating the loop. My procedure has roots in older machinery with a Lisp-based interface. I am still partially using that, but a person in the room doesn't use that. ... I use this $LOGONROOT/parse script, but there are other ways to do this first part. Of course I also need to know which set of profiles I want to use:

[ONSCREEN]

danf@baseque3:~$ $LOGONROOT/parse --terg/ace --protocol 2 --best 1 --count 4 hike

[/ONSCREEN]

I will talk through these arguments:

The first argument (--terg/ace) is which grammar and which engine you want to use. This is defined in the dot.tsdbrc file.

The other arguments: --protocol 2, which (for me) means "store the edge relations"; I want to store the packed forest. What else does that do, Woodley? (Woodley: store the forest.) Stefan used to suggest --best 0, but that doesn't make sense for me; I prefer to "unpack 1 tree". --count 4 is how many CPUs I can use. The last argument is the profile (hike). So I call that...
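The arguments Dan describes can be summarised in a small shell sketch. It only assembles and prints the command line from the walkthrough; it does not run the parser. The function name `build_parse_command` and its parameter order are hypothetical, introduced here for illustration. The flags themselves are the ones shown on screen.

```shell
#!/bin/sh
# Hypothetical helper: prints the parse invocation from the walkthrough.
# Only the flags (--protocol 2, --best 1, --count) come from the transcript;
# the function name and argument order are assumptions for illustration.
build_parse_command() {
    grammar_engine="$1"   # grammar/engine pair, e.g. terg/ace (defined in dot.tsdbrc)
    cpus="$2"             # value for --count: how many CPUs to use
    profile="$3"          # profile name, e.g. hike

    # Falls back to the literal text $LOGONROOT when the variable is unset.
    printf '%s/parse --%s --protocol 2 --best 1 --count %s %s\n' \
        "${LOGONROOT:-\$LOGONROOT}" "$grammar_engine" "$cpus" "$profile"
}

# Reproduces the command line shown on screen.
build_parse_command terg/ace 4 hike
```

Printing the command first makes it easy to check the profile name and CPU count before starting a long parsing run.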

[ONSCREEN] LKB loads grammar ... [/ONSCREEN]

[Dan talks about how this shows how he still uses LKB under the hood]

Some discussion about how this isn't really necessary...

Some discussion about how this relates to the loading of malrules...

... so the engine has now parsed this and has stored it in a directory, `erg/trunk/hike/DATE/ace`

So I can now go back to the treebanking tool, answer (or fftb, which answer invokes)

[People complain of the long command on the screen]

Dan: I said I would show people how I do this, not ...

Woodley: this thing called answer is a 20-line script written by Oe in order to hide the truth from people ... I've clarified what I wanted to clarify

Dan: I have copy and paste and the history in my shell, so I have a way to invoke that long command. I do so, and it opens a web browser [pointing to 127.0.0.1:50768/private/session?0] ...

Francis: We see an exuberant use of color ...

Dan: green is great, red is disappointing, I get a lot of browns, which means I have added some ambiguity. The yellow means that I have stuff I haven't really added. Down here for example, I see a brown. I can say "show me that item" [he clicks on it], and I see that it's actually spurious...

Glenn: asks a question about this particular example, whether this is a case where the ERG blocks on semantics

Woodley: What is an expletive index?

Dan: it's an 'it' or 'there'.

Woodley: how are those indices? I'm being a bit snarky; there is no semantic difference because they have no semantics

Francis: ---moving on!---

Dan: ok, so I pick the one I wanted ..., now I can see the tree that I want by clicking 'accept', and I am now taken to the next one which has a difference. Here we see an example where I have recently conceded that 'on' can be an intransitive prep. Here I have some ambiguity then, as in "Saturday, he went"~"On, he went". I had to retouch a lot of trees because of this, but there are a few, not in the 17,000 that I had to retouch, where I wanted this new analysis ... anyway, life.

Glenn: for the people who don't do treebanking, this is what they're scared of

Dan: I knew what I was in for. [Gives another example of something similar where he changed his mind about the copula be and a comma]

---: So if we're still actively developing the grammar, we probably shouldn't be treebanking.

Dan: No, that's the wrong thing to learn. I made a conscious decision to do this because I had a spare month; mostly, things aren't like this. For example, in this 'hike' corpus, I didn't actually have a lot of changes. I think the lesson here is to do treebanking all the time so that when you make a change, you have a small amount of things to change each time.

Francis: You'll note that this tool uses your previous work so that as long as you're continuously treebanking, you'll just have a little bit of work to do each time you do it.

Luis: Treebanking may force you to deal with ambiguity early on. You waited a long time to deal with this ambiguity; would you have preferred to have done this earlier in the development of this project? (question continues ...)

Dan: this particular example was one which was relatively rare, so I waited until I was ready to deal with it.

Luis: If you know you have to deal with it eventually, why don't you do it earlier?

Woodley: It's not that he has to redo anything

Luis: but that discriminant...

Dan: ...didn't exist before

Luis: fair enough

Francis: so Dan fixed more important problems first...

Luis: I see now, that problem didn't exist before, now he's decided to work on it

Glenn: when you were in your blissful period, is there a sense of a generalization which is true now...because you're ... when you're adding a constraint, ...

Dan: the types were all there, I had them, I just added "on" to a new class which was already there

Dan: we've probably wasted enough time on this, let me show you something else... Here we see that I can highlight a substring and see what options I have, that is, which rules can span this string. I can select one and it will try to build a tree using that rule for this string, and try to find a spanning parse for the rest. This is a way to force the machine to use a particular rule.

Glenn: where's the part where you're certifying your choices?

Dan: I choose "accept" here, it's hard to see because of my font sizes, but this is where I click.

Dan: here is one more instance: in "don't be tricked by that", is "by" the passive by or the locative by? I know that it's the passive by in this case, so I can click here and ... There is some efficiency stuff which actually creates some false potential analyses here, so you can actually turn it off if your grammar is small enough that you don't hit resource limits when parsing with it off.

Dan: ok, and now I'm back at the overview, everything is green or yellow, and I'm ready to exit.

Woodley: the command line that you had people copy has some --browser=... in it; what that actually is, is a shell script from Oe which does some magic to deal with LD_LIBRARY_PATH

Francis: for example, someone without logon would have a hard time finding these scripts

[Woodley takes screen to show how to replicate these commands without having logon]

Woodley: this is OSX, there is no logon for OSX, the libtsdb you can download from my website,

[he uses ./mkprof -s and ./art -f -a ]

Two commands, didn't require loading the lkb.
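As a rough sketch of that two-command sequence: the transcript records only `./mkprof -s` and `./art -f -a`, not their arguments. The sketch below therefore assumes that `mkprof -s` takes a skeleton directory followed by the new profile directory, and that `art -f -a` takes a quoted parser command followed by the profile. The helper name `print_profile_steps`, the paths, and the `PARSER_COMMAND` placeholder are all hypothetical. The script only prints the commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper: prints the two steps Woodley demonstrated.
# ASSUMPTION: argument order for mkprof and art is not recorded in the
# transcript; check each tool's own usage message before relying on it.
print_profile_steps() {
    skeleton="$1"     # directory holding the skeleton (assumed argument)
    profile="$2"      # directory for the new profile (assumed argument)
    parser_cmd="$3"   # parser invocation handed to art via -a (assumed)

    # Step 1: create an empty profile from the skeleton.
    printf './mkprof -s %s %s\n' "$skeleton" "$profile"
    # Step 2: parse the items into that profile with the given parser command.
    printf "./art -f -a '%s' %s\n" "$parser_cmd" "$profile"
}

# Placeholder values only; substitute real paths and a real parser command.
print_profile_steps path/to/skeleton path/to/profile PARSER_COMMAND
```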

The fftb software can be downloaded via svn and compiled, there may also be a binary, it can also be pulled out of the logon tree.

Luis: so there's nothing stopping us from running a server and having people work on this in other locations?

Woodley: nothing stopping you, no.
