Commit cb6a2f6

Add a large chunk of documentation to Ssurgeon
1 parent 5250f9f commit cb6a2f6

1 file changed: +115 -1 lines changed

src/edu/stanford/nlp/semgraph/semgrex/ssurgeon/Ssurgeon.java

Lines changed: 115 additions & 1 deletion
@@ -52,6 +52,7 @@
5252
<uid>...</uid>
5353
<notes>...</notes>
5454
<semgrex>...</semgrex>
55+
<language>...</language>
5556
<edit-list>...</edit-list>
5657
</ssurgeon-pattern>
5758
</ssurgeon-pattern-list>
@@ -61,11 +62,124 @@
6162
* The {@code notes} are comments on the Ssurgeon. <br>
6263
* The {@code semgrex} is a Semgrex pattern to use when matching for this operation. <br>
6364
* The {@code edit-list} is the actual Ssurgeon operation to execute. <br>
65+
* The {@code language} is an optional field to determine what
66+
* language formalism to use when making new dependencies. By default
67+
* it will be English for SD when using the Java API, although most
68+
* people probably want UniversalEnglish for UD (including non-English
69+
* UD datasets). <br>
70+
*<br>
71+
* Below, edge means an edge in the Semgrex results, and node refers to a matched word.
72+
*<br>
6473
*
6574
* Available operations and their arguments include:
6675
* <ul>
67-
* <li> {@code addEdge -gov a1 -dep a2 -reln dep -weight 0.5}
76+
* <li> {@code addEdge -gov node1 -dep node2 -reln depType -weight 0.5}
77+
* <li> {@code relabelNamedEdge -edge edgename -reln depType}
78+
* <li> {@code removeEdge -gov node1 -dep node2 -reln depType}
79+
* <li> {@code removeNamedEdge -edge edgename}
80+
* <li> {@code addDep -gov node1 -reln depType -position where ...attributes...}
81+
* <li> {@code editNode -node node ...attributes...}
82+
* <li> {@code setRoots n1 (n2 n3 ...)}
83+
* <li> {@code killAllIncomingEdges -node node}
84+
* <li> {@code deleteGraphFromNode -node node}
85+
* <li> {@code killNonRootedNodes}
6886
* </ul>
87+
*
88+
*<p>
89+
* {@code addEdge} adds a new edge between two existing nodes.
90+
* {@code -gov} and {@code -dep} will be nodes matched by the Semgrex pattern.
91+
* {@code -reln} is the name of the dependency type to add.
92+
*</p><p>
93+
* {@code relabelNamedEdge} changes the dependency type of a named edge.
94+
* {@code -edge} is the name of the edge in the Semgrex pattern.
95+
* {@code -reln} is the name of the dependency type to use.
96+
*</p><p>
97+
* {@code removeEdge} deletes an edge based on its description.
98+
* {@code -gov} is the governor of the edge to delete, a named node from the Semgrex pattern.
99+
* {@code -dep} is the dependent of the edge to delete, a named node from the Semgrex pattern.
100+
* {@code -reln} is the dependency type of the edge to delete.
101+
* If either {@code -gov} or {@code -dep} is left empty, then all (matching) edges to or from the
102+
* remaining argument will be deleted.
103+
*</p><p>
104+
* {@code removeNamedEdge} deletes an edge based on its name.
105+
* {@code -edge} is the name of the edge in the Semgrex pattern.
106+
*</p><p>
107+
* {@code addDep} adds a word and a dependency arc to the dependency graph.
108+
* {@code -gov} is the governor to attach to, a named node from the Semgrex pattern.
109+
* {@code -reln} is the name of the dependency type to use.
110+
* {@code -position} is where in the sentence the word should go. {@code -} will be the first word of the sentence,
111+
* {@code +} will be the last word of the sentence, and {@code -node} or {@code +node} will be before or after the
112+
* named node.
113+
* {@code ...attributes...} means any attributes which can be set from a string or numerical value,
114+
* e.g., {@code -text ...} sets the text of the word (currently no spaces allowed, which would be a limitation for Vietnamese),
115+
* {@code -pos ...} sets the xpos of the word, {@code -cpos ...} sets the upos of the word, etc.
116+
* You cannot set the index of a word this way; an exception will be thrown.
117+
*</p><p>
118+
* {@code editNode} will edit the attributes of a word.
119+
* {@code -node} is the node to edit.
120+
* {@code ...attributes...} are the attributes to change, same as with {@code addDep}.
121+
*</p><p>
122+
* {@code setRoots} replaces the roots of the sentence with the given node(s).
123+
* {@code n1, n2, ...} are the names of the nodes from the Semgrex to use as the root(s).
124+
* This is best done in conjunction with other operations which actually manipulate the structure
125+
* of the graph, or the new root will weirdly have dependents and the graph will be incorrect.
126+
*</p><p>
127+
* {@code killAllIncomingEdges} deletes all edges to a node.
128+
* {@code -node} is the node whose incoming edges will be deleted.
129+
* Note that this is the same as {@code removeEdge} with only the dependent set.
130+
*</p><p>
131+
* {@code deleteGraphFromNode} deletes all nodes reachable from a specific node.
132+
* {@code -node} is the node to delete.
133+
* You will only want to do this after separating the node from the parts of the graph you want to keep.
134+
*</p><p>
135+
* {@code killNonRootedNodes} searches the graph and deletes all nodes which have no path to a root.
136+
*</p>
137+
*<p>
138+
* A practical example comes from the {@code UD_English-Pronouns}
139+
* dataset, where some words had both {@code nsubj} and {@code csubj}
140+
* dependencies:
141+
*<pre>
142+
1 Hers hers PRON PRP Gender=Fem|Number=Sing|Person=3|Poss=Yes|PronType=Prs 3 nsubj _ _
143+
2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 cop _ _
144+
3 easy easy ADJ JJ Degree=Pos 0 root _ _
145+
4 to to PART TO _ 5 mark _ _
146+
5 clean clean VERB VB VerbForm=Inf 3 csubj _ SpaceAfter=No
147+
6 . . PUNCT . _ 5 punct _ _
148+
</pre>
149+
*</p><p>
150+
* We can update this with the following Semgrex/Ssurgeon pair:
151+
*<pre>
152+
{}=source >nsubj {} >csubj=bad {}
153+
relabelNamedEdge -edge bad -reln advcl
154+
*</pre>
155+
*</p><p>
156+
* The result is that the {@code csubj} edge is relabeled as {@code advcl}.
157+
*</p><p>
158+
* For the most part, each of these operations is already bomb-proof,
159+
* i.e., the pattern will execute once and not repeat on the same part of
160+
* the same dependency graph.
161+
* However, in the case of {@code addDep}, it is not possible to automatically bomb-proof the command,
162+
* as certain sentences may legitimately have multiple words with the same attributes as dependents
163+
* of the same governor. In this case, it is necessary to make the Semgrex pattern itself bomb-proof.
164+
*</p><p>
165+
166+
* As an example, if the intent is to change "Jennifer has lovely
167+
* antennae" to "Jennifer has lovely blue antennae", the following
168+
* command would "bomb":
169+
<pre>
170+
{@code
171+
{word:antennae}=antennae
172+
addDep -gov antennae -reln dep -word blue
173+
}
174+
</pre>
175+
*</p><p>
176+
* The following would not:
177+
<pre>
178+
{@code
179+
{word:antennae}=antennae !> {word:blue}
180+
addDep -gov antennae -reln dep -word blue
181+
}
182+
</pre>
69183
*
70184
* @author Eric Yeh
71185
*/
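
Note on the "bomb-proof" examples in the Javadoc above: the sketch below is a toy model, not the CoreNLP API. It assumes, as the comment implies, that an edit is reapplied for as long as its pattern still matches. The class `BombProofSketch`, the method `applyUntilStable`, and the `maxIterations` safety cap are all hypothetical names introduced for illustration; the "graph" is reduced to the list of words attached to the `antennae` node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this does NOT call Semgrex or Ssurgeon.
class BombProofSketch {

    /**
     * Simulates repeatedly applying "addDep ... blue" under the governor "antennae".
     * dependents:    words currently attached to "antennae"
     * guarded:       true models the pattern with the "!> {word:blue}" clause
     * maxIterations: hypothetical cap so the unguarded case terminates in this demo
     * Returns the number of edits that were applied.
     */
    static int applyUntilStable(List<String> dependents, boolean guarded, int maxIterations) {
        int edits = 0;
        while (edits < maxIterations) {
            // Unguarded: {word:antennae}=antennae matches every time.
            // Guarded: the match fails once a "blue" dependent exists.
            boolean matches = !guarded || !dependents.contains("blue");
            if (!matches) {
                break;
            }
            dependents.add("blue"); // stands in for the addDep edit
            edits++;
        }
        return edits;
    }

    public static void main(String[] args) {
        List<String> unguarded = new ArrayList<>(Arrays.asList("lovely"));
        List<String> guarded = new ArrayList<>(Arrays.asList("lovely"));

        // The unguarded pattern keeps matching and only stops at the cap.
        System.out.println("unguarded edits: " + applyUntilStable(unguarded, false, 5));
        // The guarded pattern stops matching after the first edit.
        System.out.println("guarded edits: " + applyUntilStable(guarded, true, 5));
    }
}
```

In this model the unguarded pattern adds `blue` until the cap is reached, while the guarded pattern adds it exactly once, which is the behavior the negated `!> {word:blue}` clause is meant to guarantee.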
