|
52 | 52 | <uid>...</uid>
|
53 | 53 | <notes>...</notes>
|
54 | 54 | <semgrex>...</semgrex>
|
| 55 | + <language>...</language> |
55 | 56 | <edit-list>...</edit-list>
|
56 | 57 | </ssurgeon-pattern>
|
57 | 58 | </ssurgeon-pattern-list>
|
|
61 | 62 | * The {@code notes} are comments on the Ssurgeon. <br>
|
62 | 63 | * The {@code semgrex} is a Semgrex pattern to use when matching for this operation. <br>
|
63 | 64 | * The {@code edit-list} is the actual Ssurgeon operation to execute. <br>
|
| 65 | + * The {@code language} is an optional field that selects which |
| 66 | + * dependency formalism to use when creating new dependencies. When |
| 67 | + * using the Java API the default is English (for SD), although most |
| 68 | + * users will probably want UniversalEnglish for UD (including non-English |
| 69 | + * UD datasets). <br> |
| 70 | + *<br> |
| 71 | + * Below, edge means an edge in the Semgrex results, and node refers to a matched word. |
| 72 | + *<br> |
64 | 73 | *
|
65 | 74 | * Available operations and their arguments include:
|
66 | 75 | * <ul>
|
67 |
| - * <li> {@code addEdge -gov a1 -dep a2 -reln dep -weight 0.5} |
| 76 | + * <li> {@code addEdge -gov node1 -dep node2 -reln depType -weight 0.5} |
| 77 | + * <li> {@code relabelNamedEdge -edge edgename -reln depType} |
| 78 | + * <li> {@code removeEdge -gov node1 -dep node2 -reln depType} |
| 79 | + * <li> {@code removeNamedEdge -edge edgename} |
| 80 | + * <li> {@code addDep -gov node1 -reln depType -position where ...attributes...} |
| 81 | + * <li> {@code editNode -node node ...attributes...} |
| 82 | + * <li> {@code setRoots n1 (n2 n3 ...)} |
| 83 | + * <li> {@code killAllIncomingEdges -node node} |
| 84 | + * <li> {@code deleteGraphFromNode -node node} |
| 85 | + * <li> {@code killNonRootedNodes} |
68 | 86 | * </ul>
|
| 87 | + * |
| 88 | + *<p> |
| 89 | + * {@code addEdge} adds a new edge between two existing nodes. |
| 90 | + * {@code -gov} and {@code -dep} will be nodes matched by the Semgrex pattern. |
| 91 | + * {@code -reln} is the name of the dependency type to add. |
| 92 | + *</p><p> |
| 93 | + * {@code relabelNamedEdge} changes the dependency type of a named edge. |
| 94 | + * {@code -edge} is the name of the edge in the Semgrex pattern. |
| 95 | + * {@code -reln} is the name of the dependency type to use. |
| 96 | + *</p><p> |
| 97 | + * {@code removeEdge} deletes an edge based on its description. |
| 98 | + * {@code -gov} is the governor of the edge to delete, a named node from the Semgrex pattern. |
| 99 | + * {@code -dep} is the dependent of the edge to delete, a named node from the Semgrex pattern. |
| 100 | + * {@code -reln} is the name of the dependency type to delete. |
| 101 | + * If either {@code -gov} or {@code -dep} is left empty, then all (matching) edges to or from the |
| 102 | + * remaining argument will be deleted. |
| 103 | + *</p><p> |
| 104 | + * {@code removeNamedEdge} deletes an edge based on its name. |
| 105 | + * {@code -edge} is the name of the edge in the Semgrex pattern. |
| 106 | + *</p><p> |
| 107 | + * {@code addDep} adds a word and a dependency arc to the dependency graph. |
| 108 | + * {@code -gov} is the governor to attach to, a named node from the Semgrex pattern. |
| 109 | + * {@code -reln} is the name of the dependency type to use. |
| 110 | + * {@code -position} is where in the sentence the word should go. {@code -} will be the first word of the sentence, |
| 111 | + * {@code +} will be the last word of the sentence, and {@code -node} or {@code +node} will be before or after the |
| 112 | + * named node. |
| 113 | + * {@code ...attributes...} means any attributes which can be set from a string or numerical value, |
| 114 | + * e.g. {@code -text ...} sets the text of the word (currently no spaces allowed, which would be a limitation for Vietnamese), |
| 115 | + * {@code -pos ...} sets the xpos of the word, {@code -cpos ...} sets the upos of the word, etc. |
| 116 | + * You cannot set the index of a word this way; an exception will be thrown. |
| 117 | + *</p><p> |
| 118 | + * {@code editNode} will edit the attributes of a word. |
| 119 | + * {@code -node} is the node to edit. |
| 120 | + * {@code ...attributes...} are the attributes to change, same as with {@code addDep}. |
| 121 | + *</p><p> |
| 122 | + * {@code setRoots} replaces the roots of the sentence with the given node(s). |
| 123 | + * {@code n1, n2, ...} are the names of the nodes from the Semgrex to use as the root(s). |
| 124 | + * This is best done in conjunction with other operations which actually manipulate the structure |
| 125 | + * of the graph, or the new root will weirdly have dependents and the graph will be incorrect. |
| 126 | + *</p><p> |
| 127 | + * {@code killAllIncomingEdges} deletes all edges to a node. |
| 128 | + * {@code -node} is the node to edit. |
| 129 | + * Note that this is the same as {@code removeEdge} with only the dependent set. |
| 130 | + *</p><p> |
| 131 | + * {@code deleteGraphFromNode} deletes all nodes reachable from a specific node. |
| 132 | + * {@code -node} is the node to delete. |
| 133 | + * You will only want to do this after separating the node from the parts of the graph you want to keep. |
| 134 | + *</p><p> |
| 135 | + * {@code killNonRootedNodes} searches the graph and deletes all nodes which have no path to a root. |
| 136 | + *</p> |
| 137 | + *<p> |
| 138 | + * A practical example comes from the {@code UD_English-Pronouns} |
| 139 | + * dataset, where some words had both {@code nsubj} and {@code csubj} |
| 140 | + * dependencies: |
| 141 | + *<pre> |
| 142 | +1 Hers hers PRON PRP Gender=Fem|Number=Sing|Person=3|Poss=Yes|PronType=Prs 3 nsubj _ _ |
| 143 | +2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 cop _ _ |
| 144 | +3 easy easy ADJ JJ Degree=Pos 0 root _ _ |
| 145 | +4 to to PART TO _ 5 mark _ _ |
| 146 | +5 clean clean VERB VB VerbForm=Inf 3 csubj _ SpaceAfter=No |
| 147 | +6 . . PUNCT . _ 5 punct _ _ |
| 148 | +</pre> |
| 149 | + *</p><p> |
| 150 | + * We can update this with the following Semgrex/Ssurgeon pair: |
| 151 | + *<pre> |
| 152 | +{}=source >nsubj {} >csubj=bad {} |
| 153 | +relabelNamedEdge -edge bad -reln advcl |
| 154 | + *</pre> |
| 155 | + *</p><p> |
| 156 | + * The result will be the {@code csubj} updated to {@code advcl}. |
| 157 | + *</p><p> |
| 158 | + * For the most part, each of these operations is already bomb-proof, |
| 159 | + * i.e. the pattern will execute once and not repeat on the same part of |
| 160 | + * the same dependency graph. |
| 161 | + * However, in the case of {@code addDep}, it is not possible to automatically bomb-proof the command, |
| 162 | + * as certain sentences may legitimately have multiple words with the same attributes as dependents |
| 163 | + * of the same governor. In this case, it is necessary to make the Semgrex pattern itself bomb-proof. |
| 164 | + *</p><p> |
| 165 | +
|
| 166 | + * As an example, if the intent is to change "Jennifer has lovely |
| 167 | + * antennae" to "Jennifer has lovely blue antennae", the following |
| 168 | + * command would "bomb": |
| 169 | +<pre> |
| 170 | +{@code |
| 171 | + {word:antennae}=antennae |
| 172 | + addDep -gov antennae -reln dep -word blue |
| 173 | +} |
| 174 | +</pre> |
| 175 | + *</p><p> |
| 176 | + * The following would not: |
| 177 | +<pre> |
| 178 | +{@code |
| 179 | + {word:antennae}=antennae !> {word:blue} |
| 180 | + addDep -gov antennae -reln dep -word blue |
| 181 | +} |
| 182 | +</pre> |
69 | 183 | *
|
70 | 184 | * @author Eric Yeh
|
71 | 185 | */
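
The field list in the Javadoc above ({@code uid}, {@code notes}, {@code semgrex}, {@code language}, {@code edit-list}) can be pictured as one small pattern file. The sketch below is only an illustration: it builds such a file for the `csubj` to `advcl` example and reads the fields back with the JDK's own XML parser rather than with Ssurgeon. The `uid` and `notes` values, the class name `PatternFileSketch`, and the helper `field` are made up for the example, and the exact layout the real reader accepts may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PatternFileSketch {
    // Illustrative pattern file. The uid and notes values are invented;
    // the semgrex and edit-list come from the UD_English-Pronouns example.
    // '>' in the Semgrex pattern is written as &gt; to keep the XML unambiguous.
    static final String XML =
          "<ssurgeon-pattern-list>\n"
        + "  <ssurgeon-pattern>\n"
        + "    <uid>1</uid>\n"
        + "    <notes>relabel csubj when nsubj is also present</notes>\n"
        + "    <semgrex>{}=source &gt;nsubj {} &gt;csubj=bad {}</semgrex>\n"
        + "    <language>UniversalEnglish</language>\n"
        + "    <edit-list>relabelNamedEdge -edge bad -reln advcl</edit-list>\n"
        + "  </ssurgeon-pattern>\n"
        + "</ssurgeon-pattern-list>\n";

    // Hypothetical helper: returns the text of the first element named `tag`.
    static String field(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse pattern XML", e);
        }
    }

    public static void main(String[] args) {
        String[] tags = {"uid", "notes", "semgrex", "language", "edit-list"};
        for (String tag : tags) {
            System.out.println(tag + ": " + field(XML, tag));
        }
    }
}
```

The parser decodes `&gt;` back to `>`, so the `semgrex` field comes out as the pattern shown in the Javadoc example.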
|
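
The bomb-proofing advice for {@code addDep} can be illustrated without CoreNLP at all. The sketch below is a toy model, not Ssurgeon's implementation: it assumes a rule is re-applied for as long as its pattern keeps matching, and it uses a hypothetical `Edge` class and hypothetical `applyUnguarded` / `applyGuarded` rules to stand in for the two "Jennifer has lovely antennae" patterns. A cap on the number of applications stands in for the runaway behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class BombProofSketch {
    // Hypothetical toy edge (governor word -> dependent word); not a CoreNLP class.
    static final class Edge {
        final String gov;
        final String dep;
        Edge(String gov, String dep) { this.gov = gov; this.dep = dep; }
    }

    interface Rule { boolean apply(List<Edge> edges); }

    // Stands in for "{word:antennae}=antennae" plus addDep:
    // the pattern matches every time, so a new "blue" is added every time.
    static boolean applyUnguarded(List<Edge> edges) {
        edges.add(new Edge("antennae", "blue"));
        return true;
    }

    // Stands in for "{word:antennae}=antennae !> {word:blue}" plus addDep:
    // the pattern stops matching once "antennae" already governs "blue".
    static boolean applyGuarded(List<Edge> edges) {
        for (Edge e : edges) {
            if (e.gov.equals("antennae") && e.dep.equals("blue")) {
                return false;
            }
        }
        edges.add(new Edge("antennae", "blue"));
        return true;
    }

    // Re-applies the rule until it no longer matches or the cap is reached;
    // returns how many times the rule fired.
    static int iterate(Rule rule, List<Edge> edges, int cap) {
        int fired = 0;
        while (fired < cap && rule.apply(edges)) {
            fired++;
        }
        return fired;
    }

    // Toy dependency edges for "Jennifer has lovely antennae".
    static List<Edge> sentence() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("has", "Jennifer"));
        edges.add(new Edge("has", "antennae"));
        edges.add(new Edge("antennae", "lovely"));
        return edges;
    }

    public static void main(String[] args) {
        // The unguarded rule only stops because of the cap.
        System.out.println(iterate(BombProofSketch::applyUnguarded, sentence(), 100)); // prints 100
        // The guarded rule fires once and then no longer matches.
        System.out.println(iterate(BombProofSketch::applyGuarded, sentence(), 100));   // prints 1
    }
}
```

The guard lives in the pattern rather than in the operation, which mirrors the Javadoc's point that {@code addDep} cannot tell on its own whether a second identical dependent is legitimate.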
|