Skip to content

Conversation

@SteveLane
Copy link

This edit allows for multiple tables per page to be read using list_matrices
method. If there is only one table on a page, a matrix is returned, else a list
of matrices are returned.

This edit allows for multiple tables per page to be read using list_matrices
method. If there is only one table on a page, a matrix is returned, else a list
of matrices are returned.
@SteveLane
Copy link
Author

@leeper I haven't added any tests for this - it's just the bare change. I can add tests, or anything else you'd like - just let me know.

Returns a list of strings if there are multiple tables per page.
Now produces a list of data frames if there's multiple tables per page.
The java import using 'asis' thinks there's two tables present. This wasn't an
issue in the previous version of extract_tables (as it only extracted the first
table), but it is now when we extract all tables.
@SteveLane
Copy link
Author

SteveLane commented Dec 22, 2016

Of course, list_characters and list_data_frames needed updating as well, sorry for the extra commits.
Also - whilst testing, the non-western table check failed. The extract_tables function actually pulls in two tables. Not sure if it's the package bug, or a tabula-java bug - I'll open an issue anyway (Issue #32).

@codecov-io
Copy link

codecov-io commented Dec 22, 2016

Codecov Report

❗ No coverage uploaded for pull request base (master@b2e7791). Click here to learn what that means.
The diff coverage is 95.12%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #31   +/-   ##
=========================================
  Coverage          ?   57.82%           
=========================================
  Files             ?       12           
  Lines             ?      569           
  Branches          ?        0           
=========================================
  Hits              ?      329           
  Misses            ?      240           
  Partials          ?        0
Impacted Files Coverage Δ
R/output.R 97.11% <95.12%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2e7791...88df806. Read the comment docs.

@leeper
Copy link
Member

leeper commented Jan 15, 2017

Sorry, @SteveLane, for the delay on this. I will try to get to it as soon as I can.

R/output.R Outdated
for (j in seq_len(ncol(out[[n]]))) {
out[[n]][i, j] <- tab$getCell(i-1L, j-1L)$getText()
outTab <- list()
for(nTabs in seq_len(nxt$size())){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

outTab[[nTabs]] <- matrix(NA_character_,
nrow = tab$getRows()$size(),
ncol = tab$getCols()$size())
for (i in seq_len(nrow(outTab[[nTabs]]))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments indicating what's going on here?

R/output.R Outdated
if (!is.null(encoding)) {
Encoding(out[[n]]) <- encoding
## Put outTab into out, depending on size
if(nxt$size() == 1L){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

R/output.R Outdated
m <- list_matrices(tables, encoding = encoding, ...)
lapply(m, function(x) {
paste0(apply(x, 1, paste, collapse = sep), collapse = "\n")
if(inherits(x, "matrix")){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

R/output.R Outdated
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

R/output.R Outdated
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this try() be silent?

R/output.R Outdated
}
} else {
return(o)
lapply(x, function(y){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space between brackets.

R/output.R Outdated
} else {
return(o)
lapply(x, function(y){
o <- try(read.delim(text = y, stringsAsFactors = stringsAsFactors, ...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this try() be silent?

@pachadotdev
Copy link
Contributor

let me test how these changes go with tabula 1.0.5

@pachadotdev
Copy link
Contributor

hi @SteveLane
I will need to review this manually and add bits of it after the multiple refactors I made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants